Run CLIP on frames in a video.
CLIP is a zero-shot classification model that you can use to:
  1. Classify images;
  2. Cluster images;
  3. Compare the similarity between a text prompt and an image;
  4. Compare the similarity between two images, and more.
The Roboflow Video Inference API can do one of two things with CLIP. It can return the raw CLIP embedding for each frame in your video (512 or 768 dimensions, depending on the model you select), or it can compare each frame against a text or image vector and return a cosine similarity score for that frame.
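To make the cosine similarity score concrete, here is a minimal sketch of how such a score could be computed from embeddings on your own machine. It is not the API's implementation: the helper names `cosine_similarity`, `score_frames`, and `best_frame` are hypothetical, and the 3-dimensional vectors are toy stand-ins for real 512- or 768-dimensional CLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def score_frames(frame_embeddings, query_embedding):
    """Score every frame embedding against one query embedding."""
    return [cosine_similarity(frame, query_embedding) for frame in frame_embeddings]


def best_frame(frame_embeddings, query_embedding):
    """Return the index of the frame most similar to the query."""
    scores = score_frames(frame_embeddings, query_embedding)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (assumption): three "frames" and one "query" in 3 dimensions.
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
query = [1.0, 0.0, 0.0]

scores = score_frames(frames, query)
top_index = best_frame(frames, query)
print(scores, top_index)
```

A score near 1 means the frame and the query point in nearly the same direction in embedding space, and a score near 0 means they are unrelated. The same function works whether the query vector came from a text prompt or from another image.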

Use CLIP with the Video Inference API

First, install the Roboflow Python package:
pip install roboflow
Next, create a new Python file and add the following code:
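from roboflow import Roboflow, CLIPModel

# Authenticate with your Roboflow API key.
rf = Roboflow(api_key="API_KEY")

# Create the CLIP model used for video inference.
model = CLIPModel()

# Submit the video for processing. The arguments to this call are cut off in
# this snippet; "VIDEO_PATH" is a placeholder for the path to your video file.
job_id, signed_url, expire_time = model.predict_video(
    "VIDEO_PATH",
)

# Poll until the results for this job are available.
results = model.poll_until_video_results(job_id)
Above, replace:
  • API_KEY: with your Roboflow API key.
  • VIDEO_PATH: with the path to the video you want to process.
  • PROJECT_NAME: with your Roboflow project ID.
  • MODEL_ID: with your Roboflow model ID.
PROJECT_NAME and MODEL_ID do not appear in the CLIP snippet above; replace them only if your code references a specific project and model.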
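The last call in the snippet waits for an asynchronous job to finish. As an illustration of that general pattern, here is a small, generic polling loop with a timeout. It is a sketch under stated assumptions, not the code behind `poll_until_video_results`: the `poll_until_done` helper and its `fetch` callable are hypothetical, and `fetch` is assumed to return `None` while the job is still running.

```python
import time


def poll_until_done(fetch, interval_s=1.0, timeout_s=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until it returns a non-None result.

    fetch      -- hypothetical callable; returns None while the job is pending
    interval_s -- seconds to wait between attempts
    timeout_s  -- give up after roughly this many seconds
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"job did not finish within {timeout_s} seconds")
        sleep(interval_s)


# Demo with a fake job (assumption) that is pending twice, then done.
_responses = iter([None, None, {"status": "done"}])
demo_result = poll_until_done(lambda: next(_responses), interval_s=0.0, timeout_s=5.0)
print(demo_result)
```

Checking the deadline after each attempt guarantees at least one call to `fetch` even with a very short timeout, and injecting `sleep` and `clock` keeps the loop easy to test without real waiting.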