Run CLIP on frames in a video.
CLIP is a zero-shot classification model that you can use to:
1. Classify images;
2. Cluster images;
3. Compare the similarity between a text prompt and an image;
4. Compare the similarity between two images, and more.
The Roboflow Video Inference API can return raw CLIP embeddings for each frame of your video (512 or 768 dimensions, depending on the model you select). It can also compare each frame against a text or image embedding and return a cosine similarity score per frame.
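To make the per-frame score concrete, here is a minimal sketch of cosine similarity in plain Python. It is an illustration rather than the API's own code: the helper names `cosine_similarity` and `score_frames` are hypothetical, and the toy 3-dimensional vectors stand in for real 512- or 768-dimensional CLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def score_frames(frame_embeddings, query_embedding):
    """Score every frame embedding against one query embedding."""
    return [cosine_similarity(f, query_embedding) for f in frame_embeddings]


# Toy vectors (assumption): real embeddings would come from the API.
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
query = [1.0, 0.0, 0.0]

scores = score_frames(frames, query)
# Index of the frame most similar to the query.
best_frame = max(range(len(scores)), key=scores.__getitem__)
```

A score close to 1 means the frame and the query point in nearly the same direction in embedding space, and a score near 0 means they are unrelated.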
First, install the Roboflow Python package:
pip install roboflow
Next, create a new Python file and add the following code:
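from roboflow import Roboflow, CLIPModel
rf = Roboflow(api_key="API_KEY")
model = CLIPModel()
# Submit the video for inference. "VIDEO_PATH" is a placeholder for the path to your video file.
job_id, signed_url, expire_time = model.predict_video(
    "VIDEO_PATH",
)
# Wait until the job finishes, then fetch the results.
results = model.poll_until_video_results(job_id)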
In the code above, replace the following placeholders where they appear:
- API_KEY: your Roboflow API key.
- PROJECT_NAME: your Roboflow project ID.
- MODEL_ID: your Roboflow model ID.
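As its name suggests, `poll_until_video_results` waits for the video job to finish before returning. The sketch below shows the general shape of such a polling loop. It is not the Roboflow package's implementation: `poll_until_done`, the `check` callable, and the fake responses are all hypothetical and exist only for illustration.

```python
import time


def poll_until_done(check, interval_seconds=1.0, max_attempts=60, sleep=time.sleep):
    """Call check() repeatedly until it returns a non-None result.

    check is a hypothetical callable that returns None while the job is
    still running and the finished results once it is done.
    """
    for attempt in range(max_attempts):
        result = check()
        if result is not None:
            return result
        # Skip the wait after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"no results after {max_attempts} attempts")


# Stand-in for a real status request (assumption): finishes on the third call.
_responses = iter([None, None, {"status": "done"}])
outcome = poll_until_done(lambda: next(_responses), interval_seconds=0.0)
```

Capping the number of attempts keeps a stalled job from blocking a script forever. The interval and attempt limit here are arbitrary and would need tuning for long videos.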