Use CLIP
Run CLIP on frames in a video.
CLIP is a zero-shot classification model that you can use to:
Classify images;
Cluster images;
Compare the similarity between a text prompt and an image;
Compare the similarity between two images, and more.
The Roboflow Video Inference API can use CLIP in two ways. It can return the raw CLIP embedding for each frame in your video (512 or 768 dimensions, depending on the model you select), or it can compare each frame against a text or image embedding and return a cosine similarity score for each frame.
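To make the per-frame similarity score concrete, here is a minimal sketch of cosine similarity between frame embeddings and a query embedding. It does not call the Roboflow API: the function names `cosine_similarity` and `score_frames` are hypothetical, and the 4-dimensional vectors are toy stand-ins for real 512- or 768-dimensional CLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def score_frames(frame_embeddings, query_embedding):
    """Score every frame embedding against one query embedding."""
    return [cosine_similarity(frame, query_embedding) for frame in frame_embeddings]


# Toy embeddings (assumption: real CLIP vectors have 512 or 768 dimensions).
frames = [
    [1.0, 0.0, 0.0, 0.0],  # frame 0: same direction as the query
    [0.0, 1.0, 0.0, 0.0],  # frame 1: orthogonal to the query
    [1.0, 1.0, 0.0, 0.0],  # frame 2: partially aligned with the query
]
query = [1.0, 0.0, 0.0, 0.0]

scores = score_frames(frames, query)

# Index of the frame most similar to the query.
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

A score close to 1 means the frame and the query point in nearly the same direction in embedding space. A score near 0 means they are unrelated.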
Use CLIP with the Video Inference API
First, install the Roboflow Python package with pip: pip install roboflow
Next, create a new Python file and add the following code:
Above, replace:
API_KEY: with your Roboflow API key.
PROJECT_NAME: with your Roboflow project ID.
MODEL_ID: with your Roboflow model ID.