Foundation Models
Foundation models are large, pre-trained models that you can use on their own, or as one component of a larger vision pipeline, to solve computer vision problems.
You can deploy the following foundation models on your own hardware with Inference:
Gaze (L2CS-Net): Detect the direction in which someone is looking.
CLIP: Classify images and compare the similarity of images and text.
DocTR: Recognize and read text in images (OCR).
Grounding DINO: Detect objects in images using text prompts.
Segment Anything (SAM): Segment objects in images.
CogVLM: Answer questions about the contents of images with a large multimodal model (LMM).
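To illustrate the kind of comparison CLIP enables: CLIP embeds images and text into a shared vector space, and similarity between an image and a caption is typically measured with cosine similarity between their embedding vectors. Below is a minimal, dependency-free sketch of that final comparison step; the embedding values shown are hypothetical placeholders (real CLIP embeddings have hundreds of dimensions and come from the model itself).

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors.

    Returns a value in [-1, 1]; values near 1 mean the
    embeddings (and thus the image and text) are similar.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for illustration only.
image_embedding = [0.8, 0.1, 0.3]
text_embedding = [0.7, 0.2, 0.4]

score = cosine_similarity(image_embedding, text_embedding)
print(f"similarity: {score:.3f}")
```

In practice, Inference returns these embeddings for you; the comparison itself reduces to the calculation above.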
To learn how to deploy these foundation models, refer to the Roboflow Inference documentation.