Foundation Models
Foundation models are large, pre-trained models that can be used on their own, or as part of a vision workflow, to solve a computer vision problem.
You can use Roboflow cloud APIs to run the following models:
YOLO-World
YOLO-World is a zero-shot object detection model: it detects objects without any training, based only on text descriptions of the items you want to find.
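As a rough illustration, the sketch below runs YOLO-World locally through the Roboflow Inference Python package. The `YOLOWorld` import path, the `yolo_world/l` model ID, and the `infer(..., text=...)` signature are assumptions based on typical usage of the `inference` package; check the Inference documentation for the exact API.

```python
# A minimal sketch of zero-shot detection with YOLO-World via Roboflow Inference.
# Assumes the `inference` package is installed (pip install inference) and that
# the class name and infer() signature below match your installed version.
import cv2
from inference.models.yolo_world import YOLOWorld  # assumed import path

# Load the model (the "yolo_world/l" model ID is an assumption).
model = YOLOWorld(model_id="yolo_world/l")

# Describe the items you want to detect; no training required.
classes = ["person", "backpack", "dog"]

image = cv2.imread("example.jpg")  # placeholder filename

# Run inference with the text prompts; the confidence threshold is illustrative.
results = model.infer(image, text=classes, confidence=0.03)
print(results)
```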
CLIP
CLIP, built by OpenAI, is trained on a vast amount of internet text and images, allowing it to understand images and text together and associate them in a semantically meaningful way. It is available through the Roboflow API and on-device using Roboflow Inference. View an example application that uses the on-device CLIP API.
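For example, the hedged sketch below compares an image against a set of text prompts using the CLIP capability exposed by a Roboflow Inference server. The `clip_compare` method name and its `subject`/`prompt` parameters are assumptions based on the `inference_sdk` package; consult the Inference documentation for the exact interface.

```python
# A minimal sketch of comparing an image to text prompts with CLIP through a
# Roboflow Inference server (hosted or local). The client method name and
# parameters below are assumptions; verify them against the inference_sdk docs.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://infer.roboflow.com",  # or http://localhost:9001 for a local server
    api_key="YOUR_ROBOFLOW_API_KEY",
)

# Ask CLIP how similar the image is to each text prompt.
result = client.clip_compare(
    subject="example.jpg",  # placeholder filename
    prompt=["a photo of a cat", "a photo of a dog"],
)
print(result)  # similarity scores, one per prompt
```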
OCR
Use DocTR to turn the text within images into machine-readable text.
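As a rough sketch, the example below runs DocTR directly through the open-source python-doctr package; the filename is a placeholder, and the same capability is available through Roboflow Inference and the hosted API.

```python
# A minimal sketch of OCR with DocTR using the open-source python-doctr package
# (pip install python-doctr). The image filename is a placeholder.
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load a pretrained end-to-end OCR pipeline (text detection + recognition).
model = ocr_predictor(pretrained=True)

# Read one or more images and run OCR on them.
doc = DocumentFile.from_images(["receipt.jpg"])
result = model(doc)

# Render the recognized text as a plain string.
print(result.render())
```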
You can also deploy these models on your own hardware with Roboflow Inference.