Foundation Models
Foundation models are large, pre-trained models that can be used on their own, or as part of a vision workflow, to solve a computer vision problem.
You can use Roboflow cloud APIs to run the following models:
YOLO-World
YOLO-World is a zero-shot object detection model: it detects objects without any training, based only on text descriptions of the items you want to find.
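As a rough illustration, the sketch below runs YOLO-World locally through the Roboflow Inference Python package. The `YOLOWorld` import path, the `yolo_world/l` model ID, and the `infer(..., text=...)` signature are assumptions based on typical usage of the `inference` package; check the Inference documentation for the exact API.

```python
# A minimal sketch of zero-shot detection with YOLO-World via Roboflow Inference.
# Assumes the `inference` package is installed (pip install inference) and that
# the class name and infer() signature below match your installed version.
import cv2
from inference.models.yolo_world import YOLOWorld  # assumed import path

# Load the model (the "yolo_world/l" model ID is an assumption).
model = YOLOWorld(model_id="yolo_world/l")

# Describe the items you want to detect; no training required.
classes = ["person", "backpack", "dog"]

image = cv2.imread("example.jpg")  # placeholder filename

# Run inference with the text prompts; the confidence threshold is illustrative.
results = model.infer(image, text=classes, confidence=0.03)
print(results)
```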
CLIP
CLIP, built by OpenAI, is trained on a vast amount of internet text and images, allowing it to understand images and text together and associate them in a semantically meaningful way. It is available through the Roboflow API and on-device using Roboflow Inference. View an example application that uses the on-device CLIP API.
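For example, the hedged sketch below compares an image against a set of text prompts using the CLIP capability exposed by a Roboflow Inference server. The `clip_compare` method name and its `subject`/`prompt` parameters are assumptions based on the `inference_sdk` package; consult the Inference documentation for the exact interface.

```python
# A minimal sketch of comparing an image to text prompts with CLIP through a
# Roboflow Inference server (hosted or local). The client method name and
# parameters below are assumptions; verify them against the inference_sdk docs.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://infer.roboflow.com",  # or http://localhost:9001 for a local server
    api_key="YOUR_ROBOFLOW_API_KEY",
)

# Ask CLIP how similar the image is to each text prompt.
result = client.clip_compare(
    subject="example.jpg",  # placeholder filename
    prompt=["a photo of a cat", "a photo of a dog"],
)
print(result)  # similarity scores, one per prompt
```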
OCR
Use DocTR to turn the text within images into machine-readable text.
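As a rough sketch, the example below runs DocTR directly through the open-source python-doctr package; the filename is a placeholder, and the same capability is available through Roboflow Inference and the hosted API.

```python
# A minimal sketch of OCR with DocTR using the open-source python-doctr package
# (pip install python-doctr). The image filename is a placeholder.
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load a pretrained end-to-end OCR pipeline (text detection + recognition).
model = ocr_predictor(pretrained=True)

# Read one or more images and run OCR on them.
doc = DocumentFile.from_images(["receipt.jpg"])
result = model(doc)

# Render the recognized text as a plain string.
print(result.render())
```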
You can also deploy these models on your own hardware with Roboflow Inference.