Deploy a Model or Workflow
Learn how to deploy models trained on or uploaded to Roboflow.
You can deploy any model trained on or uploaded to Roboflow, and any Workflow, with Roboflow's deployment offerings.
Our deployment offerings fit into two categories:
Hosted Deployments: These options leverage Roboflow's cloud infrastructure to run your models, eliminating the need for you to manage your own hardware or software.
Self-Hosted Deployments: These options allow you to deploy models on your own hardware, providing greater control over your environment and resources.
The following table summarizes the key features, benefits, and limitations of each deployment option:
| Deployment Option | Description | Benefits | Limitations |
| --- | --- | --- | --- |
| Serverless Hosted API | Run workflows and models directly on Roboflow's infrastructure through an infinitely scalable API. | Scalable, easy to use, no infrastructure management. | Limited control over resources; potential for higher latency for demanding applications. |
| Dedicated Deployments | Dedicated GPUs and CPUs for running workflows and models. | Support for GPU models, video streaming, and Custom Python Blocks. | Limited to US-based data centers; does not autoscale like the Serverless API. |
| Batch Processing | Managed pool of servers that processes your images and videos with a selected workflow. | Fully managed solution offering high data throughput and cost efficiency; scales seamlessly to your data volume, with GPU support. | Non-real-time processing; no support for Custom Python Blocks. |
| Self-Hosted Deployments | Run Inference on your own hardware. | Full control over resources and environment; potential for lower latency. | Requires infrastructure management and expertise. |
| Serverless Hosted API V2 | Run workflows and models directly on Roboflow's infrastructure on GPU hardware. | Support for GPU models. | Limited control over resources; potential for higher latency for demanding applications or during periods of high load. |
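For the hosted options, the usual entry point is the `inference-sdk` Python package, which sends your images to a Roboflow-managed endpoint. The sketch below is a minimal example, not a definitive setup: the endpoint URL, API key, workspace name, workflow ID, model ID, and image path are placeholders you would replace with your own values (check the Roboflow docs for the exact endpoint of the deployment option you choose).

```python
# Minimal sketch of calling a hosted deployment with the inference-sdk package.
# All identifiers below (endpoint, key, workspace, workflow, model, image) are
# placeholders -- substitute your own.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",  # assumed hosted endpoint; see docs
    api_key="YOUR_ROBOFLOW_API_KEY",
)

# Run a single model on one image...
predictions = client.infer("dog.jpeg", model_id="your-project/1")

# ...or run an entire Workflow by workspace name and workflow ID.
results = client.run_workflow(
    workspace_name="your-workspace",
    workflow_id="your-workflow-id",
    images={"image": "dog.jpeg"},
)

print(predictions)
print(results)
```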
The best deployment option for you depends on your specific needs and requirements. Consider the following factors when making your decision:
Scalability: If your application needs to handle varying levels of traffic or data volume, the serverless API offers excellent scalability for real-time use cases; otherwise, Batch Processing is the suggested option.
Latency: If you need low latency or video processing, dedicated deployments or self-hosted deployments with powerful hardware might be the best choice.
GPUs: If you need to run models that require a GPU (e.g. SAM 2, CogVLM), you need to use a Dedicated Deployment with a GPU machine type, or self-host on hardware that has a GPU available. (Serverless GPU API coming soon.)
Control: Self-hosted deployments provide the most control over your environment and resources.
Expertise: Self-hosted deployments require more technical expertise to set up and manage.
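If you choose a self-hosted deployment, a common pattern is to run the open source Inference server on your own machine (for example via Docker) and point the same client at it instead of Roboflow's cloud. The snippet below is a sketch under that assumption; the local URL and port are the commonly documented defaults, and the model ID and image path are placeholders.

```python
# Minimal sketch of using a self-hosted Inference server. Assumes the server
# is already running locally (e.g. started with Docker) on the default port 9001.
from inference_sdk import InferenceHTTPClient

local_client = InferenceHTTPClient(
    api_url="http://localhost:9001",   # your own hardware instead of Roboflow's cloud
    api_key="YOUR_ROBOFLOW_API_KEY",   # still used to fetch your model weights
)

predictions = local_client.infer("dog.jpeg", model_id="your-project/1")
print(predictions)
```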
There is a great guide on how to choose the best deployment method for your use case in the Inference getting started guide.