Serverless Hosted API V2

Run Workflows and Model Inference on GPU-accelerated infrastructure in the Roboflow cloud.

Serverless Hosted API V2 is similar to the Serverless Hosted API but provides a GPU-accelerated inference endpoint for model inference. This allows you to run GPU-only models such as Florence-2 and SAM2, and usually reduces the computational latency of Workflow and Inference requests.

Workflows

To use the Serverless Hosted API V2 for Workflows, select it as the deployment option in the Workflow editor, like so:

[Image: Choose Serverless Hosted API V2]

This sets the inference backend to use Serverless GPU inference. Note that the Serverless Hosted API V2 does not currently support (1) the Stream API (making it unsuitable for video inference) or (2) Dynamic Python Blocks. If your Workflows use these features, we recommend checking out Dedicated Deployments.
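
A Workflow deployed this way can also be called programmatically. Below is a minimal sketch using the Python inference-sdk; the workspace name, Workflow ID, and image path are placeholders you would replace with your own values.

```python
# Minimal sketch: call a Workflow on the Serverless Hosted API V2 endpoint.
# Assumes placeholder workspace/Workflow IDs, API key, and a local image path.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",  # Serverless Hosted API V2 endpoint
    api_key="YOUR_ROBOFLOW_API_KEY",
)

result = client.run_workflow(
    workspace_name="your-workspace",    # placeholder workspace slug
    workflow_id="your-workflow-id",     # placeholder Workflow ID from the editor
    images={"image": "path/to/image.jpg"},
)

print(result)
```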

Endpoints

The Serverless Hosted API V2 has a single endpoint,

https://serverless.roboflow.com

for all models and Workflows. This is in contrast to the V1 API, which has many different endpoints depending on the type of model being inferred.

Also note that the Semantic Segmentation models are not currently supported in V2.

| Model Type | Serverless Hosted API V2 | Hosted API V1 |
| --- | --- | --- |
| Object detection, Keypoint detection | https://serverless.roboflow.com | https://detect.roboflow.com |
| Instance Segmentation | https://serverless.roboflow.com | https://outline.roboflow.com |
| Classification | https://serverless.roboflow.com | https://classify.roboflow.com |
| Semantic Segmentation | Currently not supported | https://segment.roboflow.com |
| Foundation models (e.g. CLIP, OCR, YOLO-World) | https://serverless.roboflow.com | https://infer.roboflow.com |
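
In practice, this means the same client configuration works for every supported model type. A short sketch with the Python inference-sdk, assuming a hypothetical fine-tuned model ID:

```python
# Minimal sketch: run model inference against the single V2 endpoint.
# The model ID ("your-project/3") and image path are placeholders; with V2 the
# same URL is used for detection, segmentation, classification, and foundation models.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",  # single endpoint for all model types
    api_key="YOUR_ROBOFLOW_API_KEY",
)

result = client.infer("path/to/image.jpg", model_id="your-project/3")
print(result)
```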

Benchmarks

The end-to-end latency of requests sent to the Serverless Hosted API V2 depends on several factors:

  1. Model architecture, which affects execution time.

  2. The size and resolution of the images, which affect upload time and model inference time during execution.

  3. Network latency and bandwidth, which affect request upload and response download times.

  4. Service subscription and usage by other users at any given time, which can introduce queueing latency.

[Image: Timing diagram]

We show some representative benchmarks performed on the Serverless Hosted API V2 and the Hosted API V1 in the table below. The results show both the end-to-end latency (E2E) and the execution time (Exec). These numbers are for information only.

| Model | V2 (E2E) | V2 (Exec) | V1 (E2E) | V1 (Exec) |
| --- | --- | --- | --- | --- |
| yolov8x-640 | 401 ms | 29 ms | 4084 ms | 821 ms |
| yolov8m-640 | 757 ms | 21 ms | 572 ms | 265 ms |
| yolov8n-640 | 384 ms | 17 ms | 312 ms | 63 ms |
| yolov8x-1280 | 483 ms | 97 ms | 6431 ms | 3032 ms |
| yolov8m-1280 | 416 ms | 52 ms | 1841 ms | 1006 ms |
| yolov8n-1280 | 428 ms | 35 ms | 464 ms | 157 ms |

We encourage users to run their own benchmarks, using our inference benchmark tools or their own custom benchmarks, to get real metrics for their specific models, Workflows, and use cases.
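
To get a rough sense of end-to-end latency for your own setup, you can also time requests directly. A minimal sketch, assuming the Python inference-sdk and a hypothetical model ID; it measures wall-clock E2E time around each request, not the server-side execution time:

```python
# Rough E2E latency check against the V2 endpoint.
# Assumes a placeholder API key, model ID, and local test image.
# Wall-clock time includes upload, queueing, execution, and download, so it is
# comparable to the E2E column above, not the Exec column.
import time
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",
    api_key="YOUR_ROBOFLOW_API_KEY",
)

timings = []
for _ in range(10):
    start = time.perf_counter()
    client.infer("path/to/test_image.jpg", model_id="your-project/3")
    timings.append((time.perf_counter() - start) * 1000)  # milliseconds

print(f"median E2E latency: {sorted(timings)[len(timings) // 2]:.0f} ms")
```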
