Serverless Hosted API V2
Run Workflows and Model Inference on GPU-accelerated infrastructure in the Roboflow cloud.
Models deployed to Roboflow are available through a REST API that you can use to run inference on images. This deployment method is ideal for environments where your deployment device has a persistent internet connection.
The API associated with your project scales with you: as your project grows and your inference requirements increase, the API scales to match.
Serverless Hosted API V2 is our newest API offering. It is faster than V1 and works with models that require a GPU, such as Florence-2 and SAM-2.
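As an illustration of what a request looks like, here is a minimal sketch using the Python inference-sdk client. The serverless.roboflow.com URL and the model ID are assumptions for illustration; use the endpoint and model ID shown on your model's deployment page.

```python
# Minimal sketch: run inference against the Serverless Hosted API V2 using
# the Roboflow inference SDK. The API URL and model ID below are
# illustrative placeholders; substitute the values from your own project.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",  # assumed V2 endpoint
    api_key="YOUR_ROBOFLOW_API_KEY",
)

# Accepts a local path, URL, numpy array, or PIL image.
result = client.infer("path/to/image.jpg", model_id="your-project/1")
print(result)
```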
The end-to-end latency of requests sent to the Serverless Hosted API V2 depends on several factors:
Model architecture, which affects execution time
Image size and resolution, which affect upload time and model inference time during execution (see the resizing sketch after this list)
Network latency and bandwidth, which affect request upload time and response download time
Your subscription plan and usage by other users at any given time, which can result in queueing latency
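Of these factors, image size is the one most directly under your control: downscaling very large images before upload reduces transfer time and often inference time as well. The sketch below is one way to do this with Pillow; the 1024 px longest-side limit is an illustrative choice, not a Roboflow requirement.

```python
# A hypothetical pre-upload resize step: shrink large images so that the
# longest side is at most 1024 px before sending them to the API.
# The 1024 px limit is an illustrative choice, not a Roboflow requirement.
from PIL import Image

def downscale_for_upload(path: str, max_side: int = 1024) -> Image.Image:
    image = Image.open(path)
    scale = max_side / max(image.size)
    if scale < 1.0:  # only shrink, never enlarge
        new_size = (round(image.width * scale), round(image.height * scale))
        image = image.resize(new_size, Image.LANCZOS)
    return image

small = downscale_for_upload("path/to/large_image.jpg")
small.save("resized.jpg", quality=90)
```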
The table below shows representative benchmarks performed on the Serverless Hosted API V2 and the Hosted API V1. For each API, the results show the end-to-end latency (E2E) as well as the execution time (Exec). These numbers are for information only.

| Model | V2 E2E | V2 Exec | V1 E2E | V1 Exec |
| --- | --- | --- | --- | --- |
| yolov8x-640 | 401 ms | 29 ms | 4084 ms | 821 ms |
| yolov8m-640 | 757 ms | 21 ms | 572 ms | 265 ms |
| yolov8n-640 | 384 ms | 17 ms | 312 ms | 63 ms |
| yolov8x-1280 | 483 ms | 97 ms | 6431 ms | 3032 ms |
| yolov8m-1280 | 416 ms | 52 ms | 1841 ms | 1006 ms |
| yolov8n-1280 | 428 ms | 35 ms | 464 ms | 157 ms |

We encourage users to run their own benchmarks for their model inferences and workflows to get real metrics for their specific use cases.
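One way to collect such metrics is to time repeated requests from your own client. The sketch below uses the Python inference-sdk client and measures end-to-end latency as seen from your network; the endpoint URL and model ID are placeholders, and execution time is not separated out.

```python
# A minimal do-it-yourself latency benchmark. It measures end-to-end (E2E)
# latency as observed from the client; the API URL and model ID are
# illustrative placeholders to replace with your own values.
import time
from statistics import mean, median

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",  # assumed V2 endpoint
    api_key="YOUR_ROBOFLOW_API_KEY",
)

latencies_ms = []
for _ in range(20):
    start = time.perf_counter()
    client.infer("path/to/image.jpg", model_id="your-project/1")
    latencies_ms.append((time.perf_counter() - start) * 1000)

print(f"mean E2E latency:   {mean(latencies_ms):.0f} ms")
print(f"median E2E latency: {median(latencies_ms):.0f} ms")
```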