Serverless Hosted API V2
Run Workflows and Model Inference on GPU-accelerated infrastructure in the Roboflow cloud.
Serverless Hosted API V2 is similar to the Serverless Hosted API, but provides a GPU-accelerated inference endpoint for model inference. This makes it possible to run GPU-only models such as Florence2 and SAM2, and usually reduces the compute latency of Workflow and Inference requests.
To use the Serverless Hosted API V2 for Workflows, select it as the deployment option in the Workflow editor, like so:
This sets the inference backend to use serverless GPU inference. Note that the Serverless Hosted API V2 does not currently support (1) the Stream API (making it unsuitable for video inference) or (2) Dynamic Python Blocks. If your Workflows rely on these features, we recommend checking out Dedicated Deployments.
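Workflows can also be triggered against the serverless endpoint programmatically. Below is a minimal sketch using the `inference-sdk` Python package; the workspace name, workflow ID, API key, and image path are placeholders:

```python
# Minimal sketch: running a Workflow against the Serverless Hosted API V2.
# Assumes the `inference-sdk` package is installed (pip install inference-sdk).
# The workspace name, workflow ID, API key, and image path are placeholders.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",  # Serverless Hosted API V2 endpoint
    api_key="YOUR_ROBOFLOW_API_KEY",
)

result = client.run_workflow(
    workspace_name="your-workspace",
    workflow_id="your-workflow-id",
    images={"image": "path/to/image.jpg"},  # image input defined in the Workflow
)
print(result)
```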
The Serverless Hosted API V2 has a single endpoint for all models and Workflows:
https://serverless.roboflow.com
This is in contrast to the V1 API, which uses several different endpoints depending on the type of model being inferred (a short client-side example follows the table below).
Also note that Semantic Segmentation models are not currently supported in V2.
| Model type | Serverless Hosted API V2 | Hosted API V1 |
| --- | --- | --- |
| Object detection, Keypoint detection | https://serverless.roboflow.com | https://detect.roboflow.com |
| Instance Segmentation | https://serverless.roboflow.com | https://outline.roboflow.com |
| Classification | https://serverless.roboflow.com | https://classify.roboflow.com |
| Semantic Segmentation | Currently not supported | https://segment.roboflow.com |
| Foundation models, e.g. CLIP, OCR, YOLO-World | https://serverless.roboflow.com | https://infer.roboflow.com |
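Because all supported model types share the same endpoint, switching from a V1 endpoint usually only requires changing the API URL. A minimal sketch with the `inference-sdk` Python client is shown below; the model ID, API key, and image path are placeholders:

```python
# Minimal sketch: running model inference against the single V2 endpoint.
# The model ID, API key, and image path are placeholders; supported model
# types (detection, instance segmentation, classification, foundation
# models) all use the same URL.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",
    api_key="YOUR_ROBOFLOW_API_KEY",
)

result = client.infer("path/to/image.jpg", model_id="your-model/1")
print(result)
```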
The end-to-end latency of requests sent to the Serverless Hosted API V2 depends on several factors:

- Model architecture, which affects execution time
- Size and resolution of the input images, which affect upload time and model inference time during execution
- Network latency and bandwidth, which affect request upload time and response download time
- Service subscription and usage by other users at any given time, which can result in queueing latency
The table below shows some representative benchmarks performed on the Serverless Hosted API V2 and the Hosted API V1. The results report the end-to-end latency (E2E) as well as the execution time (Exec) for each API. These numbers are for information only; we encourage users to perform their own benchmarks using our inference benchmark tools or their own custom benchmarks.
| Model | Serverless Hosted API V2 E2E | Serverless Hosted API V2 Exec | Hosted API V1 E2E | Hosted API V1 Exec |
| --- | --- | --- | --- | --- |
| yolov8x-640 | 401 ms | 29 ms | 4084 ms | 821 ms |
| yolov8m-640 | 757 ms | 21 ms | 572 ms | 265 ms |
| yolov8n-640 | 384 ms | 17 ms | 312 ms | 63 ms |
| yolov8x-1280 | 483 ms | 97 ms | 6431 ms | 3032 ms |
| yolov8m-1280 | 416 ms | 52 ms | 1841 ms | 1006 ms |
| yolov8n-1280 | 428 ms | 35 ms | 464 ms | 157 ms |
We encourage users to run their own benchmarks for their model inferences and Workflows to get real metrics for their specific use cases.
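For a quick custom benchmark, end-to-end latency can be measured directly from the client side. The sketch below times repeated requests with the `inference-sdk` client; the model ID, API key, image path, and request count are placeholders:

```python
# Minimal sketch of a client-side latency benchmark against the
# Serverless Hosted API V2. Model ID, API key, and image path are
# placeholders; adjust the request count to suit your use case.
import time

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",
    api_key="YOUR_ROBOFLOW_API_KEY",
)

latencies = []
for _ in range(20):
    start = time.perf_counter()
    client.infer("path/to/image.jpg", model_id="your-model/1")
    latencies.append((time.perf_counter() - start) * 1000)  # end-to-end latency in ms

latencies.sort()
print(f"median E2E latency: {latencies[len(latencies) // 2]:.0f} ms")
```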