(Legacy) Serverless Hosted API
We recommend using V2 of our Serverless Hosted API, which is faster. Refer to the Serverless Hosted API V2 documentation to get started with the new API.
Model Support
The following model types are supported by the Serverless Hosted API (v1):
Latency comparison (v1 vs v2)
The end-to-end latency of requests sent to the Serverless Hosted API depends on several factors:
Model architecture, which affects execution time.
Image size and resolution, which affect upload time and model inference time.
Network latency and bandwidth, which affect request upload time and response download time.
Service subscription and usage by other users at any given time, which can add queueing latency.

The table below shows representative benchmarks of the v1 and v2 Serverless Hosted API, reporting both end-to-end latency (E2E) and execution time (Exec). These numbers are for information only; we encourage users to run their own benchmarks using our inference benchmark tools or their own custom benchmarks.
Model | v2 E2E | v2 Exec | v1 E2E | v1 Exec
yolov8x-640 | 401 ms | 29 ms | 4084 ms | 821 ms
yolov8m-640 | 757 ms | 21 ms | 572 ms | 265 ms
yolov8n-640 | 384 ms | 17 ms | 312 ms | 63 ms
yolov8x-1280 | 483 ms | 97 ms | 6431 ms | 3032 ms
yolov8m-1280 | 416 ms | 52 ms | 1841 ms | 1006 ms
yolov8n-1280 | 428 ms | 35 ms | 464 ms | 157 ms
We encourage users to benchmark their own models and workflows to get real metrics for their specific use cases.
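As a starting point for a custom benchmark, the following is a minimal sketch of an end-to-end timing harness in Python. It is not the inference benchmark tool mentioned above. The names benchmark, send_request, and fake_request are hypothetical; send_request stands for whatever function issues one request to the API in your own code, and fake_request only sleeps so that the example runs without network access.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def benchmark(send_request: Callable[[], object],
              runs: int = 20, warmup: int = 2) -> Dict[str, float]:
    """Time repeated calls and return latency statistics in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls are made but not timed, so one-time costs such as
    # connection setup are kept out of the measured samples.
    for _ in range(warmup):
        send_request()

    samples_ms: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = math.ceil(0.95 * len(samples_ms)) - 1
    return {
        "min_ms": samples_ms[0],
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "max_ms": samples_ms[-1],
    }


def fake_request() -> None:
    # Hypothetical placeholder: replace with a real call to the API.
    time.sleep(0.01)


if __name__ == "__main__":
    print(benchmark(fake_request, runs=10))
```

Because the timer wraps the whole call, the result corresponds to end-to-end latency as seen by the client, including upload, queueing, execution, and download. Reporting the median and a high percentile alongside the minimum helps separate typical latency from occasional queueing spikes.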
Limits
The Serverless Hosted API (v1) accepts files up to 5 MB, regardless of the task type. The limit applies to the request as a whole, including the image file plus any request information attached.
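A request can be checked against this limit on the client before it is sent. The sketch below rests on two assumptions that this page does not confirm: that the image travels base64-encoded in the request body, and that the limit is counted as 5,000,000 bytes (the more conservative reading). The names MAX_REQUEST_BYTES, base64_size, and fits_limit are hypothetical.

```python
import base64

# Assumption: the limit is read as 5,000,000 bytes. The page does not say
# whether "5MB" is decimal or binary, so the smaller value is used here.
MAX_REQUEST_BYTES = 5 * 1000 * 1000


def base64_size(raw_size: int) -> int:
    """Length in bytes of the padded base64 encoding of raw_size bytes."""
    # Base64 emits 4 output bytes for every 3 input bytes, rounded up.
    return 4 * ((raw_size + 2) // 3)


def fits_limit(image_bytes: bytes, other_bytes: int = 0,
               limit: int = MAX_REQUEST_BYTES) -> bool:
    """Return True if the encoded image plus other request data fits."""
    return base64_size(len(image_bytes)) + other_bytes <= limit


if __name__ == "__main__":
    payload = b"\x00" * 3_000_000
    # Cross-check the size formula against the standard library encoder.
    assert base64_size(len(payload)) == len(base64.b64encode(payload))
    print(fits_limit(payload, other_bytes=2_000))
```

Under these assumptions, base64 encoding inflates the image by roughly a third, so an image file that is itself under the limit can still produce a request that exceeds it.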
If a request is too large, we recommend downsizing any attached images. This usually does not degrade model performance, because images are downsized anyway, once they are received on our servers, to the input size that the model architecture accepts. Some of our SDKs, like the Python SDK, automatically downsize images to the model architecture's input size before they are sent to the API.
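For clients that do not use an SDK, the target dimensions for downsizing can be computed ahead of the actual resize. The sketch below is an illustration only: fit_within is a hypothetical helper, it assumes the aspect ratio should be preserved, and it is not a description of how the servers or the SDKs resize images. The value 640 in the example is taken from the 640-pixel model variants in the benchmark table.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Shrink (width, height) so the longer side is at most max_side.

    The aspect ratio is preserved and images are never enlarged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height, and max_side must be positive")

    longest = max(width, height)
    if longest <= max_side:
        # Already small enough: leave the dimensions unchanged.
        return width, height

    # Scale both sides by max_side / longest, keeping each side >= 1 pixel.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


if __name__ == "__main__":
    print(fit_within(4000, 3000, 640))
```

The resulting dimensions can be passed to whichever image library is already in use to perform the resize before the image is encoded and sent.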