(Legacy) Serverless Hosted API

We recommend using V2 of our Serverless Hosted API, which is faster than V1. Refer to the Serverless Hosted API V2 documentation to get started with the new API.

Model Support

The following model types are supported by the Serverless Hosted API (v1):

Latency comparison (v1 vs v2)

The end-to-end latency of requests sent to the Serverless Hosted API depends on several factors:

  1. Model architecture, which determines the execution time.

  2. Size and resolution of the images, which affect both upload time and model inference time.

  3. Network latency and bandwidth, which affect request upload time and response download time.

  4. Service subscription and usage by other users at any given time, which can add queueing latency.

The table below shows some representative benchmarks of the v1 and v2 Serverless Hosted API. It reports both the end-to-end latency (E2E) and the execution time (Exec). These numbers are for information only. We encourage users to perform their own benchmarks using our inference benchmark tools or their own custom benchmarks.

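| Model        | V2 (E2E) | V2 (Exec) | V1 (E2E) | V1 (Exec) |
|--------------|----------|-----------|----------|-----------|
| yolov8x-640  | 401 ms   | 29 ms     | 4084 ms  | 821 ms    |
| yolov8m-640  | 757 ms   | 21 ms     | 572 ms   | 265 ms    |
| yolov8n-640  | 384 ms   | 17 ms     | 312 ms   | 63 ms     |
| yolov8x-1280 | 483 ms   | 97 ms     | 6431 ms  | 3032 ms   |
| yolov8m-1280 | 416 ms   | 52 ms     | 1841 ms  | 1006 ms   |
| yolov8n-1280 | 428 ms   | 35 ms     | 464 ms   | 157 ms    |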

We encourage users to run their own benchmarks on their own models and workflows to get real metrics for their specific use cases.
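As a starting point for a custom benchmark, the sketch below times repeated calls from the client side and summarises the end-to-end latency in milliseconds. It is not the inference benchmark tool mentioned above. The names `benchmark`, `send_request`, and `fake_request` are hypothetical, and `fake_request` is only a placeholder for your own inference call. Execution time (Exec) is measured on the server and cannot be recovered from client-side timing alone.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(send_request: Callable[[], object], runs: int = 20, warmup: int = 2) -> Dict[str, float]:
    """Time repeated calls to `send_request` and summarise E2E latency in ms."""
    # Warm-up calls are discarded so one-off costs (for example connection
    # setup) do not skew the measured runs.
    for _ in range(warmup):
        send_request()

    latencies_ms: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
    }


def fake_request() -> None:
    # Placeholder standing in for a real inference request (assumption):
    # replace the body with a call to your own client code.
    time.sleep(0.01)


if __name__ == "__main__":
    print(benchmark(fake_request, runs=5, warmup=1))
```

Running the same harness against both API versions, from the same machine and with the same images, keeps network conditions comparable between the two measurements.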

Limits

The Serverless Hosted API (v1) accepts requests of up to 5 MB, regardless of the task type. The limit applies to the whole request: the image file plus any other information attached to it.
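A request can be checked against this limit before it is sent. The sketch below is illustrative only: it assumes that "5MB" means 5 * 1024 * 1024 bytes (the exact byte count is not stated here, so lower the constant if you want a safety margin), and the helper names `base64_length` and `fits_limit` are hypothetical. It also accounts for base64 encoding, which turns every 3 bytes into 4 characters, in case your client encodes the image that way.

```python
# Assumption: "5MB" is read as 5 * 1024 * 1024 bytes.
MAX_REQUEST_BYTES = 5 * 1024 * 1024


def base64_length(raw_bytes: int) -> int:
    """Size in bytes of the padded base64 encoding of `raw_bytes` bytes."""
    # Base64 maps each group of up to 3 input bytes to 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


def fits_limit(image_bytes: int, other_bytes: int = 0, base64_encoded: bool = False) -> bool:
    """Return True if the image plus other request data is within the limit."""
    payload = base64_length(image_bytes) if base64_encoded else image_bytes
    return payload + other_bytes <= MAX_REQUEST_BYTES


if __name__ == "__main__":
    four_mib = 4 * 1024 * 1024
    print(fits_limit(four_mib))                       # raw image only
    print(fits_limit(four_mib, base64_encoded=True))  # same image, base64-encoded
```

In this example the 4 MiB image fits when sent as raw bytes but not once it is base64-encoded, because the encoded payload grows to roughly 5.3 MiB.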

If a request is too large, we recommend downsizing any attached images. This usually does not hurt prediction quality, because images are resized on our servers anyway to the input size that the model architecture accepts. Some of our SDKs, like the Python SDK, automatically downsize images to the model architecture's input size before sending them to the API.
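When resizing images yourself, the target dimensions can be worked out ahead of time. The following is a minimal sketch, not the SDK's actual logic: it assumes the model takes a square input of `model_input_size` pixels (for example 640) and that scaling the longer side down to that size while preserving the aspect ratio is acceptable for your model. The function name `downsized_dimensions` is hypothetical.

```python
from typing import Tuple


def downsized_dimensions(width: int, height: int, model_input_size: int) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most `model_input_size`."""
    longest = max(width, height)
    if longest <= model_input_size:
        # Already small enough: never upscale before upload.
        return width, height
    # Multiply before dividing to limit floating-point error.
    new_width = max(1, round(width * model_input_size / longest))
    new_height = max(1, round(height * model_input_size / longest))
    return new_width, new_height


if __name__ == "__main__":
    print(downsized_dimensions(4000, 3000, 640))  # (640, 480)
    print(downsized_dimensions(500, 400, 640))    # (500, 400), unchanged
```

The resulting size can then be passed to whichever image library you already use to perform the actual resize.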
