Deploy your Roboflow model on the edge to the NVIDIA Jetson

Prefer to learn using video? Check out our NVIDIA Jetson deployment guide video.

The Roboflow Inference server is a drop-in replacement for the Hosted Inference API that can be deployed on your own hardware. We have optimized it to get maximum performance from the NVIDIA Jetson line of edge-AI devices by specifically tailoring the drivers, libraries, and binaries specifically to its CPU and GPU architectures.

Task Support

The following task types are supported by the hosted API:

Task TypeSupported by NVIDIA Jetson

Object Detection


Instance Segmentation

Semantic Segmentation


You can take the edge acceleration version of your model to the NVIDIA Jetson, where you may need realtime speeds with limited hardware resources.

Step #1: Flash Jetson Device

Ensure that your Jetson is flashed with Jetpack 4.5, 4.6, or 5.1. You can check you existing with this repository from Jetson Hacks

git clone https://github.com/jetsonhacks/jetsonUtilities.git
cd jetsonUtilities
python jetsonInfo.py

Step #2: Run Docker Container

Next, run the Roboflow Inference Server using the accompanying Docker container:

sudo docker run --privileged --net=host --runtime=nvidia --mount source=roboflow,target=/tmp/cache -e NUM_WORKERS=1 roboflow/roboflow-inference-server-jetson-4.5.0:latest

The docker image you need depends on what Jetpack version you are using.

  • Jetpack 4.5: roboflow/roboflow-inference-server-jetson-4.5.0

  • Jetpack 4.6: roboflow/roboflow-inference-server-jetson-4.6.1

  • Jetpack 5.1: roboflow/roboflow-inference-server-jetson-5.1.1

The Jetson images default to using a CUDA execution provider. To use TensorRT, set the environment variable ONNXRUNTIME_EXECUTION_PROVIDERS=TensorrtExecutionProvider. Note, while using TensorRT can increase performance, it also incurs an additional startup compilation cost.

Step #3: Use the Server

You can now use the server to run inference on any of your models. The following command shows the syntax for making a request to the inference API via curl:

base64 your_img.jpg | curl -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]"

When you send a request for the first time, your model will compile on your Jetson device for 5-10 minutes.

Expected Performance

There are many factors that affect the performance of a particular inference pipeline including model size, input image size, model input size, confidence threshold, etc. For those looking for a rough estimate of performance, we provide the benchmarks below:


Model Type: Roboflow 3.0 Fast

Model Input Resolution: 640 x 640

Input Image Size: 1024 x 1024

Hardware: Jetson Orin Nano running Jetpack 5.1.1


Python Script via pip install inference: 30 FPS

HTTP Requests to roboflow/roboflow-inference-server-jetson-5.1.1:0.9.1: 15FPS

More benchmarks for varying configurations coming soon!

Last updated