Deploy your Roboflow model on the edge to the NVIDIA Jetson
Prefer to learn using video? Check out our NVIDIA Jetson deployment guide video.
The Roboflow Inference server is a drop-in replacement for the Hosted Inference API that can be deployed on your own hardware. We have optimized it to get maximum performance from the NVIDIA Jetson line of edge-AI devices by tailoring the drivers, libraries, and binaries to its CPU and GPU architectures.

Task Support

The following task types are supported by the hosted API:
Task Type
Supported by NVIDIA Jetson
Object Detection
Instance Segmentation
Semantic Segmentation


You can deploy the edge-accelerated version of your model to the NVIDIA Jetson, where you may need real-time speeds with limited hardware resources.

Step #1: Flash Jetson Device

Ensure that your Jetson is flashed with Jetpack 4.5, 4.6, or 5.1. You can check your existing version with this repository from Jetson Hacks:
git clone
cd jetsonUtilities

Step #2: Run Docker Container

Next, run the Roboflow Inference Server using the accompanying Docker container:
sudo docker run --privileged --net=host --runtime=nvidia --mount source=roboflow,target=/tmp/cache -e NUM_WORKERS=1 roboflow/roboflow-inference-server-jetson-4.5.0:latest
The Docker image you need depends on which Jetpack version you are using:
  • Jetpack 4.5: roboflow/roboflow-inference-server-jetson-4.5.0
  • Jetpack 4.6: roboflow/roboflow-inference-server-jetson-4.6.1
  • Jetpack 5.1: roboflow/roboflow-inference-server-jetson-5.1.1
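The mapping above can be sketched as a small shell helper. This is a minimal sketch, not part of any Roboflow tooling: the function name `jetson_image` is hypothetical, and it assumes every image publishes a `latest` tag as the 4.5.0 image in the command above does.

```shell
#!/bin/sh
# Map a Jetpack version string to the image name from the list above.
# jetson_image is a hypothetical helper name used only for illustration.
jetson_image() {
  case "$1" in
    4.5*) echo "roboflow/roboflow-inference-server-jetson-4.5.0" ;;
    4.6*) echo "roboflow/roboflow-inference-server-jetson-4.6.1" ;;
    5.1*) echo "roboflow/roboflow-inference-server-jetson-5.1.1" ;;
    *)    echo "unsupported Jetpack version: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) the docker command for Jetpack 4.6 so it can be reviewed.
# The :latest tag on this image is an assumption.
IMAGE="$(jetson_image 4.6)" && echo "sudo docker run --privileged --net=host --runtime=nvidia --mount source=roboflow,target=/tmp/cache -e NUM_WORKERS=1 ${IMAGE}:latest"
```

Printing the command instead of running it keeps the sketch safe to try on a machine without Docker; copy the printed line into a terminal on the Jetson once it looks right.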
The Jetson images default to using a CUDA execution provider. To use TensorRT, set the environment variable ONNXRUNTIME_EXECUTION_PROVIDERS=TensorrtExecutionProvider. Note that while TensorRT can increase performance, it also incurs an additional compilation cost at startup.
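One way to pass that variable is with Docker's `-e` flag, the same flag the command in this step already uses for NUM_WORKERS. The sketch below assembles the command with or without the TensorRT setting; `build_run_cmd` is a hypothetical helper name, and the command is only printed, not executed.

```shell
#!/bin/sh
# Assemble the docker run command from Step #2, optionally adding the
# TensorRT execution provider. build_run_cmd is a hypothetical helper.
build_run_cmd() {
  image="$1"        # e.g. roboflow/roboflow-inference-server-jetson-4.5.0:latest
  provider="${2:-}" # empty keeps the default CUDA execution provider
  cmd="sudo docker run --privileged --net=host --runtime=nvidia"
  cmd="$cmd --mount source=roboflow,target=/tmp/cache -e NUM_WORKERS=1"
  if [ -n "$provider" ]; then
    cmd="$cmd -e ONNXRUNTIME_EXECUTION_PROVIDERS=$provider"
  fi
  echo "$cmd $image"
}

# Print the TensorRT variant so it can be reviewed before running it.
build_run_cmd "roboflow/roboflow-inference-server-jetson-4.5.0:latest" "TensorrtExecutionProvider"
```

Calling `build_run_cmd` with only the image argument prints the default CUDA command, which makes it easy to compare the two variants side by side.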

Step #3: Use the Server

You can now use the server to run inference on any of your models. The following command shows the syntax for making a request to the inference API via curl:
base64 your_img.jpg | curl -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]"
When you send a request for the first time, your model will compile on your Jetson device for 5-10 minutes.
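Because the first request can block while the model compiles, it may help to wrap the curl call with a generous timeout. The following is a minimal sketch under stated assumptions: `infer_url` and `infer_image` are hypothetical helper names, the 900-second limit is an arbitrary budget chosen to exceed the 5-10 minute compile, and `ROBOFLOW_API_KEY` is an assumed environment variable name.

```shell
#!/bin/sh
# Build the inference URL in the format shown in the curl example.
# infer_url and infer_image are hypothetical helper names.
infer_url() {
  # $1 = model ID, $2 = version, $3 = API key, $4 = host:port (optional)
  echo "http://${4:-localhost:9001}/$1/$2?api_key=$3"
}

infer_image() {
  # $1 = image path; $2, $3, $4 = model ID, version, API key.
  # --max-time 900 (15 minutes) is an assumed budget that leaves room for
  # the model to compile on the first request.
  base64 "$1" | curl --silent --show-error --max-time 900 -d @- "$(infer_url "$2" "$3" "$4")"
}

# Usage (requires a running server; ROBOFLOW_API_KEY is an assumed variable):
#   infer_image your_img.jpg "[YOUR MODEL]" "[YOUR VERSION]" "$ROBOFLOW_API_KEY"
```

Keeping the URL construction in its own function makes it possible to check the request target without contacting the server.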