NVIDIA Jetson
Deploy your Roboflow model to the edge on the NVIDIA Jetson
The Roboflow Inference server is a drop-in replacement for the Hosted Inference API that can be deployed on your own hardware. We have optimized it to get maximum performance from the NVIDIA Jetson line of edge-AI devices by tailoring the drivers, libraries, and binaries specifically to its CPU and GPU architectures.
The following task types are supported by the hosted API:
You can take the edge acceleration version of your model to the NVIDIA Jetson, where you may need realtime speeds with limited hardware resources.
Ensure that your Jetson is flashed with Jetpack 4.5, 4.6, or 5.1. You can check your existing version with this repository from Jetson Hacks:
git clone https://github.com/jetsonhacks/jetsonUtilities.git
cd jetsonUtilities
python jetsonInfo.py
Next, run the Roboflow Inference Server using the accompanying Docker container:
sudo docker run --privileged --net=host --runtime=nvidia --mount source=roboflow,target=/tmp/cache -e NUM_WORKERS=1 roboflow/roboflow-inference-server-jetson-4.5.0:latest
The Docker image you need depends on which Jetpack version you are using:
- Jetpack 4.5: roboflow/roboflow-inference-server-jetson-4.5.0
- Jetpack 4.6: roboflow/roboflow-inference-server-jetson-4.6.1
- Jetpack 5.1: roboflow/roboflow-inference-server-jetson-5.1.1
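As a rough illustration of the mapping above, the following sketch picks the image for a given Jetpack version and assembles the same `docker run` command line shown earlier. The names `JETSON_IMAGES` and `docker_run_command` are hypothetical helpers invented for this example, not part of any Roboflow tooling. The function only builds and prints the command; it does not run Docker.

```python
import shlex

# Image names copied from the list above, keyed by Jetpack major.minor version.
# JETSON_IMAGES and docker_run_command are hypothetical names for this sketch.
JETSON_IMAGES = {
    "4.5": "roboflow/roboflow-inference-server-jetson-4.5.0",
    "4.6": "roboflow/roboflow-inference-server-jetson-4.6.1",
    "5.1": "roboflow/roboflow-inference-server-jetson-5.1.1",
}


def docker_run_command(jetpack_version, env=None):
    """Return the docker run command line for a version such as '4.6' or '5.1.1'."""
    # Only major.minor decides which image is used.
    key = ".".join(jetpack_version.split(".")[:2])
    if key not in JETSON_IMAGES:
        raise ValueError(f"No Jetson image listed for Jetpack {jetpack_version}")
    # Default to the single worker used in the documented command.
    env = {"NUM_WORKERS": "1"} if env is None else env
    args = [
        "sudo", "docker", "run", "--privileged", "--net=host",
        "--runtime=nvidia", "--mount", "source=roboflow,target=/tmp/cache",
    ]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    args.append(f"{JETSON_IMAGES[key]}:latest")
    # Quote each argument so the result can be pasted into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


print(docker_run_command("4.6"))
```

Passing a different `env` dictionary adds further `-e NAME=value` pairs, which is one way to supply extra environment variables to the container.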
The Jetson images default to using a CUDA execution provider. To use TensorRT, set the environment variable ONNXRUNTIME_EXECUTION_PROVIDERS=TensorrtExecutionProvider. Note, while using TensorRT can increase performance, it also incurs an additional startup compilation cost.
You can now use the server to run inference on any of your models. The following command shows the syntax for making a request to the inference API via curl:
base64 your_img.jpg | curl -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]"
When you send a request for the first time, your model will compile on your Jetson device for 5-10 minutes.
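The same request can be sketched in Python using only the standard library. The function names `build_request` and `infer` are hypothetical, and the 900-second default timeout is an assumption: it presumes the first request stays open while the model compiles, which the text above does not confirm. The response is returned as raw text because its format is not described here.

```python
import base64
import urllib.parse
import urllib.request


def build_request(image_bytes, model, version, api_key,
                  host="http://localhost:9001"):
    """Build the POST that the curl command sends: base64 body, API key in the query."""
    query = urllib.parse.urlencode({"api_key": api_key})
    url = f"{host}/{model}/{version}?{query}"
    # b64encode emits no line breaks, matching what `curl -d @-` sends
    # after it strips newlines from the `base64` output.
    body = base64.b64encode(image_bytes)
    # curl -d uses this content type by default.
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def infer(image_path, model, version, api_key, timeout_s=900):
    """Send one image to the local server and return the raw response text.

    timeout_s is an assumed, generous value meant to cover the first-request
    compilation time; tune it for your device.
    """
    with open(image_path, "rb") as image_file:
        request = build_request(image_file.read(), model, version, api_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8")
```

Calling `infer("your_img.jpg", "[YOUR MODEL]", "[YOUR VERSION]", "[YOUR API KEY]")` with your own values sends the image to the server started above. Later requests should return much faster once the model has compiled, so a shorter timeout may be reasonable after the first call.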