NVIDIA Jetson
Deploy your Roboflow Train model to your NVIDIA Jetson with GPU acceleration.
View the step-by-step guide on deploying with Roboflow to NVIDIA Jetson.
Our Hosted API is suitable for most use cases; it runs on battle-tested infrastructure and autoscales up and down to handle even the most demanding workloads. But because it is hosted remotely, there are some scenarios where it is not ideal: notably, when bandwidth is constrained, when production data cannot leave your local network or corporate firewall, or when you need real-time inference speeds at the edge. In those cases, an on-premise deployment is needed.
Object Detection: ✅
Classification: ✅
Instance Segmentation: ✅
Semantic Segmentation: ✅
The inference API is available as a Docker container optimized and configured for the NVIDIA Jetson line of devices. You should use the latest stable version of NVIDIA's JetPack (last tested on version 4.6), which comes ready to run this container. To install, pull the container:
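The pull step might look like the sketch below. The image name `roboflow/inference-server:jetson` is an assumption that this page does not confirm, so check your Roboflow deployment instructions for the exact name and tag. The command is wrapped in a hypothetical helper, `pull_inference_server`, so that nothing runs until you call it.

```shell
# Assumed image name and tag; confirm against your Roboflow deployment instructions.
IMAGE="roboflow/inference-server:jetson"

# Hypothetical helper: download the inference server image from the registry.
pull_inference_server() {
  sudo docker pull "$IMAGE"
}

# On the Jetson, run:
#   pull_inference_server
```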
Then run it (while passing through access to the Jetson's GPU and native networking stack for speed):
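A minimal sketch of the run step, again assuming the image is named `roboflow/inference-server:jetson`. `--gpus all` and `--net=host` are standard `docker run` flags for GPU passthrough and host networking; `run_inference_server` is a hypothetical helper name.

```shell
# Assumed image name and tag; confirm against your Roboflow deployment instructions.
IMAGE="roboflow/inference-server:jetson"

# Hypothetical helper: start the inference server in the foreground.
#   --net=host  use the Jetson's native networking stack (the server listens on port 9001)
#   --gpus all  pass the Jetson's GPU through to the container
run_inference_server() {
  sudo docker run --net=host --gpus all "$IMAGE"
}

# On the Jetson, run:
#   run_inference_server
```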
You can now use your Jetson as a drop-in replacement for the Hosted Inference API (see those docs for example code snippets in several programming languages). If you're running your application directly on the Jetson, use the sample code from the Hosted API but replace https://detect.roboflow.com with http://localhost:9001 in the API call.
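As an illustration, a request against the local server could be sketched as follows. The URL layout (`/<model>/<version>?api_key=...`) and the base64-encoded POST body are assumed to mirror the Hosted API; the model ID, version, and API key are placeholders, and `inference_url` and `infer` are hypothetical helper names.

```shell
HOST="localhost"        # the machine running the inference server container
MODEL="your-model"      # placeholder model ID
VERSION="1"             # placeholder model version
API_KEY="YOUR_KEY"      # placeholder Roboflow API key

# Build the endpoint URL for a given host (path layout assumed to match the Hosted API).
inference_url() {
  echo "http://${1}:9001/${MODEL}/${VERSION}?api_key=${API_KEY}"
}

# Send a base64-encoded image to the server and print the response.
infer() {
  base64 "$1" | curl -s -d @- "$(inference_url "$HOST")"
}

# Example: infer path/to/image.jpg
```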
You can also run in a client-server configuration and send images to the Jetson for inference from another machine on your network; simply replace localhost with the Jetson's local IP address.
Note: The first call to the model will take a few seconds to download your model weights and initialize them on the GPU; subsequent predictions will be much quicker.
In our local tests, we saw sustained throughput of:
4 frames per second on the Jetson Nano 2GB (with swap memory)
6 frames per second on the Jetson Nano 4GB
10 frames per second on the Jetson Xavier NX (single instance)
15 frames per second on the Jetson Xavier NX (2 instance cluster; see below)
These results were obtained with a 416x416 model while operating in a client-server context (so there is some minor network latency involved).
Note: If your application is also running on the Jetson itself, you will incur less network latency but will also be sharing compute and memory resources so your results may vary.
The weights for your model are downloaded each time the container runs. Full offline mode support (for autonomous and air-gapped devices) is available for enterprise deployments.
The Jetson Nano 2GB requires a swapfile to be created or it will run out of memory and crash while trying to initialize your model. Before you docker run the inference server container, add 8GB of swap memory:
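One common way to create the swapfile on Ubuntu-based systems such as JetPack is sketched below. The path `/swapfile` and the helper name `create_swapfile` are assumptions; the 8GB size comes from the text above.

```shell
SWAPFILE="/swapfile"   # assumed location on the root filesystem
SWAP_SIZE="8G"

# Hypothetical helper: allocate, protect, format, and enable the swapfile.
create_swapfile() {
  sudo fallocate -l "$SWAP_SIZE" "$SWAPFILE" &&
    sudo chmod 600 "$SWAPFILE" &&   # swap contents should only be readable by root
    sudo mkswap "$SWAPFILE" &&
    sudo swapon "$SWAPFILE" &&
    swapon --show                   # list active swap areas to confirm it is in use
}

# On the Jetson, run:
#   create_swapfile
```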
To persist these changes, add the following line to the end of /etc/fstab
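Assuming the swapfile was created at `/swapfile`, a conventional fstab entry is shown below. The exact line is an assumption based on common Ubuntu practice, and `persist_swapfile` is a hypothetical helper that appends it.

```shell
# Assumed fstab entry for a swapfile located at /swapfile.
FSTAB_LINE="/swapfile none swap sw 0 0"

# Hypothetical helper: append the entry to the end of /etc/fstab.
persist_swapfile() {
  echo "$FSTAB_LINE" | sudo tee -a /etc/fstab
}

# On the Jetson, run:
#   persist_swapfile
```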
Then reboot your device.
Because the swapfile lives on your micro SD card, it's important to ensure you have a card with high throughput. You may also want to disable your X Server and run in headless mode (see below).
The inference server includes a cluster mode that runs multiple instances of itself in parallel and automatically load-balances requests between them. This will not improve latency, but it lets your device process multiple images at one time by utilizing more CPU cores.
The number of instances you can run is limited by the input size of your model (determined by your Resize preprocessing step, or defaulting to 416x416 if you did not select a size), the amount of memory your device has, and the amount of memory needed for other services on the device (like your application code).
A Xavier NX's 8GB of memory can safely fit two instances of most models in memory while still leaving room for your program to run.
Tip: To ensure the maximum amount of memory is available, be sure to shut down your X Server with sudo service gdm stop or, if you want this to be the default mode for the system, sudo systemctl set-default multi-user.target.
To start a two-instance cluster, add --env INSTANCES=2 to the docker run command:
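With the flag added, the run step could look like this sketch. The image name `roboflow/inference-server:jetson` is an assumption, as is the helper name `run_inference_cluster`; `--env INSTANCES=2` comes from the text above.

```shell
# Assumed image name and tag; confirm against your Roboflow deployment instructions.
IMAGE="roboflow/inference-server:jetson"
INSTANCES=2   # number of parallel server instances to load-balance across

# Hypothetical helper: start the inference server in cluster mode.
run_inference_cluster() {
  sudo docker run --net=host --gpus all --env INSTANCES="$INSTANCES" "$IMAGE"
}

# On the Jetson, run:
#   run_inference_cluster
```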
In our tests, the Jetson Xavier NX's throughput went from ~10 frames per second with a single instance to 15 fps with two instances. We were only able to run a single instance on the Jetson Nano.
We got the best results with power mode 1 (15 watt, 4 CPU cores) for 2-instance cluster mode on the Xavier NX. You can enable this mode with:
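On Jetson devices, power modes are typically switched with NVIDIA's `nvpmodel` tool. The sketch below assumes `nvpmodel` is present, as on a standard JetPack install, and uses a hypothetical helper name, `set_power_mode`.

```shell
POWER_MODE=1   # mode ID from the text above (15 watt, 4 CPU cores on the Xavier NX)

# Hypothetical helper: switch the power mode, then print the active mode to confirm.
set_power_mode() {
  sudo nvpmodel -m "$POWER_MODE" && sudo nvpmodel -q
}

# On the Jetson, run:
#   set_power_mode
```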
The Roboflow Inference Server takes up 5GB of disk space. We recommend using a fast SD Card (at least U3, V30). Alternatively, some Jetson-powered embedded devices feature integrated eMMC flash memory which should be even more performant. The default JetPack install consumes ~15GB so it's preferable to have an SD Card or Flash Memory capacity of 32GB or higher (but it is possible to run on 16GB by removing unnecessary packages).
Our Docker image contains all of the CUDA and cuDNN packages required to run our models; this means CUDA and cuDNN are not needed on the host so long as NVIDIA Container Runtime and the NVIDIA graphics drivers remain installed.
You can remove these packages and free up about 7GB of space by running the following commands to remove CUDA, cuDNN, and some other large extraneous packages:
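A cautious sketch of the cleanup follows. The package names are assumptions based on JetPack 4.6 (CUDA 10.2, cuDNN 8) and may differ on your system, so list what is installed first (for example with `dpkg -l | grep -E 'cuda|cudnn'`). The hypothetical helper `purge_host_cuda` forwards its arguments to `apt-get`, so passing `-s` simulates the removal without changing anything.

```shell
# Assumed package names for JetPack 4.6; verify them on your device before purging.
PACKAGES="cuda-tools-10-2 libcudnn8 cuda-documentation-10-2 cuda-samples-10-2"

# Hypothetical helper: purge the packages, then remove dependencies that are no longer needed.
# Any arguments (such as -s for a simulated run) are passed through to apt-get.
# $PACKAGES is intentionally unquoted so each package name becomes its own argument.
purge_host_cuda() {
  sudo apt-get "$@" purge $PACKAGES && sudo apt-get "$@" autoremove
}

# On the Jetson, review a dry run first, then remove for real:
#   purge_host_cuda -s
#   purge_host_cuda
```

Leave the NVIDIA Container Runtime and graphics driver packages in place; the container depends on them.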