Enterprise GPU
As an additional Enterprise deployment, we offer an accelerated inference solution that you can deploy to your GPU devices.

Installation Requirements

These deployment options require a Roboflow Enterprise License.
To deploy the Enterprise GPU inference server, you must first install NVIDIA drivers and nvidia-container-runtime, which allow Docker to pass your GPU through to the inference server. You can check whether nvidia-container-runtime is already installed (and whether your installation succeeded) with the following command:
docker run --gpus all -it ubuntu nvidia-smi
If your installation was successful, you will see your GPU device from within the container:
Tue Nov  9 16:04:47 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   41C    P0    56W / 149W |    504MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
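If the command above fails because nvidia-container-runtime is not installed, a typical setup on an apt-based system such as Ubuntu looks roughly like the following sketch; the repository setup may differ for your distribution or may have changed since this was written, so consult NVIDIA's nvidia-container-runtime installation documentation if in doubt:
# add NVIDIA's container runtime package repository (apt-based systems)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# install the runtime and restart Docker so it picks up the new runtime
sudo apt-get update && sudo apt-get install -y nvidia-container-runtime
sudo systemctl restart docker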

Enterprise GPU TRT

The Enterprise GPU TRT deployment compiles your model on device, optimizing for the hardware that you have available.
Start the Enterprise GPU server with the following command:
sudo docker run --gpus all -p 9001:9001 --network="host" roboflow/inference-server:trt
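Optionally, before sending any requests, you can confirm the container came up and watch it initialize. This is purely a convenience check using standard Docker commands; the container lookup assumes nothing else on the machine is running the same image:
# check that the inference server container is running
sudo docker ps --filter ancestor=roboflow/inference-server:trt
# follow its logs while it starts up
sudo docker logs -f $(sudo docker ps -q --filter ancestor=roboflow/inference-server:trt)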
Next, you can optionally compile your model in advance. Note: if you skip this step, the server's first inference will take a while, as your model compiles on your device at that time:
curl http://localhost:9001/start/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]
#returns 200, Engine Creation and Ignition Success
Run inference on your model by posting a base64 encoded image to the server:
base64 your_img.jpg | curl -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]"
#returns a json response of your model's object detection predictions
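If you want to inspect the response on the command line, you can pipe it through jq. This optional sketch assumes jq is installed and that the response contains a predictions array with class and confidence fields, as in Roboflow's hosted inference responses:
# pull out the class and confidence of each detection from the JSON response
base64 your_img.jpg | curl -s -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]" | jq '.predictions[] | {class, confidence}'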

Caching Your Model for Offline Mode

In certain cases, you might want to cache your model locally so that the server does not need to contact the external Roboflow servers to download your model each time it starts.
To cache your model offline, first create a docker volume:
docker volume create roboflow
Then, start the server with your docker volume mounted on the /cache directory:
sudo docker run --gpus all -p 9001:9001 --network="host" --mount source=roboflow,target=/cache roboflow/inference-server:trt
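After the first model load, the weights are stored in the mounted volume. If you want to confirm the cache was populated, one way (purely a convenience check) is to list the volume's contents from a throwaway container; the exact file layout under /cache is an implementation detail:
# list the contents of the roboflow cache volume
sudo docker run --rm --mount source=roboflow,target=/cache ubuntu ls -la /cache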

Handling Large Images With Tiling

In some cases, you may need to run inference on very large images, where accuracy can degrade significantly. In cases like this, you'll want the inference server to slice the image into smaller tiles before running inference for better accuracy.
To tile, add a query parameter specifying the tile size in pixels to your request; the inference server uses that value as both the width and the height of each tile. For example, the query parameter &tile=500 slices your image into 500x500 pixel tiles before running inference.
Full curl request example:
#slices your image into [YOUR TILE DIMENSIONS]x[YOUR TILE DIMENSIONS] tiles before running inference
base64 your_img.jpg | curl -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]&tile=[YOUR TILE DIMENSIONS]"
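For instance, to use 500x500 pixel tiles as described above (my-project and version 3 are hypothetical stand-ins for your own model ID and version):
#slices your image into 500x500 tiles before running inference
base64 your_img.jpg | curl -d @- "http://localhost:9001/my-project/3?api_key=[YOUR API KEY]&tile=500"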

Attach Video Capture and UDP Socket to Jetson Inference Container

Attach a video capture device and a UDP socket to your Jetson deployment when you need to stream predictions in real time from your Jetson to another machine on your network (a simple listener sketch follows the environment variable lists below).
sudo docker run --privileged --device=/dev/video0:/dev/video0 --net=host --gpus all -e DATASET="YOUR_MODEL" -e VERSION="YOUR_VERSION" -e API_KEY="YOUR_API_KEY" -e VIDEO_DEVICE="/dev/video0" -e IP_BROADCAST_ADDR="0.0.0.0" -e IP_BROADCAST_PORT="8080" --mount source=roboflow,target=/cache roboflow/inference-server:trt-jetson-udp

Explanation of required environment variables:

DATASET - the dataset you want to launch
VERSION - the version of your dataset you want to launch
API_KEY - your Roboflow API key
VIDEO_DEVICE - the video device to capture frames from; be sure it is also forwarded to Docker with the --device flag
IP_BROADCAST_ADDR - the IP address you want to broadcast predictions to via UDP
IP_BROADCAST_PORT - the port to broadcast predictions to

Explanation of optional environment variables:

CONFIDENCE - the minimum confidence threshold for returned predictions; default 0.4
OVERLAP - the IoU overlap threshold used to filter predictions during NMS; default 0.3
CAP_WIDTH, CAP_HEIGHT - the resolution to capture from your camera; defaults 640 and 380. Inference is performed at the resolution your model was trained on.
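
To verify that predictions are arriving on the receiving machine, you can listen on the broadcast port with netcat. This is a minimal sketch; it assumes netcat is available, flag syntax varies slightly between netcat variants, and 8080 stands in for your IP_BROADCAST_PORT:
# listen for UDP prediction broadcasts on port 8080
nc -lu 8080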