Enterprise GPU
As an additional Enterprise deployment option, we offer an accelerated inference solution that you can run on your own GPU devices.

Installation Requirements

These deployment options require a Roboflow Enterprise License.
To deploy the Enterprise GPU inference server, you must first install the NVIDIA drivers and nvidia-container-runtime, which allow Docker to pass your GPU through to the inference server. You can check whether nvidia-container-runtime is already installed and working with the following command:
docker run --gpus all -it ubuntu nvidia-smi
If your installation was successful, you will see your GPU device from within the container:
Tue Nov  9 16:04:47 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   41C    P0    56W / 149W |    504MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
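If the command fails because nvidia-container-runtime is missing, install it and restart Docker. As a rough sketch, assuming an Ubuntu host with Docker installed and NVIDIA's package repository already added (follow NVIDIA's container runtime documentation for the repository setup on your distribution):
# Assumes Ubuntu with Docker installed and NVIDIA's package repository added
sudo apt-get update
sudo apt-get install -y nvidia-container-runtime
sudo systemctl restart docker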

Enterprise GPU x86_64

For systems with an x86_64 CPU architecture.
You can check your architecture with the arch command in the terminal.
Then start the x86_64 inference server with the following command:
docker run --gpus all -p 9001:9001 roboflow/inference-server:gpu-`arch`
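On an x86_64 machine, the backtick-quoted arch command expands to x86_64, so the command above resolves to the gpu-x86_64 image tag; written out explicitly, it is equivalent to:
docker run --gpus all -p 9001:9001 roboflow/inference-server:gpu-x86_64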
Run inference on your model by posting a base64-encoded image to the server:
base64 your_img.jpg | curl -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]"
# returns a JSON response of your model's object detection predictions
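If you have jq installed, you can pipe the response through it to pretty-print the detections. This sketch assumes the response follows Roboflow's standard object detection format with a top-level predictions array:
base64 your_img.jpg | curl -s -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]" | jq '.predictions'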

🚧 Enterprise GPU TRT

Note: this deployment option is under construction. The Enterprise GPU TRT deployment compiles your model on device, optimizing for the hardware that you have available.
Start the Enterprise GPU TRT server with the following command:
sudo docker run --gpus all -p 9001:9001 --network="host" roboflow/inference-server:trt
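You can confirm the container is up by filtering docker ps on the image used above:
sudo docker ps --filter ancestor=roboflow/inference-server:trt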
Next, you can optionally compile your model in advance. Note: if you do not compile your model ahead of time, the server's first inference will take a while as your model compiles on your device:
curl http://localhost:9001/start/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]
# returns 200, Engine Creation and Ignition Success
Run inference on your model by posting a base64-encoded image to the server:
base64 your_img.jpg | curl -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]"
# returns a JSON response of your model's object detection predictions

Caching Your Model for Offline Mode

In certain cases, you might want to cache your model locally so that the server does not need to contact the external Roboflow servers to download your model each time it starts.
To cache your model offline, first create a Docker volume:
docker volume create roboflow
Then, start the server with your Docker volume mounted at the /cache directory:
sudo docker run --gpus all -p 9001:9001 --network="host" --mount source=roboflow,target=/cache roboflow/inference-server:trt
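To verify the volume exists and see where Docker stores it on the host, inspect it:
docker volume inspect roboflow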