Enterprise GPU
As an additional Enterprise deployment option, we offer an accelerated inference solution that you can run on your own GPUs.
The Enterprise GPU deployment compiles your model on-device, optimizing for the specific hardware you have available.
You must first install the NVIDIA drivers and nvidia-container-runtime so that Docker can pass your GPU through to the inference server:
sudo apt-get install nvidia-container-runtime
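Before starting the server, you can optionally confirm that the driver is installed and that Docker can pass the GPU through to a container. This check is not part of the Roboflow setup itself, and the nvidia/cuda image tag below is only an example that may need to match your installed driver and CUDA version:
# confirm the NVIDIA driver is loaded on the host
nvidia-smi
# confirm Docker can pass the GPU through to a container (example CUDA base image tag)
sudo docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi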
Then you can stand up the Enterprise GPU server with the following command:
sudo docker run --gpus all -p 9001:9001 --network="host" roboflow/inference-server:trt
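If you want to verify that the server came up, standard Docker commands work; nothing below is specific to the Roboflow image, and CONTAINER_ID is a placeholder for the ID shown by docker ps:
# list running containers and confirm the inference server is up
sudo docker ps
# follow the server logs; replace CONTAINER_ID with the ID from docker ps
sudo docker logs -f CONTAINER_ID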
Next, you can optionally compile your model in advance. Note: if you do not compile your model in advance, the server's first inference will take a while as the model compiles on your device:
curl http://localhost:9001/start/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]
# returns 200, Engine Creation and Ignition Success
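For example, with a hypothetical model ID of my-project at version 3 (substitute your own model ID, version, and API key), the compile request would look like the line below; quoting the URL keeps the shell from interpreting the query string:
# example with placeholder values; use your own model ID, version, and API key
curl "http://localhost:9001/start/my-project/3?api_key=YOUR_API_KEY"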
Run inference on your model by posting a base64-encoded image to the server:
base64 your_img.jpg | curl -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]"
# returns the usual JSON response with your object detection predictions
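To make the predictions easier to read, you can pipe the response through a JSON pretty-printer; this sketch assumes python3 is available on the machine making the request:
# pretty-print the prediction JSON (assumes python3 is installed)
base64 your_img.jpg | \
  curl -s -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]" | \
  python3 -m json.tool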

Caching Your Model for Offline Mode

In certain cases, you might want to cache your model locally so that the server does not need to contact the external Roboflow servers to download your model each time it starts.
To cache your model offline, first create a Docker volume:
docker volume create roboflow
Then, start the server with your Docker volume mounted at the /cache directory:
sudo docker run --gpus all -p 9001:9001 --network="host" --mount source=roboflow,target=/cache roboflow/inference-server:trt
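After the server has downloaded and compiled your model once, you can confirm that the files were cached by inspecting the volume with standard Docker commands; the alpine image below is just a convenient way to list files inside the volume:
# show where Docker stores the roboflow volume on disk
docker volume inspect roboflow
# list the cached files inside the volume (any small image with ls works)
sudo docker run --rm --mount source=roboflow,target=/cache alpine ls -la /cache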