CPU (Legacy)
Deploy your model on CPU on your own infrastructure.
This is an outdated version of this page; a newer version is available here.
The inference API is available as a Docker container for 64-bit Intel and AMD machines; it is not compatible with macOS-based devices. To install, simply pull the container:
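# image tag per the roboflow/inference-server:cpu reference later on this page
docker pull roboflow/inference-server:cpu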
Then run it:
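# a minimal sketch of the run command; --net=host exposes the server on
# port 9001, the port assumed by the examples below
docker run --net=host roboflow/inference-server:cpu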
You can now use the Inference Server as a drop-in replacement for our Hosted Inference API (see those docs for example code snippets in several programming languages). Use the sample code from the Hosted API, but replace https://detect.roboflow.com with http://{INFERENCE-SERVER-IP}:9001 in the API call. For example,
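# placeholders are illustrative; this follows the same curl pattern shown
# in the Cloud Run example later on this page
base64 your_image.jpg | curl -d @- "http://{INFERENCE-SERVER-IP}:9001/[MODEL_ID]/[VERSION]?api_key=[YOUR_API_KEY]"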
Note: The first call to a model will take a few seconds to download your weights and initialize them; subsequent predictions will be much quicker.
To deploy a Docker container to Google Cloud Virtual Machines, you will first need to create a Google Cloud account and set up a Virtual Machine (VM) instance. For this example, we are setting up an "e2-medium" instance by changing the machine type. For production workloads, you may need to choose a more powerful machine type.
Before we create the instance, scroll down to the "Boot disk" section and click "change" to increase your boot disk size to at least 50GB. In the boot disk settings, we can also change the operating system. Select the "Deep Learning on Linux" operating system with the default "Debian 10 based Deep Learning VM (with Intel MKL) M101" version.
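If you prefer the command line, a roughly equivalent instance can be created with gcloud. The machine type and disk size come from the steps above; the instance name and Deep Learning VM image family below are assumptions, so adjust them to match your project:

# instance name and image family are assumptions; machine type and disk size
# follow the console steps above
gcloud compute instances create inference-server \
    --machine-type=e2-medium \
    --boot-disk-size=50GB \
    --image-project=deeplearning-platform-release \
    --image-family=common-cpu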
Now we can scroll to the bottom and click "create" to initialize the instance. Once the VM is running, you can SSH into the instance and install Docker on it. Click the small "SSH" connect button twice to open up two terminals. We will use one of these terminals to run the Docker container and the other to run inference.
After you SSH into your Google Cloud virtual machine, you can pull and run the CPU inference server by using the commands below. Wait for the container to start and print "inference-server is ready to receive traffic."
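sudo docker pull roboflow/inference-server:cpu
# a sketch of the run command; --net=host exposes the server on port 9001
sudo docker run --net=host roboflow/inference-server:cpu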
Once the container is running on your VM, you can access it using the internal IP address of the VM. You may need to configure the VM's firewall rules to allow incoming traffic to the container's ports. You can find your "Internal IP" in the VM instances page inside the Google Cloud platform. With this internal IP we can run the command below on the other SSH terminal:
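# [INTERNAL_IP] is the VM's internal IP from the instances page; the other
# placeholders follow the curl pattern shown later on this page
base64 your_image.jpg | curl -d @- "http://[INTERNAL_IP]:9001/[MODEL_ID]/[VERSION]?api_key=[YOUR_API_KEY]"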
A successful curl call triggers the Docker container to download your Roboflow weights and prepare the inference engine; once initialized, the server responds with your model's predictions.
The gcloud CLI is included in the Google Cloud SDK, which you can download from the Google Cloud website. You can find the documentation to install the Google Cloud SDK here: https://cloud.google.com/sdk/docs/install
Once you have the Google Cloud SDK installed, open up your favorite terminal to load the Roboflow Docker image.
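docker pull roboflow/inference-server:cpu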
After the Docker image is loaded, we need to use gcloud to authenticate our terminal.
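# log in, then let gcloud configure Docker credentials for Google's registries
gcloud auth login
gcloud auth configure-docker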
Now that our terminal is authenticated, we can use docker tag and docker push to get the Roboflow image into Google Cloud Container Registry.
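# [PROJECT_ID] is your Google Cloud project ID; the target name matches the
# "cpu-inference-server" folder referenced below
docker tag roboflow/inference-server:cpu gcr.io/[PROJECT_ID]/cpu-inference-server
docker push gcr.io/[PROJECT_ID]/cpu-inference-server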
Now that roboflow/inference-server:cpu is uploaded to your Google Cloud Container Registry, navigate to Cloud Run to create a service with the uploaded image. Click "SELECT" in the existing containers section, navigate to the "cpu-inference-server" folder, and select the latest build.
Under "Authentication" check the "Allow unauthenticated invocations" button to allow your service to run as an open API. Expand the "Container, Connections, Security" section and change the "Container port" number to 9001
.
Scroll to the bottom of the create services page and click "CREATE". This will pull the Docker image into the service and run initialization. A successful build will return your service name with a green check mark, which signifies your service has successfully built and run the Docker container.
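The same service can also be created from the terminal. This sketch assumes the image pushed above and reuses the port and authentication settings from this step:

# deploys the image pushed above with the port and auth settings described here
gcloud run deploy cpu-inference-server \
    --image=gcr.io/[PROJECT_ID]/cpu-inference-server \
    --port=9001 \
    --allow-unauthenticated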
Open up your Cloud Run service by clicking on the name of the service you wish to open. We are going to open the "cpu-inference-server" service. Once open, copy the service URL. Your service URL will look something like this: https://cpu-inference-server-njsdrsria-uc.a.run.app. With this service URL, we have everything we need to run the curl request and use our Roboflow model.
To run the curl request, open up a terminal and use the base64 command below:
base64 your_image.jpg | curl -d @- "https://[Service_URL]/[MODEL_ID]/[VERSION]?api_key=[YOUR_API_KEY]"
In some cases, you might need to infer on very large images, where accuracy can degrade significantly. In cases like this, you'll want your inference server to slice these images into smaller tiles before running inference for better accuracy.
To tile with a given pixel width and height, you'll need to curl the inference server with a query parameter containing a pixel dimension. We'll take that pixel dimension and use it to create tiles with those dimensions for width and height. The query parameter should look like &tile=500. This will slice your image into 500x500 pixel tiles before running inference.
Full curl request example:
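# the Cloud Run URL pattern from above with the tile parameter appended
base64 your_image.jpg | curl -d @- "https://[Service_URL]/[MODEL_ID]/[VERSION]?api_key=[YOUR_API_KEY]&tile=500"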