CPU (Legacy)

Deploy your model on CPU on your own infrastructure.

Last updated 2 months ago


This is an outdated version of this page; a newer version is available.

Installing and Using the CPU Inference Server

The inference API is available as a Docker container for 64-bit Intel and AMD machines; it is not compatible with macOS-based devices. To install, pull the container:

sudo docker pull roboflow/inference-server:cpu

Then run it:

sudo docker run --net=host roboflow/inference-server:cpu

You can now use the Inference Server as a drop-in replacement for our Hosted Inference API (see those docs for example code snippets in several programming languages). Use the sample code from the Hosted API, but replace https://detect.roboflow.com with http://{INFERENCE-SERVER-IP}:9001 in the API call. For example:

base64 YOUR_IMAGE.jpg | curl -d @- \
"http://10.0.0.1:9001/your-model/42?api_key=YOUR_KEY"
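The same call can be made from Python. This is a minimal sketch using only the standard library; the host, model ID, version, and API key are placeholders you would replace with your own values, and the URL format simply mirrors the curl example above:

```python
import base64
import json
import urllib.request


def build_infer_url(host, model_id, version, api_key, tile=None):
    """Build the inference URL for a local inference server.

    Mirrors the Hosted API URL format, with detect.roboflow.com
    swapped for your server's address on port 9001.
    """
    url = f"http://{host}:9001/{model_id}/{version}?api_key={api_key}"
    if tile is not None:
        url += f"&tile={tile}"
    return url


def infer(image_path, host, model_id, version, api_key):
    """POST a base64-encoded image to the inference server and parse JSON."""
    with open(image_path, "rb") as f:
        payload = base64.b64encode(f.read())
    req = urllib.request.Request(
        build_infer_url(host, model_id, version, api_key),
        data=payload,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (placeholder values; requires a running inference server):
#   infer("YOUR_IMAGE.jpg", "10.0.0.1", "your-model", "42", "YOUR_KEY")
```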

To find your model ID, version number, and API key, see "Locating Your Project Information".

Note: The first call to a model will take a few seconds to download your weights and initialize them; subsequent predictions will be much quicker.

Deploying to Google Cloud VM

To deploy a Docker container to a Google Cloud Virtual Machine, first create a Google Cloud account and set up a Virtual Machine (VM) instance. For this example, we set up an "e2-medium" instance by changing the machine type; for production workloads, you may need a more powerful machine type.

Before creating the instance, scroll down to the "Boot disk" section and click "Change" to increase the boot disk size to at least 50 GB. In the boot disk settings you can also change the operating system: select the "Deep Learning on Linux" operating system with the default "Debian 10 based Deep Learning VM (with Intel MKL) M101" version.

Now scroll to the bottom and click "Create" to initialize the instance. Once the VM is running, you can SSH into it and install Docker. Click the small "SSH" connect button twice to open two terminals: one to run the Docker container and one to run inference.

After you SSH into your Google Cloud VM, pull and run the CPU inference server with the command below. Wait for the container to start and print "inference-server is ready to receive traffic."

sudo docker run --net=host roboflow/inference-server:cpu

Once the container is running on your VM, you can access it using the VM's internal IP address. You may need to configure the VM's firewall rules to allow incoming traffic on the container's port (9001). You can find the "Internal IP" on the VM instances page in the Google Cloud console. With this internal IP, run the command below in the other SSH terminal:

base64 YOUR_IMAGE.jpg | curl -d @- "http://[Internal IP]:9001/[Model_ID]/[Version]?api_key=[YOUR_KEY]"

A successful curl call triggers the Docker container to download your Roboflow weights and prepare the inference engine; the response is a JSON object containing your model's predictions.

Deploying to Google Cloud Run

The gcloud CLI is included in the Google Cloud SDK. You can download the Google Cloud SDK from https://cloud.google.com/sdk/docs/install.

Once you have the Google Cloud SDK installed, open a terminal and pull the Roboflow Docker image:

sudo docker pull roboflow/inference-server:cpu

After the Docker image is pulled, use gcloud to authenticate your terminal, then use docker tag and docker push to upload the Roboflow image to Google Cloud Container Registry:

gcloud auth login
gcloud auth configure-docker
docker tag roboflow/inference-server:cpu gcr.io/[Google-Project-ID]/cpu-inference-server
docker push gcr.io/[Google-Project-ID]/cpu-inference-server

With the image uploaded to Google Cloud Container Registry, navigate to Cloud Run to create a service from it: click "SELECT" in the existing containers section, open the "cpu-inference-server" folder, and select the latest build. Under "Authentication", check "Allow unauthenticated invocations" to allow your service to run as an open API. Expand the "Container, Connections, Security" section and change the "Container port" to 9001.

Scroll to the bottom of the create service page and click "CREATE". This pulls the Docker container into the service and runs initialization. A successful build returns your service name with a green check mark, which signifies your service has successfully built and run the Docker container.
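If you prefer the command line to the console, the same service can likely be created with gcloud run deploy. The project ID and region below are placeholders; the flags correspond to the console settings described above (container port 9001, unauthenticated access):

```
# Deploy the pushed image as a Cloud Run service (placeholder project ID/region).
gcloud run deploy cpu-inference-server \
  --image gcr.io/[Google-Project-ID]/cpu-inference-server \
  --port 9001 \
  --allow-unauthenticated \
  --region us-central1
```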

Open your Cloud Run service by clicking its name; here we open the "cpu-inference-server" service. Once open, copy the service URL, which will look something like https://cpu-inference-server-njsdrsria-uc.a.run.app. With this service URL, we have everything we need to run the curl request against our Roboflow model.

To run the curl request, open up a terminal and use the base64 command below:

base64 your_image.jpg | curl -d @- "https://[Service_URL]/[MODEL_ID]/[VERSION]?api_key=[YOUR_API_KEY]"

Handling Large Images With Tiling

In some cases, you might need to run inference on very large images, where accuracy can degrade significantly. In such cases, you'll want the inference server to slice the image into smaller tiles before running inference for better accuracy.

To tile with a given pixel width and height, you'll need to curl the inference server with a query parameter containing a pixel dimension. We'll take that pixel dimension and use it to create tiles with those dimensions for width and height. The query parameter should look like &tile=500. This will slice your image into 500x500 pixel tiles before running inference.

Full curl request example:

#slices your image into tiles of your chosen dimensions (e.g. &tile=500 for 500x500) before running inference
base64 your_img.jpg | curl -d @- "http://localhost:9001/[YOUR MODEL]/[YOUR VERSION]?api_key=[YOUR API KEY]&tile=[YOUR TILE DIMENSIONS]"
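To estimate how many tiles a given image produces (and therefore how many inference passes the server will run), the arithmetic is simple ceiling division. This illustrative helper is not part of the inference server, just a sketch of the tiling math:

```python
import math


def tile_grid(width, height, tile):
    """Return (columns, rows, total tiles) for slicing a width x height
    image into tile x tile pixel pieces, as with the &tile= parameter."""
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    return cols, rows, cols * rows


# A 1920x1080 image with &tile=500 yields a 4x3 grid of 12 tiles.
print(tile_grid(1920, 1080, 500))  # -> (4, 3, 12)
```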
