Getting started with Roboflow Inference on Kubernetes
Roboflow’s Inference Docker containers can be hosted in Kubernetes.
The Kubernetes manifest below is a simple example that creates a single CPU-based Roboflow Inference pod and attaches a ClusterIP service to it.
# Pod
---
apiVersion: v1
kind: Pod
metadata:
  name: roboflow
  labels:
    app.kubernetes.io/name: roboflow
spec:
  containers:
  - name: roboflow
    image: roboflow/inference-server:cpu
    ports:
    - containerPort: 9001
      name: rf-pod-port

# Service
---
apiVersion: v1
kind: Service
metadata:
  name: rf-service
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: roboflow
  ports:
  - name: rf-svc-port
    protocol: TCP
    port: 9001
    targetPort: rf-pod-port
(The above example assumes your Kubernetes cluster can pull images from Docker Hub.)
Save the YAML above as roboflow.yaml and use the kubectl CLI to deploy the pod and service into the default namespace of your Kubernetes cluster.
kubectl apply -f roboflow.yaml
A service (of type ClusterIP) will be created; you can access Roboflow inference from within the Kubernetes cluster at this URI: http://rf-service.default.svc:9001
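To confirm that the service is reachable from inside the cluster, one option is a throwaway pod that sends a single HTTP request to the URI above. The manifest below is a minimal sketch under stated assumptions: the pod name rf-connectivity-check is hypothetical, curlimages/curl stands in for any image that ships curl, and the check only reports the HTTP status code returned for the root path, whatever that turns out to be.

```yaml
# Hypothetical one-off pod: sends one request to rf-service and prints the HTTP status code.
apiVersion: v1
kind: Pod
metadata:
  name: rf-connectivity-check        # hypothetical name, not part of the original example
spec:
  restartPolicy: Never               # run once; do not restart after curl exits
  containers:
  - name: curl
    image: curlimages/curl           # assumption: any image that includes curl works
    command:
    - "curl"
    - "-sS"                          # silent, but still show errors
    - "-o"
    - "/dev/null"                    # discard the response body
    - "-w"
    - "%{http_code}\n"               # print only the HTTP status code
    - "http://rf-service.default.svc:9001/"
```

Apply it with kubectl apply -f and read the result with kubectl logs rf-connectivity-check. A connection error here may only mean the roboflow pod is still starting, so it is worth retrying after the pod reports Ready.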

Beyond this example

Kubernetes lets you build several advanced features and extensions into your Roboflow inference service. For example, you could extend the example above for more advanced use cases, such as:
  • Using nodeSelectors to schedule the pod(s) on GPU node pools within your Kubernetes environment and using the roboflow/inference-server:gpu image
  • Creating Kubernetes Deployments to horizontally autoscale the Roboflow inference service and setting up autoscaling triggers based on specific metrics such as CPU usage
  • Using different service types such as NodePort and LoadBalancer to serve the Roboflow inference service externally
  • Using ingress controllers to expose Roboflow inference over TLS (HTTPS)
  • Adding monitoring and alerting to your Roboflow inference service
  • Integrating the license server and offline modes
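
As an illustration of the first two items, the sketch below replaces the single pod with a Deployment and adds a HorizontalPodAutoscaler that scales on CPU utilization. It is a starting point under several assumptions rather than a recommended configuration: the label app.kubernetes.io/name: roboflow must match whatever selector your Service uses, the CPU request, replica bounds, and 70% target are placeholder values, the commented-out nodeSelector label is hypothetical and depends on how your GPU node pool is labelled, and CPU-based autoscaling requires a metrics source such as metrics-server in the cluster.

```yaml
# Sketch: Deployment for the Roboflow inference container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: roboflow
spec:
  # spec.replicas is omitted on purpose so the autoscaler below owns the replica count.
  selector:
    matchLabels:
      app.kubernetes.io/name: roboflow      # assumption: must match the Service selector
  template:
    metadata:
      labels:
        app.kubernetes.io/name: roboflow
    spec:
      # For GPU node pools, uncomment and adapt (hypothetical label key and value):
      # nodeSelector:
      #   example.com/node-pool: gpu
      containers:
      - name: roboflow
        image: roboflow/inference-server:cpu   # use roboflow/inference-server:gpu on GPU nodes
        ports:
        - containerPort: 9001
          name: rf-pod-port
        resources:
          requests:
            cpu: "1"                           # placeholder; utilization is measured against this request
---
# Sketch: autoscale the Deployment on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: roboflow
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: roboflow
  minReplicas: 1                               # placeholder bounds
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70                 # placeholder target, in percent of the CPU request
```

Because the Service routes to pods by label, a Service that selects this label needs no changes when the standalone pod is swapped for the Deployment; delete the original pod first so that both do not serve traffic at once.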