PaliGemma 2

Use Google's PaliGemma 2 vision-language model through our Serverless Hosted API

PaliGemma 2 is Google's vision-language model. It accepts an image and a text prompt and returns a text response. We support PaliGemma 2 through our Serverless Hosted API, Dedicated Deployments, and self-hosted Inference.

Code sample

Install the Inference SDK:

pip install inference-sdk

Set the API_KEY environment variable to your Roboflow API key. The sample downloads an example image, then calls the pretrained paligemma2-3b-pt-224 checkpoint with a caption prompt.

import os
import urllib.request
from inference_sdk import InferenceHTTPClient

# Download the example image to the working directory.
image_url = "https://media.roboflow.com/notebooks/examples/dog.jpeg"
image_path = "dog.jpeg"
urllib.request.urlretrieve(image_url, image_path)

# Point the client at the Serverless Hosted API and read the key from API_KEY.
client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",
    api_key=os.getenv("API_KEY"),
)

# "caption en" asks the model for an English caption of the image.
result = client.infer_lmm(
    image_path,
    model_id="paligemma2-3b-pt-224",
    prompt="caption en",
    max_new_tokens=64,
)
print(result["response"])

The code above prints the model's text response to the terminal.

Set api_url to match your deployment target:

  • https://serverless.roboflow.com for the Serverless Hosted API.

  • http://localhost:9001 for a local Inference server.

  • Your Dedicated Deployment URL for a private endpoint.
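One way to keep the sample portable across these targets is to resolve api_url from a setting instead of hard-coding it. The sketch below is an illustration only and is not part of the Inference SDK: the helper name resolve_api_url, the target labels, and the INFERENCE_TARGET and DEDICATED_URL environment variables are all assumptions made for this example.

```python
import os
from typing import Optional

# Fixed endpoints taken from the list above. The "serverless" and "local"
# labels are illustrative names chosen for this sketch, not SDK constants.
KNOWN_TARGETS = {
    "serverless": "https://serverless.roboflow.com",
    "local": "http://localhost:9001",
}


def resolve_api_url(target: str, dedicated_url: Optional[str] = None) -> str:
    """Return the api_url for a deployment target (hypothetical helper)."""
    if target in KNOWN_TARGETS:
        return KNOWN_TARGETS[target]
    if target == "dedicated":
        # A Dedicated Deployment has its own URL, so it must be supplied.
        if not dedicated_url:
            raise ValueError("a Dedicated Deployment URL is required")
        return dedicated_url
    raise ValueError(f"unknown deployment target: {target!r}")


# INFERENCE_TARGET and DEDICATED_URL are assumed variable names.
# With neither set, this falls back to the Serverless Hosted API.
api_url = resolve_api_url(
    os.getenv("INFERENCE_TARGET", "serverless"),
    os.getenv("DEDICATED_URL"),
)
```

The resulting api_url can be passed to InferenceHTTPClient in place of the literal URL used in the sample.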

You can train your own PaliGemma 2 checkpoint on Roboflow and call it by its workspace/project/version identifier. See the Inference documentation for additional prompt formats and supported checkpoints.
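As a rough sketch of the identifier described above, the helper below joins the three parts into a workspace/project/version string. The function name build_model_id and the example values are hypothetical, and the exact identifier format should be confirmed against your own project in Roboflow. The sketch also assumes versions are positive integers.

```python
def build_model_id(workspace: str, project: str, version: int) -> str:
    """Join the parts into a workspace/project/version string (hypothetical helper)."""
    parts = [workspace.strip(), project.strip()]
    # Reject empty parts and embedded slashes, which would change the path shape.
    if not all(parts) or any("/" in part for part in parts):
        raise ValueError("workspace and project must be non-empty and contain no '/'")
    if version < 1:
        raise ValueError("version is assumed to be a positive integer")
    return f"{parts[0]}/{parts[1]}/{version}"


# Placeholder values; substitute the names from your own workspace.
model_id = build_model_id("my-workspace", "my-project", 1)
```

The resulting string would take the place of "paligemma2-3b-pt-224" as the model_id argument in the sample.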
