
# PaliGemma 2

PaliGemma 2 is Google's vision-language model. It accepts an image and a text prompt and returns a text response. We support PaliGemma 2 through our [Serverless Hosted API](/deploy/serverless-hosted-api-v2.md), [Dedicated Deployments](/deploy/dedicated-deployments.md), and [self-hosted Inference](https://inference.roboflow.com/).

## Code sample

{% stepper %}
{% step %}

### Get your API Key

Create a Roboflow account, find your key on the [Roboflow API settings page](https://app.roboflow.com/settings/api), and export it as an environment variable in your shell:

```bash
export ROBOFLOW_API_KEY="your-key-here"
```

{% endstep %}

{% step %}

### Install the dependencies

Install the [Inference SDK](https://inference.roboflow.com/):

```bash
pip install inference-sdk
```

{% endstep %}

{% step %}

### Run the model

The sample downloads a test image and calls the pretrained `paligemma2-3b-pt-224` checkpoint with the captioning prompt `caption en`.

```python
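import os
import cv2
import numpy as np
import requests
from inference_sdk import InferenceHTTPClient

# Download the sample image and decode it into a BGR NumPy array.
response = requests.get("https://media.roboflow.com/quickstart/dog.jpeg")
response.raise_for_status()  # fail early if the download did not succeed
image = cv2.imdecode(np.frombuffer(response.content, np.uint8), cv2.IMREAD_COLOR)

# Point the client at the Serverless Hosted API and read the key from the environment.
client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",
    api_key=os.environ["ROBOFLOW_API_KEY"],
)

# Ask for an English caption and cap the response at 64 generated tokens.
result = client.infer_lmm(
    image,
    model_id="paligemma2-3b-pt-224",
    prompt="caption en",
    max_new_tokens=64,
)
print(result["response"])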
```

{% endstep %}
{% endstepper %}

The code above prints the model response to the terminal:

```
a dog is seen here on the shoulder of a man
```

<figure><img src="/files/13G5ggOzjuxU9urIznCj" alt=""><figcaption></figcaption></figure>

## Inference speed

Latency was measured with [Roboflow Inference](https://inference.roboflow.com/) on a single NVIDIA L4 GPU at batch size 1, generating exactly 128 tokens with greedy decoding from a fixed prompt. Latency scales with output length, so use the tokens/sec figure to estimate latency for other output lengths.

<table data-search="false"><thead><tr><th>Alias</th><th>Latency, 128 tokens (ms)</th><th>Tokens/sec</th></tr></thead><tbody><tr><td><code>paligemma2-3b-pt-224</code></td><td>3986</td><td>32</td></tr></tbody></table>
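
As a rough illustration of that estimate, the sketch below scales the measured figure linearly to another output length. It assumes latency is proportional to the number of generated tokens, which ignores fixed costs such as image encoding and prompt processing, so treat the result as an approximation. The function name `estimate_latency_ms` is hypothetical and not part of any Roboflow package.

```python
# Figures from the table above: paligemma2-3b-pt-224 on 1x NVIDIA L4, batch size 1.
MEASURED_LATENCY_MS = 3986
MEASURED_TOKENS = 128


def estimate_latency_ms(num_tokens: int) -> float:
    """Linearly scale the measured latency to a different output length.

    Assumption: latency is proportional to generated tokens. Fixed overhead
    (image encoding, prompt processing) is not modelled separately.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    ms_per_token = MEASURED_LATENCY_MS / MEASURED_TOKENS
    return num_tokens * ms_per_token


# Throughput implied by the measurement, which rounds to the 32 tokens/sec in the table.
tokens_per_sec = MEASURED_TOKENS / (MEASURED_LATENCY_MS / 1000)

print(round(tokens_per_sec))           # 32
print(round(estimate_latency_ms(64)))  # 1993
```

Under this assumption, a 64-token response such as the caption example above would take roughly 2 seconds on the same hardware.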

{% hint style="info" %}
Set `api_url` to match your deployment target:

* `https://serverless.roboflow.com` for the Serverless Hosted API.
* `http://localhost:9001` for a local [Inference](https://inference.roboflow.com/) server.
* Your [Dedicated Deployment](/deploy/dedicated-deployments.md) URL for a private endpoint.
{% endhint %}
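
One way to switch between these targets without editing the sample is to resolve the URL from a short target name. The sketch below is a minimal example under stated assumptions: the helper `resolve_api_url`, the target names, and the `DEDICATED_DEPLOYMENT_URL` environment variable are all hypothetical conventions chosen for illustration, not part of the Inference SDK.

```python
import os

# The two fixed endpoints listed in the hint above.
API_URLS = {
    "serverless": "https://serverless.roboflow.com",
    "local": "http://localhost:9001",
}


def resolve_api_url(target: str, env=None) -> str:
    """Return the api_url for a deployment target name.

    `target` is one of "serverless", "local", or "dedicated" (hypothetical names).
    `env` defaults to os.environ and is injectable for testing.
    """
    env = os.environ if env is None else env
    if target in API_URLS:
        return API_URLS[target]
    if target == "dedicated":
        # Assumption: the Dedicated Deployment URL is stored in this
        # hypothetical environment variable.
        url = env.get("DEDICATED_DEPLOYMENT_URL")
        if not url:
            raise ValueError("DEDICATED_DEPLOYMENT_URL is not set")
        return url
    raise ValueError(f"unknown deployment target: {target!r}")


print(resolve_api_url("local"))  # http://localhost:9001
```

The returned string can then be passed as `api_url` when constructing `InferenceHTTPClient`, leaving the rest of the sample unchanged.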

You can train your own PaliGemma 2 checkpoint on Roboflow and call it by its per-model `{workspace}/{model-slug}` ID (see [Versions, Trainings, and Models](/train/versions-trainings-and-models.md)). See the [Inference documentation](https://inference.roboflow.com/) for additional prompt formats and supported checkpoints.
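
As a small illustration of the `{workspace}/{model-slug}` format, the sketch below assembles such an ID from its two parts. The helper `build_model_id` and the example names `my-workspace` and `my-paligemma2` are hypothetical placeholders; substitute the values shown for your own model.

```python
def build_model_id(workspace: str, model_slug: str) -> str:
    """Join a workspace and model slug into a `{workspace}/{model-slug}` ID."""
    for name, value in (("workspace", workspace), ("model_slug", model_slug)):
        # Each part must be non-empty and must not contain the separator itself.
        if not value or "/" in value:
            raise ValueError(f"{name} must be a non-empty string without '/'")
    return f"{workspace}/{model_slug}"


# Placeholder names for illustration only.
model_id = build_model_id("my-workspace", "my-paligemma2")
print(model_id)  # my-workspace/my-paligemma2
```

The resulting string would take the place of `paligemma2-3b-pt-224` as the `model_id` argument in the sample above.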