
# PaliGemma 2

PaliGemma 2 is a vision-language model from Google. It accepts an image and a text prompt and returns a text response. We support PaliGemma 2 on our [Serverless Hosted API](/roboflow/roboflow-hi/deploy/serverless-hosted-api-v2.md), [Dedicated Deployments](/roboflow/roboflow-hi/deploy/dedicated-deployments.md), and [self-hosted Inference](https://inference.roboflow.com/).

## Code sample

{% stepper %}
{% step %}

### Get your API key

Create a Roboflow account, find your key on the [Roboflow API settings page](https://app.roboflow.com/settings/api), and make it available in your shell:

```bash
export ROBOFLOW_API_KEY="your-key-here"
```
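
To confirm that the key is visible to Python before you run the sample, a small check like the one below may help. This is a minimal sketch: `read_api_key` is a hypothetical helper written for illustration and is not part of the Inference SDK.

```python
import os


def read_api_key(env=os.environ):
    """Return the Roboflow API key from the environment, or raise a clear error.

    Hypothetical helper for illustration only; not part of the Inference SDK.
    """
    # Treat a missing variable and a blank value the same way.
    key = env.get("ROBOFLOW_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ROBOFLOW_API_KEY is not set; export it in your shell first."
        )
    return key
```

Calling `read_api_key()` in the same shell where you ran `export` should return your key; a `RuntimeError` means the variable did not reach the Python process.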

{% endstep %}

{% step %}

### Install dependencies

Install the [Inference SDK](https://inference.roboflow.com/):

```bash
pip install inference-sdk
```

{% endstep %}

{% step %}

### Run the model

This sample calls the pretrained `paligemma2-3b-pt-224` checkpoint with a caption prompt.

```python
import os

import cv2
import numpy as np
import requests
from inference_sdk import InferenceHTTPClient

# Download the sample image and decode it into a BGR NumPy array.
content = requests.get("https://media.roboflow.com/quickstart/dog.jpeg").content
image = cv2.imdecode(np.frombuffer(content, np.uint8), cv2.IMREAD_COLOR)

# Point the client at the Serverless Hosted API and authenticate with your key.
client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",
    api_key=os.environ["ROBOFLOW_API_KEY"],
)

# "caption en" requests an English caption; generation stops at 64 new tokens.
result = client.infer_lmm(
    image,
    model_id="paligemma2-3b-pt-224",
    prompt="caption en",
    max_new_tokens=64,
)
print(result["response"])
```

{% endstep %}
{% endstepper %}

The code above prints the model's response to the terminal:

```
Here a dog is seen on a man's shoulder
```

<figure><img src="/files/16518b7f5fe9834d94f3db40fda8f3ba7441a0e1" alt=""><figcaption></figcaption></figure>

## Inference speed

Latency was measured with [Roboflow Inference](https://inference.roboflow.com/) on 1x NVIDIA L4 at batch size 1, generating exactly 128 tokens from a fixed prompt using greedy decoding. Latency scales with output length, so use tokens/sec to estimate other lengths.

<table data-search="false"><thead><tr><th>Alias</th><th>Latency, 128 tokens (ms)</th><th>Tokens/sec</th></tr></thead><tbody><tr><td><code>paligemma2-3b-pt-224</code></td><td>3986</td><td>32</td></tr></tbody></table>
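
To estimate latency for other output lengths, you can divide the token count by the measured throughput. The sketch below assumes latency is roughly linear in the number of generated tokens and ignores any fixed cost for image encoding and prompt processing, so treat the results as rough estimates. `estimate_latency_ms` is a hypothetical helper, and the default of 32 tokens/sec is the rounded figure from the table.

```python
def estimate_latency_ms(num_tokens, tokens_per_sec=32.0):
    """Estimate generation latency in milliseconds for a given output length.

    Assumes latency scales linearly with the number of generated tokens.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec * 1000.0


# Rough estimates for a few output lengths on the benchmarked setup.
for n in (32, 64, 128, 256):
    print(f"{n:>4} tokens: ~{estimate_latency_ms(n):.0f} ms")
```

At 128 tokens the estimate is 4000 ms, close to the measured 3986 ms; the small gap comes from rounding the throughput to 32 tokens/sec.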

{% hint style="info" %}
Set `api_url` to match your deployment target:

* `https://serverless.roboflow.com` for the Serverless Hosted API.
* `http://localhost:9001` for a local [Inference](https://inference.roboflow.com/) server.
* Your [Dedicated Deployment](/roboflow/roboflow-hi/deploy/dedicated-deployments.md) URL for a private endpoint.
  {% endhint %}
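
If you switch between targets often, one option is to resolve the URL in a single place. The following is a minimal sketch: `resolve_api_url` and the target names `"serverless"`, `"local"`, and `"dedicated"` are hypothetical, and the Dedicated Deployment URL must be supplied by you because it is specific to your deployment.

```python
# Fixed endpoints taken from the hint above; the dictionary keys are hypothetical names.
API_URLS = {
    "serverless": "https://serverless.roboflow.com",
    "local": "http://localhost:9001",
}


def resolve_api_url(target, dedicated_url=None):
    """Return the api_url for a deployment target.

    Hypothetical helper for illustration; not part of the Inference SDK.
    """
    if target == "dedicated":
        # A Dedicated Deployment has its own private URL, so it cannot be hard-coded.
        if not dedicated_url:
            raise ValueError("dedicated_url is required for the 'dedicated' target")
        return dedicated_url.rstrip("/")
    if target not in API_URLS:
        raise ValueError(f"Unknown target: {target!r}")
    return API_URLS[target]
```

Pass the returned value as the `api_url` argument when you construct `InferenceHTTPClient`.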

You can train your own PaliGemma 2 checkpoint on Roboflow and call it by its per-model `{workspace}/{model-slug}` ID (see [Versions, Trainings, and Models](/roboflow/roboflow-hi/train/versions-trainings-and-models.md)). See the [Inference documentation](https://inference.roboflow.com/) for additional prompt formats and supported checkpoints.
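
As an illustration of the `{workspace}/{model-slug}` format, the sketch below builds a model ID and reuses the same `infer_lmm` call as the sample on this page. It assumes that a fine-tuned checkpoint accepts the same call and the same `caption en` prompt as the pretrained one, which may not hold for your model. `build_model_id` and `caption_with_custom_checkpoint` are hypothetical helpers, and `my-workspace` and `my-paligemma` are placeholder names.

```python
def build_model_id(workspace, model_slug):
    """Join a workspace and model slug into a `{workspace}/{model-slug}` ID."""
    for name, value in (("workspace", workspace), ("model_slug", model_slug)):
        if not value or "/" in value:
            raise ValueError(f"{name} must be non-empty and must not contain '/'")
    return f"{workspace}/{model_slug}"


def caption_with_custom_checkpoint(image, workspace, model_slug, api_url, api_key):
    """Caption an image with a fine-tuned checkpoint (assumes the same call as the sample)."""
    # Imported here so build_model_id can be used without the SDK installed.
    from inference_sdk import InferenceHTTPClient

    client = InferenceHTTPClient(api_url=api_url, api_key=api_key)
    result = client.infer_lmm(
        image,
        model_id=build_model_id(workspace, model_slug),
        prompt="caption en",  # assumption: your checkpoint was trained on this prompt
        max_new_tokens=64,
    )
    return result["response"]


# Placeholder names; replace them with your own workspace and model slug.
print(build_model_id("my-workspace", "my-paligemma"))
```

If your checkpoint was trained on a different task, swap the prompt for the one used in your training data.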