
# SmolVLM2

SmolVLM2 is a compact vision-language model from Hugging Face. It accepts an image and a text prompt and returns a text response.

{% hint style="info" %}
SmolVLM2 is not available on the Serverless Hosted API. Run it on a [Dedicated Deployment](/deploy/dedicated-deployments.md) or [self-hosted Inference](https://inference.roboflow.com/).
{% endhint %}

## Code sample

{% stepper %}
{% step %}

### Get your API Key

Create a Roboflow account, find your key on the [Roboflow API settings page](https://app.roboflow.com/settings/api), and make it available to your shell:

```bash
export ROBOFLOW_API_KEY="your-key-here"
```

{% endstep %}

{% step %}

### Install the dependencies

Install the [Inference SDK](https://inference.roboflow.com/):

```bash
pip install inference-sdk
```

{% endstep %}

{% step %}

### Run the model

Set `api_url` to your Dedicated Deployment URL or a local Inference server.

```python
import os
import cv2
import numpy as np
import requests
from inference_sdk import InferenceHTTPClient

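# Download the sample image and fail early on an HTTP error.
response = requests.get("https://media.roboflow.com/quickstart/dog.jpeg", timeout=30)
response.raise_for_status()
# Decode the raw bytes into a BGR NumPy array.
image = cv2.imdecode(np.frombuffer(response.content, np.uint8), cv2.IMREAD_COLOR)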

client = InferenceHTTPClient(
    api_url="https://your-deployment.roboflow.cloud",
    api_key=os.environ["ROBOFLOW_API_KEY"],
)
result = client.infer_lmm(
    image,
    model_id="smolvlm2",
    prompt="Describe this image briefly.",
    max_new_tokens=64,
)
print(result["response"])
```

{% endstep %}
{% endstepper %}

The code above prints the model response to the terminal:

```
A man is carrying a dog on his shoulders.
```

<figure><img src="/files/DqX3xaHHqFQhTIy2Djfp" alt=""><figcaption></figcaption></figure>

## Inference speed

Latency was measured with [Roboflow Inference](https://inference.roboflow.com/) on a single NVIDIA L4 GPU at batch size 1, generating exactly 128 tokens with greedy decoding from a fixed prompt. Latency scales with output length, so use the tokens/sec figure to estimate latency for other output lengths.

<table data-search="false"><thead><tr><th>Alias</th><th>Latency, 128 tokens (ms)</th><th>Tokens/sec</th></tr></thead><tbody><tr><td><code>smolvlm2</code></td><td>3113</td><td>41</td></tr></tbody></table>
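As a rough illustration of that estimate, the sketch below scales the measured 128-token latency linearly to other output lengths. The function name `estimate_latency_ms` is hypothetical, and the linear model is an assumption: it ignores fixed costs such as image encoding and prompt processing, so treat the result as a ballpark figure for comparable hardware and settings.

```python
# Benchmark figures from the table above (1x NVIDIA L4, batch size 1).
MEASURED_TOKENS = 128
MEASURED_LATENCY_MS = 3113


def estimate_latency_ms(new_tokens: int) -> float:
    """Linearly scale the measured latency to a different output length.

    Assumption: latency is proportional to the number of generated tokens.
    """
    if new_tokens < 0:
        raise ValueError("new_tokens must be non-negative")
    return new_tokens * MEASURED_LATENCY_MS / MEASURED_TOKENS


if __name__ == "__main__":
    # Estimate latency for a few values of max_new_tokens.
    for tokens in (32, 64, 256):
        print(f"{tokens} tokens: ~{estimate_latency_ms(tokens):.1f} ms")
```

Under this assumption, the `max_new_tokens=64` setting from the code sample would take roughly half the benchmarked latency.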

{% hint style="info" %}
Set `api_url` to match your deployment target:

* `http://localhost:9001` for a local [Inference](https://inference.roboflow.com/) server.
* Your [Dedicated Deployment](/deploy/dedicated-deployments.md) URL for a private endpoint.

{% endhint %}
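One way to switch between these targets without editing code is to read the URL from an environment variable and fall back to the local server. This is a minimal sketch: the variable name `INFERENCE_API_URL` and the helper `resolve_api_url` are hypothetical and not part of the Inference SDK.

```python
import os

# Default address of a local Inference server, as listed above.
LOCAL_INFERENCE_URL = "http://localhost:9001"


def resolve_api_url(env=None) -> str:
    """Return the deployment URL from the environment, or the local default.

    INFERENCE_API_URL is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    url = env.get("INFERENCE_API_URL", "").strip()
    # Drop any trailing slash so the URL has a consistent form.
    return url.rstrip("/") if url else LOCAL_INFERENCE_URL


if __name__ == "__main__":
    print(resolve_api_url())
```

The returned value can then be passed as `api_url` when constructing `InferenceHTTPClient`.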