SmolVLM2

Use Hugging Face's SmolVLM2 vision-language model on a Dedicated Deployment or self-hosted Inference.

SmolVLM2 is a compact vision-language model from Hugging Face. It accepts an image and a text prompt and returns a text response.

SmolVLM2 is not available on the Serverless Hosted API. Run it on a Dedicated Deployment or self-hosted Inference.

Code sample

Install the Inference SDK:

pip install inference-sdk

Set api_url to your Dedicated Deployment URL or a local Inference server. Pass your Roboflow API Key via the API_KEY environment variable.

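import os
import urllib.request

from inference_sdk import InferenceHTTPClient

# Download a sample image to the working directory.
image_url = "https://media.roboflow.com/notebooks/examples/dog.jpeg"
image_path = "dog.jpeg"
urllib.request.urlretrieve(image_url, image_path)

# Point the client at your Dedicated Deployment or local Inference server.
client = InferenceHTTPClient(
    api_url="https://your-deployment.roboflow.cloud",
    api_key=os.getenv("API_KEY"),  # Roboflow API key read from the environment
)

# Send the image and prompt to SmolVLM2; max_new_tokens caps the reply length.
result = client.infer_lmm(
    image_path,
    model_id="smolvlm2",
    prompt="Describe this image briefly.",
    max_new_tokens=64,
)
print(result["response"])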

The code above prints the model's text response to the terminal.
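
Indexing result["response"] directly raises a KeyError if the server returns something unexpected. A minimal sketch of a more defensive read follows, assuming the result is a dictionary with a "response" key as in the sample; the helper name extract_response is hypothetical and not part of the Inference SDK.

```python
def extract_response(result):
    """Return the model's text from the result, or raise a descriptive error."""
    # Assumption: a successful result is a dict with a "response" key,
    # matching the sample code. Anything else is treated as an error.
    if not isinstance(result, dict) or "response" not in result:
        raise ValueError(f"Unexpected result shape: {result!r}")
    # Normalize to a string and trim surrounding whitespace.
    return str(result["response"]).strip()
```

With this helper, the final line of the sample could become print(extract_response(result)).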

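Set api_url to match your deployment target: the URL of your Dedicated Deployment, or the address of a local Inference server (http://localhost:9001 by default).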
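
One way to switch targets without editing code is to read the URL from the environment. The sketch below assumes a hypothetical INFERENCE_API_URL variable (a name chosen for illustration, not defined by the SDK) and falls back to http://localhost:9001, the usual default address of a local Inference server. API_KEY is the variable used in the sample above.

```python
import os

# Usual default address of a locally running Inference server.
DEFAULT_LOCAL_URL = "http://localhost:9001"


def resolve_client_settings(env=None):
    """Build api_url and api_key settings from environment variables."""
    env = os.environ if env is None else env
    # INFERENCE_API_URL is a hypothetical variable name used for this sketch.
    api_url = (env.get("INFERENCE_API_URL") or DEFAULT_LOCAL_URL).rstrip("/")
    api_key = env.get("API_KEY")
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError("Set the API_KEY environment variable to your Roboflow API Key.")
    return {"api_url": api_url, "api_key": api_key}
```

The returned dictionary uses the same keyword names as the sample, so it can be unpacked into the client as InferenceHTTPClient(**resolve_client_settings()).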
