Moondream2

Use Moondream2 for open-vocabulary detection on a Dedicated Deployment or self-hosted Inference

Moondream2 is a compact vision-language model. In Roboflow Inference, it is exposed as an open-vocabulary object detector: pass a class name as the prompt and receive bounding boxes for matching regions.

Moondream2 is not available on the Serverless Hosted API. Run it on a Dedicated Deployment or self-hosted Inference.

Code sample

Install the Inference SDK, supervision, and OpenCV:

pip install inference-sdk supervision opencv-python

Set api_url to your Dedicated Deployment URL or a local Inference server. Pass your Roboflow API Key via the API_KEY environment variable.

import os
import urllib.request

import cv2
import numpy as np
import supervision as sv
from inference_sdk import InferenceHTTPClient

IMAGE_URL = "https://media.roboflow.com/notebooks/examples/dog.jpeg"
IMAGE_PATH = "dog.jpeg"
OUTPUT_PATH = "dog_annotated.png"

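# Download the sample image and load it for annotation.
urllib.request.urlretrieve(IMAGE_URL, IMAGE_PATH)
image = cv2.imread(IMAGE_PATH)
if image is None:  # cv2.imread returns None instead of raising on failure
    raise FileNotFoundError(f"Could not read {IMAGE_PATH}")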

client = InferenceHTTPClient(
    api_url="https://your-deployment.roboflow.cloud",
    api_key=os.getenv("API_KEY"),
)
result = client.infer_lmm(
    IMAGE_PATH,
    model_id="moondream2",
    prompt="dog",
)

# Each prediction is a center-based box (x, y, width, height); convert to corners.
preds = result["predictions"]
xyxys = [
    [p["x"] - p["width"] / 2, p["y"] - p["height"] / 2,
     p["x"] + p["width"] / 2, p["y"] + p["height"] / 2]
    for p in preds
]
# reshape(-1, 4) keeps xyxy two-dimensional even when nothing is detected.
detections = sv.Detections(
    xyxy=np.array(xyxys, dtype=float).reshape(-1, 4),
    class_id=np.array([p.get("class_id", 0) for p in preds], dtype=int),
    confidence=np.array([p.get("confidence", 1.0) for p in preds], dtype=float),
    data={"class_name": np.array([p["class"] for p in preds])},
)
# Draw boxes and "class confidence" labels, then save the annotated image.
labels = [f"{p['class']} {p.get('confidence', 1.0):.2f}" for p in preds]
annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections)
annotated = sv.LabelAnnotator().annotate(scene=annotated, detections=detections, labels=labels)
cv2.imwrite(OUTPUT_PATH, annotated)
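
The sample treats each prediction as a box described by its center point and size, while supervision expects corner coordinates. That conversion can be isolated as a small helper. This is a minimal sketch: the name `center_to_xyxy` is illustrative and not part of the Inference SDK, and it assumes each prediction carries `x`, `y`, `width`, and `height` keys, as in the sample above.

```python
def center_to_xyxy(pred):
    """Convert one center-format box to [x_min, y_min, x_max, y_max]."""
    half_w = pred["width"] / 2
    half_h = pred["height"] / 2
    return [
        pred["x"] - half_w,
        pred["y"] - half_h,
        pred["x"] + half_w,
        pred["y"] + half_h,
    ]


# Hypothetical prediction, made up for illustration (not real model output).
example = {"x": 100.0, "y": 50.0, "width": 40.0, "height": 20.0, "class": "dog"}
print(center_to_xyxy(example))  # [80.0, 40.0, 120.0, 60.0]
```

Keeping the conversion in one function makes it easy to check by hand before the boxes are passed to `sv.Detections`.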

Set api_url to match your deployment target: the URL of your Dedicated Deployment, or the address of your self-hosted Inference server.
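
One way to switch targets without editing the script is to read the URL from the environment. This is a sketch under assumptions: the variable name `INFERENCE_API_URL` and the helper `resolve_api_url` are hypothetical, and `http://localhost:9001` is assumed as the address of a locally started Inference server, so adjust it if yours listens elsewhere.

```python
import os

# Assumed address of a self-hosted Inference server; change it if yours differs.
LOCAL_INFERENCE_URL = "http://localhost:9001"


def resolve_api_url(env=None):
    """Return the URL from INFERENCE_API_URL (hypothetical name), else the local default."""
    env = os.environ if env is None else env
    url = env.get("INFERENCE_API_URL", "").strip()
    # Drop a trailing slash so the value has one consistent form.
    return (url or LOCAL_INFERENCE_URL).rstrip("/")


print(resolve_api_url({}))
print(resolve_api_url({"INFERENCE_API_URL": "https://your-deployment.roboflow.cloud/"}))
```

The returned string can then be passed as `api_url` when constructing `InferenceHTTPClient`.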
