Florence 2

Use Microsoft's Florence 2 multimodal model through our Serverless Hosted API

We support Microsoft's Florence 2, a multimodal vision-language model, via our Serverless Hosted API. Florence 2 supports captioning, object detection, segmentation, and OCR through task prompts (such as <CAPTION>, <OD>, <OCR>, <REFERRING_EXPRESSION_SEGMENTATION>).

Default aliases

Use the alias as the model_id in your request and the runtime resolves it to the corresponding pretrained weights.

Available aliases:

  • florence-2-base

  • florence-2-large

Code sample

Florence 2 runs through the shared /infer/lmm endpoint. Call it directly with curl:

curl --location 'https://serverless.roboflow.com/infer/lmm' \
  --header 'Content-Type: application/json' \
  --data '{
    "api_key": "YOUR_API_KEY",
    "image": {"type": "url", "value": "https://media.roboflow.com/notebooks/examples/dog.jpeg"},
    "model_id": "florence-2-base",
    "prompt": "<CAPTION>"
  }'

The same call can be made through the Python SDK. Install the SDK, then call the LMM inference endpoint with a task prompt. Pass your Roboflow API key via the API_KEY environment variable.

pip install inference-sdk

Set api_url to match your deployment target:

  • https://serverless.roboflow.com for the Serverless Hosted API.

  • http://localhost:9001 for a local Inference server.

  • Your Dedicated Deployment URL for a private endpoint.
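As a rough illustration of how the deployment target and the request fit together, the sketch below rebuilds the curl request in Python using only the standard library, independent of the SDK client. The helper names (`lmm_endpoint`, `build_lmm_payload`, `post_lmm`) are hypothetical and not part of any Roboflow package. It assumes that each deployment target exposes the same /infer/lmm path and accepts the same JSON payload as the curl example, and that the response body is JSON; none of its fields are assumed here.

```python
import json
import urllib.request


def lmm_endpoint(api_url: str) -> str:
    """Join a deployment base URL with the shared /infer/lmm path."""
    return api_url.rstrip("/") + "/infer/lmm"


def build_lmm_payload(
    api_key: str,
    image_url: str,
    model_id: str = "florence-2-base",
    prompt: str = "<CAPTION>",
) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "api_key": api_key,
        "image": {"type": "url", "value": image_url},
        "model_id": model_id,
        "prompt": prompt,
    }


def post_lmm(api_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload and return the parsed JSON response.

    Assumption: the server replies with a JSON document. The structure of
    that document is not inspected here.
    """
    request = urllib.request.Request(
        lmm_endpoint(api_url),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# import os
# payload = build_lmm_payload(
#     api_key=os.environ["API_KEY"],
#     image_url="https://media.roboflow.com/notebooks/examples/dog.jpeg",
# )
# print(post_lmm("https://serverless.roboflow.com", payload))
```

Switching between the hosted API, a local server, and a Dedicated Deployment then only changes the first argument to `post_lmm`.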

Swap <CAPTION> for any supported task prompt (for example <DETAILED_CAPTION>, <OD>, <OCR>, <OPEN_VOCABULARY_DETECTION>, <REFERRING_EXPRESSION_SEGMENTATION>) to switch between captioning, detection, OCR, and segmentation tasks.
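One way to keep task switching readable is to map friendly task names to the prompt tokens listed above and build the request body from that map. This is a minimal sketch: the `TASK_PROMPTS` dictionary keys and the `payload_for_task` helper are hypothetical names chosen for illustration, and the payload shape simply mirrors the curl example. Only the prompts named on this page are included.

```python
# Hypothetical mapping from readable task names to the Florence 2 task
# prompts mentioned in this page. Extend it with other supported prompts.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "open_vocabulary_detection": "<OPEN_VOCABULARY_DETECTION>",
    "referring_expression_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def payload_for_task(
    task: str,
    api_key: str,
    image_url: str,
    model_id: str = "florence-2-base",
) -> dict:
    """Return a request body for /infer/lmm with the prompt for `task`."""
    if task not in TASK_PROMPTS:
        known = ", ".join(sorted(TASK_PROMPTS))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}")
    return {
        "api_key": api_key,
        "image": {"type": "url", "value": image_url},
        "model_id": model_id,
        "prompt": TASK_PROMPTS[task],
    }
```

Only the `prompt` field changes between tasks; the image, model alias, and API key stay the same.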

For self-hosted deployment and the full list of task prompts, see the Inference documentation.
