
SAM3

Use Meta's SAM3 model through our Serverless Hosted API

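We support Meta's Segment Anything Model 3 (SAM3) inference through our Serverless Hosted API. We offer two different SAM3 endpoints:

  • Concept Segmentation (PCS), /sam3/concept_segment: segments every instance of a concept described by text, exemplar boxes, or both.

  • Visual Segmentation (PVS), /sam3/visual_segment: segments one specific object indicated by clicks or a box.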

Training a SAM3 model on Roboflow is available on paid plans that include usage-based billing.

Use this table to pick an endpoint:

| You have | You want | Use |
| --- | --- | --- |
| A text description (for example, "person") | Masks for every matching instance | /sam3/concept_segment |
| A box around one example object | Masks for every similar instance | /sam3/concept_segment |
| Text plus example boxes to include or exclude objects | Masks for every matching instance | /sam3/concept_segment |
| A click or a box on one specific object | A mask for that object only | /sam3/visual_segment |

Pass your API key as the api_key query parameter on every request.
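The table boils down to one question: do you want one specific object, or every instance of a concept? The sketch below encodes that choice and builds a request URL with api_key in the query string. It does not use requests, whose params= argument does the same thing. The helper names pick_endpoint and request_url are hypothetical. Reading the key from the API_KEY environment variable follows the examples on this page.

```python
import os
from typing import Optional
from urllib.parse import urlencode

# Base URL and endpoint paths as documented on this page.
BASE_URL = "https://serverless.roboflow.com"
CONCEPT_SEGMENT = "/sam3/concept_segment"
VISUAL_SEGMENT = "/sam3/visual_segment"


def pick_endpoint(one_specific_object: bool) -> str:
    """PVS for a single clicked or boxed object; PCS for every instance of a
    concept given by text, exemplar boxes, or both (hypothetical helper)."""
    return VISUAL_SEGMENT if one_specific_object else CONCEPT_SEGMENT


def request_url(path: str, api_key: Optional[str] = None) -> str:
    """Full URL with api_key passed as a query parameter."""
    key = api_key if api_key is not None else os.environ["API_KEY"]
    return f"{BASE_URL}{path}?{urlencode({'api_key': key})}"


print(request_url(pick_endpoint(False), api_key="YOUR_KEY"))
# https://serverless.roboflow.com/sam3/concept_segment?api_key=YOUR_KEY
```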

Concept Segmentation (PCS)

POST https://serverless.roboflow.com/sam3/concept_segment

Each entry in prompts describes one concept. The response contains one prompt_results entry per prompt, each holding every instance found. Requests accept at most 16 prompts.
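Because of the 16-prompt limit, a longer list of concepts has to be split across several requests for the same image. The sketch below shows one way to do that. The helper batch_prompts and the concept names are hypothetical. Reusing the same image_id across batches may let later requests reuse a cached embedding, but that is an assumption based on the image_id description in the reference below.

```python
from typing import Any, Dict, List

MAX_PROMPTS_PER_REQUEST = 16  # documented limit for /sam3/concept_segment


def batch_prompts(
    prompts: List[Dict[str, Any]], size: int = MAX_PROMPTS_PER_REQUEST
) -> List[List[Dict[str, Any]]]:
    """Split a prompt list into chunks that each respect the per-request limit."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [prompts[i:i + size] for i in range(0, len(prompts), size)]


concepts = [f"object_{i}" for i in range(20)]  # hypothetical concept names
prompts = [{"type": "text", "text": c} for c in concepts]
batches = batch_prompts(prompts)
print([len(b) for b in batches])  # [16, 4]
```

Each batch then becomes the "prompts" list of its own request, and you concatenate the prompt_results entries from all responses.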

Text prompts

```python
import os
import requests

payload = {
    "image": {"type": "url", "value": "https://media.roboflow.com/inference/people-walking.jpg"},
    "prompts": [
        {"type": "text", "text": "person"},
        {"type": "text", "text": "backpack"},
    ],
    "output_prob_thresh": 0.5,  # drop predictions scoring below 0.5
    "format": "polygon",  # or "rle"
}

response = requests.post(
    "https://serverless.roboflow.com/sam3/concept_segment",
    params={"api_key": os.getenv("API_KEY")},  # API key goes in the query string
    json=payload,
)
response.raise_for_status()  # surface HTTP errors (e.g. a missing API key) early

# One prompt_results entry per prompt, each listing every instance found.
for prompt_result in response.json()["prompt_results"]:
    print(prompt_result["echo"], len(prompt_result["predictions"]), "instances")
```

Images can also be sent inline as {"type": "base64", "value": "<BASE64_IMAGE>"}.
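To send a local file instead of a URL, base64-encode its bytes and wrap them in that object. The sketch below does this. It assumes the value is the plain base64 string, with no "data:image/...;base64," prefix. The function names and the file name are hypothetical.

```python
import base64
from pathlib import Path
from typing import Dict


def encode_image_bytes(data: bytes) -> Dict[str, str]:
    """Wrap raw image bytes in the inline {"type": "base64", ...} image object."""
    return {"type": "base64", "value": base64.b64encode(data).decode("ascii")}


def base64_image(path: str) -> Dict[str, str]:
    """Read a local image file and encode it for the "image" field."""
    return encode_image_bytes(Path(path).read_bytes())


# Usage with a hypothetical local file:
# payload["image"] = base64_image("people-walking.jpg")
```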

Exemplar box prompts

Instead of text, you can prompt with an exemplar: a box around one example object. The model finds every instance that matches the example, not just the boxed object.

Boxes use absolute pixel coordinates. Two formats are accepted:

  • {"x": ..., "y": ..., "width": ..., "height": ...} where x, y is the top-left corner

  • {"x0": ..., "y0": ..., "x1": ..., "y1": ...} for explicit corners

box_labels is required when boxes is set and must have one entry per box: 1 marks a positive exemplar (find objects like this), 0 marks a negative exemplar (exclude objects like this).
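The sketch below builds an exemplar-only prompt and checks the rules above before anything is sent: each box must use one of the two formats, and there must be one 0/1 label per box. Two things are assumptions, not confirmed by this page: the "type": "visual" value for box-only prompts, and the box coordinates. The helper name exemplar_prompt is hypothetical.

```python
from typing import Any, Dict, List, Sequence

XYWH_KEYS = {"x", "y", "width", "height"}  # x, y = top-left corner
XYXY_KEYS = {"x0", "y0", "x1", "y1"}       # explicit corners


def exemplar_prompt(boxes: Sequence[Dict[str, float]], labels: Sequence[int]) -> Dict[str, Any]:
    """Build a PCS prompt from exemplar boxes in absolute pixel coordinates.

    labels has one entry per box: 1 = positive exemplar, 0 = negative exemplar.
    """
    if len(boxes) != len(labels):
        raise ValueError("box_labels needs exactly one entry per box")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("box labels must be 0 or 1")
    for box in boxes:
        if set(box) not in (XYWH_KEYS, XYXY_KEYS):
            raise ValueError(f"unrecognised box format: {sorted(box)}")
    # "type": "visual" is an assumed value for box-only prompts.
    return {"type": "visual", "boxes": list(boxes), "box_labels": list(labels)}


# Hypothetical box around one example person; the model looks for all similar objects.
prompt = exemplar_prompt([{"x": 120, "y": 40, "width": 60, "height": 150}], [1])
payload: Dict[str, Any] = {
    "image": {"type": "url", "value": "https://media.roboflow.com/inference/people-walking.jpg"},
    "prompts": [prompt],
}
```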

Combined text and exemplar prompts

A single prompt can carry both text and exemplar boxes. This is useful for narrowing a text concept with visual examples, or for excluding lookalikes with negative exemplars.

For example, a prompt with the text "person", one positive exemplar box, and one negative exemplar box segments people that match the positive exemplar while suppressing instances similar to the negative one.
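The sketch below shows one way such a combined prompt could look. It assumes that "type": "text" remains valid when boxes are attached. The box coordinates are hypothetical, and they use the top-left {x, y, width, height} format.

```python
from typing import Any, Dict

payload: Dict[str, Any] = {
    "image": {"type": "url", "value": "https://media.roboflow.com/inference/people-walking.jpg"},
    "prompts": [
        {
            "type": "text",  # assumed to stay "text" when boxes are added
            "text": "person",
            "boxes": [
                {"x": 120, "y": 40, "width": 60, "height": 150},  # positive: people like this
                {"x": 400, "y": 60, "width": 55, "height": 140},  # negative: suppress lookalikes
            ],
            "box_labels": [1, 0],  # one label per box, same order as boxes
        }
    ],
    "output_prob_thresh": 0.5,
    "format": "polygon",
}
```

Send it the same way as the text-prompt example. Because there is one prompt, the response holds a single prompt_results entry.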

Visual Segmentation (PVS)

POST https://serverless.roboflow.com/sam3/visual_segment

PVS segments one specific object indicated by clicks or a box. Use it for interactive, human-in-the-loop mask refinement; use PCS when you want every instance of a concept.

A prompt can contain points, a box, or both:

  • points are absolute pixel coordinates. "positive": true includes the clicked region, false excludes it. Add more points to refine the mask.

  • box uses center-anchored coordinates: x, y is the box center, unlike PCS boxes which are top-left anchored.
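PCS boxes are anchored at the top-left corner and PVS boxes at the center, so reusing a box across the two endpoints needs a conversion. The sketch below assumes that width and height mean the same thing in both formats. The helper name pcs_box_to_pvs is hypothetical.

```python
from typing import Dict


def pcs_box_to_pvs(box: Dict[str, float]) -> Dict[str, float]:
    """Convert a PCS box (top-left {x, y, width, height} or corner {x0, y0, x1, y1})
    into a PVS box whose x, y is the box center."""
    if {"x0", "y0", "x1", "y1"} <= set(box):
        x, y = box["x0"], box["y0"]
        w, h = box["x1"] - box["x0"], box["y1"] - box["y0"]
    else:
        x, y, w, h = box["x"], box["y"], box["width"], box["height"]
    return {"x": x + w / 2, "y": y + h / 2, "width": w, "height": h}


print(pcs_box_to_pvs({"x": 100, "y": 50, "width": 80, "height": 120}))
# {'x': 140.0, 'y': 110.0, 'width': 80, 'height': 120}
```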

The response contains the single highest-confidence mask for the prompt. multimask_output controls how many internal mask proposals the model generates (three when true), but the best proposal is always selected for the response.
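The sketch below assembles a PVS request body that combines a positive click with a center-anchored box. The nested {"prompts": {"prompts": [...]}} wrapper follows the SAM2-style request schema named in the reference below, and applying it to SAM3 is an assumption. The helper name pvs_payload and all coordinates are hypothetical.

```python
from typing import Any, Dict, List, Optional


def pvs_payload(
    image_url: str,
    points: Optional[List[Dict[str, Any]]] = None,
    box: Optional[Dict[str, float]] = None,
    multimask_output: bool = True,
) -> Dict[str, Any]:
    """Build a /sam3/visual_segment body for one object (hypothetical helper)."""
    if not points and box is None:
        raise ValueError("give at least one point or a box")
    prompt: Dict[str, Any] = {}
    if points:
        prompt["points"] = points  # absolute pixels; "positive": False excludes a region
    if box is not None:
        prompt["box"] = box  # center-anchored: x, y is the box center
    return {
        "image": {"type": "url", "value": image_url},
        "prompts": {"prompts": [prompt]},  # SAM2-style nesting (assumed)
        "multimask_output": multimask_output,  # 3 internal proposals; best one returned
    }


payload = pvs_payload(
    "https://media.roboflow.com/inference/people-walking.jpg",
    points=[{"x": 150, "y": 120, "positive": True}],
    box={"x": 150, "y": 110, "width": 80, "height": 160},
)
```

Post the body to https://serverless.roboflow.com/sam3/visual_segment with the api_key query parameter, as in the PCS example.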

For an interactive demo using OpenCV, see this GitHub Gist, which was used in the accompanying video.

Endpoints

SAM3 PCS (promptable concept segmentation)

POST /sam3/concept_segment

Concept Segmentation (Text Prompts)

Segments objects using text prompts.

Image Input: The image field accepts either:

  • {"type": "url", "value": "<IMAGE_URL>"} - A publicly accessible image URL

  • {"type": "base64", "value": "<BASE64_DATA>"} - Base64 encoded image data

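Prompts: Each prompt in the prompts array describes one concept. For a text prompt, set type: "text" and a text field with the object description. Prompts can also carry exemplar boxes with box_labels, as described under Exemplar box prompts.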

Query parameters

api_key (string, required): Your Roboflow API key. Get one at https://app.roboflow.com/settings/api

Body

format (string, optional): One of 'polygon', 'rle'.
Default: polygon

image_id (string, optional): ID for caching embeddings.

output_prob_thresh (number, optional): Minimum score a prediction must reach to be included in the output.
Default: 0.5

model_id (string, optional): The model ID of SAM3. Use 'sam3/sam3_final' to target the generic base model.
Default: sam3/sam3_final

nms_iou_threshold (number, optional): IoU threshold for cross-prompt non-maximum suppression (NMS). Must be in [0.0, 1.0] when set. If not set, NMS is disabled.

Responses

200 Successful Response (application/json)

time (number, required): The time in seconds it took to produce the segmentation, including preprocessing.
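To build intuition for nms_iou_threshold, the sketch below applies standard greedy NMS with box IoU. The server's actual procedure is not documented here: it may compare masks rather than boxes, and it may treat prompt identity differently. Read the code as an illustration of what the threshold controls, not as the service's implementation. Lower thresholds suppress overlapping detections more aggressively. The detections are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets: List[Tuple[Box, float]], iou_threshold: float) -> List[Tuple[Box, float]]:
    """Visit detections by descending score; keep one only if its IoU with
    every already-kept detection is at or below the threshold."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) <= iou_threshold for k, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical: the same person found by two prompts, plus one distinct object.
dets = [((10, 10, 110, 210), 0.92), ((12, 8, 112, 205), 0.81), ((300, 20, 380, 200), 0.77)]
print(len(greedy_nms(dets, iou_threshold=0.5)))  # 2
```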


SAM3 PVS (promptable visual segmentation)

POST /sam3/visual_segment

Interactive Segmentation (SAM 2 Style)

SAM 3 also supports interactive segmentation using points and boxes.

Image Input: The image field accepts either:

  • {"type": "url", "value": "<IMAGE_URL>"} - A publicly accessible image URL

  • {"type": "base64", "value": "<BASE64_DATA>"} - Base64 encoded image data

Note: NumPy arrays are NOT supported on the serverless API. Use URL or base64 encoding only.

Prompts: Point-based prompts with positive/negative clicks are supported for interactive segmentation, and can be combined with a box.

Query parameters

api_key (string, required): Your Roboflow API key. Get one at https://app.roboflow.com/settings/api

Body

The body follows the SAM2 visual segmentation request schema.

image_id (string, optional): The ID of the image to be segmented, used to retrieve cached embeddings. If an embedding is cached for this ID, it is used instead of generating a new one. Otherwise, a new embedding is generated and cached.

format (string, optional): The format of the response. Must be one of 'json', 'rle', or 'binary'. If json, masks are converted to polygons. If rle, masks are converted to RLE format. If binary, masks are returned as binary NumPy arrays.
Default: json

sam2_version_id (string, optional): The version ID of SAM to be used for this request. Must be one of hiera_tiny, hiera_small, hiera_large, hiera_b_plus.
Default: hiera_large

multimask_output (boolean, optional): If true, the model generates three mask proposals internally and returns the highest-confidence one. For ambiguous input prompts, such as a single click, this often produces a better mask than a single prediction.
Default: true

save_logits_to_cache (boolean, optional): If true, saves the low-resolution logits to the cache for potential future use.
Default: false

load_logits_from_cache (boolean, optional): If true, attempts to load previously cached low-resolution logits for the given image and prompt set.
Default: false

Responses

200 Successful Response (application/json)

time (number, required): The time in seconds it took to produce the segmentation, including preprocessing.
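The sketch below strings image_id and the two logits-cache flags into a click-by-click refinement loop. The first request caches the embedding and logits. The second adds a negative click and asks the server to reuse them. Several parts are assumptions: the SAM2-style prompt nesting, the image_id value, and the coordinates. It is also not specified here whether logits saved for the first click set are found once a later request adds a point. The helper name refine_request is hypothetical.

```python
from typing import Any, Dict, List

IMAGE = {"type": "url", "value": "https://media.roboflow.com/inference/people-walking.jpg"}
IMAGE_ID = "people-walking"  # hypothetical; any stable string identifying this image


def refine_request(points: List[Dict[str, Any]], first_round: bool) -> Dict[str, Any]:
    """Build the body for one click round of interactive refinement."""
    return {
        "image": IMAGE,
        "image_id": IMAGE_ID,  # lets later rounds reuse the cached embedding
        # SAM2-style nesting of the prompt set (assumed, not confirmed for SAM3).
        "prompts": {"prompts": [{"points": list(points)}]},  # copy so later clicks don't leak in
        "save_logits_to_cache": True,  # keep low-res logits for the next round
        "load_logits_from_cache": not first_round,  # try to start from earlier logits
    }


clicks = [{"x": 150, "y": 120, "positive": True}]
round_1 = refine_request(clicks, first_round=True)

# The mask spilled onto a neighbouring region: add a negative click and resend.
clicks.append({"x": 170, "y": 300, "positive": False})
round_2 = refine_request(clicks, first_round=False)
```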

