SAM

SAM (the Segment Anything Model) is a general computer vision model that can locate objects within an image and accept prompts to guide its predictions. It operates on a wide range of data without the need for additional training or fine-tuning. Inference happens in two stages. First, an embedding is calculated for the image using a large encoder model. Second, the embedding and any prompts are sent through a smaller decoder model to generate instance mask predictions.

There are two routes on the Roboflow Inference Server that utilize SAM:

  • embed_image: Used to retrieve only the SAM embedding for an image

  • segment_image: Used to run both the SAM encoder and decoder to get instance mask predictions for an image

Embed Image

To get a SAM embedding for an image:

import requests

# Define the request payload
infer_payload = {
    "image": {
        "type": "url",
        "value": "https://i.imgur.com/Q6lDy8B.jpg",
    },
    "image_id": "example_image_id",
}

# Define the inference server url (localhost:9001, infer.roboflow.com, etc.)
base_url = "https://infer.roboflow.com"

# Define your Roboflow API Key
api_key = "<YOUR API KEY HERE>"

res = requests.post(
    f"{base_url}/sam/embed_image?api_key={api_key}",
    json=infer_payload,
)

embeddings = res.json()['embeddings']

Notice the image_id key within the request payload. Providing an image ID enables the server to cache the expensive SAM embedding, which can speed up future calls to the inference server that make use of it.
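
As a quick sanity check, you can load the returned embedding into numpy. The shape noted in the comment below is an assumption based on SAM's ViT encoder output, not something the API guarantees; verify it against your server's /docs schema.

import numpy as np

# The embedding arrives as nested lists of floats; convert it to an array
embedding = np.array(embeddings)

# For SAM's ViT-H encoder this is typically a (1, 256, 64, 64) tensor,
# but confirm the exact shape for your inference server version
print(embedding.shape)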

Segment Image

To get segmentations for an image:

# Define the request payload
infer_payload = {
    "image": {
        "type": "url",
        "value": "https://i.imgur.com/Q6lDy8B.jpg",
    },
    "point_coords": [[380, 350]],
    "point_labels": [1],
    "image_id": "example_image_id",
}

res = requests.post(
    f"{base_url}/sam/segment_image?api_key={api_key}",
    json=infer_payload,
)

masks = res.json()['masks']

The point_coords value defines a list of points used to prompt the model, and point_labels marks each point as positive (1) or negative (0). Positive points prompt the model to include a given area of the image in the predicted masks, while negative points prompt it to exclude that area. Here's an example with multiple points:

# Define the request payload; the image itself is omitted because the
# embedding for "example_image_id" is already cached on the server
infer_payload = {
    "point_coords": [[380, 350], [80, 300], [700, 250]],
    "point_labels": [1, 0, 0],
    "image_id": "example_image_id",
}

res = requests.post(
    f"{base_url}/sam/segment_image?api_key={api_key}",
    json=infer_payload,
)

masks = res.json()['masks']

Notice that this request omits the image entirely: because it supplies the same image_id, the server reuses the cached embedding rather than recomputing it.
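
The exact mask format depends on your server version, so treat the following as a hedged sketch rather than the definitive schema: it assumes each returned mask is a list of [x, y] polygon vertices and overlays them on the source image with Pillow. Check the /docs endpoint for the response format your server actually uses.

from io import BytesIO

import requests
from PIL import Image, ImageDraw

# Fetch the source image (the same URL used in the payloads above)
img = Image.open(BytesIO(requests.get("https://i.imgur.com/Q6lDy8B.jpg").content))
draw = ImageDraw.Draw(img)

# Assumes each mask is a list of [x, y] vertices; adjust to the schema
# your inference server actually returns
for mask in masks:
    polygon = [(float(x), float(y)) for x, y in mask]
    draw.polygon(polygon, outline="red")

img.save("sam_masks.jpg")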

There are many other options for how masks and prompts can be sent to and received from the inference server. See the inference API schema or go to the /docs endpoint of a running inference server for more details.
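
If the server is FastAPI-based, which the interactive /docs endpoint suggests but is an assumption rather than a documented guarantee, the machine-readable schema is usually also exposed as JSON:

# The interactive docs render at {base_url}/docs in a browser; the raw
# OpenAPI schema is typically served alongside them
schema = requests.get(f"{base_url}/openapi.json").json()
print(list(schema.get("paths", {}).keys()))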
