
Two-Stage and CLIP Inference

Chain two models, run OCR on detections, and compare images with CLIP.

Workspace exposes three convenience methods that combine multiple models or modalities in one call:

  • two_stage() - run an object-detection model, then run a second model on each crop.

  • two_stage_ocr() - run an object-detection model, then OCR each crop.

  • clip_compare() - embed every image in a directory with CLIP and rank them against a target image.

These helpers are useful for prototypes (license plates, badge readers, similar-image search) before you build the equivalent in a Workflow.

Two-stage detection + classification

The first stage detects regions of interest. The second stage classifies (or runs another detector on) each cropped region.

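import roboflow

rf = roboflow.Roboflow(api_key="YOUR_API_KEY")  # replace with your Roboflow API key
ws = rf.workspace()  # workspace associated with the API key

results = ws.two_stage(
    image="photo.jpg",                        # local image to run inference on
    first_stage_model_name="cars-or-trucks",  # stage 1: object-detection project
    first_stage_model_version=2,
    second_stage_model_name="vehicle-make",   # stage 2: model run on each crop
    second_stage_model_version=4,
)

# Inspect the combined per-detection results
for r in results:
    print(r)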

Each entry in results carries the parent detection (bbox, class, confidence) and the second-stage model's prediction: a classification, or further detections if the second stage is itself a detector.
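The crop-then-classify flow can be sketched locally to show what happens between the two stages. This is a minimal illustration, not the SDK's implementation. It assumes first-stage predictions are dicts whose `x`/`y` give the box center plus `width`/`height` (the center-format convention Roboflow detection responses use). Each box is converted to clamped corner coordinates, cropped, and passed to a second-stage callable. `center_to_corners`, `crop`, `run_two_stage`, and the stub models are hypothetical names, and the "image" is a plain list of pixel rows so the sketch runs without any imaging library.

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def center_to_corners(pred: Dict[str, float], img_w: int, img_h: int) -> Box:
    """Convert a center-format box (x, y = box center) to integer corners, clamped to the image."""
    left = max(0, math.floor(pred["x"] - pred["width"] / 2))
    top = max(0, math.floor(pred["y"] - pred["height"] / 2))
    right = min(img_w, math.ceil(pred["x"] + pred["width"] / 2))
    bottom = min(img_h, math.ceil(pred["y"] + pred["height"] / 2))
    return left, top, right, bottom


def crop(image: Sequence[Sequence[Any]], box: Box) -> List[List[Any]]:
    """Crop a row-major 2-D image (a list of pixel rows) to `box`."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def run_two_stage(
    image: Sequence[Sequence[Any]],
    first_stage: Callable[[Any], List[Dict[str, Any]]],
    second_stage: Callable[[List[List[Any]]], Any],
) -> List[Dict[str, Any]]:
    """Detect regions with `first_stage`, then run `second_stage` on each crop."""
    img_h, img_w = len(image), len(image[0])
    results = []
    for pred in first_stage(image):
        box = center_to_corners(pred, img_w, img_h)
        results.append({
            "detection": pred,  # parent detection (box, class, confidence)
            "box": box,
            "second_stage": second_stage(crop(image, box)),
        })
    return results


# --- Usage with stub models (both hypothetical) ---
# A 4-row x 6-column "image" whose pixel values are their column indices.
image = [list(range(6)) for _ in range(4)]


def fake_detector(img):
    # One detection centred at (2, 2), 2 px wide and 2 px tall.
    return [{"x": 2, "y": 2, "width": 2, "height": 2, "class": "car", "confidence": 0.9}]


def fake_classifier(crop_pixels):
    # Stand-in "classifier" that just reports the crop size.
    return {"rows": len(crop_pixels), "cols": len(crop_pixels[0])}


out = run_two_stage(image, fake_detector, fake_classifier)
print(out[0]["box"], out[0]["second_stage"])  # (1, 1, 3, 3) {'rows': 2, 'cols': 2}
```

Rounding the left/top edges down and the right/bottom edges up keeps partially covered pixels inside the crop. Clamping prevents boxes that extend past the image border from producing invalid slices.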

Two-stage detection + OCR

The second stage runs Tesseract-style OCR on each crop and returns the recognized text alongside the bounding box.
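When the second stage is OCR, each result pairs a region's bounding box with its recognized text. The sketch below shows that output shape under stated assumptions, not the SDK's code. `read_text` is a hypothetical stand-in for "crop this box and OCR it". Dropping empty reads and sorting top-to-bottom, then left-to-right, are choices made here for readability, not documented SDK behavior. The sort is naive: boxes on the same visual line with slightly different tops would need a tolerance.

```python
import math
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def to_corners(pred: Dict[str, float]) -> Box:
    """Center-format (x, y, width, height) -> integer corner box."""
    return (
        math.floor(pred["x"] - pred["width"] / 2),
        math.floor(pred["y"] - pred["height"] / 2),
        math.ceil(pred["x"] + pred["width"] / 2),
        math.ceil(pred["y"] + pred["height"] / 2),
    )


def ocr_detections(
    detections: List[Dict[str, float]], read_text: Callable[[Box], str]
) -> List[Dict[str, object]]:
    """Pair each detection's box with the text read from that region."""
    results = []
    for pred in detections:
        box = to_corners(pred)
        text = read_text(box).strip()
        if text:  # skip regions where OCR found nothing
            results.append({"box": box, "text": text})
    # Naive reading order: by top edge, then by left edge.
    results.sort(key=lambda r: (r["box"][1], r["box"][0]))
    return results


# --- Usage with a hypothetical OCR stub keyed by box ---
fake_text = {(10, 10, 30, 20): " ABC ", (40, 10, 60, 20): "123", (10, 40, 30, 50): ""}
detections = [
    {"x": 50, "y": 15, "width": 20, "height": 10},
    {"x": 20, "y": 15, "width": 20, "height": 10},
    {"x": 20, "y": 45, "width": 20, "height": 10},
]
plates = ocr_detections(detections, lambda box: fake_text[box])
print(plates)
# [{'box': (10, 10, 30, 20), 'text': 'ABC'}, {'box': (40, 10, 60, 20), 'text': '123'}]
```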

CLIP image comparison

clip_compare() embeds every image in the directory along with the supplied target image and returns a cosine-similarity score for each candidate/target pair, sorted by similarity.
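The scoring step can be illustrated in plain Python. This sketch assumes you already have embedding vectors. It does not call CLIP or the Roboflow API, and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names. Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it compares direction and ignores magnitude.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    target: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the target, most similar first."""
    scores = [(name, cosine_similarity(target, emb)) for name, emb in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy 3-D vectors standing in for CLIP embeddings (real ones are much longer).
target = [1.0, 0.0, 0.0]
candidates = {
    "same_direction.png": [2.0, 0.0, 0.0],  # same direction, different length
    "orthogonal.png": [0.0, 1.0, 0.0],      # unrelated direction
    "in_between.png": [1.0, 1.0, 0.0],      # 45 degrees away
}
for name, score in rank_by_similarity(target, candidates):
    print(f"{name}: {score:.3f}")
# same_direction.png: 1.000
# in_between.png: 0.707
# orthogonal.png: 0.000
```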

Parameters

  • dir (str) - directory to scan for candidate images.

  • image_ext (str, default ".png") - file extension to match, including the leading dot (for example ".jpg").

  • target_image (str) - path to the image you're searching for similarity against.
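Because `image_ext` includes the leading dot, it matches the way Python's `pathlib` reports suffixes (`Path("a.png").suffix` is `".png"`). The sketch below illustrates that matching rule only. `find_candidates` is a hypothetical helper, not the SDK's code. Skipping the target file when it sits in the same folder, and rejecting an extension without a dot, are choices made here rather than documented SDK behavior.

```python
import tempfile
from pathlib import Path
from typing import List


def find_candidates(directory: str, image_ext: str = ".png", target_image: str = "") -> List[Path]:
    """Return files in `directory` whose suffix equals `image_ext`, excluding the target."""
    if not image_ext.startswith("."):
        raise ValueError(f"image_ext must include the leading dot, e.g. '.{image_ext}'")
    target = Path(target_image).resolve() if target_image else None
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix == image_ext and p.resolve() != target
    )


# Usage: build a throwaway folder with mixed extensions.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.png", "b.png", "c.jpg", "query.png"]:
        (Path(tmp) / name).touch()
    found = find_candidates(tmp, ".png", target_image=str(Path(tmp) / "query.png"))
    found_names = [p.name for p in found]

print(found_names)  # ['a.png', 'b.png']
```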

When to graduate to Workflows

These helpers call the hosted inference endpoint and make their HTTP requests serially, one per stage and per image. Once you need batching, conditional branching, custom blocks, or a long-running pipeline, build the equivalent as a Workflow. The Workflows runtime is purpose-built for chained inference and supports streaming inputs.
