GLM-OCR

Use GLM-OCR for image OCR through our Serverless Hosted API

GLM-OCR is an OCR model based on the GLM vision-language model family. It transcribes text from an image and is well-suited for documents, signs, and labels with mixed layouts. We support GLM-OCR through our Serverless Hosted API, Dedicated Deployments, and self-hosted Inference.

Code sample

GLM-OCR runs through the shared /infer/lmm endpoint. Call it directly with curl:

curl --location 'https://serverless.roboflow.com/infer/lmm' \
  --header 'Content-Type: application/json' \
  --data '{
    "api_key": "YOUR_API_KEY",
    "image": {"type": "url", "value": "https://media.roboflow.com/inference/license_plate_1.jpg"},
    "model_id": "glm-ocr",
    "prompt": "OCR",
    "max_new_tokens": 128
  }'
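For scripts that cannot shell out to curl, the same request can be sketched with only the Python standard library. The endpoint URL and JSON fields are copied from the curl call above; the helper names `build_lmm_payload` and `post_lmm` are hypothetical and exist only for this illustration, and the response is assumed to be JSON.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
LMM_URL = "https://serverless.roboflow.com/infer/lmm"


def build_lmm_payload(api_key, image_url, prompt="OCR", max_new_tokens=128):
    """Build the same JSON body that the curl example sends (hypothetical helper)."""
    return {
        "api_key": api_key,
        "image": {"type": "url", "value": image_url},
        "model_id": "glm-ocr",
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
    }


def post_lmm(payload, url=LMM_URL, timeout=60):
    """POST the payload as JSON and decode the reply (assumed to be JSON)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (needs network access and a valid key):
# payload = build_lmm_payload(
#     "YOUR_API_KEY",
#     "https://media.roboflow.com/inference/license_plate_1.jpg",
# )
# print(post_lmm(payload))
```

Keeping payload construction separate from the network call makes the request body easy to inspect or log before it is sent.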

The same call through the SDK. Install it:

pip install inference-sdk

Set your Roboflow API key in the API_KEY environment variable; the example below reads it from there.

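import os
import urllib.request
from inference_sdk import InferenceHTTPClient

# Download the sample image to a local file.
image_url = "https://media.roboflow.com/inference/license_plate_1.jpg"
image_path = "license_plate_1.jpg"
urllib.request.urlretrieve(image_url, image_path)

# Point the client at the Serverless Hosted API and read the key from the environment.
client = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",
    api_key=os.getenv("API_KEY"),
)

# Send the image to GLM-OCR with the same prompt and token limit as the curl call.
result = client.infer_lmm(
    image_path,
    model_id="glm-ocr",
    prompt="OCR",
    max_new_tokens=128,
)

# The recognized text is returned under the "response" key.
print(result["response"])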

The code above prints the recognized text, taken from the response field of the result, to the terminal.

Set api_url to match your deployment target:

  • https://serverless.roboflow.com for the Serverless Hosted API.

  • http://localhost:9001 for a local Inference server.

  • Your Dedicated Deployment URL for a private endpoint.
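The choice of api_url can be kept in one place with a small lookup. The following is a minimal sketch: the function name `resolve_api_url` and the target labels "serverless", "local", and "dedicated" are hypothetical names chosen for this example, not part of the SDK. The two fixed URLs are the ones listed above, and the Dedicated Deployment URL must be supplied by the caller.

```python
# URLs taken from the list above.
SERVERLESS_URL = "https://serverless.roboflow.com"
LOCAL_URL = "http://localhost:9001"


def resolve_api_url(target, dedicated_url=None):
    """Return the api_url for a deployment target (hypothetical helper).

    target is one of "serverless", "local", or "dedicated"; these labels
    are assumptions made for this sketch.
    """
    if target == "serverless":
        return SERVERLESS_URL
    if target == "local":
        return LOCAL_URL
    if target == "dedicated":
        if not dedicated_url:
            raise ValueError("a Dedicated Deployment URL is required")
        return dedicated_url
    raise ValueError(f"unknown deployment target: {target!r}")
```

The returned value can be passed as api_url when constructing InferenceHTTPClient, so switching targets does not require touching the rest of the script.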
