Integration and Automation

Automate Batch Processing with APIs, webhooks, and cloud storage integration.

Batch Processing is well-suited for task automation, especially when processes need to run on a recurring basis. This page covers how to integrate Roboflow Batch Processing with external systems using APIs and webhooks.

Overview

A typical Batch Processing pipeline consists of:

Data Ingestion — Upload data to Data Staging (ephemeral storage for input and output data).
Processing — Run a Workflow against ingested data, producing CSV/JSONL results. This is usually followed by an export stage that creates compressed archives for convenient extraction.
Data Export — Download results from the output batch via download links.

API Reference

All CLI commands have equivalent REST API endpoints. Below are the key API interactions.

Video Ingestion

curl -X POST "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/upload/video" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "fileName=your_video.mp4"

The response contains "signedURLDetails" with:

"uploadURL" — the URL to PUT the video
"extensionHeaders" — additional headers to include

Upload the video:

curl -X PUT <url-from-the-response> \
  -H "Name: value" \
  --upload-file <path-to-your-video>

Include all headers from the "extensionHeaders" response field.

Image Ingestion

Single Image Upload

Best for batches up to 5,000 images. Cannot be combined with bulk upload for the same batch.

curl -X POST "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/upload/image" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "fileName=your_image.jpg" \
  -F "your_image.jpg=@/path/to/your/image.jpg"

Bulk Upload

Recommended for batches exceeding 5,000 images. Bundle up to 500 images per *.tar archive.

Request an upload URL:

curl -X POST "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/bulk-upload/image-files" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY"

Pack images into a *.tar archive according to the size and file-count limits returned by the API.
Upload the archive using the signed URL and extension headers from the response.

Bulk-upload batches cannot be mixed with single-image uploads for the same batch.

When performing bulk ingestion, data is indexed in the background. There may be a short delay before all data is available.

Check Batch Status

Before starting a job, verify that all data has been ingested:

curl -X GET "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/count" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY"

To check shard upload details (paginated):

curl -X GET "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/shards" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "nextPageToken=OptionalNextPageToken"

Start a Job

curl -X POST "https://api.roboflow.com/batch-processing/v1/external/{workspace}/jobs/{job_id}?api_key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "simple-image-processing-v1",
    "jobInput": {
        "type": "staging-batch-input-v1",
        "batchId": "{batch_id}"
    },
    "computeConfiguration": {
        "type": "compute-configuration-v2",
        "machineType": "cpu",
        "workersPerMachine": 4
    },
    "processingTimeoutSeconds": 3600,
    "processingSpecification": {
        "type": "workflows-processing-specification-v1",
        "workspace": "{workspace}",
        "workflowId": "{workflow_id}",
        "aggregationFormat": "jsonl"
    }
}'

Job ID constraints: Lowercase letters, digits, hyphens, and underscores only. Maximum 20 characters.

Monitor Job Status

General job status:

curl -X GET "https://api.roboflow.com/batch-processing/v1/external/{workspace}/jobs/{job_id}" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY"

List job stages:

curl -X GET "https://api.roboflow.com/batch-processing/v1/external/{workspace}/jobs/{job_id}/stages" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY"

List tasks for a stage (paginated):

curl -X GET "https://api.roboflow.com/batch-processing/v1/external/{workspace}/jobs/{job_id}/stages/{stage_id}/tasks" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "nextPageToken={next_page_token}"

Export Results

List parts of an output batch:

curl -X GET "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/parts" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY"

List download URLs for a part (paginated):

curl -X GET "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/list" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "nextPageToken=YOUR_NEXT_PAGE_TOKEN" \
  --data-urlencode "partName=YOUR_PART_NAME"

Download a file:

curl <download-url> -o <download-file-location>

Data Staging Batch Types

Simple batches (type: simple-batch) — Created when ingesting data one item at a time. Best for up to 5,000–10,000 items.
Sharded batches (type: sharded-batch) — Created via bulk ingestion (images only). Designed for millions of data points with automatic sharding.
Multipart batches (type: multipart-batch) — Created internally by the system. A logical grouping of sub-batches managed as one entity.

Webhook Automation

Instead of polling for status, you can use webhooks to get notified when ingestion or processing completes.

Data Ingestion Webhooks

The CLI commands create-batch-of-images and create-batch-of-videos support:

--notifications-url <webhook_url> — Webhook endpoint for notifications.
--notification-category <value> — Filter notifications:
- ingest-status (default) — Overall ingestion process status.
- files-status — Individual file processing status.

Notifications are delivered via HTTP POST with an Authorization header containing your Roboflow Publishable Key.

Ingest Status Notification

{
    "type": "roboflow-data-staging-notification-v1",
    "event_id": "8c20f970-fe10-41e1-9ef2-e057c63c07ff",
    "ingest_id": "8cd48813430f2be70b492db67e07cc86",
    "batch_id": "test-batch-117",
    "shard_id": null,
    "notification": {
        "type": "ingest-status-notification-v1",
        "success": false,
        "error_details": {
            "type": "unsafe-url-detected",
            "reason": "Untrusted domain found: https://example.com/image.png"
        }
    },
    "delivery_attempt": 1
}

File Status Notification

{
    "type": "roboflow-data-staging-notification-v1",
    "event_id": "8f42708b-aeb7-4b73-9d83-cf18518b6d81",
    "ingest_id": "d5cb69aa-b2d1-4202-a1c1-0231f180bda9",
    "batch_id": "prod-batch-1",
    "shard_id": "0d40fa12-349e-439f-83f8-42b9b7987b33",
    "notification": {
        "type": "ingest-files-status-notification-v1",
        "success": true,
        "ingested_files": [
            "000000494869.jpg",
            "000000186042.jpg"
        ],
        "failed_files": [
            {
                "type": "file-size-limit-exceeded",
                "file_name": "big_image.png",
                "reason": "Max size of single image is 20971520B."
            }
        ],
        "content_truncated": false
    },
    "delivery_attempt": 1
}

Job Completion Webhooks

Add --notifications-url when starting a job:

inference rf-cloud batch-processing process-images-with-workflow \
  --workflow-id <workflow-id> \
  --batch-id <batch-id> \
  --notifications-url <webhook_url>

Job Completion Notification

{
  "type": "roboflow-batch-job-notification-v1",
  "event_id": "8f42708b-aeb7-4b73-9d83-cf18518b6d81",
  "job_id": "<your-batch-job-id>",
  "job_state": "success | fail",
  "delivery_attempt": 1
}

Signed URL Ingestion

For advanced automation, you can ingest data via signed URLs instead of local files:

--data-source references-file — Process files referenced via signed URLs.
--references <path_or_url> — Path to a JSONL file containing file URLs, or a signed URL pointing to such a file.

Reference File Format (JSONL)

{"name": "<unique-file-name-1>", "url": "https://<signed-url>"}
{"name": "<unique-file-name-2>", "url": "https://<signed-url>"}

Signed URL ingestion is available to Growth Plan and Enterprise customers.

Cloud Storage Authentication

AWS S3 and S3-Compatible Storage

Credentials are detected automatically from:

Environment variables:

export AWS_ACCESS_KEY_ID=your-access-key-id
export AWS_SECRET_ACCESS_KEY=your-secret-access-key
export AWS_SESSION_TOKEN=your-session-token  # Optional

AWS credential files (~/.aws/credentials, ~/.aws/config)
IAM roles (EC2, ECS, Lambda)

Named profiles:

export AWS_PROFILE=production

S3-compatible services (Cloudflare R2, MinIO, etc.):

export AWS_ENDPOINT_URL=https://account-id.r2.cloudflarestorage.com
export AWS_REGION=auto  # R2 requires region='auto'
export AWS_ACCESS_KEY_ID=your-r2-access-key
export AWS_SECRET_ACCESS_KEY=your-r2-secret-key

Google Cloud Storage

Credentials are detected from:

Service account key file (recommended for automation):

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

User credentials from gcloud CLI (gcloud auth login)
GCP metadata service (when running on Google Cloud Platform)

Azure Blob Storage

SAS Token (recommended):

export AZURE_STORAGE_ACCOUNT_NAME=mystorageaccount
export AZURE_STORAGE_SAS_TOKEN="sv=2021-06-08&ss=b&srt=sco&sp=rl&se=2024-12-31"

Account Key:

export AZURE_STORAGE_ACCOUNT_NAME=mystorageaccount
export AZURE_STORAGE_ACCOUNT_KEY=your-account-key

Generate a SAS token via Azure CLI:

az storage container generate-sas \
  --account-name mystorageaccount \
  --name my-container \
  --permissions rl \
  --expiry 2024-12-31T23:59:59Z

Custom Scripts

For advanced use cases, reference scripts for generating signed URL files:

AWS S3: generateS3SignedUrls.sh
Google Cloud Storage: generateGCSSignedUrls.sh
Azure Blob Storage: generateAzureSasUrls.sh

PreviousCLI Usage NextTroubleshooting

Last updated 11 hours ago

Was this helpful?

hashtagOverview

hashtagAPI Reference

hashtagVideo Ingestion

hashtagImage Ingestion

hashtagSingle Image Upload

hashtagBulk Upload

hashtagCheck Batch Status

hashtagStart a Job

hashtagMonitor Job Status

hashtagExport Results

hashtagData Staging Batch Types

hashtagWebhook Automation

hashtagData Ingestion Webhooks

hashtagIngest Status Notification

hashtagFile Status Notification

hashtagJob Completion Webhooks

hashtagJob Completion Notification

hashtagSigned URL Ingestion

hashtagReference File Format (JSONL)

hashtagCloud Storage Authentication

hashtagAWS S3 and S3-Compatible Storage

hashtagGoogle Cloud Storage

hashtagAzure Blob Storage

hashtagCustom Scripts

Overview

API Reference

Video Ingestion

Image Ingestion

Single Image Upload

Bulk Upload

Check Batch Status

Start a Job

Monitor Job Status

Export Results

Data Staging Batch Types

Webhook Automation

Data Ingestion Webhooks

Ingest Status Notification

File Status Notification

Job Completion Webhooks

Job Completion Notification

Signed URL Ingestion

Reference File Format (JSONL)

Cloud Storage Authentication

AWS S3 and S3-Compatible Storage

Google Cloud Storage

Azure Blob Storage

Custom Scripts