Integration and Automation

Automate Batch Processing with APIs, webhooks, and cloud storage integration.

Batch Processing is well-suited for task automation, especially when processes need to run on a recurring basis. This page covers how to integrate Roboflow Batch Processing with external systems using APIs and webhooks.

Overview

A typical Batch Processing pipeline consists of:

  1. Data Ingestion — Upload data to Data Staging (ephemeral storage for input and output data).

  2. Processing — Run a Workflow against ingested data, producing CSV/JSONL results. An export stage usually follows, packaging the results into compressed archives for convenient download.

  3. Data Export — Download results from the output batch via download links.

API Reference

All CLI commands have equivalent REST API endpoints. Below are the key API interactions.

Video Ingestion

curl -X POST "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/upload/video" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "fileName=your_video.mp4"

The response contains "signedURLDetails" with:

  • "uploadURL" — the URL to PUT the video

  • "extensionHeaders" — additional headers to include

Upload the video with an HTTP PUT to the "uploadURL", including all headers from the "extensionHeaders" response field.
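As a sketch (the URL and header names below are placeholders — substitute the actual values returned in "signedURLDetails"):

```shell
# PUT the video file to the signed URL from "signedURLDetails".
# The URL and the extension header below are placeholders; use the
# exact values your API response returned.
curl -X PUT "https://storage.example.com/signed-upload-url" \
  -H "x-example-extension-header: value-from-response" \
  --data-binary @your_video.mp4
```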

Image Ingestion

Single Image Upload

Best for batches up to 5,000 images. Cannot be combined with bulk upload for the same batch.
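A sketch of a single-image upload request, assuming the endpoint mirrors the video endpoint above (the exact path may differ — check the API reference):

```shell
# Hypothetical single-image variant of the video upload endpoint;
# verify the path against the API reference before use.
curl -X POST "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/upload/image" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "fileName=your_image.jpg"
```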

Bulk Upload

Recommended for batches exceeding 5,000 images. Bundle up to 500 images per *.tar archive.

  1. Request an upload URL.

  2. Pack images into a *.tar archive according to the size and file-count limits returned by the API.

  3. Upload the archive using the signed URL and extension headers from the response.
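The steps above can be sketched as follows; the bulk-upload endpoint path and the extension header are assumptions based on the video flow, so substitute the values your API response actually returns:

```shell
# 1. Request a signed upload URL (hypothetical bulk endpoint path --
#    check the API reference for the exact route).
curl -X POST "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/bulk-upload/image-files" \
  -G \
  --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "fileName=images_000.tar"

# 2. Pack up to 500 images into a tar archive, within the limits
#    returned by the API.
tar -cf images_000.tar img_0001.jpg img_0002.jpg img_0003.jpg

# 3. PUT the archive to the returned signed URL, including any
#    extension headers from the response (placeholder shown).
curl -X PUT "https://storage.example.com/signed-upload-url" \
  -H "x-example-extension-header: value-from-response" \
  --data-binary @images_000.tar
```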


When performing bulk ingestion, data is indexed in the background. There may be a short delay before all data is available.

Check Batch Status

Before starting a job, verify that all data has been ingested:

To check shard upload details (paginated):
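Both checks can be sketched as follows; the endpoint paths and the pagination parameter are assumptions, so consult the API reference for the exact routes:

```shell
# Batch details, including the count of ingested items
# (hypothetical path -- verify against the API reference):
curl "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}" \
  -G --data-urlencode "api_key=YOUR_API_KEY"

# Shard upload details, paginated (hypothetical path and parameter):
curl "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/shards" \
  -G --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "pageSize=100"
```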

Start a Job


Job ID constraints: Lowercase letters, digits, hyphens, and underscores only. Maximum 20 characters.
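A sketch of starting a job over the REST API; the endpoint path and request-body fields are assumptions, so check the API reference for the actual schema:

```shell
# Hypothetical job-start request. Note the job ID ("my-job-1")
# satisfies the constraints above: lowercase, <= 20 characters.
curl -X POST "https://api.roboflow.com/batch-processing/v1/external/{workspace}/jobs/my-job-1" \
  -G --data-urlencode "api_key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"batchId": "{batch_id}", "workflowId": "{workflow_id}"}'
```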

Monitor Job Status

General job status:

List job stages:

List tasks for a stage (paginated):
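The three monitoring calls can be sketched as follows (endpoint paths are assumptions — verify them against the API reference):

```shell
# Overall job status (hypothetical path):
curl "https://api.roboflow.com/batch-processing/v1/external/{workspace}/jobs/{job_id}" \
  -G --data-urlencode "api_key=YOUR_API_KEY"

# Stages of the job (hypothetical path):
curl "https://api.roboflow.com/batch-processing/v1/external/{workspace}/jobs/{job_id}/stages" \
  -G --data-urlencode "api_key=YOUR_API_KEY"

# Tasks within a stage, paginated (hypothetical path):
curl "https://api.roboflow.com/batch-processing/v1/external/{workspace}/jobs/{job_id}/stages/{stage_id}/tasks" \
  -G --data-urlencode "api_key=YOUR_API_KEY"
```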

Export Results

List parts of an output batch:

List download URLs for a part (paginated):

Download a file:
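The export flow can be sketched as follows; the listing endpoint paths are assumptions, while the final download is a plain GET against the signed URL the API returns:

```shell
# List the parts of the output batch (hypothetical path):
curl "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/parts" \
  -G --data-urlencode "api_key=YOUR_API_KEY"

# List signed download URLs for a part, paginated (hypothetical path):
curl "https://api.roboflow.com/data-staging/v1/external/{workspace}/batches/{batch_id}/parts/{part_id}/urls" \
  -G --data-urlencode "api_key=YOUR_API_KEY"

# Download a file via a signed URL from the listing above:
curl -L -o results.tar.gz "https://storage.example.com/signed-download-url"
```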

Data Staging Batch Types

  • Simple batches (type: simple-batch) — Created when ingesting data one item at a time. Best for up to 5,000–10,000 items.

  • Sharded batches (type: sharded-batch) — Created via bulk ingestion (images only). Designed for millions of data points with automatic sharding.

  • Multipart batches (type: multipart-batch) — Created internally by the system. A logical grouping of sub-batches managed as one entity.

Webhook Automation

Instead of polling for status, you can use webhooks to get notified when ingestion or processing completes.

Data Ingestion Webhooks

The CLI commands create-batch-of-images and create-batch-of-videos support:

  • --notifications-url <webhook_url> — Webhook endpoint for notifications.

  • --notification-category <value> — Filter notifications:

    • ingest-status (default) — Overall ingestion process status.

    • files-status — Individual file processing status.

Notifications are delivered via HTTP POST with an Authorization header containing your Roboflow Publishable Key.

Ingest Status Notification

File Status Notification

Job Completion Webhooks

Add --notifications-url when starting a job:
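As an illustrative sketch — the CLI binary and subcommand names below are placeholders for whatever command you use to start jobs; only the flag itself comes from this page:

```shell
# "roboflow batch-processing start-job" is a placeholder command name;
# append --notifications-url to your actual job-start invocation.
roboflow batch-processing start-job \
  --batch-id my-batch \
  --job-id my-job-1 \
  --notifications-url "https://example.com/webhooks/roboflow"
```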

Job Completion Notification

Signed URL Ingestion

For advanced automation, you can ingest data via signed URLs instead of local files:

  • --data-source references-file — Process files referenced via signed URLs.

  • --references <path_or_url> — Path to a JSONL file containing file URLs, or a signed URL pointing to such a file.

Reference File Format (JSONL)
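A sketch of the expected shape — one JSON object per line, no enclosing array. The field names used here ("name", "url") are illustrative assumptions; match them to the reference-file schema in the API docs:

```shell
# Write a two-line JSONL reference file. Field names are
# illustrative, not authoritative.
cat > references.jsonl <<'EOF'
{"name": "image_0001.jpg", "url": "https://storage.example.com/image_0001.jpg?signature=abc"}
{"name": "image_0002.jpg", "url": "https://storage.example.com/image_0002.jpg?signature=def"}
EOF

# Each line is a standalone JSON object:
wc -l references.jsonl
```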


Signed URL ingestion is available to Growth Plan and Enterprise customers.

Cloud Storage Authentication

AWS S3 and S3-Compatible Storage

Credentials are detected automatically from:

  1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_SESSION_TOKEN)

  2. AWS credential files (~/.aws/credentials, ~/.aws/config)

  3. IAM roles (EC2, ECS, Lambda)
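For example, the standard AWS SDK environment variables (the values shown are placeholders):

```shell
# Standard AWS SDK environment variables, detected automatically.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLEKEY"
export AWS_SECRET_ACCESS_KEY="example-secret"
export AWS_SESSION_TOKEN="example-token"   # only for temporary credentials
```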

Named profiles:
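To select a named profile from your AWS credential files (standard AWS SDK behavior):

```shell
# Use a named profile from ~/.aws/credentials instead of the default.
export AWS_PROFILE="my-profile"
```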

S3-compatible services (Cloudflare R2, MinIO, etc.):
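A sketch for S3-compatible endpoints, assuming the tool honors the AWS_ENDPOINT_URL variable supported by recent AWS SDKs (the endpoint shown is a Cloudflare R2 placeholder):

```shell
# Point the AWS SDK at an S3-compatible endpoint. AWS_ENDPOINT_URL is
# a standard variable in recent AWS SDKs; whether this tool honors it
# is an assumption -- verify against the CLI docs.
export AWS_ENDPOINT_URL="https://ACCOUNT_ID.r2.cloudflarestorage.com"
export AWS_ACCESS_KEY_ID="example-key"
export AWS_SECRET_ACCESS_KEY="example-secret"
```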

Google Cloud Storage

Credentials are detected from:

  1. Service account key file (recommended for automation)

  2. User credentials from the gcloud CLI (gcloud auth login)

  3. GCP metadata service (when running on Google Cloud Platform)
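For the service account route, the standard Google Cloud environment variable (path is a placeholder):

```shell
# Point Google Cloud client libraries at a service account key file.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
```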

Azure Blob Storage

SAS Token (recommended):

Account Key:
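Both options can be supplied via the standard Azure Storage environment variables (values are placeholders; whether this tool reads these exact variables is an assumption — check the CLI docs):

```shell
# Standard Azure Storage environment variables.
export AZURE_STORAGE_ACCOUNT="mystorageaccount"
export AZURE_STORAGE_SAS_TOKEN="sv=2024-01-01&sig=example"   # SAS token (recommended)
# ...or, alternatively, an account key:
# export AZURE_STORAGE_KEY="example-account-key"
```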

Generate a SAS token via Azure CLI:
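For example, a read+list container SAS via the Azure CLI (account and container names are placeholders; requires an authenticated az session):

```shell
# Generate a read+list SAS token for a container, valid until the
# given expiry.
az storage container generate-sas \
  --account-name mystorageaccount \
  --name my-container \
  --permissions rl \
  --expiry 2030-01-01T00:00Z \
  --output tsv
```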

Custom Scripts

For advanced use cases, reference scripts for generating signed URL files:
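As one sketch of such a script, using the AWS CLI to pre-sign every object under an S3 prefix into a JSONL reference file (bucket, prefix, and the JSONL field names are illustrative assumptions):

```shell
# Build a JSONL of pre-signed S3 URLs for every object under a prefix.
# Requires configured AWS credentials; field names ("name", "url") are
# illustrative -- match your actual reference-file schema.
aws s3 ls "s3://my-bucket/images/" | awk '{print $4}' | while read -r key; do
  url=$(aws s3 presign "s3://my-bucket/images/${key}" --expires-in 3600)
  printf '{"name": "%s", "url": "%s"}\n' "$key" "$url"
done > references.jsonl
```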
