
Uploading Private Bucket Data

Learn how to upload data from a private GCS bucket into Roboflow via the upload API and signed URLs.
If your raw image data lives in a private bucket on a cloud provider, you can still use the Roboflow REST API to upload images to Roboflow. However, you will first need to create "signed URLs" for each image so that Roboflow can read from your private bucket.
The Python script below shows how this can be done for a private GCP bucket. Read the comments in the script to follow along and set up the environment needed to run the upload script correctly.
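At its core, each upload is a single POST to the Roboflow Upload API with the URL-encoded signed URL passed as the image parameter. Here is a minimal sketch of one such call; the dataset ID, image name, and signed URL are placeholders:

import os
import urllib.parse
import requests

# Placeholder; in practice this comes from gsutil signurl (see the full script below)
signed_img_url = "https://storage.googleapis.com/foo-bar-bucket/example.jpg?X-Goog-Signature=..."
upload_url = (
    "https://api.roboflow.com/dataset/your-dataset-id/upload"
    + "?api_key=" + os.environ["ROBOFLOW_API_KEY"]
    + "&name=example.jpg"
    + "&split=train"
    + "&image=" + urllib.parse.quote_plus(signed_img_url)
)
r = requests.post(upload_url)
print(r.text)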

Requirements

The Python 3 script uses a GCP service account key to sign the GCS URL of each image in a bucket.
Before you run the script, you will need to set up a few things.
Install the gsutil command-line program to work with GCS buckets in your GCP project (installation instructions: https://cloud.google.com/storage/docs/gsutil_install).
Install two Python 3 packages from the command line:
pip3 install pyopenssl
pip3 install requests
Obtain a GCP service account key in JSON format; note that the service account needs GCS object list and read permissions. If you prefer to sign URLs without gsutil, see the sketch after this list.
Finally, you need your Roboflow API key. Export the key into your terminal environment like so:
export ROBOFLOW_API_KEY=<YOUR API KEY>
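As an alternative to shelling out to gsutil signurl, signed URLs can also be generated directly in Python with the google-cloud-storage library (not otherwise required by this script). A minimal sketch, assuming pip3 install google-cloud-storage and the same service account key file; the bucket and object names are placeholders:

from datetime import timedelta
from google.cloud import storage

# Authenticate with the same service account key used by gsutil signurl
client = storage.Client.from_service_account_json("gcp-sa-key-file.json")
blob = client.bucket("foo-bar-bucket").blob("example.jpg")
# Generate a V4 signed URL valid for 10 minutes, matching the script below
signed_img_url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=10),
    method="GET",
)
print(signed_img_url)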

Running the Python Script

This script needs 3 command line arguments to run:
$1 --> The path to the service account private key
$2 --> The name of the GCS bucket (e.g. gs://foo-bar-bucket/) containing the images
$3 --> The Roboflow dataset ID to upload the images into
Here is an example invocation:
python3 gcs-signed-urls-for-upload.py ./gcp-sa-key-file.json gs://test-signing-urls/ hard-hat-sample-qulad

The Script

# Roboflow Inc.
# What is this?
# Python script to upload images in a private GCS bucket to Roboflow.
# The script uses a GCP service account private key to create
# signed URLs; these are time-limited URLs that Roboflow can use
# to ingest image data into the Roboflow workspaces.
#
# Requirements
# This script assumes gsutil is configured for the corresponding project;
# documentation here:
# https://cloud.google.com/storage/docs/gsutil_install
# OpenSSL is required to sign the URLs; install with this command:
# pip3 install pyopenssl
# Install the requests package:
# pip3 install requests
# Use this link to learn about and obtain a GCP service account key:
# https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys

# Roboflow API key
# Export your ROBOFLOW_API_KEY in the terminal where you run this script; for example:
# export ROBOFLOW_API_KEY=<private API key>
# To obtain your API key, follow the instructions here: https://docs.roboflow.com/rest-api#obtaining-your-api-key

# Using the script
# This script needs 3 command line arguments to run:
# $1 --> The path to the service account private key
# $2 --> The name of the GCS bucket e.g. gs://foo-bar-bucket/ containing the images
# $3 --> The Roboflow dataset ID to upload the images into

# Example invocation
# python3 gcs-signed-urls-for-upload.py ./gcp-sa-key-file.json gs://test-signing-urls/ hard-hat-sample-qulad

import sys
import subprocess
import requests
import urllib.parse
import os

SIGNED_URL_VALIDITY = "10m"

if "ROBOFLOW_API_KEY" not in os.environ:
    print("Please export the ROBOFLOW_API_KEY into your environment.")
    sys.exit(1)

key_path = sys.argv[1]
bucket_name = sys.argv[2]
upload_endpoint = "https://api.roboflow.com/dataset/" + sys.argv[3] + "/upload"

# A list of objects in the bucket; note: if there are too many objects in the
# bucket (5000+ for example), consider getting this list in a different way,
# e.g. using a client library with pagination.
bucket_objects = subprocess.check_output(['gsutil', 'ls', bucket_name], universal_newlines=True).split("\n")

for each_object in bucket_objects:
    # Filter out any non-image objects in the bucket
    if each_object.endswith(('.jpeg', '.jpg', '.png', '.PNG', '.JPEG', '.JPG')):
        # Obtain a signed URL for the object
        raw_data = subprocess.check_output(
            ['gsutil', 'signurl', '-d', SIGNED_URL_VALIDITY, key_path, each_object],
            universal_newlines=True)
        # gsutil prints a table; keep only the signed URL itself
        signed_img_url = "https://" + raw_data.split("https://")[-1].strip()
        img_name = each_object.split("/")[-1]
        # Create the upload URL to POST to the Roboflow Upload endpoint
        upload_url = "".join([
            upload_endpoint,
            "?api_key=" + os.environ["ROBOFLOW_API_KEY"],
            "&name=" + img_name,
            "&split=train",
            "&image=" + urllib.parse.quote_plus(signed_img_url)
        ])
        # POST to the Roboflow Upload API
        r = requests.post(upload_url)
        # Print the result
        print(r.text)

AWS and Azure Storage

You can adapt this script to sign URLs for, and upload from, Amazon S3 buckets or Azure Blob Storage as well.
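For example, on S3 the gsutil signurl step could be replaced with a presigned URL from boto3. A minimal sketch, assuming pip3 install boto3 and AWS credentials configured in your environment; the bucket and key names are placeholders:

import boto3

# AWS credentials are assumed to be configured, e.g. via environment
# variables or ~/.aws/credentials
s3 = boto3.client("s3")
# Presigned GET URL valid for 600 seconds, matching the 10-minute
# validity used in the GCS script
signed_img_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "foo-bar-bucket", "Key": "example.jpg"},
    ExpiresIn=600,
)
print(signed_img_url)

On Azure, the azure-storage-blob package's generate_blob_sas function plays a similar role; the rest of the upload flow stays the same.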