# Upload a Dataset

`Workspace.upload_dataset()` uploads a structured dataset (images plus matching annotations) to a Roboflow project. If the project doesn't exist, it is created; otherwise the images are added to the existing project.

```python
import roboflow

rf = roboflow.Roboflow(api_key="YOUR_API_KEY")
workspace = rf.workspace()

workspace.upload_dataset(
    "./dataset/",                  # path to a structured dataset directory
    "my-detector",                 # project id (created if it doesn't exist)
    num_workers=10,
    project_license="MIT",
    project_type="object-detection",
    batch_name=None,
    num_retries=0,
    is_prediction=False,           # True for model-generated annotations awaiting review
)
```

## Parameters

* `dataset_path` (str) - path to the dataset root.
* `project_name` (str) - destination project's id. Created if it doesn't exist.
* `num_workers` (int, default `10`) - number of concurrent uploads. We recommend not exceeding 25.
* `project_license` (str, default `"MIT"`) - license for a newly-created project. Set to `"Private"` for private projects (paid plans only).
* `project_type` (str, default `"object-detection"`) - type for a newly-created project. Ignored if the project already exists.
* `batch_name` (str, optional) - group these uploads under a named batch. Useful for tracking the source of a labeling round.
* `num_retries` (int, default `0`) - number of times to retry an upload after a transient failure.
* `is_prediction` (bool, default `False`) - set to `True` to upload annotations as model predictions awaiting review rather than ground truth.
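
As a rough illustration of how these parameters combine, the sketch below wraps `upload_dataset` so that each labeling round gets a dated batch name and `num_workers` stays within the recommended limit. The helper `upload_labeling_round`, the batch-name format, and the retry count of 3 are hypothetical choices for this example, not part of the SDK; only the keyword arguments listed above are used.

```python
from datetime import date

MAX_RECOMMENDED_WORKERS = 25  # upper bound recommended in the parameter list


def upload_labeling_round(workspace, dataset_path, project_name, source,
                          num_workers=10, is_prediction=False):
    """Hypothetical wrapper: upload one labeling round under a dated batch name."""
    # Keep concurrency between 1 and the recommended maximum.
    workers = max(1, min(num_workers, MAX_RECOMMENDED_WORKERS))
    # Illustrative batch-name format: "<source>-<YYYY-MM-DD>".
    batch_name = f"{source}-{date.today().isoformat()}"
    workspace.upload_dataset(
        dataset_path,
        project_name,
        num_workers=workers,
        batch_name=batch_name,
        num_retries=3,              # arbitrary value chosen for this sketch
        is_prediction=is_prediction,
    )
    return batch_name
```

With a real workspace this might be called as `upload_labeling_round(rf.workspace(), "./dataset/", "my-detector", "vendor-a")`. Passing `is_prediction=True` would mark the annotations as predictions awaiting review, as described above.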

## Expected directory layout

For a COCO dataset:

```
my_dataset/
├── train/
│   ├── image1.jpg
│   └── _annotations.coco.json
├── valid/
│   ├── image2.jpg
│   └── _annotations.coco.json
└── test/
    ├── image3.jpg
    └── _annotations.coco.json
```

For VOC, place a matching `.xml` file alongside each image. For YOLO, place a matching `.txt` file alongside each image and add a `data.yaml` describing the class list.
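
Before uploading, it can help to check locally that a directory matches the COCO layout shown above. The following is a minimal sketch using only the standard library; `check_coco_layout` is a hypothetical helper, the set of image extensions is an assumption, and the SDK may accept layouts that this check flags (for example, it is not stated whether all three splits are required, so missing splits are skipped here).

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image types
SPLITS = ("train", "valid", "test")


def check_coco_layout(root):
    """Return a list of human-readable problems found under `root`."""
    root = Path(root)
    problems = []
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            continue  # missing splits are not treated as errors in this sketch
        has_image = any(
            p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            for p in split_dir.iterdir()
        )
        if not has_image:
            problems.append(f"{split}/ contains no images")
        if not (split_dir / "_annotations.coco.json").is_file():
            problems.append(f"{split}/ is missing _annotations.coco.json")
    if not any((root / split).is_dir() for split in SPLITS):
        problems.append("no train/, valid/ or test/ directory found")
    return problems
```

An empty list means the directory looks like the tree above; otherwise each entry names the split and what is missing.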

## Note on SHA-256 dedup (v1.3.6+)

As of `roboflow` 1.3.6, the SDK uploads the original image bytes rather than re-encoding them via Pillow. This brings parity with the web uploader and lets the Roboflow server deduplicate uploads by SHA-256: re-uploading the same image (e.g. into a different batch) succeeds without consuming additional storage credits.
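
To see which local files would count as the same image under byte-level hashing, the files can be hashed with `hashlib` before uploading. This is only a local sketch: `sha256_of_file` and `find_duplicate_images` are hypothetical helpers, the extension list is an assumption, and the server's exact deduplication logic is not described here beyond its use of SHA-256.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file's raw bytes in chunks so large images are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_images(root, suffixes=(".jpg", ".jpeg", ".png")):
    """Map each SHA-256 digest to the files sharing it, keeping only duplicates."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            by_hash[sha256_of_file(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Files that differ by even one byte (for example, the same photo re-saved at a different JPEG quality) produce different digests and would not be grouped together by this check.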

## REST and CLI equivalents

* REST: see [Upload an Image (REST)](/developer/rest-api/manage-images/upload-an-image.md) for the per-image endpoint that `upload_dataset` calls under the hood.
* CLI: see [Upload a Dataset (CLI)](/developer/command-line-interface/upload-a-dataset.md).

