Object Detection
Adding object detection datasets.
If you are trying to identify objects in images with bounding boxes, you will need an object detection dataset. Object detection datasets require images (or videos) and annotations.
- If you do have annotations, you can upload them by dragging and dropping them into Roboflow. Roboflow can handle many annotation formats.
Select your folder(s) of images/videos and annotations.

To upload images or videos, the simplest approach is to click and drag the whole folder of images/videos (and annotations, if you have them) directly into Roboflow. In most cases, dropping the files into the upload box (as shown in the .gif below) is all that is needed.
If your annotation format is not included in the list below or if there are errors, then contact us! We will help add your annotation format into the list of supported formats.

Your upload has succeeded when the progress bar has moved all the way to the right and your images appear with bounding boxes drawn on them. If you did not include annotations with the upload, you can add them in Roboflow now or later.
Roboflow supports a wide array of annotation formats. You can see the full list of supported formats here. Some popular formats include:
PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning) is a Network of Excellence funded by the European Union. From 2005 to 2012, PASCAL ran the Visual Object Classes (VOC) challenge, releasing object detection datasets and reporting benchmarks each year. (An aggregated PASCAL VOC dataset is available here.) VOC annotations are stored as one XML file per image:
<annotation>
<folder>train</folder>
<filename>01.jpg</filename>
<path>/roboflow/data/train/01.jpg</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>224</width>
<height>224</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>21</name>
<pose>Frontal</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<occluded>0</occluded>
<bndbox>
<xmin>82</xmin>
<xmax>172</xmax>
<ymin>88</ymin>
<ymax>146</ymax>
</bndbox>
</object>
</annotation>
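To check an annotation like the one above before uploading it, the XML can be read with Python's standard library. The following is a minimal sketch, not Roboflow's own parser: the function name `parse_voc` and the shortened sample string are illustrative assumptions, and only the fields shown in the example are read.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example annotation above (illustrative only).
VOC_XML = """<annotation>
  <filename>01.jpg</filename>
  <size><width>224</width><height>224</height><depth>3</depth></size>
  <object>
    <name>21</name>
    <bndbox>
      <xmin>82</xmin><xmax>172</xmax>
      <ymin>88</ymin><ymax>146</ymax>
    </bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return (filename, width, height, boxes) for one VOC annotation string."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Look coordinates up by tag name; their order inside <bndbox> can vary.
        boxes.append({
            "label": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return root.findtext("filename"), width, height, boxes


filename, width, height, boxes = parse_voc(VOC_XML)
print(filename, width, height, boxes)
```

Reading the coordinates by tag name rather than by position matters here, because the example lists `xmax` before `ymin`.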
The Common Objects in Context (COCO) dataset originated in a paper published by Microsoft in 2014. The dataset "contains photos of 91 objects types that would be easily recognizable by a 4 year old." It has a total of 2.5 million labeled instances across 328,000 images. Given the quantity and quality of the open-sourced data, COCO has become a standard dataset for testing and demonstrating state-of-the-art performance in new models. (The dataset is available here.)
TensorFlow object detection .csv files contain one bounding box per line.
filename | width | height | class | xmin | ymin | xmax | ymax |
image_1.jpg | 480 | 270 | queen | 173 | 24 | 260 | 137 |
image_1.jpg | 480 | 270 | queen | 165 | 135 | 253 | 251 |
image_2.jpg | 960 | 540 | jack | 255 | 96 | 337 | 208 |
image_2.jpg | 960 | 540 | jack | 261 | 124 | 543 | 370 |
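Because each row holds a single bounding box, an image with several objects spans several rows, and it is often useful to group the rows by filename. The sketch below assumes the file on disk is comma-separated with the header row shown above (the pipes in the table are only for display); `boxes_by_image` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# The sample rows from the table above, written as comma-separated text (assumed layout).
CSV_TEXT = """filename,width,height,class,xmin,ymin,xmax,ymax
image_1.jpg,480,270,queen,173,24,260,137
image_1.jpg,480,270,queen,165,135,253,251
image_2.jpg,960,540,jack,255,96,337,208
image_2.jpg,960,540,jack,261,124,543,370
"""


def boxes_by_image(csv_text):
    """Group bounding boxes by filename: {filename: [(class, xmin, ymin, xmax, ymax), ...]}."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["filename"]].append((
            row["class"],
            int(row["xmin"]),
            int(row["ymin"]),
            int(row["xmax"]),
            int(row["ymax"]),
        ))
    return dict(grouped)


for name, boxes in boxes_by_image(CSV_TEXT).items():
    print(name, boxes)
```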
Some formats use a .txt annotation file for each image, plus a label_map.txt (or labels.txt) file that maps each numeric class ID to a class name.
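The class ID lookup can be sketched as follows. The exact file layout is not specified here, so this example makes two loudly stated assumptions: the label map lists one class name per line, with the zero-based line index serving as the class ID, and each line of a per-image .txt file begins with a class ID followed by numeric values. The sample contents and the function names are hypothetical.

```python
# Hypothetical file contents; real files may be laid out differently.
LABELS_TXT = "jack\nqueen\n"
IMAGE_TXT = "1 0.45 0.30 0.18 0.42\n0 0.10 0.20 0.05 0.05\n"


def load_label_map(text):
    """Assumed layout: one class name per line; line index (from 0) is the class ID."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return dict(enumerate(names))


def resolve_labels(annotation_text, label_map):
    """Replace the leading class ID on each line with its class name.

    The remaining numbers are returned as floats without interpreting them.
    """
    resolved = []
    for line in annotation_text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        class_id = int(parts[0])
        values = [float(v) for v in parts[1:]]
        resolved.append((label_map[class_id], values))
    return resolved


label_map = load_label_map(LABELS_TXT)
print(resolve_labels(IMAGE_TXT, label_map))
```

An unknown class ID raises a `KeyError`, which is usually what you want when checking a dataset before upload.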
Roboflow supports uploading images in several formats. The most common are JPG, PNG, and BMP.
- HEIC is supported on Safari only.
Roboflow Pro supports ingesting video in H.264 format. It will prompt you to select a frame rate for extracting image captures from your video.
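To get a feel for how many images a given frame rate will produce, the arithmetic can be sketched as below. This is an illustration of fixed-rate sampling in general, not Roboflow's extraction logic; `frame_indices` and its parameters are hypothetical names.

```python
def frame_indices(total_frames, video_fps, sample_fps):
    """Return the indices of the frames kept when sampling at sample_fps.

    Frame k of the output is taken at time k / sample_fps seconds,
    which corresponds to source frame floor(k * video_fps / sample_fps).
    """
    if not 0 < sample_fps <= video_fps:
        raise ValueError("sample_fps must be positive and no greater than video_fps")
    indices = []
    k = 0
    while True:
        idx = int(k * video_fps / sample_fps)
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices


# A 3-second clip recorded at 30 frames per second, sampled at 5 frames per second.
kept = frame_indices(total_frames=90, video_fps=30, sample_fps=5)
print(len(kept), kept[:4])
```

In this example every sixth frame is kept, so a higher extraction rate yields more images that look increasingly similar to one another.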