Dataset Analytics
Assess and improve the quality of your dataset.
Dataset Analytics shows a range of statistics about the dataset associated with a project. You can see the following pieces of information:
Number of images in your dataset;
Number of annotations;
Average image size;
Median image ratio;
Number of missing annotations;
Number of null annotations;
Image dimensions across your dataset;
Object count histogram; and
A heatmap of annotation locations.
Using Dataset Analytics, you can derive a range of insights about your dataset. For example, if you have no null annotations, you may want to consider adding a few depending on the project on which you are working; if there are images with missing annotations, you can dig deeper to add the requisite annotations.
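To make the statistics listed above concrete, here is a minimal Python sketch that computes similar numbers for a small in-memory dataset. It is an illustration only, not how Dataset Analytics computes its figures. The record layout is hypothetical: each image is a dict with `width`, `height`, and `boxes`, where `boxes` is `None` for an image that has not been annotated (counted as missing), `[]` for an image explicitly marked as containing no objects (counted as null), or a list of annotations. The function name `summarize` is likewise an assumption.

```python
from statistics import mean, median


def summarize(images):
    """Summarize a dataset given as a list of image records.

    Hypothetical layout: each record has "width", "height", and "boxes".
    "boxes" is None (not annotated yet), [] (marked as containing no
    objects), or a list of annotations.
    """
    if not images:
        raise ValueError("dataset is empty")
    return {
        "images": len(images),
        # None and [] both contribute zero annotations.
        "annotations": sum(len(img["boxes"] or []) for img in images),
        "avg_megapixels": mean(img["width"] * img["height"] for img in images) / 1e6,
        "median_ratio": median(img["width"] / img["height"] for img in images),
        "missing_annotations": sum(img["boxes"] is None for img in images),
        "null_annotations": sum(img["boxes"] == [] for img in images),
    }


dataset = [
    {"width": 640, "height": 480, "boxes": [(0, 0, 10, 10), (5, 5, 20, 20)]},
    {"width": 1280, "height": 720, "boxes": []},    # null annotation
    {"width": 800, "height": 800, "boxes": None},   # missing annotation
]
stats = summarize(dataset)
```

Keeping missing and null images as separate counts mirrors the distinction drawn above: a missing annotation is work still to do, while a null annotation is a deliberate negative example.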
To see Dataset Analytics for a project, click "Analytics" in the left sidebar of a project:
The Dataset Analytics tab will then open:
On this page, you can see:
A breakdown of the number of classes in the images in your train, test, and valid datasets.
An overview of the sizes and aspect ratios of the images in your dataset.
A heatmap showing where most of your annotations are.
A histogram showing how many objects are annotated in each image in your dataset.
The Dimension Insights section describes the sizes and aspect ratios of raw images in your dataset.
If you apply the Resize augmentation when you create a project version (which we strongly recommend for almost all use cases), images in your version will be resized, but the raw images will stay the same.
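As a rough illustration of an aspect ratio breakdown, the Python sketch below groups raw image sizes into "tall", "square", and "wide" buckets. The bucket names, the 10% tolerance around a 1:1 ratio, and the function names are assumptions chosen for this example; they are not the categories or thresholds used by Dimension Insights.

```python
from collections import Counter


def ratio_bucket(width, height, tolerance=0.1):
    """Label an image by its width/height ratio.

    The tolerance around 1.0 is an arbitrary choice for illustration.
    """
    ratio = width / height
    if ratio < 1 - tolerance:
        return "tall"
    if ratio > 1 + tolerance:
        return "wide"
    return "square"


def dimension_breakdown(sizes):
    """Count how many (width, height) pairs fall into each bucket."""
    return Counter(ratio_bucket(w, h) for w, h in sizes)


raw_sizes = [(640, 480), (480, 640), (800, 800), (1920, 1080)]
breakdown = dimension_breakdown(raw_sizes)
```

A breakdown dominated by one bucket suggests that a single resize setting will suit most images; a mixed breakdown means resizing will change the shape of some images more than others.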
When you are training a model, it is important that your dataset is representative of the conditions in which your model will be deployed.
If your model will be deployed in an environment where objects may appear anywhere in the camera frame (for example, on a factory line where objects of different sizes are moving in real time, or in photos of an object taken on a phone), it is important that you annotate objects that appear in different places in an image.
Labeling objects in different parts of an image helps prevent your model from overfitting to objects that appear only in specific places.
The Annotation Heat Map shows where there are more or fewer annotations in images. Use it to identify cases where your dataset annotations are too concentrated in a particular place.
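The idea behind an annotation heat map can be sketched in a few lines of Python: divide the frame into a grid and count how many annotation centers land in each cell. This is a simplified illustration, not the method used to render the Annotation Heat Map. It assumes annotation centers are given as `(x, y)` pairs normalized to the range 0 to 1; the function names and the grid size are hypothetical.

```python
def annotation_heatmap(centers, grid=4):
    """Count annotation centers per cell of a grid x grid map.

    centers: iterable of (x, y) pairs normalized to [0, 1] (an assumption).
    Returns a list of rows, where counts[row][col] is the cell count.
    """
    counts = [[0] * grid for _ in range(grid)]
    for x, y in centers:
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        counts[row][col] += 1
    return counts


def peak_share(counts):
    """Fraction of all annotations that fall in the busiest cell."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0
    return max(max(row) for row in counts) / total


centers = [(0.1, 0.1), (0.12, 0.2), (0.9, 0.9), (1.0, 1.0)]
heat = annotation_heatmap(centers, grid=2)
```

A high `peak_share` is one simple signal that annotations cluster in a single region of the frame.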
You can drag over an area in the Heat Map to see images in the chosen range:
The object count histogram shows the distribution of how many annotated objects appear in each image.
If images that you pass through your model may contain multiple instances of an object, we recommend ensuring your dataset contains images with different numbers of object instances. This will help you ensure your model can generalise well to images with no, one, or multiple objects of interest.
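As a minimal sketch of what an object count histogram contains, the Python snippet below tallies how many images have zero, one, two, or more annotated objects. The input format (one object count per image) and the function name are assumptions for illustration.

```python
from collections import Counter


def object_count_histogram(objects_per_image):
    """Map each object count to the number of images with that count."""
    return dict(sorted(Counter(objects_per_image).items()))


# One entry per image: how many objects are annotated in it.
counts = [0, 1, 1, 3, 2, 1]
histogram = object_count_histogram(counts)
```

If nearly all images fall under a single count, consider adding images with other counts so the dataset covers the cases described above.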
Here is an example of a histogram:
You can select any of the bars on the histogram to see images with a given count: