Confidence Sweep

Returns precision, recall, and F1 at each confidence threshold, together with the F1-optimal threshold for each split and for each class within a split. Useful for plotting precision/recall trade-offs and for picking a deployment-time confidence threshold.

The production metrics explorer panel in the app reads this same data.

GET https://api.roboflow.com/:workspace/model-evals/:evalId/confidence-sweep
curl "https://api.roboflow.com/my-workspace/model-evals/$EVAL_ID/confidence-sweep?api_key=$ROBOFLOW_API_KEY"
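The same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `confidence_sweep_url` and `fetch_confidence_sweep` are hypothetical and not part of any Roboflow SDK, and the sketch assumes the endpoint returns the JSON body shown under Response.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.roboflow.com"


def confidence_sweep_url(workspace: str, eval_id: str, api_key: str) -> str:
    """Build the confidence-sweep URL for one model evaluation (hypothetical helper)."""
    # Percent-encode each path segment so unusual workspace or eval IDs stay valid.
    path = "/".join(
        urllib.parse.quote(part, safe="")
        for part in (workspace, "model-evals", eval_id, "confidence-sweep")
    )
    query = urllib.parse.urlencode({"api_key": api_key})
    return f"{API_BASE}/{path}?{query}"


def fetch_confidence_sweep(
    workspace: str, eval_id: str, api_key: str, timeout: float = 30.0
) -> dict:
    """Send the GET request and decode the JSON body; HTTP errors raise an exception."""
    url = confidence_sweep_url(workspace, eval_id, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A call such as `fetch_confidence_sweep("my-workspace", os.environ["EVAL_ID"], os.environ["ROBOFLOW_API_KEY"])` (with `import os`) mirrors the curl command above.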

Response

{
    "splits": {
        "test": {
            "perThreshold": {
                "0.00": { "precision": 0.02, "recall": 1.0,  "f1": 0.039 },
                "0.20": { "precision": 0.45, "recall": 0.92, "f1": 0.605 },
                "0.37": { "precision": 0.85, "recall": 0.85, "f1": 0.85 },
                "0.50": { "precision": 0.91, "recall": 0.78, "f1": 0.84 }
            },
            "optimalThreshold": 0.37,
            "optimalMetrics": {
                "precision": 0.85,
                "recall": 0.85,
                "f1": 0.85
            },
            "perClass": {
                "Car-rims": {
                    "perThreshold": { "0.37": { "precision": 0.85, "recall": 0.85, "f1": 0.85 } },
                    "optimalThreshold": 0.37,
                    "optimalMetrics": { "precision": 0.85, "recall": 0.85, "f1": 0.85 }
                }
            }
        },
        "valid": { "...": "same shape" },
        "train": { "...": "same shape" }
    }
}
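Because the perThreshold keys are strings and JSON objects carry no guaranteed order, it can help to convert them to numbers and sort before plotting. The sketch below does that for one split; `curve_points` is a hypothetical helper, and the sample data is an abbreviated copy of the response above.

```python
from typing import Dict, List, Tuple

Metrics = Dict[str, float]


def curve_points(per_threshold: Dict[str, Metrics]) -> List[Tuple[float, float, float, float]]:
    """Return (threshold, precision, recall, f1) rows sorted by numeric threshold."""
    rows = [
        (float(key), m["precision"], m["recall"], m["f1"])
        for key, m in per_threshold.items()
    ]
    return sorted(rows, key=lambda row: row[0])


# Abbreviated sample taken from the example response; keys are deliberately out of order.
sample_split = {
    "perThreshold": {
        "0.50": {"precision": 0.91, "recall": 0.78, "f1": 0.84},
        "0.00": {"precision": 0.02, "recall": 1.0, "f1": 0.039},
        "0.37": {"precision": 0.85, "recall": 0.85, "f1": 0.85},
        "0.20": {"precision": 0.45, "recall": 0.92, "f1": 0.605},
    }
}

rows = curve_points(sample_split["perThreshold"])
thresholds = [row[0] for row in rows]
precisions = [row[1] for row in rows]
recalls = [row[2] for row in rows]
```

The `recalls` and `precisions` lists can then be passed to any plotting library as the x and y series of a precision/recall curve.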

Notes

  • perThreshold keys are confidence thresholds encoded as decimal strings (for example "0.37"), typically in steps of 0.01 from 0.00 to 0.99. The example response shows only a few of them.

  • optimalThreshold is the threshold that maximizes F1 for that split; optimalMetrics holds the precision, recall, and F1 at that threshold.

  • Each entry in a split's perClass has the same shape as the split itself (perThreshold, optimalThreshold, optimalMetrics), but without a nested perClass.
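To illustrate how these fields relate, the sketch below recomputes the F1-maximizing threshold from perThreshold and collects the reported per-class thresholds. The helper names are hypothetical, the sample is an abbreviated copy of the example response, and the tie-break (lowest threshold wins) is an assumption of this sketch, since the notes above do not say how the API resolves ties.

```python
from typing import Dict, Tuple

Metrics = Dict[str, float]


def best_f1_threshold(per_threshold: Dict[str, Metrics]) -> Tuple[float, Metrics]:
    """Return (threshold, metrics) for the entry with the highest F1.

    Ties go to the lowest threshold; that tie-break is an assumption of this sketch.
    """
    key = min(per_threshold, key=lambda k: (-per_threshold[k]["f1"], float(k)))
    return float(key), per_threshold[key]


def per_class_thresholds(split: dict) -> Dict[str, float]:
    """Map each class name to the optimalThreshold reported for it."""
    return {
        name: entry["optimalThreshold"]
        for name, entry in split.get("perClass", {}).items()
    }


# Abbreviated "test" split from the example response.
test_split = {
    "perThreshold": {
        "0.00": {"precision": 0.02, "recall": 1.0, "f1": 0.039},
        "0.20": {"precision": 0.45, "recall": 0.92, "f1": 0.605},
        "0.37": {"precision": 0.85, "recall": 0.85, "f1": 0.85},
        "0.50": {"precision": 0.91, "recall": 0.78, "f1": 0.84},
    },
    "optimalThreshold": 0.37,
    "optimalMetrics": {"precision": 0.85, "recall": 0.85, "f1": 0.85},
    "perClass": {
        "Car-rims": {
            "perThreshold": {"0.37": {"precision": 0.85, "recall": 0.85, "f1": 0.85}},
            "optimalThreshold": 0.37,
            "optimalMetrics": {"precision": 0.85, "recall": 0.85, "f1": 0.85},
        }
    },
}

threshold, metrics = best_f1_threshold(test_split["perThreshold"])
class_thresholds = per_class_thresholds(test_split)
```

On this sample the recomputed threshold agrees with the reported optimalThreshold, and `class_thresholds` gives one deployment threshold per class, which may be useful when classes call for different operating points.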
