benchmark - Benchmark Suites
Run a manifest-driven suite of curated bag checks.
This is useful when you want public rosbag datasets and internal gold bags to act as a reproducible regression suite rather than as ad hoc manual checks.
Usage
# Run all cases in a suite
bagx benchmark benchmarks/open_data_suite.json
bagx benchmark benchmarks/non_slam_suite.json
bagx benchmark warehouse_benchmark.json --rules warehouse_bot
# Export a machine-readable report
bagx benchmark benchmarks/open_data_suite.json --json benchmark-report.json
# Run only selected cases
bagx benchmark benchmarks/open_data_suite.json --case nvidia-r2b-robotarm
# Fail if any referenced bag is missing
bagx benchmark benchmarks/open_data_suite.json --fail-on-missing
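When the suite is launched from a script instead of a shell, the same invocations can be assembled as an argument list. The sketch below is a hypothetical Python helper (`build_benchmark_cmd` is not part of bagx) that uses only the flags shown above. Combining several flags in one call is an assumption, since the examples above mostly show them one at a time.

```python
import shlex


def build_benchmark_cmd(manifest, json_out=None, case=None, rules=None,
                        fail_on_missing=False):
    """Build an argv list for `bagx benchmark` (hypothetical helper).

    Only flags that appear in the usage examples are supported here.
    """
    cmd = ["bagx", "benchmark", manifest]
    if rules:
        cmd += ["--rules", rules]
    if json_out:
        cmd += ["--json", json_out]
    if case:
        cmd += ["--case", case]
    if fail_on_missing:
        cmd.append("--fail-on-missing")
    return cmd


cmd = build_benchmark_cmd(
    "benchmarks/open_data_suite.json",
    json_out="benchmark-report.json",
    fail_on_missing=True,
)
# Show the equivalent shell command line.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run`. This section does not document bagx exit codes, so check how failures are reported before relying on the return code in CI.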
Manifest format
The manifest is JSON and supports environment-variable expansion in bag_path.
It also supports optional rules_path values to apply custom message rules. rules_path can be either a JSON file path or a plugin name.
{
  "suite_name": "open-data-dogfood",
  "rules_path": "warehouse_bot",
  "cases": [
    {
      "name": "nvidia-r2b-galileo2",
      "bag_path": "${BAGX_REALBAGS}/r2b_galileo2",
      "report_type": "eval",
      "expect": {
        "min_overall_score": 90,
        "required_domains": ["Perception"],
        "required_recommendations": [
          "Perception topics detected",
          "Camera calibration topics are recorded"
        ],
        "forbidden_recommendations": ["No GNSS data", "No IMU data"]
      }
    }
  ]
}
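To see which bags a manifest would point at before running it, the `${VAR}` placeholders in bag_path can be resolved by hand. The Python sketch below uses the standard library's `string.Template` for this. It is not bagx's own expansion logic, which this section does not show, and the helper name `resolve_bag_paths` and the `/data/realbags` directory are assumptions made for illustration.

```python
import json
import os
from string import Template


def resolve_bag_paths(manifest_text, env=None):
    """Return {case name: bag_path with ${VAR} placeholders expanded}.

    Hypothetical helper; bagx may expand variables differently.
    Unknown variables are left untouched so they are easy to spot.
    """
    mapping = os.environ if env is None else env
    manifest = json.loads(manifest_text)
    return {
        case["name"]: Template(case["bag_path"]).safe_substitute(mapping)
        for case in manifest.get("cases", [])
    }


manifest_text = """
{
  "suite_name": "open-data-dogfood",
  "cases": [
    {"name": "nvidia-r2b-galileo2",
     "bag_path": "${BAGX_REALBAGS}/r2b_galileo2"}
  ]
}
"""

# /data/realbags is an example location, not a bagx default.
paths = resolve_bag_paths(manifest_text, env={"BAGX_REALBAGS": "/data/realbags"})
print(paths)

# Any path that still contains "$" refers to a variable that was not set.
unresolved = [p for p in resolve_bag_paths(manifest_text, env={}).values() if "$" in p]
print(unresolved)
```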
The repository ships two ready-made suites:
benchmarks/open_data_suite.json: public Autoware + NVIDIA bags
benchmarks/non_slam_suite.json: perception/manipulation plus optional local Nav2 / MoveIt dogfood bags
For proprietary stacks, pair a benchmark manifest with a custom rules plugin or file and keep your expectations in required_domains, required_recommendations, and min_topic_rates.
Supported expectations
min_overall_score
max_overall_score
min_domain_score
required_domains
required_recommendations
forbidden_recommendations
min_topic_rates
required_topics
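A misspelled expectation key is easy to miss in a hand-written manifest. The Python sketch below flags any key under expect that is not in the list above. It is a hypothetical pre-flight check, not a bagx feature, and this section does not say whether bagx itself rejects or ignores unknown keys. The case name and the typo are invented for the example.

```python
import json

# Expectation keys listed in this section.
SUPPORTED_EXPECTATIONS = {
    "min_overall_score",
    "max_overall_score",
    "min_domain_score",
    "required_domains",
    "required_recommendations",
    "forbidden_recommendations",
    "min_topic_rates",
    "required_topics",
}


def unknown_expectations(manifest):
    """Return {case name: sorted list of expect keys not listed above}."""
    problems = {}
    for case in manifest.get("cases", []):
        extra = sorted(set(case.get("expect", {})) - SUPPORTED_EXPECTATIONS)
        if extra:
            problems[case.get("name", "<unnamed>")] = extra
    return problems


# Invented manifest with one deliberate typo ("min_overal_score").
manifest = json.loads("""
{
  "suite_name": "example",
  "cases": [
    {"name": "good-case", "expect": {"min_overall_score": 90}},
    {"name": "typo-case", "expect": {"min_overal_score": 90,
                                     "required_domains": ["Perception"]}}
  ]
}
""")

problems = unknown_expectations(manifest)
print(problems)
```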
JSON contract
Benchmark JSON reports include:
schema_version
report_type
bagx_version
This makes it practical to gate regressions in CI or compare reports across releases.
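Before comparing two reports in CI, it can help to confirm that both carry these fields and were produced under the same schema. The Python sketch below is one possible check. The function name `check_comparable` and all field values in the example are made up, and nothing is assumed about the rest of the report layout, which this section does not describe.

```python
REQUIRED_FIELDS = ("schema_version", "report_type", "bagx_version")


def check_comparable(old_report, new_report):
    """Return a list of reasons two reports should not be compared directly."""
    problems = []
    for label, report in (("old", old_report), ("new", new_report)):
        for field in REQUIRED_FIELDS:
            if field not in report:
                problems.append(f"{label} report is missing {field}")
    if problems:
        return problems
    # A different bagx_version is expected across releases, so only the
    # schema and report type are required to match.
    for field in ("schema_version", "report_type"):
        if old_report[field] != new_report[field]:
            problems.append(
                f"{field} differs: {old_report[field]!r} vs {new_report[field]!r}"
            )
    return problems


# Illustrative values only; real reports contain whatever bagx wrote.
old = {"schema_version": "1", "report_type": "benchmark", "bagx_version": "0.1.0"}
new = {"schema_version": "2", "report_type": "benchmark", "bagx_version": "0.2.0"}

print(check_comparable(old, old))
print(check_comparable(old, new))
```

In practice the two dictionaries would come from `json.load` on files written with `--json`.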