Calibrate Extrinsics Workflow
This document explains the workflow for calibrate_extrinsics.py, focusing on ground plane alignment (--auto-align) and depth-based refinement (--verify-depth, --refine-depth).
CLI Overview
The script calibrates camera extrinsics using ArUco markers detected in SVO recordings.
Key Options:
- --svo: Path to SVO file(s) or directory containing them.
- --markers: Path to the marker configuration parquet file.
- --auto-align: Enables automatic ground plane alignment (opt-in).
- --verify-depth: Enables depth-based verification of computed poses.
- --refine-depth: Enables optimization of poses using depth data (requires --verify-depth).
- --max-samples: Limits the number of processed samples for fast iteration.
- --debug: Enables verbose debug logging (default is INFO).
Ground Plane Alignment (--auto-align)
When --auto-align is enabled, the script attempts to align the global coordinate system such that a specific face of the marker object becomes the ground plane (XZ plane, normal pointing +Y).
Prerequisites:
- The marker parquet file MUST contain name and ids columns defining which markers belong to which face (e.g., "top", "bottom", "front").
- If this metadata is missing, alignment is skipped with a warning.
Decision Flow: The script selects the ground face using the following precedence:
- Explicit Face (--ground-face):
  - If you provide --ground-face="bottom", the script looks up the markers for "bottom" in the loaded map.
  - It computes the average normal of those markers and aligns it to the global up vector.
- Marker ID Mapping (--ground-marker-id):
  - If you provide --ground-marker-id=21, the script finds which face contains marker 21 (e.g., "bottom").
  - It then proceeds as if --ground-face="bottom" was specified.
- Heuristic Detection (Fallback):
  - If neither option is provided, the script analyzes all visible markers.
  - It computes the normal for every defined face.
  - It selects the face whose normal is most aligned with the camera's "down" direction (assuming the camera is roughly upright).
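The "average the face normals, then align to global up" step can be sketched as follows. This is a minimal illustration and not the script's actual code: the function names rotation_aligning and average_normal are hypothetical, and the sketch assumes the marker normals are already expressed in the world frame.

```python
import numpy as np

def average_normal(normals):
    """Average a set of marker normals and renormalize to unit length."""
    n = np.mean(np.asarray(normals, dtype=float), axis=0)
    return n / np.linalg.norm(n)

def rotation_aligning(normal, target=(0.0, 1.0, 0.0)):
    """Return a 3x3 rotation R such that R @ normal points along target (+Y by default)."""
    a = np.asarray(normal, dtype=float)
    b = np.asarray(target, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)        # rotation axis scaled by sin(angle)
    c = float(np.dot(a, b))   # cos(angle)
    if np.isclose(c, 1.0):
        return np.eye(3)      # already aligned
    if np.isclose(c, -1.0):
        # Antiparallel: rotate 180 degrees about any axis perpendicular to a.
        helper = np.array([1.0, 0.0, 0.0]) if abs(a[0]) < 0.9 else np.array([0.0, 0.0, 1.0])
        axis = np.cross(a, helper)
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    k = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    # Rodrigues' formula written as R = I + K + K^2 / (1 + c).
    return np.eye(3) + k + (k @ k) / (1.0 + c)

# Example: a face whose markers point roughly along +Z becomes the +Y ground normal.
face_normal = average_normal([[0.0, 0.02, 1.0], [0.0, -0.02, 1.0]])
R_align = rotation_aligning(face_normal)
```

Applying R_align to every pose and marker position would put the chosen face on the XZ plane. Whether the script applies the rotation this way is an assumption.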
Logging: The script logs the selected decision path for debugging:
Mapped ground-marker-id 21 to face 'bottom' (markers=[21])
Using explicit ground face 'bottom' (markers=[21])
Heuristically detected ground face 'bottom' (markers=[21])
Depth Verification & Refinement
This workflow uses the ZED camera's depth map to verify and improve the ArUco-based pose estimation.
1. Verification (--verify-depth)
- Input: The computed extrinsic pose ($T_{world\_from\_cam}$) and the known 3D world coordinates of the marker corners.
- Process:
  - Projects marker corners into the camera frame using the computed pose.
  - Samples the ZED depth map at these projected 2D locations (using a 5x5 median filter for robustness).
  - Compares the measured depth (ZED) with the computed depth (distance from camera center to projected corner).
- Output:
  - RMSE (Root Mean Square Error) of the depth residuals.
  - Number of valid points (where depth was available and finite).
  - Added to JSON output under depth_verify.
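The verification steps above can be sketched as follows. This is a hedged illustration, not the script's implementation: sample_depth_median and depth_rmse are hypothetical names, K is an assumed 3x3 pinhole intrinsics matrix with no distortion, and the computed depth is taken to be the camera-frame Z coordinate, which is an assumption about how the script defines it.

```python
import numpy as np

def sample_depth_median(depth_map, u, v, half_window=2):
    """Median of finite, positive depths in a 5x5 patch (half_window=2) around pixel (u, v)."""
    h, w = depth_map.shape
    col, row = int(round(u)), int(round(v))
    if not (0 <= col < w and 0 <= row < h):
        return float("nan")
    patch = depth_map[max(row - half_window, 0):row + half_window + 1,
                      max(col - half_window, 0):col + half_window + 1]
    valid = patch[np.isfinite(patch) & (patch > 0)]
    return float(np.median(valid)) if valid.size else float("nan")

def depth_rmse(corners_world, T_world_from_cam, K, depth_map):
    """Return (rmse, n_valid) comparing predicted and measured depth, both in meters."""
    T_cam_from_world = np.linalg.inv(T_world_from_cam)
    pts = np.asarray(corners_world, dtype=float)
    pts_cam = pts @ T_cam_from_world[:3, :3].T + T_cam_from_world[:3, 3]
    residuals = []
    for x, y, z in pts_cam:
        if z <= 0:
            continue  # corner is behind the camera
        u = K[0, 0] * x / z + K[0, 2]
        v = K[1, 1] * y / z + K[1, 2]
        measured = sample_depth_median(depth_map, u, v)
        if np.isfinite(measured):
            residuals.append(measured - z)
    if not residuals:
        return float("nan"), 0
    r = np.asarray(residuals)
    return float(np.sqrt(np.mean(r ** 2))), len(residuals)

# Synthetic check: a flat wall 2 m away, seen by an identity-pose camera.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
corners = [[0.0, 0.0, 2.0], [0.1, 0.1, 2.0]]
rmse_ok, n_ok = depth_rmse(corners, np.eye(4), K, np.full((100, 100), 2.0))
rmse_mm, _ = depth_rmse(corners, np.eye(4), K, np.full((100, 100), 2000.0))
```

In the synthetic check, the second call feeds a depth map in millimeters and produces the kind of very large RMSE described in the troubleshooting section of this document.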
2. Refinement (--refine-depth)
- Trigger: Runs only if verification is enabled and enough valid depth points (>4) are found.
- Process:
  - Uses scipy.optimize.minimize (L-BFGS-B) to adjust the 6-DOF pose parameters (rotation vector + translation vector).
  - Objective Function: Minimizes the squared difference between computed depth and measured depth for all visible marker corners.
  - Constraints: Bounded optimization to prevent drifting too far from the initial ArUco pose (default: ±5 degrees, ±5cm).
- Output:
  - Refined pose replaces the original pose in the JSON output.
  - Improvement stats (delta rotation, delta translation, RMSE reduction) added under refine_depth.
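A minimal sketch of bounded L-BFGS-B refinement is shown below. It is an illustration under stated assumptions rather than the script's code: refine_pose is a hypothetical name, the pose is parameterized as cam_from_world, the measured depths are sampled once and held fixed, and the ±5 degree limit is applied per rotation-vector component, which only approximates a bound on the rotation change.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def refine_pose(rvec0, tvec0, corners_world, measured_depth,
                max_rot_deg=5.0, max_trans_m=0.05):
    """Refine a cam_from_world pose so predicted Z-depths match measured depths (meters)."""
    pts = np.asarray(corners_world, dtype=float)
    z_meas = np.asarray(measured_depth, dtype=float)
    x0 = np.concatenate([rvec0, tvec0]).astype(float)

    def objective(x):
        # x[:3] is a rotation vector, x[3:] a translation in meters.
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        z_pred = (pts @ R.T + x[3:])[:, 2]
        return float(np.sum((z_pred - z_meas) ** 2))

    # Box bounds centered on the initial ArUco pose.
    half_widths = np.array([np.deg2rad(max_rot_deg)] * 3 + [max_trans_m] * 3)
    bounds = list(zip(x0 - half_widths, x0 + half_widths))
    result = minimize(objective, x0, method="L-BFGS-B", bounds=bounds)
    return result.x[:3], result.x[3:], result

# Synthetic check: corners of a 0.6 m cube, true depth offset 2.02 m, initial guess 2.00 m.
cube = np.array([[sx * 0.3, sy * 0.3, sz * 0.3]
                 for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
rvec, tvec, result = refine_pose(np.zeros(3), np.array([0.0, 0.0, 2.0]),
                                 cube, cube[:, 2] + 2.02)
```

Depth residuals alone do not constrain lateral translation or rotation about the optical axis, so in this sketch those parameters stay near their initial values. The fields result.success, result.nit, and result.message are worth logging when diagnosing a run that reports iterations: 0.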
Fast Iteration (--max-samples)
For development or quick checks, processing thousands of frames is unnecessary.
- Use --max-samples N to stop after N valid samples (frames where markers were detected).
- Example: --max-samples 1 will process the first valid frame, run alignment/refinement, save the result, and exit.
Example Workflow
Full Run with Alignment and Refinement:
uv run calibrate_extrinsics.py \
--svo output/recording.svo \
--markers aruco/markers/box.parquet \
--aruco-dictionary DICT_APRILTAG_36h11 \
--auto-align \
--ground-marker-id 21 \
--verify-depth \
--refine-depth \
--output output/calibrated.json
Fast Debug Run:
uv run calibrate_extrinsics.py \
--svo output/ \
--markers aruco/markers/box.parquet \
--auto-align \
--max-samples 1 \
--debug \
--no-preview
Known Unexpected Behavior / Troubleshooting
Depth Refinement Failure (Unit Mismatch)
Symptoms:
- depth_verify reports extremely large RMSE values (e.g., > 1000).
- refine_depth reports success: false, iterations: 0, and near-zero improvement.
- The optimization fails to converge or produces nonsensical results.
Root Cause:
The ZED SDK retrieve_measure(sl.MEASURE.DEPTH) returns depth values in the unit defined by InitParameters.coordinate_units. The default is millimeters (sl.UNIT.MILLIMETER). However, the calibration system (extrinsics, marker geometry) operates in meters.
This scale mismatch (factor of 1000) causes the residuals in the optimization objective function to be massive, breaking the numerical stability of the L-BFGS-B solver.
Mitigation:
The SVOReader class in aruco/svo_sync.py explicitly converts the retrieved depth map to meters:
# aruco/svo_sync.py
return depth_data / 1000.0
This ensures that all geometric math downstream remains consistent in meters.
Diagnostic Check:
If you suspect a unit mismatch, check the depth_verify RMSE in the output JSON.
- Healthy: RMSE < 0.5 (meters)
- Mismatch: RMSE > 100 (likely millimeters)
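The thresholds above can be applied to an output file with a small helper like the one below. This is a sketch: classify_depth_rmse and find_depth_rmse are hypothetical names, and because the exact JSON layout is not documented here, the helper searches recursively for any depth_verify object containing an rmse field instead of assuming a fixed structure.

```python
import math

def classify_depth_rmse(rmse, healthy_below=0.5, mismatch_above=100.0):
    """Map a depth_verify RMSE (expected in meters) to a coarse diagnosis."""
    if not math.isfinite(rmse):
        return "unknown"
    if rmse < healthy_below:
        return "healthy"
    if rmse > mismatch_above:
        return "unit mismatch suspected"
    return "inconclusive"

def find_depth_rmse(node, path=""):
    """Yield (path, rmse) for every depth_verify.rmse found in nested JSON data."""
    if isinstance(node, dict):
        verify = node.get("depth_verify")
        if isinstance(verify, dict) and "rmse" in verify:
            yield path or "<root>", float(verify["rmse"])
        for key, value in node.items():
            if key != "depth_verify":
                yield from find_depth_rmse(value, f"{path}/{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_depth_rmse(value, f"{path}/{i}")

# Example with made-up data; a real file would be loaded with json.load().
example = {
    "cam_a": {"depth_verify": {"rmse": 0.03}},
    "cam_b": {"depth_verify": {"rmse": 1843.2}},
}
report = {path: classify_depth_rmse(rmse) for path, rmse in find_depth_rmse(example)}
```

Values between the two thresholds are reported as inconclusive because the document does not define how to interpret them.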
Note: Confidence filtering (--depth-confidence-threshold) is orthogonal to this issue. A unit mismatch affects all valid pixels regardless of confidence.
Findings Summary (2026-02-07)
This section summarizes the latest deep investigation across local code, outputs, and external docs.
Confirmed Facts
- Marker geometry parquet is in meters
  - aruco/markers/standard_box_markers_600mm.parquet stores values around 0.3 (meters), not 300 (millimeters).
  - docs/marker-parquet-format.md also documents meter-scale coordinates.
- Depth unit contract is still fragile
  - ZED defaults to millimeters unless InitParameters.coordinate_units is explicitly set.
  - Current reader path converts depth by dividing by 1000.0 in aruco/svo_sync.py.
  - This works only if incoming depth is truly in millimeters. It breaks if the unit configuration changes elsewhere.
- Observed runtime behavior still indicates refinement instability
  - Existing outputs (for example output/aligned_refined_extrinsics*.json) show very large depth_verify.rmse, often with refine_depth.success: false, iterations: 0, and negligible improvement.
  - This indicates that refinement quality is currently limited by more than the original mm↔m unit mismatch.
- Current refinement objective is not robust enough
  - Objective is plain squared depth residuals + simple regularization.
  - It does not currently include robust loss (Huber/Soft-L1), confidence weighting in the objective, or strong convergence diagnostics.
Likely Contributors to Poor Refinement
- Depth outliers are not sufficiently down-weighted in optimization.
- Confidence map is used for verification filtering, but not as residual weights in the optimizer objective.
- Representative frame choice uses the latest valid frame, not necessarily the best-quality frame.
- Optimizer diagnostics are limited, making it hard to distinguish "real convergence" from "stuck at initialization".
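To illustrate the first two contributors, the sketch below shows how a Huber loss with per-point weights down-weights a depth outlier compared with plain squared residuals. It is not the current objective: huber_loss and robust_depth_cost are hypothetical names, delta=0.05 m is an arbitrary choice, and the weights are assumed to lie in [0, 1] with higher meaning more trusted (the mapping from the ZED confidence map to such weights is left to the caller).

```python
import numpy as np

def huber_loss(residuals, delta=0.05):
    """Huber loss: quadratic for |r| <= delta, linear beyond (delta in meters)."""
    r = np.abs(np.asarray(residuals, dtype=float))
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))

def robust_depth_cost(z_pred, z_meas, weights=None, delta=0.05):
    """Sum of Huber losses on depth residuals, optionally weighted per point."""
    residuals = np.asarray(z_pred, dtype=float) - np.asarray(z_meas, dtype=float)
    loss = huber_loss(residuals, delta)
    if weights is None:
        return float(np.sum(loss))
    w = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)
    return float(np.sum(w * loss))

# Three good points and one 1 m outlier (for example a depth hole at an object edge).
z_pred = np.array([2.00, 2.01, 1.99, 2.00])
z_meas = np.array([2.01, 2.00, 2.00, 3.00])
squared_cost = float(np.sum(0.5 * (z_pred - z_meas) ** 2))
huber_cost = robust_depth_cost(z_pred, z_meas)
weighted_cost = robust_depth_cost(z_pred, z_meas, weights=[1.0, 1.0, 1.0, 0.1])
```

With squared residuals the single outlier dominates the cost. The Huber variant grows only linearly with the outlier, and a low weight reduces its influence further.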
Recommended Implementation Order (for next session)
- Unit hardening (P0)
  - Explicitly set init_params.coordinate_units = sl.UNIT.METER in SVO reader.
  - Remove or guard manual /1000.0 conversion to avoid double-scaling risk.
  - Add depth sanity logs (min/median/max sampled depth) under --debug.
- Robust objective (P0)
  - Replace MSE-only residual with Huber (or Soft-L1) in meters.
  - Add confidence-weighted depth residuals in objective function.
  - Split translation/rotation regularization coefficients.
- Frame quality selection (P1)
  - Replace "latest valid frame" with best-frame scoring:
    - marker count (higher better)
    - median reprojection error (lower better)
    - valid depth ratio (higher better)
- Diagnostics and acceptance gates (P1)
  - Log optimizer termination reason, gradient/step behavior, and effective valid points.
  - Treat tiny RMSE changes as "no effective refinement" even if optimizer returns.
- Benchmark matrix (P1)
  - Compare baseline vs robust loss vs robust+confidence vs robust+confidence+best-frame.
  - Report per-camera pre/post RMSE, iteration count, and success/failure reason.
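One possible shape for the best-frame scoring proposed under frame quality selection is sketched below. Everything here is hypothetical: the FrameStats fields, the equal weighting of the three terms, and the 1 px reprojection scale are illustrative choices that would need tuning against the benchmark matrix.

```python
from dataclasses import dataclass

@dataclass
class FrameStats:
    index: int
    marker_count: int              # higher is better
    median_reproj_error_px: float  # lower is better
    valid_depth_ratio: float       # fraction in [0, 1], higher is better

def frame_score(frame, max_markers, reproj_scale_px=1.0):
    """Combine the three criteria into one score where larger is better."""
    marker_term = frame.marker_count / max(max_markers, 1)
    reproj_term = 1.0 / (1.0 + frame.median_reproj_error_px / reproj_scale_px)
    return marker_term + reproj_term + frame.valid_depth_ratio

def select_best_frame(frames):
    """Pick the highest-scoring frame instead of the latest valid one."""
    max_markers = max(f.marker_count for f in frames)
    return max(frames, key=lambda f: frame_score(f, max_markers))

frames = [
    FrameStats(index=0, marker_count=2, median_reproj_error_px=3.0, valid_depth_ratio=0.4),
    FrameStats(index=1, marker_count=4, median_reproj_error_px=0.5, valid_depth_ratio=0.9),
    FrameStats(index=2, marker_count=4, median_reproj_error_px=2.0, valid_depth_ratio=0.9),
]
best = select_best_frame(frames)
```

In this example the latest frame (index 2) loses to index 1, which has the same marker count and depth coverage but a lower reprojection error.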
Practical note
The previous troubleshooting section correctly explains one important failure mode (unit mismatch), but current evidence shows that robust objective design and frame quality control are now the primary bottlenecks for meaningful depth refinement gains.