Calibrate Extrinsics Workflow
This document explains the workflow for calibrate_extrinsics.py, focusing on ground plane alignment (--auto-align) and depth-based refinement (--verify-depth, --refine-depth).
CLI Overview
The script calibrates camera extrinsics using ArUco markers detected in SVO recordings.
Key Options:
- --svo: Path to SVO file(s) or a directory containing them.
- --markers: Path to the marker configuration parquet file.
- --auto-align: Enables automatic ground plane alignment (opt-in).
- --verify-depth: Enables depth-based verification of computed poses.
- --refine-depth: Enables optimization of poses using depth data (requires --verify-depth).
- --use-confidence-weights: Uses the ZED depth confidence map to weight residuals in the optimization.
- --benchmark-matrix: Runs a comparison of baseline vs. robust refinement configurations.
- --max-samples: Limits the number of processed samples for fast iteration.
- --debug: Enables verbose debug logging (default is INFO).
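The options above, including the rule that --refine-depth requires --verify-depth, map naturally onto argparse. The sketch below is a hypothetical parser for illustration, not the script's actual definition. The flag names follow the list above. The nargs choice, the defaults, and the build_parser/parse_args helpers are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical re-creation of the documented options; the real script may differ.
    p = argparse.ArgumentParser(description="Calibrate camera extrinsics from ArUco markers in SVO recordings")
    p.add_argument("--svo", nargs="+", required=True, help="SVO file(s) or a directory containing them")
    p.add_argument("--markers", required=True, help="Marker configuration parquet file")
    p.add_argument("--auto-align", action="store_true", help="Opt-in ground plane alignment")
    p.add_argument("--verify-depth", action="store_true", help="Depth-based pose verification")
    p.add_argument("--refine-depth", action="store_true", help="Depth-based pose refinement")
    p.add_argument("--use-confidence-weights", action="store_true")
    p.add_argument("--benchmark-matrix", action="store_true")
    p.add_argument("--max-samples", type=int, default=None, help="Stop after N valid samples")
    p.add_argument("--debug", action="store_true", help="DEBUG logging instead of INFO")
    return p


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Cross-flag rule: refinement builds on the verification step.
    if args.refine_depth and not args.verify_depth:
        parser.error("--refine-depth requires --verify-depth")  # exits with status 2
    return args
```

Leaving the dependency check outside argparse keeps every flag a simple boolean. It also makes the rule explicit in one place.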
Ground Plane Alignment (--auto-align)
When --auto-align is enabled, the script attempts to align the global coordinate system such that a specific face of the marker object becomes the ground plane (XZ plane, normal pointing +Y).
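Making a face become the XZ plane amounts to finding the smallest rotation that takes that face's averaged normal onto +Y. The sketch below does this with Rodrigues' formula. It assumes the marker normals are already expressed in the world frame. It also assumes they are oriented so that the chosen face's normal should end up pointing +Y. The helper names rotation_aligning and ground_alignment are hypothetical.

```python
import numpy as np


def rotation_aligning(a, b) -> np.ndarray:
    """Smallest rotation R (3x3) with R @ a_hat == b_hat, via Rodrigues' formula."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    v = np.cross(a, b)        # rotation axis scaled by sin(theta)
    c = float(np.dot(a, b))   # cos(theta)
    if np.isclose(c, -1.0):
        # Opposite vectors: rotate 180 degrees about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(a, [0.0, 0.0, 1.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])   # cross-product matrix [v]_x
    return np.eye(3) + vx + vx @ vx / (1.0 + c)


def ground_alignment(face_normals_world) -> np.ndarray:
    """4x4 world-frame transform rotating the averaged ground-face normal onto +Y."""
    n = np.mean(np.asarray(face_normals_world, dtype=float), axis=0)
    T = np.eye(4)
    T[:3, :3] = rotation_aligning(n, [0.0, 1.0, 0.0])
    return T
```

To re-express every camera in the aligned world frame, left-multiply each pose by this transform (T @ T_world_from_cam).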
Prerequisites:
- The marker parquet file MUST contain name and ids columns defining which markers belong to which face (e.g., "top", "bottom", "front").
- If this metadata is missing, alignment is skipped with a warning.
Decision Flow: The script selects the ground face using the following precedence:
1. Explicit Face (--ground-face):
   - If you provide --ground-face="bottom", the script looks up the markers for "bottom" in the loaded map.
   - It computes the average normal of those markers and aligns it to the global up vector.
2. Marker ID Mapping (--ground-marker-id):
   - If you provide --ground-marker-id=21, the script finds which face contains marker 21 (e.g., "bottom").
   - It then proceeds as if --ground-face="bottom" had been specified.
3. Heuristic Detection (Fallback):
   - If neither option is provided, the script analyzes all visible markers.
   - It computes the normal for every defined face.
   - It selects the face whose normal is most aligned with the camera's "down" direction (assuming the camera is roughly upright).
Logging: The script logs the selected decision path for debugging:
- Mapped ground-marker-id 21 to face 'bottom' (markers=[21])
- Using explicit ground face 'bottom' (markers=[21])
- Heuristically detected ground face 'bottom' (markers=[21])
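The precedence and log lines above can be sketched as a single function. This is an illustration under several assumptions. The name select_ground_face is hypothetical. A plain dict stands in for the face-to-marker mapping loaded from the parquet file. Face normals are assumed to be precomputed unit vectors in an OpenCV-style camera frame, where +Y points down. The log messages mirror the documented ones.

```python
import logging
from typing import Dict, List, Optional, Sequence, Tuple

log = logging.getLogger("calibrate_extrinsics")


def select_ground_face(
    face_to_ids: Dict[str, List[int]],
    ground_face: Optional[str] = None,
    ground_marker_id: Optional[int] = None,
    face_normals: Optional[Dict[str, Sequence[float]]] = None,
    camera_down: Sequence[float] = (0.0, 1.0, 0.0),  # assumes +Y points down in the camera frame
) -> Optional[Tuple[str, List[int]]]:
    """Return (face, marker_ids) following the documented precedence, or None to skip alignment."""
    if not face_to_ids:
        log.warning("No face metadata (name/ids) in marker map; skipping ground alignment")
        return None

    # Marker ID mapping: resolve the owning face, then continue as if it were explicit.
    if ground_face is None and ground_marker_id is not None:
        owner = next((f for f, ids in face_to_ids.items() if ground_marker_id in ids), None)
        if owner is None:
            raise ValueError(f"marker {ground_marker_id} does not belong to any face")
        log.info("Mapped ground-marker-id %d to face '%s' (markers=%s)",
                 ground_marker_id, owner, face_to_ids[owner])
        ground_face = owner

    # Explicit face (given directly or resolved above) wins over the heuristic.
    if ground_face is not None:
        if ground_face not in face_to_ids:
            raise ValueError(f"unknown face '{ground_face}'")
        ids = face_to_ids[ground_face]
        log.info("Using explicit ground face '%s' (markers=%s)", ground_face, ids)
        return ground_face, ids

    # Heuristic fallback: face whose unit normal is most aligned with camera "down".
    if not face_normals:
        return None
    best = max(face_normals, key=lambda f: sum(a * b for a, b in zip(face_normals[f], camera_down)))
    ids = face_to_ids[best]
    log.info("Heuristically detected ground face '%s' (markers=%s)", best, ids)
    return best, ids
```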
Depth Verification & Refinement
This workflow uses the ZED camera's depth map to verify and improve the ArUco-based pose estimation.
1. Verification (--verify-depth)
- Input: The computed extrinsic pose ($T_{world\_from\_cam}$) and the known 3D world coordinates of the marker corners.
- Process:
  - Transforms the marker corners into the camera frame using the computed pose and projects them into the image.
  - Samples the ZED depth map at these projected 2D locations (using a 5x5 median filter for robustness).
  - Compares the measured depth (ZED) with the computed depth (distance from the camera center to the transformed corner).
- Output:
  - RMSE (Root Mean Square Error) of the depth residuals.
  - Number of valid points (where depth was available and finite).
  - Both are added to the JSON output under depth_verify.
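The verification step can be sketched as follows. The sketch assumes a 4x4 camera-to-world pose, an undistorted pinhole intrinsic matrix K, and a depth map in meters. It compares Z-depth, meaning distance along the optical axis. If the script instead compares Euclidean distance from the camera center, replace Z with the norm of the camera-frame point. The function name verify_depth is hypothetical.

```python
import numpy as np


def verify_depth(T_world_from_cam, K, corners_world, depth, win=5):
    """Return (rmse, n_valid) comparing predicted and measured depth at marker corners.

    Assumes a 4x4 camera-to-world pose, an undistorted pinhole K, and a Z-depth map in meters.
    """
    corners_world = np.asarray(corners_world, dtype=float)
    T_cam_from_world = np.linalg.inv(T_world_from_cam)
    # World -> camera frame for every corner.
    xyz = corners_world @ T_cam_from_world[:3, :3].T + T_cam_from_world[:3, 3]
    h, w = depth.shape
    r = win // 2
    residuals = []
    for X, Y, Z in xyz:
        if Z <= 0:
            continue  # behind the camera
        # Pinhole projection to pixel coordinates.
        u = int(round(K[0, 0] * X / Z + K[0, 2]))
        v = int(round(K[1, 1] * Y / Z + K[1, 2]))
        if not (0 <= u < w and 0 <= v < h):
            continue
        # Median over a win x win window, ignoring invalid (NaN/inf/non-positive) depth.
        patch = depth[max(v - r, 0):v + r + 1, max(u - r, 0):u + r + 1]
        patch = patch[np.isfinite(patch) & (patch > 0)]
        if patch.size == 0:
            continue
        residuals.append(float(np.median(patch)) - Z)
    if not residuals:
        return float("nan"), 0
    res = np.asarray(residuals)
    return float(np.sqrt(np.mean(res ** 2))), len(res)
```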
2. Refinement (--refine-depth)
- Trigger: Runs only if verification is enabled and enough valid depth points (more than 4) are found.
- Process:
  - Uses scipy.optimize.least_squares with a robust loss function (soft_l1) to handle outliers.
  - Objective Function: Minimizes the robust residual between computed depth and measured depth for all visible marker corners.
  - Confidence Weighting (--use-confidence-weights): If enabled, residuals are weighted by the ZED confidence map (higher confidence = higher weight).
  - Constraints: Bounded optimization to prevent drifting too far from the initial ArUco pose (default: ±5 degrees, ±5 cm).
- Output:
  - The refined pose replaces the original pose in the JSON output.
  - Improvement stats (delta rotation, delta translation, RMSE reduction) are added under refine_depth.
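Below is a minimal sketch of a bounded, robust depth refinement with scipy.optimize.least_squares. It rests on several assumptions:

- The pose correction is a rotation vector plus a translation, applied in the camera frame.
- Measured depths are sampled once at the initial projections and then held fixed.
- Per-axis bounds of ±5° and ±5 cm approximate the documented limits.
- f_scale=0.01 (a 1 cm soft_l1 margin) is an illustrative choice.

How the ZED confidence map is converted to weights is not specified here, so weights are taken as given (higher = more trusted). Z-depth residuals alone cannot constrain translation parallel to the image plane or rotation about the optical axis. This is one reason to keep the bounds tight around the ArUco pose. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def refine_pose_with_depth(T0, corners_world, measured_depth, weights=None,
                           max_rot_deg=5.0, max_trans_m=0.05, loss="soft_l1"):
    """Refine a 4x4 T_world_from_cam so predicted corner Z-depths match measured depths.

    Returns (refined 4x4 pose, scipy OptimizeResult).
    """
    corners_world = np.asarray(corners_world, dtype=float)
    measured_depth = np.asarray(measured_depth, dtype=float)
    w = np.ones_like(measured_depth) if weights is None else np.asarray(weights, dtype=float)
    sqrt_w = np.sqrt(w)  # least_squares squares residuals, so weight by sqrt(w)

    def pose(x):
        # x = [rotation vector (rad), translation (m)], a correction applied in the camera frame.
        delta = np.eye(4)
        delta[:3, :3] = Rotation.from_rotvec(x[:3]).as_matrix()
        delta[:3, 3] = x[3:]
        return T0 @ delta

    def residuals(x):
        T_cam_from_world = np.linalg.inv(pose(x))
        z = corners_world @ T_cam_from_world[2, :3] + T_cam_from_world[2, 3]
        return sqrt_w * (z - measured_depth)

    r = np.deg2rad(max_rot_deg)
    ub = np.array([r, r, r, max_trans_m, max_trans_m, max_trans_m])
    # f_scale sets where soft_l1 switches from quadratic to linear (0.01 m is an assumption).
    result = least_squares(residuals, np.zeros(6), bounds=(-ub, ub), loss=loss, f_scale=0.01)
    return pose(result.x), result
```

Passing loss="linear" gives the plain least-squares variant that the benchmark matrix calls its baseline.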
3. Best Frame Selection
When multiple frames are available, the system scores them to pick the best candidate for verification/refinement:
- Criteria:
- Number of detected markers (primary factor).
- Reprojection error (lower is better).
- Valid depth ratio (percentage of marker corners with valid depth data).
- Depth confidence (if available).
- Benefit: Ensures refinement uses high-quality data rather than just the last valid frame.
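One way to encode these criteria is to make marker count the primary key and fold the remaining factors into a tie-breaking score. FrameStats, frame_score, best_frame, and the weights (0.5, 0.3, 0.2) are all hypothetical. The sketch assumes reprojection error is in pixels and that the ratio and confidence lie in [0, 1].

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FrameStats:
    frame_index: int
    n_markers: int
    reproj_error_px: float
    valid_depth_ratio: float                 # fraction of corners with valid depth, 0..1
    mean_confidence: Optional[float] = None  # 0..1 if available, higher = better


def frame_score(s: FrameStats) -> tuple:
    # Marker count dominates; the remaining criteria break ties via a blended score.
    # The weights are illustrative assumptions, not the script's values.
    quality = 0.5 * s.valid_depth_ratio - 0.3 * s.reproj_error_px
    if s.mean_confidence is not None:
        quality += 0.2 * s.mean_confidence
    return (s.n_markers, quality)


def best_frame(frames: Sequence[FrameStats]) -> FrameStats:
    return max(frames, key=frame_score)
```

Returning a tuple lets Python's tuple ordering apply the "primary factor" rule directly. The blended score only matters between frames with equal marker counts.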
Benchmark Matrix (--benchmark-matrix)
This mode runs a comparative analysis of different refinement configurations on the same data to evaluate improvements. It compares:
- Baseline: Linear loss (MSE), no confidence weighting.
- Robust: Soft-L1 loss, no confidence weighting.
- Robust + Confidence: Soft-L1 loss with confidence-weighted residuals.
- Robust + Confidence + Best Frame: All of the above, using the highest-scored frame.
Output:
- Prints a summary table for each camera showing RMSE improvement and iteration counts.
- Adds a benchmark object to the JSON output containing detailed stats for each configuration.
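The matrix can be viewed as four option sets fed to the same refinement routine. In this sketch, the option keys and run_benchmark are hypothetical. "linear" and "soft_l1" are real scipy.optimize.least_squares loss names. The refine callable is assumed to report RMSE before and after refinement, plus the function-evaluation count (nfev).

```python
from typing import Callable, Dict, List

# Hypothetical encoding of the four documented configurations. "linear" and "soft_l1"
# are scipy.optimize.least_squares loss names; the other keys are illustrative.
BENCHMARK_CONFIGS: Dict[str, dict] = {
    "baseline": {"loss": "linear", "use_confidence": False, "best_frame": False},
    "robust": {"loss": "soft_l1", "use_confidence": False, "best_frame": False},
    "robust+confidence": {"loss": "soft_l1", "use_confidence": True, "best_frame": False},
    "robust+confidence+best_frame": {"loss": "soft_l1", "use_confidence": True, "best_frame": True},
}


def run_benchmark(refine: Callable[..., dict]) -> List[dict]:
    """Call refine(**options) per configuration and print a summary table.

    refine is assumed to return {"rmse_before": float, "rmse_after": float, "nfev": int}.
    """
    rows = []
    for name, options in BENCHMARK_CONFIGS.items():
        stats = refine(**options)
        rows.append({
            "config": name,
            "rmse_before": stats["rmse_before"],
            "rmse_after": stats["rmse_after"],
            "improvement": stats["rmse_before"] - stats["rmse_after"],
            "nfev": stats["nfev"],
        })
    print(f"{'config':<30} {'rmse before':>12} {'rmse after':>11} {'nfev':>5}")
    for row in rows:
        print(f"{row['config']:<30} {row['rmse_before']:>12.4f} {row['rmse_after']:>11.4f} {row['nfev']:>5}")
    return rows
```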
Fast Iteration (--max-samples)
For development or quick checks, processing thousands of frames is unnecessary.
- Use --max-samples N to stop after N valid samples (frames where markers were detected).
- Example: --max-samples 1 processes the first valid frame, runs alignment/refinement, saves the result, and exits.
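The early stop can be expressed lazily, so frames after the N-th valid one are never examined. In the sketch below, valid_samples and has_markers are hypothetical names. In the real script, the frames would come from the SVO reader.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def valid_samples(frames: Iterable[T], has_markers: Callable[[T], bool],
                  max_samples: Optional[int] = None) -> Iterator[T]:
    """Yield frames with detected markers, stopping after max_samples of them (None = no limit)."""
    valid = (f for f in frames if has_markers(f))
    # islice pulls lazily, so frames after the N-th valid one are never evaluated.
    return islice(valid, max_samples)
```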
Example Workflow
Full Run with Alignment and Robust Refinement:
```bash
uv run calibrate_extrinsics.py \
  --svo output/recording.svo \
  --markers aruco/markers/box.parquet \
  --aruco-dictionary DICT_APRILTAG_36h11 \
  --auto-align \
  --ground-marker-id 21 \
  --verify-depth \
  --refine-depth \
  --use-confidence-weights \
  --output output/calibrated.json
```
Benchmark Run:
```bash
uv run calibrate_extrinsics.py \
  --svo output/recording.svo \
  --markers aruco/markers/box.parquet \
  --benchmark-matrix \
  --max-samples 100
```
Fast Debug Run:
```bash
uv run calibrate_extrinsics.py \
  --svo output/ \
  --markers aruco/markers/box.parquet \
  --auto-align \
  --max-samples 1 \
  --debug \
  --no-preview
```
Depth Data Management
To enable decoupled refinement workflows, the system supports saving the depth data used during calibration.
Saving Depth Data
Use the --save-depth <path.h5> flag with calibrate_extrinsics.py.
```bash
uv run calibrate_extrinsics.py ... --save-depth output/calibration_depth.h5
```
HDF5 Format Structure:
- meta/: Global metadata (schema version, units=meters).
- cameras/{serial}/:
  - intrinsics: Camera matrix (3x3).
  - pooled_depth: The aggregated depth map used for verification (gzip compressed).
  - resolution: [width, height].
This allows refine_ground_plane.py to run repeatedly with different parameters without re-processing the raw SVO files.
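Below is a sketch of writing and reading this layout with h5py. It assumes the metadata is stored as attributes on meta/ and that schema_version=1. save_depth and load_depth are hypothetical helpers. The actual file may use different dtypes or attribute names.

```python
import h5py
import numpy as np


def save_depth(path, cameras):
    """cameras: {serial: {"intrinsics": 3x3, "pooled_depth": HxW in meters, "resolution": (w, h)}}"""
    with h5py.File(path, "w") as f:
        meta = f.create_group("meta")
        meta.attrs["schema_version"] = 1   # assumed value
        meta.attrs["units"] = "meters"
        cams = f.create_group("cameras")
        for serial, cam in cameras.items():
            g = cams.create_group(str(serial))  # HDF5 group names must be strings
            g.create_dataset("intrinsics", data=np.asarray(cam["intrinsics"], dtype=np.float64))
            g.create_dataset("pooled_depth", data=np.asarray(cam["pooled_depth"], dtype=np.float32),
                             compression="gzip")
            g.create_dataset("resolution", data=np.asarray(cam["resolution"], dtype=np.int64))


def load_depth(path):
    """Read the layout back as {serial: {...}}, refusing files not stored in meters."""
    with h5py.File(path, "r") as f:
        units = f["meta"].attrs["units"]
        if isinstance(units, bytes):
            units = units.decode()
        if units != "meters":
            raise ValueError(f"expected meters, got {units!r}")
        return {
            serial: {
                "intrinsics": g["intrinsics"][()],
                "pooled_depth": g["pooled_depth"][()],
                "resolution": tuple(int(v) for v in g["resolution"][()]),
            }
            for serial, g in f["cameras"].items()
        }
```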
Ground Plane Refinement (refine_ground_plane.py)
This standalone tool refines camera extrinsics by ensuring all cameras agree on the ground plane location. It addresses common issues where ArUco markers are slightly tilted or not perfectly coplanar with the floor.
Workflow
```bash
uv run refine_ground_plane.py \
  --input-extrinsics output/calibrated.json \
  --input-depth output/calibration_depth.h5 \
  --output-extrinsics output/refined.json \
  --plot --plot-output output/ground_debug.html
```
Algorithm Details
The algorithm proceeds in four stages:
1. Plane Detection (Per Camera)
- Unprojects the depth map to a point cloud in the world frame (using the current extrinsics).
- Uses RANSAC (via Open3D) to segment the dominant plane.
- Quality Gates:
  - Minimum inliers (default: 500 points).
  - Normal orientation check (must be roughly vertical, normal_vertical_thresh=0.9).
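The detection stage can be sketched in NumPy. In the actual tool, Open3D's PointCloud.segment_plane performs the RANSAC step. The hand-rolled ransac_plane below is a stand-in so the example has no Open3D dependency. Defaults mirror the documented ones: stride 8, a 2 cm threshold, 500 inliers, and normal_vertical_thresh=0.9. The function names and the iteration count are assumptions, and the depth map is assumed to hold Z-depth in meters.

```python
import numpy as np


def depth_to_world_points(depth, K, T_world_from_cam, stride=8):
    """Unproject every stride-th pixel of a Z-depth map (meters) into world coordinates."""
    h, w = depth.shape
    v, u = np.mgrid[0:h:stride, 0:w:stride]
    z = depth[v, u]
    ok = np.isfinite(z) & (z > 0)
    u, v, z = u[ok], v[ok], z[ok]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=1)
    return pts_cam @ T_world_from_cam[:3, :3].T + T_world_from_cam[:3, 3]


def ransac_plane(points, dist_thresh=0.02, iters=200, seed=0):
    """Fit n.p + d = 0 by RANSAC; returns (n, d, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_mask, best_model = None, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue  # degenerate (collinear) sample
        n /= norm
        d = -(n @ p0)
        mask = np.abs(points @ n + d) < dist_thresh
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (n, d)
    if best_model is None:
        raise ValueError("no non-degenerate sample found")
    return best_model[0], best_model[1], best_mask


def detect_floor(points, min_inliers=500, normal_vertical_thresh=0.9, dist_thresh=0.02):
    """Apply the documented quality gates; returns (n, d) or None."""
    if len(points) < 3:
        return None
    n, d, mask = ransac_plane(points, dist_thresh)
    if mask.sum() < min_inliers or abs(n[1]) < normal_vertical_thresh:
        return None
    return n, d
```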
2. Robust Consensus
- Computes a "consensus plane" from all valid camera detections.
- Method:
  - Aligns all normals to the upper hemisphere.
  - Computes the geometric median of normals and distances.
  - Filters outliers based on deviation from the median (>15° angle or >0.5 m distance).
  - Computes a weighted average of the remaining inlier planes.
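The consensus step can be sketched as follows, assuming each camera contributes a plane n·p + d = 0 with a unit normal n in the world frame. The geometric median of the normals uses Weiszfeld iterations. For the scalar distances, the geometric median reduces to the ordinary median. consensus_plane and the optional per-camera weights (for example, inlier counts) are hypothetical.

```python
import numpy as np


def consensus_plane(normals, distances, weights=None, max_angle_deg=15.0, max_dist_m=0.5):
    """Robust consensus of per-camera planes n.p + d = 0; returns (n, d, inlier_mask) or None."""
    n = np.array(normals, dtype=float)    # copies, so the caller's arrays stay untouched
    d = np.array(distances, dtype=float)
    w = np.ones(len(d)) if weights is None else np.asarray(weights, dtype=float)

    # Upper hemisphere: flip (n, d) together so each plane itself is unchanged.
    flip = n[:, 1] < 0
    n[flip] *= -1
    d[flip] *= -1

    # Geometric median of the normals (Weiszfeld iterations); median of the distances.
    med_n = n.mean(axis=0)
    for _ in range(50):
        dist = np.maximum(np.linalg.norm(n - med_n, axis=1), 1e-9)
        med_n = (n / dist[:, None]).sum(axis=0) / (1.0 / dist).sum()
    med_n /= np.linalg.norm(med_n)
    med_d = np.median(d)

    # Reject planes that deviate too far from the median.
    angle = np.degrees(np.arccos(np.clip(n @ med_n, -1.0, 1.0)))
    inlier = (angle <= max_angle_deg) & (np.abs(d - med_d) <= max_dist_m)
    if not inlier.any():
        return None

    # Weighted average of the surviving planes.
    wi = w[inlier]
    avg_n = (n[inlier] * wi[:, None]).sum(axis=0)
    avg_n /= np.linalg.norm(avg_n)
    avg_d = float((d[inlier] * wi).sum() / wi.sum())
    return avg_n, avg_d, inlier
```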
3. Correction Calculation
- Computes a rigid transform $T_{corr}$ for each camera.
- Constraints:
  - Rotation: Only corrects pitch and roll (aligns the normal to vertical). Yaw is preserved.
  - Translation: Only corrects vertical height (aligns the plane distance). X/Z position is preserved.
- Consensus-Relative Correction: By default, cameras are aligned to the consensus plane rather than to an absolute Y=0. This keeps the cameras consistent with each other even if the absolute floor height is slightly off.
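One way to build $T_{corr}$ under these constraints takes two steps. First, rotate about the camera center by the minimal rotation that takes the observed floor normal onto +Y. Its axis lies in the horizontal plane, so it adds no rotation about the vertical axis. Second, shift only along Y so the floor lands at target_y. Pivoting at the camera center is an assumption made here so that the camera's X/Z stay fixed. ground_correction is a hypothetical name.

```python
import numpy as np


def _align_to_y(n):
    """Minimal rotation taking unit normal n onto +Y; the axis n x Y is horizontal."""
    y = np.array([0.0, 1.0, 0.0])
    v = np.cross(n, y)
    c = float(n @ y)
    vx = np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)  # valid for c > -1 (upper-hemisphere normal)


def ground_correction(T_world_from_cam, n, d, target_y):
    """4x4 world-frame T_corr moving the camera's observed floor n.p + d = 0 to y = target_y."""
    n = np.asarray(n, dtype=float) / np.linalg.norm(n)
    R = _align_to_y(n)
    c = T_world_from_cam[:3, 3]  # pivot at the camera center
    # After rotating about c, the floor sits at y = c_y - d - n.c; shift it to target_y.
    t_y = target_y - (c[1] - d - n @ c)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = c - R @ c + np.array([0.0, t_y, 0.0])
    return T
```

The corrected pose is T_corr @ T_world_from_cam. In consensus-relative mode, target_y would be the consensus floor height (-d of a consensus plane whose normal is close to +Y). With --target-y, it would be the requested absolute height.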
4. Safety Guardrails
- The correction is rejected if:
  - Rotation > max_rotation_deg (default: 5°).
  - Translation > max_translation_m (default: 0.1 m).
  - Deviation from consensus > max_consensus_deviation (default: 10°, 0.5 m).
- Why no ICP? For flat floors, plane-to-plane alignment is more robust than ICP, because ICP on featureless planes can drift (slide) along the surface.
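The guardrails reduce to threshold checks on the correction. In this sketch:

- The rotation magnitude comes from trace(R) = 1 + 2 cos θ.
- Translation is measured as the camera center's displacement.
- The consensus deviation (angle and offset) is assumed to be computed by the caller.

accept_correction and its two consensus parameters are hypothetical stand-ins for max_consensus_deviation.

```python
import numpy as np


def rotation_angle_deg(R: np.ndarray) -> float:
    """Rotation magnitude of a 3x3 rotation matrix via trace(R) = 1 + 2 cos(theta)."""
    cos_theta = (np.trace(R) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def accept_correction(T_corr, cam_center, consensus_angle_deg=0.0, consensus_offset_m=0.0,
                      max_rotation_deg=5.0, max_translation_m=0.1,
                      max_consensus_angle_deg=10.0, max_consensus_offset_m=0.5):
    """Return (accepted, reason) for a 4x4 world-frame correction T_corr."""
    R, t = T_corr[:3, :3], T_corr[:3, 3]
    c = np.asarray(cam_center, dtype=float)
    rot = rotation_angle_deg(R)
    shift = float(np.linalg.norm(R @ c + t - c))  # how far the camera center moves
    if rot > max_rotation_deg:
        return False, f"rotation {rot:.2f} deg exceeds {max_rotation_deg} deg"
    if shift > max_translation_m:
        return False, f"translation {shift:.3f} m exceeds {max_translation_m} m"
    if consensus_angle_deg > max_consensus_angle_deg or consensus_offset_m > max_consensus_offset_m:
        return False, "plane deviates too far from consensus"
    return True, "ok"
```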
Tuning Guidance
Based on end-to-end observations:
- --stride: Default is 8. Decrease to 4 or 2 for higher density if the floor is far away or sparse.
- --ransac-dist-thresh: Default is 0.02 m (2 cm). Increase to 0.03-0.05 m if the floor is uneven or depth noise is high.
- --max-rotation-deg: Keep this tight (3-5°). If the floor correction needs more than 5°, the initial ArUco calibration is likely poor and should be re-run.
- --target-y: Use this if you need the floor at a specific absolute height (e.g., -1.5 m) rather than just consistent across cameras.
Known Unexpected Behavior / Troubleshooting
Resolved: Depth Refinement Failure (Unit Mismatch)
Note: This issue has been resolved in the latest version by enforcing explicit meter units in the SVO reader and removing ambiguous manual conversions.
Previous Symptoms:
- depth_verify reports extremely large RMSE values (e.g., > 1000).
- refine_depth reports success: false, iterations: 0, and near-zero improvement.
Resolution:
The system now explicitly sets InitParameters.coordinate_units = sl.UNIT.METER when opening SVO files, ensuring consistent units across the pipeline.
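A cheap guard against a regression of this issue is to sanity-check a depth map before verification. The idea is that a median valid depth in the hundreds or thousands almost certainly means millimeters rather than meters. The 40 m cutoff and the function name depth_looks_metric are assumptions.

```python
from typing import Optional

import numpy as np


def depth_looks_metric(depth: np.ndarray, max_plausible_m: float = 40.0) -> Optional[bool]:
    """Heuristic: a median valid depth above max_plausible_m suggests millimeters, not meters.

    Returns None when the map has no valid pixels to judge.
    """
    valid = depth[np.isfinite(depth) & (depth > 0)]
    if valid.size == 0:
        return None
    return bool(np.median(valid) <= max_plausible_m)
```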
Optimization Stalls
If refine_depth shows success: false but nfev (evaluations) is high, the optimizer may have hit a flat region or local minimum.
- Check: Look at termination_message in the JSON output.
- Fix: Try enabling --use-confidence-weights, or check whether the initial ArUco pose is too far off (reprojection error > 2.0).
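When diagnosing a stall, the relevant fields come straight from the OptimizeResult that scipy.optimize.least_squares returns: success, status, nfev, cost, and message. The summary keys below, including termination_message, are chosen to resemble the JSON output but are assumptions. summarize_result and stall_hints are hypothetical helpers. The 2.0 reprojection-error threshold is taken from the guidance above.

```python
import numpy as np
from scipy.optimize import least_squares


def summarize_result(res) -> dict:
    """Fields of a least_squares OptimizeResult that help diagnose a stalled refinement."""
    return {
        "success": bool(res.success),  # True only if a convergence criterion was met (status > 0)
        "status": int(res.status),     # 0 means max_nfev was exhausted
        "nfev": int(res.nfev),
        "cost": float(res.cost),       # value of the (possibly robust) cost at the final point
        "termination_message": str(res.message),
    }


def stall_hints(summary: dict, reproj_error: float, max_reproj: float = 2.0) -> list:
    hints = []
    if not summary["success"]:
        hints.append(f"optimizer stopped early: {summary['termination_message']}")
        hints.append("try --use-confidence-weights")
    if reproj_error > max_reproj:
        hints.append("initial ArUco pose may be too far off; check marker detection/calibration")
    return hints
```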