# Calibrate Extrinsics Workflow

This document explains the workflow for `calibrate_extrinsics.py`, focusing on ground plane alignment (`--auto-align`) and depth-based refinement (`--verify-depth`, `--refine-depth`).

## CLI Overview

The script calibrates camera extrinsics using ArUco markers detected in SVO recordings.

**Key Options:**

- `--svo`: Path to SVO file(s) or directory containing them.
- `--markers`: Path to the marker configuration parquet file.
- `--auto-align`: Enables automatic ground plane alignment (opt-in).
- `--verify-depth`: Enables depth-based verification of computed poses.
- `--refine-depth`: Enables optimization of poses using depth data (requires `--verify-depth`).
- `--max-samples`: Limits the number of processed samples for fast iteration.
- `--debug`: Enables verbose debug logging (default is INFO).

## Ground Plane Alignment (`--auto-align`)

When `--auto-align` is enabled, the script attempts to align the global coordinate system such that a specific face of the marker object becomes the ground plane (XZ plane, normal pointing +Y).

**Prerequisites:**

- The marker parquet file MUST contain `name` and `ids` columns defining which markers belong to which face (e.g., "top", "bottom", "front").
- If this metadata is missing, alignment is skipped with a warning.

**Decision Flow:**

The script selects the ground face using the following precedence:

1. **Explicit Face (`--ground-face`)**:
   - If you provide `--ground-face="bottom"`, the script looks up the markers for "bottom" in the loaded map.
   - It computes the average normal of those markers and aligns it to the global up vector.
2. **Marker ID Mapping (`--ground-marker-id`)**:
   - If you provide `--ground-marker-id=21`, the script finds which face contains marker 21 (e.g., "bottom").
   - It then proceeds as if `--ground-face="bottom"` was specified.
3. **Heuristic Detection (Fallback)**:
   - If neither option is provided, the script analyzes all visible markers.
   - It computes the normal for every defined face.
   - It selects the face whose normal is most aligned with the camera's "down" direction (assuming the camera is roughly upright).

**Logging:**

The script logs the selected decision path for debugging:

- `Mapped ground-marker-id 21 to face 'bottom' (markers=[21])`
- `Using explicit ground face 'bottom' (markers=[21])`
- `Heuristically detected ground face 'bottom' (markers=[21])`

## Depth Verification & Refinement

This workflow uses the ZED camera's depth map to verify and improve the ArUco-based pose estimation.

### 1. Verification (`--verify-depth`)

- **Input**: The computed extrinsic pose ($T_{world\_from\_cam}$) and the known 3D world coordinates of the marker corners.
- **Process**:
  1. Projects marker corners into the camera frame using the computed pose.
  2. Samples the ZED depth map at these projected 2D locations (using a 5x5 median filter for robustness).
  3. Compares the *measured* depth (ZED) with the *computed* depth (distance from camera center to projected corner).
- **Output**:
  - RMSE (Root Mean Square Error) of the depth residuals.
  - Number of valid points (where depth was available and finite).
  - Added to JSON output under `depth_verify`.

### 2. Refinement (`--refine-depth`)

- **Trigger**: Runs only if verification is enabled and enough valid depth points (>4) are found.
- **Process**:
  - Uses `scipy.optimize.minimize` (L-BFGS-B) to adjust the 6-DOF pose parameters (rotation vector + translation vector).
  - **Objective Function**: Minimizes the squared difference between computed depth and measured depth for all visible marker corners.
  - **Constraints**: Bounded optimization to prevent drifting too far from the initial ArUco pose (default: ±5 degrees, ±5 cm).
- **Output**:
  - Refined pose replaces the original pose in the JSON output.
  - Improvement stats (delta rotation, delta translation, RMSE reduction) added under `refine_depth`.
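
The verification steps above (transform corners, sample depth with a 5x5 median, compare) can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `sample_depth_median` and `depth_rmse`, the pinhole intrinsics matrix `K`, and the choice of the camera-frame z coordinate as the "computed" depth are all assumptions made for the example.

```python
import numpy as np


def sample_depth_median(depth_map, u, v, window=5):
    """Median of finite, positive depths in a window x window patch around (u, v)."""
    h, w = depth_map.shape
    r = window // 2
    ui, vi = int(round(u)), int(round(v))
    if not (0 <= ui < w and 0 <= vi < h):
        return float("nan")  # projected outside the image
    patch = depth_map[max(vi - r, 0):vi + r + 1, max(ui - r, 0):ui + r + 1]
    valid = patch[np.isfinite(patch) & (patch > 0)]
    return float(np.median(valid)) if valid.size else float("nan")


def depth_rmse(T_world_from_cam, corners_world, K, depth_map):
    """Return (rmse, n_valid) between measured and computed corner depths.

    All lengths are assumed to be in meters (hypothetical helper).
    """
    # The pose maps camera -> world, so invert it to bring world points
    # into the camera frame.
    T_cam_from_world = np.linalg.inv(T_world_from_cam)
    pts_h = np.hstack([corners_world, np.ones((len(corners_world), 1))])
    pts_cam = (T_cam_from_world @ pts_h.T).T[:, :3]

    residuals = []
    for x, y, z in pts_cam:
        if z <= 0:
            continue  # behind the camera
        # Pinhole projection to pixel coordinates.
        u = K[0, 0] * x / z + K[0, 2]
        v = K[1, 1] * y / z + K[1, 2]
        measured = sample_depth_median(depth_map, u, v)
        if np.isfinite(measured):
            # Assumption: computed depth is the camera-frame z coordinate.
            residuals.append(measured - z)

    if not residuals:
        return float("nan"), 0
    res = np.asarray(residuals)
    return float(np.sqrt(np.mean(res ** 2))), len(residuals)
```

With an identity pose, four corners 2 m in front of the camera, and a depth map filled with `2.0`, this sketch returns an RMSE of zero over four valid points. Feeding the same depth map in millimeters (`2000.0`) yields an RMSE of about 1998, which is the kind of value the troubleshooting section below associates with a unit mismatch.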
## Fast Iteration (`--max-samples`)

For development or quick checks, processing thousands of frames is unnecessary.

- Use `--max-samples N` to stop after `N` valid samples (frames where markers were detected).
- Example: `--max-samples 1` will process the first valid frame, run alignment/refinement, save the result, and exit.

## Example Workflow

**Full Run with Alignment and Refinement:**

```bash
uv run calibrate_extrinsics.py \
  --svo output/recording.svo \
  --markers aruco/markers/box.parquet \
  --aruco-dictionary DICT_APRILTAG_36h11 \
  --auto-align \
  --ground-marker-id 21 \
  --verify-depth \
  --refine-depth \
  --output output/calibrated.json
```

**Fast Debug Run:**

```bash
uv run calibrate_extrinsics.py \
  --svo output/ \
  --markers aruco/markers/box.parquet \
  --auto-align \
  --max-samples 1 \
  --debug \
  --no-preview
```

## Known Unexpected Behavior / Troubleshooting

### Depth Refinement Failure (Unit Mismatch)

**Symptoms:**

- `depth_verify` reports extremely large RMSE values (e.g., > 1000).
- `refine_depth` reports `success: false`, `iterations: 0`, and near-zero improvement.
- The optimization fails to converge or produces nonsensical results.

**Root Cause:**

The ZED SDK `retrieve_measure(sl.MEASURE.DEPTH)` returns depth values in the unit defined by `InitParameters.coordinate_units`. The default is **MILLIMETERS**. However, the calibration system (extrinsics, marker geometry) operates in **METERS**.

This scale mismatch (factor of 1000) causes the residuals in the optimization objective function to be massive, breaking the numerical stability of the L-BFGS-B solver.

**Mitigation:**

The `SVOReader` class in `aruco/svo_sync.py` explicitly converts the retrieved depth map to meters:

```python
# aruco/svo_sync.py
return depth_data / 1000.0
```

This ensures that all geometric math downstream remains consistent in meters.

**Diagnostic Check:**

If you suspect a unit mismatch, check the `depth_verify` RMSE in the output JSON.
- **Healthy:** RMSE < 0.5 (meters)
- **Mismatch:** RMSE > 100 (likely millimeters)

*Note: Confidence filtering (`--depth-confidence-threshold`) is orthogonal to this issue. A unit mismatch affects all valid pixels regardless of confidence.*

## Findings Summary (2026-02-07 exhaustive search)

This section summarizes the latest deep investigation across local code, outputs, and external docs.

### Confirmed Facts

1. **Marker geometry parquet is in meters**
   - `aruco/markers/standard_box_markers_600mm.parquet` stores values around `0.3` (meters), not `300` (millimeters).
   - `docs/marker-parquet-format.md` also documents meter-scale coordinates.
2. **Depth unit contract is still fragile**
   - ZED defaults to millimeters unless `InitParameters.coordinate_units` is explicitly set.
   - The current reader path converts depth by dividing by `1000.0` in `aruco/svo_sync.py`.
   - This works only if the incoming depth is truly in millimeters. If the unit configuration changes elsewhere, the division scales the depth incorrectly.
3. **Observed runtime behavior still indicates refinement instability**
   - Existing outputs (for example `output/aligned_refined_extrinsics*.json`) show very large `depth_verify.rmse`, often with `refine_depth.success: false`, `iterations: 0`, and negligible improvement.
   - This indicates that refinement quality is currently limited by factors beyond the original mm↔m mismatch.
4. **Current refinement objective is not robust enough**
   - The objective is plain squared depth residuals plus simple regularization.
   - It does **not** currently include a robust loss (Huber/Soft-L1), confidence weighting in the objective, or strong convergence diagnostics.

### Likely Contributors to Poor Refinement

- Depth outliers are not sufficiently down-weighted in optimization.
- The confidence map is used for verification filtering, but not as residual weights in the optimizer objective.
- Representative frame choice uses the latest valid frame, not necessarily the best-quality frame.
- Optimizer diagnostics are limited, making it hard to distinguish "real convergence" from "stuck at initialization".

### Recommended Implementation Order (for next session)

1. **Unit hardening (P0)**
   - Explicitly set `init_params.coordinate_units = sl.UNIT.METER` in the SVO reader.
   - Remove or guard the manual `/1000.0` conversion to avoid double-scaling risk.
   - Add depth sanity logs (min/median/max sampled depth) under `--debug`.
2. **Robust objective (P0)**
   - Replace the MSE-only residual with Huber (or Soft-L1) in meters.
   - Add confidence-weighted depth residuals to the objective function.
   - Split translation/rotation regularization coefficients.
3. **Frame quality selection (P1)**
   - Replace "latest valid frame" with best-frame scoring:
     - marker count (higher is better)
     - median reprojection error (lower is better)
     - valid depth ratio (higher is better)
4. **Diagnostics and acceptance gates (P1)**
   - Log optimizer termination reason, gradient/step behavior, and effective valid points.
   - Treat tiny RMSE changes as "no effective refinement" even if the optimizer reports success.
5. **Benchmark matrix (P1)**
   - Compare baseline vs robust loss vs robust+confidence vs robust+confidence+best-frame.
   - Report per-camera pre/post RMSE, iteration count, and success/failure reason.

### Practical note

The previous troubleshooting section correctly explains one important failure mode (unit mismatch), but current evidence shows that **robust objective design and frame quality control** are now the primary bottlenecks for meaningful depth refinement gains.
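
To make the "Robust objective (P0)" recommendation concrete, here is one possible shape for a Huber-based, weighted depth cost. It is a sketch only and has not been evaluated against the recordings discussed here: the names `huber` and `robust_depth_cost`, the default `delta_m=0.05`, and the convention that `weights` lie in `[0, 1]` with higher meaning more trusted are all hypothetical. How ZED confidence values map onto such weights is left to the caller.

```python
import numpy as np


def huber(residuals, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.asarray(residuals, dtype=float)
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))


def robust_depth_cost(computed_m, measured_m, weights, delta_m=0.05):
    """Weighted mean Huber cost over depth residuals, all in meters.

    `weights` are per-point trust values (hypothetical convention:
    higher means more trusted); points with weight 0 are ignored.
    """
    r = np.asarray(measured_m, dtype=float) - np.asarray(computed_m, dtype=float)
    w = np.asarray(weights, dtype=float)
    total = w.sum()
    if total <= 0:
        return float("nan")  # no usable points
    return float(np.sum(w * huber(r, delta_m)) / total)
```

Compared with plain squared residuals, a 1 m outlier contributes about 0.049 here instead of 0.5, so a single bad depth sample has far less pull on the pose. A cost of this form could be passed to the existing optimizer in place of the squared-error term, with the regularization terms kept separate.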