
Calibrate Extrinsics Workflow

This document explains the workflow for calibrate_extrinsics.py, focusing on ground plane alignment (--auto-align) and depth-based refinement (--verify-depth, --refine-depth).

CLI Overview

The script calibrates camera extrinsics using ArUco markers detected in SVO recordings.

Key Options:

  • --svo: Path to SVO file(s) or directory containing them.
  • --markers: Path to the marker configuration parquet file.
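  • --auto-align: Enables automatic ground plane alignment (opt-in).
  • --ground-face: Names the marker face to use as the ground plane (used with --auto-align).
  • --ground-marker-id: Selects the ground face as the one containing the given marker ID (used with --auto-align).
  • --verify-depth: Enables depth-based verification of computed poses.
  • --refine-depth: Enables optimization of poses using depth data (requires --verify-depth).
  • --max-samples: Limits the number of processed samples for fast iteration.
  • --debug: Enables verbose debug logging (the default log level is INFO).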

Ground Plane Alignment (--auto-align)

When --auto-align is enabled, the script attempts to align the global coordinate system so that a chosen face of the marker object defines the ground plane (the XZ plane, with its normal pointing along +Y).
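
For reference, the alignment step can be sketched as follows. This assumes the face normal is the normalized mean of per-marker normals (each taken from the cross product of the marker's diagonals) and that the alignment is the minimal rotation taking that normal onto +Y. The helper names (face_normal, rotation_aligning, align_world) are hypothetical and not taken from calibrate_extrinsics.py.

```python
import numpy as np


def face_normal(marker_corners):
    """Average unit normal of one face, from its markers' world-space corners.

    marker_corners: array of shape (M, 4, 3), four corners per marker.
    Assumes corners wind counter-clockwise when viewed from the side the
    normal should point to; flip the sign if your parquet uses the opposite order.
    """
    normals = []
    for c in np.asarray(marker_corners, dtype=float):
        n = np.cross(c[2] - c[0], c[3] - c[1])  # cross product of the two diagonals
        normals.append(n / np.linalg.norm(n))
    mean = np.mean(normals, axis=0)
    return mean / np.linalg.norm(mean)


def rotation_aligning(a, b):
    """Minimal rotation R (3x3) with R @ a_hat == b_hat (Rodrigues form)."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, 1.0):
        return np.eye(3)  # already aligned
    if np.isclose(c, -1.0):
        # Opposite vectors: rotate 180 degrees about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])  # skew-symmetric cross-product matrix of v
    return np.eye(3) + vx + vx @ vx / (1.0 + c)


def align_world(T_world_from_cam, ground_normal, up=(0.0, 1.0, 0.0)):
    """Re-express a 4x4 camera pose in a world frame whose +Y is the ground normal."""
    T_align = np.eye(4)
    T_align[:3, :3] = rotation_aligning(ground_normal, up)
    return T_align @ T_world_from_cam


# One marker lying in the world XY plane, so its normal is +Z.
corners = np.array([[[0, 0, 0], [0.1, 0, 0], [0.1, 0.1, 0], [0, 0.1, 0]]])
n = face_normal(corners)
R = rotation_aligning(n, [0, 1, 0])
print(np.round(R @ n, 6))
```

Applying the same rotation to every pose (align_world) keeps the relative geometry between cameras unchanged; only the global frame is rotated.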

Prerequisites:

  • The marker parquet file MUST contain name and ids columns defining which markers belong to which face (e.g., "top", "bottom", "front").
  • If this metadata is missing, alignment is skipped with a warning.

Decision Flow: The script selects the ground face using the following precedence:

  1. Explicit Face (--ground-face):

    • If you provide --ground-face="bottom", the script looks up the markers for "bottom" in the loaded map.
    • It computes the average normal of those markers and aligns it to the global up vector.
  2. Marker ID Mapping (--ground-marker-id):

    • If you provide --ground-marker-id=21, the script finds which face contains marker 21 (e.g., "bottom").
    • It then proceeds as if --ground-face="bottom" was specified.
  3. Heuristic Detection (Fallback):

    • If neither option is provided, the script analyzes all visible markers.
    • It computes the normal for every defined face.
    • It selects the face whose normal is most aligned with the camera's "down" direction (assuming the camera is roughly upright).
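
The precedence above can be expressed as a small resolver. This is a sketch under assumptions: the face-to-ID mapping is a plain dict built from the parquet name and ids columns, face normals for the heuristic are already expressed in the camera frame, and "down" is +Y as in the OpenCV camera convention. Returning None for an unknown face or marker ID is an illustrative choice; the script may instead log a warning and skip alignment. The function name select_ground_face is hypothetical.

```python
import numpy as np


def select_ground_face(face_to_ids, ground_face=None, ground_marker_id=None,
                       face_normals_cam=None, cam_down=(0.0, 1.0, 0.0)):
    """Resolve the ground face: explicit face > marker ID mapping > heuristic.

    face_to_ids: dict of face name -> list of marker IDs.
    face_normals_cam: dict of face name -> unit normal in the camera frame,
        only needed for the heuristic fallback (faces without visible markers omitted).
    Returns (face_name, decision) or (None, reason).
    """
    # 1. Explicit face wins if given.
    if ground_face is not None:
        if ground_face not in face_to_ids:
            return None, f"unknown ground face {ground_face!r}"
        return ground_face, "explicit"
    # 2. Map a marker ID to the face that contains it.
    if ground_marker_id is not None:
        for face, ids in face_to_ids.items():
            if ground_marker_id in ids:
                return face, "mapped"
        return None, f"marker {ground_marker_id} not in any face"
    # 3. Heuristic: face whose normal best matches the camera's "down" direction.
    if not face_normals_cam:
        return None, "no visible faces"
    down = np.asarray(cam_down, dtype=float)
    best = max(face_normals_cam,
               key=lambda f: float(np.dot(face_normals_cam[f], down)))
    return best, "heuristic"


faces = {"top": [1, 2], "bottom": [21, 22]}
print(select_ground_face(faces, ground_marker_id=21))
```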

Logging: The script logs the selected decision path for debugging:

  • Mapped ground-marker-id 21 to face 'bottom' (markers=[21])
  • Using explicit ground face 'bottom' (markers=[21])
  • Heuristically detected ground face 'bottom' (markers=[21])

Depth Verification & Refinement

This workflow uses the ZED camera's depth map to verify and improve the ArUco-based pose estimation.

1. Verification (--verify-depth)

  • Input: The computed extrinsic pose (T_world_from_cam) and the known 3D world coordinates of the marker corners.
  • Process:
    1. Transforms the marker corners into the camera frame (using the inverse of the computed pose) and projects them onto the image plane.
    2. Samples the ZED depth map at these projected 2D locations (using a 5x5 median filter for robustness).
    3. Compares the measured depth (ZED) with the computed depth (distance from camera center to projected corner).
  • Output:
    • RMSE (Root Mean Square Error) of the depth residuals.
    • Number of valid points (where depth was available and finite).
    • Added to JSON output under depth_verify.
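
A minimal NumPy sketch of the verification steps, assuming an undistorted pinhole model with intrinsics K and a depth map already in meters (NaN or inf where depth is missing). One deliberate detail: the ZED MEASURE.DEPTH map stores depth along the optical axis (Z), while MEASURE.DISTANCE stores Euclidean distance, so this sketch compares against the corner's camera-frame Z; check which quantity the script itself uses. The function name verify_depth is hypothetical.

```python
import numpy as np


def verify_depth(T_world_from_cam, K, corners_world, depth_map, window=5):
    """Return (rmse, n_valid, residuals) for measured minus predicted depth."""
    depth = np.asarray(depth_map, dtype=float)
    T_cam_from_world = np.linalg.inv(T_world_from_cam)
    pts = np.asarray(corners_world, dtype=float)
    # World corners -> camera frame.
    pts_cam = pts @ T_cam_from_world[:3, :3].T + T_cam_from_world[:3, 3]
    h, w = depth.shape
    half = window // 2
    residuals = []
    for X, Y, Z in pts_cam:
        if Z <= 0:
            continue  # corner is behind the camera
        # Pinhole projection to pixel coordinates (skew and distortion ignored).
        col = int(round(K[0, 0] * X / Z + K[0, 2]))
        row = int(round(K[1, 1] * Y / Z + K[1, 2]))
        if not (half <= col < w - half and half <= row < h - half):
            continue  # sampling window would leave the image
        patch = depth[row - half:row + half + 1, col - half:col + half + 1]
        finite = patch[np.isfinite(patch)]
        if finite.size == 0:
            continue  # no usable depth around this corner
        measured = float(np.median(finite))  # window x window median for robustness
        residuals.append(measured - Z)
    residuals = np.asarray(residuals, dtype=float)
    if residuals.size == 0:
        return float("nan"), 0, residuals
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return rmse, int(residuals.size), residuals


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
corners = np.array([[0, 0, 2.0], [0.1, 0, 2.0], [0.1, 0.1, 2.0], [0, 0.1, 2.0]])
print(verify_depth(np.eye(4), K, corners, np.full((480, 640), 2.01))[:2])
```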

2. Refinement (--refine-depth)

  • Trigger: Runs only if verification is enabled and enough valid depth points (>4) are found.
  • Process:
    • Uses scipy.optimize.minimize (L-BFGS-B) to adjust the 6-DOF pose parameters (rotation vector + translation vector).
    • Objective Function: Minimizes the squared difference between computed and measured depth over all visible marker corners, plus a simple regularization term.
    • Constraints: Bounded optimization keeps the pose from drifting too far from the initial ArUco estimate (default: ±5 degrees, ±5 cm).
  • Output:
    • Refined pose replaces the original pose in the JSON output.
    • Improvement stats (delta rotation, delta translation, RMSE reduction) added under refine_depth.
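
The bounded refinement can be sketched with scipy.optimize.minimize and scipy.spatial.transform.Rotation. Assumptions: the six parameters are a rotation-vector and translation correction applied in the camera frame, the measured depths are sampled once at the initial projections and held fixed, and the regularization weight is an arbitrary illustrative value. The name refine_pose_depth is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation


def refine_pose_depth(T_world_from_cam, corners_world, measured_depth,
                      max_rot_deg=5.0, max_trans_m=0.05, reg_weight=1e-3):
    """Return (refined 4x4 T_world_from_cam, OptimizeResult or None).

    measured_depth: per-corner ZED depth in meters (NaN where invalid).
    """
    T_cam_from_world = np.linalg.inv(T_world_from_cam)
    pts = np.asarray(corners_world, dtype=float)
    pts_cam0 = pts @ T_cam_from_world[:3, :3].T + T_cam_from_world[:3, 3]
    measured = np.asarray(measured_depth, dtype=float)
    valid = np.isfinite(measured)
    if valid.sum() <= 4:  # mirrors the ">4 valid points" trigger
        return T_world_from_cam, None
    pts_cam0, measured = pts_cam0[valid], measured[valid]

    def objective(x):
        # x = [rotation vector (3), translation (3)], a correction in the camera frame.
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        z_pred = pts_cam0 @ R[2] + x[5]  # camera-frame Z after the correction
        r = z_pred - measured
        return float(np.sum(r ** 2) + reg_weight * np.sum(x ** 2))

    rot = np.deg2rad(max_rot_deg)
    # Per-component box bounds: the combined rotation can reach about 8.7 degrees.
    bounds = [(-rot, rot)] * 3 + [(-max_trans_m, max_trans_m)] * 3
    res = minimize(objective, np.zeros(6), method="L-BFGS-B", bounds=bounds)

    T_delta = np.eye(4)
    T_delta[:3, :3] = Rotation.from_rotvec(res.x[:3]).as_matrix()
    T_delta[:3, 3] = res.x[3:]
    return np.linalg.inv(T_delta @ T_cam_from_world), res


# Synthetic check: the depth map says every corner is 2 cm farther than predicted.
pts = np.array([[x, y, z] for z, s in ((2.0, 0.1), (2.5, 0.2))
                for x in (-s, s) for y in (-s, s)])
measured = pts[:, 2] + 0.02
T_ref, res = refine_pose_depth(np.eye(4), pts, measured)
print(res.success, np.round(res.x[5], 4))
```

In this fixed-sample form, only translation along the optical axis and tilt about the camera X and Y axes change the predicted depths; lateral translation and roll about the optical axis are held only by the regularization term and the bounds. Resampling the depth map at re-projected pixels on each evaluation would make those directions observable, at the cost of a non-smooth objective.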

Fast Iteration (--max-samples)

For development or quick checks, processing thousands of frames is unnecessary.

  • Use --max-samples N to stop after N valid samples (frames where markers were detected).
  • Example: --max-samples 1 will process the first valid frame, run alignment/refinement, save the result, and exit.

Example Workflow

Full Run with Alignment and Refinement:

uv run calibrate_extrinsics.py \
  --svo output/recording.svo \
  --markers aruco/markers/box.parquet \
  --aruco-dictionary DICT_APRILTAG_36h11 \
  --auto-align \
  --ground-marker-id 21 \
  --verify-depth \
  --refine-depth \
  --output output/calibrated.json

Fast Debug Run:

uv run calibrate_extrinsics.py \
  --svo output/ \
  --markers aruco/markers/box.parquet \
  --auto-align \
  --max-samples 1 \
  --debug \
  --no-preview

Known Unexpected Behavior / Troubleshooting

Depth Refinement Failure (Unit Mismatch)

Symptoms:

  • depth_verify reports extremely large RMSE values (e.g., > 1000).
  • refine_depth reports success: false, iterations: 0, and near-zero improvement.
  • The optimization fails to converge or produces nonsensical results.

Root Cause: The ZED SDK retrieve_measure(sl.MEASURE.DEPTH) returns depth values in the unit defined by InitParameters.coordinate_units, whose default is sl.UNIT.MILLIMETER. The calibration system (extrinsics, marker geometry), however, operates in meters.

This scale mismatch (factor of 1000) causes the residuals in the optimization objective function to be massive, breaking the numerical stability of the L-BFGS-B solver.
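
The size of the failure is easy to reproduce with illustrative numbers (not taken from real output): with consistent units a 1 cm depth error gives an RMSE of about 0.01, while leaving the same readings in millimeters pushes the RMSE past 1000, matching the symptom above.

```python
import numpy as np


def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


# Illustrative predicted corner depths from the pose, in meters.
predicted_m = np.array([1.20, 1.25, 1.31, 1.18])
# The same corners as seen by the depth map, with +/-1 cm of noise.
measured_m = predicted_m + np.array([0.01, -0.01, 0.01, -0.01])
# Identical readings left in the ZED default unit (millimeters).
measured_mm = measured_m * 1000.0

print(rmse(measured_m, predicted_m))   # about 0.01: consistent units
print(rmse(measured_mm, predicted_m))  # about 1235: each residual is roughly 999 x depth
```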

Mitigation: The SVOReader class in aruco/svo_sync.py explicitly converts the retrieved depth map to meters:

# aruco/svo_sync.py
return depth_data / 1000.0

This ensures that all geometric math downstream remains consistent in meters.

Diagnostic Check: If you suspect a unit mismatch, check the depth_verify RMSE in the output JSON.

  • Healthy: RMSE < 0.5 (meters)
  • Mismatch: RMSE > 100 (likely millimeters)

Note: Confidence filtering (--depth-confidence-threshold) is orthogonal to this issue. A unit mismatch affects all valid pixels regardless of confidence.

Findings Summary (2026-02-07)

This section summarizes the most recent in-depth investigation across local code, outputs, and external documentation.

Confirmed Facts

  1. Marker geometry parquet is in meters

    • aruco/markers/standard_box_markers_600mm.parquet stores values around 0.3 (meters), not 300 (millimeters).
    • docs/marker-parquet-format.md also documents meter-scale coordinates.
  2. Depth unit contract is still fragile

    • ZED defaults to millimeters unless InitParameters.coordinate_units is explicitly set.
    • Current reader path converts depth by dividing by 1000.0 in aruco/svo_sync.py.
    • This is correct only if the incoming depth really is in millimeters. If coordinate_units is set to METER elsewhere, the unconditional division scales depth down by a further factor of 1000.
  3. Observed runtime behavior still indicates refinement instability

    • Existing outputs (for example output/aligned_refined_extrinsics*.json) show very large depth_verify.rmse, often refine_depth.success: false, iterations: 0, and negligible improvement.
    • This indicates that refinement quality is limited by factors beyond the original mm↔m mismatch.
  4. Current refinement objective is not robust enough

    • The objective is plain squared depth residuals plus simple regularization.
    • It does not currently include robust loss (Huber/Soft-L1), confidence weighting in the objective, or strong convergence diagnostics.
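
One way to make the unit contract explicit is to convert from the unit the depth was actually retrieved in, rather than dividing unconditionally. The sketch below assumes the reader knows that unit as a plain string mirroring the sl.UNIT names; the helpers and the string labels are hypothetical. The stats helper corresponds to the min/median/max depth sanity logs proposed under unit hardening below.

```python
import numpy as np

# Hypothetical labels mirroring sl.UNIT.METER / sl.UNIT.MILLIMETER.
_TO_METERS = {"METER": 1.0, "MILLIMETER": 0.001}


def depth_to_meters(depth, unit):
    """Scale a depth map to meters according to the unit it was retrieved in.

    Converting from the declared unit means that switching coordinate_units
    to METER cannot double-scale the data.
    """
    try:
        scale = _TO_METERS[unit]
    except KeyError:
        raise ValueError(f"unsupported depth unit: {unit!r}") from None
    return np.asarray(depth, dtype=np.float32) * scale


def depth_sanity_stats(depth_m):
    """(min, median, max) of finite samples for --debug logging, or None."""
    d = np.asarray(depth_m)
    finite = d[np.isfinite(d)]
    if finite.size == 0:
        return None
    return float(finite.min()), float(np.median(finite)), float(finite.max())


print(depth_sanity_stats(depth_to_meters(np.array([1500.0, np.nan, 2000.0]), "MILLIMETER")))
```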

Likely Contributors to Poor Refinement

  • Depth outliers are not sufficiently down-weighted in optimization.
  • Confidence map is used for verification filtering, but not as residual weights in the optimizer objective.
  • Representative frame choice uses the latest valid frame, not necessarily the best-quality frame.
  • Optimizer diagnostics are limited, making it hard to distinguish "real convergence" from "stuck at initialization".
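
To make the first two contributors concrete, here is a sketch of a confidence-weighted Huber cost in meters. The 2 cm threshold and the normalization by total weight are illustrative assumptions, and mapping ZED confidence values to weights in [0, 1] is left open. If switching solvers is acceptable, scipy.optimize.least_squares also provides loss="huber" and loss="soft_l1" with an f_scale parameter, which avoids hand-rolling the loss.

```python
import numpy as np


def huber(r, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond (meters)."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * np.square(r), delta * (a - 0.5 * delta))


def robust_depth_cost(residuals_m, weights, delta_m=0.02):
    """Confidence-weighted Huber cost, normalized by the total weight."""
    r = np.asarray(residuals_m, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * huber(r, delta_m)) / max(np.sum(w), 1e-12))


residuals = np.array([0.005, -0.004, 0.006, 0.5])  # last corner: a 50 cm depth outlier
print(0.5 * residuals[-1] ** 2)             # 0.125: its squared-loss contribution
print(float(huber(residuals[-1], 0.02)))    # about 0.0098 under Huber with a 2 cm threshold
```

Normalizing by the total weight keeps costs comparable between frames with different numbers of valid corners.
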

Recommended Next Steps

  1. Unit hardening (P0)

    • Explicitly set init_params.coordinate_units = sl.UNIT.METER in SVO reader.
    • Remove or guard manual /1000.0 conversion to avoid double-scaling risk.
    • Add depth sanity logs (min/median/max sampled depth) under --debug.
  2. Robust objective (P0)

    • Replace MSE-only residual with Huber (or Soft-L1) in meters.
    • Add confidence-weighted depth residuals in objective function.
    • Split translation/rotation regularization coefficients.
  3. Frame quality selection (P1)

    • Replace "latest valid frame" with best-frame scoring:
      • marker count (higher better)
      • median reprojection error (lower better)
      • valid depth ratio (higher better)
  4. Diagnostics and acceptance gates (P1)

    • Log optimizer termination reason, gradient/step behavior, and effective valid points.
    • Treat tiny RMSE changes as "no effective refinement" even when the optimizer reports success.
  5. Benchmark matrix (P1)

    • Compare baseline vs robust loss vs robust+confidence vs robust+confidence+best-frame.
    • Report per-camera pre/post RMSE, iteration count, and success/failure reason.
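
Items 3 and 4 could start from something like the following sketch. The score combines the three criteria with equal weights after simple normalization, and the acceptance gate uses illustrative thresholds (5% relative and 1 mm absolute RMSE gain). FrameStats, frame_score, best_frame, and refinement_accepted are hypothetical names, and all weights and thresholds would need tuning against the benchmark matrix.

```python
import math
from dataclasses import dataclass


@dataclass
class FrameStats:
    frame_index: int
    marker_count: int
    median_reproj_err_px: float
    valid_depth_ratio: float  # valid depth samples / sampled corners, in [0, 1]


def frame_score(s, max_markers, w_markers=1.0, w_reproj=1.0, w_depth=1.0):
    """Higher is better; each term is normalized to roughly [0, 1]."""
    markers_term = s.marker_count / max(max_markers, 1)   # more markers is better
    reproj_term = 1.0 / (1.0 + s.median_reproj_err_px)   # 1 at 0 px, decays with error
    return (w_markers * markers_term + w_reproj * reproj_term
            + w_depth * s.valid_depth_ratio)


def best_frame(stats):
    """Pick the highest-scoring frame instead of the latest valid one."""
    max_markers = max(s.marker_count for s in stats)
    return max(stats, key=lambda s: frame_score(s, max_markers))


def refinement_accepted(rmse_before, rmse_after, min_rel_gain=0.05, min_abs_gain_m=0.001):
    """Acceptance gate: a tiny RMSE change counts as no effective refinement."""
    if not (math.isfinite(rmse_before) and math.isfinite(rmse_after)):
        return False
    gain = rmse_before - rmse_after
    return gain >= min_abs_gain_m and gain >= min_rel_gain * rmse_before


frames = [FrameStats(10, 3, 0.8, 0.90),
          FrameStats(42, 4, 0.5, 0.95),
          FrameStats(57, 4, 2.5, 0.40)]
print(best_frame(frames).frame_index)      # 42
print(refinement_accepted(0.050, 0.0495))  # False: 0.5 mm gain is below both thresholds
```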

Practical note

The previous troubleshooting section correctly explains one important failure mode (unit mismatch), but current evidence shows that robust objective design and frame quality control are now the primary bottlenecks for meaningful depth refinement gains.