# Calibrate Extrinsics Workflow

This document explains the workflow for `calibrate_extrinsics.py`, focusing on ground plane alignment (`--auto-align`) and depth-based refinement (`--verify-depth`, `--refine-depth`).

## CLI Overview

The script calibrates camera extrinsics using ArUco markers detected in SVO recordings.

**Key Options:**
- `--svo`: Path to SVO file(s) or directory containing them.
- `--markers`: Path to the marker configuration parquet file.
- `--auto-align`: Enables automatic ground plane alignment (opt-in).
- `--verify-depth`: Enables depth-based verification of computed poses.
- `--refine-depth`: Enables optimization of poses using depth data (requires `--verify-depth`).
- `--max-samples`: Limits the number of processed samples for fast iteration.
- `--debug`: Enables verbose debug logging (default is INFO).

## Ground Plane Alignment (`--auto-align`)

When `--auto-align` is enabled, the script attempts to align the global coordinate system such that a specific face of the marker object becomes the ground plane (XZ plane, normal pointing +Y).
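
The alignment step amounts to finding a rotation that maps the chosen face normal onto the global up vector (+Y). The following is a minimal sketch of that idea using the standard Rodrigues construction; `rotation_aligning` is a hypothetical helper name and is not taken from `calibrate_extrinsics.py`.

```python
import numpy as np

def rotation_aligning(a, b):
    """Return a 3x3 rotation R such that R @ (a/|a|) equals b/|b|."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    v = np.cross(a, b)          # rotation axis scaled by sin(angle)
    c = float(np.dot(a, b))     # cos(angle)
    if np.isclose(c, -1.0):
        # Antiparallel vectors: rotate 180 degrees about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 0.0, 1.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    # Skew-symmetric cross-product matrix of v.
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + K @ K / (1.0 + c)

# Example: a slightly tilted face normal (made-up values) mapped onto +Y.
face_normal = np.array([0.1, -0.9, 0.2])
R_align = rotation_aligning(face_normal, [0.0, 1.0, 0.0])
```

Applying `R_align` to every pose in the global frame would then make the selected face horizontal. How the script composes this rotation with the existing extrinsics is not shown here.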

**Prerequisites:**
- The marker parquet file MUST contain `name` and `ids` columns defining which markers belong to which face (e.g., "top", "bottom", "front").
- If this metadata is missing, alignment is skipped with a warning.

**Decision Flow:**
The script selects the ground face using the following precedence:

1.  **Explicit Face (`--ground-face`)**:
    - If you provide `--ground-face="bottom"`, the script looks up the markers for "bottom" in the loaded map.
    - It computes the average normal of those markers and aligns it to the global up vector.

2.  **Marker ID Mapping (`--ground-marker-id`)**:
    - If you provide `--ground-marker-id=21`, the script finds which face contains marker 21 (e.g., "bottom").
    - It then proceeds as if `--ground-face="bottom"` was specified.

3.  **Heuristic Detection (Fallback)**:
    - If neither option is provided, the script analyzes all visible markers.
    - It computes the normal for every defined face.
    - It selects the face whose normal is most aligned with the camera's "down" direction (assuming the camera is roughly upright).

**Logging:**
The script logs the selected decision path for debugging:
- `Mapped ground-marker-id 21 to face 'bottom' (markers=[21])`
- `Using explicit ground face 'bottom' (markers=[21])`
- `Heuristically detected ground face 'bottom' (markers=[21])`
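
The precedence above can be sketched as a single selection function. This is an illustration only: the function name, the dictionary layout for faces and normals, and the choice to return `None` when a lookup fails are all assumptions, while the log messages follow the formats listed above.

```python
import logging
import numpy as np

logger = logging.getLogger("calibrate_extrinsics")

def _unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def select_ground_face(face_markers, face_normals, camera_down,
                       ground_face=None, ground_marker_id=None):
    """Pick the ground face name: explicit face, then marker id, then heuristic.

    face_markers: dict mapping face name -> list of marker ids (assumed layout)
    face_normals: dict mapping face name -> 3-vector normal
    camera_down:  3-vector for the camera's "down" direction
    """
    # 1. Explicit face name wins over everything else.
    if ground_face is not None:
        if ground_face not in face_markers:
            return None
        logger.info("Using explicit ground face '%s' (markers=%s)",
                    ground_face, face_markers[ground_face])
        return ground_face

    # 2. Otherwise map a marker id to the face that contains it.
    if ground_marker_id is not None:
        for name, ids in face_markers.items():
            if ground_marker_id in ids:
                logger.info("Mapped ground-marker-id %d to face '%s' (markers=%s)",
                            ground_marker_id, name, ids)
                return name
        return None

    # 3. Fallback: the face whose normal best matches the "down" direction.
    if not face_normals:
        return None
    down = _unit(camera_down)
    best = max(face_normals, key=lambda n: float(np.dot(_unit(face_normals[n]), down)))
    logger.info("Heuristically detected ground face '%s' (markers=%s)",
                best, face_markers.get(best, []))
    return best
```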

## Depth Verification & Refinement

This workflow uses the ZED camera's depth map to verify and improve the ArUco-based pose estimation.

### 1. Verification (`--verify-depth`)
- **Input**: The computed extrinsic pose ($T_{world\_from\_cam}$) and the known 3D world coordinates of the marker corners.
- **Process**:
    1. Projects marker corners into the camera frame using the computed pose.
    2. Samples the ZED depth map at these projected 2D locations (using a 5x5 median filter for robustness).
    3. Compares the *measured* depth (ZED) with the *computed* depth (distance from camera center to projected corner).
- **Output**:
    - RMSE (Root Mean Square Error) of the depth residuals.
    - Number of valid points (where depth was available and finite).
    - Added to JSON output under `depth_verify`.
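
The verification steps can be sketched as follows, assuming a pinhole model with intrinsics `fx`, `fy`, `cx`, `cy` and a depth map already in meters. The function names are hypothetical. The sketch compares the measured depth against the camera-frame Z coordinate of each corner, which is an assumption about the depth convention; if the measured map stores Euclidean range, `np.linalg.norm` of the camera-frame point would be the matching quantity.

```python
import numpy as np

def sample_depth_median(depth_map, u, v, window=5):
    """Median of finite, positive depths in a window x window patch around pixel (u, v)."""
    h, w = depth_map.shape
    r = window // 2
    ui, vi = int(round(u)), int(round(v))
    if not (0 <= ui < w and 0 <= vi < h):
        return np.nan
    patch = depth_map[max(vi - r, 0):vi + r + 1, max(ui - r, 0):ui + r + 1]
    valid = patch[np.isfinite(patch) & (patch > 0)]
    return float(np.median(valid)) if valid.size else np.nan

def verify_depth(T_world_from_cam, corners_world, depth_map, fx, fy, cx, cy):
    """Return (rmse, n_valid) of measured-minus-computed depth at the marker corners."""
    T_cam_from_world = np.linalg.inv(T_world_from_cam)
    pts = np.asarray(corners_world, dtype=float)
    pts_cam = pts @ T_cam_from_world[:3, :3].T + T_cam_from_world[:3, 3]
    residuals = []
    for x, y, z in pts_cam:
        if z <= 0:
            continue  # corner is behind the camera
        u, v = fx * x / z + cx, fy * y / z + cy  # pinhole projection
        measured = sample_depth_median(depth_map, u, v)
        if np.isfinite(measured):
            residuals.append(measured - z)
    if not residuals:
        return float("nan"), 0
    return float(np.sqrt(np.mean(np.square(residuals)))), len(residuals)
```

Corners that project outside the image or land on pixels without valid depth are skipped, so the returned count corresponds to the "valid points" figure described above.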

### 2. Refinement (`--refine-depth`)
- **Trigger**: Runs only if verification is enabled and enough valid depth points (>4) are found.
- **Process**:
    - Uses `scipy.optimize.minimize` (L-BFGS-B) to adjust the 6-DOF pose parameters (rotation vector + translation vector).
    - **Objective Function**: Minimizes the squared difference between computed depth and measured depth for all visible marker corners.
    - **Constraints**: Bounded optimization to prevent drifting too far from the initial ArUco pose (default: ±5 degrees, ±5cm).
- **Output**:
    - Refined pose replaces the original pose in the JSON output.
    - Improvement stats (delta rotation, delta translation, RMSE reduction) added under `refine_depth`.
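
A bounded refinement of this kind might look like the sketch below. It makes several simplifying assumptions that may differ from the script: the six parameters describe the camera-from-world transform, the measured depth per corner is held fixed instead of being re-sampled at each step, the ±5 degree limit is applied per rotation-vector component, and the regularization weight `reg` is an arbitrary illustrative value.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def depth_residuals(params, corners_world, measured_depth):
    """Computed-minus-measured depth; params = [rotation vector (3), translation (3)]."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    pts_cam = corners_world @ R.T + params[3:]
    return pts_cam[:, 2] - measured_depth

def refine_pose(params0, corners_world, measured_depth,
                max_rot_deg=5.0, max_trans_m=0.05, reg=1e-3):
    """Minimize squared depth residuals plus a small pull toward the initial pose."""
    params0 = np.asarray(params0, dtype=float)

    def cost(p):
        r = depth_residuals(p, corners_world, measured_depth)
        return float(np.sum(r ** 2) + reg * np.sum((p - params0) ** 2))

    # Box bounds around the initial pose: radians for rotation, meters for translation.
    half = np.r_[np.full(3, np.deg2rad(max_rot_deg)), np.full(3, max_trans_m)]
    bounds = list(zip(params0 - half, params0 + half))
    return minimize(cost, params0, method="L-BFGS-B", bounds=bounds)

# Synthetic example: depths generated from a slightly different "true" pose.
rng = np.random.default_rng(0)
corners = rng.uniform(-0.3, 0.3, size=(8, 3)) + np.array([0.0, 0.0, 2.0])
true_params = np.array([0.01, -0.02, 0.005, 0.01, 0.0, 0.02])
measured = depth_residuals(true_params, corners, np.zeros(8))
result = refine_pose(np.zeros(6), corners, measured)
```

Inspecting `result.success`, `result.nit`, and `result.message` helps tell a genuine convergence apart from an optimizer that never left its starting point.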

## Fast Iteration (`--max-samples`)

For development or quick checks, processing thousands of frames is unnecessary.
- Use `--max-samples N` to stop after `N` valid samples (frames where markers were detected).
- Example: `--max-samples 1` will process the first valid frame, run alignment/refinement, save the result, and exit.

## Example Workflow

**Full Run with Alignment and Refinement:**
```bash
uv run calibrate_extrinsics.py \
  --svo output/recording.svo \
  --markers aruco/markers/box.parquet \
  --aruco-dictionary DICT_APRILTAG_36h11 \
  --auto-align \
  --ground-marker-id 21 \
  --verify-depth \
  --refine-depth \
  --output output/calibrated.json
```

**Fast Debug Run:**
```bash
uv run calibrate_extrinsics.py \
  --svo output/ \
  --markers aruco/markers/box.parquet \
  --auto-align \
  --max-samples 1 \
  --debug \
  --no-preview
```

## Known Unexpected Behavior / Troubleshooting

### Depth Refinement Failure (Unit Mismatch)

**Symptoms:**
- `depth_verify` reports extremely large RMSE values (e.g., > 1000).
- `refine_depth` reports `success: false`, `iterations: 0`, and near-zero improvement.
- The optimization fails to converge or produces nonsensical results.

**Root Cause:**
The ZED SDK `retrieve_measure(sl.MEASURE.DEPTH)` returns depth values in the unit defined by `InitParameters.coordinate_units`, which defaults to **MILLIMETERS** (`sl.UNIT.MILLIMETER`). However, the calibration system (extrinsics, marker geometry) operates in **METERS**.

This scale mismatch (factor of 1000) causes the residuals in the optimization objective function to be massive, breaking the numerical stability of the L-BFGS-B solver.

**Mitigation:**
The `SVOReader` class in `aruco/svo_sync.py` explicitly converts the retrieved depth map to meters:
```python
# aruco/svo_sync.py
return depth_data / 1000.0
```
This ensures that all geometric math downstream remains consistent in meters.
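
A hard-coded division is only correct while the incoming unit really is millimeters. One way to make the unit explicit is sketched below; `depth_to_meters` and `depth_stats` are hypothetical helpers, and the unit strings are illustrative placeholders, not ZED SDK enum values.

```python
import numpy as np

# Scale factors to meters, keyed by an illustrative unit label.
_TO_METERS = {"millimeter": 1e-3, "centimeter": 1e-2, "meter": 1.0}

def depth_to_meters(depth, unit):
    """Scale a depth map to meters given the unit it was retrieved in."""
    try:
        scale = _TO_METERS[unit]
    except KeyError:
        raise ValueError(f"unsupported depth unit: {unit!r}") from None
    return np.asarray(depth, dtype=np.float32) * scale

def depth_stats(depth_m):
    """Return (min, median, max) over finite, positive depths, or None if there are none."""
    valid = depth_m[np.isfinite(depth_m) & (depth_m > 0)]
    if valid.size == 0:
        return None
    return float(valid.min()), float(np.median(valid)), float(valid.max())
```

Logging the output of `depth_stats` makes a scale error visible at a glance: an indoor scene with a median depth in the thousands is almost certainly still in millimeters.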

**Diagnostic Check:**
If you suspect a unit mismatch, check the `depth_verify` RMSE in the output JSON.
- **Healthy:** RMSE < 0.5 (meters)
- **Mismatch:** RMSE > 100 (likely millimeters)

*Note: Confidence filtering (`--depth-confidence-threshold`) is orthogonal to this issue. A unit mismatch affects all valid pixels regardless of confidence.*
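
The check can be automated with a few lines of Python. The sketch below assumes the output JSON is a mapping from camera identifier to an entry containing `depth_verify.rmse`; that layout, the function names, and the `inconclusive` and `no-depth-verify` labels are assumptions, while the two thresholds come from the list above.

```python
import json

def classify_depth_rmse(rmse):
    """Apply the documented thresholds to a depth RMSE value (expected in meters)."""
    if rmse < 0.5:
        return "healthy"
    if rmse > 100:
        return "unit-mismatch"
    return "inconclusive"  # between the two documented thresholds

def check_calibration(json_text):
    """Return {camera: label} for every camera entry in the calibration JSON."""
    data = json.loads(json_text)
    report = {}
    for cam, entry in data.items():
        rmse = entry.get("depth_verify", {}).get("rmse")
        report[cam] = "no-depth-verify" if rmse is None else classify_depth_rmse(rmse)
    return report

# Made-up example document with three hypothetical cameras.
sample = json.dumps({
    "cam_a": {"depth_verify": {"rmse": 0.02}},
    "cam_b": {"depth_verify": {"rmse": 1850.0}},
    "cam_c": {},
})
report = check_calibration(sample)
```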

## Findings Summary (2026-02-07)

This section summarizes the most recent investigation of the local code, existing output files, and external documentation.

### Confirmed Facts

1. **Marker geometry parquet is in meters**
   - `aruco/markers/standard_box_markers_600mm.parquet` stores values around `0.3` (meters), not `300` (millimeters).
   - `docs/marker-parquet-format.md` also documents meter-scale coordinates.

2. **Depth unit contract is still fragile**
   - ZED defaults to millimeters unless `InitParameters.coordinate_units` is explicitly set.
   - Current reader path converts depth by dividing by `1000.0` in `aruco/svo_sync.py`.
   - This is correct only while the incoming depth really is in millimeters. If the unit configuration changes elsewhere, the division silently produces wrongly scaled depth.

3. **Observed runtime behavior still indicates refinement instability**
   - Existing outputs (for example `output/aligned_refined_extrinsics*.json`) show very large `depth_verify.rmse`, often `refine_depth.success: false`, `iterations: 0`, and negligible improvement.
   - This indicates that factors other than the original mm↔m mismatch are now limiting refinement quality.

4. **Current refinement objective is not robust enough**
   - Objective is plain squared depth residuals + simple regularization.
   - It does **not** currently include robust loss (Huber/Soft-L1), confidence weighting in the objective, or strong convergence diagnostics.

### Likely Contributors to Poor Refinement

- Depth outliers are not sufficiently down-weighted in optimization.
- Confidence map is used for verification filtering, but not as residual weights in the optimizer objective.
- Representative frame choice uses the latest valid frame, not necessarily the best-quality frame.
- Optimizer diagnostics are limited, making it hard to distinguish "real convergence" from "stuck at initialization".
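
To illustrate the first two points, the sketch below shows a Huber loss combined with per-point weights. It is not the current objective. The weights are assumed to lie in [0, 1] with higher meaning more trusted; how the ZED confidence map would be mapped onto that range is left open, and `delta_m` is an illustrative value.

```python
import numpy as np

def huber(r, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

def robust_depth_cost(residuals_m, weights, delta_m=0.05):
    """Weighted mean Huber loss over depth residuals given in meters."""
    r = np.asarray(residuals_m, dtype=float)
    w = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)
    total = w.sum()
    if total <= 0:
        return 0.0  # no trusted points: nothing to penalize
    return float(np.sum(w * huber(r, delta_m)) / total)

# One gross outlier (1 m) among small residuals; values are made up.
residuals = np.array([0.01, -0.02, 0.015, 1.0])
uniform = robust_depth_cost(residuals, [1.0, 1.0, 1.0, 1.0])
downweighted = robust_depth_cost(residuals, [1.0, 1.0, 1.0, 0.0])
```

With a plain squared loss the 1 m outlier would dominate the sum. Under Huber it contributes only linearly, and a zero weight removes it entirely.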

### Recommended Implementation Order (for next session)

1. **Unit hardening (P0)**
   - Explicitly set `init_params.coordinate_units = sl.UNIT.METER` in the SVO reader.
   - Remove or guard manual `/1000.0` conversion to avoid double-scaling risk.
   - Add depth sanity logs (min/median/max sampled depth) under `--debug`.

2. **Robust objective (P0)**
   - Replace MSE-only residual with Huber (or Soft-L1) in meters.
   - Add confidence-weighted depth residuals in objective function.
   - Split translation/rotation regularization coefficients.

3. **Frame quality selection (P1)**
   - Replace "latest valid frame" with best-frame scoring:
     - marker count (higher better)
     - median reprojection error (lower better)
     - valid depth ratio (higher better)

4. **Diagnostics and acceptance gates (P1)**
   - Log optimizer termination reason, gradient/step behavior, and effective valid points.
   - Treat tiny RMSE changes as "no effective refinement" even if the optimizer terminates without error.

5. **Benchmark matrix (P1)**
   - Compare baseline vs robust loss vs robust+confidence vs robust+confidence+best-frame.
   - Report per-camera pre/post RMSE, iteration count, and success/failure reason.
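
As an illustration of item 3, best-frame selection could combine the three listed criteria into one score. The sketch below is one possible scoring, not a settled design: the `FrameStats` fields, the equal weighting of the three terms, and the `reproj_scale_px` constant are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class FrameStats:
    index: int
    marker_count: int
    median_reproj_error_px: float
    valid_depth_ratio: float  # fraction of sampled corners with valid depth, in [0, 1]

def frame_score(stats, max_markers, reproj_scale_px=2.0):
    """Higher is better; each of the three terms lies in [0, 1]."""
    marker_term = stats.marker_count / max(max_markers, 1)
    reproj_term = 1.0 / (1.0 + stats.median_reproj_error_px / reproj_scale_px)
    return marker_term + reproj_term + stats.valid_depth_ratio

def pick_best_frame(frames):
    """Return the highest-scoring frame, or None if the list is empty."""
    if not frames:
        return None
    max_markers = max(f.marker_count for f in frames)
    return max(frames, key=lambda f: frame_score(f, max_markers))

# Made-up example: the latest frame (index 1) is worse than an earlier one.
frames = [
    FrameStats(index=0, marker_count=4, median_reproj_error_px=0.5, valid_depth_ratio=0.9),
    FrameStats(index=1, marker_count=2, median_reproj_error_px=3.0, valid_depth_ratio=0.4),
]
best = pick_best_frame(frames)
```

Here the earlier frame wins on all three criteria, whereas a "latest valid frame" policy would have picked index 1.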

### Practical note

The previous troubleshooting section correctly explains one important failure mode (unit mismatch), but current evidence shows that **robust objective design and frame quality control** are now the primary bottlenecks for meaningful depth refinement gains.