feat(calibration): robust depth refinement pipeline with diagnostics and benchmarking

2026-02-07 05:51:07 +00:00
parent ead3796cdb
commit dad1f2a69f
17 changed files with 1876 additions and 261 deletions
@@ -12,6 +12,8 @@ The script calibrates camera extrinsics using ArUco markers detected in SVO reco
- `--auto-align`: Enables automatic ground plane alignment (opt-in).
- `--verify-depth`: Enables depth-based verification of computed poses.
- `--refine-depth`: Enables optimization of poses using depth data (requires `--verify-depth`).
- `--use-confidence-weights`: Uses ZED depth confidence map to weight residuals in optimization.
- `--benchmark-matrix`: Runs a comparison of baseline vs. robust refinement configurations.
- `--max-samples`: Limits the number of processed samples for fast iteration.
- `--debug`: Enables verbose debug logging (default is INFO).
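A minimal `argparse` sketch of how these flags, and the rule that `--refine-depth` requires `--verify-depth`, could be declared. This parser is hypothetical. It mirrors only the options listed above and is not the script's actual argument definition. Treating a missing `--max-samples` as `None` ("process everything") is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors only the flags documented above.
    parser = argparse.ArgumentParser(description="Calibrate extrinsics from SVO recordings")
    parser.add_argument("--auto-align", action="store_true")
    parser.add_argument("--verify-depth", action="store_true")
    parser.add_argument("--refine-depth", action="store_true")
    parser.add_argument("--use-confidence-weights", action="store_true")
    parser.add_argument("--benchmark-matrix", action="store_true")
    parser.add_argument("--max-samples", type=int, default=None)  # None = no limit (assumed)
    parser.add_argument("--debug", action="store_true")
    return parser


def validate(args: argparse.Namespace) -> None:
    # Refinement reuses the depth data gathered during verification.
    if args.refine_depth and not args.verify_depth:
        raise ValueError("--refine-depth requires --verify-depth")


args = build_parser().parse_args(["--verify-depth", "--refine-depth", "--max-samples", "100"])
validate(args)
print(args.refine_depth, args.max_samples)
```

Checking the dependency after parsing keeps the error message explicit. An `argparse` mutually-inclusive group does not exist, so a post-parse check is the usual pattern.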
@@ -63,13 +65,35 @@ This workflow uses the ZED camera's depth map to verify and improve the ArUco-ba
### 2. Refinement (`--refine-depth`)
- **Trigger**: Runs only if verification is enabled and enough valid depth points (>4) are found.
- **Process**:
- Uses `scipy.optimize.minimize` (L-BFGS-B) to adjust the 6-DOF pose parameters (rotation vector + translation vector).
- **Objective Function**: Minimizes the squared difference between computed depth and measured depth for all visible marker corners.
- Uses `scipy.optimize.least_squares` with a robust loss function (`soft_l1`) to handle outliers.
- **Objective Function**: Minimizes the robust (Soft-L1) cost of the residuals between predicted and measured depth at all visible marker corners.
- **Confidence Weighting** (`--use-confidence-weights`): If enabled, residuals are weighted by the ZED confidence map (higher confidence = higher weight).
- **Constraints**: Bounds keep the optimizer from drifting too far from the initial ArUco pose (default: ±5 degrees, ±5 cm).
- **Output**:
- Refined pose replaces the original pose in the JSON output.
- Improvement stats (delta rotation, delta translation, RMSE reduction) added under `refine_depth`.
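The refinement step can be sketched with `scipy.optimize.least_squares`. This is a minimal sketch that rests on a few assumptions. The pose is taken to be a world-to-camera transform parameterized as `[rotvec, tvec]`. The measured depth at each corner is assumed to be the camera-frame Z coordinate in meters. Confidence is assumed to be already normalized to weights in [0, 1], and converting the ZED confidence map into those weights is not shown. The name `refine_pose_depth`, the `f_scale` value, and the per-component bounds on the rotation vector (an approximation of a ±5 degree limit) are illustrative choices, not the script's implementation. For reference, SciPy's `soft_l1` loss is rho(z) = 2(sqrt(1 + z) - 1) with z = (r / f_scale)^2. Residuals much larger than `f_scale` therefore grow roughly linearly rather than quadratically.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def predicted_depth(params, points_world):
    """Camera-frame Z of world points for a world->camera pose [rotvec(3), tvec(3)]."""
    rot = Rotation.from_rotvec(params[:3])
    points_cam = rot.apply(points_world) + params[3:]
    return points_cam[:, 2]


def refine_pose_depth(rvec0, tvec0, points_world, measured_depth, weights=None,
                      max_rot_deg=5.0, max_trans_m=0.05, f_scale=0.02):
    """Hypothetical sketch: robust, bounded, optionally confidence-weighted refinement."""
    measured_depth = np.asarray(measured_depth, dtype=float)
    if measured_depth.size <= 4:
        raise ValueError("need more than 4 valid depth points")
    x0 = np.concatenate([rvec0, tvec0]).astype(float)
    # Per-component bounds; bounding rotation-vector components only approximates
    # an angular limit around the initial pose.
    delta = np.array([np.deg2rad(max_rot_deg)] * 3 + [max_trans_m] * 3)
    sqrt_w = np.ones_like(measured_depth) if weights is None else np.sqrt(weights)

    def residuals(params):
        # Scaling by sqrt(w) makes each squared residual proportional to w.
        return sqrt_w * (predicted_depth(params, points_world) - measured_depth)

    return least_squares(residuals, x0, bounds=(x0 - delta, x0 + delta),
                         loss="soft_l1", f_scale=f_scale)


# Synthetic check: start from a slightly perturbed pose with noise-free depth.
rng = np.random.default_rng(0)
points = rng.uniform(-0.3, 0.3, size=(16, 3))        # marker corners, meters
true_params = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.5])
depth = predicted_depth(true_params, points)
rvec_init = np.array([np.deg2rad(2.0), 0.0, 0.0])     # 2 degree error about X
tvec_init = np.array([0.0, 0.0, 1.52])                # 2 cm error along Z
res = refine_pose_depth(rvec_init, tvec_init, points, depth)

x_init = np.concatenate([rvec_init, tvec_init])
rmse_before = np.sqrt(np.mean((predicted_depth(x_init, points) - depth) ** 2))
rmse_after = np.sqrt(np.mean(res.fun ** 2))           # unweighted here, so res.fun is the raw residual
print(res.success, rmse_before, rmse_after, res.nfev)
```

Depth residuals alone constrain only the camera's Z axis direction and Z offset. In-plane translation and rotation about the optical axis have zero gradient, which is one reason the bounds (or a regularization term) matter.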
### 3. Best Frame Selection
When multiple frames are available, the system scores them to pick the best candidate for verification/refinement:
- **Criteria**:
- Number of detected markers (primary factor).
- Reprojection error (lower is better).
- Valid depth ratio (percentage of marker corners with valid depth data).
- Depth confidence (if available).
- **Benefit**: Ensures refinement uses high-quality data rather than just the last valid frame.
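One way to express this ranking is a lexicographic sort key, so that marker count dominates and the remaining criteria only break ties. The `FrameStats` fields, the tie-break order, and treating missing confidence as 0 are assumptions for illustration. The actual scorer may combine the criteria with numeric weights instead.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FrameStats:
    frame_index: int
    marker_count: int
    reprojection_error: float                # pixels, lower is better
    valid_depth_ratio: float                 # fraction of corners with valid depth, 0..1
    mean_confidence: Optional[float] = None  # None when no confidence map is available


def frame_key(f: FrameStats):
    # A larger tuple means a better frame: more markers first, then lower
    # reprojection error, then more valid depth, then confidence (0 if missing).
    return (
        f.marker_count,
        -f.reprojection_error,
        f.valid_depth_ratio,
        f.mean_confidence if f.mean_confidence is not None else 0.0,
    )


def select_best_frame(frames: Sequence[FrameStats]) -> FrameStats:
    if not frames:
        raise ValueError("no candidate frames")
    return max(frames, key=frame_key)


frames = [
    FrameStats(10, marker_count=3, reprojection_error=0.4, valid_depth_ratio=0.95),
    FrameStats(42, marker_count=4, reprojection_error=0.9, valid_depth_ratio=0.80),
    FrameStats(57, marker_count=4, reprojection_error=0.6, valid_depth_ratio=0.70),
]
print(select_best_frame(frames).frame_index)  # 57
```

With a strict lexicographic key, the later criteria only matter on exact ties. A weighted score is the usual alternative when softer trade-offs between criteria are wanted.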
## Benchmark Matrix (`--benchmark-matrix`)
This mode runs a comparative analysis of different refinement configurations on the same data to evaluate improvements. It compares:
1. **Baseline**: Linear loss (MSE), no confidence weighting.
2. **Robust**: Soft-L1 loss, no confidence weighting.
3. **Robust + Confidence**: Soft-L1 loss with confidence-weighted residuals.
4. **Robust + Confidence + Best Frame**: All of the above, using the highest-scored frame.
**Output:**
- Prints a summary table for each camera showing RMSE improvement and iteration counts.
- Adds a `benchmark` object to the JSON output containing detailed stats for each configuration.
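The four configurations map naturally onto a small table of settings that drive the same refinement routine. The sketch below is a hypothetical harness. `BenchConfig`, `run_matrix`, `format_table`, and the `refine(config) -> (pre_rmse, post_rmse, iterations)` signature are assumptions. `dummy_refine` returns fixed placeholder values only so that the example runs. These values are not results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class BenchConfig:
    name: str
    loss: str              # e.g. the `loss` argument for least_squares
    use_confidence: bool
    best_frame: bool


CONFIGS = [
    BenchConfig("baseline", "linear", False, False),
    BenchConfig("robust", "soft_l1", False, False),
    BenchConfig("robust+confidence", "soft_l1", True, False),
    BenchConfig("robust+confidence+best_frame", "soft_l1", True, True),
]

RefineFn = Callable[[BenchConfig], Tuple[float, float, int]]


def run_matrix(refine: RefineFn) -> Dict[str, dict]:
    """Run every configuration on the same data and collect per-config stats."""
    results = {}
    for cfg in CONFIGS:
        pre, post, iters = refine(cfg)
        results[cfg.name] = {
            "pre_rmse": pre,
            "post_rmse": post,
            "improvement": pre - post,
            "iterations": iters,
        }
    return results


def format_table(results: Dict[str, dict]) -> List[str]:
    lines = [f"{'config':<30} {'pre':>8} {'post':>8} {'iters':>6}"]
    for name, r in results.items():
        lines.append(f"{name:<30} {r['pre_rmse']:>8.4f} {r['post_rmse']:>8.4f} {r['iterations']:>6d}")
    return lines


def dummy_refine(cfg: BenchConfig) -> Tuple[float, float, int]:
    # Placeholder so the example runs; a real harness would call the refiner here.
    return 1.0, 1.0, 0


for line in format_table(run_matrix(dummy_refine)):
    print(line)
```

Keeping the configurations as data makes it straightforward to serialize the same dictionary into the `benchmark` object of the JSON output.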
## Fast Iteration (`--max-samples`)
For development or quick checks, processing thousands of frames is unnecessary.
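A sample cap can be applied lazily with `itertools.islice`, so that frames past the limit are never read. `limit_samples` is a hypothetical helper and `fake_frames` stands in for the SVO reader. Treating `None` as "no limit" is an assumption about the flag's default.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def limit_samples(samples: Iterable[T], max_samples: Optional[int]) -> Iterator[T]:
    """Yield at most max_samples items; None means process everything."""
    if max_samples is not None and max_samples < 0:
        raise ValueError("max_samples must be non-negative")
    return islice(samples, max_samples)


def fake_frames():
    # Stand-in for a frame reader: an endless stream of frame indices.
    n = 0
    while True:
        yield n
        n += 1


print(list(limit_samples(fake_frames(), 5)))  # [0, 1, 2, 3, 4]
```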
@@ -78,7 +102,7 @@ For development or quick checks, processing thousands of frames is unnecessary.
## Example Workflow
**Full Run with Alignment and Refinement:**
**Full Run with Alignment and Robust Refinement:**
```bash
uv run calibrate_extrinsics.py \
--svo output/recording.svo \
@@ -88,9 +112,19 @@ uv run calibrate_extrinsics.py \
--ground-marker-id 21 \
--verify-depth \
--refine-depth \
--use-confidence-weights \
--output output/calibrated.json
```
**Benchmark Run:**
```bash
uv run calibrate_extrinsics.py \
--svo output/recording.svo \
--markers aruco/markers/box.parquet \
--benchmark-matrix \
--max-samples 100
```
**Fast Debug Run:**
```bash
uv run calibrate_extrinsics.py \
@@ -104,89 +138,18 @@ uv run calibrate_extrinsics.py \
## Known Unexpected Behavior / Troubleshooting
### Depth Refinement Failure (Unit Mismatch)
### Resolved: Depth Refinement Failure (Unit Mismatch)
**Symptoms:**
*Note: This issue has been resolved in the latest version by enforcing explicit meter units in the SVO reader and removing ambiguous manual conversions.*
**Previous Symptoms:**
- `depth_verify` reports extremely large RMSE values (e.g., > 1000).
- `refine_depth` reports `success: false`, `iterations: 0`, and near-zero improvement.
- The optimization fails to converge or produces nonsensical results.
**Root Cause:**
The ZED SDK `retrieve_measure(sl.MEASURE.DEPTH)` returns depth values in the unit defined by `InitParameters.coordinate_units`. The default is **MILLIMETERS**. However, the calibration system (extrinsics, marker geometry) operates in **METERS**.
**Resolution:**
The system now explicitly sets `InitParameters.coordinate_units = sl.UNIT.METER` when opening SVO files, ensuring consistent units across the pipeline.
This scale mismatch (factor of 1000) causes the residuals in the optimization objective function to be massive, breaking the numerical stability of the L-BFGS-B solver.
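A quick back-of-the-envelope check shows how large the effect is. With mixed units, the residual becomes roughly the whole depth value instead of the actual error. The 1.5 m corner depth and the 5 mm genuine error below are made-up illustrative values, not measurements.

```python
predicted_m = 1.500                    # depth predicted from the pose, meters
true_error_m = 0.005                   # a genuine 5 mm disagreement
measured_m = predicted_m + true_error_m
measured_mm = measured_m * 1000.0      # the same measurement left in millimeters

residual_ok = measured_m - predicted_m       # about 0.005
residual_bad = measured_mm - predicted_m     # about 1503.5
ratio = (residual_bad / residual_ok) ** 2    # growth factor of the squared residual

print(f"{residual_ok:.3f} {residual_bad:.1f} {ratio:.1e}")
```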
**Mitigation:**
The `SVOReader` class in `aruco/svo_sync.py` explicitly converts the retrieved depth map to meters:
```python
# aruco/svo_sync.py (excerpt from SVOReader)
# Converts the retrieved depth map from millimeters to meters.
return depth_data / 1000.0
```
This ensures that all geometric math downstream remains consistent in meters.
**Diagnostic Check:**
If you suspect a unit mismatch, check the `depth_verify` RMSE in the output JSON.
- **Healthy:** RMSE < 0.5 (meters)
- **Mismatch:** RMSE > 100 (likely millimeters)
*Note: Confidence filtering (`--depth-confidence-threshold`) is orthogonal to this issue. A unit mismatch affects all valid pixels regardless of confidence.*
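The RMSE thresholds above can be wrapped in a small helper, together with a min/median/max depth summary that is handy for `--debug` sanity logs. `classify_depth_rmse` and `depth_summary` are hypothetical names. The 0.5 and 100 thresholds come from the list above. Reporting values between them as inconclusive, and treating non-finite or non-positive depth as invalid, are assumptions.

```python
import math
import statistics
from typing import Dict, Iterable


def classify_depth_rmse(rmse: float) -> str:
    """Map a depth_verify RMSE (expected in meters) to a coarse diagnosis."""
    if rmse < 0.5:
        return "healthy"
    if rmse > 100:
        return "likely unit mismatch (millimeters)"
    return "inconclusive"


def depth_summary(depths: Iterable[float]) -> Dict[str, float]:
    """Min/median/max of sampled depths, skipping non-finite and non-positive values."""
    valid = [d for d in depths if math.isfinite(d) and d > 0]
    if not valid:
        raise ValueError("no valid depth samples")
    return {"min": min(valid), "median": statistics.median(valid), "max": max(valid)}


print(classify_depth_rmse(0.03), "|", classify_depth_rmse(1480.0))
print(depth_summary([1.2, float("nan"), 1.4, 1.5, float("inf")]))
```

A median far above a few meters for a tabletop or room-scale setup is a quick hint that the depth is still in millimeters.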
## Findings Summary (2026-02-07)
This section summarizes the latest deep investigation across local code, outputs, and external docs.
### Confirmed Facts
1. **Marker geometry parquet is in meters**
- `aruco/markers/standard_box_markers_600mm.parquet` stores values around `0.3` (meters), not `300` (millimeters).
- `docs/marker-parquet-format.md` also documents meter-scale coordinates.
2. **Depth unit contract is still fragile**
- ZED defaults to millimeters unless `InitParameters.coordinate_units` is explicitly set.
- Current reader path converts depth by dividing by `1000.0` in `aruco/svo_sync.py`.
- This works only if incoming depth is truly millimeters. It can become fragile if unit config changes elsewhere.
3. **Observed runtime behavior still indicates refinement instability**
- Existing outputs (for example `output/aligned_refined_extrinsics*.json`) show very large `depth_verify.rmse`, often `refine_depth.success: false`, `iterations: 0`, and negligible improvement.
- This indicates that refinement quality is currently limited by factors beyond the original mm↔m mismatch.
4. **Current refinement objective is not robust enough**
- Objective is plain squared depth residuals + simple regularization.
- It does **not** currently include robust loss (Huber/Soft-L1), confidence weighting in the objective, or strong convergence diagnostics.
### Likely Contributors to Poor Refinement
- Depth outliers are not sufficiently down-weighted in optimization.
- Confidence map is used for verification filtering, but not as residual weights in the optimizer objective.
- Representative frame choice uses the latest valid frame, not necessarily the best-quality frame.
- Optimizer diagnostics are limited, making it hard to distinguish "real convergence" from "stuck at initialization".
### Recommended Implementation Order (for next session)
1. **Unit hardening (P0)**
- Explicitly set `init_params.coordinate_units = sl.UNIT.METER` in SVO reader.
- Remove or guard manual `/1000.0` conversion to avoid double-scaling risk.
- Add depth sanity logs (min/median/max sampled depth) under `--debug`.
2. **Robust objective (P0)**
- Replace MSE-only residual with Huber (or Soft-L1) in meters.
- Add confidence-weighted depth residuals in objective function.
- Split translation/rotation regularization coefficients.
3. **Frame quality selection (P1)**
- Replace "latest valid frame" with best-frame scoring:
- marker count (higher better)
- median reprojection error (lower better)
- valid depth ratio (higher better)
4. **Diagnostics and acceptance gates (P1)**
- Log optimizer termination reason, gradient/step behavior, and effective valid points.
- Treat tiny RMSE changes as "no effective refinement" even if optimizer returns.
5. **Benchmark matrix (P1)**
- Compare baseline vs robust loss vs robust+confidence vs robust+confidence+best-frame.
- Report per-camera pre/post RMSE, iteration count, and success/failure reason.
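For the acceptance gate in item 4, a refined pose can be accepted only when the RMSE drop clears both an absolute and a relative threshold. `accept_refinement` and its defaults (1 mm absolute, 1% relative) are hypothetical choices for illustration, not values from the pipeline. Rejecting on an optimizer failure is also a design assumption.

```python
from typing import Tuple


def accept_refinement(pre_rmse: float, post_rmse: float, optimizer_success: bool,
                      min_abs_gain_m: float = 0.001,
                      min_rel_gain: float = 0.01) -> Tuple[bool, str]:
    """Return (accepted, reason); a refinement counts only if it clearly lowers RMSE."""
    if not optimizer_success:
        return False, "optimizer reported failure"
    gain = pre_rmse - post_rmse
    if gain <= 0:
        return False, "RMSE did not decrease"
    if gain < min_abs_gain_m or gain < min_rel_gain * pre_rmse:
        return False, "no effective refinement (gain below threshold)"
    return True, "accepted"


print(accept_refinement(0.020, 0.0199, True))  # tiny gain: rejected
print(accept_refinement(0.020, 0.012, True))   # clear gain: accepted
```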
### Practical note
The previous troubleshooting section correctly explains one important failure mode (unit mismatch), but current evidence shows that **robust objective design and frame quality control** are now the primary bottlenecks for meaningful depth refinement gains.
### Optimization Stalls
If `refine_depth` shows `success: false` but `nfev` (the number of function evaluations) is high, the optimizer may have stalled in a flat region or a local minimum.
- **Check**: Look at `termination_message` in the JSON output.
- **Fix**: Try enabling `--use-confidence-weights`, or check whether the initial ArUco pose is too far off (reprojection error > 2.0 px).
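A short script can scan the output JSON for this pattern. The layout assumed here has a top-level object keyed by camera, and each entry holds a `refine_depth` object with `success`, `nfev` and `termination_message`. That layout and the `nfev` threshold of 100 are assumptions, so adjust them to the actual file. The sample messages are SciPy's standard `least_squares` termination texts.

```python
import json
from typing import List


def find_stalls(output_json: str, high_nfev: int = 100) -> List[str]:
    """Report cameras whose refinement failed after many function evaluations."""
    data = json.loads(output_json)
    reports = []
    for camera, entry in data.items():
        refine = entry.get("refine_depth") if isinstance(entry, dict) else None
        if not refine:
            continue
        if not refine.get("success", True) and refine.get("nfev", 0) >= high_nfev:
            message = refine.get("termination_message", "n/a")
            reports.append(f"{camera}: nfev={refine['nfev']} - {message}")
    return reports


# Hypothetical sample output for two cameras.
example = json.dumps({
    "cam_a": {"refine_depth": {"success": False, "nfev": 250,
                               "termination_message": "The maximum number of function evaluations is exceeded."}},
    "cam_b": {"refine_depth": {"success": True, "nfev": 12,
                               "termination_message": "`ftol` termination condition is satisfied."}},
})
for line in find_stalls(example):
    print(line)
```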