From ddb7054f96e8f891cd5f63888fba084870f0a98e Mon Sep 17 00:00:00 2001
From: crosstyan
Date: Sat, 7 Feb 2026 04:13:46 +0000
Subject: [PATCH] docs(calibration): update findings summary and troubleshooting for depth refinement

---
 .../docs/calibrate-extrinsics-workflow.md | 89 +++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/py_workspace/docs/calibrate-extrinsics-workflow.md b/py_workspace/docs/calibrate-extrinsics-workflow.md
index e3ce6ef..34bef94 100644
--- a/py_workspace/docs/calibrate-extrinsics-workflow.md
+++ b/py_workspace/docs/calibrate-extrinsics-workflow.md
@@ -101,3 +101,92 @@ uv run calibrate_extrinsics.py \
   --debug \
   --no-preview
 ```
+
+## Known Unexpected Behavior / Troubleshooting
+
+### Depth Refinement Failure (Unit Mismatch)
+
+**Symptoms:**
+- `depth_verify` reports extremely large RMSE values (e.g., > 1000).
+- `refine_depth` reports `success: false`, `iterations: 0`, and near-zero improvement.
+- The optimization fails to converge or produces nonsensical results.
+
+**Root Cause:**
+The ZED SDK `retrieve_measure(sl.MEASURE.DEPTH)` returns depth values in the unit defined by `InitParameters.coordinate_units`. The default is **MILLIMETERS**. However, the calibration system (extrinsics, marker geometry) operates in **METERS**.
+
+This scale mismatch (factor of 1000) causes the residuals in the optimization objective function to be massive, breaking the numerical stability of the L-BFGS-B solver.
+
+**Mitigation:**
+The `SVOReader` class in `aruco/svo_sync.py` explicitly converts the retrieved depth map to meters:
+```python
+# aruco/svo_sync.py
+return depth_data / 1000.0
+```
+This ensures that all geometric math downstream remains consistent in meters.
+
+**Diagnostic Check:**
+If you suspect a unit mismatch, check the `depth_verify` RMSE in the output JSON.
+- **Healthy:** RMSE < 0.5 (meters)
+- **Mismatch:** RMSE > 100 (likely millimeters)
+
+*Note: Confidence filtering (`--depth-confidence-threshold`) is orthogonal to this issue. A unit mismatch affects all valid pixels regardless of confidence.*
+
+## Findings Summary (2026-02-07 exhaustive search)
+
+This section summarizes the latest deep investigation across local code, outputs, and external docs.
+
+### Confirmed Facts
+
+1. **Marker geometry parquet is in meters**
+   - `aruco/markers/standard_box_markers_600mm.parquet` stores values around `0.3` (meters), not `300` (millimeters).
+   - `docs/marker-parquet-format.md` also documents meter-scale coordinates.
+
+2. **Depth unit contract is still fragile**
+   - ZED defaults to millimeters unless `InitParameters.coordinate_units` is explicitly set.
+   - Current reader path converts depth by dividing by `1000.0` in `aruco/svo_sync.py`.
+   - This works only if incoming depth is truly millimeters. It can become fragile if unit config changes elsewhere.
+
+3. **Observed runtime behavior still indicates refinement instability**
+   - Existing outputs (for example `output/aligned_refined_extrinsics*.json`) show very large `depth_verify.rmse`, often `refine_depth.success: false`, `iterations: 0`, and negligible improvement.
+   - This indicates that refinement quality is currently limited beyond the original mm↔m mismatch narrative.
+
+4. **Current refinement objective is not robust enough**
+   - Objective is plain squared depth residuals + simple regularization.
+   - It does **not** currently include robust loss (Huber/Soft-L1), confidence weighting in the objective, or strong convergence diagnostics.
+
+### Likely Contributors to Poor Refinement
+
+- Depth outliers are not sufficiently down-weighted in optimization.
+- Confidence map is used for verification filtering, but not as residual weights in the optimizer objective.
+- Representative frame choice uses the latest valid frame, not necessarily the best-quality frame.
+- Optimizer diagnostics are limited, making it hard to distinguish "real convergence" from "stuck at initialization".
+
+### Recommended Implementation Order (for next session)
+
+1. **Unit hardening (P0)**
+   - Explicitly set `init_params.coordinate_units = sl.UNIT.METER` in SVO reader.
+   - Remove or guard manual `/1000.0` conversion to avoid double-scaling risk.
+   - Add depth sanity logs (min/median/max sampled depth) under `--debug`.
+
+2. **Robust objective (P0)**
+   - Replace MSE-only residual with Huber (or Soft-L1) in meters.
+   - Add confidence-weighted depth residuals in objective function.
+   - Split translation/rotation regularization coefficients.
+
+3. **Frame quality selection (P1)**
+   - Replace "latest valid frame" with best-frame scoring:
+     - marker count (higher better)
+     - median reprojection error (lower better)
+     - valid depth ratio (higher better)
+
+4. **Diagnostics and acceptance gates (P1)**
+   - Log optimizer termination reason, gradient/step behavior, and effective valid points.
+   - Treat tiny RMSE changes as "no effective refinement" even if optimizer returns.
+
+5. **Benchmark matrix (P1)**
+   - Compare baseline vs robust loss vs robust+confidence vs robust+confidence+best-frame.
+   - Report per-camera pre/post RMSE, iteration count, and success/failure reason.
+
+### Practical note
+
+The previous troubleshooting section correctly explains one important failure mode (unit mismatch), but current evidence shows that **robust objective design and frame quality control** are now the primary bottlenecks for meaningful depth refinement gains.
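
Reviewer sketch for the "Unit hardening (P0)" item: one possible shape for the min/median/max depth sanity log. The function name `summarize_depth` and the `mm_median_threshold` cutoff of 50 are illustrative assumptions, not part of the patch or of the ZED SDK; the cutoff only encodes the rough expectation that scene depths in meters stay well below 50 while the same depths in millimeters do not.

```python
import math
from statistics import median
from typing import Iterable


def summarize_depth(samples: Iterable[float], mm_median_threshold: float = 50.0) -> dict:
    """Summarize sampled depth values and guess their unit (hypothetical helper).

    Non-finite and non-positive samples are treated as invalid and dropped.
    The unit guess is a heuristic: a median above the threshold is assumed
    to be millimeters, anything else meters.
    """
    valid = [float(d) for d in samples if math.isfinite(d) and d > 0.0]
    if not valid:
        return {"count": 0, "min": None, "median": None, "max": None,
                "unit_guess": "unknown"}
    med = median(valid)
    return {
        "count": len(valid),
        "min": min(valid),
        "median": med,
        "max": max(valid),
        "unit_guess": "millimeters" if med > mm_median_threshold else "meters",
    }


# Depth sampled at marker corners, in meters, with two invalid entries.
samples_m = [0.8, 1.2, float("nan"), 2.5, 0.0]
stats_m = summarize_depth(samples_m)
stats_mm = summarize_depth([d * 1000.0 for d in samples_m])
print(stats_m)
print(stats_mm)
```

Logging this summary under `--debug` before refinement starts would make a double-scaled or unscaled depth map visible immediately, independent of the RMSE check.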
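
Reviewer sketch for the "Robust objective (P0)" item: a minimal cost function that combines a Huber loss on depth residuals (in meters), per-point weights, and separate translation/rotation regularization. All names and default coefficients (`huber`, `refinement_cost`, `huber_delta_m`, `lambda_t`, `lambda_r`) are assumptions made for illustration and do not reflect the current objective in the repository. Weights are assumed to be already normalized to [0, 1], with 1 meaning fully trusted; how to derive them from the confidence map is left open.

```python
from typing import Sequence


def huber(residual: float, delta: float) -> float:
    """Huber loss: quadratic within +/-delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def refinement_cost(
    residuals_m: Sequence[float],
    weights: Sequence[float],
    delta_translation: Sequence[float],
    delta_rotation: Sequence[float],
    huber_delta_m: float = 0.05,   # assumed inlier scale: 5 cm
    lambda_t: float = 1.0,         # assumed translation regularization weight
    lambda_r: float = 1.0,         # assumed rotation regularization weight
) -> float:
    """Weighted mean Huber data term plus split regularization."""
    if len(residuals_m) != len(weights):
        raise ValueError("residuals and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0.0:
        raise ValueError("no usable depth points")
    data_term = sum(
        w * huber(r, huber_delta_m) for r, w in zip(residuals_m, weights)
    ) / total_weight
    reg_term = (
        lambda_t * sum(x * x for x in delta_translation)
        + lambda_r * sum(x * x for x in delta_rotation)
    )
    return data_term + reg_term


zero = (0.0, 0.0, 0.0)
# One inlier (1 cm) and one gross outlier (1 m), equally trusted.
cost_equal = refinement_cost([0.01, 1.0], [1.0, 1.0], zero, zero)
# Same residuals, but the outlier carries zero weight.
cost_masked = refinement_cost([0.01, 1.0], [1.0, 0.0], zero, zero)
print(cost_equal, cost_masked)
```

With plain squared loss the 1 m outlier would contribute 0.5 to the sum; under Huber with a 5 cm scale it contributes about 0.049, so a few bad depth pixels no longer dominate the objective.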
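
Reviewer sketch for the "Frame quality selection (P1)" item: one way to score candidate frames on the three listed criteria instead of taking the latest valid frame. The `FrameCandidate` fields, the scoring formula, and `reproj_scale_px` are hypothetical; the formula is only one of many reasonable ways to combine the criteria and would need tuning against real captures.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FrameCandidate:
    index: int                      # frame position in the recording
    marker_count: int               # higher is better
    median_reproj_error_px: float   # lower is better
    valid_depth_ratio: float        # in [0, 1], higher is better


def frame_score(c: FrameCandidate, reproj_scale_px: float = 1.0) -> float:
    """Reward marker count and valid depth, penalize reprojection error."""
    penalty = 1.0 + c.median_reproj_error_px / reproj_scale_px
    return c.marker_count * c.valid_depth_ratio / penalty


def pick_best_frame(candidates: Sequence[FrameCandidate]) -> FrameCandidate:
    """Return the highest-scoring candidate."""
    if not candidates:
        raise ValueError("no candidate frames")
    return max(candidates, key=frame_score)


candidates = [
    FrameCandidate(index=12, marker_count=4, median_reproj_error_px=0.5, valid_depth_ratio=0.9),
    FrameCandidate(index=30, marker_count=2, median_reproj_error_px=1.5, valid_depth_ratio=0.4),
]
best = pick_best_frame(candidates)
print(best.index, round(frame_score(best), 3))
```

Here the latest frame (index 30) loses to the earlier one because it sees fewer markers, has a larger reprojection error, and has less valid depth.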