# Robust Depth Refinement for Camera Extrinsics

## TL;DR

> **Quick Summary**: Replace the failing depth-based pose refinement pipeline with a robust optimizer (`scipy.optimize.least_squares` with soft-L1 loss), add unit hardening, confidence-weighted residuals, best-frame selection, rich diagnostics, and a benchmark matrix comparing configurations.
>
> **Deliverables**:
> - Unit-hardened depth retrieval (set `coordinate_units=METER`, guard against double conversion)
> - Robust optimization objective using `least_squares(method="trf", loss="soft_l1", f_scale=0.1)`
> - Confidence-weighted depth residuals (toggleable via CLI flag)
> - Best-frame selection replacing the naive "latest valid frame" heuristic
> - Rich optimizer diagnostics and acceptance gates
> - Benchmark matrix comparing baseline/robust/+confidence/+best-frame
> - Updated tests for all new functionality
>
> **Estimated Effort**: Medium (3-4 hours implementation)
> **Parallel Execution**: YES — 4 waves
> **Critical Path**: Task 1 (units) → Task 2 (robust optimizer) → Task 3 (confidence) → Task 5 (diagnostics) → Task 6 (benchmark)

---

## Context

### Original Request

Implement the 5 items from "Recommended Implementation Order" in `docs/calibrate-extrinsics-workflow.md`, plus research and choose the best optimization method for depth-based camera extrinsic refinement.

### Interview Summary

**Key Discussions**:
- Requirements were explicitly specified in the documentation (no interactive interview needed)
- Research confirmed `scipy.optimize.least_squares` is superior to `scipy.optimize.minimize` for this problem class

**Research Findings**:
- **freemocap/anipose** (production multi-camera calibration) uses exactly `least_squares(method="trf", loss=loss, f_scale=threshold)` for bundle adjustment — validating our approach
- **scipy docs** recommend `soft_l1` or `huber` for robust fitting; `f_scale` controls the inlier/outlier threshold
- **Current output JSONs** confirm catastrophic failure: RMSE 5000+ meters (`aligned_refined_extrinsics_fast.json`), RMSE ~11.6 m (`test_refine_current.json`), iterations=0/1, success=false across all cameras
- **Unit mismatch** is still active despite the `/1000.0` conversion — ZED defaults to millimeters, the code divides by 1000, but `coordinate_units=METER` is never set
- **Confidence map** is retrieved but only used in verify filtering, not in the optimizer objective

### Metis Review

**Identified Gaps** (addressed):
- Output JSON schema backward compatibility → New fields are additive only (existing fields preserved)
- Confidence weighting can interact with robust loss → Made toggleable, statistics logged
- Best-frame selection changes behavior → Deterministic scoring, old behavior available as fallback
- Zero valid points edge case → Explicit early exit with diagnostic
- Numerical pass/fail gate → Added RMSE threshold checks
- Regression guard → Default CLI behavior unchanged unless the user opts into new features

---

## Work Objectives

### Core Objective

Make depth-based extrinsic refinement actually work by fixing the unit mismatch, switching to a robust optimizer, incorporating confidence weighting, and selecting the best frame for refinement.

### Concrete Deliverables

- Modified `aruco/svo_sync.py` with unit hardening
- Rewritten `aruco/depth_refine.py` using `least_squares` with robust loss
- Updated `aruco/depth_verify.py` with a confidence weight extraction helper
- Updated `calibrate_extrinsics.py` with frame scoring, diagnostics, and new CLI flags
- New and updated tests in `tests/`
- Updated `docs/calibrate-extrinsics-workflow.md` documenting the new behavior

### Definition of Done

- [x] `uv run pytest` passes with 0 failures
- [x] Synthetic test: robust optimizer converges (success=True, nfev > 1) with injected outliers
- [x] Existing tests still pass (backward compatibility)
- [x] Benchmark matrix produces 4 comparable result records

### Must Have

- `coordinate_units = sl.UNIT.METER` set in SVOReader
- `least_squares` with `loss="soft_l1"` and `f_scale=0.1` as the default optimizer
- Confidence weighting via `--use-confidence-weights` flag
- Best-frame selection with deterministic scoring
- Optimizer diagnostics in output JSON and logs
- All changes covered by automated tests

### Must NOT Have (Guardrails)

- Must NOT change unrelated calibration logic (marker detection, PnP, pose averaging, alignment)
- Must NOT change file I/O formats or break the JSON schema (only additive fields)
- Must NOT introduce new dependencies beyond scipy/numpy already in use
- Must NOT implement multi-optimizer auto-selection or hyperparameter search
- Must NOT turn frame scoring into an ML quality model — simple weighted heuristic only
- Must NOT add premature abstractions or over-engineer the API
- Must NOT remove existing CLI flags or change their default behavior

---

## Verification Strategy

> **UNIVERSAL RULE: ZERO HUMAN INTERVENTION**
>
> ALL tasks in this plan MUST be verifiable WITHOUT any human action.
> Every criterion is verified by running `uv run pytest` or inspecting code.

### Test Decision

- **Infrastructure exists**: YES (pytest configured in pyproject.toml, tests/ directory)
- **Automated tests**: YES (tests-after, matching the existing project pattern)
- **Framework**: pytest (via `uv run pytest`)

### Agent-Executed QA Scenarios (MANDATORY — ALL tasks)

**Verification Tool by Deliverable Type:**

| Type | Tool | How Agent Verifies |
|------|------|-------------------|
| Python module changes | Bash (`uv run pytest`) | Run tests, assert 0 failures |
| New functions | Bash (`uv run pytest -k test_name`) | Run specific test, assert pass |
| CLI behavior | Bash (`uv run python calibrate_extrinsics.py --help`) | Verify new flags present |

---

## Execution Strategy

### Parallel Execution Waves

```
Wave 1 (Start Immediately):
├── Task 1: Unit hardening (svo_sync.py) [no dependencies]
└── Task 4: Best-frame selection (calibrate_extrinsics.py) [no dependencies]

Wave 2 (After Wave 1):
├── Task 2: Robust optimizer (depth_refine.py) [depends: 1]
├── Task 3: Confidence weighting (depth_verify.py + depth_refine.py) [depends: 2]
└── Task 5: Diagnostics and acceptance gates [depends: 2]

Wave 3 (After Wave 2):
└── Task 6: Benchmark matrix [depends: 2, 3, 4, 5]

Wave 4 (After All):
└── Task 7: Documentation update [depends: all]

Critical Path: Task 1 → Task 2 → Task 3 → Task 5 → Task 6
```

### Dependency Matrix

| Task | Depends On | Blocks | Can Parallelize With |
|------|------------|--------|---------------------|
| 1 | None | 2, 3 | 4 |
| 2 | 1 | 3, 5, 6 | - |
| 3 | 2 | 6 | 5 |
| 4 | None | 6 | 1 |
| 5 | 2 | 6 | 3 |
| 6 | 2, 3, 4, 5 | 7 | - |
| 7 | All | None | - |

### Agent Dispatch Summary

| Wave | Tasks | Recommended Agents |
|------|-------|-------------------|
| 1 | 1, 4 | `category="quick"` for T1; `category="unspecified-low"` for T4 |
| 2 | 2, 3, 5 | `category="deep"` for T2; `category="quick"` for T3, T5 |
| 3 | 6 | `category="unspecified-low"` |
| 4 | 7 | `category="writing"` |

---

## TODOs

- [x] 1. Unit Hardening (P0)

**What to do**:
- In `aruco/svo_sync.py`, add `init_params.coordinate_units = sl.UNIT.METER` in the `SVOReader.__init__` method, right after `init_params.set_from_svo_file(path)` (around line 42)
- Guard the existing `/1000.0` conversion: check whether `coordinate_units` is already METER. If METER is set, skip the division. If not set, or set to MILLIMETER, apply the division. Log a warning whenever the division is applied as a fallback
- Add depth sanity logging under `--debug` mode: after retrieving depth, log `min/median/max/p95` of valid depth values. This goes in the `_retrieve_depth` method
- Write a test that verifies the unit-hardened path doesn't double-convert

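
The guard from the second bullet can be sketched as a small pure function (`normalize_depth_to_meters` is a hypothetical helper name; the real code operates on the SDK's depth matrix inside `_retrieve_depth`):

```python
import numpy as np

def normalize_depth_to_meters(depth: np.ndarray, units_are_meters: bool) -> np.ndarray:
    """Convert a depth map to meters, skipping the division when the SDK
    already delivered meters (coordinate_units=METER), so values are
    never double-converted."""
    if units_are_meters:
        return depth
    # Fallback path, matching the old behavior: assume millimeters.
    return depth / 1000.0

depth_mm = np.array([500.0, 1500.0, 3000.0])
depth_m = normalize_depth_to_meters(depth_mm, units_are_meters=False)  # converted once
same = normalize_depth_to_meters(depth_m, units_are_meters=True)       # unchanged
```

The double-conversion test then simply asserts that passing an already-metric map through the guard leaves it untouched.
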
**Must NOT do**:
- Do NOT change depth retrieval for confidence maps
- Do NOT modify the `grab_synced()` or `grab_all()` methods
- Do NOT add new CLI parameters for this task

**Recommended Agent Profile**:
- **Category**: `quick`
  - Reason: Small, focused change in one file + one test file
- **Skills**: [`git-master`]
  - `git-master`: Atomic commit of the unit hardening change

**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 4)
- **Blocks**: Tasks 2, 3
- **Blocked By**: None

**References**:

**Pattern References** (existing code to follow):
- `aruco/svo_sync.py:40-44` — Current `init_params` setup where `coordinate_units` must be added
- `aruco/svo_sync.py:180-189` — Current `_retrieve_depth` method with the `/1000.0` conversion to modify
- `aruco/svo_sync.py:191-196` — Confidence retrieval pattern (do NOT modify, but understand adjacency)

**API/Type References** (contracts to implement against):
- ZED SDK `InitParameters.coordinate_units` — Set to `sl.UNIT.METER`
- `loguru.logger` — Used project-wide for debug logging

**Test References** (testing patterns to follow):
- `tests/test_depth_verify.py:36-66` — Test pattern using synthetic depth maps (follow this style)
- `tests/test_depth_refine.py:21-39` — Test pattern with synthetic K matrix and depth maps

**Documentation References**:
- `docs/calibrate-extrinsics-workflow.md:116-132` — Documents the unit mismatch problem and mitigation strategy
- `docs/calibrate-extrinsics-workflow.md:166-169` — Specifies the exact implementation steps for unit hardening

**Acceptance Criteria**:

- [ ] `init_params.coordinate_units = sl.UNIT.METER` is set in `SVOReader.__init__` before `cam.open()`
- [ ] The `/1000.0` division in `_retrieve_depth` is guarded (only applied if units are NOT meters)
- [ ] Debug logging of depth statistics (min/median/max) is added to `_retrieve_depth` when depth mode is active
- [ ] `uv run pytest tests/test_depth_refine.py tests/test_depth_verify.py -q` → all pass (no regressions)

**Agent-Executed QA Scenarios:**

```
Scenario: Verify unit hardening doesn't break existing tests
Tool: Bash (uv run pytest)
Preconditions: All dependencies installed
Steps:
1. Run: uv run pytest tests/test_depth_refine.py tests/test_depth_verify.py -q
2. Assert: exit code 0
3. Assert: output contains "passed" and no "FAILED"
Expected Result: All existing tests pass
Evidence: Terminal output captured

Scenario: Verify coordinate_units is set in code
Tool: Bash (grep)
Preconditions: File modified
Steps:
1. Run: grep -n "coordinate_units" aruco/svo_sync.py
2. Assert: output contains "UNIT.METER" or "METER"
Expected Result: Unit setting is present
Evidence: Grep output
```

**Commit**: YES
- Message: `fix(svo): harden depth units — set coordinate_units=METER, guard /1000 conversion`
- Files: `aruco/svo_sync.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/ -q`

---

- [x] 2. Robust Optimizer — Replace MSE with `least_squares` + Soft-L1 Loss (P0)

**What to do**:
- **Rewrite `depth_residual_objective`** → Replace it with a **residual vector function** `depth_residuals(params, ...)` that returns an array of residuals (not a scalar cost). Each element is `(z_measured - z_predicted)` for one marker corner. This is what `least_squares` expects.
- **Add regularization as pseudo-residuals**: Append `[reg_weight_rot * delta_rvec, reg_weight_trans * delta_tvec]` to the residual vector. This naturally penalizes deviation from the initial pose. Split into separate rotation and translation regularization weights (default: `reg_rot=0.1`, `reg_trans=1.0` — translation is more tightly regularized at meter scale).
- **Replace `minimize(method="L-BFGS-B")` with `least_squares(method="trf", loss="soft_l1", f_scale=0.1)`**:
  - `method="trf"` — Trust Region Reflective, handles bounds naturally
  - `loss="soft_l1"` — Smooth robust loss, downweights outliers beyond `f_scale`
  - `f_scale=0.1` — Residuals >0.1 m are treated as outliers (matches ZED depth noise of ~1-5 cm)
  - `bounds` — Same ±5°/±5 cm bounds, expressed as a `(lower_bounds_array, upper_bounds_array)` tuple
  - `x_scale="jac"` — Automatic Jacobian-based scaling (prevents ill-conditioning)
  - `max_nfev=200` — Maximum function evaluations
- **Update the `refine_extrinsics_with_depth` signature**: Add parameters for `loss`, `f_scale`, `reg_rot`, `reg_trans`. Keep backward-compatible defaults. Return an enriched stats dict including: `termination_message`, `nfev`, `optimality`, `active_mask`, `cost`.
- **Handle zero residuals**: If the residual vector is empty (no valid depth points), return the initial pose unchanged with stats indicating `"reason": "no_valid_depth_points"`.
- **Maintain backward-compatible scalar cost reporting**: Compute `initial_cost` and `final_cost` from the residual vector for comparison with the old output format.

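
A minimal end-to-end sketch of the residual-vector + pseudo-residual pattern. The 1-parameter depth model and `predict` function are toy stand-ins for the real projection of marker corners; the `least_squares` keyword arguments are exactly those listed above:

```python
import numpy as np
from scipy.optimize import least_squares

def depth_residuals(params, z_measured, z_predict_fn, params0, reg_rot=0.1, reg_trans=1.0):
    # Data residuals: one entry per marker corner (z_measured - z_predicted).
    data = z_measured - z_predict_fn(params)
    # Regularization pseudo-residuals: penalize drift from the initial pose,
    # rotation (params[:3]) and translation (params[3:]) weighted separately.
    delta = params - params0
    reg = np.concatenate([reg_rot * delta[:3], reg_trans * delta[3:]])
    return np.concatenate([data, reg])

def predict(p):
    return 2.0 + p[5]  # toy model: only the t_z parameter affects predicted depth

rng = np.random.default_rng(0)
z_meas = 2.03 + rng.normal(0.0, 0.01, 40)  # true t_z offset: 0.03 m, ~1 cm noise
z_meas[:8] += 1.0                          # 20% gross depth outliers
x0 = np.zeros(6)

res = least_squares(
    depth_residuals, x0, args=(z_meas, predict, x0),
    method="trf", loss="soft_l1", f_scale=0.1,
    bounds=(-0.1 * np.ones(6), 0.1 * np.ones(6)),
    x_scale="jac", max_nfev=200,
)
# res.x[5] recovers the ~0.03 m offset (with a small robust-loss bias)
# instead of being dragged toward the +1 m outliers, as a linear loss would be.
```

Note that `least_squares` applies the robust loss to *all* residuals, including the appended regularization terms; with `f_scale=0.1` and bounded pose deltas this is benign, but it is worth keeping in mind when tuning `reg_rot`/`reg_trans`.
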
**Must NOT do**:
- Do NOT change `extrinsics_to_params` or `params_to_extrinsics` (the Rodrigues parameterization is correct)
- Do NOT modify `depth_verify.py` in this task
- Do NOT add confidence weighting here (that's Task 3)
- Do NOT add CLI flags here (that's Task 5)

**Recommended Agent Profile**:
- **Category**: `deep`
  - Reason: Core algorithmic change; requires understanding of optimization theory and careful residual construction
- **Skills**: []
  - No specialized skills needed — pure Python/numpy/scipy work

**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Wave 2 (sequential after Wave 1)
- **Blocks**: Tasks 3, 5, 6
- **Blocked By**: Task 1

**References**:

**Pattern References** (existing code to follow):
- `aruco/depth_refine.py:19-47` — Current `depth_residual_objective` function to REPLACE
- `aruco/depth_refine.py:50-112` — Current `refine_extrinsics_with_depth` function to REWRITE
- `aruco/depth_refine.py:1-16` — Import block and helper functions (keep `extrinsics_to_params`, `params_to_extrinsics`)
- `aruco/depth_verify.py:27-67` — `compute_depth_residual` function — the per-point residual computation called from the objective. Understand its contract: returns `float(z_measured - z_predicted)` or `None`.

**API/Type References**:
- `scipy.optimize.least_squares` — [scipy docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html): `fun(x, *args) -> residuals_array`; parameters: `method="trf"`, `loss="soft_l1"`, `f_scale=0.1`, `bounds=(lb, ub)`, `x_scale="jac"`, `max_nfev=200`
- Return type: `OptimizeResult` with attributes: `.x`, `.cost`, `.fun`, `.jac`, `.grad`, `.optimality`, `.active_mask`, `.nfev`, `.njev`, `.status`, `.message`, `.success`

**External References** (production examples):
- `freemocap/anipose` bundle_adjust method — Uses `least_squares(error_fun, x0, jac_sparsity=jac_sparse, f_scale=f_scale, x_scale="jac", loss=loss, ftol=ftol, method="trf", tr_solver="lsmr")` for multi-camera calibration. Key pattern: the residual function returns per-point reprojection errors.
- scipy Context7 docs — Example shows `least_squares(fun, x0, loss='soft_l1', f_scale=0.1, args=(t_train, y_train))` where `fun` returns a residual vector

**Test References**:
- `tests/test_depth_refine.py` — ALL 4 existing tests must still pass. They test: roundtrip, no-change convergence, offset correction, and bounds respect. The new optimizer must satisfy these same properties.

**Acceptance Criteria**:

- [ ] `from scipy.optimize import least_squares` replaces `from scipy.optimize import minimize`
- [ ] `depth_residuals()` returns an `np.ndarray` (vector), not a scalar float
- [ ] `least_squares(method="trf", loss="soft_l1", f_scale=0.1)` is the optimizer call
- [ ] Regularization is split: separate `reg_rot` and `reg_trans` weights, appended as pseudo-residuals
- [ ] Stats dict includes: `termination_message`, `nfev`, `optimality`, `cost`
- [ ] Zero-residual case returns the initial pose with `reason: "no_valid_depth_points"`
- [ ] `uv run pytest tests/test_depth_refine.py -q` → all 4 existing tests pass
- [ ] New test: synthetic data with 30% outlier depths → robust optimizer converges (success=True, nfev > 1) with a lower median residual than pure MSE would achieve

**Agent-Executed QA Scenarios:**

```
Scenario: All existing depth_refine tests pass after rewrite
Tool: Bash (uv run pytest)
Preconditions: Task 1 completed, aruco/depth_refine.py rewritten
Steps:
1. Run: uv run pytest tests/test_depth_refine.py -v
2. Assert: exit code 0
3. Assert: output contains "4 passed"
Expected Result: All 4 existing tests pass
Evidence: Terminal output captured

Scenario: Robust optimizer handles outliers better than MSE
Tool: Bash (uv run pytest)
Preconditions: New test added
Steps:
1. Run: uv run pytest tests/test_depth_refine.py::test_robust_loss_handles_outliers -v
2. Assert: exit code 0
3. Assert: test passes
Expected Result: With 30% outliers, the robust optimizer has a lower median absolute residual
Evidence: Terminal output captured
```

**Commit**: YES
- Message: `feat(refine): replace L-BFGS-B MSE with least_squares soft-L1 robust optimizer`
- Files: `aruco/depth_refine.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/test_depth_refine.py -q`

---

- [x] 3. Confidence-Weighted Depth Residuals (P0)

**What to do**:
- **Add a confidence weight extraction helper** to `aruco/depth_verify.py`: Create a function `get_confidence_weight(confidence_map, u, v, confidence_thresh=50) -> float` that returns a normalized weight in [0, 1]. ZED confidence is in [1, 100], where higher means LESS confident. Normalize as `max(0, confidence_thresh - conf_value) / confidence_thresh`. Values at or above the threshold → weight 0. Clamp to `[eps, 1.0]` where `eps=1e-6`.
- **Update `depth_residuals()` in `aruco/depth_refine.py`**: Accept optional `confidence_map` and `confidence_thresh` parameters. If `confidence_map` is provided, multiply each depth residual by `sqrt(weight)` before returning. This implements weighted least squares within the `least_squares` framework.
- **Update the `refine_extrinsics_with_depth` signature**: Add `confidence_map=None`, `confidence_thresh=50` parameters. Pass them through to `depth_residuals()`.
- **Update `calibrate_extrinsics.py`**: Pass `confidence_map=frame.confidence_map` and `confidence_thresh=depth_confidence_threshold` to `refine_extrinsics_with_depth` when confidence weighting is requested
- **Add a `--use-confidence-weights/--no-confidence-weights` CLI flag** (default: False for backward compatibility)
- **Log confidence statistics** under `--debug`: After computing weights, log `n_zero_weight`, `mean_weight`, `median_weight`

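
The helper described in the first bullet can be sketched as follows (normalization and clamp exactly as specified; the `sqrt(weight)` line shows the standard weighted-least-squares scaling from the second bullet):

```python
import numpy as np

EPS = 1e-6

def get_confidence_weight(confidence_map, u, v, confidence_thresh=50):
    """Map a ZED confidence value (range [1, 100], HIGHER = LESS confident)
    to a weight in [EPS, 1.0]; values at or above the threshold collapse to EPS."""
    conf = float(confidence_map[v, u])  # note (row, col) = (v, u) indexing
    weight = max(0.0, confidence_thresh - conf) / confidence_thresh
    return float(np.clip(weight, EPS, 1.0))

conf_map = np.array([[1.0, 25.0],
                     [50.0, 100.0]])
w_good = get_confidence_weight(conf_map, 0, 0)  # (50-1)/50 = 0.98
w_mid = get_confidence_weight(conf_map, 1, 0)   # (50-25)/50 = 0.5
w_bad = get_confidence_weight(conf_map, 1, 1)   # 100 ≥ threshold → EPS

# Scaling residuals by sqrt(weight) scales the squared cost by the weight
# itself — standard weighted least squares inside least_squares.
weighted_residual = 0.2 * np.sqrt(w_good)
```

Keeping the floor at `EPS` rather than exactly zero avoids degenerate all-zero residual vectors when every point in a frame is low-confidence.
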
**Must NOT do**:
- Do NOT change the verification logic in `verify_extrinsics_with_depth` (it already uses confidence correctly)
- Do NOT change confidence semantics (higher ZED value = less confident)
- Do NOT make confidence weighting the default behavior

**Recommended Agent Profile**:
- **Category**: `quick`
  - Reason: Adding parameters and weight multiplication — straightforward plumbing
- **Skills**: []

**Parallelization**:
- **Can Run In Parallel**: NO (depends on Task 2)
- **Parallel Group**: Wave 2 (after Task 2)
- **Blocks**: Task 6
- **Blocked By**: Task 2

**References**:

**Pattern References**:
- `aruco/depth_verify.py:82-96` — Existing confidence handling pattern (filtering, NOT weighting). Follow these semantics but produce a continuous weight instead of a binary skip
- `aruco/depth_verify.py:93-95` — ZED confidence semantics: "Higher confidence value means LESS confident... Range [1, 100], where 100 is typically occlusion/invalid"
- `aruco/depth_refine.py` — Updated in Task 2 with the `depth_residuals()` function. Add the `confidence_map` parameter here
- `calibrate_extrinsics.py:136-148` — Current call site for `refine_extrinsics_with_depth`. Add confidence_map/thresh forwarding

**Test References**:
- `tests/test_depth_verify.py:69-84` — Test pattern for `compute_marker_corner_residuals`. Follow it for the confidence weight test

**Acceptance Criteria**:

- [ ] `get_confidence_weight()` function exists in `depth_verify.py`
- [ ] Confidence weighting is off by default (backward compatible)
- [ ] `--use-confidence-weights` flag exists in the CLI
- [ ] Low-confidence points have lower influence on optimization (verified by test)
- [ ] `uv run pytest tests/ -q` → all pass

**Agent-Executed QA Scenarios:**

```
Scenario: Confidence weighting reduces outlier influence
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_depth_refine.py::test_confidence_weighting -v
2. Assert: exit code 0
Expected Result: With low-confidence outlier points, the weighted optimizer ignores them
Evidence: Terminal output

Scenario: CLI flag exists
Tool: Bash
Steps:
1. Run: uv run python calibrate_extrinsics.py --help | grep -i confidence-weight
2. Assert: output contains "--use-confidence-weights"
Expected Result: Flag is available
Evidence: Help text
```

**Commit**: YES
- Message: `feat(refine): add confidence-weighted depth residuals with --use-confidence-weights flag`
- Files: `aruco/depth_verify.py`, `aruco/depth_refine.py`, `calibrate_extrinsics.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/ -q`

---

- [x] 4. Best-Frame Selection (P1)

**What to do**:
- **Create a `score_frame_quality()` function** in `calibrate_extrinsics.py` (or a new `aruco/frame_scoring.py` if cleaner). The function takes `n_markers: int`, `reproj_error: float`, `depth_map: np.ndarray`, `marker_corners_world: Dict[int, np.ndarray]`, `T_world_cam: np.ndarray`, `K: np.ndarray` and returns a float score (higher = better).
- **Scoring formula**: `score = w_markers * n_markers + w_reproj * (1 / (reproj_error + eps)) + w_depth * valid_depth_ratio`
  - `w_markers = 1.0` — more markers = better constraint
  - `w_reproj = 5.0` — lower reprojection error = more accurate PnP
  - `w_depth = 3.0` — higher ratio of valid depth at marker locations = better depth signal
  - `valid_depth_ratio = n_valid_depths / n_total_corners`
  - `eps = 1e-6` to avoid division by zero
- **Replace the "last valid frame" logic** in `calibrate_extrinsics.py`: Instead of overwriting `verification_frames[serial]` every time (lines 467-471), track ALL valid frames per camera with their scores. After the processing loop, select the frame with the highest score.
- **Log the selected frame**: Under `--debug`, log the chosen frame index, score, and component breakdown for each camera
- **Ensure deterministic tiebreaking**: If scores are equal, pick the frame with the lower frame_index (earliest)
- **Keep frame storage bounded**: Store at most `max_stored_frames=10` candidates per camera (configurable), keeping the top-scoring ones

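
The scoring formula and deterministic selection can be sketched as below. For brevity the valid-depth ratio inputs are passed in pre-computed; the real function would derive them from the depth map and marker corners:

```python
def score_frame_quality(n_markers, reproj_error, n_valid_depths, n_total_corners,
                        w_markers=1.0, w_reproj=5.0, w_depth=3.0, eps=1e-6):
    """Higher score = better frame: more markers, lower reprojection error,
    more valid depth at the marker corners."""
    valid_depth_ratio = n_valid_depths / max(n_total_corners, 1)
    return (w_markers * n_markers
            + w_reproj * (1.0 / (reproj_error + eps))
            + w_depth * valid_depth_ratio)

candidates = [
    {"frame_index": 3, "n_markers": 2, "reproj": 0.8, "valid": 6, "total": 8},
    {"frame_index": 7, "n_markers": 4, "reproj": 0.5, "valid": 16, "total": 16},
]
# Deterministic selection: highest score wins; ties break to the LOWEST
# frame_index, hence the negated index in the sort key.
best = max(candidates, key=lambda f: (
    score_frame_quality(f["n_markers"], f["reproj"], f["valid"], f["total"]),
    -f["frame_index"],
))
```

Here the second candidate wins: more markers, tighter PnP, and full depth coverage all push its score higher.
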
**Must NOT do**:
- Do NOT add ML-based frame scoring
- Do NOT change the frame grabbing/syncing logic
- Do NOT add new dependencies

**Recommended Agent Profile**:
- **Category**: `unspecified-low`
  - Reason: New functionality, but a straightforward heuristic
- **Skills**: []

**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 1)
- **Blocks**: Task 6
- **Blocked By**: None

**References**:

**Pattern References**:
- `calibrate_extrinsics.py:463-471` — Current "last valid frame" logic to REPLACE. Currently: `verification_frames[serial] = {"frame": frame, "ids": ids, "corners": corners}`
- `calibrate_extrinsics.py:452-478` — Full frame processing context (pose estimation, accumulation, frame caching)
- `aruco/depth_verify.py:27-67` — `compute_depth_residual` can be used to check for valid depth at marker locations when scoring

**Test References**:
- `tests/test_depth_cli_postprocess.py` — Test pattern for calibrate_extrinsics functions

**Acceptance Criteria**:

- [ ] `score_frame_quality()` function exists and returns a float
- [ ] The best frame is selected (not the last frame) for each camera
- [ ] Scoring is deterministic (same inputs → same selected frame)
- [ ] Frame selection metadata is logged under `--debug`
- [ ] `uv run pytest tests/ -q` → all pass (no regressions)

**Agent-Executed QA Scenarios:**

```
Scenario: Frame scoring is deterministic
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_frame_scoring.py -v
2. Assert: exit code 0
Expected Result: Same inputs always produce the same score and selection
Evidence: Terminal output

Scenario: Higher marker count increases score
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_frame_scoring.py::test_more_markers_higher_score -v
2. Assert: exit code 0
Expected Result: A frame with more markers scores higher
Evidence: Terminal output
```

**Commit**: YES
- Message: `feat(calibrate): replace naive frame selection with quality-scored best-frame`
- Files: `calibrate_extrinsics.py`, `tests/test_frame_scoring.py`
- Pre-commit: `uv run pytest tests/ -q`

---

- [x] 5. Diagnostics and Acceptance Gates (P1)

**What to do**:
- **Enrich the `refine_extrinsics_with_depth` stats dict**: The `least_squares` result (from Task 2) already provides `.status`, `.message`, `.nfev`, `.njev`, `.optimality`, `.active_mask`. Surface these in the returned stats dict as: `termination_status` (int), `termination_message` (str), `nfev` (int), `njev` (int), `optimality` (float), `n_active_bounds` (int, count of parameters at bound limits).
- **Add an effective valid-points count**: Log how many marker corners had valid (finite, positive) depth, and how many were used after confidence filtering. Add to stats: `n_depth_valid`, `n_confidence_filtered`.
- **Add an RMSE improvement gate**: If `improvement_rmse < 1e-4` AND `nfev > 5`, log a WARNING: "Refinement converged with negligible improvement — consider checking depth data quality"
- **Add a failure diagnostic**: If `success == False` or `nfev <= 1`, log a WARNING with the termination message and suggest checking depth unit consistency
- **Log optimizer progress under `--debug`**: Before and after optimization, log: initial cost, final cost, delta_rotation, delta_translation, termination message, number of function evaluations
- **Surface diagnostics in the JSON output**: Add fields to the `refine_depth` dict in the output JSON: `termination_status`, `termination_message`, `nfev`, `n_valid_points`, `loss_function`, `f_scale`

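
A sketch of mapping the `OptimizeResult` fields onto the stats dict (a trivial one-parameter problem stands in for the real refinement):

```python
import numpy as np
from scipy.optimize import least_squares

# Toy residuals: minimizing (p-1)^2 + (0.5p)^2 has its optimum at p = 0.8.
res = least_squares(lambda p: np.array([p[0] - 1.0, 0.5 * p[0]]),
                    np.zeros(1), method="trf")

stats = {
    "termination_status": int(res.status),      # positive = converged, 0 = max_nfev hit
    "termination_message": str(res.message),
    "nfev": int(res.nfev),
    "njev": int(res.njev),
    "optimality": float(res.optimality),        # infinity norm of the gradient
    "n_active_bounds": int(np.count_nonzero(res.active_mask)),  # params pinned at bounds
}

# The failure gate from the bullets above:
if not res.success or res.nfev <= 1:
    print("WARNING: refinement failed —", stats["termination_message"])
```

Since every field is a plain int/float/str, the dict drops straight into the additive `refine_depth` JSON section without custom serialization.
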
|
|
|
|
|
|
|
|
|
|
**Must NOT do**:
|
|
|
|
|
- Do NOT add automated "redo with different params" logic
|
|
|
|
|
- Do NOT add email/notification alerts
|
|
|
|
|
- Do NOT change the optimization algorithm or parameters (already done in Task 2)
|
|
|
|
|
|
|
|
|
|
**Recommended Agent Profile**:
|
|
|
|
|
- **Category**: `quick`
|
|
|
|
|
- Reason: Adding logging and dict fields — no algorithmic changes
|
|
|
|
|
- **Skills**: []
|
|
|
|
|
|
|
|
|
|
**Parallelization**:
|
|
|
|
|
- **Can Run In Parallel**: YES (with Task 3)
|
|
|
|
|
- **Parallel Group**: Wave 2
|
|
|
|
|
- **Blocks**: Task 6
|
|
|
|
|
- **Blocked By**: Task 2
|
|
|
|
|
|
|
|
|
|
**References**:
|
|
|
|
|
|
|
|
|
|
**Pattern References**:
|
|
|
|
|
- `aruco/depth_refine.py:103-111` — Current stats dict construction (to EXTEND, not replace)
|
|
|
|
|
- `calibrate_extrinsics.py:159-181` — Current refinement result logging and JSON field assignment
|
|
|
|
|
- `loguru.logger` — Project uses loguru for structured logging
|
|
|
|
|
|
|
|
|
|
**API/Type References**:
|
|
|
|
|
- `scipy.optimize.OptimizeResult` — `.status` (int: 1=convergence, 0=max_nfev, -1=improper), `.message` (str), `.nfev`, `.njev`, `.optimality` (gradient infinity norm)
|
|
|
|
|
|
|
|
|
|
**Acceptance Criteria**:
- [ ] Stats dict contains: `termination_status`, `termination_message`, `nfev`, `n_valid_points`
- [ ] Output JSON `refine_depth` section contains diagnostic fields
- [ ] WARNING log emitted when improvement < 1e-4 with nfev > 5
- [ ] WARNING log emitted when success=False or nfev <= 1
- [ ] `uv run pytest tests/ -q` → all pass

**Agent-Executed QA Scenarios:**

```
Scenario: Diagnostics present in refine stats
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_depth_refine.py -v
2. Assert: All tests pass
3. Check that stats dict from refine function contains "termination_message" key
Expected Result: Diagnostics are in stats output
Evidence: Terminal output
```

**Commit**: YES
- Message: `feat(refine): add rich optimizer diagnostics and acceptance gates`
- Files: `aruco/depth_refine.py`, `calibrate_extrinsics.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/ -q`

---

- [x] 6. Benchmark Matrix (P1)

**What to do**:
- **Add `--benchmark-matrix` flag** to `calibrate_extrinsics.py` CLI
- **When enabled**, run the depth refinement pipeline 4 times per camera with different configurations:
  1. **baseline**: `loss="linear"` (no robust loss), no confidence weights
  2. **robust**: `loss="soft_l1"`, `f_scale=0.1`, no confidence weights
  3. **robust+confidence**: `loss="soft_l1"`, `f_scale=0.1`, confidence weighting ON
  4. **robust+confidence+best-frame**: Same as #3 but using best-frame selection
- **Output**: For each configuration, report per-camera: pre-refinement RMSE, post-refinement RMSE, improvement, iteration count, success/failure, termination reason
- **Format**: Print a formatted table to stdout (using click.echo) AND save to a benchmark section in the output JSON
- **Implementation**: Create a helper function `run_benchmark_matrix(T_initial, marker_corners_world, depth_map, K, confidence_map, ...)` that returns a list of result dicts

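One possible shape for the helper, under stated assumptions: the planned signature takes the refinement inputs (`T_initial`, `marker_corners_world`, depth map, `K`, confidence map) directly, whereas this sketch abstracts them behind a `refine_fn` callable plus `**shared_inputs`; the config keys and stats keys (`rmse_pre`, `rmse_post`, `termination_message`, ...) and the table layout are all illustrative.

```python
# The four configurations compared by --benchmark-matrix (keys are illustrative).
BENCHMARK_CONFIGS = [
    {"name": "baseline",                     "loss": "linear",  "f_scale": 1.0, "use_confidence": False, "best_frame": False},
    {"name": "robust",                       "loss": "soft_l1", "f_scale": 0.1, "use_confidence": False, "best_frame": False},
    {"name": "robust+confidence",            "loss": "soft_l1", "f_scale": 0.1, "use_confidence": True,  "best_frame": False},
    {"name": "robust+confidence+best-frame", "loss": "soft_l1", "f_scale": 0.1, "use_confidence": True,  "best_frame": True},
]

def run_benchmark_matrix(refine_fn, **shared_inputs):
    """Run refine_fn once per configuration; refine_fn is assumed to return a
    stats dict with rmse_pre / rmse_post / nfev / success / termination_message."""
    rows = []
    for cfg in BENCHMARK_CONFIGS:
        stats = refine_fn(loss=cfg["loss"], f_scale=cfg["f_scale"],
                          use_confidence=cfg["use_confidence"],
                          best_frame=cfg["best_frame"], **shared_inputs)
        rows.append({
            "config": cfg["name"],
            "rmse_pre": stats["rmse_pre"],
            "rmse_post": stats["rmse_post"],
            "improvement": stats["rmse_pre"] - stats["rmse_post"],
            "nfev": stats["nfev"],
            "success": stats["success"],
            "termination": stats["termination_message"],
        })
    return rows

def format_benchmark_table(rows):
    """Render the result dicts as a plain-text table (for click.echo)."""
    header = f"{'config':<30} {'rmse_pre':>9} {'rmse_post':>9} {'improve':>8} {'nfev':>5} {'ok':>5}"
    lines = [header, "-" * len(header)]
    for r in rows:
        lines.append(f"{r['config']:<30} {r['rmse_pre']:>9.4f} {r['rmse_post']:>9.4f} "
                     f"{r['improvement']:>8.4f} {r['nfev']:>5d} {str(r['success']):>5}")
    return "\n".join(lines)
```

Returning plain dicts keeps the same rows reusable for both the stdout table and the `benchmark` section of the output JSON.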
**Must NOT do**:
- Do NOT implement automated configuration tuning
- Do NOT add visualization/plotting dependencies
- Do NOT change the default (non-benchmark) codepath behavior

**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: Orchestration code, calling existing functions with different params
- **Skills**: []

**Parallelization**:
- **Can Run In Parallel**: NO (depends on all previous tasks)
- **Parallel Group**: Wave 3 (after all)
- **Blocks**: Task 7
- **Blocked By**: Tasks 2, 3, 4, 5

**References**:

**Pattern References**:
- `calibrate_extrinsics.py:73-196` — `apply_depth_verify_refine_postprocess` function. The benchmark matrix calls this logic with varied parameters
- `aruco/depth_refine.py` — Updated `refine_extrinsics_with_depth` with `loss`, `f_scale`, `confidence_map` params

**Acceptance Criteria**:
- [ ] `--benchmark-matrix` flag exists in CLI
- [ ] When enabled, 4 configurations are run per camera
- [ ] Output table is printed to stdout
- [ ] Benchmark results are in output JSON under `benchmark` key
- [ ] `uv run pytest tests/ -q` → all pass

**Agent-Executed QA Scenarios:**

```
Scenario: Benchmark flag in CLI help
Tool: Bash
Steps:
1. Run: uv run python calibrate_extrinsics.py --help | grep benchmark
2. Assert: output contains "--benchmark-matrix"
Expected Result: Flag is present
Evidence: Help text output
```

**Commit**: YES
- Message: `feat(calibrate): add --benchmark-matrix for comparing refinement configurations`
- Files: `calibrate_extrinsics.py`, `tests/test_benchmark.py`
- Pre-commit: `uv run pytest tests/ -q`

---

- [x] 7. Documentation Update

**What to do**:
- Update `docs/calibrate-extrinsics-workflow.md`:
  - Add new CLI flags: `--use-confidence-weights`, `--benchmark-matrix`
  - Update "Depth Verification & Refinement" section with new optimizer details
  - Update "Refinement" section: document `least_squares` with `soft_l1` loss, `f_scale`, confidence weighting
  - Add "Best-Frame Selection" section explaining the scoring formula
  - Add "Diagnostics" section documenting new output JSON fields
  - Update "Example Workflow" commands to show new flags
  - Mark the "Known Unexpected Behavior" unit mismatch section as RESOLVED with the fix description

**Must NOT do**:
- Do NOT rewrite unrelated documentation sections
- Do NOT add tutorial-style content

**Recommended Agent Profile**:
- **Category**: `writing`
- Reason: Pure documentation writing
- **Skills**: []

**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Wave 4 (final)
- **Blocks**: None
- **Blocked By**: All previous tasks

**References**:

**Pattern References**:
- `docs/calibrate-extrinsics-workflow.md` — Entire file. Follow existing section structure and formatting

**Acceptance Criteria**:
- [ ] New CLI flags documented
- [ ] `least_squares` optimizer documented with parameter explanations
- [ ] Best-frame selection documented
- [ ] Unit mismatch section updated as resolved
- [ ] Example commands include new flags

**Commit**: YES
- Message: `docs: update calibrate-extrinsics-workflow for robust refinement changes`
- Files: `docs/calibrate-extrinsics-workflow.md`
- Pre-commit: `uv run pytest tests/ -q`

---

## Commit Strategy

| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1 | `fix(svo): harden depth units — set coordinate_units=METER, guard /1000 conversion` | `aruco/svo_sync.py`, tests | `uv run pytest tests/ -q` |
| 2 | `feat(refine): replace L-BFGS-B MSE with least_squares soft-L1 robust optimizer` | `aruco/depth_refine.py`, tests | `uv run pytest tests/ -q` |
| 3 | `feat(refine): add confidence-weighted depth residuals with --use-confidence-weights flag` | `aruco/depth_verify.py`, `aruco/depth_refine.py`, `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 4 | `feat(calibrate): replace naive frame selection with quality-scored best-frame` | `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 5 | `feat(refine): add rich optimizer diagnostics and acceptance gates` | `aruco/depth_refine.py`, `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 6 | `feat(calibrate): add --benchmark-matrix for comparing refinement configurations` | `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 7 | `docs: update calibrate-extrinsics-workflow for robust refinement changes` | `docs/calibrate-extrinsics-workflow.md` | `uv run pytest tests/ -q` |

---

## Success Criteria

### Verification Commands

```bash
uv run pytest tests/ -q                      # Expected: all pass, 0 failures
uv run pytest tests/test_depth_refine.py -v  # Expected: all tests pass including new robust/confidence tests
```

### Final Checklist
- [x] All "Must Have" items present
- [x] All "Must NOT Have" items absent
- [x] All tests pass (`uv run pytest tests/ -q`)
- [x] Output JSON backward compatible (existing fields preserved, new fields additive)
- [x] Default CLI behavior unchanged (new features opt-in)
- [x] Optimizer actually converges on synthetic test data (success=True, nfev > 1)
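The last checklist item can be pinned down with a synthetic test along these lines. The test name, the 1-D depth-offset stand-in problem, and the tolerances are illustrative, not the project's actual test; only the `least_squares(method="trf", loss="soft_l1", f_scale=0.1)` call mirrors the configuration chosen in Task 2.

```python
import numpy as np
from scipy.optimize import least_squares

def test_optimizer_converges_on_synthetic_depth():
    # Synthetic 1-D stand-in: recover a known depth offset from noisy samples.
    rng = np.random.default_rng(0)
    true_offset = 0.05  # metres
    measured = true_offset + rng.normal(0.0, 0.005, size=50)
    measured[:3] += 0.5  # gross outliers that soft_l1 should down-weight

    result = least_squares(lambda p: measured - p[0], x0=[0.0],
                           method="trf", loss="soft_l1", f_scale=0.1)

    assert result.success            # converged, not max_nfev / improper input
    assert result.nfev > 1           # optimizer actually iterated
    assert abs(result.x[0] - true_offset) < 0.02  # outliers did not drag the estimate
```

A linear loss on the same data would be pulled noticeably toward the outliers, which is exactly the contrast the benchmark matrix's baseline configuration is meant to expose.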