feat(calibration): robust depth refinement pipeline with diagnostics and benchmarking

2026-02-07 05:51:07 +00:00
parent ead3796cdb
commit dad1f2a69f
17 changed files with 1876 additions and 261 deletions
+161 -96
@@ -1,113 +1,178 @@
# Agent Context & Reminders
# AGENTS.md — Repository Guide for Coding Agents
## ZED SDK Architecture
This file is for autonomous/agentic coding in `/workspaces/zed-playground`.
Primary active Python workspace: `/workspaces/zed-playground/py_workspace`.
### Streaming API vs Fusion API
---
The ZED SDK provides two distinct network APIs that are often confused:
## 1) Environment & Scope
| Feature | Streaming API | Fusion API |
|---------|---------------|------------|
| **Data Transmitted** | Compressed video (H264/H265) | Metadata only (bodies, objects, poses) |
| **Bandwidth** | 10-40 Mbps | <100 Kbps |
| **Edge Compute** | Video encoding only | Full depth NN + tracking + detection |
| **Host Compute** | Full depth + tracking + detection | Lightweight fusion only |
| **API Methods** | `enableStreaming()` / `setFromStream()` | `startPublishing()` / `subscribe()` |
- Python package manager: **uv**
- Python version: **3.12+**
- Core deps (py_workspace): `pyzed`, `opencv-python`, `click`, `numpy`, `scipy`, `loguru`, `awkward`, `jaxtyping`
- Dev deps: `pytest`, `basedpyright`
- Treat `py_workspace/loguru/` and `py_workspace/tmp/` as non-primary project areas unless explicitly asked.
### Key Insight
---
**There is NO built-in mode for streaming computed depth maps or point clouds.** The architecture forces a choice:
## 2) Build / Run / Lint / Test Commands
1. **Streaming API**: Edge sends video → Host computes everything (depth, tracking, detection)
2. **Fusion API**: Edge computes everything → Sends only metadata (bodies/poses)
### Python (py_workspace)
### Code Patterns
Run from: `/workspaces/zed-playground/py_workspace`
```bash
uv sync
uv run python -V
```
Run scripts:
```bash
uv run streaming_receiver.py --help
uv run recording_multi.py
uv run calibrate_extrinsics.py --help
```
Type checking / lint-equivalent:
```bash
uv run basedpyright
```
Tests:
```bash
uv run pytest
```
Run a single test file:
```bash
uv run pytest tests/test_depth_refine.py
```
Run a single test function:
```bash
uv run pytest tests/test_depth_refine.py::test_refine_extrinsics_with_depth_with_offset
```
Run by keyword:
```bash
uv run pytest -k "depth and refine"
```
Useful verbosity / fail-fast options:
```bash
uv run pytest -x -vv
```
Notes:
- `pyproject.toml` sets `testpaths = ["tests"]`
- `norecursedirs = ["loguru", "tmp", "libs"]`
### C++ sample project (body tracking)
Run from:
`/workspaces/zed-playground/playground/body tracking/multi-camera/cpp/build`
#### Streaming Sender (Edge)
```cpp
sl::StreamingParameters stream_params;
stream_params.codec = sl::STREAMING_CODEC::H265;
stream_params.port = 30000;
stream_params.bitrate = 12000;
zed.enableStreaming(stream_params);
```
#### Streaming Receiver (Host)
```cpp
sl::InitParameters init_params;
init_params.input.setFromStream("192.168.1.100", 30000);
zed.open(init_params);
// Full ZED SDK available - depth, tracking, etc.
```
#### Fusion Publisher (Edge or Host)
```cpp
sl::CommunicationParameters comm_params;
comm_params.setForLocalNetwork(30000);
// or comm_params.setForIntraProcess(); for same-machine
zed.startPublishing(comm_params);
```
#### Fusion Subscriber (Host)
```cpp
sl::Fusion fusion;
fusion.init(init_params);
sl::CameraIdentifier cam(serial_number);
fusion.subscribe(cam, comm_params, pose);
```
## Project: Multi-Camera Body Tracking
### Location
`/workspaces/zed-playground/playground/body tracking/multi-camera/cpp/`
### Architecture
- **ClientPublisher**: Receives camera streams, runs body tracking, publishes to Fusion
- **Fusion**: Subscribes to multiple ClientPublishers, fuses body data from all cameras
- **GLViewer**: 3D visualization of fused bodies
### Camera Configuration (Hard-coded)
From `inside_network.json`:
| Serial | IP | Streaming Port |
|--------|-----|----------------|
| 44289123 | 192.168.128.2 | 30000 |
| 44435674 | 192.168.128.2 | 30002 |
| 41831756 | 192.168.128.2 | 30004 |
| 46195029 | 192.168.128.2 | 30006 |
### Data Flow
```
Edge Camera (enableStreaming) → Network Stream
  → ClientPublisher (setFromStream) → Body Tracking (host)
  → startPublishing() → Fusion (INTRA_PROCESS)
  → Fused Bodies → GLViewer
```
### Build
```bash
cd "/workspaces/zed-playground/playground/body tracking/multi-camera/cpp/build"
cmake ..
make -j4
```
### Run
```bash
./ZED_BodyFusion <config_file.json>
```
## Related Samples
---
### Camera Streaming Receiver
`/workspaces/zed-playground/playground/camera streaming/receiver/cpp/`
- Simple streaming receiver sample
- Shows basic `setFromStream()` usage with OpenCV display
## 3) Rules Files Scan (Cursor / Copilot)
## ZED SDK Headers
Located at: `/usr/local/zed/include/sl/`
- `Camera.hpp` - Main camera API
- `Fusion.hpp` - Fusion module API
- `CameraOne.hpp` - Single camera utilities
As of latest scan:
- No `.cursorrules` found
- No `.cursor/rules/` found
- No `.github/copilot-instructions.md` found
If these files are later added, treat them as higher-priority local policy and update this guide.
---
## 4) Code Style Conventions (Python)
### Imports
- Prefer grouping: stdlib → third-party → local modules.
- ZED imports use `import pyzed.sl as sl`.
- In package modules (`aruco/*`), use relative imports (`from .pose_math import ...`).
- In top-level scripts, absolute package imports are common (`from aruco... import ...`).
### Formatting & structure
- 4-space indentation, PEP8-style layout.
- Keep functions focused; isolate heavy logic into helper functions.
- Favor explicit CLI options via `click.option` rather than positional ambiguity.
### Typing
- Type hints are expected on public and most internal functions.
- Existing code mixes `typing.Optional`/`List`/`Dict` with the modern `|` union syntax; stay consistent with the surrounding file.
- Use `jaxtyping` shape hints where the module already uses them (with the `TYPE_CHECKING` guard pattern).
- Avoid `Any` unless unavoidable (OpenCV / pyzed boundaries).
### Naming
- `snake_case`: functions, variables, modules.
- `PascalCase`: classes.
- `UPPER_SNAKE_CASE`: constants (e.g., dictionary maps).
### Docstrings
- Include concise purpose + `Args` / `Returns` where helpful.
- For matrix/array-heavy functions, document expected shape and units.
### Logging & user output
- CLI-facing messaging: use `click.echo`.
- Diagnostic/internal logs: use `loguru` (`logger.debug/info/warning`).
- Keep debug noise behind `--debug` style flag where possible.
### Error handling
- Raise specific exceptions (`ValueError`, `FileNotFoundError`, etc.) with actionable messages.
- For CLI fatal paths, use the `click.UsageError` / `SystemExit(1)` patterns already used in the project.
- Validate early (shape/range/None checks) before expensive operations.
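A minimal sketch tying these conventions together (the function and its geometry are invented for illustration; third-party imports like `numpy` or `click` would sit in their own group below the stdlib block):

```python
"""Hypothetical helper illustrating the conventions above."""
from __future__ import annotations

# stdlib imports first; third-party (numpy, click, ...) and local modules
# would follow in their own groups.
import math


def marker_edge_length_m(corners: list[tuple[float, float, float]]) -> float:
    """Mean edge length of a square marker.

    Args:
        corners: Four 3D corner points in meters, ordered around the marker.

    Returns:
        Mean edge length in meters.
    """
    if len(corners) != 4:  # validate early, with an actionable message
        raise ValueError(f"expected 4 corners, got {len(corners)}")
    total = 0.0
    for i in range(4):
        total += math.dist(corners[i], corners[(i + 1) % 4])
    return total / 4.0
```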
---
## 5) Testing Conventions
- Framework: `pytest`
- Numeric assertions: prefer `numpy.testing.assert_allclose` where appropriate.
- Exception checks: `pytest.raises(..., match=...)`.
- Add tests under `py_workspace/tests/`.
- If adding behavior in `aruco/*`, add or update corresponding tests (`test_depth_*`, `test_alignment`, etc.).
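A short sketch of this test style (the function under test is a toy stand-in, not a real project API):

```python
import numpy as np
import pytest


def refine_offset(depth: np.ndarray) -> float:
    """Toy stand-in for a refinement routine: mean of finite depth values."""
    valid = depth[np.isfinite(depth)]
    if valid.size == 0:
        raise ValueError("no valid depth points")
    return float(valid.mean())


def test_refine_offset_allclose() -> None:
    # numeric assertion via assert_allclose rather than exact equality
    depth = np.array([1.0, 1.01, 0.99, np.nan])
    np.testing.assert_allclose(refine_offset(depth), 1.0, atol=1e-2)


def test_refine_offset_empty() -> None:
    # exception check with a message match
    with pytest.raises(ValueError, match="no valid depth"):
        refine_offset(np.array([np.nan]))
```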
---
## 6) ZED-Specific Project Guidance
### Architecture reminder: Streaming vs Fusion
- Streaming API: send compressed video, compute depth/tracking on host.
- Fusion API: publish metadata (bodies/poses), lightweight host fusion.
- There is no built-in “stream depth map” mode in the same way as metadata fusion.
### Depth units
- Be explicit with coordinate units and keep units consistent end-to-end.
- Marker geometry/parquet conventions in this repo are meter-based; do not mix mm/m silently.
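A quick way to catch a silent mm/m mix-up is a sanity summary over valid depths: meter-scale indoor data should sit in single digits, while values in the thousands suggest a millimeter map leaked through. A minimal sketch (helper name invented):

```python
import numpy as np


def depth_stats(depth: np.ndarray) -> dict[str, float]:
    """Summary stats over valid (finite, positive) depth values, in meters."""
    valid = depth[np.isfinite(depth) & (depth > 0)]
    if valid.size == 0:
        return {"n": 0.0}
    return {
        "n": float(valid.size),
        "min": float(valid.min()),
        "median": float(np.median(valid)),
        "p95": float(np.percentile(valid, 95)),
        "max": float(valid.max()),
    }
```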
### Threading
- OpenCV GUI (`cv2.imshow`, `cv2.waitKey`) belongs on main thread.
- Capture/grab work in worker threads with queue handoff.
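A minimal sketch of the capture-worker / main-thread split, with a bounded queue that drops frames rather than blocking the grab loop (a counter stands in for real `zed.grab()` frames, and the `cv2` GUI calls are shown only as comments so the sketch stays self-contained):

```python
import queue
import threading

frame_queue: "queue.Queue[int]" = queue.Queue(maxsize=2)
stop = threading.Event()


def capture_worker() -> None:
    """Grab frames off the main thread; drop when the display falls behind."""
    for frame_id in range(10):
        if stop.is_set():
            break
        # frame = grab_frame()  # e.g. zed.grab() + retrieve_image() would go here
        try:
            frame_queue.put(frame_id, timeout=0.1)
        except queue.Full:
            pass  # drop the frame instead of stalling capture


worker = threading.Thread(target=capture_worker, daemon=True)
worker.start()

shown: list[int] = []
while len(shown) < 10 and (worker.is_alive() or not frame_queue.empty()):
    try:
        frame = frame_queue.get(timeout=0.5)
    except queue.Empty:
        break
    shown.append(frame)
    # cv2.imshow("view", frame); cv2.waitKey(1)  # GUI stays on the main thread
worker.join()
```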
### Network config
- Use `zed_network_utils.py` and `zed_settings/inside_network.json` conventions.
---
## 7) Agent Execution Checklist
Before editing:
1. Identify target workspace (`py_workspace` vs playground C++).
2. Confirm commands from this file and nearby module docs.
3. Search for existing tests covering the area.
After editing:
1. Run focused test(s) first, then broader test run as needed.
2. Run `uv run basedpyright` for type regressions.
3. Keep diffs minimal and avoid unrelated file churn.
If uncertain:
- Prefer small, verifiable changes.
- Document assumptions in commit/PR notes.
+6
@@ -1,7 +1,13 @@
{"id":"py_workspace-6m5","title":"Robust Optimizer Implementation","status":"closed","priority":0,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-07T05:22:45.183574374Z","created_by":"crosstyan","updated_at":"2026-02-07T05:22:53.151871639Z","closed_at":"2026-02-07T05:22:53.151871639Z","close_reason":"Implemented robust optimizer with least_squares and soft_l1 loss, updated tests"}
{"id":"py_workspace-6sg","title":"Document marker parquet structure","status":"closed","priority":2,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-07T02:48:08.95742431Z","created_by":"crosstyan","updated_at":"2026-02-07T02:49:35.897152691Z","closed_at":"2026-02-07T02:49:35.897152691Z","close_reason":"Documented parquet structure in aruco/markers/PARQUET_FORMAT.md"}
{"id":"py_workspace-a85","title":"Add CLI option for ArUco dictionary in calibrate_extrinsics.py","status":"closed","priority":2,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-06T10:13:41.896728814Z","created_by":"crosstyan","updated_at":"2026-02-06T10:14:44.083065399Z","closed_at":"2026-02-06T10:14:44.083065399Z","close_reason":"Added CLI option for selectable ArUco dictionary including AprilTag aliases"}
{"id":"py_workspace-cg9","title":"Implement core alignment utilities (Task 1)","status":"closed","priority":2,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-06T10:40:36.296030875Z","created_by":"crosstyan","updated_at":"2026-02-06T10:40:46.196825039Z","closed_at":"2026-02-06T10:40:46.196825039Z","close_reason":"Implemented compute_face_normal, rotation_align_vectors, and apply_alignment_to_pose in aruco/alignment.py"}
{"id":"py_workspace-j8b","title":"Research scipy.optimize.least_squares robust optimization for depth residuals","status":"closed","priority":2,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-07T04:54:04.720996955Z","created_by":"crosstyan","updated_at":"2026-02-07T04:55:22.995644Z","closed_at":"2026-02-07T04:55:22.995644Z","close_reason":"Research completed and recommendations provided."}
{"id":"py_workspace-kpa","title":"Unit Hardening (P0)","status":"closed","priority":0,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-07T05:01:46.342605011Z","created_by":"crosstyan","updated_at":"2026-02-07T05:01:51.303022101Z","closed_at":"2026-02-07T05:01:51.303022101Z","close_reason":"Implemented unit hardening in SVOReader: set coordinate_units=METER and guarded manual conversion in _retrieve_depth. Added depth sanity logs."}
{"id":"py_workspace-kuy","title":"Move parquet documentation to docs/","status":"closed","priority":2,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-07T02:52:12.609090777Z","created_by":"crosstyan","updated_at":"2026-02-07T02:52:43.088520272Z","closed_at":"2026-02-07T02:52:43.088520272Z","close_reason":"Moved parquet documentation to docs/marker-parquet-format.md"}
{"id":"py_workspace-ld1","title":"Search for depth unit conversion and scaling patterns","status":"closed","priority":2,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-07T04:53:53.211242053Z","created_by":"crosstyan","updated_at":"2026-02-07T04:54:56.840335809Z","closed_at":"2026-02-07T04:54:56.840335809Z","close_reason":"Exhaustive search completed. Identified manual scaling in svo_sync.py and SDK-level scaling in depth_sensing.py. Documented risks in learnings.md."}
{"id":"py_workspace-nvw","title":"Update documentation for robust depth refinement","status":"open","priority":2,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-07T05:41:32.963615133Z","created_by":"crosstyan","updated_at":"2026-02-07T05:41:32.963615133Z"}
{"id":"py_workspace-q4w","title":"Add type hints and folder-aware --svo input in calibrate_extrinsics.py","status":"closed","priority":2,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-06T10:01:13.943518267Z","created_by":"crosstyan","updated_at":"2026-02-06T10:03:09.855307397Z","closed_at":"2026-02-06T10:03:09.855307397Z","close_reason":"Implemented type hints and directory expansion for --svo"}
{"id":"py_workspace-t4e","title":"Add --min-markers CLI and rejection debug logs in calibrate_extrinsics","status":"closed","priority":2,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-06T10:21:51.846079425Z","created_by":"crosstyan","updated_at":"2026-02-06T10:22:39.870440044Z","closed_at":"2026-02-06T10:22:39.870440044Z","close_reason":"Added --min-markers (default 1), rejection debug logs, and clarified accepted-pose summary label"}
{"id":"py_workspace-th3","title":"Implement Best-Frame Selection for depth verification","status":"closed","priority":1,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-07T05:04:11.896109458Z","created_by":"crosstyan","updated_at":"2026-02-07T05:06:07.346747231Z","closed_at":"2026-02-07T05:06:07.346747231Z","close_reason":"Implemented best-frame selection with scoring logic and verified with tests."}
{"id":"py_workspace-z3r","title":"Add debug logs for successful ArUco detection","status":"closed","priority":2,"issue_type":"task","owner":"crosstyan@outlook.com","created_at":"2026-02-06T10:17:30.195422209Z","created_by":"crosstyan","updated_at":"2026-02-06T10:18:35.263206185Z","closed_at":"2026-02-06T10:18:35.263206185Z","close_reason":"Added loguru debug logs for successful ArUco detections in calibrate_extrinsics loop"}
+2
@@ -219,3 +219,5 @@ __marimo__/
*.svo2
.ruff_cache
output/
loguru/
tmp/
+7 -5
@@ -1,8 +1,10 @@
{
"active_plan": "/workspaces/zed-playground/py_workspace/.sisyphus/plans/ground-plane-alignment.md",
"started_at": "2026-02-06T10:34:57.130Z",
"active_plan": "/workspaces/zed-playground/py_workspace/.sisyphus/plans/depth-refinement-robust.md",
"started_at": "2026-02-07T04:51:46.370Z",
"session_ids": [
"ses_3cd9cdde1ffeQFgrhQqYAExSTn"
"ses_3c99b5043ffeFGeuraVIodT6wM",
"ses_3c99b5043ffeFGeuraVIodT6wM"
],
"plan_name": "ground-plane-alignment"
}
"plan_name": "depth-refinement-robust",
"agent": "atlas"
}
@@ -0,0 +1,3 @@
# Draft: SUPERSEDED
This draft has been superseded by the final plan at `.sisyphus/plans/depth-refinement-robust.md`.
@@ -0,0 +1,60 @@
## Robust Optimization Patterns
- Use `method='trf'` for robust loss + bounds.
- `loss='cauchy'` is highly effective for outlier-heavy depth data.
- `f_scale` should be tuned to the expected inlier noise (e.g., sensor precision).
- Weights must be manually multiplied into the residual vector.
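A self-contained sketch of these patterns on a toy line fit (the uniform weights are placeholders for confidence-derived weights):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 0.5 + rng.normal(0.0, 0.01, x.size)
y[::10] += 3.0  # inject gross outliers

weights = np.ones_like(x)  # confidence-derived weights would go here


def residuals(p: np.ndarray) -> np.ndarray:
    # least_squares has no weights argument: fold them into the residuals.
    return weights * (y - (p[0] * x + p[1]))


fit = least_squares(residuals, [1.0, 0.0], method="trf",
                    loss="cauchy", f_scale=0.05)
# With loss="cauchy" the +3.0 outliers are largely ignored; a plain
# linear loss would drag the fit toward them.
```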
# Unit Hardening Learnings
- **SDK Unit Consistency**: Explicitly setting `init_params.coordinate_units = sl.UNIT.METER` ensures that all SDK-retrieved measures (depth, point clouds, tracking) are in meters, avoiding manual conversion errors.
- **Double Scaling Guard**: When moving to SDK-level meter units, existing manual conversions (e.g., `/ 1000.0`) must be guarded or removed. Checking `cam.get_init_parameters().coordinate_units` provides a safe runtime check.
- **Depth Sanity Logging**: Adding min/median/max/p95 stats for valid depth values in debug logs helps identify scaling issues (e.g., seeing values in the thousands when expecting meters) or data quality problems early.
- **Loguru Integration**: Standardized on `loguru` for debug logging in `SVOReader` to match project patterns.
## Best-Frame Selection (Task 4)
- Implemented `score_frame` function in `calibrate_extrinsics.py` to evaluate frame quality.
- Scoring criteria:
- Base score: `n_markers * 100.0 - reproj_err`
- Depth bonus: Up to +50.0 based on valid depth ratio at marker corners.
- Main loop now tracks the frame with the highest score per camera instead of just the latest valid frame.
- Deterministic tie-breaking: The first frame with a given score is kept (implicitly by `current_score > best_so_far["score"]`).
- This ensures depth verification and refinement use the highest quality data available in the SVO.
- **Regression Testing for Units**: Added `tests/test_depth_units.py` which mocks `sl.Camera` and `sl.Mat` to verify that `_retrieve_depth` correctly handles both `sl.UNIT.METER` (no scaling) and `sl.UNIT.MILLIMETER` (divides by 1000) paths. This ensures the unit hardening is robust against future changes.
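The scoring heuristic and first-wins tie-breaking described above can be restated as a small sketch (the signature is invented here; the real `score_frame` lives in `calibrate_extrinsics.py`):

```python
def score_frame(n_markers: int, reproj_err: float, valid_depth_ratio: float) -> float:
    """Higher is better: reward marker count, penalize reprojection error,
    add up to +50 for valid depth at marker corners."""
    return n_markers * 100.0 - reproj_err + 50.0 * valid_depth_ratio


best = {"score": float("-inf"), "frame": None}
for frame_idx, (n, err, ratio) in enumerate([(2, 1.5, 0.8), (2, 1.5, 0.8), (3, 2.0, 1.0)]):
    score = score_frame(n, err, ratio)
    if score > best["score"]:  # strict '>' keeps the earliest frame on ties
        best = {"score": score, "frame": frame_idx}
```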
## Robust Optimizer Implementation (Task 2)
- Replaced `minimize(L-BFGS-B)` with `least_squares(trf, soft_l1)`.
- **Key Finding**: `soft_l1` loss with `f_scale=0.1` (10cm) effectively ignores 3m outliers in synthetic tests, whereas MSE is heavily biased by them.
- **Regularization**: Split into `reg_rot` (0.1) and `reg_trans` (1.0) to penalize translation more heavily in meters.
- **Testing**: Synthetic tests require careful depth map painting to ensure markers project into the correct "measured" regions as the optimizer moves the camera. A 5x5 window lookup means we need to paint at least +/- 30 pixels to cover the optimization trajectory.
- **Convergence**: `least_squares` with robust loss may stop slightly earlier than MSE on clean data due to gradient dampening; relaxed tolerance to 5mm for unit tests.
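The residual construction behind this switch can be sketched on a toy forward model where only the z translation moves predicted depth (all names invented; the real model projects marker corners through the camera intrinsics):

```python
import numpy as np
from scipy.optimize import least_squares

params0 = np.zeros(6)            # [rvec(3), tvec(3)], initial pose estimate
z_meas = np.full(20, 1.02)       # "measured" depths: true tz offset of 2 cm


def predict_z(params: np.ndarray) -> np.ndarray:
    # Toy model: 20 corner depths at 1.0 m, shifted by the z translation.
    return np.full(20, 1.0 + params[5])


def residuals(params: np.ndarray) -> np.ndarray:
    reg_rot, reg_trans = 0.1, 1.0
    depth_res = z_meas - predict_z(params)
    delta = params - params0
    # Regularization as pseudo-residuals appended to the vector:
    # translation penalized more heavily than rotation (meter scale).
    reg = np.concatenate([reg_rot * delta[:3], reg_trans * delta[3:]])
    return np.concatenate([depth_res, reg])


fit = least_squares(residuals, params0, method="trf",
                    loss="soft_l1", f_scale=0.1, x_scale="jac")
```

The regularization slightly shrinks the recovered offset toward the initial pose, which is why the tests above tolerate millimeter-level deviations.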
## Task 5: Diagnostics and Acceptance Gates
- Surfaced rich optimizer diagnostics in `refine_extrinsics_with_depth` stats: `termination_status`, `nfev`, `njev`, `optimality`, `n_active_bounds`.
- Added data quality counts: `n_points_total`, `n_depth_valid`, `n_confidence_rejected`.
- Implemented warning gates in `calibrate_extrinsics.py`:
- Negligible improvement: Warns if `improvement_rmse < 1e-4` after more than 5 iterations.
- Stalled/Failed: Warns if `success` is false or `nfev <= 1`.
- These diagnostics provide better visibility into why refinement might be failing or doing nothing, which is critical for the upcoming benchmark matrix (Task 6).
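A hypothetical helper mirroring these gates (the stats keys follow the names above, but the helper itself is invented and `nfev` is used as a proxy for iteration count):

```python
def refinement_warnings(stats: dict) -> list[str]:
    """Return human-readable warnings for suspicious refinement runs."""
    warnings: list[str] = []
    if not stats.get("success", False) or stats.get("nfev", 0) <= 1:
        warnings.append("refinement stalled or failed (success=False or nfev<=1)")
    elif stats.get("nfev", 0) > 5 and stats.get("improvement_rmse", 0.0) < 1e-4:
        warnings.append("negligible improvement despite many evaluations")
    return warnings
```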
## Benchmark Matrix Implementation
- Added `--benchmark-matrix` flag to `calibrate_extrinsics.py`.
- Implemented `run_benchmark_matrix` to compare 4 configurations:
1. baseline (linear loss, no confidence)
2. robust (soft_l1, f_scale=0.1, no confidence)
3. robust+confidence (soft_l1, f_scale=0.1, confidence weights)
4. robust+confidence+best-frame (same as 3 but using the best-scored frame instead of the first valid one)
- The benchmark results are printed as a table to stdout and saved in the output JSON under the `benchmark` key for each camera.
- Captured `first_frames` in the main loop so the benchmark has a consistent baseline to compare against the best-scored frame (stored in `verification_frames`).
## Documentation Updates (2026-02-07)
### Workflow Documentation
- Updated `docs/calibrate-extrinsics-workflow.md` to reflect the new robust refinement pipeline.
- Added documentation for new CLI flags: `--use-confidence-weights`, `--benchmark-matrix`.
- Explained the switch from `L-BFGS-B` (MSE) to `least_squares` (Soft-L1) for robust optimization.
- Documented the "Best Frame Selection" logic (scoring based on marker count, reprojection error, and valid depth).
- Marked the "Unit Mismatch" issue as resolved due to explicit meter enforcement in `SVOReader`.
### Key Learnings
- **Documentation as Contract**: Updating the docs *after* implementation revealed that the "Unit Mismatch" section was outdated. Explicitly marking it as "Resolved" preserves the history while clarifying current behavior.
- **Benchmark Matrix Value**: Documenting the benchmark matrix makes it a first-class citizen in the workflow, encouraging users to empirically verify refinement improvements rather than trusting defaults.
- **Confidence Weights**: Explicitly documenting this feature highlights the importance of sensor uncertainty in the optimization process.
@@ -0,0 +1,13 @@
# Depth Unit Scaling Patterns
## Findings
- **Native SDK Scaling**: `depth_sensing.py` uses `init_params.coordinate_units = sl.UNIT.METER`.
- **Manual Scaling**: `aruco/svo_sync.py` uses `depth_data / 1000.0` because it leaves `coordinate_units` at the default (`MILLIMETER`).
## Risks
- **Double-Scaling**: If `svo_sync.py` is updated to use `sl.UNIT.METER` in `InitParameters`, the manual `/ 1000.0` MUST be removed, otherwise depth values will be 1000x smaller than intended.
- **Inconsistency**: Different parts of the codebase handle unit conversion differently (SDK-level vs. Application-level).
## Recommendations
- Standardize on `sl.UNIT.METER` in `InitParameters` across all ZED camera initializations.
- Remove manual `/ 1000.0` scaling once SDK-level units are set to meters.
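The double-scaling guard can be reduced to a single normalization point keyed on the configured unit (sketch with an invented helper; in the real code the unit would come from `cam.get_init_parameters().coordinate_units`):

```python
import numpy as np


def to_meters(depth: np.ndarray, source_unit: str) -> np.ndarray:
    """Normalize a depth map to meters exactly once.

    source_unit: "m" when the SDK is configured with sl.UNIT.METER,
    "mm" for the default millimeter output.
    """
    if source_unit == "m":
        return depth            # SDK already returns meters: no scaling
    if source_unit == "mm":
        return depth / 1000.0   # legacy path: convert exactly once
    raise ValueError(f"unknown depth unit: {source_unit!r}")
```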
@@ -0,0 +1,685 @@
# Robust Depth Refinement for Camera Extrinsics
## TL;DR
> **Quick Summary**: Replace the failing depth-based pose refinement pipeline with a robust optimizer (`scipy.optimize.least_squares` with soft-L1 loss), add unit hardening, confidence-weighted residuals, best-frame selection, rich diagnostics, and a benchmark matrix comparing configurations.
>
> **Deliverables**:
> - Unit-hardened depth retrieval (set `coordinate_units=METER`, guard double-conversion)
> - Robust optimization objective using `least_squares(method="trf", loss="soft_l1", f_scale=0.1)`
> - Confidence-weighted depth residuals (toggleable via CLI flag)
> - Best-frame selection replacing naive "latest valid frame"
> - Rich optimizer diagnostics and acceptance gates
> - Benchmark matrix comparing baseline/robust/+confidence/+best-frame
> - Updated tests for all new functionality
>
> **Estimated Effort**: Medium (3-4 hours implementation)
> **Parallel Execution**: YES - 2 waves
> **Critical Path**: Task 1 (units) → Task 2 (robust optimizer) → Task 3 (confidence) → Task 5 (diagnostics) → Task 6 (benchmark)
---
## Context
### Original Request
Implement the 5 items from "Recommended Implementation Order" in `docs/calibrate-extrinsics-workflow.md`, plus research and choose the best optimization method for depth-based camera extrinsic refinement.
### Interview Summary
**Key Discussions**:
- Requirements were explicitly specified in the documentation (no interactive interview needed)
- Research confirmed `scipy.optimize.least_squares` is superior to `scipy.optimize.minimize` for this problem class
**Research Findings**:
- **freemocap/anipose** (production multi-camera calibration) uses exactly `least_squares(method="trf", loss=loss, f_scale=threshold)` for bundle adjustment — validates our approach
- **scipy docs** recommend `soft_l1` or `huber` for robust fitting; `f_scale` controls the inlier/outlier threshold
- **Current output JSONs** confirm catastrophic failure: RMSE 5000+ meters (`aligned_refined_extrinsics_fast.json`), RMSE ~11.6m (`test_refine_current.json`), iterations=0/1, success=false across all cameras
- **Unit mismatch** still in play: ZED defaults to millimeters and the code compensates with a manual `/1000.0` division, but `coordinate_units=METER` is never set, so any retrieval path that skips the manual division yields millimeter values
- **Confidence map** retrieved but only used in verify filtering, not in optimizer objective
### Metis Review
**Identified Gaps** (addressed):
- Output JSON schema backward compatibility → New fields are additive only (existing fields preserved)
- Confidence weighting can interact with robust loss → Made toggleable, logged statistics
- Best-frame selection changes behavior → Deterministic scoring, old behavior available as fallback
- Zero valid points edge case → Explicit early exit with diagnostic
- Numerical pass/fail gate → Added RMSE threshold checks
- Regression guard → Default CLI behavior unchanged unless user opts into new features
---
## Work Objectives
### Core Objective
Make depth-based extrinsic refinement actually work by fixing the unit mismatch, switching to a robust optimizer, incorporating confidence weighting, and selecting the best frame for refinement.
### Concrete Deliverables
- Modified `aruco/svo_sync.py` with unit hardening
- Rewritten `aruco/depth_refine.py` using `least_squares` with robust loss
- Updated `aruco/depth_verify.py` with confidence weight extraction helper
- Updated `calibrate_extrinsics.py` with frame scoring, diagnostics, new CLI flags
- New and updated tests in `tests/`
- Updated `docs/calibrate-extrinsics-workflow.md` with new behavior docs
### Definition of Done
- [x] `uv run pytest` passes with 0 failures
- [x] Synthetic test: robust optimizer converges (success=True, nfev > 1) with injected outliers
- [x] Existing tests still pass (backward compatibility)
- [x] Benchmark matrix produces 4 comparable result records
### Must Have
- `coordinate_units = sl.UNIT.METER` set in SVOReader
- `least_squares` with `loss="soft_l1"` and `f_scale=0.1` as default optimizer
- Confidence weighting via `--use-confidence-weights` flag
- Best-frame selection with deterministic scoring
- Optimizer diagnostics in output JSON and logs
- All changes covered by automated tests
### Must NOT Have (Guardrails)
- Must NOT change unrelated calibration logic (marker detection, PnP, pose averaging, alignment)
- Must NOT change file I/O formats or break JSON schema (only additive fields)
- Must NOT introduce new dependencies beyond scipy/numpy already in use
- Must NOT implement multi-optimizer auto-selection or hyperparameter search
- Must NOT turn frame scoring into a ML quality model — simple weighted heuristic only
- Must NOT add premature abstractions or over-engineer the API
- Must NOT remove existing CLI flags or change their default behavior
---
## Verification Strategy
> **UNIVERSAL RULE: ZERO HUMAN INTERVENTION**
>
> ALL tasks in this plan MUST be verifiable WITHOUT any human action.
> Every criterion is verified by running `uv run pytest` or inspecting code.
### Test Decision
- **Infrastructure exists**: YES (pytest configured in pyproject.toml, tests/ directory)
- **Automated tests**: YES (tests-after, matching existing project pattern)
- **Framework**: pytest (via `uv run pytest`)
### Agent-Executed QA Scenarios (MANDATORY — ALL tasks)
**Verification Tool by Deliverable Type:**
| Type | Tool | How Agent Verifies |
|------|------|-------------------|
| Python module changes | Bash (`uv run pytest`) | Run tests, assert 0 failures |
| New functions | Bash (`uv run pytest -k test_name`) | Run specific test, assert pass |
| CLI behavior | Bash (`uv run python calibrate_extrinsics.py --help`) | Verify new flags present |
---
## Execution Strategy
### Parallel Execution Waves
```
Wave 1 (Start Immediately):
├── Task 1: Unit hardening (svo_sync.py) [no dependencies]
└── Task 4: Best-frame selection (calibrate_extrinsics.py) [no dependencies]
Wave 2 (After Wave 1):
├── Task 2: Robust optimizer (depth_refine.py) [depends: 1]
├── Task 3: Confidence weighting (depth_verify.py + depth_refine.py) [depends: 2]
└── Task 5: Diagnostics and acceptance gates [depends: 2]
Wave 3 (After Wave 2):
└── Task 6: Benchmark matrix [depends: 2, 3, 4, 5]
Wave 4 (After All):
└── Task 7: Documentation update [depends: all]
Critical Path: Task 1 → Task 2 → Task 3 → Task 5 → Task 6
```
### Dependency Matrix
| Task | Depends On | Blocks | Can Parallelize With |
|------|------------|--------|---------------------|
| 1 | None | 2, 3 | 4 |
| 2 | 1 | 3, 5, 6 | - |
| 3 | 2 | 6 | 5 |
| 4 | None | 6 | 1 |
| 5 | 2 | 6 | 3 |
| 6 | 2, 3, 4, 5 | 7 | - |
| 7 | All | None | - |
### Agent Dispatch Summary
| Wave | Tasks | Recommended Agents |
|------|-------|-------------------|
| 1 | 1, 4 | `category="quick"` for T1; `category="unspecified-low"` for T4 |
| 2 | 2, 3, 5 | `category="deep"` for T2; `category="quick"` for T3, T5 |
| 3 | 6 | `category="unspecified-low"` |
| 4 | 7 | `category="writing"` |
---
## TODOs
- [x] 1. Unit Hardening (P0)
**What to do**:
- In `aruco/svo_sync.py`, add `init_params.coordinate_units = sl.UNIT.METER` in the `SVOReader.__init__` method, right after `init_params.set_from_svo_file(path)` (around line 42)
- Guard the existing `/1000.0` conversion: check whether `coordinate_units` is already METER. If METER is set, skip the division. If not set or MILLIMETER, apply the division. Add a log warning if division is applied as fallback
- Add depth sanity logging under `--debug` mode: after retrieving depth, log `min/median/max/p95` of valid depth values. This goes in the `_retrieve_depth` method
- Write a test that verifies the unit-hardened path doesn't double-convert
**Must NOT do**:
- Do NOT change depth retrieval for confidence maps
- Do NOT modify the `grab_synced()` or `grab_all()` methods
- Do NOT add new CLI parameters for this task
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Small, focused change in one file + one test file
- **Skills**: [`git-master`]
- `git-master`: Atomic commit of unit hardening change
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 4)
- **Blocks**: Tasks 2, 3
- **Blocked By**: None
**References**:
**Pattern References** (existing code to follow):
- `aruco/svo_sync.py:40-44` — Current `init_params` setup where `coordinate_units` must be added
- `aruco/svo_sync.py:180-189` — Current `_retrieve_depth` method with `/1000.0` conversion to modify
- `aruco/svo_sync.py:191-196` — Confidence retrieval pattern (do NOT modify, but understand adjacency)
**API/Type References** (contracts to implement against):
- ZED SDK `InitParameters.coordinate_units` — Set to `sl.UNIT.METER`
- `loguru.logger` — Used project-wide for debug logging
**Test References** (testing patterns to follow):
- `tests/test_depth_verify.py:36-66` — Test pattern using synthetic depth maps (follow this style)
- `tests/test_depth_refine.py:21-39` — Test pattern with synthetic K matrix and depth maps
**Documentation References**:
- `docs/calibrate-extrinsics-workflow.md:116-132` — Documents the unit mismatch problem and mitigation strategy
- `docs/calibrate-extrinsics-workflow.md:166-169` — Specifies the exact implementation steps for unit hardening
**Acceptance Criteria**:
- [ ] `init_params.coordinate_units = sl.UNIT.METER` is set in SVOReader.__init__ before `cam.open()`
- [ ] The `/1000.0` division in `_retrieve_depth` is guarded (only applied if units are NOT meters)
- [ ] Debug logging of depth statistics (min/median/max) is added to `_retrieve_depth` when depth mode is active
- [ ] `uv run pytest tests/test_depth_refine.py tests/test_depth_verify.py -q` → all pass (no regressions)
**Agent-Executed QA Scenarios:**
```
Scenario: Verify unit hardening doesn't break existing tests
Tool: Bash (uv run pytest)
Preconditions: All dependencies installed
Steps:
1. Run: uv run pytest tests/test_depth_refine.py tests/test_depth_verify.py -q
2. Assert: exit code 0
3. Assert: output contains "passed" and no "FAILED"
Expected Result: All existing tests pass
Evidence: Terminal output captured
Scenario: Verify coordinate_units is set in code
Tool: Bash (grep)
Preconditions: File modified
Steps:
1. Run: grep -n "coordinate_units" aruco/svo_sync.py
2. Assert: output contains "UNIT.METER" or "METER"
Expected Result: Unit setting is present
Evidence: Grep output
```
**Commit**: YES
- Message: `fix(svo): harden depth units — set coordinate_units=METER, guard /1000 conversion`
- Files: `aruco/svo_sync.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/ -q`
---
- [x] 2. Robust Optimizer — Replace MSE with `least_squares` + Soft-L1 Loss (P0)
**What to do**:
- **Rewrite `depth_residual_objective`** → Replace with a **residual vector function** `depth_residuals(params, ...)` that returns an array of residuals (not a scalar cost). Each element is `(z_measured - z_predicted)` for one marker corner. This is what `least_squares` expects.
- **Add regularization as pseudo-residuals**: Append `[reg_weight_rot * delta_rvec, reg_weight_trans * delta_tvec]` to the residual vector. This naturally penalizes deviation from the initial pose. Split into separate rotation and translation regularization weights (default: `reg_rot=0.1`, `reg_trans=1.0` — translation more tightly regularized in meters scale).
- **Replace `minimize(method="L-BFGS-B")` with `least_squares(method="trf", loss="soft_l1", f_scale=0.1)`**:
- `method="trf"` — Trust Region Reflective, handles bounds naturally
- `loss="soft_l1"` — Smooth robust loss, downweights outliers beyond `f_scale`
- `f_scale=0.1` — Residuals >0.1m are treated as outliers (matches ZED depth noise ~1-5cm)
- `bounds` — Same ±5°/±5cm bounds, expressed as `(lower_bounds_array, upper_bounds_array)` tuple
- `x_scale="jac"` — Automatic Jacobian-based scaling (prevents ill-conditioning)
- `max_nfev=200` — Maximum function evaluations
- **Update `refine_extrinsics_with_depth` signature**: Add parameters for `loss`, `f_scale`, `reg_rot`, `reg_trans`. Keep backward-compatible defaults. Return enriched stats dict including: `termination_message`, `nfev`, `optimality`, `active_mask`, `cost`.
- **Handle zero residuals**: If residual vector is empty (no valid depth points), return initial pose unchanged with stats indicating `"reason": "no_valid_depth_points"`.
- **Maintain backward-compatible scalar cost reporting**: Compute `initial_cost` and `final_cost` from the residual vector for comparison with old output format.
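The structure above can be sketched end-to-end on synthetic data. This is a toy model, not the project's geometry: the predicted depth depends only on the z-translation parameter `params[5]`, and all names are illustrative. It shows the residual vector, the appended pseudo-residuals, and the `least_squares` call with the parameters listed above.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, z_meas, base_depth, x0, reg_rot=0.1, reg_trans=1.0):
    # Data residuals: measured minus predicted depth. In this toy model the
    # prediction only depends on the z-translation parameter params[5].
    data = z_meas - (base_depth + params[5])
    # Regularization appended as pseudo-residuals (rotation: first 3 params,
    # translation: last 3), penalizing deviation from the initial pose x0.
    diff = params - x0
    return np.concatenate([data, reg_rot * diff[:3], reg_trans * diff[3:]])

rng = np.random.default_rng(0)
base = rng.uniform(1.0, 3.0, 40)                # "true" depths in meters
z = base + 0.02 + rng.normal(0.0, 0.005, 40)    # 2 cm offset + 5 mm noise
z[:6] += 1.0                                    # gross outliers (too far)
z[6:12] -= 0.8                                  # gross outliers (too near)

x0 = np.zeros(6)
lb, ub = x0.copy(), x0.copy()
lb[:3] -= np.deg2rad(5.0)
ub[:3] += np.deg2rad(5.0)                       # ±5° rotation bounds
lb[3:] -= 0.05
ub[3:] += 0.05                                  # ±5 cm translation bounds

result = least_squares(
    residuals, x0, args=(z, base, x0),
    method="trf", loss="soft_l1", f_scale=0.1,
    bounds=(lb, ub), x_scale="jac", max_nfev=200,
)
print(result.success, result.x[5])  # recovered z offset ≈ 0.02 m despite outliers
```

With `loss="linear"` on the same data, the estimate is dragged toward the outliers (and into the bound), which is exactly the failure mode the soft-L1 loss avoids.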
**Must NOT do**:
- Do NOT change `extrinsics_to_params` or `params_to_extrinsics` (the Rodrigues parameterization is correct)
- Do NOT modify `depth_verify.py` in this task
- Do NOT add confidence weighting here (that's Task 3)
- Do NOT add CLI flags here (that's Task 5)
**Recommended Agent Profile**:
- **Category**: `deep`
- Reason: Core algorithmic change, requires understanding of optimization theory and careful residual construction
- **Skills**: []
- No specialized skills needed — pure Python/numpy/scipy work
**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Wave 2 (sequential after Wave 1)
- **Blocks**: Tasks 3, 5, 6
- **Blocked By**: Task 1
**References**:
**Pattern References** (existing code to follow):
- `aruco/depth_refine.py:19-47` — Current `depth_residual_objective` function to REPLACE
- `aruco/depth_refine.py:50-112` — Current `refine_extrinsics_with_depth` function to REWRITE
- `aruco/depth_refine.py:1-16` — Import block and helper functions (keep `extrinsics_to_params`, `params_to_extrinsics`)
- `aruco/depth_verify.py:27-67` — `compute_depth_residual` function — this is the per-point residual computation called from the objective. Understand its contract: returns `float(z_measured - z_predicted)` or `None`.
**API/Type References**:
- `scipy.optimize.least_squares` — [scipy docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html): `fun(x, *args) -> residuals_array`; parameters: `method="trf"`, `loss="soft_l1"`, `f_scale=0.1`, `bounds=(lb, ub)`, `x_scale="jac"`, `max_nfev=200`
- Return type: `OptimizeResult` with attributes: `.x`, `.cost`, `.fun`, `.jac`, `.grad`, `.optimality`, `.active_mask`, `.nfev`, `.njev`, `.status`, `.message`, `.success`
**External References** (production examples):
- `freemocap/anipose` bundle_adjust method — Uses `least_squares(error_fun, x0, jac_sparsity=jac_sparse, f_scale=f_scale, x_scale="jac", loss=loss, ftol=ftol, method="trf", tr_solver="lsmr")` for multi-camera calibration. Key pattern: residual function returns per-point reprojection errors.
- scipy Context7 docs — Example shows `least_squares(fun, x0, loss='soft_l1', f_scale=0.1, args=(t_train, y_train))` where `fun` returns residual vector
**Test References**:
- `tests/test_depth_refine.py` — ALL 4 existing tests must still pass. They test: roundtrip, no-change convergence, offset correction, and bounds respect. The new optimizer must satisfy these same properties.
**Acceptance Criteria**:
- [ ] `from scipy.optimize import least_squares` replaces `from scipy.optimize import minimize`
- [ ] `depth_residuals()` returns `np.ndarray` (vector), not scalar float
- [ ] `least_squares(method="trf", loss="soft_l1", f_scale=0.1)` is the optimizer call
- [ ] Regularization is split: separate `reg_rot` and `reg_trans` weights, appended as pseudo-residuals
- [ ] Stats dict includes: `termination_message`, `nfev`, `optimality`, `cost`
- [ ] Zero-residual case returns initial pose with `reason: "no_valid_depth_points"`
- [ ] `uv run pytest tests/test_depth_refine.py -q` → all 4 existing tests pass
- [ ] New test: synthetic data with 30% outlier depths → robust optimizer converges (success=True, nfev > 1) with lower median residual than would occur with pure MSE
**Agent-Executed QA Scenarios:**
```
Scenario: All existing depth_refine tests pass after rewrite
Tool: Bash (uv run pytest)
Preconditions: Task 1 completed, aruco/depth_refine.py rewritten
Steps:
1. Run: uv run pytest tests/test_depth_refine.py -v
2. Assert: exit code 0
3. Assert: output contains "4 passed"
Expected Result: All 4 existing tests pass
Evidence: Terminal output captured
Scenario: Robust optimizer handles outliers better than MSE
Tool: Bash (uv run pytest)
Preconditions: New test added
Steps:
1. Run: uv run pytest tests/test_depth_refine.py::test_robust_loss_handles_outliers -v
2. Assert: exit code 0
3. Assert: test passes
Expected Result: With 30% outliers, robust optimizer has lower median abs residual
Evidence: Terminal output captured
```
**Commit**: YES
- Message: `feat(refine): replace L-BFGS-B MSE with least_squares soft-L1 robust optimizer`
- Files: `aruco/depth_refine.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/test_depth_refine.py -q`
---
- [x] 3. Confidence-Weighted Depth Residuals (P0)
**What to do**:
- **Add confidence weight extraction helper** to `aruco/depth_verify.py`: Create a function `get_confidence_weight(confidence_map, u, v, confidence_thresh=50) -> float` that returns a normalized weight in [0, 1]. ZED confidence is in [1, 100], where higher means LESS confident. Normalize as `max(0, confidence_thresh - conf_value) / confidence_thresh`. Values at or above the threshold get weight 0; clamp nonzero weights to `[eps, 1.0]` with `eps=1e-6`.
- **Update `depth_residuals()` in `aruco/depth_refine.py`**: Accept optional `confidence_map` and `confidence_thresh` parameters. If confidence_map is provided, multiply each depth residual by `sqrt(weight)` before returning. This implements weighted least squares within the `least_squares` framework.
- **Update `refine_extrinsics_with_depth` signature**: Add `confidence_map=None`, `confidence_thresh=50` parameters. Pass through to `depth_residuals()`.
- **Update `calibrate_extrinsics.py`**: Pass `confidence_map=frame.confidence_map` and `confidence_thresh=depth_confidence_threshold` to `refine_extrinsics_with_depth` when confidence weighting is requested
- **Add `--use-confidence-weights/--no-confidence-weights` CLI flag** (default: False for backward compatibility)
- **Log confidence statistics** under `--debug`: After computing weights, log `n_zero_weight`, `mean_weight`, `median_weight`
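The weighting scheme above can be sketched as follows. For clarity the sketch takes a scalar confidence value rather than the `(confidence_map, u, v)` signature planned for `depth_verify.py`; the numbers are illustrative.

```python
import numpy as np

def get_confidence_weight(conf_value: float, confidence_thresh: float = 50,
                          eps: float = 1e-6) -> float:
    # ZED confidence semantics: range [1, 100], HIGHER means LESS confident.
    # Weight falls linearly from ~1 (very confident) to 0 at the threshold.
    if not np.isfinite(conf_value):
        return 0.0
    weight = max(0.0, confidence_thresh - conf_value) / confidence_thresh
    return 0.0 if weight == 0.0 else float(np.clip(weight, eps, 1.0))

# Weighted least squares: scale each residual by sqrt(weight), so the
# squared cost contribution is multiplied by the weight itself.
residual = 0.03  # meters
weighted = residual * np.sqrt(get_confidence_weight(10.0))
```

Multiplying by `sqrt(weight)` inside the residual function is the standard way to realize weighted least squares within `scipy.optimize.least_squares`, which squares residuals internally.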
**Must NOT do**:
- Do NOT change the verification logic in `verify_extrinsics_with_depth` (it already uses confidence correctly)
- Do NOT change confidence semantics (higher ZED value = less confident)
- Do NOT make confidence weighting the default behavior
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Adding parameters and weight multiplication — straightforward plumbing
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO (depends on Task 2)
- **Parallel Group**: Wave 2 (after Task 2)
- **Blocks**: Task 6
- **Blocked By**: Task 2
**References**:
**Pattern References**:
- `aruco/depth_verify.py:82-96` — Existing confidence handling pattern (filtering, NOT weighting). Follow this semantics but produce a continuous weight instead of binary skip
- `aruco/depth_verify.py:93-95` — ZED confidence semantics: "Higher confidence value means LESS confident... Range [1, 100], where 100 is typically occlusion/invalid"
- `aruco/depth_refine.py` — Updated in Task 2 with `depth_residuals()` function. Add `confidence_map` parameter here
- `calibrate_extrinsics.py:136-148` — Current call site for `refine_extrinsics_with_depth`. Add confidence_map/thresh forwarding
**Test References**:
- `tests/test_depth_verify.py:69-84` — Test pattern for `compute_marker_corner_residuals`. Follow for confidence weight test
**Acceptance Criteria**:
- [ ] `get_confidence_weight()` function exists in `depth_verify.py`
- [ ] Confidence weighting is off by default (backward compatible)
- [ ] `--use-confidence-weights` flag exists in CLI
- [ ] Low-confidence points have lower influence on optimization (verified by test)
- [ ] `uv run pytest tests/ -q` → all pass
**Agent-Executed QA Scenarios:**
```
Scenario: Confidence weighting reduces outlier influence
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_depth_refine.py::test_confidence_weighting -v
2. Assert: exit code 0
Expected Result: With low-confidence outlier points, weighted optimizer ignores them
Evidence: Terminal output
Scenario: CLI flag exists
Tool: Bash
Steps:
1. Run: uv run python calibrate_extrinsics.py --help | grep -i confidence-weight
2. Assert: output contains "--use-confidence-weights"
Expected Result: Flag is available
Evidence: Help text
```
**Commit**: YES
- Message: `feat(refine): add confidence-weighted depth residuals with --use-confidence-weights flag`
- Files: `aruco/depth_verify.py`, `aruco/depth_refine.py`, `calibrate_extrinsics.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/ -q`
---
- [x] 4. Best-Frame Selection (P1)
**What to do**:
- **Create `score_frame_quality()` function** in `calibrate_extrinsics.py` (or a new `aruco/frame_scoring.py` if cleaner). The function takes: `n_markers: int`, `reproj_error: float`, `depth_map: np.ndarray`, `marker_corners_world: Dict[int, np.ndarray]`, `T_world_cam: np.ndarray`, `K: np.ndarray` and returns a float score (higher = better).
- **Scoring formula**: `score = w_markers * n_markers + w_reproj * (1 / (reproj_error + eps)) + w_depth * valid_depth_ratio`
- `w_markers = 1.0` — more markers = better constraint
- `w_reproj = 5.0` — lower reprojection error = more accurate PnP
- `w_depth = 3.0` — higher ratio of valid depth at marker locations = better depth signal
- `valid_depth_ratio = n_valid_depths / n_total_corners`
- `eps = 1e-6` to avoid division by zero
- **Replace "last valid frame" logic** in `calibrate_extrinsics.py`: Instead of overwriting `verification_frames[serial]` every time (line 467-471), track ALL valid frames per camera with their scores. After the processing loop, select the frame with the highest score.
- **Log selected frame**: Under `--debug`, log the chosen frame index, score, and component breakdown for each camera
- **Ensure deterministic tiebreaking**: If scores are equal, pick the frame with the lower frame_index (earliest)
- **Keep frame storage bounded**: Store at most `max_stored_frames=10` candidates per camera (configurable), keeping the top-scoring ones
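The scoring formula and deterministic tiebreak can be sketched as follows. This sketch takes precomputed depth-validity counts instead of the full depth map, camera matrix, and pose listed above; the candidate values are made up.

```python
def score_frame_quality(n_markers: int, reproj_error: float,
                        n_valid_depths: int, n_total_corners: int,
                        w_markers: float = 1.0, w_reproj: float = 5.0,
                        w_depth: float = 3.0, eps: float = 1e-6) -> float:
    # Higher score = better frame: more markers, lower reprojection error,
    # and a larger fraction of marker corners with valid depth.
    valid_depth_ratio = n_valid_depths / max(n_total_corners, 1)
    return (w_markers * n_markers
            + w_reproj * (1.0 / (reproj_error + eps))
            + w_depth * valid_depth_ratio)

# Deterministic selection: highest score wins, ties broken by lowest index.
candidates = [(0, score_frame_quality(2, 0.8, 4, 8)),
              (1, score_frame_quality(4, 0.4, 8, 16)),
              (2, score_frame_quality(4, 0.4, 8, 16))]
best_index, best_score = min(candidates, key=lambda c: (-c[1], c[0]))
```

Sorting on `(-score, frame_index)` makes the selection reproducible when two frames score identically.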
**Must NOT do**:
- Do NOT add ML-based frame scoring
- Do NOT change the frame grabbing/syncing logic
- Do NOT add new dependencies
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: New functionality but straightforward heuristic
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 1)
- **Blocks**: Task 6
- **Blocked By**: None
**References**:
**Pattern References**:
- `calibrate_extrinsics.py:463-471` — Current "last valid frame" logic to REPLACE. Currently: `verification_frames[serial] = {"frame": frame, "ids": ids, "corners": corners}`
- `calibrate_extrinsics.py:452-478` — Full frame processing context (pose estimation, accumulation, frame caching)
- `aruco/depth_verify.py:27-67` — `compute_depth_residual` can be used to check valid depth at marker locations for scoring
**Test References**:
- `tests/test_depth_cli_postprocess.py` — Test pattern for calibrate_extrinsics functions
**Acceptance Criteria**:
- [ ] `score_frame_quality()` function exists and returns a float
- [ ] Best frame is selected (not last frame) for each camera
- [ ] Scoring is deterministic (same inputs → same selected frame)
- [ ] Frame selection metadata is logged under `--debug`
- [ ] `uv run pytest tests/ -q` → all pass (no regressions)
**Agent-Executed QA Scenarios:**
```
Scenario: Frame scoring is deterministic
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_frame_scoring.py -v
2. Assert: exit code 0
Expected Result: Same inputs always produce same score and selection
Evidence: Terminal output
Scenario: Higher marker count increases score
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_frame_scoring.py::test_more_markers_higher_score -v
2. Assert: exit code 0
Expected Result: Frame with more markers scores higher
Evidence: Terminal output
```
**Commit**: YES
- Message: `feat(calibrate): replace naive frame selection with quality-scored best-frame`
- Files: `calibrate_extrinsics.py`, `tests/test_frame_scoring.py`
- Pre-commit: `uv run pytest tests/ -q`
---
- [x] 5. Diagnostics and Acceptance Gates (P1)
**What to do**:
- **Enrich `refine_extrinsics_with_depth` stats dict**: The `least_squares` result (from Task 2) already provides `.status`, `.message`, `.nfev`, `.njev`, `.optimality`, `.active_mask`. Surface these in the returned stats dict as: `termination_status` (int), `termination_message` (str), `nfev` (int), `njev` (int), `optimality` (float), `n_active_bounds` (int, count of parameters at bound limits).
- **Add effective valid points count**: Log how many marker corners had valid (finite, positive) depth, and how many were used after confidence filtering. Add to stats: `n_depth_valid`, `n_confidence_filtered`.
- **Add RMSE improvement gate**: If `improvement_rmse < 1e-4` AND `nfev > 5`, log WARNING: "Refinement converged with negligible improvement — consider checking depth data quality"
- **Add failure diagnostic**: If `success == False` or `nfev <= 1`, log WARNING with termination message and suggest checking depth unit consistency
- **Log optimizer progress under `--debug`**: Before and after optimization, log: initial cost, final cost, delta_rotation, delta_translation, termination message, number of function evaluations
- **Surface diagnostics in JSON output**: Add fields to `refine_depth` dict in output JSON: `termination_status`, `termination_message`, `nfev`, `n_valid_points`, `loss_function`, `f_scale`
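The diagnostics extraction and warning gates above can be sketched like this. The helper names (`build_refine_stats`, `refinement_warnings`) are hypothetical, and the warnings are returned rather than logged so the sketch stays loguru-free.

```python
import numpy as np
from scipy.optimize import least_squares

def build_refine_stats(result) -> dict:
    # Surface scipy OptimizeResult fields under the names used in this plan.
    return {
        "success": bool(result.success),
        "termination_status": int(result.status),
        "termination_message": str(result.message),
        "nfev": int(result.nfev),
        "njev": int(getattr(result, "njev", 0) or 0),
        "optimality": float(result.optimality),
        "n_active_bounds": int(np.sum(result.active_mask != 0)),
    }

def refinement_warnings(stats: dict, improvement_rmse: float) -> list[str]:
    warnings = []
    if improvement_rmse < 1e-4 and stats["nfev"] > 5:
        warnings.append("negligible improvement; check depth data quality")
    if not stats["success"] or stats["nfev"] <= 1:
        warnings.append("refinement failed; check depth unit consistency")
    return warnings

# Tiny self-contained solve to exercise the helpers.
result = least_squares(lambda x: np.array([x[0] - 1.0, 3.0 * x[1]]),
                       x0=np.zeros(2), method="trf",
                       bounds=(-2.0 * np.ones(2), 2.0 * np.ones(2)))
stats = build_refine_stats(result)
```

In the real code the warning strings would go through `logger.warning` and the stats dict would be merged into the existing `refine_depth` JSON section.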
**Must NOT do**:
- Do NOT add automated "redo with different params" logic
- Do NOT add email/notification alerts
- Do NOT change the optimization algorithm or parameters (already done in Task 2)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Adding logging and dict fields — no algorithmic changes
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES (with Task 3)
- **Parallel Group**: Wave 2
- **Blocks**: Task 6
- **Blocked By**: Task 2
**References**:
**Pattern References**:
- `aruco/depth_refine.py:103-111` — Current stats dict construction (to EXTEND, not replace)
- `calibrate_extrinsics.py:159-181` — Current refinement result logging and JSON field assignment
- `loguru.logger` — Project uses loguru for structured logging
**API/Type References**:
- `scipy.optimize.OptimizeResult` — `.status` (int: -1=improper input, 0=`max_nfev` reached, 1=`gtol` satisfied, 2=`ftol` satisfied, 3=`xtol` satisfied, 4=both `ftol` and `xtol` satisfied), `.message` (str), `.nfev`, `.njev`, `.optimality` (infinity norm of the gradient)
**Acceptance Criteria**:
- [ ] Stats dict contains: `termination_status`, `termination_message`, `nfev`, `n_valid_points`
- [ ] Output JSON `refine_depth` section contains diagnostic fields
- [ ] WARNING log emitted when improvement < 1e-4 with nfev > 5
- [ ] WARNING log emitted when success=False or nfev <= 1
- [ ] `uv run pytest tests/ -q` → all pass
**Agent-Executed QA Scenarios:**
```
Scenario: Diagnostics present in refine stats
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_depth_refine.py -v
2. Assert: All tests pass
3. Check that stats dict from refine function contains "termination_message" key
Expected Result: Diagnostics are in stats output
Evidence: Terminal output
```
**Commit**: YES
- Message: `feat(refine): add rich optimizer diagnostics and acceptance gates`
- Files: `aruco/depth_refine.py`, `calibrate_extrinsics.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/ -q`
---
- [x] 6. Benchmark Matrix (P1)
**What to do**:
- **Add `--benchmark-matrix` flag** to `calibrate_extrinsics.py` CLI
- **When enabled**, run the depth refinement pipeline 4 times per camera with different configurations:
1. **baseline**: `loss="linear"` (no robust loss), no confidence weights
2. **robust**: `loss="soft_l1"`, `f_scale=0.1`, no confidence weights
3. **robust+confidence**: `loss="soft_l1"`, `f_scale=0.1`, confidence weighting ON
4. **robust+confidence+best-frame**: Same as #3 but using best-frame selection
- **Output**: For each configuration, report per-camera: pre-refinement RMSE, post-refinement RMSE, improvement, iteration count, success/failure, termination reason
- **Format**: Print a formatted table to stdout (using click.echo) AND save to a benchmark section in the output JSON
- **Implementation**: Create a helper function `run_benchmark_matrix(T_initial, marker_corners_world, depth_map, K, confidence_map, ...)` that returns a list of result dicts
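The orchestration can be sketched as follows. The configuration keys and the `refine_fn` signature are hypothetical; here the refinement entry point is stubbed out so the loop and table shape are visible.

```python
# Hypothetical config matrix: each entry is a set of keyword overrides
# forwarded to the refinement entry point.
BENCHMARK_CONFIGS = [
    {"name": "baseline", "loss": "linear", "use_confidence": False},
    {"name": "robust", "loss": "soft_l1", "f_scale": 0.1,
     "use_confidence": False},
    {"name": "robust+confidence", "loss": "soft_l1", "f_scale": 0.1,
     "use_confidence": True},
    {"name": "robust+confidence+best-frame", "loss": "soft_l1", "f_scale": 0.1,
     "use_confidence": True, "best_frame": True},
]

def run_benchmark_matrix(refine_fn, **shared_kwargs):
    rows = []
    for cfg in BENCHMARK_CONFIGS:
        overrides = {k: v for k, v in cfg.items() if k != "name"}
        stats = refine_fn(**overrides, **shared_kwargs)
        rows.append({"config": cfg["name"], **stats})
    return rows

def stub_refine(loss, use_confidence, f_scale=None, best_frame=False):
    # Stand-in for the real pipeline; returns the per-config report fields.
    return {"rmse_pre": 0.031, "rmse_post": 0.012, "success": True,
            "nfev": 17, "termination_message": "ftol termination condition"}

rows = run_benchmark_matrix(stub_refine)
for row in rows:
    print(f"{row['config']:<32} pre={row['rmse_pre']:.3f} "
          f"post={row['rmse_post']:.3f}")
```

In `calibrate_extrinsics.py` the same rows would be printed via `click.echo` and stored under the `benchmark` key of the output JSON.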
**Must NOT do**:
- Do NOT implement automated configuration tuning
- Do NOT add visualization/plotting dependencies
- Do NOT change the default (non-benchmark) codepath behavior
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: Orchestration code, calling existing functions with different params
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO (depends on all previous tasks)
- **Parallel Group**: Wave 3 (after all)
- **Blocks**: Task 7
- **Blocked By**: Tasks 2, 3, 4, 5
**References**:
**Pattern References**:
- `calibrate_extrinsics.py:73-196` — `apply_depth_verify_refine_postprocess` function. The benchmark matrix calls this logic with varied parameters
- `aruco/depth_refine.py` — Updated `refine_extrinsics_with_depth` with `loss`, `f_scale`, `confidence_map` params
**Acceptance Criteria**:
- [ ] `--benchmark-matrix` flag exists in CLI
- [ ] When enabled, 4 configurations are run per camera
- [ ] Output table is printed to stdout
- [ ] Benchmark results are in output JSON under `benchmark` key
- [ ] `uv run pytest tests/ -q` → all pass
**Agent-Executed QA Scenarios:**
```
Scenario: Benchmark flag in CLI help
Tool: Bash
Steps:
1. Run: uv run python calibrate_extrinsics.py --help | grep benchmark
2. Assert: output contains "--benchmark-matrix"
Expected Result: Flag is present
Evidence: Help text output
```
**Commit**: YES
- Message: `feat(calibrate): add --benchmark-matrix for comparing refinement configurations`
- Files: `calibrate_extrinsics.py`, `tests/test_benchmark.py`
- Pre-commit: `uv run pytest tests/ -q`
---
- [x] 7. Documentation Update
**What to do**:
- Update `docs/calibrate-extrinsics-workflow.md`:
- Add new CLI flags: `--use-confidence-weights`, `--benchmark-matrix`
- Update "Depth Verification & Refinement" section with new optimizer details
- Update "Refinement" section: document `least_squares` with `soft_l1` loss, `f_scale`, confidence weighting
- Add "Best-Frame Selection" section explaining the scoring formula
- Add "Diagnostics" section documenting new output JSON fields
- Update "Example Workflow" commands to show new flags
- Mark the "Known Unexpected Behavior" unit mismatch section as RESOLVED with the fix description
**Must NOT do**:
- Do NOT rewrite unrelated documentation sections
- Do NOT add tutorial-style content
**Recommended Agent Profile**:
- **Category**: `writing`
- Reason: Pure documentation writing
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Wave 4 (final)
- **Blocks**: None
- **Blocked By**: All previous tasks
**References**:
**Pattern References**:
- `docs/calibrate-extrinsics-workflow.md` — Entire file. Follow existing section structure and formatting
**Acceptance Criteria**:
- [ ] New CLI flags documented
- [ ] `least_squares` optimizer documented with parameter explanations
- [ ] Best-frame selection documented
- [ ] Unit mismatch section updated as resolved
- [ ] Example commands include new flags
**Commit**: YES
- Message: `docs: update calibrate-extrinsics-workflow for robust refinement changes`
- Files: `docs/calibrate-extrinsics-workflow.md`
- Pre-commit: `uv run pytest tests/ -q`
---
## Commit Strategy
| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1 | `fix(svo): harden depth units — set coordinate_units=METER, guard /1000 conversion` | `aruco/svo_sync.py`, tests | `uv run pytest tests/ -q` |
| 2 | `feat(refine): replace L-BFGS-B MSE with least_squares soft-L1 robust optimizer` | `aruco/depth_refine.py`, tests | `uv run pytest tests/ -q` |
| 3 | `feat(refine): add confidence-weighted depth residuals with --use-confidence-weights flag` | `aruco/depth_verify.py`, `aruco/depth_refine.py`, `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 4 | `feat(calibrate): replace naive frame selection with quality-scored best-frame` | `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 5 | `feat(refine): add rich optimizer diagnostics and acceptance gates` | `aruco/depth_refine.py`, `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 6 | `feat(calibrate): add --benchmark-matrix for comparing refinement configurations` | `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 7 | `docs: update calibrate-extrinsics-workflow for robust refinement changes` | `docs/calibrate-extrinsics-workflow.md` | `uv run pytest tests/ -q` |
---
## Success Criteria
### Verification Commands
```bash
uv run pytest tests/ -q # Expected: all pass, 0 failures
uv run pytest tests/test_depth_refine.py -v # Expected: all tests pass including new robust/confidence tests
```
### Final Checklist
- [x] All "Must Have" items present
- [x] All "Must NOT Have" items absent
- [x] All tests pass (`uv run pytest tests/ -q`)
- [x] Output JSON backward compatible (existing fields preserved, new fields additive)
- [x] Default CLI behavior unchanged (new features opt-in)
- [x] Optimizer actually converges on synthetic test data (success=True, nfev > 1)
+184 -33
@@ -1,37 +1,188 @@
# Python Agent Context
# AGENTS.md — Python Workspace Guide
## Environment
- **Directory**: `/workspaces/zed-playground/py_workspace`
- **Package Manager**: `uv`
- **Python Version**: 3.12+ (Managed by `uv`)
- **Dependencies**: Defined in `pyproject.toml`
- `pyzed`: ZED SDK Python wrapper
- `opencv-python`: GUI and image processing
- `click`: CLI argument parsing
- `numpy`, `cupy-cuda12x`: Data manipulation
This file defines coding-agent guidance for:
`/workspaces/zed-playground/py_workspace`
## Workflow & Commands
- **Run Scripts**: Always use `uv run` to ensure correct environment.
```bash
uv run streaming_receiver.py --help
uv run recording_multi.py
```
- **New Dependencies**: Add with `uv add <package>` (e.g., `uv add requests`).
Use this as the primary reference for Python work in this repository.
## Architecture & Patterns
- **Network Camera Handling**:
- Use `zed_network_utils.py` for all network config parsing.
- Config file: `/workspaces/zed-playground/zed_settings/inside_network.json`
- **Threading Model**:
- **Main Thread**: MUST handle all OpenCV GUI (`cv2.imshow`, `cv2.waitKey`).
- **Worker Threads**: Handle `camera.grab()` and data retrieval.
- **Communication**: Use `queue.Queue` to pass frames from workers to main.
- **ZED API Patterns**:
- Streaming Input: `init_params.set_from_stream(ip, port)`
- Serial Number: Use `camera.get_camera_information().serial_number`.
---
## Documentation & References
- **Python API Docs**: `/usr/local/zed/doc/API/html/python/index.html`
- **ZED SDK General Docs**: `/usr/local/zed/doc/`
- **C++ Headers (Reference)**: `/usr/local/zed/include/sl/`
- Useful for understanding underlying enum values or behaviors not fully detailed in Python docstrings.
## 1) Scope & Environment
- Package manager: **uv**
- Python: **3.12+**
- Project file: `pyproject.toml`
- Main package/work area: top-level scripts + `aruco/` + `tests/`
- Non-primary/vendor-like areas (avoid unless explicitly asked):
- `loguru/`
- `tmp/`
- `libs/`
Core dependencies include:
- `pyzed`, `opencv-python`, `click`, `numpy`, `scipy`
- `loguru`, `awkward`, `jaxtyping`, `pyarrow`, `pandas`
Dev dependencies:
- `pytest`, `basedpyright`
---
## 2) Build / Run / Lint / Test Commands
Run commands from:
`/workspaces/zed-playground/py_workspace`
Environment sync:
```bash
uv sync
uv run python -V
```
Run common scripts:
```bash
uv run streaming_receiver.py --help
uv run recording_multi.py
uv run calibrate_extrinsics.py --help
```
Type-check / lint-equivalent:
```bash
uv run basedpyright
```
Run all tests:
```bash
uv run pytest
```
Run a single test file:
```bash
uv run pytest tests/test_depth_refine.py
```
Run a single test function:
```bash
uv run pytest tests/test_depth_refine.py::test_refine_extrinsics_with_depth_with_offset
```
Run subset by keyword:
```bash
uv run pytest -k "depth and refine"
```
Useful options:
```bash
uv run pytest -x -vv
```
Notes from `pyproject.toml`:
- `testpaths = ["tests"]`
- `norecursedirs = ["loguru", "tmp", "libs"]`
---
## 3) Rules Files (Cursor / Copilot)
Latest scan in this workspace found:
- No `.cursorrules`
- No `.cursor/rules/`
- No `.github/copilot-instructions.md`
If these files appear later, treat them as higher-priority local instructions.
---
## 4) Python Code Style Conventions
### Imports
- Group imports: standard library → third-party → local modules.
- Use ZED import style:
- `import pyzed.sl as sl`
- In package modules (`aruco/*`), prefer relative imports:
- `from .pose_math import ...`
- In top-level scripts, absolute imports are common:
- `from aruco.detector import ...`
### Formatting & structure
- 4-space indentation.
- PEP8-style layout.
- Keep functions focused and composable.
- Prefer explicit CLI options over positional ambiguity.
### Typing
- Add type hints on public and most internal functions.
- Existing code uses both:
- `typing.Optional/List/Dict/Tuple`
- modern `|` unions
Stay consistent with the surrounding file.
- When arrays/matrices are central, use `jaxtyping` shape aliases (with `TYPE_CHECKING` guards) where already established.
- Avoid broad `Any` unless unavoidable at library boundaries (OpenCV/pyzed interop).
### Naming
- `snake_case`: functions, variables, modules.
- `PascalCase`: classes.
- `UPPER_SNAKE_CASE`: constants.
### Docstrings
- Use concise purpose + `Args` / `Returns`.
- Document expected array shapes and units for geometry/math functions.
### Logging & output
- User-facing CLI output: `click.echo`.
- Diagnostic logs: `loguru` (`logger.debug/info/warning`).
- Keep verbose logs behind a `--debug` flag.
### Error handling
- Raise specific exceptions (`ValueError`, `FileNotFoundError`, etc.) with actionable messages.
- For CLI fatal paths, use `click.UsageError` or `SystemExit(1)` patterns.
- Validate early (shape/range/None) before expensive compute.
---
## 5) Testing Conventions
- Framework: `pytest`
- Numerical checks: use `numpy.testing.assert_allclose` where appropriate.
- Exception checks: `pytest.raises(..., match=...)`.
- Place/add tests under `tests/`.
- For `aruco/*` behavior changes, update related tests (`test_depth_*`, `test_alignment`, etc.).
---
## 6) Project-Specific ZED Guidance
### Streaming vs Fusion architecture
- Streaming API sends compressed video; host computes depth/tracking.
- Fusion API sends metadata; host does lightweight fusion.
- Do not assume built-in depth-map streaming parity with metadata fusion.
### Units
- Keep units explicit and consistent end-to-end.
- Marker parquet geometry is meter-based in this workspace.
- Be careful with ZED depth unit configuration and conversions.
### Threading
- OpenCV GUI (`cv2.imshow`, `cv2.waitKey`) should stay on main thread.
- Use worker thread(s) for grab/retrieve and queue handoff patterns.
### Network config
- Follow `zed_network_utils.py` and `zed_settings/inside_network.json` patterns.
---
## 7) Agent Workflow Checklist
Before editing:
1. Identify target file/module and nearest existing pattern.
2. Confirm expected command(s) from this guide.
3. Check for relevant existing tests.
After editing:
1. Run focused tests first.
2. Run broader test selection as needed.
3. Run `uv run basedpyright` on final pass.
4. Keep changes minimal and avoid unrelated churn.
If uncertain:
- Prefer small, verifiable changes.
- Document assumptions in PR/commit notes.
+68 -13
@@ -1,8 +1,12 @@
import numpy as np
from typing import Dict, Tuple, Any
from typing import Dict, Tuple, Any, Optional
from scipy.optimize import least_squares
from .pose_math import rvec_tvec_to_matrix, matrix_to_rvec_tvec
from .depth_verify import compute_depth_residual
from .depth_verify import (
compute_depth_residual,
get_confidence_weight,
project_point_to_pixel,
)
def extrinsics_to_params(T: np.ndarray) -> np.ndarray:
@@ -24,6 +28,8 @@ def depth_residuals(
initial_params: np.ndarray,
reg_rot: float = 0.1,
reg_trans: float = 1.0,
confidence_map: Optional[np.ndarray] = None,
confidence_thresh: float = 100.0,
) -> np.ndarray:
T = params_to_extrinsics(params)
residuals = []
@@ -32,15 +38,25 @@ def depth_residuals(
for corner in corners:
residual = compute_depth_residual(corner, T, depth_map, K, window_size=5)
if residual is not None:
if confidence_map is not None:
u, v = project_point_to_pixel(
(np.linalg.inv(T) @ np.append(corner, 1.0))[:3], K
)
if u is not None and v is not None:
h, w = confidence_map.shape[:2]
if 0 <= u < w and 0 <= v < h:
conf = confidence_map[v, u]
weight = get_confidence_weight(conf, confidence_thresh)
residual *= np.sqrt(weight)
residuals.append(residual)
# Regularization as pseudo-residuals
param_diff = params - initial_params
# Rotation regularization (first 3 params)
if reg_rot > 0:
residuals.extend(param_diff[:3] * reg_rot)
# Translation regularization (last 3 params)
if reg_trans > 0:
residuals.extend(param_diff[3:] * reg_trans)
@@ -60,6 +76,8 @@ def refine_extrinsics_with_depth(
f_scale: float = 0.1,
reg_rot: float | None = None,
reg_trans: float | None = None,
confidence_map: Optional[np.ndarray] = None,
confidence_thresh: float = 100.0,
) -> Tuple[np.ndarray, dict[str, Any]]:
initial_params = extrinsics_to_params(T_initial)
@@ -72,14 +90,29 @@ def refine_extrinsics_with_depth(
reg_trans = regularization_weight * 10.0
# Check for valid depth points first
data_residual_count = 0
n_points_total = 0
n_depth_valid = 0
n_confidence_rejected = 0
for marker_id, corners in marker_corners_world.items():
for corner in corners:
n_points_total += 1
res = compute_depth_residual(corner, T_initial, depth_map, K, window_size=5)
if res is not None:
data_residual_count += 1
if data_residual_count == 0:
n_depth_valid += 1
if confidence_map is not None:
u, v = project_point_to_pixel(
(np.linalg.inv(T_initial) @ np.append(corner, 1.0))[:3], K
)
if u is not None and v is not None:
h, w = confidence_map.shape[:2]
if 0 <= u < w and 0 <= v < h:
conf = confidence_map[v, u]
weight = get_confidence_weight(conf, confidence_thresh)
if weight <= 0:
n_confidence_rejected += 1
if n_depth_valid == 0:
return T_initial, {
"success": False,
"reason": "no_valid_depth_points",
@@ -89,22 +122,30 @@ def refine_extrinsics_with_depth(
"delta_rotation_deg": 0.0,
"delta_translation_norm_m": 0.0,
"termination_message": "No valid depth points found at marker corners",
"termination_status": -1,
"nfev": 0,
"njev": 0,
"optimality": 0.0,
"n_active_bounds": 0,
"active_mask": np.zeros(6, dtype=int),
"cost": 0.0
"cost": 0.0,
"n_points_total": n_points_total,
"n_depth_valid": n_depth_valid,
"n_confidence_rejected": n_confidence_rejected,
"loss_function": loss,
"f_scale": f_scale,
}
max_rotation_rad = np.deg2rad(max_rotation_deg)
lower_bounds = initial_params.copy()
upper_bounds = initial_params.copy()
lower_bounds[:3] -= max_rotation_rad
upper_bounds[:3] += max_rotation_rad
lower_bounds[3:] -= max_translation_m
upper_bounds[3:] += max_translation_m
bounds = (lower_bounds, upper_bounds)
result = least_squares(
@@ -117,6 +158,8 @@ def refine_extrinsics_with_depth(
initial_params,
reg_rot,
reg_trans,
confidence_map,
confidence_thresh,
),
method="trf",
loss=loss,
@@ -142,6 +185,8 @@ def refine_extrinsics_with_depth(
initial_params,
reg_rot,
reg_trans,
confidence_map,
confidence_thresh,
)
initial_cost = 0.5 * np.sum(initial_residuals**2)
@@ -153,10 +198,20 @@ def refine_extrinsics_with_depth(
"delta_rotation_deg": float(delta_rotation_deg),
"delta_translation_norm_m": float(delta_translation),
"termination_message": result.message,
"nfev": result.nfev,
"termination_status": int(result.status),
"nfev": int(result.nfev),
"njev": int(getattr(result, "njev", 0)),
"optimality": float(result.optimality),
"active_mask": result.active_mask,
"n_active_bounds": int(np.sum(result.active_mask != 0)),
"active_mask": result.active_mask.tolist()
if hasattr(result.active_mask, "tolist")
else result.active_mask,
"cost": float(result.cost),
"n_points_total": n_points_total,
"n_depth_valid": n_depth_valid,
"n_confidence_rejected": n_confidence_rejected,
"loss_function": loss,
"f_scale": f_scale,
}
return T_refined, stats
+12
@@ -24,6 +24,18 @@ def project_point_to_pixel(P_cam: np.ndarray, K: np.ndarray):
return u, v
def get_confidence_weight(confidence: float, threshold: float = 100.0) -> float:
"""
Convert ZED confidence value to a weight in [0, 1].
ZED semantics: 1 is most confident, 100 is least confident.
"""
if not np.isfinite(confidence) or confidence < 0:
return 0.0
# Linear weight from 1.0 (at confidence=0) to 0.0 (at confidence=threshold)
weight = 1.0 - (confidence / threshold)
return float(np.clip(weight, 0.0, 1.0))
def compute_depth_residual(
P_world: np.ndarray,
T_world_cam: np.ndarray,
+267 -9
@@ -70,6 +70,51 @@ ARUCO_DICT_MAP = {
}
def score_frame(
n_markers: int,
reproj_err: float,
corners: np.ndarray,
depth_map: Optional[np.ndarray],
depth_confidence_threshold: int = 50,
confidence_map: Optional[np.ndarray] = None,
) -> float:
"""
Compute a quality score for a frame to select the best one for depth verification.
Higher is better.
"""
# Base score: more markers is better, lower reprojection error is better.
# We weight markers heavily as they provide more constraints.
score = n_markers * 100.0 - reproj_err
if depth_map is not None:
# Calculate depth validity ratio at marker corners.
# This ensures we pick a frame where depth is actually available where we need it.
valid_count = 0
total_count = 0
h, w = depth_map.shape[:2]
# corners shape is (N, 4, 2)
flat_corners = corners.reshape(-1, 2)
for pt in flat_corners:
x, y = int(round(pt[0])), int(round(pt[1]))
if 0 <= x < w and 0 <= y < h:
total_count += 1
d = depth_map[y, x]
if np.isfinite(d) and d > 0:
if confidence_map is not None:
# ZED confidence: lower is more confident
if confidence_map[y, x] <= depth_confidence_threshold:
valid_count += 1
else:
valid_count += 1
if total_count > 0:
depth_ratio = valid_count / total_count
score += depth_ratio * 50.0
return score
def apply_depth_verify_refine_postprocess(
results: Dict[str, Any],
verification_frames: Dict[str, Any],
@@ -77,6 +122,7 @@ def apply_depth_verify_refine_postprocess(
camera_matrices: Dict[str, Any],
verify_depth: bool,
refine_depth: bool,
use_confidence_weights: bool,
depth_confidence_threshold: int,
report_csv_path: Optional[str] = None,
) -> Tuple[Dict[str, Any], List[List[Any]]]:
@@ -145,6 +191,10 @@ def apply_depth_verify_refine_postprocess(
marker_corners_world,
frame.depth_map,
cam_matrix,
confidence_map=frame.confidence_map
if use_confidence_weights
else None,
confidence_thresh=depth_confidence_threshold,
)
verify_res_post = verify_extrinsics_with_depth(
@@ -180,6 +230,18 @@ def apply_depth_verify_refine_postprocess(
f"Trans={refine_stats['delta_translation_norm_m']:.3f}m"
)
# Warning gates
if improvement < 1e-4 and refine_stats["nfev"] > 5:
click.echo(
f" WARNING: Optimization ran for {refine_stats['nfev']} steps but improvement was negligible ({improvement:.6f}m).",
err=True,
)
if not refine_stats["success"] or refine_stats["nfev"] <= 1:
click.echo(
f" WARNING: Optimization might have failed or stalled. Success: {refine_stats['success']}, Steps: {refine_stats['nfev']}. Message: {refine_stats['termination_message']}",
err=True,
)
verify_res = verify_res_post
if report_csv_path:
@@ -196,6 +258,144 @@ def apply_depth_verify_refine_postprocess(
return results, csv_rows
def run_benchmark_matrix(
results: Dict[str, Any],
verification_frames: Dict[Any, Any],
first_frames: Dict[Any, Any],
marker_geometry: Dict[int, Any],
camera_matrices: Dict[Any, Any],
depth_confidence_threshold: int,
) -> Dict[str, Any]:
"""
Run benchmark matrix comparing 4 configurations:
1) baseline (linear loss, no confidence weights)
2) robust (soft_l1, f_scale=0.1, no confidence)
3) robust+confidence
4) robust+confidence+best-frame
"""
benchmark_results = {}
configs = [
{
"name": "baseline",
"loss": "linear",
"use_confidence": False,
"use_best_frame": False,
},
{
"name": "robust",
"loss": "soft_l1",
"use_confidence": False,
"use_best_frame": False,
},
{
"name": "robust+confidence",
"loss": "soft_l1",
"use_confidence": True,
"use_best_frame": False,
},
{
"name": "robust+confidence+best-frame",
"loss": "soft_l1",
"use_confidence": True,
"use_best_frame": True,
},
]
click.echo("\nRunning Benchmark Matrix...")
for serial in results.keys():
serial_int = int(serial)
if serial_int not in first_frames or serial_int not in verification_frames:
continue
cam_matrix = camera_matrices[serial_int]
pose_str = results[serial]["pose"]
# np.fromstring is deprecated for text parsing; split explicitly instead.
T_initial = np.array(pose_str.split(), dtype=np.float64).reshape(4, 4)
cam_bench = {}
for config in configs:
name = config["name"]
use_best = config["use_best_frame"]
vf = (
verification_frames[serial_int]
if use_best
else first_frames[serial_int]
)
frame = vf["frame"]
ids = vf["ids"]
marker_corners_world = {
int(mid): marker_geometry[int(mid)]
for mid in ids.flatten()
if int(mid) in marker_geometry
}
if not marker_corners_world or frame.depth_map is None:
continue
# Pre-refinement verification
verify_pre = verify_extrinsics_with_depth(
T_initial,
marker_corners_world,
frame.depth_map,
cam_matrix,
confidence_map=frame.confidence_map,
confidence_thresh=depth_confidence_threshold,
)
# Refinement
T_refined, refine_stats = refine_extrinsics_with_depth(
T_initial,
marker_corners_world,
frame.depth_map,
cam_matrix,
confidence_map=frame.confidence_map
if config["use_confidence"]
else None,
confidence_thresh=depth_confidence_threshold,
loss=str(config["loss"]),
f_scale=0.1,
)
# Post-refinement verification
verify_post = verify_extrinsics_with_depth(
T_refined,
marker_corners_world,
frame.depth_map,
cam_matrix,
confidence_map=frame.confidence_map,
confidence_thresh=depth_confidence_threshold,
)
cam_bench[name] = {
"rmse_pre": verify_pre.rmse,
"rmse_post": verify_post.rmse,
"improvement": verify_pre.rmse - verify_post.rmse,
"delta_rot_deg": refine_stats["delta_rotation_deg"],
"delta_trans_m": refine_stats["delta_translation_norm_m"],
"nfev": refine_stats["nfev"],
"success": refine_stats["success"],
"frame_index": vf["frame_index"],
}
benchmark_results[serial] = cam_bench
# Print summary table for this camera
click.echo(f"\nBenchmark Results for Camera {serial}:")
header = f"{'Config':<30} | {'RMSE Pre':<10} | {'RMSE Post':<10} | {'Improv':<10} | {'Iter':<5}"
click.echo(header)
click.echo("-" * len(header))
for name, stats in cam_bench.items():
click.echo(
f"{name:<30} | {stats['rmse_pre']:<10.4f} | {stats['rmse_post']:<10.4f} | "
f"{stats['improvement']:<10.4f} | {stats['nfev']:<5}"
)
return benchmark_results
@click.command()
@click.option("--svo", "-s", multiple=True, required=False, help="Path to SVO files.")
@click.option("--markers", "-m", required=True, help="Path to markers parquet file.")
@@ -223,6 +423,11 @@ def apply_depth_verify_refine_postprocess(
@click.option(
"--refine-depth/--no-refine-depth", default=False, help="Enable depth refinement."
)
@click.option(
"--use-confidence-weights/--no-confidence-weights",
default=False,
help="Use confidence-weighted residuals in depth refinement.",
)
@click.option(
"--depth-mode",
default="NEURAL",
@@ -272,6 +477,11 @@ def apply_depth_verify_refine_postprocess(
type=int,
help="Maximum number of samples to process before stopping.",
)
@click.option(
"--benchmark-matrix/--no-benchmark-matrix",
default=False,
help="Run benchmark matrix comparing different refinement configurations.",
)
def main(
svo: tuple[str, ...],
markers: str,
@@ -283,6 +493,7 @@ def main(
self_check: bool,
verify_depth: bool,
refine_depth: bool,
use_confidence_weights: bool,
depth_mode: str,
depth_confidence_threshold: int,
report_csv: str | None,
@@ -293,6 +504,7 @@ def main(
min_markers: int,
debug: bool,
max_samples: int | None,
benchmark_matrix: bool,
):
"""
Calibrate camera extrinsics relative to a global coordinate system defined by ArUco markers.
@@ -313,7 +525,7 @@ def main(
}
sl_depth_mode = depth_mode_map.get(depth_mode, sl.DEPTH_MODE.NONE)
if not (verify_depth or refine_depth):
if not (verify_depth or refine_depth or benchmark_matrix):
sl_depth_mode = sl.DEPTH_MODE.NONE
# Expand SVO paths (files or directories)
@@ -406,6 +618,8 @@ def main(
# Store verification frames for post-process check
verification_frames = {}
# Store first valid frame for benchmarking
first_frames = {}
# Track all visible marker IDs for heuristic ground detection
all_visible_ids = set()
@@ -460,15 +674,43 @@ def main(
# We want T_world_from_cam
T_world_cam = invert_transform(T_cam_world)
# Save latest valid frame for verification
# Save best frame for verification based on scoring
if (
verify_depth or refine_depth
verify_depth or refine_depth or benchmark_matrix
) and frame.depth_map is not None:
verification_frames[serial] = {
"frame": frame,
"ids": ids,
"corners": corners,
}
current_score = score_frame(
n_markers,
reproj_err,
corners,
frame.depth_map,
depth_confidence_threshold,
frame.confidence_map,
)
if serial not in first_frames:
first_frames[serial] = {
"frame": frame,
"ids": ids,
"corners": corners,
"score": current_score,
"frame_index": frame_count,
}
best_so_far = verification_frames.get(serial)
if (
best_so_far is None
or current_score > best_so_far["score"]
):
verification_frames[serial] = {
"frame": frame,
"ids": ids,
"corners": corners,
"score": current_score,
"frame_index": frame_count,
}
logger.debug(
f"Cam {serial}: New best frame {frame_count} with score {current_score:.2f}"
)
accumulators[serial].add_pose(
T_world_cam, reproj_err, frame_count
@@ -550,11 +792,27 @@ def main(
camera_matrices,
verify_depth,
refine_depth,
use_confidence_weights,
depth_confidence_threshold,
report_csv,
)
# 5. Optional Ground Plane Alignment
# 5. Run Benchmark Matrix if requested
if benchmark_matrix:
benchmark_results = run_benchmark_matrix(
results,
verification_frames,
first_frames,
marker_geometry,
camera_matrices,
depth_confidence_threshold,
)
# Add to results for saving
for serial, bench in benchmark_results.items():
if serial in results:
results[serial]["benchmark"] = bench
# 6. Optional Ground Plane Alignment
if auto_align:
click.echo("\nPerforming ground plane alignment...")
target_face = ground_face
@@ -12,6 +12,8 @@ The script calibrates camera extrinsics using ArUco markers detected in SVO reco
- `--auto-align`: Enables automatic ground plane alignment (opt-in).
- `--verify-depth`: Enables depth-based verification of computed poses.
- `--refine-depth`: Enables optimization of poses using depth data (requires `--verify-depth`).
- `--use-confidence-weights`: Uses ZED depth confidence map to weight residuals in optimization.
- `--benchmark-matrix`: Runs a comparison of baseline vs. robust refinement configurations.
- `--max-samples`: Limits the number of processed samples for fast iteration.
- `--debug`: Enables verbose debug logging (default is INFO).
@@ -63,13 +65,35 @@ This workflow uses the ZED camera's depth map to verify and improve the ArUco-ba
### 2. Refinement (`--refine-depth`)
- **Trigger**: Runs only if verification is enabled and enough valid depth points (>4) are found.
- **Process**:
- Uses `scipy.optimize.minimize` (L-BFGS-B) to adjust the 6-DOF pose parameters (rotation vector + translation vector).
- **Objective Function**: Minimizes the squared difference between computed depth and measured depth for all visible marker corners.
- Uses `scipy.optimize.least_squares` with a robust loss function (`soft_l1`) to handle outliers.
- **Objective Function**: Minimizes the robust residual between computed depth and measured depth for all visible marker corners.
- **Confidence Weighting** (`--use-confidence-weights`): If enabled, residuals are weighted by the ZED confidence map (higher confidence = higher weight).
- **Constraints**: Bounded optimization to prevent drifting too far from the initial ArUco pose (default: ±5 degrees, ±5cm).
- **Output**:
- Refined pose replaces the original pose in the JSON output.
- Improvement stats (delta rotation, delta translation, RMSE reduction) added under `refine_depth`.
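The bounded robust solve can be sketched with a toy one-parameter residual (values are illustrative; the real pipeline optimizes the full 6-DOF pose vector):

```python
import numpy as np
from scipy.optimize import least_squares

# 8 depth measurements for corners whose model depth is 2.1 m at zero offset;
# the true offset is 0.1 m, and one measurement is a gross 3 m outlier.
measured = np.array([2.0] * 7 + [5.0])
model = np.full(8, 2.1)

def residuals(params):
    # params[0] stands in for the single translation component being refined
    return (model - params[0]) - measured

# Bounds play the role of the +/-5 cm / +/-5 deg pose constraints.
common = dict(x0=[0.0], bounds=([-0.2], [0.2]), method="trf")
robust = least_squares(residuals, loss="soft_l1", f_scale=0.1, **common)
linear = least_squares(residuals, loss="linear", **common)

# soft_l1 recovers roughly the 0.1 m shift; the plain least-squares solve is
# dragged toward the outlier until it hits the lower bound.
```

This mirrors why the pipeline switched from a pure MSE objective to `soft_l1`: a single 3 m depth outlier dominates a squared-error sum but contributes almost nothing under the robust loss.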
### 3. Best Frame Selection
When multiple frames are available, the system scores them to pick the best candidate for verification/refinement:
- **Criteria**:
- Number of detected markers (primary factor).
- Reprojection error (lower is better).
- Valid depth ratio (percentage of marker corners with valid depth data).
- Depth confidence (if available).
- **Benefit**: Ensures refinement uses high-quality data rather than just the last valid frame.
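A condensed sketch of the scoring heuristic (the weights mirror `score_frame`; the helper name here is illustrative):

```python
def frame_score(n_markers: int, reproj_err: float, valid_depth_ratio: float) -> float:
    # Markers dominate (100 pts each), reprojection error subtracts directly,
    # and the valid-depth ratio adds up to 50 bonus points.
    return n_markers * 100.0 - reproj_err + valid_depth_ratio * 50.0

# A frame with more markers outranks one with better depth coverage alone.
better = frame_score(3, 0.8, 0.5) > frame_score(2, 0.3, 1.0)
```

The heavy marker weight is deliberate: each extra marker adds four corner constraints, which matters more for pose stability than a marginally cleaner depth map.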
## Benchmark Matrix (`--benchmark-matrix`)
This mode runs a comparative analysis of different refinement configurations on the same data to evaluate improvements. It compares:
1. **Baseline**: Linear loss (MSE), no confidence weighting.
2. **Robust**: Soft-L1 loss, no confidence weighting.
3. **Robust + Confidence**: Soft-L1 loss with confidence-weighted residuals.
4. **Robust + Confidence + Best Frame**: All of the above, using the highest-scored frame.
**Output:**
- Prints a summary table for each camera showing RMSE improvement and iteration counts.
- Adds a `benchmark` object to the JSON output containing detailed stats for each configuration.
## Fast Iteration (`--max-samples`)
For development or quick checks, processing thousands of frames is unnecessary.
@@ -78,7 +102,7 @@ For development or quick checks, processing thousands of frames is unnecessary.
## Example Workflow
**Full Run with Alignment and Refinement:**
**Full Run with Alignment and Robust Refinement:**
```bash
uv run calibrate_extrinsics.py \
--svo output/recording.svo \
@@ -88,9 +112,19 @@ uv run calibrate_extrinsics.py \
--ground-marker-id 21 \
--verify-depth \
--refine-depth \
--use-confidence-weights \
--output output/calibrated.json
```
**Benchmark Run:**
```bash
uv run calibrate_extrinsics.py \
--svo output/recording.svo \
--markers aruco/markers/box.parquet \
--benchmark-matrix \
--max-samples 100
```
**Fast Debug Run:**
```bash
uv run calibrate_extrinsics.py \
@@ -104,89 +138,18 @@ uv run calibrate_extrinsics.py \
## Known Unexpected Behavior / Troubleshooting
### Depth Refinement Failure (Unit Mismatch)
### Resolved: Depth Refinement Failure (Unit Mismatch)
**Symptoms:**
*Note: This issue has been resolved in the latest version by enforcing explicit meter units in the SVO reader and removing ambiguous manual conversions.*
**Previous Symptoms:**
- `depth_verify` reports extremely large RMSE values (e.g., > 1000).
- `refine_depth` reports `success: false`, `iterations: 0`, and near-zero improvement.
- The optimization fails to converge or produces nonsensical results.
**Root Cause:**
The ZED SDK `retrieve_measure(sl.MEASURE.DEPTH)` returns depth values in the unit defined by `InitParameters.coordinate_units`. The default is **MILLIMETERS**. However, the calibration system (extrinsics, marker geometry) operates in **METERS**.
**Resolution:**
The system now explicitly sets `InitParameters.coordinate_units = sl.UNIT.METER` when opening SVO files, ensuring consistent units across the pipeline.
This scale mismatch (factor of 1000) causes the residuals in the optimization objective function to be massive, breaking the numerical stability of the L-BFGS-B solver.
**Mitigation:**
The `SVOReader` class in `aruco/svo_sync.py` explicitly converts the retrieved depth map to meters:
```python
# aruco/svo_sync.py
return depth_data / 1000.0
```
This ensures that all geometric math downstream remains consistent in meters.
**Diagnostic Check:**
If you suspect a unit mismatch, check the `depth_verify` RMSE in the output JSON.
- **Healthy:** RMSE < 0.5 (meters)
- **Mismatch:** RMSE > 100 (likely millimeters)
*Note: Confidence filtering (`--depth-confidence-threshold`) is orthogonal to this issue. A unit mismatch affects all valid pixels regardless of confidence.*
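A small illustrative helper (name and thresholds are just the heuristics above, not an existing API) can automate the triage of `depth_verify.rmse` values:

```python
import math

def classify_depth_rmse(rmse: float, healthy: float = 0.5, mismatch: float = 100.0) -> str:
    """Classify a depth_verify RMSE in meters.

    An RMSE ~1000x the expected scale usually means the depth map was still
    in millimeters when the residuals were computed.
    """
    if math.isnan(rmse):
        return "invalid"
    if rmse > mismatch:
        return "unit_mismatch"
    if rmse > healthy:
        return "degraded"
    return "healthy"
```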
## Findings Summary (2026-02-07)
This section summarizes the latest deep investigation across local code, outputs, and external docs.
### Confirmed Facts
1. **Marker geometry parquet is in meters**
- `aruco/markers/standard_box_markers_600mm.parquet` stores values around `0.3` (meters), not `300` (millimeters).
- `docs/marker-parquet-format.md` also documents meter-scale coordinates.
2. **Depth unit contract is still fragile**
- ZED defaults to millimeters unless `InitParameters.coordinate_units` is explicitly set.
- Current reader path converts depth by dividing by `1000.0` in `aruco/svo_sync.py`.
- This works only if incoming depth is truly millimeters. It can become fragile if unit config changes elsewhere.
3. **Observed runtime behavior still indicates refinement instability**
- Existing outputs (for example `output/aligned_refined_extrinsics*.json`) show very large `depth_verify.rmse`, often `refine_depth.success: false`, `iterations: 0`, and negligible improvement.
- This indicates that refinement quality is currently limited beyond the original mm↔m mismatch narrative.
4. **Current refinement objective is not robust enough**
- Objective is plain squared depth residuals + simple regularization.
- It does **not** currently include robust loss (Huber/Soft-L1), confidence weighting in the objective, or strong convergence diagnostics.
### Likely Contributors to Poor Refinement
- Depth outliers are not sufficiently down-weighted in optimization.
- Confidence map is used for verification filtering, but not as residual weights in the optimizer objective.
- Representative frame choice uses the latest valid frame, not necessarily the best-quality frame.
- Optimizer diagnostics are limited, making it hard to distinguish "real convergence" from "stuck at initialization".
### Recommended Implementation Order (for next session)
1. **Unit hardening (P0)**
- Explicitly set `init_params.coordinate_units = sl.UNIT.METER` in SVO reader.
- Remove or guard manual `/1000.0` conversion to avoid double-scaling risk.
- Add depth sanity logs (min/median/max sampled depth) under `--debug`.
2. **Robust objective (P0)**
- Replace MSE-only residual with Huber (or Soft-L1) in meters.
- Add confidence-weighted depth residuals in objective function.
- Split translation/rotation regularization coefficients.
3. **Frame quality selection (P1)**
- Replace "latest valid frame" with best-frame scoring:
- marker count (higher better)
- median reprojection error (lower better)
- valid depth ratio (higher better)
4. **Diagnostics and acceptance gates (P1)**
- Log optimizer termination reason, gradient/step behavior, and effective valid points.
- Treat tiny RMSE changes as "no effective refinement" even if optimizer returns.
5. **Benchmark matrix (P1)**
- Compare baseline vs robust loss vs robust+confidence vs robust+confidence+best-frame.
- Report per-camera pre/post RMSE, iteration count, and success/failure reason.
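The depth sanity log from step 1 could look like this sketch (hypothetical helper, not yet in the codebase):

```python
import numpy as np

def depth_sanity_stats(depth_map: np.ndarray, n_samples: int = 1000):
    """Min/median/max over a random sample of finite, positive depth values.

    A median near 2000 where ~2.0 m is expected points at a mm/m mismatch.
    """
    rng = np.random.default_rng(0)
    flat = depth_map.ravel()
    idx = rng.integers(0, flat.size, size=min(n_samples, flat.size))
    samples = flat[idx]
    samples = samples[np.isfinite(samples) & (samples > 0)]
    if samples.size == 0:
        return None
    return float(samples.min()), float(np.median(samples)), float(samples.max())

# e.g. logger.debug("depth m: min/med/max = %s", depth_sanity_stats(depth))
```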
### Practical note
The previous troubleshooting section correctly explains one important failure mode (unit mismatch), but current evidence shows that **robust objective design and frame quality control** are now the primary bottlenecks for meaningful depth refinement gains.
### Optimization Stalls
If `refine_depth` shows `success: false` but `nfev` (evaluations) is high, the optimizer may have hit a flat region or local minimum.
- **Check**: Look at `termination_message` in the JSON output.
- **Fix**: Try enabling `--use-confidence-weights` or checking if the initial ArUco pose is too far off (reprojection error > 2.0).
@@ -14,7 +14,10 @@ sys.path.append(str(Path(__file__).parent.parent))
# Import the functions under test directly from the calibration module.
from calibrate_extrinsics import apply_depth_verify_refine_postprocess
from calibrate_extrinsics import (
apply_depth_verify_refine_postprocess,
run_benchmark_matrix,
)
@pytest.fixture
@@ -38,6 +41,9 @@ def mock_dependencies():
mock_refine_res_stats = {
"delta_rotation_deg": 1.0,
"delta_translation_norm_m": 0.1,
"success": True,
"nfev": 10,
"termination_message": "Success",
}
# refine returns (new_pose_matrix, stats)
mock_refine.return_value = (np.eye(4), mock_refine_res_stats)
@@ -45,6 +51,50 @@ def mock_dependencies():
yield mock_verify, mock_refine, mock_echo
def test_benchmark_matrix(mock_dependencies):
mock_verify, mock_refine, _ = mock_dependencies
serial = "123456"
serial_int = int(serial)
results = {serial: {"pose": "1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1"}}
frame_mock = MagicMock(
depth_map=np.zeros((10, 10)), confidence_map=np.zeros((10, 10))
)
vf = {
"frame": frame_mock,
"ids": np.array([[1]]),
"frame_index": 100,
}
verification_frames = {serial_int: vf}
first_frames = {serial_int: vf}
marker_geometry = {1: np.zeros((4, 3))}
camera_matrices = {serial_int: np.eye(3)}
bench_results = run_benchmark_matrix(
results,
verification_frames,
first_frames,
marker_geometry,
camera_matrices,
depth_confidence_threshold=50,
)
assert serial in bench_results
assert "baseline" in bench_results[serial]
assert "robust" in bench_results[serial]
assert "robust+confidence" in bench_results[serial]
assert "robust+confidence+best-frame" in bench_results[serial]
# Each config triggers verify_pre + verify_post and a single refine call:
# 4 configs * 2 = 8 verify calls, 4 configs * 1 = 4 refine calls.
assert mock_verify.call_count == 8
assert mock_refine.call_count == 4
def test_verify_only(mock_dependencies, tmp_path):
mock_verify, mock_refine, _ = mock_dependencies
@@ -75,6 +125,7 @@ def test_verify_only(mock_dependencies, tmp_path):
camera_matrices=camera_matrices,
verify_depth=True,
refine_depth=False,
use_confidence_weights=False,
depth_confidence_threshold=50,
report_csv_path=None,
)
@@ -130,6 +181,7 @@ def test_refine_depth(mock_dependencies):
camera_matrices=camera_matrices,
verify_depth=False, # refine implies verify usually, but let's check logic
refine_depth=True,
use_confidence_weights=False,
depth_confidence_threshold=50,
)
@@ -143,6 +195,103 @@ def test_refine_depth(mock_dependencies):
mock_refine.assert_called_once()
def test_refine_depth_warning_negligible_improvement(mock_dependencies):
mock_verify, mock_refine, mock_echo = mock_dependencies
serial = "123456"
results = {serial: {"pose": "1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1", "stats": {}}}
verification_frames = {
serial: {
"frame": MagicMock(depth_map=np.zeros((10, 10))),
"ids": np.array([[1]]),
}
}
marker_geometry = {1: np.zeros((4, 3))}
camera_matrices = {serial: np.eye(3)}
# RMSE stays almost same
res_pre = MagicMock(rmse=0.1, n_valid=10, residuals=[])
res_post = MagicMock(rmse=0.099999, n_valid=10, residuals=[])
mock_verify.side_effect = [res_pre, res_post]
# nfev > 5
mock_refine.return_value = (
np.eye(4),
{
"delta_rotation_deg": 0.0,
"delta_translation_norm_m": 0.0,
"success": True,
"nfev": 10,
"termination_message": "Converged",
},
)
apply_depth_verify_refine_postprocess(
results=results,
verification_frames=verification_frames,
marker_geometry=marker_geometry,
camera_matrices=camera_matrices,
verify_depth=False,
refine_depth=True,
use_confidence_weights=False,
depth_confidence_threshold=50,
)
# Check if warning was echoed
# "WARNING: Optimization ran for 10 steps but improvement was negligible"
any_negligible = any(
"negligible" in str(call.args[0]) for call in mock_echo.call_args_list
)
assert any_negligible
def test_refine_depth_warning_failed_or_stalled(mock_dependencies):
mock_verify, mock_refine, mock_echo = mock_dependencies
serial = "123456"
results = {serial: {"pose": "1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1", "stats": {}}}
verification_frames = {
serial: {
"frame": MagicMock(depth_map=np.zeros((10, 10))),
"ids": np.array([[1]]),
}
}
marker_geometry = {1: np.zeros((4, 3))}
camera_matrices = {serial: np.eye(3)}
res_pre = MagicMock(rmse=0.1, n_valid=10, residuals=[])
res_post = MagicMock(rmse=0.1, n_valid=10, residuals=[])
mock_verify.side_effect = [res_pre, res_post]
# success=False
mock_refine.return_value = (
np.eye(4),
{
"delta_rotation_deg": 0.0,
"delta_translation_norm_m": 0.0,
"success": False,
"nfev": 1,
"termination_message": "Failed",
},
)
apply_depth_verify_refine_postprocess(
results=results,
verification_frames=verification_frames,
marker_geometry=marker_geometry,
camera_matrices=camera_matrices,
verify_depth=False,
refine_depth=True,
use_confidence_weights=False,
depth_confidence_threshold=50,
)
any_failed = any(
"failed or stalled" in str(call.args[0]) for call in mock_echo.call_args_list
)
assert any_failed
def test_csv_output(mock_dependencies, tmp_path):
mock_verify, _, _ = mock_dependencies
@@ -169,6 +318,7 @@ def test_csv_output(mock_dependencies, tmp_path):
camera_matrices=camera_matrices,
verify_depth=True,
refine_depth=False,
use_confidence_weights=False,
depth_confidence_threshold=50,
report_csv_path=str(csv_path),
)
+79 -20
@@ -37,6 +37,14 @@ def test_refine_extrinsics_with_depth_no_change():
# np.testing.assert_allclose(T_initial, T_refined, atol=1e-5)
# assert stats["success"] is True
assert stats["final_cost"] <= stats["initial_cost"] + 1e-10
assert "termination_status" in stats
assert "nfev" in stats
assert "optimality" in stats
assert "n_active_bounds" in stats
assert "n_depth_valid" in stats
assert "n_points_total" in stats
assert "loss_function" in stats
assert "f_scale" in stats
def test_refine_extrinsics_with_depth_with_offset():
@@ -95,48 +103,50 @@ def test_refine_extrinsics_respects_bounds():
def test_robust_loss_handles_outliers():
K = np.array([[1000, 0, 640], [0, 1000, 360], [0, 0, 1]], dtype=np.float64)
# True pose: camera moved 0.1m forward
T_true = np.eye(4)
T_true[2, 3] = 0.1
# Initial pose: identity
T_initial = np.eye(4)
# Create synthetic depth map
# Marker at (0,0,2.1) in world -> (0,0,2.0) in camera (since cam moved 0.1 forward)
depth_map = np.full((720, 1280), 2.0, dtype=np.float32)
# Add outliers: 30% of pixels are garbage (e.g. 0.5m or 5.0m)
# We'll simulate this by having multiple markers, some with bad depth
marker_corners_world = {}
# 7 good markers (depth 2.0)
# 3 bad markers (depth 5.0 - huge outlier)
# We need to ensure these project to unique pixels.
# K = 1000 focal.
# x = 0.1 * i. Z = 2.1 (world).
# u = 1000 * x / Z + 640
marker_corners_world[0] = []
for i in range(10):
u = int(50 * i + 640)
v = 360
world_pt = np.array([0.1 * i, 0, 2.1])
marker_corners_world[0].append(world_pt)
# Paint a wide strip to cover T_initial to T_true movement
# u_initial = 47.6 * i + 640. u_true = 50 * i + 640.
# Diff is ~2.4 * i. Max diff (i=9) is ~22 pixels.
# So +/- 30 pixels should cover it.
if i < 7:
depth_map[v-5:v+6, u-30:u+31] = 2.0 # Good measurement
depth_map[v - 5 : v + 6, u - 30 : u + 31] = 2.0 # Good measurement
else:
depth_map[v-5:v+6, u-30:u+31] = 5.0 # Outlier measurement (3m error)
depth_map[v - 5 : v + 6, u - 30 : u + 31] = (
5.0 # Outlier measurement (3m error)
)
marker_corners_world[0] = np.array(marker_corners_world[0])
@@ -148,15 +158,17 @@ def test_robust_loss_handles_outliers():
K,
max_translation_m=0.2,
max_rotation_deg=5.0,
regularization_weight=0.0, # Disable reg to see if data term wins
regularization_weight=0.0, # Disable reg to see if data term wins
loss="soft_l1",
f_scale=0.1
f_scale=0.1,
)
# With robust loss, it should ignore the 3m errors and converge to the 0.1m shift
# The 0.1m shift explains the 7 inliers perfectly.
# T_refined[2, 3] should be close to 0.1
assert abs(T_refined[2, 3] - 0.1) < 0.02 # Allow small error due to outliers pulling slightly
assert (
abs(T_refined[2, 3] - 0.1) < 0.02
) # Allow small error due to outliers pulling slightly
assert stats["success"] is True
# Run with linear loss (MSE) - should fail or be pulled significantly
@@ -168,14 +180,61 @@ def test_robust_loss_handles_outliers():
max_translation_m=0.2,
max_rotation_deg=5.0,
regularization_weight=0.0,
loss="linear"
loss="linear",
)
# MSE will try to average 0.0 error (7 points) and 3.0 error (3 points)
# Mean error target ~ 0.9m
# So it will likely pull the camera way back to reduce the 3m errors
# The result should be WORSE than the robust one
error_robust = abs(T_refined[2, 3] - 0.1)
error_mse = abs(T_refined_mse[2, 3] - 0.1)
assert error_robust < error_mse
def test_refine_with_confidence_weights():
K = np.array([[1000, 0, 640], [0, 1000, 360], [0, 0, 1]], dtype=np.float64)
T_initial = np.eye(4)
# 2 points: one with good depth, one with bad depth but low confidence
# Point 1: World (0,0,2.1), Depth 2.0 (True shift 0.1)
# Point 2: World (0.5,0,2.1), Depth 5.0 (Outlier)
marker_corners_world = {1: np.array([[0, 0, 2.1], [0.5, 0, 2.1]])}
depth_map = np.full((720, 1280), 2.0, dtype=np.float32)
# Paint outlier depth
depth_map[360, int(1000 * 0.5 / 2.1 + 640)] = 5.0
# Confidence map: Point 1 is confident (1), Point 2 is NOT confident (90)
confidence_map = np.full((720, 1280), 1.0, dtype=np.float32)
confidence_map[360, int(1000 * 0.5 / 2.1 + 640)] = 90.0
# 1. Without weights: Outlier should pull the result significantly
T_no_weights, stats_no_weights = refine_extrinsics_with_depth(
T_initial,
marker_corners_world,
depth_map,
K,
regularization_weight=0.0,
confidence_map=None,
loss="linear", # Use linear to make weighting effect more obvious
)
# 2. With weights: Outlier should be suppressed
T_weighted, stats_weighted = refine_extrinsics_with_depth(
T_initial,
marker_corners_world,
depth_map,
K,
regularization_weight=0.0,
confidence_map=confidence_map,
confidence_thresh=100.0,
loss="linear",
)
error_no_weights = abs(T_no_weights[2, 3] - 0.1)
error_weighted = abs(T_weighted[2, 3] - 0.1)
# Weighted error should be much smaller because the 5.0 depth was suppressed
assert error_weighted < error_no_weights
assert error_weighted < 0.06
+59
@@ -0,0 +1,59 @@
import numpy as np
import pyzed.sl as sl
from unittest.mock import MagicMock
from aruco.svo_sync import SVOReader
def test_retrieve_depth_unit_guard():
# Setup SVOReader with depth enabled
reader = SVOReader([], depth_mode=sl.DEPTH_MODE.ULTRA)
# Mock Camera
mock_cam = MagicMock(spec=sl.Camera)
# Mock depth data (e.g., 2.0 meters)
depth_data = np.full((100, 100), 2.0, dtype=np.float32)
mock_mat = MagicMock(spec=sl.Mat)
mock_mat.get_data.return_value = depth_data
# Mock retrieve_measure to "fill" the mat
mock_cam.retrieve_measure.return_value = sl.ERROR_CODE.SUCCESS
# Case 1: Units are METER -> Should NOT divide by 1000
mock_init_params_meter = MagicMock(spec=sl.InitParameters)
mock_init_params_meter.coordinate_units = sl.UNIT.METER
mock_cam.get_init_parameters.return_value = mock_init_params_meter
# _retrieve_depth constructs a new sl.Mat() internally and calls get_data()
# on it, so patch the sl.Mat class inside the svo_sync module.
from aruco import svo_sync
mock_mat_class = MagicMock()
original_mat = svo_sync.sl.Mat
svo_sync.sl.Mat = mock_mat_class
mock_mat_instance = mock_mat_class.return_value
mock_mat_instance.get_data.return_value = depth_data
# Test METER path
depth_meter = reader._retrieve_depth(mock_cam)
assert depth_meter is not None
assert np.allclose(depth_meter, 2.0)
# Case 2: Units are MILLIMETER -> Should divide by 1000
mock_init_params_mm = MagicMock(spec=sl.InitParameters)
mock_init_params_mm.coordinate_units = sl.UNIT.MILLIMETER
mock_cam.get_init_parameters.return_value = mock_init_params_mm
depth_mm = reader._retrieve_depth(mock_cam)
assert depth_mm is not None
assert np.allclose(depth_mm, 0.002)
# Restore original sl.Mat
svo_sync.sl.Mat = original_mat
if __name__ == "__main__":
test_retrieve_depth_unit_guard()
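The unit guard this test pins down amounts to normalizing raw depth to meters based on the camera's configured coordinate unit. A minimal sketch of that logic, using plain strings in place of `sl.UNIT` so it stays dependency-free (the names and shape here are assumptions, not the real `_retrieve_depth` body, which also has to handle retrieval errors):

```python
import numpy as np

def normalize_depth(depth, unit):
    """Convert a raw depth array to meters (hypothetical helper).

    `unit` stands in for the camera's configured coordinate unit;
    only the millimeter case needs rescaling.
    """
    if unit == "MILLIMETER":
        return depth / 1000.0
    return depth  # already in meters

mm_depth = np.full((2, 2), 2000.0, dtype=np.float32)
print(normalize_depth(mm_depth, "MILLIMETER"))  # every entry becomes 2.0 m
```

Without such a guard, an SVO recorded with millimeter units would silently produce depths a thousand times too large downstream.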
@@ -0,0 +1,72 @@
import numpy as np
from calibrate_extrinsics import score_frame


def test_score_frame_basic():
    # More markers should yield a higher score
    corners = np.zeros((1, 4, 2))
    score1 = score_frame(n_markers=1, reproj_err=1.0, corners=corners, depth_map=None)
    score2 = score_frame(n_markers=2, reproj_err=1.0, corners=corners, depth_map=None)
    assert score2 > score1


def test_score_frame_reproj_err():
    # A lower reprojection error should yield a higher score
    corners = np.zeros((1, 4, 2))
    score1 = score_frame(n_markers=1, reproj_err=2.0, corners=corners, depth_map=None)
    score2 = score_frame(n_markers=1, reproj_err=1.0, corners=corners, depth_map=None)
    assert score2 > score1


def test_score_frame_depth_validity():
    # Valid depth at the marker corners should score higher than invalid (NaN) depth
    depth_map = np.ones((10, 10))
    # All four corners at pixel (2, 2)
    corners = np.array([[[2, 2], [2, 2], [2, 2], [2, 2]]], dtype=np.float32)

    # Case 1: depth is valid at (2, 2)
    score1 = score_frame(
        n_markers=1, reproj_err=1.0, corners=corners, depth_map=depth_map
    )

    # Case 2: depth is invalid (NaN) at (2, 2)
    depth_map_invalid = depth_map.copy()
    depth_map_invalid[2, 2] = np.nan
    score2 = score_frame(
        n_markers=1, reproj_err=1.0, corners=corners, depth_map=depth_map_invalid
    )
    assert score1 > score2


def test_score_frame_confidence():
    # Higher depth confidence at the corners should yield a higher score
    depth_map = np.ones((10, 10))
    confidence_map = np.zeros((10, 10))  # in the ZED convention, 0 is most confident
    corners = np.array([[[2, 2], [2, 2], [2, 2], [2, 2]]], dtype=np.float32)

    # Case 1: high confidence (0)
    score1 = score_frame(
        n_markers=1,
        reproj_err=1.0,
        corners=corners,
        depth_map=depth_map,
        confidence_map=confidence_map,
        depth_confidence_threshold=50,
    )

    # Case 2: low confidence (100)
    confidence_map_low = np.ones((10, 10)) * 100
    score2 = score_frame(
        n_markers=1,
        reproj_err=1.0,
        corners=corners,
        depth_map=depth_map,
        confidence_map=confidence_map_low,
        depth_confidence_threshold=50,
    )
    assert score1 > score2
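These tests fix only the ordering properties of `score_frame` — more markers up, reprojection error down, invalid or low-confidence depth down — without pinning its exact formula. One heuristic consistent with all four tests might look like the sketch below; the function name and formula are illustrative assumptions, not the shipped implementation:

```python
import numpy as np

def score_frame_sketch(n_markers, reproj_err, corners, depth_map=None,
                       confidence_map=None, depth_confidence_threshold=50):
    """Hypothetical frame score: marker count damped by reprojection
    error, scaled by the fraction of corners with usable depth."""
    score = n_markers / (1.0 + reproj_err)
    if depth_map is not None:
        # Sample depth at the (x, y) corner pixels
        xs = corners[..., 0].astype(int).ravel()
        ys = corners[..., 1].astype(int).ravel()
        d = depth_map[ys, xs]
        usable = np.isfinite(d)
        if confidence_map is not None:
            usable &= confidence_map[ys, xs] < depth_confidence_threshold
        score *= usable.mean()  # fraction of corners with valid, confident depth
    return score
```

Any formula with these monotonicity properties would pass; the multiplicative depth-validity term is one simple choice that makes frames with no usable depth score zero.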