chore: ipc plan

This commit is contained in:
2026-02-09 10:37:16 +00:00
parent 511994e3a8
commit cfacb790f5
5 changed files with 0 additions and 3150 deletions
@@ -1,745 +0,0 @@
# ArUco-Based Multi-Camera Extrinsic Calibration from SVO
## TL;DR
> **Quick Summary**: Create a CLI tool that reads synchronized SVO recordings from multiple ZED cameras, detects ArUco markers on a 3D calibration box, computes camera extrinsics using robust pose averaging, and outputs accurate 4x4 transform matrices.
>
> **Deliverables**:
> - `calibrate_extrinsics.py` - Main CLI tool
> - `pose_averaging.py` - Robust pose estimation utilities
> - `svo_sync.py` - Multi-SVO timestamp synchronization
> - `tests/test_pose_math.py` - Unit tests for pose calculations
> - Output JSON with calibrated extrinsics
>
> **Estimated Effort**: Medium (3-5 days)
> **Parallel Execution**: YES - 3 waves
> **Critical Path**: Task 1 → Task 3 → Task 6 → Task 7 → Task 8
---
## Context
### Original Request
User wants to integrate ArUco marker detection with SVO recording playback to calibrate multi-camera extrinsics. The idea is to use timestamp-aligned SVO reading to extract frame batches at certain intervals, calculate camera extrinsics by averaging multiple pose estimates, and handle outliers.
### Interview Summary
**Key Discussions**:
- Calibration target: 3D box with 6 diamond board faces (24 markers), defined in `standard_box_markers.parquet`
- Current extrinsics in `inside_network.json` are **inaccurate** and need replacement
- Output: New JSON file with 4x4 pose matrices, marker box as world origin
- Workflow: CLI with preview visualization
**User Decisions**:
- Frame sampling: Fixed interval + quality filter
- Outlier handling: Two-stage (per-frame + RANSAC on pose set)
- Minimum markers: 4+ per frame
- Image stream: Rectified LEFT (no distortion needed)
- Sync tolerance: <33ms (1 frame at 30fps)
- Tests: Add after implementation
### Research Findings
- **Existing patterns**: `find_extrinsic_object.py` (ArUco + solvePnP), `svo_playback.py` (multi-SVO sync)
- **ZED SDK intrinsics**: `cam.get_camera_information().camera_configuration.calibration_parameters.left_cam`
- **Rotation averaging**: `scipy.spatial.transform.Rotation.mean()` for geodesic mean
- **Translation averaging**: Median with MAD-based outlier rejection
- **Transform math**: `T_world_cam = inv(T_cam_marker)` when marker is world origin
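The transform convention and the MAD-based translation filter above can be sketched as follows (a minimal illustration; the function names are not the plan's final API):

```python
import numpy as np

# solvePnP returns T_cam_marker (marker/world points expressed in the
# camera frame); with the marker box as world origin, the camera pose
# in the world frame is simply its inverse.
def world_from_cam_pose(T_cam_marker):
    return np.linalg.inv(T_cam_marker)

def mad_inlier_mask(translations, k=3.5):
    """Median/MAD outlier test over a stack of (N, 3) translations.
    1.4826 scales MAD to a standard-deviation-equivalent for Gaussians."""
    t = np.asarray(translations, dtype=np.float64)
    med = np.median(t, axis=0)
    mad = np.median(np.abs(t - med), axis=0) + 1e-12
    return np.all(np.abs(t - med) <= k * 1.4826 * mad, axis=1)
```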
### Metis Review
**Identified Gaps** (addressed):
- World frame definition → Use coordinates from `standard_box_markers.parquet`
- Transform convention → Match `inside_network.json` format (T_world_from_cam, space-separated 4x4)
- Image stream → Rectified LEFT view (no distortion)
- Sync tolerance → Moderate (<33ms)
- Parquet validation → Must validate schema early
- Planar degeneracy → Require multi-face visibility or 3D spread check
---
## Work Objectives
### Core Objective
Build a robust CLI tool for multi-camera extrinsic calibration using ArUco markers detected in synchronized SVO playback.
### Concrete Deliverables
- `py_workspace/calibrate_extrinsics.py` - Main entry point
- `py_workspace/aruco/pose_averaging.py` - Robust averaging utilities
- `py_workspace/aruco/svo_sync.py` - Multi-SVO synchronization
- `py_workspace/tests/test_pose_math.py` - Unit tests
- Output: `calibrated_extrinsics.json` with per-camera 4x4 transforms
### Definition of Done
- [x] `uv run calibrate_extrinsics.py --help` → exits 0, shows required args
- [x] `uv run calibrate_extrinsics.py --validate-markers` → validates parquet schema
- [x] `uv run calibrate_extrinsics.py --svos ... --output out.json` → produces valid JSON
- [x] Output JSON contains 4 cameras with 4x4 matrices in correct format
- [x] `uv run pytest tests/test_pose_math.py` → all tests pass
- [x] Preview mode shows detected markers with axes overlay
### Must Have
- Load multiple SVO files with timestamp synchronization
- Detect ArUco markers using cv2.aruco with DICT_4X4_50
- Estimate per-frame poses using cv2.solvePnP
- Two-stage outlier rejection (reprojection error + pose RANSAC)
- Robust pose averaging (geodesic rotation mean + median translation)
- Output 4x4 transforms in `inside_network.json`-compatible format
- CLI with click for argument parsing
- Preview visualization with detected markers and axes
### Must NOT Have (Guardrails)
- NO intrinsic calibration (use ZED SDK pre-calibrated values)
- NO bundle adjustment or SLAM
- NO modification of `inside_network.json` in-place
- NO right camera processing (use left only)
- NO GUI beyond simple preview window
- NO depth-based verification
- NO automatic config file updates
---
## Verification Strategy
> **UNIVERSAL RULE: ZERO HUMAN INTERVENTION**
>
> ALL tasks must be verifiable by agent-executed commands. No "user visually confirms" criteria.
### Test Decision
- **Infrastructure exists**: NO (need to set up pytest)
- **Automated tests**: YES (tests-after)
- **Framework**: pytest
### Agent-Executed QA Scenarios (MANDATORY)
**Verification Tool by Deliverable Type:**
| Type | Tool | How Agent Verifies |
|------|------|-------------------|
| CLI | Bash | Run command, check exit code, parse output |
| JSON output | Bash (jq) | Parse JSON, validate structure and values |
| Preview | Bash (python) | Write annotated frames to disk and check files exist (optional; a browser tool like Playwright cannot capture native OpenCV windows) |
| Unit tests | Bash (pytest) | Run tests, assert all pass |
---
## Execution Strategy
### Parallel Execution Waves
```
Wave 1 (Start Immediately):
├── Task 1: Core pose math utilities
├── Task 2: Parquet loader and validator
└── Task 4: SVO synchronization module
Wave 2 (After Wave 1):
├── Task 3: ArUco detection integration (depends: 1, 2)
├── Task 5: Robust pose aggregation (depends: 1)
└── Task 6: Preview visualization (depends: 3)
Wave 3 (After Wave 2):
├── Task 7: CLI integration (depends: 3, 4, 5, 6)
└── Task 8: Tests and validation (depends: all)
Critical Path: Task 1 → Task 3 → Task 6 → Task 7 → Task 8
```
### Dependency Matrix
| Task | Depends On | Blocks | Can Parallelize With |
|------|------------|--------|---------------------|
| 1 | None | 3, 5 | 2, 4 |
| 2 | None | 3 | 1, 4 |
| 3 | 1, 2 | 6, 7 | 5 |
| 4 | None | 7 | 1, 2 |
| 5 | 1 | 7 | 3, 6 |
| 6 | 3 | 7 | 5 |
| 7 | 3, 4, 5, 6 | 8 | None |
| 8 | 7 | None | None |
---
## TODOs
- [x] 1. Create pose math utilities module
**What to do**:
- Create `py_workspace/aruco/pose_math.py`
- Implement `rvec_tvec_to_matrix(rvec, tvec) -> np.ndarray` (4x4 homogeneous)
- Implement `matrix_to_rvec_tvec(T) -> tuple[np.ndarray, np.ndarray]`
- Implement `invert_transform(T) -> np.ndarray`
- Implement `compose_transforms(T1, T2) -> np.ndarray`
- Implement `compute_reprojection_error(obj_pts, img_pts, rvec, tvec, K) -> float`
- Use numpy for all matrix operations
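The conversions listed above can be sketched in pure NumPy (per the scipy guardrail for this module); `cv2.Rodrigues` would be the drop-in alternative for the axis-angle maps. A hedged sketch, not the final implementation:

```python
import numpy as np

def rvec_tvec_to_matrix(rvec, tvec):
    """4x4 homogeneous transform from an axis-angle rvec and a tvec."""
    rvec = np.asarray(rvec, dtype=np.float64).ravel()
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = rvec / theta
        K = np.array([[0, -k[2], k[1]],
                      [k[2], 0, -k[0]],
                      [-k[1], k[0], 0]])  # skew-symmetric cross matrix
        # Rodrigues formula: R = I + sin(t) K + (1 - cos(t)) K^2
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(tvec, dtype=np.float64).ravel()
    return T

def matrix_to_rvec_tvec(T):
    """Inverse of the above via the SO(3) log map."""
    R = T[:3, :3]
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-12:
        rvec = np.zeros(3)
    else:
        v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
        rvec = theta / (2 * np.sin(theta)) * v
    return rvec, T[:3, 3].copy()

def invert_transform(T):
    """Closed-form rigid inverse: [R^T | -R^T t]."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

def compose_transforms(T1, T2):
    return T1 @ T2
```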
**Must NOT do**:
- Do NOT use scipy in this module (keep it pure numpy for core math)
- Do NOT implement averaging here (that's Task 5)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Pure math utilities, straightforward implementation
- **Skills**: []
- No special skills needed
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Tasks 2, 4)
- **Blocks**: Tasks 3, 5
- **Blocked By**: None
**References**:
- `py_workspace/aruco/find_extrinsic_object.py:123-145` - solvePnP usage and rvec/tvec handling
- OpenCV docs: `cv2.Rodrigues()` for rvec↔rotation matrix conversion
- OpenCV docs: `cv2.projectPoints()` for reprojection
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: rvec/tvec round-trip conversion
Tool: Bash (python)
Steps:
1. python -c "from aruco.pose_math import *; import numpy as np; rvec=np.array([0.1,0.2,0.3]); tvec=np.array([1,2,3]); T=rvec_tvec_to_matrix(rvec,tvec); r2,t2=matrix_to_rvec_tvec(T); assert np.allclose(rvec,r2,atol=1e-6) and np.allclose(tvec,t2,atol=1e-6); print('PASS')"
Expected Result: Prints "PASS"
Scenario: Transform inversion identity
Tool: Bash (python)
Steps:
1. python -c "from aruco.pose_math import *; import numpy as np; T=np.eye(4); T[:3,3]=[1,2,3]; T_inv=invert_transform(T); result=compose_transforms(T,T_inv); assert np.allclose(result,np.eye(4),atol=1e-9); print('PASS')"
Expected Result: Prints "PASS"
```
**Commit**: YES
- Message: `feat(aruco): add pose math utilities for transform operations`
- Files: `py_workspace/aruco/pose_math.py`
---
- [x] 2. Create parquet loader and validator
**What to do**:
- Create `py_workspace/aruco/marker_geometry.py`
- Implement `load_marker_geometry(parquet_path) -> dict[int, np.ndarray]`
- Returns mapping: marker_id → corner coordinates (4, 3)
- Implement `validate_marker_geometry(geometry) -> bool`
- Check all expected marker IDs present
- Check coordinates are in meters (reasonable range)
- Check corner ordering is consistent
- Use awkward-array (already in project) for parquet reading
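A hedged sketch of the loader and validator; the parquet column names (`marker_id`, `corners`) and the meter-range bound are assumptions to be adjusted against the actual schema of `standard_box_markers.parquet`:

```python
import numpy as np

def load_marker_geometry(parquet_path):
    """marker_id -> (4, 3) corner array, in meters.
    NOTE: column names 'marker_id' and 'corners' are assumptions."""
    import awkward as ak  # deferred import; validator below stays dependency-free
    records = ak.to_list(ak.from_parquet(parquet_path))
    return {
        int(rec["marker_id"]): np.asarray(rec["corners"], dtype=np.float64).reshape(4, 3)
        for rec in records
    }

def validate_marker_geometry(geometry, max_extent_m=2.0):
    """Dynamic checks: 4+ markers, (4, 3) corners, meter-scale coordinates."""
    if len(geometry) < 4:
        return False
    for corners in geometry.values():
        if corners.shape != (4, 3):
            return False
        # Coordinates in meters: a calibration box should fit within ~2 m
        if not np.isfinite(corners).all() or np.abs(corners).max() > max_extent_m:
            return False
    return True
```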
**Must NOT do**:
- Do NOT hardcode marker IDs (read from parquet)
- Do NOT assume specific number of markers (validate dynamically)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Simple data loading and validation
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Tasks 1, 4)
- **Blocks**: Task 3
- **Blocked By**: None
**References**:
- `py_workspace/aruco/find_extrinsic_object.py:55-66` - Parquet loading with awkward-array
- `py_workspace/aruco/output/standard_box_markers.parquet` - Actual data file
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: Load marker geometry from parquet
Tool: Bash (python)
Preconditions: standard_box_markers.parquet exists
Steps:
1. cd /workspaces/zed-playground/py_workspace
2. python -c "from aruco.marker_geometry import load_marker_geometry; g=load_marker_geometry('aruco/output/standard_box_markers.parquet'); print(f'Loaded {len(g)} markers'); assert len(g) >= 4; print('PASS')"
Expected Result: Prints marker count and "PASS"
Scenario: Validate geometry returns True for valid data
Tool: Bash (python)
Steps:
1. python -c "from aruco.marker_geometry import *; g=load_marker_geometry('aruco/output/standard_box_markers.parquet'); assert validate_marker_geometry(g); print('PASS')"
Expected Result: Prints "PASS"
```
**Commit**: YES
- Message: `feat(aruco): add marker geometry loader with validation`
- Files: `py_workspace/aruco/marker_geometry.py`
---
- [x] 3. Integrate ArUco detection with ZED intrinsics
**What to do**:
- Create `py_workspace/aruco/detector.py`
- Implement `create_detector() -> cv2.aruco.ArucoDetector` using DICT_4X4_50
- Implement `detect_markers(image, detector) -> tuple[corners, ids]`
- Implement `get_zed_intrinsics(camera) -> tuple[np.ndarray, np.ndarray]`
- Extract K matrix (3x3) and distortion from ZED SDK
- For rectified images, distortion should be zeros
- Implement `estimate_pose(corners, ids, marker_geometry, K, dist) -> tuple[rvec, tvec, error]`
- Match detected markers to known 3D points
- Call solvePnP with SOLVEPNP_SQPNP
- Compute and return reprojection error
- Require minimum 4 markers for valid pose
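The detection and pose-estimation flow above could look roughly like this (a sketch assuming OpenCV ≥ 4.7 for the `ArucoDetector` API; `cv2` imports are deferred so the pure-NumPy matching helper stands alone):

```python
import numpy as np

def create_detector():
    """ArUco detector for DICT_4X4_50 (OpenCV >= 4.7 ArucoDetector API)."""
    import cv2
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    return cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

def match_correspondences(corners, ids, marker_geometry, min_markers=4):
    """Pair detected 2D corners with known 3D corners by marker id;
    returns (None, None) when fewer than min_markers matched."""
    obj_pts, img_pts = [], []
    for c, marker_id in zip(corners, np.ravel(ids)):
        if int(marker_id) in marker_geometry:
            obj_pts.append(marker_geometry[int(marker_id)])                 # (4, 3)
            img_pts.append(np.asarray(c, dtype=np.float64).reshape(4, 2))  # (4, 2)
    if len(obj_pts) < min_markers:
        return None, None
    return np.concatenate(obj_pts), np.concatenate(img_pts)

def estimate_pose(corners, ids, marker_geometry, K, dist):
    """solvePnP over all matched corners; returns (rvec, tvec, mean_px_error)."""
    import cv2
    obj_pts, img_pts = match_correspondences(corners, ids, marker_geometry)
    if obj_pts is None:
        return None
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist,
                                  flags=cv2.SOLVEPNP_SQPNP)
    if not ok:
        return None
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist)
    err = float(np.linalg.norm(proj.reshape(-1, 2) - img_pts, axis=1).mean())
    return rvec, tvec, err
```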
**Must NOT do**:
- Do NOT use deprecated `estimatePoseSingleMarkers`
- Do NOT accept poses with <4 markers
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: Integration of existing patterns, moderate complexity
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Wave 2 (after Task 1, 2)
- **Blocks**: Tasks 6, 7
- **Blocked By**: Tasks 1, 2
**References**:
- `py_workspace/aruco/find_extrinsic_object.py:54-145` - Full ArUco detection and solvePnP pattern
- `py_workspace/libs/pyzed_pkg/pyzed/sl.pyi:5110-5180` - CameraParameters with fx, fy, cx, cy, disto
- `py_workspace/svo_playback.py:46` - get_camera_information() usage
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: Detector creation succeeds
Tool: Bash (python)
Steps:
1. python -c "from aruco.detector import create_detector; d=create_detector(); print(type(d)); print('PASS')"
Expected Result: Prints detector type and "PASS"
Scenario: Pose estimation with synthetic data
Tool: Bash (python)
Steps:
1. python -c "
import numpy as np
from aruco.detector import estimate_pose
from aruco.marker_geometry import load_marker_geometry
# Create synthetic test with known geometry
geom = load_marker_geometry('aruco/output/standard_box_markers.parquet')
K = np.array([[700,0,960],[0,700,540],[0,0,1]], dtype=np.float64)
# Test passes if function runs without error
print('PASS')
"
Expected Result: Prints "PASS"
```
**Commit**: YES
- Message: `feat(aruco): add ArUco detector with ZED intrinsics integration`
- Files: `py_workspace/aruco/detector.py`
---
- [x] 4. Create multi-SVO synchronization module
**What to do**:
- Create `py_workspace/aruco/svo_sync.py`
- Implement `SVOReader` class:
- `__init__(svo_paths: list[str])` - Open all SVOs
- `get_camera_info(idx) -> CameraInfo` - Serial, resolution, intrinsics
- `sync_to_latest_start()` - Align all cameras to latest start timestamp
- `grab_synced(tolerance_ms=33) -> dict[serial, Frame] | None` - Get synced frames
- `seek_to_frame(frame_num)` - Seek all cameras
- `close()` - Cleanup
- Frame should contain: image (numpy), timestamp_ns, serial_number
- Use pattern from `svo_playback.py` for sync logic
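The actual `SVOReader` wraps pyzed grab calls per `svo_playback.py`; the tolerance test at its core can be sketched standalone (function names illustrative):

```python
import numpy as np

def frames_within_tolerance(timestamps_ns, tolerance_ms=33):
    """True when one grabbed frame per camera falls inside the sync window."""
    ts = np.asarray(timestamps_ns, dtype=np.int64)
    return (ts.max() - ts.min()) <= tolerance_ms * 1_000_000

def cameras_behind(timestamps_ns, tolerance_ms=33):
    """Indices of cameras whose frame lags the newest timestamp; these
    should grab again before the synced set is accepted."""
    ts = np.asarray(timestamps_ns, dtype=np.int64)
    lag = ts.max() - ts
    return np.nonzero(lag > tolerance_ms * 1_000_000)[0].tolist()
```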
**Must NOT do**:
- Do NOT implement complex clock drift correction
- Do NOT handle streaming (SVO only)
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: Adapting existing pattern, moderate complexity
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Tasks 1, 2)
- **Blocks**: Task 7
- **Blocked By**: None
**References**:
- `py_workspace/svo_playback.py:18-102` - Complete multi-SVO sync pattern
- `py_workspace/libs/pyzed_pkg/pyzed/sl.pyi:10010-10097` - SVO position and frame methods
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: SVOReader opens multiple files
Tool: Bash (python)
Preconditions: SVO files exist in py_workspace
Steps:
1. python -c "
from aruco.svo_sync import SVOReader
import glob
svos = glob.glob('*.svo2')[:2]
if len(svos) >= 2:
reader = SVOReader(svos)
print(f'Opened {len(svos)} SVOs')
reader.close()
print('PASS')
else:
print('SKIP: Need 2+ SVOs')
"
Expected Result: Prints "PASS" or "SKIP"
Scenario: Sync aligns timestamps
Tool: Bash (python)
Steps:
1. Test sync_to_latest_start returns without error
Expected Result: No exception raised
```
**Commit**: YES
- Message: `feat(aruco): add multi-SVO synchronization reader`
- Files: `py_workspace/aruco/svo_sync.py`
---
- [x] 5. Implement robust pose aggregation
**What to do**:
- Create `py_workspace/aruco/pose_averaging.py`
- Implement `PoseAccumulator` class:
- `add_pose(T: np.ndarray, reproj_error: float, frame_id: int)`
- `get_inlier_poses(max_reproj_error=2.0) -> list[np.ndarray]`
- `compute_robust_mean() -> tuple[np.ndarray, dict]`
- Use scipy.spatial.transform.Rotation.mean() for rotation
- Use median for translation
- Return stats dict: {n_total, n_inliers, median_error, std_rotation_deg}
- Implement `ransac_filter_poses(poses, rot_thresh_deg=5.0, trans_thresh_m=0.05) -> list[int]`
- Return indices of inlier poses
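The consensus filter and robust mean could be sketched as follows (assumes scipy; the quadratic all-pairs loop is fine for the small pose sets produced by frame sampling):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rotation_angle_deg(Ra, Rb):
    """Geodesic angle between two rotation matrices, in degrees."""
    return np.degrees(Rotation.from_matrix(Ra.T @ Rb).magnitude())

def ransac_filter_poses(poses, rot_thresh_deg=5.0, trans_thresh_m=0.05):
    """Each pose votes for every pose within both thresholds; the
    largest consensus set wins. Returns inlier indices."""
    best = []
    for Ti in poses:
        inliers = [
            j for j, Tj in enumerate(poses)
            if rotation_angle_deg(Ti[:3, :3], Tj[:3, :3]) <= rot_thresh_deg
            and np.linalg.norm(Ti[:3, 3] - Tj[:3, 3]) <= trans_thresh_m
        ]
        if len(inliers) > len(best):
            best = inliers
    return best

def robust_mean_pose(poses):
    """Geodesic rotation mean + per-axis median translation."""
    T = np.eye(4)
    T[:3, :3] = Rotation.from_matrix([P[:3, :3] for P in poses]).mean().as_matrix()
    T[:3, 3] = np.median([P[:3, 3] for P in poses], axis=0)
    return T
```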
**Must NOT do**:
- Do NOT implement bundle adjustment
- Do NOT modify poses in-place
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: Math-focused but requires scipy understanding
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 2 (with Task 3)
- **Blocks**: Task 7
- **Blocked By**: Task 1
**References**:
- Librarian findings on `scipy.spatial.transform.Rotation.mean()`
- Librarian findings on RANSAC-style pose filtering
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: Rotation averaging produces valid result
Tool: Bash (python)
Steps:
1. python -c "
from aruco.pose_averaging import PoseAccumulator
import numpy as np
acc = PoseAccumulator()
T = np.eye(4)
acc.add_pose(T, reproj_error=1.0, frame_id=0)
acc.add_pose(T, reproj_error=1.5, frame_id=1)
mean_T, stats = acc.compute_robust_mean()
assert mean_T.shape == (4,4)
assert stats['n_inliers'] == 2
print('PASS')
"
Expected Result: Prints "PASS"
Scenario: RANSAC rejects outliers
Tool: Bash (python)
Steps:
1. python -c "
from aruco.pose_averaging import ransac_filter_poses
import numpy as np
# Create 3 similar poses + 1 outlier
poses = [np.eye(4) for _ in range(3)]
outlier = np.eye(4); outlier[:3,3] = [10,10,10] # Far away
poses.append(outlier)
inliers = ransac_filter_poses(poses, trans_thresh_m=0.1)
assert len(inliers) == 3
assert 3 not in inliers
print('PASS')
"
Expected Result: Prints "PASS"
```
**Commit**: YES
- Message: `feat(aruco): add robust pose averaging with RANSAC filtering`
- Files: `py_workspace/aruco/pose_averaging.py`
---
- [x] 6. Add preview visualization
**What to do**:
- Create `py_workspace/aruco/preview.py`
- Implement `draw_detected_markers(image, corners, ids) -> np.ndarray`
- Draw marker outlines and IDs
- Implement `draw_pose_axes(image, rvec, tvec, K, length=0.1) -> np.ndarray`
- Use cv2.drawFrameAxes
- Implement `show_preview(images: dict[str, np.ndarray], wait_ms=1) -> int`
- Show multiple camera views in separate windows
- Return key pressed
**Must NOT do**:
- Do NOT implement complex GUI
- Do NOT block indefinitely (use waitKey with timeout)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Simple OpenCV visualization
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 2 (with Task 5)
- **Blocks**: Task 7
- **Blocked By**: Task 3
**References**:
- `py_workspace/aruco/find_extrinsic_object.py:138-145` - drawFrameAxes usage
- `py_workspace/aruco/find_extrinsic_object.py:84-105` - Marker visualization
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: Draw functions return valid images
Tool: Bash (python)
Steps:
1. python -c "
from aruco.preview import draw_detected_markers
import numpy as np
img = np.zeros((480,640,3), dtype=np.uint8)
corners = [np.array([[100,100],[200,100],[200,200],[100,200]], dtype=np.float32)]
ids = np.array([[1]])
result = draw_detected_markers(img, corners, ids)
assert result.shape == (480,640,3)
print('PASS')
"
Expected Result: Prints "PASS"
```
**Commit**: YES
- Message: `feat(aruco): add preview visualization utilities`
- Files: `py_workspace/aruco/preview.py`
---
- [x] 7. Create main CLI tool
**What to do**:
- Create `py_workspace/calibrate_extrinsics.py`
- Use click for CLI:
- `--svo PATH` (multiple) - SVO file paths
- `--markers PATH` - Marker geometry parquet
- `--output PATH` - Output JSON path
- `--sample-interval INT` - Frame interval (default 30)
- `--max-reproj-error FLOAT` - Threshold (default 2.0)
- `--preview / --no-preview` - Show visualization
- `--validate-markers` - Only validate parquet and exit
- `--self-check` - Run and report quality metrics
- Main workflow:
1. Load marker geometry and validate
2. Open SVOs and sync
3. Sample frames at interval
4. For each synced frame set:
- Detect markers in each camera
- Estimate pose if ≥4 markers
- Accumulate poses per camera
5. Compute robust mean per camera
6. Output JSON in inside_network.json-compatible format
- Output JSON format:
```json
{
"serial": {
"pose": "r00 r01 r02 tx r10 r11 r12 ty ...",
"stats": { "n_frames": N, "median_reproj_error": X }
}
}
```
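The CLI surface and the pose serialization could be sketched with click as below (a skeleton only; the main loop body and the exact `inside_network.json` row layout are assumptions to confirm against the real file):

```python
import json
import click
import numpy as np

def pose_to_string(T):
    """Row-major, space-separated 4x4 (assumed inside_network.json convention)."""
    return " ".join(f"{v:.9g}" for v in np.asarray(T, dtype=float).ravel())

@click.command()
@click.option("--svo", "svo_paths", multiple=True, type=click.Path(),
              help="SVO file; pass once per camera")
@click.option("--markers", type=click.Path(), required=True)
@click.option("--output", type=click.Path(), default="calibrated_extrinsics.json",
              show_default=True)
@click.option("--sample-interval", type=int, default=30, show_default=True)
@click.option("--max-reproj-error", type=float, default=2.0, show_default=True)
@click.option("--preview/--no-preview", default=True)
@click.option("--validate-markers", is_flag=True, help="Validate parquet and exit")
def main(svo_paths, markers, output, sample_interval, max_reproj_error,
         preview, validate_markers):
    """Calibrate multi-camera extrinsics from synchronized SVO recordings."""
    if validate_markers:
        click.echo(f"validating {markers}")  # load_marker_geometry + checks go here
        return
    results = {}  # serial -> {"pose": ..., "stats": ...}, filled by the main loop
    with open(output, "w") as f:
        json.dump(results, f, indent=2)

if __name__ == "__main__":
    main()
```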
**Must NOT do**:
- Do NOT modify existing config files
- Do NOT implement auto-update of inside_network.json
**Recommended Agent Profile**:
- **Category**: `unspecified-high`
- Reason: Integration of all components, complex workflow
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Wave 3 (final integration)
- **Blocks**: Task 8
- **Blocked By**: Tasks 3, 4, 5, 6
**References**:
- `py_workspace/svo_playback.py` - CLI structure with argparse (adapt to click)
- `py_workspace/aruco/find_extrinsic_object.py` - Main loop pattern
- `zed_settings/inside_network.json:20` - Output pose format
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: CLI help works
Tool: Bash
Steps:
1. cd /workspaces/zed-playground/py_workspace
2. uv run calibrate_extrinsics.py --help
Expected Result: Exit code 0, shows --svo, --markers, --output options
Scenario: Validate markers only mode
Tool: Bash
Steps:
1. uv run calibrate_extrinsics.py --markers aruco/output/standard_box_markers.parquet --validate-markers
Expected Result: Exit code 0, prints marker count
Scenario: Full calibration produces JSON
Tool: Bash
Preconditions: SVO files exist
Steps:
1. uv run calibrate_extrinsics.py \
--svo ZED_SN46195029.svo2 \
--svo ZED_SN44435674.svo2 \
--markers aruco/output/standard_box_markers.parquet \
--output /tmp/test_extrinsics.json \
--no-preview \
--sample-interval 100
2. jq 'keys' /tmp/test_extrinsics.json
Expected Result: Exit code 0, JSON contains camera serials
Scenario: Self-check reports quality
Tool: Bash
Steps:
1. uv run calibrate_extrinsics.py ... --self-check
Expected Result: Prints per-camera stats including median reproj error
```
**Commit**: YES
- Message: `feat(aruco): add calibrate_extrinsics CLI tool`
- Files: `py_workspace/calibrate_extrinsics.py`
---
- [x] 8. Add unit tests and final validation
**What to do**:
- Create `py_workspace/tests/test_pose_math.py`
- Test cases:
- `test_rvec_tvec_roundtrip` - Convert and back
- `test_transform_inversion` - T @ inv(T) = I
- `test_transform_composition` - Known compositions
- `test_reprojection_error_zero` - Perfect projection = 0 error
- Create `py_workspace/tests/test_pose_averaging.py`
- Test cases:
- `test_mean_of_identical_poses` - Returns same pose
- `test_outlier_rejection` - Outliers removed
- Add `scipy` to pyproject.toml if not present
- Run full test suite
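Since unit tests must not require real SVOs, a synthetic pinhole projector is the natural fixture for cases like `test_reprojection_error_zero`: project known world points through a known pose, then assert a perfect solve reproduces them. A hedged sketch:

```python
import numpy as np

def project_pinhole(obj_pts_world, T_world_cam, K):
    """Synthetic ground-truth projection: world points -> pixels through
    a known camera pose, so a perfect PnP solve must yield zero error."""
    Tcw = np.linalg.inv(T_world_cam)               # world -> camera
    P_cam = (Tcw[:3, :3] @ obj_pts_world.T).T + Tcw[:3, 3]
    uv = (K @ P_cam.T).T
    return uv[:, :2] / uv[:, 2:3]                  # perspective divide
```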
**Must NOT do**:
- Do NOT require real SVO files for unit tests (use synthetic data)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Straightforward test implementation
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Wave 3 (final)
- **Blocks**: None
- **Blocked By**: Task 7
**References**:
- Task 1 acceptance criteria for test patterns
- Task 5 acceptance criteria for averaging tests
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: All unit tests pass
Tool: Bash
Steps:
1. cd /workspaces/zed-playground/py_workspace
2. uv run pytest tests/ -v
Expected Result: Exit code 0, all tests pass
Scenario: Coverage check
Tool: Bash
Steps:
1. uv run pytest tests/ --tb=short
Expected Result: Shows test results summary
```
**Commit**: YES
- Message: `test(aruco): add unit tests for pose math and averaging`
- Files: `py_workspace/tests/test_pose_math.py`, `py_workspace/tests/test_pose_averaging.py`
---
## Commit Strategy
| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1 | `feat(aruco): add pose math utilities` | pose_math.py | python import test |
| 2 | `feat(aruco): add marker geometry loader` | marker_geometry.py | python import test |
| 3 | `feat(aruco): add ArUco detector` | detector.py | python import test |
| 4 | `feat(aruco): add multi-SVO sync` | svo_sync.py | python import test |
| 5 | `feat(aruco): add pose averaging` | pose_averaging.py | python import test |
| 6 | `feat(aruco): add preview utils` | preview.py | python import test |
| 7 | `feat(aruco): add calibrate CLI` | calibrate_extrinsics.py | --help works |
| 8 | `test(aruco): add unit tests` | tests/*.py | pytest passes |
---
## Success Criteria
### Verification Commands
```bash
# CLI works
uv run calibrate_extrinsics.py --help # Expected: exit 0
# Marker validation
uv run calibrate_extrinsics.py --markers aruco/output/standard_box_markers.parquet --validate-markers # Expected: exit 0
# Tests pass
uv run pytest tests/ -v # Expected: all pass
# Full calibration (with real SVOs)
uv run calibrate_extrinsics.py --svo *.svo2 --markers aruco/output/standard_box_markers.parquet --output calibrated.json --no-preview
jq 'keys' calibrated.json # Expected: camera serials
```
### Final Checklist
- [x] All "Must Have" present
- [x] All "Must NOT Have" absent
- [x] All tests pass
- [x] CLI --help shows all options
- [x] Output JSON matches inside_network.json pose format
- [x] Preview shows detected markers with axes
@@ -1,713 +0,0 @@
# Depth-Based Extrinsic Verification and Refinement
## TL;DR
> **Quick Summary**: Add depth-based verification and refinement capabilities to the existing ArUco calibration CLI. Compare predicted depth (from computed extrinsics) against measured depth (from ZED sensors) to validate calibration quality, and optionally optimize extrinsics to minimize depth residuals.
>
> **Deliverables**:
> - `aruco/depth_verify.py` - Depth residual computation and verification metrics
> - `aruco/depth_refine.py` - Direct optimization to refine extrinsics using depth
> - Extended `aruco/svo_sync.py` - Depth-enabled SVO reader
> - Updated `calibrate_extrinsics.py` - New CLI flags for depth verification/refinement
> - `tests/test_depth_verify.py` - Unit tests for depth modules
> - Verification reports in JSON + optional CSV
>
> **Estimated Effort**: Medium (2-3 days)
> **Parallel Execution**: YES - 3 waves
> **Critical Path**: Task 1 → Task 4 → Task 5 → Task 6
---
## Context
### Original Request
User wants to add a utility to examine/fuse the extrinsic parameters via depth info with the ArUco box. The goal is to verify that ArUco-computed extrinsics are correct by comparing predicted vs measured depth, and optionally refine them using direct optimization.
### Interview Summary
**Key Discussions**:
- Primary goal: Both verify AND refine extrinsics using depth data
- Integration: Add to existing `calibrate_extrinsics.py` CLI (new flags)
- Depth mode: CLI argument with default to NEURAL
- Target geometry: Any markers from parquet file (not just ArUco box)
**User Decisions**:
- Refinement method: Direct optimization (minimize depth residuals)
- Output: Full reporting (console + JSON + optional CSV)
- Depth filtering: Confidence-based with ZED thresholds
- Testing: Tests after implementation
- CLI flags: Separate `--verify-depth` and `--refine-depth` flags
### Research Findings
- **ZED SDK depth**: `retrieve_measure(mat, MEASURE.DEPTH)` returns depth in meters
- **Pixel access**: `mat.get_value(x, y)` returns depth at specific coordinates
- **Depth residual**: `r = z_measured - z_predicted` where `z_predicted = (R @ P_world + t)[2]`
- **Confidence filtering**: Use `MEASURE.CONFIDENCE` with threshold (lower = more reliable)
- **Current SVOReader**: Uses `DEPTH_MODE.NONE` - needs extension for depth
### Metis Review
**Identified Gaps** (addressed):
- Transform chain clarity → Use existing `T_world_cam` convention from calibrate_extrinsics.py
- Depth sampling at corners → Use 5x5 median window around projected pixel
- Confidence threshold direction → Verify ZED semantics (0-100, lower = more confident)
- Optimization bounds → Add regularization to stay within ±5cm / ±5° of initial
- Unit consistency → Verify parquet uses meters (same as ZED depth)
- Non-regression → Depth features strictly opt-in, no behavior change without flags
---
## Work Objectives
### Core Objective
Add depth-based verification and optional refinement to the calibration pipeline, allowing users to validate and improve ArUco-computed extrinsics using ZED depth measurements.
### Concrete Deliverables
- `py_workspace/aruco/depth_verify.py` - Depth residual computation
- `py_workspace/aruco/depth_refine.py` - Extrinsic optimization
- `py_workspace/aruco/svo_sync.py` - Extended with depth support
- `py_workspace/calibrate_extrinsics.py` - Updated with new CLI flags
- `py_workspace/tests/test_depth_verify.py` - Unit tests
- Output: Verification stats in JSON, optional per-frame CSV
### Definition of Done
- [x] `uv run calibrate_extrinsics.py --help` → shows --verify-depth, --refine-depth, --depth-mode flags
- [x] Running without depth flags produces identical output to current behavior
- [x] `--verify-depth` produces verification metrics in output JSON
- [x] `--refine-depth` optimizes extrinsics and reports pre/post metrics
- [x] `--report-csv` outputs per-frame residuals to CSV file
- [x] `uv run pytest tests/test_depth_verify.py` → all tests pass
### Must Have
- Extend SVOReader to optionally enable depth mode and retrieve depth maps
- Compute depth residuals at detected marker corner positions
- Use 5x5 median window for robust depth sampling
- Confidence-based filtering (reject low-confidence depth)
- Verification metrics: RMSE, mean absolute, median, depth-normalized error
- Direct optimization using scipy.optimize.minimize with bounds
- Regularization to prevent large jumps from initial extrinsics (±5cm, ±5°)
- Report both depth metrics AND existing reprojection metrics pre/post refinement
- JSON schema versioning field
- Opt-in CLI flags (no behavior change when not specified)
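The bounded direct optimization described above might look like this (a sketch: a 6-DoF correction parameterized as rotvec + translation on top of the initial extrinsics, with the ±5 cm / ±5° regularization expressed as hard bounds):

```python
import numpy as np
from scipy.spatial.transform import Rotation
from scipy.optimize import minimize

def refine_extrinsics(T_init, residual_fn, trans_bound_m=0.05, rot_bound_deg=5.0):
    """Minimize summed squared residuals over a small bounded correction.
    residual_fn(T) -> 1D array of residuals in meters."""
    def apply(x):
        dT = np.eye(4)
        dT[:3, :3] = Rotation.from_rotvec(x[:3]).as_matrix()
        dT[:3, 3] = x[3:]
        return T_init @ dT

    def cost(x):
        return float(np.sum(np.square(residual_fn(apply(x)))))

    rb = np.radians(rot_bound_deg)
    bounds = [(-rb, rb)] * 3 + [(-trans_bound_m, trans_bound_m)] * 3
    res = minimize(cost, np.zeros(6), bounds=bounds, method="L-BFGS-B")
    return apply(res.x), res
```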
### Must NOT Have (Guardrails)
- NO bundle adjustment or intrinsics optimization
- NO ICP or point cloud registration (use pixel-depth residuals only)
- NO per-frame time-varying extrinsics
- NO new detection pipelines (reuse existing ArUco detection)
- NO GUI viewers or interactive tuning
- NO modification of existing output format when depth flags not used
- NO alternate ArUco detection code paths
---
## Verification Strategy
> **UNIVERSAL RULE: ZERO HUMAN INTERVENTION**
>
> ALL tasks must be verifiable by agent-executed commands. No "user visually confirms" criteria.
### Test Decision
- **Infrastructure exists**: YES (pytest already in use)
- **Automated tests**: YES (tests-after)
- **Framework**: pytest
### Agent-Executed QA Scenarios (MANDATORY)
| Type | Tool | How Agent Verifies |
|------|------|-------------------|
| CLI | Bash | Run command, check exit code, parse output |
| JSON output | Bash (jq/python) | Parse JSON, validate structure and values |
| Unit tests | Bash (pytest) | Run tests, assert all pass |
| Non-regression | Bash | Compare outputs with/without depth flags |
---
## Execution Strategy
### Parallel Execution Waves
```
Wave 1 (Start Immediately):
├── Task 1: Extend SVOReader for depth support
└── Task 2: Create depth residual computation module
Wave 2 (After Wave 1):
├── Task 3: Create depth refinement module (depends: 2)
├── Task 4: Add CLI flags to calibrate_extrinsics.py (depends: 1, 2)
└── Task 5: Integrate verification into CLI workflow (depends: 1, 2, 4)
Wave 3 (After Wave 2):
├── Task 6: Integrate refinement into CLI workflow (depends: 3, 5)
└── Task 7: Add unit tests (depends: 2, 3)
Critical Path: Task 1 → Task 4 → Task 5 → Task 6
```
### Dependency Matrix
| Task | Depends On | Blocks | Can Parallelize With |
|------|------------|--------|---------------------|
| 1 | None | 4, 5 | 2 |
| 2 | None | 3, 4, 5 | 1 |
| 3 | 2 | 6 | 4 |
| 4 | 1, 2 | 5, 6 | 3 |
| 5 | 1, 2, 4 | 6, 7 | None |
| 6 | 3, 5 | 7 | None |
| 7 | 2, 3 | None | 6 |
---
## TODOs
- [x] 1. Extend SVOReader for depth support
**What to do**:
- Modify `py_workspace/aruco/svo_sync.py`
- Add `depth_mode` parameter to `SVOReader.__init__()` (default: `DEPTH_MODE.NONE`)
- Add `enable_depth` property that returns True if depth_mode != NONE
- Add `depth_map: Optional[np.ndarray]` field to `FrameData` dataclass
- In `grab_all()` and `grab_synced()`, if depth enabled:
- Call `cam.retrieve_measure(depth_mat, sl.MEASURE.DEPTH)`
- Store `depth_mat.get_data().copy()` in FrameData
- Add `get_depth_at(frame: FrameData, x: int, y: int) -> Optional[float]` helper
- Add `get_depth_window_median(frame: FrameData, x: int, y: int, size: int = 5) -> Optional[float]`
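The windowed median sampler could be sketched directly over the depth array (ZED invalid pixels surface as NaN/inf, hence the finiteness filter):

```python
import numpy as np

def get_depth_window_median(depth_map, x, y, size=5):
    """Median of valid depths in a size x size window centered on (x, y);
    None when the window contains no finite positive depth."""
    h, w = depth_map.shape[:2]
    half = size // 2
    win = depth_map[max(0, y - half):min(h, y + half + 1),
                    max(0, x - half):min(w, x + half + 1)]
    valid = win[np.isfinite(win) & (win > 0)]
    if valid.size == 0:
        return None
    return float(np.median(valid))
```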
**Must NOT do**:
- Do NOT change default behavior (depth_mode defaults to NONE)
- Do NOT retrieve depth when not needed (performance)
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: Extending existing class with new optional feature
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 2)
- **Blocks**: Tasks 4, 5
- **Blocked By**: None
**References**:
- `py_workspace/aruco/svo_sync.py:35` - Current depth_mode = NONE setting
- `py_workspace/depth_sensing.py:95` - retrieve_measure pattern
- `py_workspace/libs/pyzed_pkg/pyzed/sl.pyi:9879-9941` - retrieve_measure API
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: SVOReader with depth disabled (default)
Tool: Bash (python)
Steps:
1. cd /workspaces/zed-playground/py_workspace
2. python -c "from aruco.svo_sync import SVOReader; r = SVOReader([]); assert not r.enable_depth; print('PASS')"
Expected Result: Prints "PASS"
Scenario: SVOReader accepts depth_mode parameter
Tool: Bash (python)
Steps:
1. python -c "from aruco.svo_sync import SVOReader; import pyzed.sl as sl; r = SVOReader([], depth_mode=sl.DEPTH_MODE.NEURAL); assert r.enable_depth; print('PASS')"
Expected Result: Prints "PASS"
Scenario: FrameData has depth_map field
Tool: Bash (python)
Steps:
1. python -c "from aruco.svo_sync import FrameData; import numpy as np; f = FrameData(image=np.zeros((10,10,3), dtype=np.uint8), timestamp_ns=0, frame_index=0, serial_number=0, depth_map=None); print('PASS')"
Expected Result: Prints "PASS"
```
**Commit**: YES
- Message: `feat(aruco): extend SVOReader with depth map support`
- Files: `py_workspace/aruco/svo_sync.py`
---
- [x] 2. Create depth residual computation module
**What to do**:
- Create `py_workspace/aruco/depth_verify.py`
- Implement `project_point_to_pixel(P_cam: np.ndarray, K: np.ndarray) -> tuple[int, int]`
- Project 3D camera-frame point to pixel coordinates
- Implement `compute_depth_residual(P_world, T_world_cam, depth_map, K, window_size=5) -> Optional[float]`
- Transform point to camera frame: `P_cam = invert_transform(T_world_cam) @ [P_world, 1]`
- Project to pixel, sample depth with median window
- Return `z_measured - z_predicted` or None if invalid
- Implement `DepthVerificationResult` dataclass:
- Fields: `residuals: list[float]`, `rmse: float`, `mean_abs: float`, `median: float`, `depth_normalized_rmse: float`, `n_valid: int`, `n_total: int`
- Implement `verify_extrinsics_with_depth(T_world_cam, marker_corners_world, depth_map, K, confidence_map=None, confidence_thresh=50) -> DepthVerificationResult`
- For each marker corner, compute residual
- Filter by confidence if provided
- Compute aggregate metrics
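The residual computation above can be sketched as follows. The `invert_transform` helper is assumed to behave like the rigid-transform inverse in `pose_math.py`; everything else follows the steps listed in the task.

```python
import numpy as np

def invert_transform(T):
    """Rigid 4x4 inverse (assumed to match the pose_math.py helper)."""
    R, t = T[:3, :3], T[:3, 3]
    Tinv = np.eye(4)
    Tinv[:3, :3] = R.T
    Tinv[:3, 3] = -R.T @ t
    return Tinv

def compute_depth_residual(P_world, T_world_cam, depth_map, K, window_size=5):
    """Return z_measured - z_predicted at the projected pixel, or None."""
    P_cam = invert_transform(T_world_cam) @ np.append(P_world, 1.0)
    if P_cam[2] <= 0:
        return None  # point is behind the camera
    uvw = K @ P_cam[:3]
    u, v = int(round(uvw[0] / uvw[2])), int(round(uvw[1] / uvw[2]))
    h, w = depth_map.shape[:2]
    if not (0 <= u < w and 0 <= v < h):
        return None  # projects outside the image
    half = window_size // 2
    win = depth_map[max(0, v - half):v + half + 1, max(0, u - half):u + half + 1]
    valid = win[np.isfinite(win) & (win > 0)]
    if valid.size == 0:
        return None  # no usable depth in the sample window
    return float(np.median(valid)) - float(P_cam[2])
```

With an identity extrinsic and a uniform 2 m depth map, a world point at (0, 0, 2) yields a residual of zero, matching the "perfect match" QA scenario below.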
**Must NOT do**:
- Do NOT use ICP or point cloud alignment
- Do NOT modify extrinsics (that's Task 3)
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: Math-focused module, moderate complexity
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 1)
- **Blocks**: Tasks 3, 4, 5
- **Blocked By**: None
**References**:
- `py_workspace/aruco/pose_math.py` - Transform utilities (invert_transform, etc.)
- `py_workspace/aruco/detector.py:62-85` - Camera matrix building pattern
- Librarian findings on depth residual computation
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: Project point to pixel correctly
Tool: Bash (python)
Steps:
1. python -c "
from aruco.depth_verify import project_point_to_pixel
import numpy as np
K = np.array([[1000, 0, 640], [0, 1000, 360], [0, 0, 1]])
P_cam = np.array([0, 0, 1])  # Point on the optical axis, 1 m away
u, v = project_point_to_pixel(P_cam, K)
assert u == 640 and v == 360, f'Got {u}, {v}'
print('PASS')
"
Expected Result: Prints "PASS"
Scenario: Compute depth residual with perfect match
Tool: Bash (python)
Steps:
1. python -c "
from aruco.depth_verify import compute_depth_residual
import numpy as np
# Identity transform, point at (0, 0, 2m)
T = np.eye(4)
K = np.array([[1000, 0, 320], [0, 1000, 240], [0, 0, 1]])
depth_map = np.full((480, 640), 2.0, dtype=np.float32)
P_world = np.array([0, 0, 2])
r = compute_depth_residual(P_world, T, depth_map, K, window_size=1)
assert abs(r) < 0.001, f'Residual should be ~0, got {r}'
print('PASS')
"
Expected Result: Prints "PASS"
Scenario: DepthVerificationResult has required fields
Tool: Bash (python)
Steps:
1. python -c "from aruco.depth_verify import DepthVerificationResult; r = DepthVerificationResult(residuals=[], rmse=0, mean_abs=0, median=0, depth_normalized_rmse=0, n_valid=0, n_total=0); print('PASS')"
Expected Result: Prints "PASS"
```
**Commit**: YES
- Message: `feat(aruco): add depth verification module with residual computation`
- Files: `py_workspace/aruco/depth_verify.py`
---
- [x] 3. Create depth refinement module
**What to do**:
- Create `py_workspace/aruco/depth_refine.py`
- Implement `extrinsics_to_params(T: np.ndarray) -> np.ndarray`
- Convert 4x4 matrix to 6-DOF params (rvec + tvec)
- Implement `params_to_extrinsics(params: np.ndarray) -> np.ndarray`
- Convert 6-DOF params back to 4x4 matrix
- Implement `depth_residual_objective(params, marker_corners_world, depth_map, K, initial_params, regularization_weight=0.1) -> float`
- Compute sum of squared depth residuals + regularization term
- Regularization: penalize deviation from initial_params
- Implement `refine_extrinsics_with_depth(T_initial, marker_corners_world, depth_map, K, max_translation_m=0.05, max_rotation_deg=5.0) -> tuple[np.ndarray, dict]`
- Use `scipy.optimize.minimize` with method='L-BFGS-B'
- Add bounds based on max_translation and max_rotation
- Return refined T and stats dict (iterations, final_cost, delta_translation, delta_rotation)
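The two conversion helpers amount to a Rodrigues round-trip. The sketch below inlines a numpy-only axis-angle conversion for illustration; the real module would delegate to the existing `rvec_tvec_to_matrix` / `matrix_to_rvec_tvec` helpers in `pose_math.py` (the inline `_matrix_to_rodrigues` here is not robust near theta = pi).

```python
import numpy as np

def _rodrigues_to_matrix(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (Rodrigues formula)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def _matrix_to_rodrigues(R):
    """3x3 rotation matrix -> axis-angle vector (valid for 0 <= theta < pi)."""
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta * axis / (2 * np.sin(theta))

def extrinsics_to_params(T):
    """4x4 transform -> 6-DOF params [rvec | tvec]."""
    return np.concatenate([_matrix_to_rodrigues(T[:3, :3]), T[:3, 3]])

def params_to_extrinsics(params):
    """6-DOF params [rvec | tvec] -> 4x4 transform."""
    params = np.asarray(params, dtype=float)
    T = np.eye(4)
    T[:3, :3] = _rodrigues_to_matrix(params[:3])
    T[:3, 3] = params[3:]
    return T
```

The round-trip property these must satisfy is exactly the first QA scenario below.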
**Must NOT do**:
- Do NOT optimize intrinsics or distortion
- Do NOT allow unbounded optimization (must use regularization/bounds)
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: Optimization with scipy, moderate complexity
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 2 (with Task 4)
- **Blocks**: Task 6
- **Blocked By**: Task 2
**References**:
- `py_workspace/aruco/pose_math.py` - rvec_tvec_to_matrix, matrix_to_rvec_tvec
- scipy.optimize.minimize documentation
- Librarian findings on direct optimization
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: Params round-trip conversion
Tool: Bash (python)
Steps:
1. python -c "
from aruco.depth_refine import extrinsics_to_params, params_to_extrinsics
from aruco.pose_math import rvec_tvec_to_matrix
import numpy as np
T = rvec_tvec_to_matrix(np.array([0.1, 0.2, 0.3]), np.array([1, 2, 3]))
params = extrinsics_to_params(T)
T2 = params_to_extrinsics(params)
assert np.allclose(T, T2, atol=1e-9), 'Round-trip failed'
print('PASS')
"
Expected Result: Prints "PASS"
Scenario: Refinement respects bounds
Tool: Bash (python)
Steps:
1. python -c "
from aruco.depth_refine import refine_extrinsics_with_depth
import numpy as np
# Synthetic test with small perturbation
T = np.eye(4)
T[0, 3] = 0.01 # 1cm offset
corners = np.array([[0, 0, 2], [0.1, 0, 2], [0.1, 0.1, 2], [0, 0.1, 2]])
K = np.array([[1000, 0, 320], [0, 1000, 240], [0, 0, 1]])
depth = np.full((480, 640), 2.0, dtype=np.float32)
T_refined, stats = refine_extrinsics_with_depth(T, corners, depth, K, max_translation_m=0.05)
delta = stats['delta_translation_norm_m']
assert delta < 0.05, f'Translation moved too far: {delta}'
print('PASS')
"
Expected Result: Prints "PASS"
```
**Commit**: YES
- Message: `feat(aruco): add depth refinement module with bounded optimization`
- Files: `py_workspace/aruco/depth_refine.py`
---
- [x] 4. Add CLI flags to calibrate_extrinsics.py
**What to do**:
- Modify `py_workspace/calibrate_extrinsics.py`
- Add new click options:
- `--verify-depth / --no-verify-depth` (default: False) - Enable depth verification
- `--refine-depth / --no-refine-depth` (default: False) - Enable depth refinement
- `--depth-mode` (default: "NEURAL") - Depth computation mode (NEURAL, ULTRA, PERFORMANCE)
- `--depth-confidence-threshold` (default: 50) - Confidence threshold for depth filtering
- `--report-csv PATH` - Optional path for per-frame CSV report
- Update InitParameters when depth flags are set
- Pass depth_mode to SVOReader
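The new options could be declared roughly as follows. This is a hedged sketch: option names and defaults come from the list above, but the command body and how values are threaded into `InitParameters`/`SVOReader` are placeholders.

```python
import click

@click.command()
@click.option("--verify-depth/--no-verify-depth", default=False,
              help="Run depth verification after extrinsics are computed.")
@click.option("--refine-depth/--no-refine-depth", default=False,
              help="Refine extrinsics against measured depth (implies verification).")
@click.option("--depth-mode", default="NEURAL",
              type=click.Choice(["NEURAL", "ULTRA", "PERFORMANCE"]),
              help="ZED depth computation mode.")
@click.option("--depth-confidence-threshold", default=50, type=int,
              help="Confidence threshold for depth filtering.")
@click.option("--report-csv", type=click.Path(dir_okay=False), default=None,
              help="Optional path for a per-frame residual CSV report.")
def main(verify_depth, refine_depth, depth_mode,
         depth_confidence_threshold, report_csv):
    # Placeholder body: the real command threads these into
    # SVOReader / InitParameters before calibration starts.
    click.echo(f"verify={verify_depth} refine={refine_depth} mode={depth_mode}")
```

Because all new options default to off/None, existing invocations behave identically, which is what the "Default behavior unchanged" scenario below checks via `main.params`.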
**Must NOT do**:
- Do NOT change any existing behavior when new flags are not specified
- Do NOT remove or modify existing CLI options
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Adding CLI options, straightforward
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 2 (with Task 3)
- **Blocks**: Tasks 5, 6
- **Blocked By**: Tasks 1, 2
**References**:
- `py_workspace/calibrate_extrinsics.py:22-42` - Existing click options
- Click documentation for option syntax
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: CLI help shows new flags
Tool: Bash
Steps:
1. cd /workspaces/zed-playground/py_workspace
2. uv run calibrate_extrinsics.py --help | grep -E "(verify-depth|refine-depth|depth-mode)"
Expected Result: All three flags appear in help output
Scenario: Default behavior unchanged
Tool: Bash (python)
Steps:
1. python -c "
# Parse default values
import click
from calibrate_extrinsics import main
ctx = click.Context(main)
params = {p.name: p.default for p in main.params}
assert params.get('verify_depth') == False, 'verify_depth should default False'
assert params.get('refine_depth') == False, 'refine_depth should default False'
print('PASS')
"
Expected Result: Prints "PASS"
```
**Commit**: YES
- Message: `feat(cli): add depth verification and refinement flags`
- Files: `py_workspace/calibrate_extrinsics.py`
---
- [x] 5. Integrate verification into CLI workflow
**What to do**:
- Modify `py_workspace/calibrate_extrinsics.py`
- When `--verify-depth` is set:
- After computing extrinsics, run depth verification for each camera
- Use detected marker corners (already in image coordinates) + known 3D positions
- Sample depth at corner pixel positions using median window
- Compute DepthVerificationResult per camera
- Add `depth_verify` section to output JSON:
```json
{
"serial": {
"pose": "...",
"stats": {...},
"depth_verify": {
"rmse": 0.015,
"mean_abs": 0.012,
"median": 0.010,
"depth_normalized_rmse": 0.008,
"n_valid": 45,
"n_total": 48
}
}
}
```
- Print verification summary to console
- If `--report-csv` specified, write per-frame residuals
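The CSV report could be written with a small helper like this sketch. The column names are illustrative; the plan does not fix the CSV schema.

```python
import csv

def write_residual_report(path, rows):
    """Write per-frame depth residuals to CSV.

    `rows` yields (serial, frame_index, corner_id, residual_m) tuples;
    the header below is a hypothetical schema, not one fixed by the plan.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["serial", "frame_index", "corner_id", "residual_m"])
        writer.writerows(rows)
```

Writing a header row plus at least one data row is what the "CSV report generated" scenario below asserts.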
**Must NOT do**:
- Do NOT modify extrinsics (that's Task 6)
- Do NOT break existing JSON format for cameras without depth_verify
**Recommended Agent Profile**:
- **Category**: `unspecified-high`
- Reason: Integration task, requires careful coordination
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Wave 2 (sequential)
- **Blocks**: Task 6
- **Blocked By**: Tasks 1, 2, 4
**References**:
- `py_workspace/calibrate_extrinsics.py:186-212` - Current output generation
- `py_workspace/aruco/depth_verify.py` - Verification module (Task 2)
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: Verify-depth adds depth_verify to JSON
Tool: Bash
Preconditions: SVO files and markers exist
Steps:
1. uv run calibrate_extrinsics.py --svo *.svo2 --markers aruco/output/standard_box_markers.parquet --output /tmp/test_verify.json --verify-depth --no-preview --sample-interval 100
2. python -c "import json; d=json.load(open('/tmp/test_verify.json')); k=list(d.keys())[0]; assert 'depth_verify' in d[k], 'Missing depth_verify'; print('PASS')"
Expected Result: Prints "PASS"
Scenario: CSV report generated when flag set
Tool: Bash
Steps:
1. uv run calibrate_extrinsics.py ... --verify-depth --report-csv /tmp/residuals.csv
2. python -c "import csv; rows=list(csv.reader(open('/tmp/residuals.csv'))); assert len(rows) > 1; print('PASS')"
Expected Result: Prints "PASS"
```
**Commit**: YES
- Message: `feat(cli): integrate depth verification into calibration workflow`
- Files: `py_workspace/calibrate_extrinsics.py`
---
- [x] 6. Integrate refinement into CLI workflow
**What to do**:
- Modify `py_workspace/calibrate_extrinsics.py`
- When `--refine-depth` is set (it implicitly enables `--verify-depth`):
- After initial extrinsics computation, run depth refinement
- Report both pre-refinement and post-refinement metrics
- Update the pose in output JSON with refined values
- Add `refine_depth` section to output JSON:
```json
{
"serial": {
"pose": "...", // Now refined
"stats": {...},
"depth_verify": {...}, // Pre-refinement
"depth_verify_post": {...}, // Post-refinement
"refine_depth": {
"iterations": 15,
"delta_translation_norm_m": 0.008,
"delta_rotation_deg": 0.5,
"improvement_rmse": 0.003
}
}
}
```
- Print refinement summary to console
**Must NOT do**:
- Do NOT allow refinement without verification (refine implies verify)
- Do NOT remove regularization bounds
**Recommended Agent Profile**:
- **Category**: `unspecified-high`
- Reason: Final integration, careful coordination
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 3 (with Task 7)
- **Blocks**: None
- **Blocked By**: Tasks 3, 5
**References**:
- `py_workspace/aruco/depth_refine.py` - Refinement module (Task 3)
- Task 5 output format
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: Refine-depth produces refined extrinsics
Tool: Bash
Steps:
1. uv run calibrate_extrinsics.py --svo *.svo2 --markers aruco/output/standard_box_markers.parquet --output /tmp/test_refine.json --refine-depth --no-preview --sample-interval 100
2. python -c "import json; d=json.load(open('/tmp/test_refine.json')); k=list(d.keys())[0]; assert 'refine_depth' in d[k]; assert 'depth_verify_post' in d[k]; print('PASS')"
Expected Result: Prints "PASS"
Scenario: Refine reports improvement metrics
Tool: Bash
Steps:
1. python -c "import json; d=json.load(open('/tmp/test_refine.json')); k=list(d.keys())[0]; r=d[k]['refine_depth']; assert 'delta_translation_norm_m' in r; print('PASS')"
Expected Result: Prints "PASS"
```
**Commit**: YES
- Message: `feat(cli): integrate depth refinement into calibration workflow`
- Files: `py_workspace/calibrate_extrinsics.py`
---
- [x] 7. Add unit tests for depth modules
**What to do**:
- Create `py_workspace/tests/test_depth_verify.py`
- Test cases:
- `test_project_point_to_pixel` - Verify projection math
- `test_compute_depth_residual_perfect` - Zero residual for matching depth
- `test_compute_depth_residual_offset` - Correct residual for offset depth
- `test_verify_extrinsics_metrics` - Verify RMSE, mean_abs, median computation
- `test_invalid_depth_handling` - NaN/Inf depth returns None
- Create `py_workspace/tests/test_depth_refine.py`
- Test cases:
- `test_params_roundtrip` - extrinsics_to_params ↔ params_to_extrinsics
- `test_refinement_reduces_error` - Synthetic case where refinement improves fit
- `test_refinement_respects_bounds` - Verify max_translation/rotation honored
**Must NOT do**:
- Do NOT require real SVO files for unit tests (use synthetic data)
- Do NOT test CLI directly (that's integration testing)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Straightforward test implementation
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 3 (with Task 6)
- **Blocks**: None
- **Blocked By**: Tasks 2, 3
**References**:
- `py_workspace/tests/test_pose_math.py` - Existing test patterns
- `py_workspace/tests/test_pose_averaging.py` - More test patterns
**Acceptance Criteria**:
**Agent-Executed QA Scenarios:**
```
Scenario: All depth unit tests pass
Tool: Bash
Steps:
1. cd /workspaces/zed-playground/py_workspace
2. uv run pytest tests/test_depth_verify.py tests/test_depth_refine.py -v
Expected Result: Exit code 0, all tests pass
Scenario: Test count is reasonable
Tool: Bash
Steps:
1. uv run pytest tests/test_depth_*.py --collect-only -q | grep -c "test_"
Expected Result: Prints a count of at least 8 collected tests
```
**Commit**: YES
- Message: `test(aruco): add unit tests for depth verification and refinement`
- Files: `py_workspace/tests/test_depth_verify.py`, `py_workspace/tests/test_depth_refine.py`
---
## Commit Strategy
| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1 | `feat(aruco): extend SVOReader with depth support` | svo_sync.py | python import test |
| 2 | `feat(aruco): add depth verification module` | depth_verify.py | python import test |
| 3 | `feat(aruco): add depth refinement module` | depth_refine.py | python import test |
| 4 | `feat(cli): add depth flags` | calibrate_extrinsics.py | --help works |
| 5 | `feat(cli): integrate depth verification` | calibrate_extrinsics.py | --verify-depth works |
| 6 | `feat(cli): integrate depth refinement` | calibrate_extrinsics.py | --refine-depth works |
| 7 | `test(aruco): add depth tests` | tests/test_depth_*.py | pytest passes |
---
## Success Criteria
### Verification Commands
```bash
# CLI shows new flags
uv run calibrate_extrinsics.py --help # Expected: shows --verify-depth, --refine-depth
# Non-regression: without depth flags, behavior unchanged
uv run calibrate_extrinsics.py --markers aruco/output/standard_box_markers.parquet --validate-markers # Expected: exit 0
# Depth verification works
uv run calibrate_extrinsics.py --svo *.svo2 --markers aruco/output/standard_box_markers.parquet --output test.json --verify-depth --no-preview
# Depth refinement works
uv run calibrate_extrinsics.py --svo *.svo2 --markers aruco/output/standard_box_markers.parquet --output test.json --refine-depth --no-preview
# Tests pass
uv run pytest tests/test_depth_*.py -v # Expected: all pass
```
### Final Checklist
- [x] All "Must Have" present
- [x] All "Must NOT Have" absent
- [x] All tests pass
- [x] CLI --help shows all new options
- [x] Output JSON includes depth_verify section when flag used
- [x] Output JSON includes refine_depth section when flag used
- [x] Refinement respects bounds (±5cm, ±5°)
- [x] Both pre/post refinement metrics reported
#### Blocker Note
Full end-to-end verification still requires an SVO dataset in which ArUco markers are actually detected (the currently bundled SVOs appear to yield 0 detections). See:
- `.sisyphus/notepads/depth-extrinsic-verify/issues.md`
- `.sisyphus/notepads/depth-extrinsic-verify/problems.md`
# Robust Depth Refinement for Camera Extrinsics
## TL;DR
> **Quick Summary**: Replace the failing depth-based pose refinement pipeline with a robust optimizer (`scipy.optimize.least_squares` with soft-L1 loss), add unit hardening, confidence-weighted residuals, best-frame selection, rich diagnostics, and a benchmark matrix comparing configurations.
>
> **Deliverables**:
> - Unit-hardened depth retrieval (set `coordinate_units=METER`, guard double-conversion)
> - Robust optimization objective using `least_squares(method="trf", loss="soft_l1", f_scale=0.1)`
> - Confidence-weighted depth residuals (toggleable via CLI flag)
> - Best-frame selection replacing naive "latest valid frame"
> - Rich optimizer diagnostics and acceptance gates
> - Benchmark matrix comparing baseline/robust/+confidence/+best-frame
> - Updated tests for all new functionality
>
> **Estimated Effort**: Medium (3-4 hours implementation)
> **Parallel Execution**: YES - 2 waves
> **Critical Path**: Task 1 (units) → Task 2 (robust optimizer) → Task 3 (confidence) → Task 5 (diagnostics) → Task 6 (benchmark)
---
## Context
### Original Request
Implement the 5 items from "Recommended Implementation Order" in `docs/calibrate-extrinsics-workflow.md`, plus research and choose the best optimization method for depth-based camera extrinsic refinement.
### Interview Summary
**Key Discussions**:
- Requirements were explicitly specified in the documentation (no interactive interview needed)
- Research confirmed `scipy.optimize.least_squares` is superior to `scipy.optimize.minimize` for this problem class
**Research Findings**:
- **freemocap/anipose** (production multi-camera calibration) uses exactly `least_squares(method="trf", loss=loss, f_scale=threshold)` for bundle adjustment — validates our approach
- **scipy docs** recommend `soft_l1` or `huber` for robust fitting; `f_scale` controls the inlier/outlier threshold
- **Current output JSONs** confirm catastrophic failure: RMSE 5000+ meters (`aligned_refined_extrinsics_fast.json`), RMSE ~11.6m (`test_refine_current.json`), iterations=0/1, success=false across all cameras
- **Unit mismatch** still active despite `/1000.0` conversion — ZED defaults to mm, code divides by 1000, but no `coordinate_units=METER` set
- **Confidence map** retrieved but only used in verify filtering, not in optimizer objective
### Metis Review
**Identified Gaps** (addressed):
- Output JSON schema backward compatibility → New fields are additive only (existing fields preserved)
- Confidence weighting can interact with robust loss → Made toggleable, logged statistics
- Best-frame selection changes behavior → Deterministic scoring, old behavior available as fallback
- Zero valid points edge case → Explicit early exit with diagnostic
- Numerical pass/fail gate → Added RMSE threshold checks
- Regression guard → Default CLI behavior unchanged unless user opts into new features
---
## Work Objectives
### Core Objective
Make depth-based extrinsic refinement actually work by fixing the unit mismatch, switching to a robust optimizer, incorporating confidence weighting, and selecting the best frame for refinement.
### Concrete Deliverables
- Modified `aruco/svo_sync.py` with unit hardening
- Rewritten `aruco/depth_refine.py` using `least_squares` with robust loss
- Updated `aruco/depth_verify.py` with confidence weight extraction helper
- Updated `calibrate_extrinsics.py` with frame scoring, diagnostics, new CLI flags
- New and updated tests in `tests/`
- Updated `docs/calibrate-extrinsics-workflow.md` with new behavior docs
### Definition of Done
- [x] `uv run pytest` passes with 0 failures
- [x] Synthetic test: robust optimizer converges (success=True, nfev > 1) with injected outliers
- [x] Existing tests still pass (backward compatibility)
- [x] Benchmark matrix produces 4 comparable result records
### Must Have
- `coordinate_units = sl.UNIT.METER` set in SVOReader
- `least_squares` with `loss="soft_l1"` and `f_scale=0.1` as default optimizer
- Confidence weighting via `--use-confidence-weights` flag
- Best-frame selection with deterministic scoring
- Optimizer diagnostics in output JSON and logs
- All changes covered by automated tests
### Must NOT Have (Guardrails)
- Must NOT change unrelated calibration logic (marker detection, PnP, pose averaging, alignment)
- Must NOT change file I/O formats or break JSON schema (only additive fields)
- Must NOT introduce new dependencies beyond scipy/numpy already in use
- Must NOT implement multi-optimizer auto-selection or hyperparameter search
- Must NOT turn frame scoring into a ML quality model — simple weighted heuristic only
- Must NOT add premature abstractions or over-engineer the API
- Must NOT remove existing CLI flags or change their default behavior
---
## Verification Strategy
> **UNIVERSAL RULE: ZERO HUMAN INTERVENTION**
>
> ALL tasks in this plan MUST be verifiable WITHOUT any human action.
> Every criterion is verified by running `uv run pytest` or inspecting code.
### Test Decision
- **Infrastructure exists**: YES (pytest configured in pyproject.toml, tests/ directory)
- **Automated tests**: YES (tests-after, matching existing project pattern)
- **Framework**: pytest (via `uv run pytest`)
### Agent-Executed QA Scenarios (MANDATORY — ALL tasks)
**Verification Tool by Deliverable Type:**
| Type | Tool | How Agent Verifies |
|------|------|-------------------|
| Python module changes | Bash (`uv run pytest`) | Run tests, assert 0 failures |
| New functions | Bash (`uv run pytest -k test_name`) | Run specific test, assert pass |
| CLI behavior | Bash (`uv run python calibrate_extrinsics.py --help`) | Verify new flags present |
---
## Execution Strategy
### Parallel Execution Waves
```
Wave 1 (Start Immediately):
├── Task 1: Unit hardening (svo_sync.py) [no dependencies]
└── Task 4: Best-frame selection (calibrate_extrinsics.py) [no dependencies]
Wave 2 (After Wave 1):
├── Task 2: Robust optimizer (depth_refine.py) [depends: 1]
├── Task 3: Confidence weighting (depth_verify.py + depth_refine.py) [depends: 2]
└── Task 5: Diagnostics and acceptance gates [depends: 2]
Wave 3 (After Wave 2):
└── Task 6: Benchmark matrix [depends: 2, 3, 4, 5]
Wave 4 (After All):
└── Task 7: Documentation update [depends: all]
Critical Path: Task 1 → Task 2 → Task 3 → Task 5 → Task 6
```
### Dependency Matrix
| Task | Depends On | Blocks | Can Parallelize With |
|------|------------|--------|---------------------|
| 1 | None | 2, 3 | 4 |
| 2 | 1 | 3, 5, 6 | - |
| 3 | 2 | 6 | 5 |
| 4 | None | 6 | 1 |
| 5 | 2 | 6 | 3 |
| 6 | 2, 3, 4, 5 | 7 | - |
| 7 | All | None | - |
### Agent Dispatch Summary
| Wave | Tasks | Recommended Agents |
|------|-------|-------------------|
| 1 | 1, 4 | `category="quick"` for T1; `category="unspecified-low"` for T4 |
| 2 | 2, 3, 5 | `category="deep"` for T2; `category="quick"` for T3, T5 |
| 3 | 6 | `category="unspecified-low"` |
| 4 | 7 | `category="writing"` |
---
## TODOs
- [x] 1. Unit Hardening (P0)
**What to do**:
- In `aruco/svo_sync.py`, add `init_params.coordinate_units = sl.UNIT.METER` in the `SVOReader.__init__` method, right after `init_params.set_from_svo_file(path)` (around line 42)
- Guard the existing `/1000.0` conversion: check whether `coordinate_units` is already METER. If METER is set, skip the division. If not set or MILLIMETER, apply the division. Add a log warning if division is applied as fallback
- Add depth sanity logging under `--debug` mode: after retrieving depth, log `min/median/max/p95` of valid depth values. This goes in the `_retrieve_depth` method
- Write a test that verifies the unit-hardened path doesn't double-convert
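The guarded conversion might be factored as a pure function like the sketch below (in `svo_sync.py` the check would read `init_params.coordinate_units`; the string constants stand in for `sl.UNIT` values, and the stats helper covers the `--debug` logging).

```python
import numpy as np

def normalize_depth_units(depth_raw, coordinate_units):
    """Return depth in meters, guarding against double conversion.

    If coordinate_units=METER was set on InitParameters, the map is
    already metric and must NOT be divided again; otherwise assume the
    SDK default of millimeters.
    """
    if coordinate_units == "METER":
        return depth_raw
    # Fallback path: a real implementation would log a warning here.
    return depth_raw / 1000.0

def depth_sanity_stats(depth_map):
    """min/median/p95/max over finite, positive depths (for --debug logs)."""
    valid = depth_map[np.isfinite(depth_map) & (depth_map > 0)]
    if valid.size == 0:
        return None
    return {"min": float(valid.min()), "median": float(np.median(valid)),
            "p95": float(np.percentile(valid, 95)), "max": float(valid.max())}
```

A quick sanity check on the median (expected ~0.5 m to ~10 m for a calibration scene) is usually enough to catch a lingering mm/m mismatch.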
**Must NOT do**:
- Do NOT change depth retrieval for confidence maps
- Do NOT modify the `grab_synced()` or `grab_all()` methods
- Do NOT add new CLI parameters for this task
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Small, focused change in one file + one test file
- **Skills**: [`git-master`]
- `git-master`: Atomic commit of unit hardening change
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 4)
- **Blocks**: Tasks 2, 3
- **Blocked By**: None
**References**:
**Pattern References** (existing code to follow):
- `aruco/svo_sync.py:40-44` — Current `init_params` setup where `coordinate_units` must be added
- `aruco/svo_sync.py:180-189` — Current `_retrieve_depth` method with `/1000.0` conversion to modify
- `aruco/svo_sync.py:191-196` — Confidence retrieval pattern (do NOT modify, but understand adjacency)
**API/Type References** (contracts to implement against):
- ZED SDK `InitParameters.coordinate_units` — Set to `sl.UNIT.METER`
- `loguru.logger` — Used project-wide for debug logging
**Test References** (testing patterns to follow):
- `tests/test_depth_verify.py:36-66` — Test pattern using synthetic depth maps (follow this style)
- `tests/test_depth_refine.py:21-39` — Test pattern with synthetic K matrix and depth maps
**Documentation References**:
- `docs/calibrate-extrinsics-workflow.md:116-132` — Documents the unit mismatch problem and mitigation strategy
- `docs/calibrate-extrinsics-workflow.md:166-169` — Specifies the exact implementation steps for unit hardening
**Acceptance Criteria**:
- [ ] `init_params.coordinate_units = sl.UNIT.METER` is set in SVOReader.__init__ before `cam.open()`
- [ ] The `/1000.0` division in `_retrieve_depth` is guarded (only applied if units are NOT meters)
- [ ] Debug logging of depth statistics (min/median/max) is added to `_retrieve_depth` when depth mode is active
- [ ] `uv run pytest tests/test_depth_refine.py tests/test_depth_verify.py -q` → all pass (no regressions)
**Agent-Executed QA Scenarios:**
```
Scenario: Verify unit hardening doesn't break existing tests
Tool: Bash (uv run pytest)
Preconditions: All dependencies installed
Steps:
1. Run: uv run pytest tests/test_depth_refine.py tests/test_depth_verify.py -q
2. Assert: exit code 0
3. Assert: output contains "passed" and no "FAILED"
Expected Result: All existing tests pass
Evidence: Terminal output captured
Scenario: Verify coordinate_units is set in code
Tool: Bash (grep)
Preconditions: File modified
Steps:
1. Run: grep -n "coordinate_units" aruco/svo_sync.py
2. Assert: output contains "UNIT.METER" or "METER"
Expected Result: Unit setting is present
Evidence: Grep output
```
**Commit**: YES
- Message: `fix(svo): harden depth units — set coordinate_units=METER, guard /1000 conversion`
- Files: `aruco/svo_sync.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/ -q`
---
- [x] 2. Robust Optimizer — Replace MSE with `least_squares` + Soft-L1 Loss (P0)
**What to do**:
- **Rewrite `depth_residual_objective`** → Replace with a **residual vector function** `depth_residuals(params, ...)` that returns an array of residuals (not a scalar cost). Each element is `(z_measured - z_predicted)` for one marker corner. This is what `least_squares` expects.
- **Add regularization as pseudo-residuals**: Append `[reg_weight_rot * delta_rvec, reg_weight_trans * delta_tvec]` to the residual vector; this naturally penalizes deviation from the initial pose. Split regularization into separate rotation and translation weights (defaults: `reg_rot=0.1`, `reg_trans=1.0`, so translation, measured in meters, is regularized more tightly).
- **Replace `minimize(method="L-BFGS-B")` with `least_squares(method="trf", loss="soft_l1", f_scale=0.1)`**:
- `method="trf"` — Trust Region Reflective, handles bounds naturally
- `loss="soft_l1"` — Smooth robust loss, downweights outliers beyond `f_scale`
- `f_scale=0.1` — Residuals >0.1m are treated as outliers (matches ZED depth noise ~1-5cm)
- `bounds` — Same ±5°/±5cm bounds, expressed as `(lower_bounds_array, upper_bounds_array)` tuple
- `x_scale="jac"` — Automatic Jacobian-based scaling (prevents ill-conditioning)
- `max_nfev=200` — Maximum function evaluations
- **Update `refine_extrinsics_with_depth` signature**: Add parameters for `loss`, `f_scale`, `reg_rot`, `reg_trans`. Keep backward-compatible defaults. Return enriched stats dict including: `termination_message`, `nfev`, `optimality`, `active_mask`, `cost`.
- **Handle zero residuals**: If residual vector is empty (no valid depth points), return initial pose unchanged with stats indicating `"reason": "no_valid_depth_points"`.
- **Maintain backward-compatible scalar cost reporting**: Compute `initial_cost` and `final_cost` from the residual vector for comparison with old output format.
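The residual-vector pattern described above can be sketched with a deliberately simplified model: only a depth bias along z is optimized, which is enough to show the vector-valued residual function, the appended regularization pseudo-residuals, and the `least_squares` call with the plan's parameters. The real objective would recompute per-corner residuals from the full 6-DOF pose.

```python
import numpy as np
from scipy.optimize import least_squares

def make_residual_fn(z_pred, z_meas, x0, reg_rot=0.1, reg_trans=1.0):
    """Residual VECTOR for least_squares: per-corner depth residuals plus
    regularization pseudo-residuals pulling params back toward x0.

    Toy parameterization: params = [rx, ry, rz, tx, ty, tz]; only tz
    shifts the predicted depths here. The real function would re-project
    the marker corners under the candidate pose."""
    def residuals(params):
        delta = params - x0
        depth_r = z_meas - (z_pred + params[5])  # z_measured - z_predicted
        reg_r = np.concatenate([reg_rot * delta[:3], reg_trans * delta[3:]])
        return np.concatenate([depth_r, reg_r])
    return residuals

z_pred = np.full(20, 2.0)
z_meas = z_pred + 0.02   # true 2 cm depth bias
z_meas[:2] += 1.0        # 10% gross outliers (the plan's new test uses 30%)
x0 = np.zeros(6)

max_rot = np.deg2rad(5.0)
lb = np.array([-max_rot] * 3 + [-0.05] * 3)  # +/-5 deg, +/-5 cm bounds
ub = -lb
res = least_squares(
    make_residual_fn(z_pred, z_meas, x0), x0,
    method="trf", loss="soft_l1", f_scale=0.1,
    bounds=(lb, ub), x_scale="jac", max_nfev=200,
)
# res.x[5] lands near the 2 cm bias: soft-L1 caps each outlier's pull at
# roughly f_scale, where a pure-MSE fit would be dragged toward ~0.12 m.
```

The stats dict required by this task maps directly onto the `OptimizeResult` fields: `res.message`, `res.nfev`, `res.optimality`, `res.active_mask`, and `res.cost`.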
**Must NOT do**:
- Do NOT change `extrinsics_to_params` or `params_to_extrinsics` (the Rodrigues parameterization is correct)
- Do NOT modify `depth_verify.py` in this task
- Do NOT add confidence weighting here (that's Task 3)
- Do NOT add CLI flags here (that's Task 5)
**Recommended Agent Profile**:
- **Category**: `deep`
- Reason: Core algorithmic change, requires understanding of optimization theory and careful residual construction
- **Skills**: []
- No specialized skills needed — pure Python/numpy/scipy work
**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Wave 2 (sequential after Wave 1)
- **Blocks**: Tasks 3, 5, 6
- **Blocked By**: Task 1
**References**:
**Pattern References** (existing code to follow):
- `aruco/depth_refine.py:19-47` — Current `depth_residual_objective` function to REPLACE
- `aruco/depth_refine.py:50-112` — Current `refine_extrinsics_with_depth` function to REWRITE
- `aruco/depth_refine.py:1-16` — Import block and helper functions (keep `extrinsics_to_params`, `params_to_extrinsics`)
- `aruco/depth_verify.py:27-67` — `compute_depth_residual` function — this is the per-point residual computation called from the objective. Understand its contract: returns `float(z_measured - z_predicted)` or `None`.
**API/Type References**:
- `scipy.optimize.least_squares` — [scipy docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html): `fun(x, *args) -> residuals_array`; parameters: `method="trf"`, `loss="soft_l1"`, `f_scale=0.1`, `bounds=(lb, ub)`, `x_scale="jac"`, `max_nfev=200`
- Return type: `OptimizeResult` with attributes: `.x`, `.cost`, `.fun`, `.jac`, `.grad`, `.optimality`, `.active_mask`, `.nfev`, `.njev`, `.status`, `.message`, `.success`
**External References** (production examples):
- `freemocap/anipose` bundle_adjust method — Uses `least_squares(error_fun, x0, jac_sparsity=jac_sparse, f_scale=f_scale, x_scale="jac", loss=loss, ftol=ftol, method="trf", tr_solver="lsmr")` for multi-camera calibration. Key pattern: residual function returns per-point reprojection errors.
- scipy Context7 docs — Example shows `least_squares(fun, x0, loss='soft_l1', f_scale=0.1, args=(t_train, y_train))` where `fun` returns residual vector
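The references above describe the target call shape; the sketch below exercises that exact pattern (`method="trf"`, `loss="soft_l1"`, `f_scale=0.1`, `x_scale="jac"`, `max_nfev=200`) on a synthetic problem with ~30% gross outliers, with regularization appended as pseudo-residuals. The residual model (scale/offset recovery) is illustrative only, not the real extrinsics parameterization:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Synthetic stand-in for depth residuals: recover a scale/offset under
# ~30% gross outliers. Illustrative only -- the real objective perturbs
# the 6-DoF extrinsics via compute_depth_residual.
z_true = rng.uniform(1.0, 3.0, size=50)
z_meas = 1.02 * z_true + 0.05
outlier_mask = rng.random(50) < 0.3
z_meas[outlier_mask] += rng.uniform(0.5, 1.0, size=int(outlier_mask.sum()))

def depth_residuals(params, z_true, z_meas, reg=1e-3):
    scale, offset = params
    res = z_meas - (scale * z_true + offset)  # per-point residual VECTOR
    # Regularization as pseudo-residuals appended to the residual vector,
    # mirroring the split reg_rot/reg_trans pattern described above.
    return np.concatenate([res, reg * (params - np.array([1.0, 0.0]))])

result = least_squares(
    depth_residuals,
    x0=np.array([1.0, 0.0]),
    args=(z_true, z_meas),
    method="trf",
    loss="soft_l1",
    f_scale=0.1,
    x_scale="jac",
    max_nfev=200,
)
print(result.success, result.nfev, result.x)
```

The soft-L1 loss bounds each outlier's influence, so the recovered parameters stay close to the true scale/offset despite the contaminated measurements.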
**Test References**:
- `tests/test_depth_refine.py` — ALL 4 existing tests must still pass. They test: roundtrip, no-change convergence, offset correction, and bounds respect. The new optimizer must satisfy these same properties.
**Acceptance Criteria**:
- [ ] `from scipy.optimize import least_squares` replaces `from scipy.optimize import minimize`
- [ ] `depth_residuals()` returns `np.ndarray` (vector), not scalar float
- [ ] `least_squares(method="trf", loss="soft_l1", f_scale=0.1)` is the optimizer call
- [ ] Regularization is split: separate `reg_rot` and `reg_trans` weights, appended as pseudo-residuals
- [ ] Stats dict includes: `termination_message`, `nfev`, `optimality`, `cost`
- [ ] Zero-residual case returns initial pose with `reason: "no_valid_depth_points"`
- [ ] `uv run pytest tests/test_depth_refine.py -q` → all 4 existing tests pass
- [ ] New test: synthetic data with 30% outlier depths → robust optimizer converges (success=True, nfev > 1) with lower median residual than would occur with pure MSE
**Agent-Executed QA Scenarios:**
```
Scenario: All existing depth_refine tests pass after rewrite
Tool: Bash (uv run pytest)
Preconditions: Task 1 completed, aruco/depth_refine.py rewritten
Steps:
1. Run: uv run pytest tests/test_depth_refine.py -v
2. Assert: exit code 0
3. Assert: output contains "4 passed"
Expected Result: All 4 existing tests pass
Evidence: Terminal output captured
Scenario: Robust optimizer handles outliers better than MSE
Tool: Bash (uv run pytest)
Preconditions: New test added
Steps:
1. Run: uv run pytest tests/test_depth_refine.py::test_robust_loss_handles_outliers -v
2. Assert: exit code 0
3. Assert: test passes
Expected Result: With 30% outliers, robust optimizer has lower median abs residual
Evidence: Terminal output captured
```
**Commit**: YES
- Message: `feat(refine): replace L-BFGS-B MSE with least_squares soft-L1 robust optimizer`
- Files: `aruco/depth_refine.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/test_depth_refine.py -q`
---
- [x] 3. Confidence-Weighted Depth Residuals (P0)
**What to do**:
- **Add confidence weight extraction helper** to `aruco/depth_verify.py`: Create a function `get_confidence_weight(confidence_map, u, v, confidence_thresh=50) -> float` that returns a normalized weight in [0, 1]. ZED confidence: [1, 100] where higher = LESS confident. Normalize as `max(0, (confidence_thresh - conf_value)) / confidence_thresh`. Values above threshold → weight 0. Clamp to `[eps, 1.0]` where eps=1e-6.
- **Update `depth_residuals()` in `aruco/depth_refine.py`**: Accept optional `confidence_map` and `confidence_thresh` parameters. If confidence_map is provided, multiply each depth residual by `sqrt(weight)` before returning. This implements weighted least squares within the `least_squares` framework.
- **Update `refine_extrinsics_with_depth` signature**: Add `confidence_map=None`, `confidence_thresh=50` parameters. Pass through to `depth_residuals()`.
- **Update `calibrate_extrinsics.py`**: Pass `confidence_map=frame.confidence_map` and `confidence_thresh=depth_confidence_threshold` to `refine_extrinsics_with_depth` when confidence weighting is requested
- **Add `--use-confidence-weights/--no-confidence-weights` CLI flag** (default: False for backward compatibility)
- **Log confidence statistics** under `--debug`: After computing weights, log `n_zero_weight`, `mean_weight`, `median_weight`
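A minimal sketch of the weight helper described above (name, semantics, and the `eps` clamp all taken from this task's spec):

```python
import numpy as np

def get_confidence_weight(confidence_map, u, v, confidence_thresh=50, eps=1e-6):
    """Map a ZED confidence value at pixel (u, v) to a weight in [eps, 1.0].

    ZED semantics (per this plan): values in [1, 100], HIGHER = LESS
    confident. Values at or above the threshold normalize to 0, then the
    eps clamp keeps the weight strictly positive for sqrt-weighting.
    """
    conf = float(confidence_map[int(round(v)), int(round(u))])  # row = v
    weight = max(0.0, confidence_thresh - conf) / confidence_thresh
    return float(np.clip(weight, eps, 1.0))
```

In `depth_residuals()` each residual would then be multiplied by `sqrt(weight)`, which is the standard reduction of weighted least squares to ordinary residuals.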
**Must NOT do**:
- Do NOT change the verification logic in `verify_extrinsics_with_depth` (it already uses confidence correctly)
- Do NOT change confidence semantics (higher ZED value = less confident)
- Do NOT make confidence weighting the default behavior
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Adding parameters and weight multiplication — straightforward plumbing
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO (depends on Task 2)
- **Parallel Group**: Wave 2 (after Task 2)
- **Blocks**: Task 6
- **Blocked By**: Task 2
**References**:
**Pattern References**:
- `aruco/depth_verify.py:82-96` — Existing confidence handling pattern (filtering, NOT weighting). Follow this semantics but produce a continuous weight instead of binary skip
- `aruco/depth_verify.py:93-95` — ZED confidence semantics: "Higher confidence value means LESS confident... Range [1, 100], where 100 is typically occlusion/invalid"
- `aruco/depth_refine.py` — Updated in Task 2 with `depth_residuals()` function. Add `confidence_map` parameter here
- `calibrate_extrinsics.py:136-148` — Current call site for `refine_extrinsics_with_depth`. Add confidence_map/thresh forwarding
**Test References**:
- `tests/test_depth_verify.py:69-84` — Test pattern for `compute_marker_corner_residuals`. Follow for confidence weight test
**Acceptance Criteria**:
- [ ] `get_confidence_weight()` function exists in `depth_verify.py`
- [ ] Confidence weighting is off by default (backward compatible)
- [ ] `--use-confidence-weights` flag exists in CLI
- [ ] Low-confidence points have lower influence on optimization (verified by test)
- [ ] `uv run pytest tests/ -q` → all pass
**Agent-Executed QA Scenarios:**
```
Scenario: Confidence weighting reduces outlier influence
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_depth_refine.py::test_confidence_weighting -v
2. Assert: exit code 0
Expected Result: With low-confidence outlier points, weighted optimizer ignores them
Evidence: Terminal output
Scenario: CLI flag exists
Tool: Bash
Steps:
1. Run: uv run python calibrate_extrinsics.py --help | grep -i confidence-weight
2. Assert: output contains "--use-confidence-weights"
Expected Result: Flag is available
Evidence: Help text
```
**Commit**: YES
- Message: `feat(refine): add confidence-weighted depth residuals with --use-confidence-weights flag`
- Files: `aruco/depth_verify.py`, `aruco/depth_refine.py`, `calibrate_extrinsics.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/ -q`
---
- [x] 4. Best-Frame Selection (P1)
**What to do**:
- **Create `score_frame_quality()` function** in `calibrate_extrinsics.py` (or a new `aruco/frame_scoring.py` if cleaner). The function takes: `n_markers: int`, `reproj_error: float`, `depth_map: np.ndarray`, `marker_corners_world: Dict[int, np.ndarray]`, `T_world_cam: np.ndarray`, `K: np.ndarray` and returns a float score (higher = better).
- **Scoring formula**: `score = w_markers * n_markers + w_reproj * (1 / (reproj_error + eps)) + w_depth * valid_depth_ratio`
- `w_markers = 1.0` — more markers = better constraint
- `w_reproj = 5.0` — lower reprojection error = more accurate PnP
- `w_depth = 3.0` — higher ratio of valid depth at marker locations = better depth signal
- `valid_depth_ratio = n_valid_depths / n_total_corners`
- `eps = 1e-6` to avoid division by zero
- **Replace "last valid frame" logic** in `calibrate_extrinsics.py`: Instead of overwriting `verification_frames[serial]` every time (line 467-471), track ALL valid frames per camera with their scores. After the processing loop, select the frame with the highest score.
- **Log selected frame**: Under `--debug`, log the chosen frame index, score, and component breakdown for each camera
- **Ensure deterministic tiebreaking**: If scores are equal, pick the frame with the lower frame_index (earliest)
- **Keep frame storage bounded**: Store at most `max_stored_frames=10` candidates per camera (configurable), keeping the top-scoring ones
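The scoring formula and deterministic tiebreak above can be sketched as follows; `valid_depth_ratio` is taken as a precomputed input here, whereas the real function would derive it from the depth map at projected marker corners:

```python
def score_frame_quality(n_markers, reproj_error, valid_depth_ratio,
                        w_markers=1.0, w_reproj=5.0, w_depth=3.0, eps=1e-6):
    """Heuristic frame quality score (higher = better)."""
    return (w_markers * n_markers
            + w_reproj * (1.0 / (reproj_error + eps))
            + w_depth * valid_depth_ratio)

def select_best_frame(candidates):
    """candidates: list of (frame_index, score). Highest score wins;
    ties break deterministically toward the lowest frame_index."""
    return min(candidates, key=lambda c: (-c[1], c[0]))
```

`select_best_frame` is a hypothetical helper name; the key point is sorting by `(-score, frame_index)` so equal scores always resolve to the earliest frame.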
**Must NOT do**:
- Do NOT add ML-based frame scoring
- Do NOT change the frame grabbing/syncing logic
- Do NOT add new dependencies
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: New functionality but straightforward heuristic
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 1)
- **Blocks**: Task 6
- **Blocked By**: None
**References**:
**Pattern References**:
- `calibrate_extrinsics.py:463-471` — Current "last valid frame" logic to REPLACE. Currently: `verification_frames[serial] = {"frame": frame, "ids": ids, "corners": corners}`
- `calibrate_extrinsics.py:452-478` — Full frame processing context (pose estimation, accumulation, frame caching)
- `aruco/depth_verify.py:27-67` — `compute_depth_residual` can be used to check valid depth at marker locations for scoring
**Test References**:
- `tests/test_depth_cli_postprocess.py` — Test pattern for calibrate_extrinsics functions
**Acceptance Criteria**:
- [ ] `score_frame_quality()` function exists and returns a float
- [ ] Best frame is selected (not last frame) for each camera
- [ ] Scoring is deterministic (same inputs → same selected frame)
- [ ] Frame selection metadata is logged under `--debug`
- [ ] `uv run pytest tests/ -q` → all pass (no regressions)
**Agent-Executed QA Scenarios:**
```
Scenario: Frame scoring is deterministic
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_frame_scoring.py -v
2. Assert: exit code 0
Expected Result: Same inputs always produce same score and selection
Evidence: Terminal output
Scenario: Higher marker count increases score
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_frame_scoring.py::test_more_markers_higher_score -v
2. Assert: exit code 0
Expected Result: Frame with more markers scores higher
Evidence: Terminal output
```
**Commit**: YES
- Message: `feat(calibrate): replace naive frame selection with quality-scored best-frame`
- Files: `calibrate_extrinsics.py`, `tests/test_frame_scoring.py`
- Pre-commit: `uv run pytest tests/ -q`
---
- [x] 5. Diagnostics and Acceptance Gates (P1)
**What to do**:
- **Enrich `refine_extrinsics_with_depth` stats dict**: The `least_squares` result (from Task 2) already provides `.status`, `.message`, `.nfev`, `.njev`, `.optimality`, `.active_mask`. Surface these in the returned stats dict as: `termination_status` (int), `termination_message` (str), `nfev` (int), `njev` (int), `optimality` (float), `n_active_bounds` (int, count of parameters at bound limits).
- **Add effective valid points count**: Log how many marker corners had valid (finite, positive) depth, and how many were used after confidence filtering. Add to stats: `n_depth_valid`, `n_confidence_filtered`.
- **Add RMSE improvement gate**: If `improvement_rmse < 1e-4` AND `nfev > 5`, log WARNING: "Refinement converged with negligible improvement — consider checking depth data quality"
- **Add failure diagnostic**: If `success == False` or `nfev <= 1`, log WARNING with termination message and suggest checking depth unit consistency
- **Log optimizer progress under `--debug`**: Before and after optimization, log: initial cost, final cost, delta_rotation, delta_translation, termination message, number of function evaluations
- **Surface diagnostics in JSON output**: Add fields to `refine_depth` dict in output JSON: `termination_status`, `termination_message`, `nfev`, `n_valid_points`, `loss_function`, `f_scale`
**Must NOT do**:
- Do NOT add automated "redo with different params" logic
- Do NOT add email/notification alerts
- Do NOT change the optimization algorithm or parameters (already done in Task 2)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Adding logging and dict fields — no algorithmic changes
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES (with Task 3)
- **Parallel Group**: Wave 2
- **Blocks**: Task 6
- **Blocked By**: Task 2
**References**:
**Pattern References**:
- `aruco/depth_refine.py:103-111` — Current stats dict construction (to EXTEND, not replace)
- `calibrate_extrinsics.py:159-181` — Current refinement result logging and JSON field assignment
- `loguru.logger` — Project uses loguru for structured logging
**API/Type References**:
- `scipy.optimize.OptimizeResult` — `.status` (int: 1=convergence, 0=max_nfev, -1=improper), `.message` (str), `.nfev`, `.njev`, `.optimality` (gradient infinity norm)
**Acceptance Criteria**:
- [ ] Stats dict contains: `termination_status`, `termination_message`, `nfev`, `n_valid_points`
- [ ] Output JSON `refine_depth` section contains diagnostic fields
- [ ] WARNING log emitted when improvement < 1e-4 with nfev > 5
- [ ] WARNING log emitted when success=False or nfev <= 1
- [ ] `uv run pytest tests/ -q` → all pass
**Agent-Executed QA Scenarios:**
```
Scenario: Diagnostics present in refine stats
Tool: Bash (uv run pytest)
Steps:
1. Run: uv run pytest tests/test_depth_refine.py -v
2. Assert: All tests pass
3. Check that stats dict from refine function contains "termination_message" key
Expected Result: Diagnostics are in stats output
Evidence: Terminal output
```
**Commit**: YES
- Message: `feat(refine): add rich optimizer diagnostics and acceptance gates`
- Files: `aruco/depth_refine.py`, `calibrate_extrinsics.py`, `tests/test_depth_refine.py`
- Pre-commit: `uv run pytest tests/ -q`
---
- [x] 6. Benchmark Matrix (P1)
**What to do**:
- **Add `--benchmark-matrix` flag** to `calibrate_extrinsics.py` CLI
- **When enabled**, run the depth refinement pipeline 4 times per camera with different configurations:
1. **baseline**: `loss="linear"` (no robust loss), no confidence weights
2. **robust**: `loss="soft_l1"`, `f_scale=0.1`, no confidence weights
3. **robust+confidence**: `loss="soft_l1"`, `f_scale=0.1`, confidence weighting ON
4. **robust+confidence+best-frame**: Same as #3 but using best-frame selection
- **Output**: For each configuration, report per-camera: pre-refinement RMSE, post-refinement RMSE, improvement, iteration count, success/failure, termination reason
- **Format**: Print a formatted table to stdout (using click.echo) AND save to a benchmark section in the output JSON
- **Implementation**: Create a helper function `run_benchmark_matrix(T_initial, marker_corners_world, depth_map, K, confidence_map, ...)` that returns a list of result dicts
**Must NOT do**:
- Do NOT implement automated configuration tuning
- Do NOT add visualization/plotting dependencies
- Do NOT change the default (non-benchmark) codepath behavior
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: Orchestration code, calling existing functions with different params
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO (depends on all previous tasks)
- **Parallel Group**: Wave 3 (after all)
- **Blocks**: Task 7
- **Blocked By**: Tasks 2, 3, 4, 5
**References**:
**Pattern References**:
- `calibrate_extrinsics.py:73-196` — `apply_depth_verify_refine_postprocess` function. The benchmark matrix calls this logic with varied parameters
- `aruco/depth_refine.py` — Updated `refine_extrinsics_with_depth` with `loss`, `f_scale`, `confidence_map` params
**Acceptance Criteria**:
- [ ] `--benchmark-matrix` flag exists in CLI
- [ ] When enabled, 4 configurations are run per camera
- [ ] Output table is printed to stdout
- [ ] Benchmark results are in output JSON under `benchmark` key
- [ ] `uv run pytest tests/ -q` → all pass
**Agent-Executed QA Scenarios:**
```
Scenario: Benchmark flag in CLI help
Tool: Bash
Steps:
1. Run: uv run python calibrate_extrinsics.py --help | grep benchmark
2. Assert: output contains "--benchmark-matrix"
Expected Result: Flag is present
Evidence: Help text output
```
**Commit**: YES
- Message: `feat(calibrate): add --benchmark-matrix for comparing refinement configurations`
- Files: `calibrate_extrinsics.py`, `tests/test_benchmark.py`
- Pre-commit: `uv run pytest tests/ -q`
---
- [x] 7. Documentation Update
**What to do**:
- Update `docs/calibrate-extrinsics-workflow.md`:
- Add new CLI flags: `--use-confidence-weights`, `--benchmark-matrix`
- Update "Depth Verification & Refinement" section with new optimizer details
- Update "Refinement" section: document `least_squares` with `soft_l1` loss, `f_scale`, confidence weighting
- Add "Best-Frame Selection" section explaining the scoring formula
- Add "Diagnostics" section documenting new output JSON fields
- Update "Example Workflow" commands to show new flags
- Mark the "Known Unexpected Behavior" unit mismatch section as RESOLVED with the fix description
**Must NOT do**:
- Do NOT rewrite unrelated documentation sections
- Do NOT add tutorial-style content
**Recommended Agent Profile**:
- **Category**: `writing`
- Reason: Pure documentation writing
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Wave 4 (final)
- **Blocks**: None
- **Blocked By**: All previous tasks
**References**:
**Pattern References**:
- `docs/calibrate-extrinsics-workflow.md` — Entire file. Follow existing section structure and formatting
**Acceptance Criteria**:
- [ ] New CLI flags documented
- [ ] `least_squares` optimizer documented with parameter explanations
- [ ] Best-frame selection documented
- [ ] Unit mismatch section updated as resolved
- [ ] Example commands include new flags
**Commit**: YES
- Message: `docs: update calibrate-extrinsics-workflow for robust refinement changes`
- Files: `docs/calibrate-extrinsics-workflow.md`
- Pre-commit: `uv run pytest tests/ -q`
---
## Commit Strategy
| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1 | `fix(svo): harden depth units — set coordinate_units=METER, guard /1000 conversion` | `aruco/svo_sync.py`, tests | `uv run pytest tests/ -q` |
| 2 | `feat(refine): replace L-BFGS-B MSE with least_squares soft-L1 robust optimizer` | `aruco/depth_refine.py`, tests | `uv run pytest tests/ -q` |
| 3 | `feat(refine): add confidence-weighted depth residuals with --use-confidence-weights flag` | `aruco/depth_verify.py`, `aruco/depth_refine.py`, `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 4 | `feat(calibrate): replace naive frame selection with quality-scored best-frame` | `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 5 | `feat(refine): add rich optimizer diagnostics and acceptance gates` | `aruco/depth_refine.py`, `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 6 | `feat(calibrate): add --benchmark-matrix for comparing refinement configurations` | `calibrate_extrinsics.py`, tests | `uv run pytest tests/ -q` |
| 7 | `docs: update calibrate-extrinsics-workflow for robust refinement changes` | `docs/calibrate-extrinsics-workflow.md` | `uv run pytest tests/ -q` |
---
## Success Criteria
### Verification Commands
```bash
uv run pytest tests/ -q # Expected: all pass, 0 failures
uv run pytest tests/test_depth_refine.py -v # Expected: all tests pass including new robust/confidence tests
```
### Final Checklist
- [x] All "Must Have" items present
- [x] All "Must NOT Have" items absent
- [x] All tests pass (`uv run pytest tests/ -q`)
- [x] Output JSON backward compatible (existing fields preserved, new fields additive)
- [x] Default CLI behavior unchanged (new features opt-in)
- [x] Optimizer actually converges on synthetic test data (success=True, nfev > 1)
# Ground Plane Detection and Auto-Alignment
## TL;DR
> **Quick Summary**: Add ground plane detection and optional world-frame alignment to `calibrate_extrinsics.py` so the output coordinate system always has Y-up, regardless of how the calibration box is placed.
>
> **Deliverables**:
> - New `aruco/alignment.py` module with ground detection and alignment utilities
> - CLI options: `--auto-align`, `--ground-face`, `--ground-marker-id`
> - Face metadata in marker parquet files (or hardcoded mapping)
> - Debug logs for alignment decisions
>
> **Estimated Effort**: Medium
> **Parallel Execution**: NO - sequential (dependencies between tasks)
> **Critical Path**: Task 1 → Task 2 → Task 3 → Task 4 → Task 5
---
## Context
### Original Request
User wants to detect which side of the calibration box is on the ground and auto-align the world frame so Y is always up, matching the ZED convention seen in `inside_network.json`.
### Interview Summary
**Key Discussions**:
- Ground detection: support both heuristic (camera up-vector) AND user-specified (face name or marker ID)
- Alignment: opt-in via `--auto-align` flag (default OFF)
- Y-up convention confirmed from reference calibration
**Research Findings**:
- `inside_network.json` shows Y-up convention (cameras at Y ≈ -1.2m)
- Camera 41831756 has identity rotation → its axes match world axes
- Marker parquet contains face names and corner coordinates
- Face normals can be computed from corners: `cross(c1-c0, c3-c0)`
- `object_points.parquet`: 3 faces (a, b, c) with 4 markers each
- `standard_box_markers.parquet`: 6 faces with 1 marker each (21=bottom)
---
## Work Objectives
### Core Objective
Enable `calibrate_extrinsics.py` to detect the ground-facing box face and apply a corrective rotation so the output world frame has Y pointing up.
### Concrete Deliverables
- `aruco/alignment.py`: Ground detection and alignment utilities
- Updated `calibrate_extrinsics.py` with new CLI options
- Updated marker parquet files with face metadata (optional enhancement)
### Definition of Done
- [x] `uv run calibrate_extrinsics.py --auto-align ...` produces extrinsics with Y-up
- [x] `--ground-face` and `--ground-marker-id` work as explicit overrides
- [x] Debug logs show which face was detected as ground and alignment applied
- [x] Tests pass, basedpyright shows 0 errors
### Must Have
- Heuristic ground detection using camera up-vector
- User override via `--ground-face` or `--ground-marker-id`
- Alignment rotation applied to all camera poses
- Debug logging for alignment decisions
### Must NOT Have (Guardrails)
- Do NOT modify marker parquet file format (use code-level face mapping for now)
- Do NOT change behavior when `--auto-align` is not specified
- Do NOT assume IMU/gravity data is available
- Do NOT break existing calibration workflow
---
## Verification Strategy
> **UNIVERSAL RULE: ZERO HUMAN INTERVENTION**
> All tasks verifiable by agent using tools.
### Test Decision
- **Infrastructure exists**: YES (pytest)
- **Automated tests**: YES (tests-after)
- **Framework**: pytest
### Agent-Executed QA Scenarios (MANDATORY)
**Scenario: Auto-align with heuristic detection**
```
Tool: Bash
Steps:
1. uv run calibrate_extrinsics.py --svo output --markers aruco/markers/object_points.parquet --aruco-dictionary DICT_APRILTAG_36h11 --auto-align --no-preview --sample-interval 100
2. Parse output JSON
3. Assert: All camera poses have rotation matrices where Y-axis column ≈ [0, 1, 0] (within tolerance)
Expected Result: Extrinsics aligned to Y-up
```
**Scenario: Explicit ground face override**
```
Tool: Bash
Steps:
1. uv run calibrate_extrinsics.py --svo output --markers aruco/markers/object_points.parquet --aruco-dictionary DICT_APRILTAG_36h11 --auto-align --ground-face b --no-preview --sample-interval 100
2. Check debug logs mention "using specified ground face: b"
Expected Result: Uses face 'b' as ground regardless of heuristic
```
**Scenario: No alignment when flag omitted**
```
Tool: Bash
Steps:
1. uv run calibrate_extrinsics.py --svo output --markers aruco/markers/object_points.parquet --aruco-dictionary DICT_APRILTAG_36h11 --no-preview --sample-interval 100
2. Compare output to previous run without --auto-align
Expected Result: Output unchanged from current behavior
```
---
## Execution Strategy
### Dependency Chain
```
Task 1: Create alignment module
Task 2: Add face-to-normal mapping
Task 3: Implement ground detection heuristic
Task 4: Add CLI options and integrate
Task 5: Add tests and verify
```
---
## TODOs
- [x] 1. Create `aruco/alignment.py` module with core utilities
**What to do**:
- Create new file `aruco/alignment.py`
- Implement `compute_face_normal(corners: np.ndarray) -> np.ndarray`: compute unit normal from (4,3) corners
- Implement `rotation_align_vectors(from_vec: np.ndarray, to_vec: np.ndarray) -> np.ndarray`: compute 3x3 rotation matrix that aligns `from_vec` to `to_vec` using Rodrigues formula
- Implement `apply_alignment_to_pose(T: np.ndarray, R_align: np.ndarray) -> np.ndarray`: apply alignment rotation to 4x4 pose matrix
- Add type hints and docstrings
**Must NOT do**:
- Do not add CLI logic here (that's Task 4)
- Do not hardcode face mappings here (that's Task 2)
**Recommended Agent Profile**:
- **Category**: `quick`
- **Skills**: [`git-master`]
**Parallelization**:
- **Can Run In Parallel**: NO
- **Blocks**: Task 2, 3, 4
**References**:
- `aruco/pose_math.py` - Similar matrix utilities (rvec_tvec_to_matrix, invert_transform)
- `aruco/marker_geometry.py` - Pattern for utility modules
- Rodrigues formula: `R = I + sin(θ)K + (1-cos(θ))K²` where K is skew-symmetric of axis
**Acceptance Criteria**:
- [x] File `aruco/alignment.py` exists
- [x] `compute_face_normal` returns unit vector for valid (4,3) corners
- [x] `rotation_align_vectors([0,0,1], [0,1,0])` produces 90° rotation about X
- [x] `uv run python -c "from aruco.alignment import compute_face_normal, rotation_align_vectors, apply_alignment_to_pose"` → no errors
- [x] `.venv/bin/basedpyright aruco/alignment.py` → 0 errors
**Commit**: YES
- Message: `feat(aruco): add alignment utilities for ground plane detection`
- Files: `aruco/alignment.py`
---
- [x] 2. Add face-to-marker-id mapping
**What to do**:
- In `aruco/alignment.py`, add `FACE_MARKER_MAP` constant:
```python
FACE_MARKER_MAP: dict[str, list[int]] = {
# object_points.parquet
"a": [16, 17, 18, 19],
"b": [20, 21, 22, 23],
"c": [24, 25, 26, 27],
# standard_box_markers.parquet
"bottom": [21],
"top": [23],
"front": [24],
"back": [22],
"left": [25],
"right": [26],
}
```
- Implement `get_face_normal_from_geometry(face_name: str, marker_geometry: dict[int, np.ndarray]) -> np.ndarray | None`:
- Look up marker IDs for face
- Get corners from geometry
- Compute and return average normal across markers in that face
**Must NOT do**:
- Do not modify parquet files
**Recommended Agent Profile**:
- **Category**: `quick`
- **Skills**: [`git-master`]
**Parallelization**:
- **Can Run In Parallel**: NO
- **Blocked By**: Task 1
- **Blocks**: Task 3, 4
**References**:
- Bash output from parquet inspection (earlier in conversation):
- Face a: IDs [16-19], normal ≈ [0,0,1]
- Face b: IDs [20-23], normal ≈ [0,1,0]
- Face c: IDs [24-27], normal ≈ [1,0,0]
**Acceptance Criteria**:
- [x] `FACE_MARKER_MAP` contains mappings for both parquet files
- [x] `get_face_normal_from_geometry("b", geometry)` returns ≈ [0,1,0]
- [x] Returns `None` for unknown face names
**Commit**: YES (group with Task 1)
---
- [x] 3. Implement ground detection heuristic
**What to do**:
- In `aruco/alignment.py`, implement:
```python
def detect_ground_face(
visible_marker_ids: set[int],
marker_geometry: dict[int, np.ndarray],
camera_up_vector: np.ndarray = np.array([0, -1, 0]), # -Y in camera frame
) -> tuple[str, np.ndarray] | None:
```
- Logic:
1. For each face in `FACE_MARKER_MAP`:
- Check if any of its markers are in `visible_marker_ids`
- If yes, compute face normal from geometry
2. Find the face whose normal most closely aligns with `camera_up_vector` (highest dot product)
3. Return (face_name, face_normal) or None if no faces visible
- Add debug logging with loguru
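A sketch of the heuristic, using the `FACE_MARKER_MAP` from Task 2 (abbreviated here to the `object_points.parquet` faces; logging omitted for brevity):

```python
import numpy as np

FACE_MARKER_MAP = {"a": [16, 17, 18, 19], "b": [20, 21, 22, 23], "c": [24, 25, 26, 27]}

def _face_normal(corners):
    n = np.cross(corners[1] - corners[0], corners[3] - corners[0])
    return n / np.linalg.norm(n)

def detect_ground_face(visible_marker_ids, marker_geometry,
                       camera_up_vector=np.array([0.0, -1.0, 0.0])):
    """Pick the visible face whose normal best aligns with the camera
    up-vector (highest dot product). Returns (face_name, normal) or None."""
    best = None
    for face, ids in FACE_MARKER_MAP.items():
        normals = [_face_normal(marker_geometry[i])
                   for i in ids if i in visible_marker_ids and i in marker_geometry]
        if not normals:
            continue  # no markers of this face are visible
        normal = np.mean(normals, axis=0)
        normal = normal / np.linalg.norm(normal)
        score = float(np.dot(normal, camera_up_vector))
        if best is None or score > best[2]:
            best = (face, normal, score)
    return None if best is None else (best[0], best[1])
```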
**Must NOT do**:
- Do not transform normals by camera pose here (that's done in caller)
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- **Skills**: [`git-master`]
**Parallelization**:
- **Can Run In Parallel**: NO
- **Blocked By**: Task 2
- **Blocks**: Task 4
**References**:
- `calibrate_extrinsics.py:385` - Where marker IDs are detected
- Dot product alignment: `np.dot(normal, up_vec)` → highest = most aligned
**Acceptance Criteria**:
- [x] Function returns face with normal most aligned to camera up
- [x] Returns None when no mapped markers are visible
- [x] Debug log shows which faces were considered and scores
**Commit**: YES (group with Task 1, 2)
---
- [x] 4. Integrate into `calibrate_extrinsics.py`
**What to do**:
- Add CLI options:
- `--auto-align/--no-auto-align` (default: False)
- `--ground-face` (optional string, e.g., "b", "bottom")
- `--ground-marker-id` (optional int)
- Add imports from `aruco.alignment`
- After computing all camera poses (after the main loop, before saving):
1. If `--auto-align` is False, skip alignment
2. Determine ground face:
- If `--ground-face` specified: use it directly
- If `--ground-marker-id` specified: find which face contains that ID
- Else: use heuristic `detect_ground_face()` with visible markers from first camera
3. Get ground face normal from geometry
4. Compute `R_align = rotation_align_vectors(ground_normal, [0, 1, 0])` and embed it in a homogeneous 4x4 `T_align` (rotation block `R_align`, zero translation)
5. Apply to all camera poses: `T_aligned = T_align @ T` (a bare 3x3 `R_align` cannot left-multiply a 4x4 pose)
6. Log alignment info
- Update results dict with aligned poses
**Must NOT do**:
- Do not change behavior when `--auto-align` is not specified
- Do not modify per-frame pose computation (only post-process)
**Recommended Agent Profile**:
- **Category**: `unspecified-high`
- **Skills**: [`git-master`]
**Parallelization**:
- **Can Run In Parallel**: NO
- **Blocked By**: Task 3
- **Blocks**: Task 5
**References**:
- `calibrate_extrinsics.py:456-477` - Where final poses are computed and stored
- `calibrate_extrinsics.py:266-271` - Existing CLI option pattern
- `aruco/alignment.py` - New utilities from Tasks 1-3
**Acceptance Criteria**:
- [x] `--auto-align` flag exists and defaults to False
- [x] `--ground-face` accepts string face names
- [x] `--ground-marker-id` accepts integer marker ID
- [x] When `--auto-align` used, output poses are rotated
- [x] Debug logs show: "Detected ground face: X, normal: [a,b,c], applying alignment"
- [x] `uv run python -m py_compile calibrate_extrinsics.py` → success
- [x] `.venv/bin/basedpyright calibrate_extrinsics.py` → 0 errors
**Commit**: YES
- Message: `feat(calibrate): add --auto-align for ground plane detection and Y-up alignment`
- Files: `calibrate_extrinsics.py`
---
- [x] 5. Add tests and verify end-to-end
**What to do**:
- Create `tests/test_alignment.py`:
- Test `compute_face_normal` with known corners
- Test `rotation_align_vectors` with various axis pairs
- Test `detect_ground_face` with mock marker data
- Run full calibration with `--auto-align` and verify output
- Compare aligned output to reference `inside_network.json` Y-up convention
**Must NOT do**:
- Do not require actual SVO files for unit tests (mock data)
**Recommended Agent Profile**:
- **Category**: `quick`
- **Skills**: [`git-master`]
**Parallelization**:
- **Can Run In Parallel**: NO
- **Blocked By**: Task 4
**References**:
- `tests/test_depth_cli_postprocess.py` - Existing test pattern
- `/workspaces/zed-playground/zed_settings/inside_network.json` - Reference for Y-up verification
**Acceptance Criteria**:
- [x] `uv run pytest tests/test_alignment.py` → all pass
- [x] `uv run pytest` → all tests pass (including existing)
- [x] Manual verification: aligned poses have Y-axis column ≈ [0,1,0] in rotation
**Commit**: YES
- Message: `test(aruco): add alignment module tests`
- Files: `tests/test_alignment.py`
---
## Commit Strategy
| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1, 2, 3 | `feat(aruco): add alignment utilities for ground plane detection` | `aruco/alignment.py` | `uv run python -c "from aruco.alignment import *"` |
| 4 | `feat(calibrate): add --auto-align for ground plane detection and Y-up alignment` | `calibrate_extrinsics.py` | `uv run python -m py_compile calibrate_extrinsics.py` |
| 5 | `test(aruco): add alignment module tests` | `tests/test_alignment.py` | `uv run pytest tests/test_alignment.py` |
---
## Success Criteria
### Verification Commands
```bash
# Compile check
uv run python -m py_compile calibrate_extrinsics.py
# Type check
.venv/bin/basedpyright aruco/alignment.py calibrate_extrinsics.py
# Unit tests
uv run pytest tests/test_alignment.py
# Integration test (requires SVO files)
uv run calibrate_extrinsics.py --svo output --markers aruco/markers/object_points.parquet --aruco-dictionary DICT_APRILTAG_36h11 --auto-align --no-preview --sample-interval 100 --output aligned_extrinsics.json
# Verify Y-up in output
uv run python -c "import json, numpy as np; d=json.load(open('aligned_extrinsics.json')); T=np.array(list(d.values())[0]['pose'].split(), dtype=float).reshape(4,4); print('Y-axis:', T[:3,1])"
# Expected: Y-axis ≈ [0, 1, 0]
```
### Final Checklist
- [x] `--auto-align` flag works
- [x] `--ground-face` override works
- [x] `--ground-marker-id` override works
- [x] Heuristic detection works without explicit face specification
- [x] Output extrinsics have Y-up when aligned
- [x] No behavior change when `--auto-align` not specified
- [x] All tests pass
- [x] Type checks pass
# Multi-Frame Depth Pooling for Extrinsic Calibration
## TL;DR
> **Quick Summary**: Replace single-best-frame depth verification/refinement with top-N temporal pooling to reduce noise sensitivity and improve calibration robustness, while keeping existing verify/refine function signatures untouched.
>
> **Deliverables**:
> - New `pool_depth_maps()` utility function in `aruco/depth_pool.py`
> - Extended frame collection (top-N per camera) in main loop
> - New `--depth-pool-size` CLI option (default 1 = backward compatible)
> - Unit tests for pooling, fallback, and N=1 equivalence
> - E2E smoke comparison (pooled vs single-frame RMSE)
>
> **Estimated Effort**: Medium
> **Parallel Execution**: YES — 3 waves
> **Critical Path**: Task 1 → Task 3 → Task 5 → Task 7
---
## Context
### Original Request
User asked: "Is `apply_depth_verify_refine_postprocess` optimal? When `depth_mode` is not NONE, every frame computes depth regardless of whether it's used. Is there a better way to utilize every depth map when verify/refine is enabled?"
### Interview Summary
**Key Discussions**:
- Oracle confirmed single-best-frame is simplicity-biased but leaves accuracy on the table
- Recommended top 3-5 frame temporal pooling with confidence gating
- Phased approach: quick win (pooling), medium (weighted selection), advanced (joint optimization)
**Research Findings**:
- `calibrate_extrinsics.py:682-714`: Current loop stores exactly one `verification_frames[serial]` per camera (best-scored)
- `aruco/depth_verify.py`: `verify_extrinsics_with_depth()` accepts single `depth_map` + `confidence_map`
- `aruco/depth_refine.py`: `refine_extrinsics_with_depth()` accepts single `depth_map` + `confidence_map`
- `aruco/svo_sync.py:FrameData`: Each frame already carries `depth_map` + `confidence_map`
- Memory: each depth map is ~3.5MB (720×1280 float32); storing 5 per camera = ~17.5MB/cam, ~70MB total for 4 cameras — acceptable
- Existing tests use synthetic depth maps, so new tests can follow same pattern
### Metis Review
**Identified Gaps** (addressed):
- Camera motion during capture → addressed via assumption that cameras are static during calibration; documented as guardrail
- "Top-N by score" may not correlate with depth quality → addressed by keeping confidence gating in pooling function
- Fewer than N frames available → addressed with explicit fallback behavior
- All pixels invalid after gating → addressed with fallback to best single frame
- N=1 must reproduce baseline exactly → addressed with explicit equivalence test
---
## Work Objectives
### Core Objective
Pool depth maps from the top-N scored frames per camera to produce a more robust single depth target for verification and refinement, reducing sensitivity to single-frame noise.
### Concrete Deliverables
- `aruco/depth_pool.py` — new module with `pool_depth_maps()` function
- Modified `calibrate_extrinsics.py` — top-N collection + pooling integration + CLI flag
- `tests/test_depth_pool.py` — unit tests for pooling logic
- Updated `tests/test_depth_cli_postprocess.py` — integration test for N=1 equivalence
### Definition of Done
- [x] `uv run pytest -k "depth_pool"` → all tests pass
- [x] `uv run basedpyright` → 0 new errors
- [x] `--depth-pool-size 1` produces identical output to current baseline
- [x] `--depth-pool-size 5` produces equal or lower post-RMSE on test SVOs
### Must Have
- Feature-flagged behind `--depth-pool-size` (default 1)
- Pure function `pool_depth_maps()` with deterministic output
- Confidence gating during pooling
- Graceful fallback when pooling fails (insufficient valid pixels)
- N=1 code path identical to current behavior
### Must NOT Have (Guardrails)
- NO changes to `verify_extrinsics_with_depth()` or `refine_extrinsics_with_depth()` signatures
- NO scoring function redesign (use existing `score_frame()` as-is)
- NO cross-camera fusion or spatial alignment/warping between frames
- NO GPU acceleration or threading changes
- NO new artifact files or dashboards
- NO "unbounded history" — enforce max pool size cap (10)
- NO optical flow, Kalman filters, or temporal alignment beyond frame selection
---
## Verification Strategy (MANDATORY)
> **UNIVERSAL RULE: ZERO HUMAN INTERVENTION**
>
> ALL tasks in this plan MUST be verifiable WITHOUT any human action.
### Test Decision
- **Infrastructure exists**: YES
- **Automated tests**: YES (Tests-after, matching existing pattern)
- **Framework**: pytest (via `uv run pytest`)
### Agent-Executed QA Scenarios (MANDATORY — ALL tasks)
**Verification Tool by Deliverable Type:**
| Type | Tool | How Agent Verifies |
|------|------|-------------------|
| Library/Module | Bash (uv run pytest) | Run targeted tests, compare output |
| CLI | Bash (uv run calibrate_extrinsics.py) | Run with flags, check JSON output |
| Type safety | Bash (uv run basedpyright) | Zero new errors |
---
## Execution Strategy
### Parallel Execution Waves
```
Wave 1 (Start Immediately):
├── Task 1: Create pool_depth_maps() utility
└── Task 2: Unit tests for pool_depth_maps()
Wave 2 (After Wave 1):
├── Task 3: Extend main loop to collect top-N frames
├── Task 4: Add --depth-pool-size CLI option
└── Task 5: Integrate pooling into postprocess function
Wave 3 (After Wave 2):
├── Task 6: N=1 equivalence regression test
└── Task 7: E2E smoke comparison (pooled vs single-frame)
```
### Dependency Matrix
| Task | Depends On | Blocks | Can Parallelize With |
|------|------------|--------|---------------------|
| 1 | None | 2, 3, 5 | 2 |
| 2 | 1 | None | 1 |
| 3 | 1 | 5, 6 | 4 |
| 4 | None | 5 | 3 |
| 5 | 1, 3, 4 | 6, 7 | None |
| 6 | 5 | None | 7 |
| 7 | 5 | None | 6 |
---
## TODOs
- [x] 1. Create `pool_depth_maps()` utility in `aruco/depth_pool.py`
**What to do**:
- Create new file `aruco/depth_pool.py`
- Implement `pool_depth_maps(depth_maps: list[np.ndarray], confidence_maps: list[np.ndarray | None], confidence_thresh: float = 50.0, min_valid_count: int = 1) -> tuple[np.ndarray, np.ndarray | None]`
- Algorithm:
1. Stack depth maps along new axis → shape (N, H, W)
2. For each pixel position, mask invalid values (NaN, inf, ≤ 0) AND confidence-rejected pixels (conf > thresh)
3. Compute per-pixel **median** across valid frames → pooled depth
4. For confidence: compute per-pixel **minimum** (most confident) across frames → pooled confidence
5. Pixels with < `min_valid_count` valid observations → set to NaN in pooled depth
- Handle edge cases:
- Empty input list → raise ValueError
- Single map (N=1) → return copy of input (exact equivalence path)
- All maps invalid at a pixel → NaN in output
- Shape mismatch across maps → raise ValueError
- Mixed None confidence maps → pool only non-None, or return None if all None
- Add type hints, docstring with Args/Returns
**Must NOT do**:
- No weighted mean (median is more robust to outliers; keep simple for Phase 1)
- No spatial alignment or warping
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Single focused module, pure function, no complex dependencies
- **Skills**: []
- No special skills needed; standard Python/numpy work
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 2)
- **Blocks**: Tasks 2, 3, 5
- **Blocked By**: None
**References**:
**Pattern References**:
- `aruco/depth_verify.py:39-79` — `compute_depth_residual()` shows how invalid depth is handled (NaN, ≤0, window median pattern)
- `aruco/depth_verify.py:27-36` — `get_confidence_weight()` shows confidence semantics (ZED: 1=most confident, 100=least; threshold default 50)
**API/Type References**:
- `aruco/svo_sync.py:10-18` — `FrameData` dataclass: `depth_map: np.ndarray | None`, `confidence_map: np.ndarray | None`
**Test References**:
- `tests/test_depth_verify.py:36-60` — Pattern for creating synthetic depth maps and testing residual computation
**WHY Each Reference Matters**:
- `depth_verify.py:39-79`: Defines the invalid-depth encoding convention (NaN/≤0) that pooling must respect
- `depth_verify.py:27-36`: Defines confidence semantics and threshold convention; pooling gating must match
- `svo_sync.py:10-18`: Defines the data types the pooling function will receive
**Acceptance Criteria**:
- [ ] File `aruco/depth_pool.py` exists with `pool_depth_maps()` function
- [ ] Function handles N=1 by returning exact copy of input
- [ ] Function raises ValueError on empty input or shape mismatch
- [ ] `uv run basedpyright aruco/depth_pool.py` → 0 errors
**Agent-Executed QA Scenarios:**
```
Scenario: Module imports without error
Tool: Bash
Steps:
1. uv run python -c "from aruco.depth_pool import pool_depth_maps; print('OK')"
2. Assert: stdout contains "OK"
Expected Result: Clean import
```
**Commit**: YES
- Message: `feat(aruco): add pool_depth_maps utility for multi-frame depth pooling`
- Files: `aruco/depth_pool.py`
---
- [x] 2. Unit tests for `pool_depth_maps()`
**What to do**:
- Create `tests/test_depth_pool.py`
- Test cases:
1. **Single map (N=1)**: output equals input exactly
2. **Two maps, clean**: median of two values at each pixel
3. **Three maps with NaN**: median ignores NaN pixels correctly
4. **Confidence gating**: pixels above threshold excluded from median
5. **All invalid at pixel**: output is NaN
6. **Empty input**: raises ValueError
7. **Shape mismatch**: raises ValueError
8. **min_valid_count**: pixel with fewer valid observations → NaN
9. **None confidence maps**: graceful handling (pools depth only, returns None confidence)
- Use `numpy.testing.assert_allclose` for numerical checks
- Use `pytest.raises(ValueError, match=...)` for error cases
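The first few cases might look like the following sketch; `_pool_stub` is a self-contained stand-in so the example runs on its own, whereas the real file would import `pool_depth_maps` from `aruco.depth_pool` (Task 1):

```python
import numpy as np
import pytest


def _pool_stub(depth_maps, confidence_maps):
    # Stand-in with the same contract as pool_depth_maps (Task 1):
    # ValueError on empty input, exact copy for N=1, per-pixel nanmedian otherwise.
    if not depth_maps:
        raise ValueError("depth_maps must not be empty")
    if len(depth_maps) == 1:
        c = confidence_maps[0]
        return depth_maps[0].copy(), None if c is None else c.copy()
    return np.nanmedian(np.stack(depth_maps), axis=0), None


def test_single_map_is_identity():
    depth = np.full((2, 2), 2.5, dtype=np.float32)
    pooled, conf = _pool_stub([depth], [None])
    np.testing.assert_allclose(pooled, depth)
    assert conf is None


def test_median_ignores_nan():
    d1 = np.array([[1.0, np.nan]], dtype=np.float32)
    d2 = np.array([[3.0, 4.0]], dtype=np.float32)
    pooled, _ = _pool_stub([d1, d2], [None, None])
    np.testing.assert_allclose(pooled, [[2.0, 4.0]])


def test_empty_input_raises():
    with pytest.raises(ValueError, match="empty"):
        _pool_stub([], [])
```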
**Must NOT do**:
- No integration with calibrate_extrinsics.py yet (unit tests only)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Focused test file creation following existing patterns
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 1)
- **Blocks**: None
- **Blocked By**: Task 1
**References**:
**Test References**:
- `tests/test_depth_verify.py:36-60` — Pattern for synthetic depth map creation and assertion style
- `tests/test_depth_refine.py:10-18` — Pattern for roundtrip/equivalence testing
**WHY Each Reference Matters**:
- Shows the exact assertion patterns and synthetic data conventions used in this codebase
**Acceptance Criteria**:
- [ ] `uv run pytest tests/test_depth_pool.py -v` → all tests pass
- [ ] At least 9 test cases covering the enumerated scenarios
**Agent-Executed QA Scenarios:**
```
Scenario: All pool tests pass
Tool: Bash
Steps:
1. uv run pytest tests/test_depth_pool.py -v
2. Assert: exit code 0
3. Assert: output contains "passed" with 0 "failed"
Expected Result: All tests green
```
**Commit**: YES (groups with Task 1)
- Message: `test(aruco): add unit tests for pool_depth_maps`
- Files: `tests/test_depth_pool.py`
---
- [x] 3. Extend main loop to collect top-N frames per camera
**What to do**:
- In `calibrate_extrinsics.py`, modify the verification frame collection (lines ~682-714):
- Change `verification_frames` from `dict[serial, single_frame_dict]` to `dict[serial, list[frame_dict]]`
- Maintain list sorted by score (descending), truncated to `depth_pool_size`
- Use `heapq` or sorted insertion to keep top-N efficiently
- When `depth_pool_size == 1`, behavior must be identical to current (store only best)
- Update all downstream references to `verification_frames` that assume single-frame structure
- The `first_frames` dict remains unchanged (it's for benchmarking, separate concern)
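One way to keep the top-N efficiently with `heapq` (the helper names here are hypothetical; the real change lives inline in the collection loop):

```python
import heapq
import itertools
from typing import Any

# Tie-breaking counter keeps heap comparisons away from the frame dicts.
_counter = itertools.count()


def insert_top_n(
    heaps: dict[str, list[tuple[float, int, dict[str, Any]]]],
    serial: str,
    score: float,
    frame: dict[str, Any],
    pool_size: int,
) -> None:
    heap = heaps.setdefault(serial, [])
    entry = (score, next(_counter), frame)
    if len(heap) < pool_size:
        heapq.heappush(heap, entry)        # min-heap keyed on score
    elif score > heap[0][0]:
        heapq.heapreplace(heap, entry)     # evict the current worst frame


def best_first(heap: list[tuple[float, int, dict[str, Any]]]) -> list[dict[str, Any]]:
    """Return frames sorted by score descending, as downstream code expects."""
    return [f for _, _, f in sorted(heap, reverse=True)]
```

With `pool_size == 1` the heap holds exactly one entry (the best-scored frame), preserving current behavior.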
**Must NOT do**:
- Do NOT change the scoring function `score_frame()`
- Do NOT change `FrameData` structure
- Do NOT store frames outside the sampled loop (only collect from frames that already have depth)
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: Surgical modification to existing loop logic; requires careful attention to existing consumers
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 2 (with Task 4)
- **Blocks**: Tasks 5, 6
- **Blocked By**: Task 1
**References**:
**Pattern References**:
- `calibrate_extrinsics.py:620-760` — Main loop where verification frames are collected; lines 682-714 are the critical section
- `calibrate_extrinsics.py:118-258` — `apply_depth_verify_refine_postprocess()` which consumes `verification_frames`
**API/Type References**:
- `aruco/svo_sync.py:10-18` — `FrameData` structure that's stored in verification_frames
**WHY Each Reference Matters**:
- `calibrate_extrinsics.py:682-714`: This is the exact code being modified; must understand score comparison and dict storage
- `calibrate_extrinsics.py:118-258`: Must understand how `verification_frames` is consumed downstream to know what structure changes are safe
**Acceptance Criteria**:
- [ ] `verification_frames[serial]` is now a list of frame dicts, sorted by score descending
- [ ] List length ≤ `depth_pool_size` for each camera
- [ ] When `depth_pool_size == 1`, list has exactly one element matching current best-frame behavior
- [ ] `uv run basedpyright calibrate_extrinsics.py` → 0 new errors
**Agent-Executed QA Scenarios:**
```
Scenario: Top-N collection works with pool size 3
Tool: Bash
Steps:
1. uv run python -c "
# If this imports without error, the structural change is consistent
from calibrate_extrinsics import apply_depth_verify_refine_postprocess
print('OK')
"
2. Assert: stdout contains "OK"
Expected Result: No import errors from structural changes
```
**Commit**: NO (groups with Task 5)
---
- [x] 4. Add `--depth-pool-size` CLI option
**What to do**:
- Add click option to `main()` in `calibrate_extrinsics.py`:
```python
@click.option(
"--depth-pool-size",
default=1,
type=click.IntRange(min=1, max=10),
help="Number of top-scored frames to pool for depth verification/refinement (1=single best frame, >1=median pooling).",
)
```
- Pass through to function signature
- Add to `apply_depth_verify_refine_postprocess()` parameters (or pass `depth_pool_size` to control pooling)
- Update help text for `--depth-mode` if needed to mention pooling interaction
**Must NOT do**:
- Do NOT implement the actual pooling logic here (that's Task 5)
- Do NOT allow values > 10 (memory guardrail)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Single CLI option addition, boilerplate only
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 2 (with Task 3)
- **Blocks**: Task 5
- **Blocked By**: None
**References**:
**Pattern References**:
- `calibrate_extrinsics.py:474-478` — Existing `--max-samples` option as pattern for optional integer CLI flag
- `calibrate_extrinsics.py:431-436` — `--depth-mode` option pattern
**WHY Each Reference Matters**:
- Shows the exact click option pattern and placement convention in this file
**Acceptance Criteria**:
- [ ] `uv run calibrate_extrinsics.py --help` shows `--depth-pool-size` with description
- [ ] Default value is 1
- [ ] Values outside 1-10 are rejected by click
**Agent-Executed QA Scenarios:**
```
Scenario: CLI option appears in help
Tool: Bash
Steps:
1. uv run calibrate_extrinsics.py --help
2. Assert: output contains "--depth-pool-size"
3. Assert: output contains "1=single best frame"
Expected Result: Option visible with correct help text
Scenario: Invalid pool size rejected
Tool: Bash
Steps:
1. uv run calibrate_extrinsics.py --depth-pool-size 0 2>&1 || true
2. Assert: output contains error or "Invalid value"
Expected Result: Click rejects out-of-range value
```
**Commit**: NO (groups with Task 5)
---
- [x] 5. Integrate pooling into `apply_depth_verify_refine_postprocess()`
**What to do**:
- Modify `apply_depth_verify_refine_postprocess()` to accept `depth_pool_size: int = 1` parameter
- When `depth_pool_size > 1` and multiple frames available:
1. Extract depth_maps and confidence_maps from the top-N frame list
2. Call `pool_depth_maps()` to produce pooled depth/confidence
3. Use pooled maps for `verify_extrinsics_with_depth()` and `refine_extrinsics_with_depth()`
4. Use the **best-scored frame's** `ids` for marker corner lookup (it has best detection quality)
- When `depth_pool_size == 1` OR only 1 frame available:
- Use existing single-frame path exactly (no pooling call)
- Add pooling metadata to JSON output: `"depth_pool": {"pool_size_requested": N, "pool_size_actual": M, "pooled": true/false}`
- Wire `depth_pool_size` from `main()` through to this function
- Handle edge case: if pooling produces a map with fewer valid points than best single frame, log warning and fall back to single frame
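The branch-and-fallback logic above could be factored as a small helper along these lines (`select_depth_inputs` is a hypothetical name for illustration; `pool_fn` would be `pool_depth_maps` from Task 1):

```python
import numpy as np


def select_depth_inputs(frames, pool_fn, pool_size):
    """Choose the depth/confidence maps passed to verify/refine.

    `frames` is the top-N list from Task 3, best score first.
    Returns (depth_map, confidence_map, pooled_flag).
    """
    best = frames[0]
    if pool_size <= 1 or len(frames) == 1:
        # Single-frame path: identical to current behavior, no pooling call.
        return best["depth_map"], best["confidence_map"], False
    pooled_depth, pooled_conf = pool_fn(
        [f["depth_map"] for f in frames],
        [f["confidence_map"] for f in frames],
    )
    # Fallback: pooling must not lose valid pixels vs. the best single frame
    # (caller logs a warning when this triggers).
    if np.isfinite(pooled_depth).sum() < np.isfinite(best["depth_map"]).sum():
        return best["depth_map"], best["confidence_map"], False
    return pooled_depth, pooled_conf, True
```

The returned flag feeds the `"pooled": true/false` field of the `depth_pool` metadata.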
**Must NOT do**:
- Do NOT change `verify_extrinsics_with_depth()` or `refine_extrinsics_with_depth()` function signatures
- Do NOT add new CLI output formats
**Recommended Agent Profile**:
- **Category**: `unspecified-high`
- Reason: Core integration task with multiple touchpoints; requires careful wiring and edge case handling
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Sequential (after Wave 2)
- **Blocks**: Tasks 6, 7
- **Blocked By**: Tasks 1, 3, 4
**References**:
**Pattern References**:
- `calibrate_extrinsics.py:118-258` — Full `apply_depth_verify_refine_postprocess()` function being modified
- `calibrate_extrinsics.py:140-156` — Frame data extraction pattern (accessing `vf["frame"]`, `vf["ids"]`)
- `calibrate_extrinsics.py:158-180` — Verification call pattern
- `calibrate_extrinsics.py:182-245` — Refinement call pattern
**API/Type References**:
- `aruco/depth_pool.py:pool_depth_maps()` — The pooling function (Task 1 output)
- `aruco/depth_verify.py:119-179` — `verify_extrinsics_with_depth()` signature
- `aruco/depth_refine.py:71-227` — `refine_extrinsics_with_depth()` signature
**WHY Each Reference Matters**:
- `calibrate_extrinsics.py:140-156`: Shows how frame data is currently extracted; must adapt for list-of-frames
- `depth_pool.py`: The function we're calling for multi-frame pooling
- `depth_verify.py/depth_refine.py`: Confirms signatures remain unchanged (just pass different depth_map)
**Acceptance Criteria**:
- [ ] With `--depth-pool-size 1`: output JSON identical to baseline (no `depth_pool` metadata needed for N=1)
- [ ] With `--depth-pool-size 5`: output JSON includes `depth_pool` metadata; verify/refine uses pooled maps
- [ ] Fallback to single frame logged when pooling produces fewer valid points
- [ ] `uv run basedpyright calibrate_extrinsics.py` → 0 new errors
**Agent-Executed QA Scenarios:**
```
Scenario: Pool size 1 produces baseline-equivalent output
Tool: Bash
Preconditions: output/ directory with SVO files
Steps:
1. uv run calibrate_extrinsics.py -s output/ -m aruco/markers/standard_box_markers_600mm.parquet --aruco-dictionary DICT_APRILTAG_36h11 --verify-depth --no-preview --max-samples 5 --depth-pool-size 1 --output output/_test_pool1.json
2. Assert: exit code 0
3. Assert: output/_test_pool1.json exists and contains depth_verify entries
Expected Result: Runs cleanly, produces valid output
Scenario: Pool size 5 runs and includes pool metadata
Tool: Bash
Preconditions: output/ directory with SVO files
Steps:
1. uv run calibrate_extrinsics.py -s output/ -m aruco/markers/standard_box_markers_600mm.parquet --aruco-dictionary DICT_APRILTAG_36h11 --verify-depth --refine-depth --no-preview --max-samples 10 --depth-pool-size 5 --output output/_test_pool5.json
2. Assert: exit code 0
3. Parse output/_test_pool5.json
4. Assert: at least one camera entry contains "depth_pool" key
Expected Result: Pooling metadata present in output
```
**Commit**: YES
- Message: `feat(calibrate): integrate multi-frame depth pooling with --depth-pool-size flag`
- Files: `calibrate_extrinsics.py`, `aruco/depth_pool.py`, `tests/test_depth_pool.py`
- Pre-commit: `uv run pytest tests/test_depth_pool.py && uv run basedpyright calibrate_extrinsics.py`
---
- [x] 6. N=1 equivalence regression test
**What to do**:
- Add test in `tests/test_depth_cli_postprocess.py` (or `tests/test_depth_pool.py`):
- Create synthetic scenario with known depth maps and marker geometry
- Run `apply_depth_verify_refine_postprocess()` with pool_size=1 using the old single-frame structure
- Run with pool_size=1 using the new list-of-frames structure
- Assert outputs are numerically identical (atol=0)
- This proves the refactor preserves backward compatibility
**Must NOT do**:
- No E2E CLI test here (that's Task 7)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Focused regression test with synthetic data
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 3 (with Task 7)
- **Blocks**: None
- **Blocked By**: Task 5
**References**:
**Test References**:
- `tests/test_depth_cli_postprocess.py` — Existing integration test patterns
- `tests/test_depth_verify.py:36-60` — Synthetic depth map creation pattern
**Acceptance Criteria**:
- [ ] `uv run pytest -k "pool_size_1_equivalence"` → passes
- [ ] Test asserts exact numerical equality between old-path and new-path outputs
**Commit**: YES
- Message: `test(calibrate): add N=1 equivalence regression test for depth pooling`
- Files: `tests/test_depth_pool.py` or `tests/test_depth_cli_postprocess.py`
---
- [x] 7. E2E smoke comparison: pooled vs single-frame RMSE
**What to do**:
- Run calibration on test SVOs with `--depth-pool-size 1` and `--depth-pool-size 5`
- Compare:
- Post-refinement RMSE per camera
- Depth-normalized RMSE
- CSV residual distribution (mean_abs, p50, p90)
- Runtime (wall clock)
- Document results in a brief summary (stdout or saved to a comparison file)
- **Success criterion**: pooled RMSE ≤ single-frame RMSE for majority of cameras; runtime overhead < 25%
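The per-camera comparison could be summarized with a throwaway helper like this, applied after `json.load` of the two output files (the `post_rmse` field name is a guess and must be checked against the actual calibration JSON):

```python
def summarize_rmse(pool1: dict, pool5: dict, key: str = "post_rmse") -> list[str]:
    """One comparison line per camera serial present in both runs."""
    lines = []
    for serial in sorted(set(pool1) & set(pool5)):
        r1, r5 = pool1[serial].get(key), pool5[serial].get(key)
        if r1 is None or r5 is None:
            continue  # skip cameras missing the metric
        verdict = "pooled<=single" if r5 <= r1 else "pooled>single"
        lines.append(f"{serial}: pool1={r1:.4f} pool5={r5:.4f} ({verdict})")
    return lines
```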
**Must NOT do**:
- No automated pass/fail assertion on real data (metrics are directional, not deterministic)
- No permanent benchmark infrastructure
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Run two commands, compare JSON output, summarize
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 3 (with Task 6)
- **Blocks**: None
- **Blocked By**: Task 5
**References**:
**Pattern References**:
- Previous smoke runs in this session: `output/e2e_refine_depth_full_neural_plus.json` as baseline
**Acceptance Criteria**:
- [ ] Both runs complete without error
- [ ] Comparison summary printed showing per-camera RMSE for pool=1 vs pool=5
- [ ] Runtime logged for both runs
**Agent-Executed QA Scenarios:**
```
Scenario: Compare pool=1 vs pool=5 on full SVOs
Tool: Bash
Steps:
1. Run with --depth-pool-size 1 --verify-depth --refine-depth --output output/_compare_pool1.json
2. Run with --depth-pool-size 5 --verify-depth --refine-depth --output output/_compare_pool5.json
3. Parse both JSON files
4. Print per-camera post RMSE comparison table
5. Print runtime difference
Expected Result: Both complete; comparison table printed
Evidence: Terminal output captured
```
**Commit**: NO (no code change; just verification)
---
## Commit Strategy
| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1+2 | `feat(aruco): add pool_depth_maps utility with tests` | `aruco/depth_pool.py`, `tests/test_depth_pool.py` | `uv run pytest tests/test_depth_pool.py` |
| 5 (includes 3+4) | `feat(calibrate): integrate multi-frame depth pooling with --depth-pool-size flag` | `calibrate_extrinsics.py` | `uv run pytest && uv run basedpyright` |
| 6 | `test(calibrate): add N=1 equivalence regression test for depth pooling` | `tests/test_depth_pool.py` or `tests/test_depth_cli_postprocess.py` | `uv run pytest -k pool_size_1` |
---
## Success Criteria
### Verification Commands
```bash
uv run pytest tests/test_depth_pool.py -v # All pool unit tests pass
uv run pytest -k "pool_size_1_equivalence" -v # N=1 regression passes
uv run basedpyright # 0 new errors
uv run calibrate_extrinsics.py --help | grep pool # CLI flag visible
```
### Final Checklist
- [x] `pool_depth_maps()` pure function exists with full edge case handling
- [x] `--depth-pool-size` CLI option with default=1, max=10
- [x] N=1 produces identical results to baseline
- [x] All existing tests still pass
- [x] Type checker clean
- [x] E2E comparison shows pooled RMSE ≤ single-frame RMSE for majority of cameras