feat(tooling): add extrinsics visualizer and close depth-pooling plan
Finalize multi-frame depth pooling execution tracking with fully verified plan checkboxes and add a Y-up/bird-eye extrinsics visualizer with pose-convention auto detection for calibration sanity checks.
@@ -0,0 +1,614 @@
# Multi-Frame Depth Pooling for Extrinsic Calibration

## TL;DR

> **Quick Summary**: Replace single-best-frame depth verification/refinement with top-N temporal pooling to reduce noise sensitivity and improve calibration robustness, while keeping existing verify/refine function signatures untouched.
>
> **Deliverables**:
> - New `pool_depth_maps()` utility function in `aruco/depth_pool.py`
> - Extended frame collection (top-N per camera) in main loop
> - New `--depth-pool-size` CLI option (default 1 = backward compatible)
> - Unit tests for pooling, fallback, and N=1 equivalence
> - E2E smoke comparison (pooled vs single-frame RMSE)
>
> **Estimated Effort**: Medium
> **Parallel Execution**: YES, 3 waves
> **Critical Path**: Task 1 → Task 3 → Task 5 → Task 7

---

## Context

### Original Request

User asked: "Is `apply_depth_verify_refine_postprocess` optimal? When `depth_mode` is not NONE, every frame computes depth regardless of whether it's used. Is there a better way to utilize every depth map when verify/refine is enabled?"

### Interview Summary

**Key Discussions**:
- Oracle confirmed single-best-frame is simplicity-biased but leaves accuracy on the table
- Recommended top 3–5 frame temporal pooling with confidence gating
- Phased approach: quick win (pooling), medium (weighted selection), advanced (joint optimization)

**Research Findings**:
- `calibrate_extrinsics.py:682-714`: Current loop stores exactly one `verification_frames[serial]` per camera (best-scored)
- `aruco/depth_verify.py`: `verify_extrinsics_with_depth()` accepts single `depth_map` + `confidence_map`
- `aruco/depth_refine.py`: `refine_extrinsics_with_depth()` accepts single `depth_map` + `confidence_map`
- `aruco/svo_sync.py:FrameData`: Each frame already carries `depth_map` + `confidence_map`
- Memory: each depth map is ~3.5 MiB (720×1280 float32); storing 5 per camera = ~17.5 MiB/cam, ~70 MiB total for 4 cameras, which is acceptable
- Existing tests use synthetic depth maps, so new tests can follow same pattern

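
The memory estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch: `depth_map_mib` and `pool_memory_mib` are hypothetical helper names, and the calculation counts depth maps only, assuming one float32 value per pixel as stated in the finding.

```python
def depth_map_mib(height: int = 720, width: int = 1280, bytes_per_value: int = 4) -> float:
    """Size of one depth map in MiB (float32 = 4 bytes per pixel)."""
    return height * width * bytes_per_value / 2**20


def pool_memory_mib(pool_size: int, n_cameras: int) -> float:
    """Memory held by the pooled depth maps across all cameras, in MiB."""
    return depth_map_mib() * pool_size * n_cameras


print(f"{depth_map_mib():.2f} MiB per map")            # 3.52 MiB per map
print(f"{pool_memory_mib(5, 1):.2f} MiB per camera")   # 17.58 MiB per camera
print(f"{pool_memory_mib(5, 4):.2f} MiB for 4 cameras")  # 70.31 MiB for 4 cameras
```

Keeping the confidence maps alongside the depth maps would add to this figure, by an amount that depends on their dtype.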

### Metis Review

**Identified Gaps** (addressed):
- Camera motion during capture → addressed via assumption that cameras are static during calibration; documented as guardrail
- "Top-N by score" may not correlate with depth quality → addressed by keeping confidence gating in pooling function
- Fewer than N frames available → addressed with explicit fallback behavior
- All pixels invalid after gating → addressed with fallback to best single frame
- N=1 must reproduce baseline exactly → addressed with explicit equivalence test

---

## Work Objectives

### Core Objective

Pool depth maps from the top-N scored frames per camera to produce a more robust single depth target for verification and refinement, reducing sensitivity to single-frame noise.

### Concrete Deliverables

- `aruco/depth_pool.py`: new module with `pool_depth_maps()` function
- Modified `calibrate_extrinsics.py`: top-N collection + pooling integration + CLI flag
- `tests/test_depth_pool.py`: unit tests for pooling logic
- Updated `tests/test_depth_cli_postprocess.py`: integration test for N=1 equivalence

### Definition of Done

- [x] `uv run pytest -k "depth_pool"` → all tests pass
- [x] `uv run basedpyright` → 0 new errors
- [x] `--depth-pool-size 1` produces identical output to current baseline
- [x] `--depth-pool-size 5` produces equal or lower post-RMSE on test SVOs

### Must Have

- Feature-flagged behind `--depth-pool-size` (default 1)
- Pure function `pool_depth_maps()` with deterministic output
- Confidence gating during pooling
- Graceful fallback when pooling fails (insufficient valid pixels)
- N=1 code path identical to current behavior

### Must NOT Have (Guardrails)

- NO changes to `verify_extrinsics_with_depth()` or `refine_extrinsics_with_depth()` signatures
- NO scoring function redesign (use existing `score_frame()` as-is)
- NO cross-camera fusion or spatial alignment/warping between frames
- NO GPU acceleration or threading changes
- NO new artifact files or dashboards
- NO "unbounded history": enforce max pool size cap (10)
- NO optical flow, Kalman filters, or temporal alignment beyond frame selection

---

## Verification Strategy (MANDATORY)

> **UNIVERSAL RULE: ZERO HUMAN INTERVENTION**
>
> ALL tasks in this plan MUST be verifiable WITHOUT any human action.

### Test Decision

- **Infrastructure exists**: YES
- **Automated tests**: YES (Tests-after, matching existing pattern)
- **Framework**: pytest (via `uv run pytest`)

### Agent-Executed QA Scenarios (MANDATORY, ALL tasks)

**Verification Tool by Deliverable Type:**

| Type | Tool | How Agent Verifies |
|------|------|-------------------|
| Library/Module | Bash (uv run pytest) | Run targeted tests, compare output |
| CLI | Bash (uv run calibrate_extrinsics.py) | Run with flags, check JSON output |
| Type safety | Bash (uv run basedpyright) | Zero new errors |

---

## Execution Strategy

### Parallel Execution Waves

```
Wave 1 (Start Immediately):
├── Task 1: Create pool_depth_maps() utility
└── Task 2: Unit tests for pool_depth_maps()

Wave 2 (After Wave 1):
├── Task 3: Extend main loop to collect top-N frames
├── Task 4: Add --depth-pool-size CLI option
└── Task 5: Integrate pooling into postprocess function

Wave 3 (After Wave 2):
├── Task 6: N=1 equivalence regression test
└── Task 7: E2E smoke comparison (pooled vs single-frame)
```

### Dependency Matrix

| Task | Depends On | Blocks | Can Parallelize With |
|------|------------|--------|---------------------|
| 1 | None | 2, 3, 5 | 2 |
| 2 | 1 | None | 1 |
| 3 | 1 | 5, 6 | 4 |
| 4 | None | 5 | 3 |
| 5 | 1, 3, 4 | 6, 7 | None |
| 6 | 5 | None | 7 |
| 7 | 5 | None | 6 |

---

## TODOs

- [x] 1. Create `pool_depth_maps()` utility in `aruco/depth_pool.py`

**What to do**:
- Create new file `aruco/depth_pool.py`
- Implement `pool_depth_maps(depth_maps: list[np.ndarray], confidence_maps: list[np.ndarray | None], confidence_thresh: float = 50.0, min_valid_count: int = 1) -> tuple[np.ndarray, np.ndarray | None]`
- Algorithm:
  1. Stack depth maps along new axis → shape (N, H, W)
  2. For each pixel position, mask invalid values (NaN, inf, ≤ 0) AND confidence-rejected pixels (conf > thresh)
  3. Compute per-pixel **median** across valid frames → pooled depth
  4. For confidence: compute per-pixel **minimum** (most confident) across frames → pooled confidence
  5. Pixels with < `min_valid_count` valid observations → set to NaN in pooled depth
- Handle edge cases:
  - Empty input list → raise ValueError
  - Single map (N=1) → return copy of input (exact equivalence path)
  - All maps invalid at a pixel → NaN in output
  - Shape mismatch across maps → raise ValueError
  - Mixed None confidence maps → pool only non-None, or return None if all None
- Add type hints, docstring with Args/Returns

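
The algorithm and edge cases listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the committed `aruco/depth_pool.py`: it returns float64 for N>1, uses `np.nanmedian` (which averages the two middle values when the valid count is even), and does not apply gating to the pooled confidence map itself.

```python
from __future__ import annotations

import numpy as np


def pool_depth_maps(
    depth_maps: list[np.ndarray],
    confidence_maps: list[np.ndarray | None],
    confidence_thresh: float = 50.0,
    min_valid_count: int = 1,
) -> tuple[np.ndarray, np.ndarray | None]:
    """Per-pixel median of valid depth and per-pixel minimum of confidence."""
    if not depth_maps:
        raise ValueError("depth_maps must not be empty")
    if len(confidence_maps) != len(depth_maps):
        raise ValueError("confidence_maps needs one entry per depth map")
    shape = depth_maps[0].shape
    if any(d.shape != shape for d in depth_maps):
        raise ValueError("all depth maps must share one shape")

    # N=1: exact-equivalence path, the input is copied and never recomputed.
    if len(depth_maps) == 1:
        conf = confidence_maps[0]
        return depth_maps[0].copy(), None if conf is None else conf.copy()

    stack = np.stack(depth_maps).astype(np.float64)  # shape (N, H, W)
    valid = np.isfinite(stack) & (stack > 0)
    for i, conf in enumerate(confidence_maps):
        if conf is not None:
            # Lower confidence value = more trustworthy; reject conf > thresh.
            valid[i] &= conf <= confidence_thresh

    masked = np.where(valid, stack, np.nan)
    count = valid.sum(axis=0)
    pooled = np.full(shape, np.nan)
    has_obs = count > 0
    if has_obs.any():
        # Restrict to pixels with at least one observation so nanmedian
        # never sees an all-NaN column.
        pooled[has_obs] = np.nanmedian(masked[:, has_obs], axis=0)
    pooled[count < min_valid_count] = np.nan

    present = [c for c in confidence_maps if c is not None]
    pooled_conf = np.min(np.stack(present), axis=0) if present else None
    return pooled, pooled_conf
```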

**Must NOT do**:
- No weighted mean (median is more robust to outliers; keep simple for Phase 1)
- No spatial alignment or warping

**Recommended Agent Profile**:
- **Category**: `quick`
  - Reason: Single focused module, pure function, no complex dependencies
- **Skills**: []
  - No special skills needed; standard Python/numpy work

**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 2)
- **Blocks**: Tasks 2, 3, 5
- **Blocked By**: None

**References**:

**Pattern References**:
- `aruco/depth_verify.py:39-79`: `compute_depth_residual()` shows how invalid depth is handled (NaN, ≤0, window median pattern)
- `aruco/depth_verify.py:27-36`: `get_confidence_weight()` shows confidence semantics (ZED: 1=most confident, 100=least; threshold default 50)

**API/Type References**:
- `aruco/svo_sync.py:10-18`: `FrameData` dataclass: `depth_map: np.ndarray | None`, `confidence_map: np.ndarray | None`

**Test References**:
- `tests/test_depth_verify.py:36-60`: Pattern for creating synthetic depth maps and testing residual computation

**WHY Each Reference Matters**:
- `depth_verify.py:39-79`: Defines the invalid-depth encoding convention (NaN/≤0) that pooling must respect
- `depth_verify.py:27-36`: Defines confidence semantics and threshold convention; pooling gating must match
- `svo_sync.py:10-18`: Defines the data types the pooling function will receive

**Acceptance Criteria**:
- [ ] File `aruco/depth_pool.py` exists with `pool_depth_maps()` function
- [ ] Function handles N=1 by returning exact copy of input
- [ ] Function raises ValueError on empty input or shape mismatch
- [ ] `uv run basedpyright aruco/depth_pool.py` → 0 errors

**Agent-Executed QA Scenarios:**

```
Scenario: Module imports without error
  Tool: Bash
  Steps:
    1. uv run python -c "from aruco.depth_pool import pool_depth_maps; print('OK')"
    2. Assert: stdout contains "OK"
  Expected Result: Clean import
```

**Commit**: YES
- Message: `feat(aruco): add pool_depth_maps utility for multi-frame depth pooling`
- Files: `aruco/depth_pool.py`

---

- [x] 2. Unit tests for `pool_depth_maps()`

**What to do**:
- Create `tests/test_depth_pool.py`
- Test cases:
  1. **Single map (N=1)**: output equals input exactly
  2. **Two maps, clean**: median of two values at each pixel
  3. **Three maps with NaN**: median ignores NaN pixels correctly
  4. **Confidence gating**: pixels above threshold excluded from median
  5. **All invalid at pixel**: output is NaN
  6. **Empty input**: raises ValueError
  7. **Shape mismatch**: raises ValueError
  8. **min_valid_count**: pixel with fewer valid observations → NaN
  9. **None confidence maps**: graceful handling (pools depth only, returns None confidence)
- Use `numpy.testing.assert_allclose` for numerical checks
- Use `pytest.raises(ValueError, match=...)` for error cases

**Must NOT do**:
- No integration with calibrate_extrinsics.py yet (unit tests only)

**Recommended Agent Profile**:
- **Category**: `quick`
  - Reason: Focused test file creation following existing patterns
- **Skills**: []

**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 1)
- **Blocks**: None
- **Blocked By**: Task 1

**References**:

**Test References**:
- `tests/test_depth_verify.py:36-60`: Pattern for synthetic depth map creation and assertion style
- `tests/test_depth_refine.py:10-18`: Pattern for roundtrip/equivalence testing

**WHY Each Reference Matters**:
- Shows the exact assertion patterns and synthetic data conventions used in this codebase

**Acceptance Criteria**:
- [ ] `uv run pytest tests/test_depth_pool.py -v` → all tests pass
- [ ] At least 9 test cases covering the enumerated scenarios

**Agent-Executed QA Scenarios:**

```
Scenario: All pool tests pass
  Tool: Bash
  Steps:
    1. uv run pytest tests/test_depth_pool.py -v
    2. Assert: exit code 0
    3. Assert: output contains "passed" with 0 "failed"
  Expected Result: All tests green
```

**Commit**: YES (groups with Task 1)
- Message: `test(aruco): add unit tests for pool_depth_maps`
- Files: `tests/test_depth_pool.py`

---

- [x] 3. Extend main loop to collect top-N frames per camera

**What to do**:
- In `calibrate_extrinsics.py`, modify the verification frame collection (lines ~682-714):
  - Change `verification_frames` from `dict[serial, single_frame_dict]` to `dict[serial, list[frame_dict]]`
  - Maintain list sorted by score (descending), truncated to `depth_pool_size`
  - Use `heapq` or sorted insertion to keep top-N efficiently
- When `depth_pool_size == 1`, behavior must be identical to current (store only best)
- Update all downstream references to `verification_frames` that assume single-frame structure
- The `first_frames` dict remains unchanged (it's for benchmarking, separate concern)

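
One way to keep the top-N frames with `heapq` is a bounded min-heap per camera, sketched below. The helper names (`push_top_n`, `sorted_frames`) and the frame dict contents are hypothetical. The tie-break that keeps the earliest frame on equal scores is an assumption about how the current best-frame comparison behaves and should be checked against the existing loop.

```python
from __future__ import annotations

import heapq
import itertools
from typing import Any

# Monotonic counter used only as a tie-breaker, so heap comparisons never
# fall through to comparing the frame dicts themselves.
_arrival = itertools.count()

HeapEntry = tuple[float, int, dict[str, Any]]


def push_top_n(heap: list[HeapEntry], score: float, frame: dict[str, Any], depth_pool_size: int) -> None:
    """Keep only the depth_pool_size highest-scoring frames in a min-heap."""
    # Negated arrival index: on equal scores the earlier frame ranks higher.
    entry = (score, -next(_arrival), frame)
    if len(heap) < depth_pool_size:
        heapq.heappush(heap, entry)
    elif entry > heap[0]:
        # New frame beats the current worst kept frame: swap it in.
        heapq.heapreplace(heap, entry)


def sorted_frames(heap: list[HeapEntry]) -> list[dict[str, Any]]:
    """Return kept frames sorted by score, descending (best frame first)."""
    ordered = sorted(heap, key=lambda e: e[:2], reverse=True)
    return [frame for _, _, frame in ordered]


# Usage: one heap per camera serial, filled inside the sampling loop.
heap: list[HeapEntry] = []
for index, score in enumerate([0.2, 0.9, 0.5, 0.9, 0.1]):
    push_top_n(heap, score, {"frame_index": index}, depth_pool_size=3)
print([f["frame_index"] for f in sorted_frames(heap)])  # [1, 3, 2]
```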

**Must NOT do**:
- Do NOT change the scoring function `score_frame()`
- Do NOT change `FrameData` structure
- Do NOT store frames outside the sampled loop (only collect from frames that already have depth)

**Recommended Agent Profile**:
- **Category**: `unspecified-low`
  - Reason: Surgical modification to existing loop logic; requires careful attention to existing consumers
- **Skills**: []

**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 2 (with Task 4)
- **Blocks**: Tasks 5, 6
- **Blocked By**: Task 1

**References**:

**Pattern References**:
- `calibrate_extrinsics.py:620-760`: Main loop where verification frames are collected; lines 682-714 are the critical section
- `calibrate_extrinsics.py:118-258`: `apply_depth_verify_refine_postprocess()` which consumes `verification_frames`

**API/Type References**:
- `aruco/svo_sync.py:10-18`: `FrameData` structure that's stored in verification_frames

**WHY Each Reference Matters**:
- `calibrate_extrinsics.py:682-714`: This is the exact code being modified; must understand score comparison and dict storage
- `calibrate_extrinsics.py:118-258`: Must understand how `verification_frames` is consumed downstream to know what structure changes are safe

**Acceptance Criteria**:
- [ ] `verification_frames[serial]` is now a list of frame dicts, sorted by score descending
- [ ] List length ≤ `depth_pool_size` for each camera
- [ ] When `depth_pool_size == 1`, list has exactly one element matching current best-frame behavior
- [ ] `uv run basedpyright calibrate_extrinsics.py` → 0 new errors

**Agent-Executed QA Scenarios:**

```
Scenario: Top-N collection works with pool size 3
  Tool: Bash
  Steps:
    1. uv run python -c "from calibrate_extrinsics import apply_depth_verify_refine_postprocess; print('OK')"
    2. Assert: stdout contains "OK"
  Expected Result: No import errors from structural changes
```

**Commit**: NO (groups with Task 5)

---

- [x] 4. Add `--depth-pool-size` CLI option

**What to do**:
- Add click option to `main()` in `calibrate_extrinsics.py`:

```python
@click.option(
    "--depth-pool-size",
    default=1,
    type=click.IntRange(min=1, max=10),
    help="Number of top-scored frames to pool for depth verification/refinement (1=single best frame, >1=median pooling).",
)
```

- Pass through to function signature
- Add to `apply_depth_verify_refine_postprocess()` parameters (or pass `depth_pool_size` to control pooling)
- Update help text for `--depth-mode` if needed to mention pooling interaction

**Must NOT do**:
- Do NOT implement the actual pooling logic here (that's Task 5)
- Do NOT allow values > 10 (memory guardrail)

**Recommended Agent Profile**:
- **Category**: `quick`
  - Reason: Single CLI option addition, boilerplate only
- **Skills**: []

**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 2 (with Task 3)
- **Blocks**: Task 5
- **Blocked By**: None

**References**:

**Pattern References**:
- `calibrate_extrinsics.py:474-478`: Existing `--max-samples` option as pattern for optional integer CLI flag
- `calibrate_extrinsics.py:431-436`: `--depth-mode` option pattern

**WHY Each Reference Matters**:
- Shows the exact click option pattern and placement convention in this file

**Acceptance Criteria**:
- [ ] `uv run calibrate_extrinsics.py --help` shows `--depth-pool-size` with description
- [ ] Default value is 1
- [ ] Values outside 1-10 are rejected by click

**Agent-Executed QA Scenarios:**

```
Scenario: CLI option appears in help
  Tool: Bash
  Steps:
    1. uv run calibrate_extrinsics.py --help
    2. Assert: output contains "--depth-pool-size"
    3. Assert: output contains "1=single best frame"
  Expected Result: Option visible with correct help text

Scenario: Invalid pool size rejected
  Tool: Bash
  Steps:
    1. uv run calibrate_extrinsics.py --depth-pool-size 0 2>&1 || true
       (no --help here: click handles the eager --help flag first and exits before the range check runs)
    2. Assert: output contains error or "Invalid value"
  Expected Result: Click rejects out-of-range value
```

**Commit**: NO (groups with Task 5)

---

- [x] 5. Integrate pooling into `apply_depth_verify_refine_postprocess()`

**What to do**:
- Modify `apply_depth_verify_refine_postprocess()` to accept `depth_pool_size: int = 1` parameter
- When `depth_pool_size > 1` and multiple frames available:
  1. Extract depth_maps and confidence_maps from the top-N frame list
  2. Call `pool_depth_maps()` to produce pooled depth/confidence
  3. Use pooled maps for `verify_extrinsics_with_depth()` and `refine_extrinsics_with_depth()`
  4. Use the **best-scored frame's** `ids` for marker corner lookup (it has best detection quality)
- When `depth_pool_size == 1` OR only 1 frame available:
  - Use existing single-frame path exactly (no pooling call)
- Add pooling metadata to JSON output: `"depth_pool": {"pool_size_requested": N, "pool_size_actual": M, "pooled": true/false}`
- Wire `depth_pool_size` from `main()` through to this function
- Handle edge case: if pooling produces a map with fewer valid points than best single frame, log warning and fall back to single frame

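
The fallback rule and the `depth_pool` metadata described above can be sketched as a small selection helper. This is illustrative only: `count_valid` and `choose_depth_target` are hypothetical names, and "valid points" is interpreted here as pixels with finite, positive depth that pass confidence gating, which may differ from how the real postprocess function counts them.

```python
from __future__ import annotations

import logging
from typing import Any

import numpy as np

logger = logging.getLogger(__name__)

DepthPair = tuple[np.ndarray, "np.ndarray | None"]  # (depth_map, confidence_map)


def count_valid(depth: np.ndarray, confidence: np.ndarray | None, confidence_thresh: float = 50.0) -> int:
    """Count pixels with finite, positive depth that pass confidence gating."""
    valid = np.isfinite(depth) & (depth > 0)
    if confidence is not None:
        valid &= confidence <= confidence_thresh
    return int(valid.sum())


def choose_depth_target(
    best: DepthPair,
    pooled: DepthPair,
    pool_size_requested: int,
    pool_size_actual: int,
    confidence_thresh: float = 50.0,
) -> tuple[np.ndarray, np.ndarray | None, dict[str, Any]]:
    """Pick pooled maps unless they hold fewer valid pixels than the best frame."""
    use_pooled = pool_size_actual > 1 and (
        count_valid(*pooled, confidence_thresh) >= count_valid(*best, confidence_thresh)
    )
    if pool_size_actual > 1 and not use_pooled:
        logger.warning("Pooled depth has fewer valid pixels; falling back to best single frame")
    metadata = {
        "pool_size_requested": pool_size_requested,
        "pool_size_actual": pool_size_actual,
        "pooled": use_pooled,
    }
    depth, confidence = pooled if use_pooled else best
    return depth, confidence, metadata
```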

**Must NOT do**:
- Do NOT change `verify_extrinsics_with_depth()` or `refine_extrinsics_with_depth()` function signatures
- Do NOT add new CLI output formats

**Recommended Agent Profile**:
- **Category**: `unspecified-high`
  - Reason: Core integration task with multiple touchpoints; requires careful wiring and edge case handling
- **Skills**: []

**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Sequential (after Wave 2)
- **Blocks**: Tasks 6, 7
- **Blocked By**: Tasks 1, 3, 4

**References**:

**Pattern References**:
- `calibrate_extrinsics.py:118-258`: Full `apply_depth_verify_refine_postprocess()` function being modified
- `calibrate_extrinsics.py:140-156`: Frame data extraction pattern (accessing `vf["frame"]`, `vf["ids"]`)
- `calibrate_extrinsics.py:158-180`: Verification call pattern
- `calibrate_extrinsics.py:182-245`: Refinement call pattern

**API/Type References**:
- `aruco/depth_pool.py:pool_depth_maps()`: The pooling function (Task 1 output)
- `aruco/depth_verify.py:119-179`: `verify_extrinsics_with_depth()` signature
- `aruco/depth_refine.py:71-227`: `refine_extrinsics_with_depth()` signature

**WHY Each Reference Matters**:
- `calibrate_extrinsics.py:140-156`: Shows how frame data is currently extracted; must adapt for list-of-frames
- `depth_pool.py`: The function we're calling for multi-frame pooling
- `depth_verify.py/depth_refine.py`: Confirms signatures remain unchanged (just pass different depth_map)

**Acceptance Criteria**:
- [ ] With `--depth-pool-size 1`: output JSON identical to baseline (no `depth_pool` metadata needed for N=1)
- [ ] With `--depth-pool-size 5`: output JSON includes `depth_pool` metadata; verify/refine uses pooled maps
- [ ] Fallback to single frame logged when pooling produces fewer valid points
- [ ] `uv run basedpyright calibrate_extrinsics.py` → 0 new errors

**Agent-Executed QA Scenarios:**

```
Scenario: Pool size 1 produces baseline-equivalent output
  Tool: Bash
  Preconditions: output/ directory with SVO files
  Steps:
    1. uv run calibrate_extrinsics.py -s output/ -m aruco/markers/standard_box_markers_600mm.parquet --aruco-dictionary DICT_APRILTAG_36h11 --verify-depth --no-preview --max-samples 5 --depth-pool-size 1 --output output/_test_pool1.json
    2. Assert: exit code 0
    3. Assert: output/_test_pool1.json exists and contains depth_verify entries
  Expected Result: Runs cleanly, produces valid output

Scenario: Pool size 5 runs and includes pool metadata
  Tool: Bash
  Preconditions: output/ directory with SVO files
  Steps:
    1. uv run calibrate_extrinsics.py -s output/ -m aruco/markers/standard_box_markers_600mm.parquet --aruco-dictionary DICT_APRILTAG_36h11 --verify-depth --refine-depth --no-preview --max-samples 10 --depth-pool-size 5 --output output/_test_pool5.json
    2. Assert: exit code 0
    3. Parse output/_test_pool5.json
    4. Assert: at least one camera entry contains "depth_pool" key
  Expected Result: Pooling metadata present in output
```

**Commit**: YES
- Message: `feat(calibrate): integrate multi-frame depth pooling with --depth-pool-size flag`
- Files: `calibrate_extrinsics.py`, `aruco/depth_pool.py`, `tests/test_depth_pool.py`
- Pre-commit: `uv run pytest tests/test_depth_pool.py && uv run basedpyright calibrate_extrinsics.py`

---

- [x] 6. N=1 equivalence regression test

**What to do**:
- Add test in `tests/test_depth_cli_postprocess.py` (or `tests/test_depth_pool.py`):
  - Create synthetic scenario with known depth maps and marker geometry
  - Run `apply_depth_verify_refine_postprocess()` with pool_size=1 using the old single-frame structure
  - Run with pool_size=1 using the new list-of-frames structure
  - Assert outputs are numerically identical (atol=0)
- This proves the refactor preserves backward compatibility

**Must NOT do**:
- No E2E CLI test here (that's Task 7)

**Recommended Agent Profile**:
- **Category**: `quick`
  - Reason: Focused regression test with synthetic data
- **Skills**: []

**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 3 (with Task 7)
- **Blocks**: None
- **Blocked By**: Task 5

**References**:

**Test References**:
- `tests/test_depth_cli_postprocess.py`: Existing integration test patterns
- `tests/test_depth_verify.py:36-60`: Synthetic depth map creation pattern

**Acceptance Criteria**:
- [ ] `uv run pytest -k "pool_size_1_equivalence"` → passes
- [ ] Test asserts exact numerical equality between old-path and new-path outputs

**Commit**: YES
- Message: `test(calibrate): add N=1 equivalence regression test for depth pooling`
- Files: `tests/test_depth_pool.py` or `tests/test_depth_cli_postprocess.py`

---

- [x] 7. E2E smoke comparison: pooled vs single-frame RMSE

**What to do**:
- Run calibration on test SVOs with `--depth-pool-size 1` and `--depth-pool-size 5`
- Compare:
  - Post-refinement RMSE per camera
  - Depth-normalized RMSE
  - CSV residual distribution (mean_abs, p50, p90)
  - Runtime (wall clock)
- Document results in a brief summary (stdout or saved to a comparison file)
- **Success criterion**: pooled RMSE ≤ single-frame RMSE for majority of cameras; runtime overhead < 25%

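
The success criterion above can be evaluated with a short summary helper once the per-camera post-refinement RMSE values have been read from the two output files. The sketch takes plain `serial → RMSE` mappings because the layout of the output JSON is not shown here. The function name, camera serials, and numbers in the usage example are all hypothetical. It only reports the comparison and does not assert pass/fail, in line with the directional nature of the metrics.

```python
from __future__ import annotations

from typing import Any


def summarize_pooling_run(
    single_rmse: dict[str, float],
    pooled_rmse: dict[str, float],
    single_runtime_s: float,
    pooled_runtime_s: float,
    max_overhead: float = 0.25,
) -> dict[str, Any]:
    """Compare per-camera RMSE and wall-clock runtime for pool=1 vs pool=N."""
    cameras = sorted(single_rmse.keys() & pooled_rmse.keys())
    not_worse = [c for c in cameras if pooled_rmse[c] <= single_rmse[c]]
    overhead = (pooled_runtime_s - single_runtime_s) / single_runtime_s
    return {
        "cameras": cameras,
        "not_worse": not_worse,
        # Strict majority of the cameras present in both runs.
        "majority_ok": 2 * len(not_worse) > len(cameras),
        "runtime_overhead": overhead,
        "runtime_ok": overhead < max_overhead,
    }


# Usage with made-up values (serials and RMSE figures are placeholders).
summary = summarize_pooling_run(
    single_rmse={"cam_a": 0.050, "cam_b": 0.040, "cam_c": 0.060},
    pooled_rmse={"cam_a": 0.040, "cam_b": 0.045, "cam_c": 0.060},
    single_runtime_s=100.0,
    pooled_runtime_s=120.0,
)
for cam in summary["cameras"]:
    marker = "<=" if cam in summary["not_worse"] else ">"
    print(f"{cam}: pooled {marker} single")
print(f"runtime overhead: {summary['runtime_overhead']:.0%}")
```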

**Must NOT do**:
- No automated pass/fail assertion on real data (metrics are directional, not deterministic)
- No permanent benchmark infrastructure

**Recommended Agent Profile**:
- **Category**: `quick`
  - Reason: Run two commands, compare JSON output, summarize
- **Skills**: []

**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 3 (with Task 6)
- **Blocks**: None
- **Blocked By**: Task 5

**References**:

**Pattern References**:
- Previous smoke runs in this session: `output/e2e_refine_depth_full_neural_plus.json` as baseline

**Acceptance Criteria**:
- [ ] Both runs complete without error
- [ ] Comparison summary printed showing per-camera RMSE for pool=1 vs pool=5
- [ ] Runtime logged for both runs

**Agent-Executed QA Scenarios:**

```
Scenario: Compare pool=1 vs pool=5 on full SVOs
  Tool: Bash
  Steps:
    1. Run with --depth-pool-size 1 --verify-depth --refine-depth --output output/_compare_pool1.json
    2. Run with --depth-pool-size 5 --verify-depth --refine-depth --output output/_compare_pool5.json
    3. Parse both JSON files
    4. Print per-camera post RMSE comparison table
    5. Print runtime difference
  Expected Result: Both complete; comparison table printed
  Evidence: Terminal output captured
```

**Commit**: NO (no code change; just verification)

---

## Commit Strategy

| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1+2 | `feat(aruco): add pool_depth_maps utility with tests` | `aruco/depth_pool.py`, `tests/test_depth_pool.py` | `uv run pytest tests/test_depth_pool.py` |
| 5 (includes 3+4) | `feat(calibrate): integrate multi-frame depth pooling with --depth-pool-size flag` | `calibrate_extrinsics.py` | `uv run pytest && uv run basedpyright` |
| 6 | `test(calibrate): add N=1 equivalence regression test for depth pooling` | `tests/test_depth_pool.py` or `tests/test_depth_cli_postprocess.py` | `uv run pytest -k pool_size_1` |

---

## Success Criteria

### Verification Commands

```bash
uv run pytest tests/test_depth_pool.py -v           # All pool unit tests pass
uv run pytest -k "pool_size_1_equivalence" -v       # N=1 regression passes
uv run basedpyright                                 # 0 new errors
uv run calibrate_extrinsics.py --help | grep pool   # CLI flag visible
```

### Final Checklist

- [x] `pool_depth_maps()` pure function exists with full edge case handling
- [x] `--depth-pool-size` CLI option with default=1, max=10
- [x] N=1 produces identical results to baseline
- [x] All existing tests still pass
- [x] Type checker clean
- [x] E2E comparison shows pooled RMSE ≤ single-frame RMSE for majority of cameras