Files
zed-playground/py_workspace/.sisyphus/plans/multi-frame-depth-pooling.md
T
crosstyan 6113d0e1f3 feat(tooling): add extrinsics visualizer and close depth-pooling plan
Finalize multi-frame depth pooling execution tracking with fully verified plan checkboxes and add a Y-up/bird-eye extrinsics visualizer with pose-convention auto detection for calibration sanity checks.
2026-02-07 09:14:34 +00:00

615 lines
25 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Multi-Frame Depth Pooling for Extrinsic Calibration
## TL;DR
> **Quick Summary**: Replace single-best-frame depth verification/refinement with top-N temporal pooling to reduce noise sensitivity and improve calibration robustness, while keeping existing verify/refine function signatures untouched.
>
> **Deliverables**:
> - New `pool_depth_maps()` utility function in `aruco/depth_pool.py`
> - Extended frame collection (top-N per camera) in main loop
> - New `--depth-pool-size` CLI option (default 1 = backward compatible)
> - Unit tests for pooling, fallback, and N=1 equivalence
> - E2E smoke comparison (pooled vs single-frame RMSE)
>
> **Estimated Effort**: Medium
> **Parallel Execution**: YES — 3 waves
> **Critical Path**: Task 1 → Task 3 → Task 5 → Task 7
---
## Context
### Original Request
User asked: "Is `apply_depth_verify_refine_postprocess` optimal? When `depth_mode` is not NONE, every frame computes depth regardless of whether it's used. Is there a better way to utilize every depth map when verify/refine is enabled?"
### Interview Summary
**Key Discussions**:
- Oracle confirmed single-best-frame is simplicity-biased but leaves accuracy on the table
- Recommended top 35 frame temporal pooling with confidence gating
- Phased approach: quick win (pooling), medium (weighted selection), advanced (joint optimization)
**Research Findings**:
- `calibrate_extrinsics.py:682-714`: Current loop stores exactly one `verification_frames[serial]` per camera (best-scored)
- `aruco/depth_verify.py`: `verify_extrinsics_with_depth()` accepts single `depth_map` + `confidence_map`
- `aruco/depth_refine.py`: `refine_extrinsics_with_depth()` accepts single `depth_map` + `confidence_map`
- `aruco/svo_sync.py:FrameData`: Each frame already carries `depth_map` + `confidence_map`
- Memory: each depth map is ~3.5MB (720×1280 float32); storing 5 per camera = ~17.5MB/cam, ~70MB total for 4 cameras — acceptable
- Existing tests use synthetic depth maps, so new tests can follow same pattern
### Metis Review
**Identified Gaps** (addressed):
- Camera motion during capture → addressed via assumption that cameras are static during calibration; documented as guardrail
- "Top-N by score" may not correlate with depth quality → addressed by keeping confidence gating in pooling function
- Fewer than N frames available → addressed with explicit fallback behavior
- All pixels invalid after gating → addressed with fallback to best single frame
- N=1 must reproduce baseline exactly → addressed with explicit equivalence test
---
## Work Objectives
### Core Objective
Pool depth maps from the top-N scored frames per camera to produce a more robust single depth target for verification and refinement, reducing sensitivity to single-frame noise.
### Concrete Deliverables
- `aruco/depth_pool.py` — new module with `pool_depth_maps()` function
- Modified `calibrate_extrinsics.py` — top-N collection + pooling integration + CLI flag
- `tests/test_depth_pool.py` — unit tests for pooling logic
- Updated `tests/test_depth_cli_postprocess.py` — integration test for N=1 equivalence
### Definition of Done
- [x] `uv run pytest -k "depth_pool"` → all tests pass
- [x] `uv run basedpyright` → 0 new errors
- [x] `--depth-pool-size 1` produces identical output to current baseline
- [x] `--depth-pool-size 5` produces equal or lower post-RMSE on test SVOs
### Must Have
- Feature-flagged behind `--depth-pool-size` (default 1)
- Pure function `pool_depth_maps()` with deterministic output
- Confidence gating during pooling
- Graceful fallback when pooling fails (insufficient valid pixels)
- N=1 code path identical to current behavior
### Must NOT Have (Guardrails)
- NO changes to `verify_extrinsics_with_depth()` or `refine_extrinsics_with_depth()` signatures
- NO scoring function redesign (use existing `score_frame()` as-is)
- NO cross-camera fusion or spatial alignment/warping between frames
- NO GPU acceleration or threading changes
- NO new artifact files or dashboards
- NO "unbounded history" — enforce max pool size cap (10)
- NO optical flow, Kalman filters, or temporal alignment beyond frame selection
---
## Verification Strategy (MANDATORY)
> **UNIVERSAL RULE: ZERO HUMAN INTERVENTION**
>
> ALL tasks in this plan MUST be verifiable WITHOUT any human action.
### Test Decision
- **Infrastructure exists**: YES
- **Automated tests**: YES (Tests-after, matching existing pattern)
- **Framework**: pytest (via `uv run pytest`)
### Agent-Executed QA Scenarios (MANDATORY — ALL tasks)
**Verification Tool by Deliverable Type:**
| Type | Tool | How Agent Verifies |
|------|------|-------------------|
| Library/Module | Bash (uv run pytest) | Run targeted tests, compare output |
| CLI | Bash (uv run calibrate_extrinsics.py) | Run with flags, check JSON output |
| Type safety | Bash (uv run basedpyright) | Zero new errors |
---
## Execution Strategy
### Parallel Execution Waves
```
Wave 1 (Start Immediately):
├── Task 1: Create pool_depth_maps() utility
└── Task 2: Unit tests for pool_depth_maps()
Wave 2 (After Wave 1):
├── Task 3: Extend main loop to collect top-N frames
├── Task 4: Add --depth-pool-size CLI option
└── Task 5: Integrate pooling into postprocess function
Wave 3 (After Wave 2):
├── Task 6: N=1 equivalence regression test
└── Task 7: E2E smoke comparison (pooled vs single-frame)
```
### Dependency Matrix
| Task | Depends On | Blocks | Can Parallelize With |
|------|------------|--------|---------------------|
| 1 | None | 2, 3, 5 | 2 |
| 2 | 1 | None | 1 |
| 3 | 1 | 5, 6 | 4 |
| 4 | None | 5 | 3 |
| 5 | 1, 3, 4 | 6, 7 | None |
| 6 | 5 | None | 7 |
| 7 | 5 | None | 6 |
---
## TODOs
- [x] 1. Create `pool_depth_maps()` utility in `aruco/depth_pool.py`
**What to do**:
- Create new file `aruco/depth_pool.py`
- Implement `pool_depth_maps(depth_maps: list[np.ndarray], confidence_maps: list[np.ndarray | None], confidence_thresh: float = 50.0, min_valid_count: int = 1) -> tuple[np.ndarray, np.ndarray | None]`
- Algorithm:
1. Stack depth maps along new axis → shape (N, H, W)
2. For each pixel position, mask invalid values (NaN, inf, ≤ 0) AND confidence-rejected pixels (conf > thresh)
3. Compute per-pixel **median** across valid frames → pooled depth
4. For confidence: compute per-pixel **minimum** (most confident) across frames → pooled confidence
5. Pixels with < `min_valid_count` valid observations → set to NaN in pooled depth
- Handle edge cases:
- Empty input list → raise ValueError
- Single map (N=1) → return copy of input (exact equivalence path)
- All maps invalid at a pixel → NaN in output
- Shape mismatch across maps → raise ValueError
- Mixed None confidence maps → pool only non-None, or return None if all None
- Add type hints, docstring with Args/Returns
**Must NOT do**:
- No weighted mean (median is more robust to outliers; keep simple for Phase 1)
- No spatial alignment or warping
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Single focused module, pure function, no complex dependencies
- **Skills**: []
- No special skills needed; standard Python/numpy work
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 2)
- **Blocks**: Tasks 2, 3, 5
- **Blocked By**: None
**References**:
**Pattern References**:
- `aruco/depth_verify.py:39-79``compute_depth_residual()` shows how invalid depth is handled (NaN, ≤0, window median pattern)
- `aruco/depth_verify.py:27-36``get_confidence_weight()` shows confidence semantics (ZED: 1=most confident, 100=least; threshold default 50)
**API/Type References**:
- `aruco/svo_sync.py:10-18``FrameData` dataclass: `depth_map: np.ndarray | None`, `confidence_map: np.ndarray | None`
**Test References**:
- `tests/test_depth_verify.py:36-60` — Pattern for creating synthetic depth maps and testing residual computation
**WHY Each Reference Matters**:
- `depth_verify.py:39-79`: Defines the invalid-depth encoding convention (NaN/≤0) that pooling must respect
- `depth_verify.py:27-36`: Defines confidence semantics and threshold convention; pooling gating must match
- `svo_sync.py:10-18`: Defines the data types the pooling function will receive
**Acceptance Criteria**:
- [ ] File `aruco/depth_pool.py` exists with `pool_depth_maps()` function
- [ ] Function handles N=1 by returning exact copy of input
- [ ] Function raises ValueError on empty input or shape mismatch
- [ ] `uv run basedpyright aruco/depth_pool.py` → 0 errors
**Agent-Executed QA Scenarios:**
```
Scenario: Module imports without error
Tool: Bash
Steps:
1. uv run python -c "from aruco.depth_pool import pool_depth_maps; print('OK')"
2. Assert: stdout contains "OK"
Expected Result: Clean import
```
**Commit**: YES
- Message: `feat(aruco): add pool_depth_maps utility for multi-frame depth pooling`
- Files: `aruco/depth_pool.py`
---
- [x] 2. Unit tests for `pool_depth_maps()`
**What to do**:
- Create `tests/test_depth_pool.py`
- Test cases:
1. **Single map (N=1)**: output equals input exactly
2. **Two maps, clean**: median of two values at each pixel
3. **Three maps with NaN**: median ignores NaN pixels correctly
4. **Confidence gating**: pixels above threshold excluded from median
5. **All invalid at pixel**: output is NaN
6. **Empty input**: raises ValueError
7. **Shape mismatch**: raises ValueError
8. **min_valid_count**: pixel with fewer valid observations → NaN
9. **None confidence maps**: graceful handling (pools depth only, returns None confidence)
- Use `numpy.testing.assert_allclose` for numerical checks
- Use `pytest.raises(ValueError, match=...)` for error cases
**Must NOT do**:
- No integration with calibrate_extrinsics.py yet (unit tests only)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Focused test file creation following existing patterns
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 1 (with Task 1)
- **Blocks**: None
- **Blocked By**: Task 1
**References**:
**Test References**:
- `tests/test_depth_verify.py:36-60` — Pattern for synthetic depth map creation and assertion style
- `tests/test_depth_refine.py:10-18` — Pattern for roundtrip/equivalence testing
**WHY Each Reference Matters**:
- Shows the exact assertion patterns and synthetic data conventions used in this codebase
**Acceptance Criteria**:
- [ ] `uv run pytest tests/test_depth_pool.py -v` → all tests pass
- [ ] At least 9 test cases covering the enumerated scenarios
**Agent-Executed QA Scenarios:**
```
Scenario: All pool tests pass
Tool: Bash
Steps:
1. uv run pytest tests/test_depth_pool.py -v
2. Assert: exit code 0
3. Assert: output contains "passed" with 0 "failed"
Expected Result: All tests green
```
**Commit**: YES (groups with Task 1)
- Message: `test(aruco): add unit tests for pool_depth_maps`
- Files: `tests/test_depth_pool.py`
---
- [x] 3. Extend main loop to collect top-N frames per camera
**What to do**:
- In `calibrate_extrinsics.py`, modify the verification frame collection (lines ~682-714):
- Change `verification_frames` from `dict[serial, single_frame_dict]` to `dict[serial, list[frame_dict]]`
- Maintain list sorted by score (descending), truncated to `depth_pool_size`
- Use `heapq` or sorted insertion to keep top-N efficiently
- When `depth_pool_size == 1`, behavior must be identical to current (store only best)
- Update all downstream references to `verification_frames` that assume single-frame structure
- The `first_frames` dict remains unchanged (it's for benchmarking, separate concern)
**Must NOT do**:
- Do NOT change the scoring function `score_frame()`
- Do NOT change `FrameData` structure
- Do NOT store frames outside the sampled loop (only collect from frames that already have depth)
**Recommended Agent Profile**:
- **Category**: `unspecified-low`
- Reason: Surgical modification to existing loop logic; requires careful attention to existing consumers
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 2 (with Tasks 4)
- **Blocks**: Tasks 5, 6
- **Blocked By**: Task 1
**References**:
**Pattern References**:
- `calibrate_extrinsics.py:620-760` — Main loop where verification frames are collected; lines 682-714 are the critical section
- `calibrate_extrinsics.py:118-258` — `apply_depth_verify_refine_postprocess()` which consumes `verification_frames`
**API/Type References**:
- `aruco/svo_sync.py:10-18` — `FrameData` structure that's stored in verification_frames
**WHY Each Reference Matters**:
- `calibrate_extrinsics.py:682-714`: This is the exact code being modified; must understand score comparison and dict storage
- `calibrate_extrinsics.py:118-258`: Must understand how `verification_frames` is consumed downstream to know what structure changes are safe
**Acceptance Criteria**:
- [ ] `verification_frames[serial]` is now a list of frame dicts, sorted by score descending
- [ ] List length ≤ `depth_pool_size` for each camera
- [ ] When `depth_pool_size == 1`, list has exactly one element matching current best-frame behavior
- [ ] `uv run basedpyright calibrate_extrinsics.py` → 0 new errors
**Agent-Executed QA Scenarios:**
```
Scenario: Top-N collection works with pool size 3
Tool: Bash
Steps:
1. uv run python -c "
# Verify the data structure change is correct by inspecting types
import ast, inspect
# If this imports without error, structure is consistent
from calibrate_extrinsics import apply_depth_verify_refine_postprocess
print('OK')
"
2. Assert: stdout contains "OK"
Expected Result: No import errors from structural changes
```
**Commit**: NO (groups with Task 5)
---
- [x] 4. Add `--depth-pool-size` CLI option
**What to do**:
- Add click option to `main()` in `calibrate_extrinsics.py`:
```python
@click.option(
"--depth-pool-size",
default=1,
type=click.IntRange(min=1, max=10),
help="Number of top-scored frames to pool for depth verification/refinement (1=single best frame, >1=median pooling).",
)
```
- Pass through to function signature
- Add to `apply_depth_verify_refine_postprocess()` parameters (or pass `depth_pool_size` to control pooling)
- Update help text for `--depth-mode` if needed to mention pooling interaction
**Must NOT do**:
- Do NOT implement the actual pooling logic here (that's Task 5)
- Do NOT allow values > 10 (memory guardrail)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Single CLI option addition, boilerplate only
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 2 (with Task 3)
- **Blocks**: Task 5
- **Blocked By**: None
**References**:
**Pattern References**:
- `calibrate_extrinsics.py:474-478` — Existing `--max-samples` option as pattern for optional integer CLI flag
- `calibrate_extrinsics.py:431-436` — `--depth-mode` option pattern
**WHY Each Reference Matters**:
- Shows the exact click option pattern and placement convention in this file
**Acceptance Criteria**:
- [ ] `uv run calibrate_extrinsics.py --help` shows `--depth-pool-size` with description
- [ ] Default value is 1
- [ ] Values outside 1-10 are rejected by click
**Agent-Executed QA Scenarios:**
```
Scenario: CLI option appears in help
Tool: Bash
Steps:
1. uv run calibrate_extrinsics.py --help
2. Assert: output contains "--depth-pool-size"
3. Assert: output contains "1=single best frame"
Expected Result: Option visible with correct help text
Scenario: Invalid pool size rejected
Tool: Bash
Steps:
1. uv run calibrate_extrinsics.py --depth-pool-size 0 --help 2>&1 || true
2. Assert: output contains error or "Invalid value"
Expected Result: Click rejects out-of-range value
```
**Commit**: NO (groups with Task 5)
---
- [x] 5. Integrate pooling into `apply_depth_verify_refine_postprocess()`
**What to do**:
- Modify `apply_depth_verify_refine_postprocess()` to accept `depth_pool_size: int = 1` parameter
- When `depth_pool_size > 1` and multiple frames available:
1. Extract depth_maps and confidence_maps from the top-N frame list
2. Call `pool_depth_maps()` to produce pooled depth/confidence
3. Use pooled maps for `verify_extrinsics_with_depth()` and `refine_extrinsics_with_depth()`
4. Use the **best-scored frame's** `ids` for marker corner lookup (it has best detection quality)
- When `depth_pool_size == 1` OR only 1 frame available:
- Use existing single-frame path exactly (no pooling call)
- Add pooling metadata to JSON output: `"depth_pool": {"pool_size_requested": N, "pool_size_actual": M, "pooled": true/false}`
- Wire `depth_pool_size` from `main()` through to this function
- Handle edge case: if pooling produces a map with fewer valid points than best single frame, log warning and fall back to single frame
**Must NOT do**:
- Do NOT change `verify_extrinsics_with_depth()` or `refine_extrinsics_with_depth()` function signatures
- Do NOT add new CLI output formats
**Recommended Agent Profile**:
- **Category**: `unspecified-high`
- Reason: Core integration task with multiple touchpoints; requires careful wiring and edge case handling
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: NO
- **Parallel Group**: Sequential (after Wave 2)
- **Blocks**: Tasks 6, 7
- **Blocked By**: Tasks 1, 3, 4
**References**:
**Pattern References**:
- `calibrate_extrinsics.py:118-258` — Full `apply_depth_verify_refine_postprocess()` function being modified
- `calibrate_extrinsics.py:140-156` — Frame data extraction pattern (accessing `vf["frame"]`, `vf["ids"]`)
- `calibrate_extrinsics.py:158-180` — Verification call pattern
- `calibrate_extrinsics.py:182-245` — Refinement call pattern
**API/Type References**:
- `aruco/depth_pool.py:pool_depth_maps()` — The pooling function (Task 1 output)
- `aruco/depth_verify.py:119-179` — `verify_extrinsics_with_depth()` signature
- `aruco/depth_refine.py:71-227` — `refine_extrinsics_with_depth()` signature
**WHY Each Reference Matters**:
- `calibrate_extrinsics.py:140-156`: Shows how frame data is currently extracted; must adapt for list-of-frames
- `depth_pool.py`: The function we're calling for multi-frame pooling
- `depth_verify.py/depth_refine.py`: Confirms signatures remain unchanged (just pass different depth_map)
**Acceptance Criteria**:
- [ ] With `--depth-pool-size 1`: output JSON identical to baseline (no `depth_pool` metadata needed for N=1)
- [ ] With `--depth-pool-size 5`: output JSON includes `depth_pool` metadata; verify/refine uses pooled maps
- [ ] Fallback to single frame logged when pooling produces fewer valid points
- [ ] `uv run basedpyright calibrate_extrinsics.py` → 0 new errors
**Agent-Executed QA Scenarios:**
```
Scenario: Pool size 1 produces baseline-equivalent output
Tool: Bash
Preconditions: output/ directory with SVO files
Steps:
1. uv run calibrate_extrinsics.py -s output/ -m aruco/markers/standard_box_markers_600mm.parquet --aruco-dictionary DICT_APRILTAG_36h11 --verify-depth --no-preview --max-samples 5 --depth-pool-size 1 --output output/_test_pool1.json
2. Assert: exit code 0
3. Assert: output/_test_pool1.json exists and contains depth_verify entries
Expected Result: Runs cleanly, produces valid output
Scenario: Pool size 5 runs and includes pool metadata
Tool: Bash
Preconditions: output/ directory with SVO files
Steps:
1. uv run calibrate_extrinsics.py -s output/ -m aruco/markers/standard_box_markers_600mm.parquet --aruco-dictionary DICT_APRILTAG_36h11 --verify-depth --refine-depth --no-preview --max-samples 10 --depth-pool-size 5 --output output/_test_pool5.json
2. Assert: exit code 0
3. Parse output/_test_pool5.json
4. Assert: at least one camera entry contains "depth_pool" key
Expected Result: Pooling metadata present in output
```
**Commit**: YES
- Message: `feat(calibrate): integrate multi-frame depth pooling with --depth-pool-size flag`
- Files: `calibrate_extrinsics.py`, `aruco/depth_pool.py`, `tests/test_depth_pool.py`
- Pre-commit: `uv run pytest tests/test_depth_pool.py && uv run basedpyright calibrate_extrinsics.py`
---
- [x] 6. N=1 equivalence regression test
**What to do**:
- Add test in `tests/test_depth_cli_postprocess.py` (or `tests/test_depth_pool.py`):
- Create synthetic scenario with known depth maps and marker geometry
- Run `apply_depth_verify_refine_postprocess()` with pool_size=1 using the old single-frame structure
- Run with pool_size=1 using the new list-of-frames structure
- Assert outputs are numerically identical (atol=0)
- This proves the refactor preserves backward compatibility
**Must NOT do**:
- No E2E CLI test here (that's Task 7)
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Focused regression test with synthetic data
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 3 (with Task 7)
- **Blocks**: None
- **Blocked By**: Task 5
**References**:
**Test References**:
- `tests/test_depth_cli_postprocess.py` — Existing integration test patterns
- `tests/test_depth_verify.py:36-60` — Synthetic depth map creation pattern
**Acceptance Criteria**:
- [ ] `uv run pytest -k "pool_size_1_equivalence"` → passes
- [ ] Test asserts exact numerical equality between old-path and new-path outputs
**Commit**: YES
- Message: `test(calibrate): add N=1 equivalence regression test for depth pooling`
- Files: `tests/test_depth_pool.py` or `tests/test_depth_cli_postprocess.py`
---
- [x] 7. E2E smoke comparison: pooled vs single-frame RMSE
**What to do**:
- Run calibration on test SVOs with `--depth-pool-size 1` and `--depth-pool-size 5`
- Compare:
- Post-refinement RMSE per camera
- Depth-normalized RMSE
- CSV residual distribution (mean_abs, p50, p90)
- Runtime (wall clock)
- Document results in a brief summary (stdout or saved to a comparison file)
- **Success criterion**: pooled RMSE ≤ single-frame RMSE for majority of cameras; runtime overhead < 25%
**Must NOT do**:
- No automated pass/fail assertion on real data (metrics are directional, not deterministic)
- No permanent benchmark infrastructure
**Recommended Agent Profile**:
- **Category**: `quick`
- Reason: Run two commands, compare JSON output, summarize
- **Skills**: []
**Parallelization**:
- **Can Run In Parallel**: YES
- **Parallel Group**: Wave 3 (with Task 6)
- **Blocks**: None
- **Blocked By**: Task 5
**References**:
**Pattern References**:
- Previous smoke runs in this session: `output/e2e_refine_depth_full_neural_plus.json` as baseline
**Acceptance Criteria**:
- [ ] Both runs complete without error
- [ ] Comparison summary printed showing per-camera RMSE for pool=1 vs pool=5
- [ ] Runtime logged for both runs
**Agent-Executed QA Scenarios:**
```
Scenario: Compare pool=1 vs pool=5 on full SVOs
Tool: Bash
Steps:
1. Run with --depth-pool-size 1 --verify-depth --refine-depth --output output/_compare_pool1.json
2. Run with --depth-pool-size 5 --verify-depth --refine-depth --output output/_compare_pool5.json
3. Parse both JSON files
4. Print per-camera post RMSE comparison table
5. Print runtime difference
Expected Result: Both complete; comparison table printed
Evidence: Terminal output captured
```
**Commit**: NO (no code change; just verification)
---
## Commit Strategy
| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1+2 | `feat(aruco): add pool_depth_maps utility with tests` | `aruco/depth_pool.py`, `tests/test_depth_pool.py` | `uv run pytest tests/test_depth_pool.py` |
| 5 (includes 3+4) | `feat(calibrate): integrate multi-frame depth pooling with --depth-pool-size flag` | `calibrate_extrinsics.py` | `uv run pytest && uv run basedpyright` |
| 6 | `test(calibrate): add N=1 equivalence regression test for depth pooling` | `tests/test_depth_pool.py` or `tests/test_depth_cli_postprocess.py` | `uv run pytest -k pool_size_1` |
---
## Success Criteria
### Verification Commands
```bash
uv run pytest tests/test_depth_pool.py -v # All pool unit tests pass
uv run pytest -k "pool_size_1_equivalence" -v # N=1 regression passes
uv run basedpyright # 0 new errors
uv run calibrate_extrinsics.py --help | grep pool # CLI flag visible
```
### Final Checklist
- [x] `pool_depth_maps()` pure function exists with full edge case handling
- [x] `--depth-pool-size` CLI option with default=1, max=10
- [x] N=1 produces identical results to baseline
- [x] All existing tests still pass
- [x] Type checker clean
- [x] E2E comparison shows pooled RMSE ≤ single-frame RMSE for majority of cameras