Multi-Frame Depth Pooling for Extrinsic Calibration
TL;DR
Quick Summary: Replace single-best-frame depth verification/refinement with top-N temporal pooling to reduce noise sensitivity and improve calibration robustness, while keeping existing verify/refine function signatures untouched.
Deliverables:
- New `pool_depth_maps()` utility function in `aruco/depth_pool.py`
- Extended frame collection (top-N per camera) in main loop
- New `--depth-pool-size` CLI option (default 1 = backward compatible)
- Unit tests for pooling, fallback, and N=1 equivalence
- E2E smoke comparison (pooled vs single-frame RMSE)
Estimated Effort: Medium
Parallel Execution: YES — 3 waves
Critical Path: Task 1 → Task 3 → Task 5 → Task 7
Context
Original Request
User asked: "Is apply_depth_verify_refine_postprocess optimal? When depth_mode is not NONE, every frame computes depth regardless of whether it's used. Is there a better way to utilize every depth map when verify/refine is enabled?"
Interview Summary
Key Discussions:
- Oracle confirmed single-best-frame is simplicity-biased but leaves accuracy on the table
- Recommended top 3–5 frame temporal pooling with confidence gating
- Phased approach: quick win (pooling), medium (weighted selection), advanced (joint optimization)
Research Findings:
- `calibrate_extrinsics.py:682-714`: Current loop stores exactly one `verification_frames[serial]` per camera (best-scored)
- `aruco/depth_verify.py`: `verify_extrinsics_with_depth()` accepts a single `depth_map` + `confidence_map`
- `aruco/depth_refine.py`: `refine_extrinsics_with_depth()` accepts a single `depth_map` + `confidence_map`
- `aruco/svo_sync.py`: `FrameData`: each frame already carries `depth_map` + `confidence_map`
- Memory: each depth map is ~3.5 MB (720×1280 float32); storing 5 per camera = ~17.5 MB/cam, ~70 MB total for 4 cameras — acceptable
- Existing tests use synthetic depth maps, so new tests can follow same pattern
Metis Review
Identified Gaps (addressed):
- Camera motion during capture → addressed via assumption that cameras are static during calibration; documented as guardrail
- "Top-N by score" may not correlate with depth quality → addressed by keeping confidence gating in pooling function
- Fewer than N frames available → addressed with explicit fallback behavior
- All pixels invalid after gating → addressed with fallback to best single frame
- N=1 must reproduce baseline exactly → addressed with explicit equivalence test
Work Objectives
Core Objective
Pool depth maps from the top-N scored frames per camera to produce a more robust single depth target for verification and refinement, reducing sensitivity to single-frame noise.
Concrete Deliverables
- `aruco/depth_pool.py` — new module with `pool_depth_maps()` function
- Modified `calibrate_extrinsics.py` — top-N collection + pooling integration + CLI flag
- `tests/test_depth_pool.py` — unit tests for pooling logic
- Updated `tests/test_depth_cli_postprocess.py` — integration test for N=1 equivalence
Definition of Done
- `uv run pytest -k "depth_pool"` → all tests pass
- `uv run basedpyright` → 0 new errors
- `--depth-pool-size 1` produces identical output to current baseline
- `--depth-pool-size 5` produces equal or lower post-RMSE on test SVOs
Must Have
- Feature-flagged behind `--depth-pool-size` (default 1)
- Pure function `pool_depth_maps()` with deterministic output
- Confidence gating during pooling
- Graceful fallback when pooling fails (insufficient valid pixels)
- N=1 code path identical to current behavior
Must NOT Have (Guardrails)
- NO changes to `verify_extrinsics_with_depth()` or `refine_extrinsics_with_depth()` signatures
- NO scoring function redesign (use existing `score_frame()` as-is)
- NO cross-camera fusion or spatial alignment/warping between frames
- NO GPU acceleration or threading changes
- NO new artifact files or dashboards
- NO "unbounded history" — enforce max pool size cap (10)
- NO optical flow, Kalman filters, or temporal alignment beyond frame selection
Verification Strategy (MANDATORY)
UNIVERSAL RULE: ZERO HUMAN INTERVENTION
ALL tasks in this plan MUST be verifiable WITHOUT any human action.
Test Decision
- Infrastructure exists: YES
- Automated tests: YES (Tests-after, matching existing pattern)
- Framework: pytest (via `uv run pytest`)
Agent-Executed QA Scenarios (MANDATORY — ALL tasks)
Verification Tool by Deliverable Type:
| Type | Tool | How Agent Verifies |
|---|---|---|
| Library/Module | Bash (uv run pytest) | Run targeted tests, compare output |
| CLI | Bash (uv run calibrate_extrinsics.py) | Run with flags, check JSON output |
| Type safety | Bash (uv run basedpyright) | Zero new errors |
Execution Strategy
Parallel Execution Waves
Wave 1 (Start Immediately):
├── Task 1: Create pool_depth_maps() utility
└── Task 2: Unit tests for pool_depth_maps()
Wave 2 (After Wave 1):
├── Task 3: Extend main loop to collect top-N frames
├── Task 4: Add --depth-pool-size CLI option
└── Task 5: Integrate pooling into postprocess function
Wave 3 (After Wave 2):
├── Task 6: N=1 equivalence regression test
└── Task 7: E2E smoke comparison (pooled vs single-frame)
Dependency Matrix
| Task | Depends On | Blocks | Can Parallelize With |
|---|---|---|---|
| 1 | None | 2, 3, 5 | 2 |
| 2 | 1 | None | 1 |
| 3 | 1 | 5, 6 | 4 |
| 4 | None | 5 | 3 |
| 5 | 1, 3, 4 | 6, 7 | None |
| 6 | 5 | None | 7 |
| 7 | 5 | None | 6 |
TODOs
1. Create `pool_depth_maps()` utility in `aruco/depth_pool.py`

What to do:
- Create new file `aruco/depth_pool.py`
- Implement `pool_depth_maps(depth_maps: list[np.ndarray], confidence_maps: list[np.ndarray | None], confidence_thresh: float = 50.0, min_valid_count: int = 1) -> tuple[np.ndarray, np.ndarray | None]`
- Algorithm:
- Stack depth maps along new axis → shape (N, H, W)
- For each pixel position, mask invalid values (NaN, inf, ≤ 0) AND confidence-rejected pixels (conf > thresh)
- Compute per-pixel median across valid frames → pooled depth
- For confidence: compute per-pixel minimum (most confident) across frames → pooled confidence
- Pixels with < `min_valid_count` valid observations → set to NaN in pooled depth
- Handle edge cases:
- Empty input list → raise ValueError
- Single map (N=1) → return copy of input (exact equivalence path)
- All maps invalid at a pixel → NaN in output
- Shape mismatch across maps → raise ValueError
- Mixed None confidence maps → pool only non-None, or return None if all None
- Add type hints, docstring with Args/Returns
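The algorithm bullets above can be sketched as follows. This is an illustrative draft of the Task 1 function, not the final implementation; the mixed-None confidence case from the edge-case list is deliberately left to the real module, and this sketch gates on confidence only when every frame provides a map.

```python
from __future__ import annotations

import numpy as np


def pool_depth_maps(
    depth_maps: list[np.ndarray],
    confidence_maps: list[np.ndarray | None],
    confidence_thresh: float = 50.0,
    min_valid_count: int = 1,
) -> tuple[np.ndarray, np.ndarray | None]:
    """Median-pool depth maps across frames (sketch of the algorithm above)."""
    if not depth_maps:
        raise ValueError("depth_maps must not be empty")
    if any(d.shape != depth_maps[0].shape for d in depth_maps):
        raise ValueError("all depth maps must share the same shape")
    if len(depth_maps) == 1:  # N=1: exact-equivalence path, return copies
        conf = confidence_maps[0]
        return depth_maps[0].copy(), None if conf is None else conf.copy()

    stack = np.stack(depth_maps, axis=0).astype(np.float32)  # (N, H, W)
    valid = np.isfinite(stack) & (stack > 0)  # NaN/inf/<=0 are invalid

    # Confidence gating (ZED convention: lower value = more confident).
    # Sketch gates only when every frame has a confidence map; the
    # mixed-None case from the plan is left to the real implementation.
    conf_stack = None
    if all(c is not None for c in confidence_maps):
        conf_stack = np.stack(confidence_maps, axis=0).astype(np.float32)
        valid &= conf_stack <= confidence_thresh

    masked = np.where(valid, stack, np.nan)
    pooled_depth = np.nanmedian(masked, axis=0)  # per-pixel median over valid frames
    pooled_depth[valid.sum(axis=0) < min_valid_count] = np.nan

    pooled_conf = None
    if conf_stack is not None:
        # Per-pixel minimum across valid frames = most confident observation.
        pooled_conf = np.nanmin(np.where(valid, conf_stack, np.nan), axis=0)
    return pooled_depth, pooled_conf
```

Note that `np.nanmedian` emits a RuntimeWarning (and returns NaN) for pixels invalid in every frame, which is exactly the "all maps invalid at a pixel → NaN" behavior required above.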
Must NOT do:
- No weighted mean (median is more robust to outliers; keep simple for Phase 1)
- No spatial alignment or warping
Recommended Agent Profile:
- Category: `quick`
- Reason: Single focused module, pure function, no complex dependencies
- Skills: []
- No special skills needed; standard Python/numpy work
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 1 (with Task 2)
- Blocks: Tasks 2, 3, 5
- Blocked By: None
References:
Pattern References:
- `aruco/depth_verify.py:39-79` — `compute_depth_residual()` shows how invalid depth is handled (NaN, ≤0, window median pattern)
- `aruco/depth_verify.py:27-36` — `get_confidence_weight()` shows confidence semantics (ZED: 1=most confident, 100=least; threshold default 50)
API/Type References:
- `aruco/svo_sync.py:10-18` — `FrameData` dataclass: `depth_map: np.ndarray | None`, `confidence_map: np.ndarray | None`
Test References:
- `tests/test_depth_verify.py:36-60` — pattern for creating synthetic depth maps and testing residual computation
WHY Each Reference Matters:
- `depth_verify.py:39-79`: Defines the invalid-depth encoding convention (NaN/≤0) that pooling must respect
- `depth_verify.py:27-36`: Defines confidence semantics and threshold convention; pooling gating must match
- `svo_sync.py:10-18`: Defines the data types the pooling function will receive
Acceptance Criteria:
- File `aruco/depth_pool.py` exists with `pool_depth_maps()` function
- Function handles N=1 by returning an exact copy of the input
- Function raises ValueError on empty input or shape mismatch
- `uv run basedpyright aruco/depth_pool.py` → 0 errors
Agent-Executed QA Scenarios:
Scenario: Module imports without error
Tool: Bash
Steps:
1. `uv run python -c "from aruco.depth_pool import pool_depth_maps; print('OK')"`
2. Assert: stdout contains "OK"
Expected Result: Clean import

Commit: YES
- Message: `feat(aruco): add pool_depth_maps utility for multi-frame depth pooling`
- Files: `aruco/depth_pool.py`
2. Unit tests for `pool_depth_maps()`

What to do:
- Create `tests/test_depth_pool.py`
- Test cases:
- Single map (N=1): output equals input exactly
- Two maps, clean: median of two values at each pixel
- Three maps with NaN: median ignores NaN pixels correctly
- Confidence gating: pixels above threshold excluded from median
- All invalid at pixel: output is NaN
- Empty input: raises ValueError
- Shape mismatch: raises ValueError
- min_valid_count: pixel with fewer valid observations → NaN
- None confidence maps: graceful handling (pools depth only, returns None confidence)
- Use `numpy.testing.assert_allclose` for numerical checks
- Use `pytest.raises(ValueError, match=...)` for error cases
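A possible shape for the first few cases is sketched below. The import of `aruco.depth_pool` assumes the Task 1 module; the tiny stand-in keeps the sketch self-contained before that module lands, and the real tests should use `pytest.raises` and `numpy.testing.assert_allclose` as the bullets above require.

```python
import numpy as np

try:
    from aruco.depth_pool import pool_depth_maps  # Task 1 module
except ImportError:  # stand-in so the sketch runs before Task 1 lands
    def pool_depth_maps(depth_maps, confidence_maps, **kwargs):
        if not depth_maps:
            raise ValueError("depth_maps must not be empty")
        stack = np.stack(depth_maps).astype(np.float32)
        masked = np.where(np.isfinite(stack) & (stack > 0), stack, np.nan)
        return np.nanmedian(masked, axis=0), None


def test_n1_returns_exact_copy():
    d = np.arange(16, dtype=np.float32).reshape(4, 4) + 1.0
    pooled, conf = pool_depth_maps([d], [None])
    np.testing.assert_array_equal(pooled, d)
    assert conf is None


def test_median_ignores_nan():
    a = np.full((2, 2), 1.0, np.float32)
    b = np.full((2, 2), 2.0, np.float32)
    c = np.full((2, 2), 3.0, np.float32)
    a[0, 0] = np.nan  # invalid in one frame only
    pooled, _ = pool_depth_maps([a, b, c], [None, None, None])
    assert pooled[0, 0] == 2.5  # median of {2, 3}
    assert pooled[1, 1] == 2.0  # median of {1, 2, 3}


def test_empty_input_raises():
    # real test: with pytest.raises(ValueError, match="empty"): ...
    try:
        pool_depth_maps([], [])
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```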
Must NOT do:
- No integration with calibrate_extrinsics.py yet (unit tests only)
Recommended Agent Profile:
- Category: `quick`
- Reason: Focused test file creation following existing patterns
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 1 (with Task 1)
- Blocks: None
- Blocked By: Task 1
References:
Test References:
- `tests/test_depth_verify.py:36-60` — pattern for synthetic depth map creation and assertion style
- `tests/test_depth_refine.py:10-18` — pattern for roundtrip/equivalence testing
WHY Each Reference Matters:
- Shows the exact assertion patterns and synthetic data conventions used in this codebase
Acceptance Criteria:
- `uv run pytest tests/test_depth_pool.py -v` → all tests pass
- At least 9 test cases covering the enumerated scenarios
Agent-Executed QA Scenarios:
Scenario: All pool tests pass
Tool: Bash
Steps:
1. `uv run pytest tests/test_depth_pool.py -v`
2. Assert: exit code 0
3. Assert: output contains "passed" with 0 "failed"
Expected Result: All tests green

Commit: YES (groups with Task 1)
- Message: `test(aruco): add unit tests for pool_depth_maps`
- Files: `tests/test_depth_pool.py`
3. Extend main loop to collect top-N frames per camera

What to do:
- In `calibrate_extrinsics.py`, modify the verification frame collection (lines ~682-714):
  - Change `verification_frames` from `dict[serial, single_frame_dict]` to `dict[serial, list[frame_dict]]`
depth_pool_size - Use
heapqor sorted insertion to keep top-N efficiently - When
depth_pool_size == 1, behavior must be identical to current (store only best)
- Change
- Update all downstream references to `verification_frames` that assume a single-frame structure
- The `first_frames` dict remains unchanged (it's for benchmarking, a separate concern)
Must NOT do:
- Do NOT change the scoring function `score_frame()`
- Do NOT change the `FrameData` structure
- Do NOT store frames outside the sampled loop (only collect from frames that already have depth)
Recommended Agent Profile:
- Category: `unspecified-low`
- Reason: Surgical modification to existing loop logic; requires careful attention to existing consumers
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 2 (with Task 4)
- Blocks: Tasks 5, 6
- Blocked By: Task 1
References:
Pattern References:
- `calibrate_extrinsics.py:620-760` — main loop where verification frames are collected; lines 682-714 are the critical section
- `calibrate_extrinsics.py:118-258` — `apply_depth_verify_refine_postprocess()`, which consumes `verification_frames`
API/Type References:
- `aruco/svo_sync.py:10-18` — `FrameData` structure that's stored in `verification_frames`
WHY Each Reference Matters:
- `calibrate_extrinsics.py:682-714`: This is the exact code being modified; must understand score comparison and dict storage
- `calibrate_extrinsics.py:118-258`: Must understand how `verification_frames` is consumed downstream to know what structure changes are safe
Acceptance Criteria:
- `verification_frames[serial]` is now a list of frame dicts, sorted by score descending
- List length ≤ `depth_pool_size` for each camera
- When `depth_pool_size == 1`, the list has exactly one element matching current best-frame behavior
- `uv run basedpyright calibrate_extrinsics.py` → 0 new errors
Agent-Executed QA Scenarios:
Scenario: Top-N collection works with pool size 3
Tool: Bash
Steps:
1. `uv run python -c "from calibrate_extrinsics import apply_depth_verify_refine_postprocess; print('OK')"` (if this imports without error, the structure change is consistent)
2. Assert: stdout contains "OK"
Expected Result: No import errors from structural changes

Commit: NO (groups with Task 5)
4. Add `--depth-pool-size` CLI option

What to do:
- Add a click option to `main()` in `calibrate_extrinsics.py`:

  ```python
  @click.option(
      "--depth-pool-size",
      default=1,
      type=click.IntRange(min=1, max=10),
      help="Number of top-scored frames to pool for depth verification/refinement (1=single best frame, >1=median pooling).",
  )
  ```

- Pass `depth_pool_size` through to the `main()` function signature
- Add `depth_pool_size` to the `apply_depth_verify_refine_postprocess()` parameters (to control pooling)
- Update the help text for `--depth-mode` if needed to mention the pooling interaction
Must NOT do:
- Do NOT implement the actual pooling logic here (that's Task 5)
- Do NOT allow values > 10 (memory guardrail)
Recommended Agent Profile:
- Category: `quick`
- Reason: Single CLI option addition, boilerplate only
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 2 (with Task 3)
- Blocks: Task 5
- Blocked By: None
References:
Pattern References:
- `calibrate_extrinsics.py:474-478` — existing `--max-samples` option as pattern for an optional integer CLI flag
- `calibrate_extrinsics.py:431-436` — `--depth-mode` option pattern
WHY Each Reference Matters:
- Shows the exact click option pattern and placement convention in this file
Acceptance Criteria:
- `uv run calibrate_extrinsics.py --help` shows `--depth-pool-size` with description
- Default value is 1
- Values outside 1-10 are rejected by click
Agent-Executed QA Scenarios:
Scenario: CLI option appears in help
Tool: Bash
Steps:
1. `uv run calibrate_extrinsics.py --help`
2. Assert: output contains "--depth-pool-size"
3. Assert: output contains "1=single best frame"
Expected Result: Option visible with correct help text

Scenario: Invalid pool size rejected
Tool: Bash
Steps:
1. `uv run calibrate_extrinsics.py --depth-pool-size 0 --help 2>&1 || true`
2. Assert: output contains an error or "Invalid value"
Expected Result: Click rejects out-of-range value

Commit: NO (groups with Task 5)
5. Integrate pooling into `apply_depth_verify_refine_postprocess()`

What to do:
- Modify `apply_depth_verify_refine_postprocess()` to accept a `depth_pool_size: int = 1` parameter
- When `depth_pool_size > 1` and multiple frames are available:
  - Extract depth_maps and confidence_maps from the top-N frame list
  - Call `pool_depth_maps()` to produce pooled depth/confidence
  - Use the pooled maps for `verify_extrinsics_with_depth()` and `refine_extrinsics_with_depth()`
  - Use the best-scored frame's `ids` for marker corner lookup (it has the best detection quality)
- When `depth_pool_size == 1` OR only one frame is available:
  - Use the existing single-frame path exactly (no pooling call)
- Add pooling metadata to the JSON output: `"depth_pool": {"pool_size_requested": N, "pool_size_actual": M, "pooled": true/false}`
- Wire `depth_pool_size` from `main()` through to this function
- Handle edge case: if pooling produces a map with fewer valid points than the best single frame, log a warning and fall back to the single frame
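The pooled-vs-single decision and the fallback guardrail above could look roughly like this. `select_depth_target` and the frame-dict keys are illustrative helpers, not the actual function layout; the stand-in `pool_depth_maps` only exists so the sketch runs before Task 1 lands.

```python
import numpy as np

try:
    from aruco.depth_pool import pool_depth_maps  # Task 1 module
except ImportError:  # stand-in so the sketch runs on its own
    def pool_depth_maps(depth_maps, confidence_maps, **kwargs):
        stack = np.stack(depth_maps).astype(np.float32)
        masked = np.where(np.isfinite(stack) & (stack > 0), stack, np.nan)
        return np.nanmedian(masked, axis=0), None


def select_depth_target(frames: list[dict], depth_pool_size: int):
    """Pick the depth/confidence maps to hand to verify/refine."""
    best = frames[0]  # list arrives sorted by score descending (Task 3)
    if depth_pool_size <= 1 or len(frames) < 2:
        return best["depth_map"], best["confidence_map"], {"pooled": False}

    pooled_depth, pooled_conf = pool_depth_maps(
        [f["depth_map"] for f in frames],
        [f["confidence_map"] for f in frames],
    )
    # Guardrail from the plan: if pooling leaves fewer valid pixels than
    # the best single frame, fall back and record why.
    if np.isfinite(pooled_depth).sum() < np.isfinite(best["depth_map"]).sum():
        return best["depth_map"], best["confidence_map"], {
            "pooled": False, "fallback": "fewer_valid_pixels"}

    meta = {
        "pooled": True,
        "pool_size_requested": depth_pool_size,
        "pool_size_actual": len(frames),
    }
    return pooled_depth, pooled_conf, meta
```

The metadata dict mirrors the `depth_pool` JSON block described above, so it can be attached to the per-camera output unchanged.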
Must NOT do:
- Do NOT change `verify_extrinsics_with_depth()` or `refine_extrinsics_with_depth()` function signatures
- Do NOT add new CLI output formats
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Core integration task with multiple touchpoints; requires careful wiring and edge case handling
- Skills: []
Parallelization:
- Can Run In Parallel: NO
- Parallel Group: Sequential (after Wave 2)
- Blocks: Tasks 6, 7
- Blocked By: Tasks 1, 3, 4
References:
Pattern References:
- `calibrate_extrinsics.py:118-258` — full `apply_depth_verify_refine_postprocess()` function being modified
- `calibrate_extrinsics.py:140-156` — frame data extraction pattern (accessing `vf["frame"]`, `vf["ids"]`)
- `calibrate_extrinsics.py:158-180` — verification call pattern
- `calibrate_extrinsics.py:182-245` — refinement call pattern
API/Type References:
- `aruco/depth_pool.py`: `pool_depth_maps()` — the pooling function (Task 1 output)
- `aruco/depth_verify.py:119-179` — `verify_extrinsics_with_depth()` signature
- `aruco/depth_refine.py:71-227` — `refine_extrinsics_with_depth()` signature
WHY Each Reference Matters:
- `calibrate_extrinsics.py:140-156`: Shows how frame data is currently extracted; must adapt for list-of-frames
- `depth_pool.py`: The function we're calling for multi-frame pooling
- `depth_verify.py`/`depth_refine.py`: Confirms signatures remain unchanged (just pass a different depth_map)
Acceptance Criteria:
- With `--depth-pool-size 1`: output JSON identical to baseline (no `depth_pool` metadata needed for N=1)
- With `--depth-pool-size 5`: output JSON includes `depth_pool` metadata; verify/refine uses pooled maps
- Fallback to a single frame is logged when pooling produces fewer valid points
- `uv run basedpyright calibrate_extrinsics.py` → 0 new errors
Agent-Executed QA Scenarios:
Scenario: Pool size 1 produces baseline-equivalent output
Tool: Bash
Preconditions: output/ directory with SVO files
Steps:
1. `uv run calibrate_extrinsics.py -s output/ -m aruco/markers/standard_box_markers_600mm.parquet --aruco-dictionary DICT_APRILTAG_36h11 --verify-depth --no-preview --max-samples 5 --depth-pool-size 1 --output output/_test_pool1.json`
2. Assert: exit code 0
3. Assert: output/_test_pool1.json exists and contains depth_verify entries
Expected Result: Runs cleanly, produces valid output

Scenario: Pool size 5 runs and includes pool metadata
Tool: Bash
Preconditions: output/ directory with SVO files
Steps:
1. `uv run calibrate_extrinsics.py -s output/ -m aruco/markers/standard_box_markers_600mm.parquet --aruco-dictionary DICT_APRILTAG_36h11 --verify-depth --refine-depth --no-preview --max-samples 10 --depth-pool-size 5 --output output/_test_pool5.json`
2. Assert: exit code 0
3. Parse output/_test_pool5.json
4. Assert: at least one camera entry contains the "depth_pool" key
Expected Result: Pooling metadata present in output

Commit: YES
- Message: `feat(calibrate): integrate multi-frame depth pooling with --depth-pool-size flag`
- Files: `calibrate_extrinsics.py`, `aruco/depth_pool.py`, `tests/test_depth_pool.py`
- Pre-commit: `uv run pytest tests/test_depth_pool.py && uv run basedpyright calibrate_extrinsics.py`
6. N=1 equivalence regression test
What to do:
- Add a test in `tests/test_depth_cli_postprocess.py` (or `tests/test_depth_pool.py`):
  - Create a synthetic scenario with known depth maps and marker geometry
  - Run `apply_depth_verify_refine_postprocess()` with pool_size=1 using the old single-frame structure
  - Run with pool_size=1 using the new list-of-frames structure
- Assert outputs are numerically identical (atol=0)
- This proves the refactor preserves backward compatibility
Must NOT do:
- No E2E CLI test here (that's Task 7)
Recommended Agent Profile:
- Category: `quick`
- Reason: Focused regression test with synthetic data
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 3 (with Task 7)
- Blocks: None
- Blocked By: Task 5
References:
Test References:
- `tests/test_depth_cli_postprocess.py` — existing integration test patterns
- `tests/test_depth_verify.py:36-60` — synthetic depth map creation pattern
Acceptance Criteria:
- `uv run pytest -k "pool_size_1_equivalence"` → passes
- Test asserts exact numerical equality between old-path and new-path outputs
Commit: YES
- Message: `test(calibrate): add N=1 equivalence regression test for depth pooling`
- Files: `tests/test_depth_pool.py` or `tests/test_depth_cli_postprocess.py`
7. E2E smoke comparison: pooled vs single-frame RMSE
What to do:
- Run calibration on test SVOs with `--depth-pool-size 1` and `--depth-pool-size 5`
- Compare:
- Post-refinement RMSE per camera
- Depth-normalized RMSE
- CSV residual distribution (mean_abs, p50, p90)
- Runtime (wall clock)
- Document results in a brief summary (stdout or saved to a comparison file)
- Success criterion: pooled RMSE ≤ single-frame RMSE for majority of cameras; runtime overhead < 25%
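A throwaway comparison helper for the summary step might look like the sketch below. The `"cameras"`/`"post_rmse"` JSON keys are placeholders: adapt them to the actual `calibrate_extrinsics.py` output schema before use.

```python
import json


def compare_rmse(path_pool1: str, path_pool5: str) -> None:
    """Print a per-camera post-RMSE table for the two smoke runs.

    Assumes a JSON layout like {"cameras": {serial: {"post_rmse": ...}}};
    the real key names must be taken from the actual output files.
    """
    with open(path_pool1) as f1, open(path_pool5) as f5:
        a, b = json.load(f1), json.load(f5)
    print(f"{'camera':<12}{'pool=1':>10}{'pool=5':>10}{'delta':>10}")
    for serial in sorted(a["cameras"]):
        r1 = a["cameras"][serial]["post_rmse"]
        r5 = b["cameras"][serial]["post_rmse"]
        # Negative delta = pooling improved (lowered) the RMSE.
        print(f"{serial:<12}{r1:>10.4f}{r5:>10.4f}{r5 - r1:>+10.4f}")
```

Per the guardrails, this prints a directional comparison only; it deliberately contains no pass/fail assertion on real data.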
Must NOT do:
- No automated pass/fail assertion on real data (metrics are directional, not deterministic)
- No permanent benchmark infrastructure
Recommended Agent Profile:
- Category: `quick`
- Reason: Run two commands, compare JSON output, summarize
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 3 (with Task 6)
- Blocks: None
- Blocked By: Task 5
References:
Pattern References:
- Previous smoke runs in this session: `output/e2e_refine_depth_full_neural_plus.json` as baseline
Acceptance Criteria:
- Both runs complete without error
- Comparison summary printed showing per-camera RMSE for pool=1 vs pool=5
- Runtime logged for both runs
Agent-Executed QA Scenarios:
Scenario: Compare pool=1 vs pool=5 on full SVOs
Tool: Bash
Steps:
1. Run with `--depth-pool-size 1 --verify-depth --refine-depth --output output/_compare_pool1.json`
2. Run with `--depth-pool-size 5 --verify-depth --refine-depth --output output/_compare_pool5.json`
3. Parse both JSON files
4. Print a per-camera post-RMSE comparison table
5. Print the runtime difference
Expected Result: Both complete; comparison table printed
Evidence: Terminal output captured

Commit: NO (no code change; just verification)
Commit Strategy
| After Task | Message | Files | Verification |
|---|---|---|---|
| 1+2 | feat(aruco): add pool_depth_maps utility with tests | `aruco/depth_pool.py`, `tests/test_depth_pool.py` | `uv run pytest tests/test_depth_pool.py` |
| 5 (includes 3+4) | feat(calibrate): integrate multi-frame depth pooling with --depth-pool-size flag | `calibrate_extrinsics.py` | `uv run pytest && uv run basedpyright` |
| 6 | test(calibrate): add N=1 equivalence regression test for depth pooling | `tests/test_depth_pool.py` or `tests/test_depth_cli_postprocess.py` | `uv run pytest -k pool_size_1` |
Success Criteria
Verification Commands
uv run pytest tests/test_depth_pool.py -v # All pool unit tests pass
uv run pytest -k "pool_size_1_equivalence" -v # N=1 regression passes
uv run basedpyright # 0 new errors
uv run calibrate_extrinsics.py --help | grep pool # CLI flag visible
Final Checklist
- `pool_depth_maps()` pure function exists with full edge case handling
- `--depth-pool-size` CLI option with default=1, max=10
- N=1 produces identical results to baseline
- All existing tests still pass
- Type checker clean
- E2E comparison shows pooled RMSE ≤ single-frame RMSE for majority of cameras