zed-playground/py_workspace/.sisyphus/plans/finished/multi-frame-depth-pooling.md

Multi-Frame Depth Pooling for Extrinsic Calibration

TL;DR

Quick Summary: Replace single-best-frame depth verification/refinement with top-N temporal pooling to reduce noise sensitivity and improve calibration robustness, while keeping existing verify/refine function signatures untouched.

Deliverables:

  • New pool_depth_maps() utility function in aruco/depth_pool.py
  • Extended frame collection (top-N per camera) in main loop
  • New --depth-pool-size CLI option (default 1 = backward compatible)
  • Unit tests for pooling, fallback, and N=1 equivalence
  • E2E smoke comparison (pooled vs single-frame RMSE)

Estimated Effort: Medium
Parallel Execution: YES — 3 waves
Critical Path: Task 1 → Task 3 → Task 5 → Task 7


Context

Original Request

User asked: "Is apply_depth_verify_refine_postprocess optimal? When depth_mode is not NONE, every frame computes depth regardless of whether it's used. Is there a better way to utilize every depth map when verify/refine is enabled?"

Interview Summary

Key Discussions:

  • Oracle confirmed single-best-frame is simplicity-biased but leaves accuracy on the table
  • Recommended top 3–5 frame temporal pooling with confidence gating
  • Phased approach: quick win (pooling), medium (weighted selection), advanced (joint optimization)

Research Findings:

  • calibrate_extrinsics.py:682-714: Current loop stores exactly one verification_frames[serial] per camera (best-scored)
  • aruco/depth_verify.py: verify_extrinsics_with_depth() accepts single depth_map + confidence_map
  • aruco/depth_refine.py: refine_extrinsics_with_depth() accepts single depth_map + confidence_map
  • aruco/svo_sync.py:FrameData: Each frame already carries depth_map + confidence_map
  • Memory: each depth map is ~3.5MB (720×1280 float32); storing 5 per camera = ~17.5MB/cam, ~70MB total for 4 cameras — acceptable
  • Existing tests use synthetic depth maps, so new tests can follow same pattern

Metis Review

Identified Gaps (addressed):

  • Camera motion during capture → addressed via assumption that cameras are static during calibration; documented as guardrail
  • "Top-N by score" may not correlate with depth quality → addressed by keeping confidence gating in pooling function
  • Fewer than N frames available → addressed with explicit fallback behavior
  • All pixels invalid after gating → addressed with fallback to best single frame
  • N=1 must reproduce baseline exactly → addressed with explicit equivalence test

Work Objectives

Core Objective

Pool depth maps from the top-N scored frames per camera to produce a more robust single depth target for verification and refinement, reducing sensitivity to single-frame noise.

Concrete Deliverables

  • aruco/depth_pool.py — new module with pool_depth_maps() function
  • Modified calibrate_extrinsics.py — top-N collection + pooling integration + CLI flag
  • tests/test_depth_pool.py — unit tests for pooling logic
  • Updated tests/test_depth_cli_postprocess.py — integration test for N=1 equivalence

Definition of Done

  • uv run pytest -k "depth_pool" → all tests pass
  • uv run basedpyright → 0 new errors
  • --depth-pool-size 1 produces identical output to current baseline
  • --depth-pool-size 5 produces equal or lower post-RMSE on test SVOs

Must Have

  • Feature-flagged behind --depth-pool-size (default 1)
  • Pure function pool_depth_maps() with deterministic output
  • Confidence gating during pooling
  • Graceful fallback when pooling fails (insufficient valid pixels)
  • N=1 code path identical to current behavior

Must NOT Have (Guardrails)

  • NO changes to verify_extrinsics_with_depth() or refine_extrinsics_with_depth() signatures
  • NO scoring function redesign (use existing score_frame() as-is)
  • NO cross-camera fusion or spatial alignment/warping between frames
  • NO GPU acceleration or threading changes
  • NO new artifact files or dashboards
  • NO "unbounded history" — enforce max pool size cap (10)
  • NO optical flow, Kalman filters, or temporal alignment beyond frame selection

Verification Strategy (MANDATORY)

UNIVERSAL RULE: ZERO HUMAN INTERVENTION

ALL tasks in this plan MUST be verifiable WITHOUT any human action.

Test Decision

  • Infrastructure exists: YES
  • Automated tests: YES (Tests-after, matching existing pattern)
  • Framework: pytest (via uv run pytest)

Agent-Executed QA Scenarios (MANDATORY — ALL tasks)

Verification Tool by Deliverable Type:

| Type | Tool | How Agent Verifies |
| --- | --- | --- |
| Library/Module | Bash (uv run pytest) | Run targeted tests, compare output |
| CLI | Bash (uv run calibrate_extrinsics.py) | Run with flags, check JSON output |
| Type safety | Bash (uv run basedpyright) | Zero new errors |

Execution Strategy

Parallel Execution Waves

Wave 1 (Start Immediately):
├── Task 1: Create pool_depth_maps() utility
└── Task 2: Unit tests for pool_depth_maps()

Wave 2 (After Wave 1):
├── Task 3: Extend main loop to collect top-N frames
├── Task 4: Add --depth-pool-size CLI option
└── Task 5: Integrate pooling into postprocess function

Wave 3 (After Wave 2):
├── Task 6: N=1 equivalence regression test
└── Task 7: E2E smoke comparison (pooled vs single-frame)

Dependency Matrix

| Task | Depends On | Blocks | Can Parallelize With |
| --- | --- | --- | --- |
| 1 | None | 2, 3, 5 | 2 |
| 2 | 1 | None | 1 |
| 3 | 1 | 5, 6 | 4 |
| 4 | None | 5 | 3 |
| 5 | 1, 3, 4 | 6, 7 | None |
| 6 | 5 | None | 7 |
| 7 | 5 | None | 6 |

TODOs

  • 1. Create pool_depth_maps() utility in aruco/depth_pool.py

    What to do:

    • Create new file aruco/depth_pool.py
    • Implement pool_depth_maps(depth_maps: list[np.ndarray], confidence_maps: list[np.ndarray | None], confidence_thresh: float = 50.0, min_valid_count: int = 1) -> tuple[np.ndarray, np.ndarray | None]
    • Algorithm:
      1. Stack depth maps along new axis → shape (N, H, W)
      2. For each pixel position, mask invalid values (NaN, inf, ≤ 0) AND confidence-rejected pixels (conf > thresh)
      3. Compute per-pixel median across valid frames → pooled depth
      4. For confidence: compute per-pixel minimum (most confident) across frames → pooled confidence
      5. Pixels with < min_valid_count valid observations → set to NaN in pooled depth
    • Handle edge cases:
      • Empty input list → raise ValueError
      • Single map (N=1) → return copy of input (exact equivalence path)
      • All maps invalid at a pixel → NaN in output
      • Shape mismatch across maps → raise ValueError
      • Mixed None confidence maps → pool only non-None, or return None if all None
    • Add type hints, docstring with Args/Returns

    Must NOT do:

    • No weighted mean (median is more robust to outliers; keep simple for Phase 1)
    • No spatial alignment or warping

    Recommended Agent Profile:

    • Category: quick
      • Reason: Single focused module, pure function, no complex dependencies
    • Skills: []
      • No special skills needed; standard Python/numpy work

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Task 2)
    • Blocks: Tasks 2, 3, 5
    • Blocked By: None

    References:

    Pattern References:

    • aruco/depth_verify.py:39-79 — compute_depth_residual() shows how invalid depth is handled (NaN, ≤0, window median pattern)
    • aruco/depth_verify.py:27-36 — get_confidence_weight() shows confidence semantics (ZED: 1=most confident, 100=least; threshold default 50)

    API/Type References:

    • aruco/svo_sync.py:10-18 — FrameData dataclass: depth_map: np.ndarray | None, confidence_map: np.ndarray | None

    Test References:

    • tests/test_depth_verify.py:36-60 — Pattern for creating synthetic depth maps and testing residual computation

    WHY Each Reference Matters:

    • depth_verify.py:39-79: Defines the invalid-depth encoding convention (NaN/≤0) that pooling must respect
    • depth_verify.py:27-36: Defines confidence semantics and threshold convention; pooling gating must match
    • svo_sync.py:10-18: Defines the data types the pooling function will receive

    Acceptance Criteria:

    • File aruco/depth_pool.py exists with pool_depth_maps() function
    • Function handles N=1 by returning exact copy of input
    • Function raises ValueError on empty input or shape mismatch
    • uv run basedpyright aruco/depth_pool.py → 0 errors

    Agent-Executed QA Scenarios:

    Scenario: Module imports without error
      Tool: Bash
      Steps:
        1. uv run python -c "from aruco.depth_pool import pool_depth_maps; print('OK')"
        2. Assert: stdout contains "OK"
      Expected Result: Clean import
    

    Commit: YES

    • Message: feat(aruco): add pool_depth_maps utility for multi-frame depth pooling
    • Files: aruco/depth_pool.py

  • 2. Unit tests for pool_depth_maps()

    What to do:

    • Create tests/test_depth_pool.py
    • Test cases:
      1. Single map (N=1): output equals input exactly
      2. Two maps, clean: median of two values at each pixel
      3. Three maps with NaN: median ignores NaN pixels correctly
      4. Confidence gating: pixels above threshold excluded from median
      5. All invalid at pixel: output is NaN
      6. Empty input: raises ValueError
      7. Shape mismatch: raises ValueError
      8. min_valid_count: pixel with fewer valid observations → NaN
      9. None confidence maps: graceful handling (pools depth only, returns None confidence)
    • Use numpy.testing.assert_allclose for numerical checks
    • Use pytest.raises(ValueError, match=...) for error cases

    Must NOT do:

    • No integration with calibrate_extrinsics.py yet (unit tests only)

    Recommended Agent Profile:

    • Category: quick
      • Reason: Focused test file creation following existing patterns
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Task 1)
    • Blocks: None
    • Blocked By: Task 1

    References:

    Test References:

    • tests/test_depth_verify.py:36-60 — Pattern for synthetic depth map creation and assertion style
    • tests/test_depth_refine.py:10-18 — Pattern for roundtrip/equivalence testing

    WHY Each Reference Matters:

    • Shows the exact assertion patterns and synthetic data conventions used in this codebase

    Acceptance Criteria:

    • uv run pytest tests/test_depth_pool.py -v → all tests pass
    • At least 9 test cases covering the enumerated scenarios

    Agent-Executed QA Scenarios:

    Scenario: All pool tests pass
      Tool: Bash
      Steps:
        1. uv run pytest tests/test_depth_pool.py -v
        2. Assert: exit code 0
        3. Assert: output contains "passed" with 0 "failed"
      Expected Result: All tests green
    

    Commit: YES (groups with Task 1)

    • Message: test(aruco): add unit tests for pool_depth_maps
    • Files: tests/test_depth_pool.py

  • 3. Extend main loop to collect top-N frames per camera

    What to do:

    • In calibrate_extrinsics.py, modify the verification frame collection (lines ~682-714):
      • Change verification_frames from dict[serial, single_frame_dict] to dict[serial, list[frame_dict]]
      • Maintain list sorted by score (descending), truncated to depth_pool_size
      • Use heapq or sorted insertion to keep top-N efficiently
      • When depth_pool_size == 1, behavior must be identical to current (store only best)
    • Update all downstream references to verification_frames that assume single-frame structure
    • The first_frames dict remains unchanged (it's for benchmarking, separate concern)
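
The top-N maintenance described above can be sketched with a bounded min-heap. The helper names (push_top_n, best_first) and the frame-dict shape are illustrative, not the real calibrate_extrinsics.py structures; the counter tie-breaker keeps heapq from ever comparing frame dicts.

```python
import heapq
from itertools import count

_tiebreak = count()

def push_top_n(heap: list, score: float, frame: dict, pool_size: int) -> None:
    """Keep at most pool_size highest-scoring frames (min-heap keyed by score)."""
    item = (score, next(_tiebreak), frame)
    if len(heap) < pool_size:
        heapq.heappush(heap, item)
    elif score > heap[0][0]:
        # New frame beats the current worst retained frame; replace it
        heapq.heapreplace(heap, item)

def best_first(heap: list) -> list[dict]:
    """Return retained frames sorted by score descending, for downstream pooling."""
    return [frame for _, _, frame in sorted(heap, reverse=True)]
```

With pool_size == 1 this degenerates to keeping the single best-scored frame, matching the current behavior.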

    Must NOT do:

    • Do NOT change the scoring function score_frame()
    • Do NOT change FrameData structure
    • Do NOT store frames outside the sampled loop (only collect from frames that already have depth)

    Recommended Agent Profile:

    • Category: unspecified-low
      • Reason: Surgical modification to existing loop logic; requires careful attention to existing consumers
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Task 4)
    • Blocks: Tasks 5, 6
    • Blocked By: Task 1

    References:

    Pattern References:

    • calibrate_extrinsics.py:620-760 — Main loop where verification frames are collected; lines 682-714 are the critical section
    • calibrate_extrinsics.py:118-258 — apply_depth_verify_refine_postprocess() which consumes verification_frames

    API/Type References:

    • aruco/svo_sync.py:10-18 — FrameData structure that's stored in verification_frames

    WHY Each Reference Matters:

    • calibrate_extrinsics.py:682-714: This is the exact code being modified; must understand score comparison and dict storage
    • calibrate_extrinsics.py:118-258: Must understand how verification_frames is consumed downstream to know what structure changes are safe

    Acceptance Criteria:

    • verification_frames[serial] is now a list of frame dicts, sorted by score descending
    • List length ≤ depth_pool_size for each camera
    • When depth_pool_size == 1, list has exactly one element matching current best-frame behavior
    • uv run basedpyright calibrate_extrinsics.py → 0 new errors

    Agent-Executed QA Scenarios:

    Scenario: Top-N collection works with pool size 3
      Tool: Bash
      Steps:
        1. uv run python -c "from calibrate_extrinsics import apply_depth_verify_refine_postprocess; print('OK')"
        2. Assert: stdout contains "OK"
      Expected Result: No import errors from structural changes
    

    Commit: NO (groups with Task 5)


  • 4. Add --depth-pool-size CLI option

    What to do:

    • Add click option to main() in calibrate_extrinsics.py:
      @click.option(
          "--depth-pool-size",
          default=1,
          type=click.IntRange(min=1, max=10),
          help="Number of top-scored frames to pool for depth verification/refinement (1=single best frame, >1=median pooling).",
      )
      
    • Pass through to function signature
    • Add to apply_depth_verify_refine_postprocess() parameters (or pass depth_pool_size to control pooling)
    • Update help text for --depth-mode if needed to mention pooling interaction

    Must NOT do:

    • Do NOT implement the actual pooling logic here (that's Task 5)
    • Do NOT allow values > 10 (memory guardrail)

    Recommended Agent Profile:

    • Category: quick
      • Reason: Single CLI option addition, boilerplate only
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Task 3)
    • Blocks: Task 5
    • Blocked By: None

    References:

    Pattern References:

    • calibrate_extrinsics.py:474-478 — Existing --max-samples option as pattern for optional integer CLI flag
    • calibrate_extrinsics.py:431-436 — --depth-mode option pattern

    WHY Each Reference Matters:

    • Shows the exact click option pattern and placement convention in this file

    Acceptance Criteria:

    • uv run calibrate_extrinsics.py --help shows --depth-pool-size with description
    • Default value is 1
    • Values outside 1-10 are rejected by click

    Agent-Executed QA Scenarios:

    Scenario: CLI option appears in help
      Tool: Bash
      Steps:
        1. uv run calibrate_extrinsics.py --help
        2. Assert: output contains "--depth-pool-size"
        3. Assert: output contains "1=single best frame"
      Expected Result: Option visible with correct help text
    
    Scenario: Invalid pool size rejected
      Tool: Bash
      Steps:
        1. uv run calibrate_extrinsics.py --depth-pool-size 0 --help 2>&1 || true
        2. Assert: output contains error or "Invalid value"
      Expected Result: Click rejects out-of-range value
    

    Commit: NO (groups with Task 5)


  • 5. Integrate pooling into apply_depth_verify_refine_postprocess()

    What to do:

    • Modify apply_depth_verify_refine_postprocess() to accept depth_pool_size: int = 1 parameter
    • When depth_pool_size > 1 and multiple frames available:
      1. Extract depth_maps and confidence_maps from the top-N frame list
      2. Call pool_depth_maps() to produce pooled depth/confidence
      3. Use pooled maps for verify_extrinsics_with_depth() and refine_extrinsics_with_depth()
      4. Use the best-scored frame's ids for marker corner lookup (it has best detection quality)
    • When depth_pool_size == 1 OR only 1 frame available:
      • Use existing single-frame path exactly (no pooling call)
    • Add pooling metadata to JSON output: "depth_pool": {"pool_size_requested": N, "pool_size_actual": M, "pooled": true/false}
    • Wire depth_pool_size from main() through to this function
    • Handle edge case: if pooling produces a map with fewer valid points than best single frame, log warning and fall back to single frame
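
The selection-plus-fallback logic above can be sketched as a small helper. The function name choose_depth_target, the frame-dict keys, and the metadata keys mirror the plan's description but are hypothetical until Task 5 fixes the real shape; pool_fn stands in for pool_depth_maps from Task 1.

```python
import numpy as np

def choose_depth_target(frames: list[dict], pool_size: int, pool_fn):
    """Pick pooled maps when beneficial, else fall back to the best frame (sketch).

    `frames` is assumed sorted by score descending, each holding
    'depth_map' and 'confidence_map' entries.
    """
    best = frames[0]
    if pool_size <= 1 or len(frames) == 1:
        # Existing single-frame path: no pooling call at all
        return best["depth_map"], best["confidence_map"], {"pooled": False}
    pooled_depth, pooled_conf = pool_fn(
        [f["depth_map"] for f in frames],
        [f["confidence_map"] for f in frames],
    )
    # Guardrail: if pooling leaves fewer valid pixels than the best single
    # frame, fall back (the caller should also log a warning here)
    best_valid = int(np.isfinite(best["depth_map"]).sum())
    pooled_valid = int(np.isfinite(pooled_depth).sum())
    if pooled_valid < best_valid:
        return best["depth_map"], best["confidence_map"], {"pooled": False}
    return pooled_depth, pooled_conf, {
        "pooled": True,
        "pool_size_requested": pool_size,
        "pool_size_actual": len(frames),
    }
```

Keeping this as a pure helper makes the N=1 equivalence test (Task 6) straightforward: the pool_size ≤ 1 branch never touches the pooling code.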

    Must NOT do:

    • Do NOT change verify_extrinsics_with_depth() or refine_extrinsics_with_depth() function signatures
    • Do NOT add new CLI output formats

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Core integration task with multiple touchpoints; requires careful wiring and edge case handling
    • Skills: []

    Parallelization:

    • Can Run In Parallel: NO
    • Parallel Group: Sequential (after Wave 2)
    • Blocks: Tasks 6, 7
    • Blocked By: Tasks 1, 3, 4

    References:

    Pattern References:

    • calibrate_extrinsics.py:118-258 — Full apply_depth_verify_refine_postprocess() function being modified
    • calibrate_extrinsics.py:140-156 — Frame data extraction pattern (accessing vf["frame"], vf["ids"])
    • calibrate_extrinsics.py:158-180 — Verification call pattern
    • calibrate_extrinsics.py:182-245 — Refinement call pattern

    API/Type References:

    • aruco/depth_pool.py:pool_depth_maps() — The pooling function (Task 1 output)
    • aruco/depth_verify.py:119-179 — verify_extrinsics_with_depth() signature
    • aruco/depth_refine.py:71-227 — refine_extrinsics_with_depth() signature

    WHY Each Reference Matters:

    • calibrate_extrinsics.py:140-156: Shows how frame data is currently extracted; must adapt for list-of-frames
    • depth_pool.py: The function we're calling for multi-frame pooling
    • depth_verify.py/depth_refine.py: Confirms signatures remain unchanged (just pass different depth_map)

    Acceptance Criteria:

    • With --depth-pool-size 1: output JSON identical to baseline (no depth_pool metadata needed for N=1)
    • With --depth-pool-size 5: output JSON includes depth_pool metadata; verify/refine uses pooled maps
    • Fallback to single frame logged when pooling produces fewer valid points
    • uv run basedpyright calibrate_extrinsics.py → 0 new errors

    Agent-Executed QA Scenarios:

    Scenario: Pool size 1 produces baseline-equivalent output
      Tool: Bash
      Preconditions: output/ directory with SVO files
      Steps:
        1. uv run calibrate_extrinsics.py -s output/ -m aruco/markers/standard_box_markers_600mm.parquet --aruco-dictionary DICT_APRILTAG_36h11 --verify-depth --no-preview --max-samples 5 --depth-pool-size 1 --output output/_test_pool1.json
        2. Assert: exit code 0
        3. Assert: output/_test_pool1.json exists and contains depth_verify entries
      Expected Result: Runs cleanly, produces valid output
    
    Scenario: Pool size 5 runs and includes pool metadata
      Tool: Bash
      Preconditions: output/ directory with SVO files
      Steps:
        1. uv run calibrate_extrinsics.py -s output/ -m aruco/markers/standard_box_markers_600mm.parquet --aruco-dictionary DICT_APRILTAG_36h11 --verify-depth --refine-depth --no-preview --max-samples 10 --depth-pool-size 5 --output output/_test_pool5.json
        2. Assert: exit code 0
        3. Parse output/_test_pool5.json
        4. Assert: at least one camera entry contains "depth_pool" key
      Expected Result: Pooling metadata present in output
    

    Commit: YES

    • Message: feat(calibrate): integrate multi-frame depth pooling with --depth-pool-size flag
    • Files: calibrate_extrinsics.py, aruco/depth_pool.py, tests/test_depth_pool.py
    • Pre-commit: uv run pytest tests/test_depth_pool.py && uv run basedpyright calibrate_extrinsics.py

  • 6. N=1 equivalence regression test

    What to do:

    • Add test in tests/test_depth_cli_postprocess.py (or tests/test_depth_pool.py):
      • Create synthetic scenario with known depth maps and marker geometry
      • Run apply_depth_verify_refine_postprocess() with pool_size=1 using the old single-frame structure
      • Run with pool_size=1 using the new list-of-frames structure
      • Assert outputs are numerically identical (atol=0)
    • This proves the refactor preserves backward compatibility

    Must NOT do:

    • No E2E CLI test here (that's Task 7)

    Recommended Agent Profile:

    • Category: quick
      • Reason: Focused regression test with synthetic data
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 3 (with Task 7)
    • Blocks: None
    • Blocked By: Task 5

    References:

    Test References:

    • tests/test_depth_cli_postprocess.py — Existing integration test patterns
    • tests/test_depth_verify.py:36-60 — Synthetic depth map creation pattern

    Acceptance Criteria:

    • uv run pytest -k "pool_size_1_equivalence" → passes
    • Test asserts exact numerical equality between old-path and new-path outputs

    Commit: YES

    • Message: test(calibrate): add N=1 equivalence regression test for depth pooling
    • Files: tests/test_depth_pool.py or tests/test_depth_cli_postprocess.py

  • 7. E2E smoke comparison: pooled vs single-frame RMSE

    What to do:

    • Run calibration on test SVOs with --depth-pool-size 1 and --depth-pool-size 5
    • Compare:
      • Post-refinement RMSE per camera
      • Depth-normalized RMSE
      • CSV residual distribution (mean_abs, p50, p90)
      • Runtime (wall clock)
    • Document results in a brief summary (stdout or saved to a comparison file)
    • Success criterion: pooled RMSE ≤ single-frame RMSE for majority of cameras; runtime overhead < 25%
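
A sketch of the comparison step, once both JSON outputs are loaded (e.g. via json.load). The per-camera layout (serial → metrics dict with a post-RMSE key) is an assumption about the output schema and must be adapted to the real one.

```python
def rmse_table(results_a: dict, results_b: dict, key: str = "post_rmse") -> list[str]:
    """Per-camera RMSE comparison rows for pool=1 vs pool=5 runs (sketch).

    Assumes each argument maps camera serial -> metrics dict containing `key`;
    this layout is hypothetical, not the confirmed output schema.
    """
    rows = []
    for serial in sorted(set(results_a) & set(results_b)):
        a = results_a[serial].get(key)
        b = results_b[serial].get(key)
        if a is None or b is None:
            continue  # skip cameras missing the metric in either run
        verdict = "<=" if b <= a else ">"
        rows.append(f"{serial}: pool=1 {a:.4f} vs pool=5 {b:.4f} ({verdict})")
    return rows
```

Printing these rows plus the two wall-clock times satisfies the "comparison summary" criterion without building permanent benchmark infrastructure.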

    Must NOT do:

    • No automated pass/fail assertion on real data (metrics are directional, not deterministic)
    • No permanent benchmark infrastructure

    Recommended Agent Profile:

    • Category: quick
      • Reason: Run two commands, compare JSON output, summarize
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 3 (with Task 6)
    • Blocks: None
    • Blocked By: Task 5

    References:

    Pattern References:

    • Previous smoke runs in this session: output/e2e_refine_depth_full_neural_plus.json as baseline

    Acceptance Criteria:

    • Both runs complete without error
    • Comparison summary printed showing per-camera RMSE for pool=1 vs pool=5
    • Runtime logged for both runs

    Agent-Executed QA Scenarios:

    Scenario: Compare pool=1 vs pool=5 on full SVOs
      Tool: Bash
      Steps:
        1. Run with --depth-pool-size 1 --verify-depth --refine-depth --output output/_compare_pool1.json
        2. Run with --depth-pool-size 5 --verify-depth --refine-depth --output output/_compare_pool5.json
        3. Parse both JSON files
        4. Print per-camera post RMSE comparison table
        5. Print runtime difference
      Expected Result: Both complete; comparison table printed
      Evidence: Terminal output captured
    

    Commit: NO (no code change; just verification)


Commit Strategy

| After Task | Message | Files | Verification |
| --- | --- | --- | --- |
| 1+2 | feat(aruco): add pool_depth_maps utility with tests | aruco/depth_pool.py, tests/test_depth_pool.py | uv run pytest tests/test_depth_pool.py |
| 5 (includes 3+4) | feat(calibrate): integrate multi-frame depth pooling with --depth-pool-size flag | calibrate_extrinsics.py | uv run pytest && uv run basedpyright |
| 6 | test(calibrate): add N=1 equivalence regression test for depth pooling | tests/test_depth_pool.py or tests/test_depth_cli_postprocess.py | uv run pytest -k pool_size_1 |

Success Criteria

Verification Commands

uv run pytest tests/test_depth_pool.py -v           # All pool unit tests pass
uv run pytest -k "pool_size_1_equivalence" -v        # N=1 regression passes
uv run basedpyright                                   # 0 new errors
uv run calibrate_extrinsics.py --help | grep pool    # CLI flag visible

Final Checklist

  • pool_depth_maps() pure function exists with full edge case handling
  • --depth-pool-size CLI option with default=1, max=10
  • N=1 produces identical results to baseline
  • All existing tests still pass
  • Type checker clean
  • E2E comparison shows pooled RMSE ≤ single-frame RMSE for majority of cameras