refactor: things

2026-03-06 17:17:59 +08:00
parent 8c6087683f
commit 33ab1a5d9d
171 changed files with 293 additions and 29894 deletions
@@ -0,0 +1,108 @@
+# Draft: ArUco-Based Multi-Camera Extrinsic Calibration from SVO
+
+## Requirements (confirmed)
+
+### Goal
+Create a CLI tool that reads synchronized SVO recordings from multiple ZED cameras, detects ArUco markers on a 3D calibration box, computes camera extrinsics relative to the marker world origin, and outputs accurate pose matrices to replace the inaccurate ones in `inside_network.json`.
+
+### Calibration Target
+- **Type**: 3D box with 6 diamond board faces
+- **Object points**: Defined in `aruco/output/standard_box_markers.parquet`
+- **Marker dictionary**: `DICT_4X4_50` (from existing code)
+- **Minimum markers per frame**: 4+ (one diamond face worth)
+
+### Input
+- Multiple SVO2 files (one per camera)
+- Frame sampling: Fixed interval + quality filter
+- Timestamp-aligned playback (using existing `svo_playback.py` pattern)
+
+### Output
+- **New JSON file** with calibrated extrinsics
+- Format: Similar to `inside_network.json` but with accurate `pose` field
+- Reference frame: **Marker is world origin** (all cameras expressed relative to ArUco box)
+
+### Workflow
+- **CLI with preview**: Command-line driven but shows visualization of detected markers
+- Example: `uv run calibrate_extrinsics.py --svos *.svo2 --interval 30 --output calibrated.json`
+
+## Technical Decisions
+
+### Intrinsics Source
+- Use ZED SDK's pre-calibrated intrinsics from `cam.get_camera_information().camera_configuration.calibration_parameters.left_cam`
+- Properties: `fx, fy, cx, cy, disto`
+
+### Pose Estimation
+- Use `cv2.solvePnP` with `SOLVEPNP_SQPNP` flag (from existing code)
+- Consider `solvePnPRansac` for per-frame robustness
+
+### Outlier Handling (Two-stage)
+1. **Per-frame rejection**: Reject frames with high reprojection error (threshold ~2-5 pixels)
+2. **RANSAC on pose set**: After collecting all valid poses, use RANSAC-style consensus
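A minimal sketch of the stage-1 per-frame gate, assuming the pose (`R`, `t`) has already been estimated and the image is the rectified left view, so no distortion term is applied. `camera_matrix`, `project_points`, `reprojection_rmse` and `accept_frame` are hypothetical helper names, and the 3 px default is only one point in the ~2-5 px range above:

```python
import numpy as np

def camera_matrix(fx, fy, cx, cy):
    """3x3 pinhole intrinsics built from the ZED left_cam fields."""
    return np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])

def project_points(obj_pts, R, t, K):
    """Project Nx3 marker-frame points to pixels (no distortion: rectified image)."""
    p_cam = obj_pts @ R.T + np.ravel(t)   # marker frame -> camera frame
    uvw = p_cam @ K.T                     # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]

def reprojection_rmse(obj_pts, img_pts, R, t, K):
    """Root-mean-square pixel distance between observed and reprojected corners."""
    err = project_points(obj_pts, R, t, K) - img_pts
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))

def accept_frame(obj_pts, img_pts, R, t, K, max_rmse_px=3.0):
    """Stage-1 filter: keep the frame only if its pose reprojects well."""
    return reprojection_rmse(obj_pts, img_pts, R, t, K) <= max_rmse_px
```

In the pipeline, `R, t` would plausibly come from `ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None, flags=cv2.SOLVEPNP_SQPNP)` followed by `R, _ = cv2.Rodrigues(rvec)`. `cv2.projectPoints` could replace `project_points`. The NumPy version is used here only to keep the sketch dependency-free.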
+
+### Pose Averaging
+- **Rotation**: Use `scipy.spatial.transform.Rotation.mean()`, which computes the chordal L2 mean (close to the geodesic mean when the rotations are tightly clustered)
+- **Translation**: Use median or weighted mean with MAD-based outlier rejection
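The sketch below shows what the two averaging steps compute. The rotation mean is the eigenvector of the largest eigenvalue of the weighted quaternion scatter matrix, which is the chordal L2 mean that `Rotation.mean()` returns. It is re-implemented in NumPy here only for illustration. The result is insensitive to the sign of each quaternion. The translation step is one possible MAD gate; the 1.4826 factor converts MAD into a Gaussian-equivalent standard deviation. The `k=3` gate and the 1 mm `min_sigma` floor are assumptions:

```python
import numpy as np

def chordal_mean_quat(quats, weights=None):
    """Chordal L2 mean of unit quaternions (any component order, sign-invariant)."""
    q = np.asarray(quats, float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    w = np.ones(len(q)) if weights is None else np.asarray(weights, float)
    M = (q * w[:, None]).T @ q            # 4x4 weighted scatter; q and -q contribute equally
    _, eigvecs = np.linalg.eigh(M)        # eigenvalues come back in ascending order
    return eigvecs[:, -1]                 # eigenvector of the largest eigenvalue

def robust_translation(ts, k=3.0, min_sigma=1e-3):
    """Median/MAD gate per axis, then average the surviving translations."""
    ts = np.asarray(ts, float)
    med = np.median(ts, axis=0)
    sigma = 1.4826 * np.median(np.abs(ts - med), axis=0)   # MAD -> std estimate
    sigma = np.maximum(sigma, min_sigma)                   # avoid a zero-width gate
    keep = np.all(np.abs(ts - med) <= k * sigma, axis=1)
    return ts[keep].mean(axis=0), keep
```

With SciPy the rotation part reduces to `Rotation.from_matrix(Rs).mean(weights=w)`, and `keep` can be reused to drop the same frames from the rotation set.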
+
+### Math: Camera-to-World Transform
+Each camera sees the marker → `T_cam_marker` (maps marker-frame points into the camera frame; this is what `solvePnP`'s `rvec`/`tvec` encode)
+World origin = marker, so camera pose in world = `T_world_cam = inv(T_cam_marker)`
+
+For camera i: `T_world_cam_i = inv(T_cam_i_marker)`
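A small sketch of the inversion, assuming `R` has already been obtained from `rvec` (for example via `cv2.Rodrigues`). For a rigid transform the inverse has the closed form `[R^T | -R^T t]`, so no general matrix inverse is needed. The last column of `T_world_cam` is then the camera centre in marker (world) coordinates. `make_T` and `invert_T` are hypothetical helper names:

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.ravel(t)
    return T

def invert_T(T):
    """Closed-form inverse of a rigid transform: [R | t]^-1 = [R^T | -R^T t]."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti
```

Usage: `T_world_cam_i = invert_T(make_T(R_i, tvec_i))`.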
+
+## Research Findings
+
+### From Librarian (Multi-camera calibration)
+- Relative transform: `T_BA = T_BM @ inv(T_AM)`
+- Board-based detection improves robustness to occlusion
+- Use `refineDetectedMarkers` for corner accuracy
+- Handle missing views by only computing poses when enough markers visible
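The relative-transform formula reads right to left: `inv(T_AM)` takes a point from camera A's frame back to the marker frame, and `T_BM` then carries it into camera B's frame. The sketch below is a minimal illustration using the same `T_cam_marker` convention as above (`make_T` and `relative_transform` are hypothetical names). With the marker as world origin, the same result is `inv(T_world_cam_B) @ T_world_cam_A`.

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.ravel(t)
    return T

def relative_transform(T_AM, T_BM):
    """T_BA maps points from camera A's frame into camera B's frame."""
    return T_BM @ np.linalg.inv(T_AM)
```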
+
+### From Librarian (Robust averaging)
+- Use `scipy.spatial.transform.Rotation.mean(weights=...)` for rotation averaging
+- Median/MAD on translation for outlier rejection
+- RANSAC over pose set with rotation angle + translation distance thresholds
+- Practical thresholds: reject poses that deviate by more than 2-5° in rotation; the translation threshold depends on scene scale
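A minimal sketch of consensus over the pose set, using the rotation-angle and translation-distance tests above. Instead of random sampling, it tries every per-frame pose as the hypothesis, which is a deterministic variant that stays cheap for tens to a few hundred poses. The 3° and 2 cm defaults are assumptions, and translations are assumed to be in metres. The names are hypothetical.

```python
import numpy as np

def rotation_angle_deg(Ra, Rb):
    """Geodesic angle (degrees) between two rotation matrices."""
    c = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

def pose_consensus(Rs, ts, max_rot_deg=3.0, max_trans=0.02):
    """Boolean inlier mask of the largest set of poses agreeing with one hypothesis."""
    Rs = np.asarray(Rs, float)
    ts = np.asarray(ts, float)
    best = np.zeros(len(Rs), dtype=bool)
    for i in range(len(Rs)):                      # every pose is tried as the hypothesis
        rot_ok = np.array([rotation_angle_deg(Rs[i], R) <= max_rot_deg for R in Rs])
        trans_ok = np.linalg.norm(ts - ts[i], axis=1) <= max_trans
        inliers = rot_ok & trans_ok
        if inliers.sum() > best.sum():            # keep the largest consensus set
            best = inliers
    return best
```

The resulting mask selects the poses that are passed on to the rotation/translation averaging step.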
+
+### Existing Codebase Patterns
+- `find_extrinsic_object.py`: ArUco detection + solvePnP pattern
+- `svo_playback.py`: Multi-SVO sync via timestamp alignment
+- `aruco_box.py`: Diamond board geometry generation
+
+## Open Questions
+- None remaining
+
+## Metis Gap Analysis (Addressed)
+
+### Critical Gaps Resolved:
+1. **World frame**: As defined in `standard_box_markers.parquet` (origin at box coordinate system)
+2. **Image stream**: Use rectified LEFT view (no distortion coefficients needed)
+3. **Transform convention**: Match `inside_network.json` format - appears to be T_world_from_cam (camera pose in world)
+   - Format: space-separated 4x4 matrix, row-major
+4. **Sync tolerance**: Moderate (<33ms, 1 frame at 30fps)
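A small sketch of the stated pose format (16 space-separated numbers, row-major). It assumes the string is what goes into the `pose` field; the rest of the JSON layout is not specified here. `pose_to_string` and `string_to_pose` are hypothetical names. `repr(float)` is used because it produces the shortest text that parses back to the identical float, so a save/load round trip is lossless.

```python
import numpy as np

def pose_to_string(T):
    """Serialize a 4x4 pose as 16 space-separated numbers in row-major order."""
    T = np.asarray(T, dtype=float)
    if T.shape != (4, 4):
        raise ValueError(f"expected a 4x4 matrix, got {T.shape}")
    return " ".join(repr(float(v)) for v in T.ravel(order="C"))

def string_to_pose(s):
    """Parse a row-major pose string back into a 4x4 array."""
    vals = np.array([float(x) for x in s.split()])
    if vals.size != 16:
        raise ValueError(f"expected 16 values, got {vals.size}")
    return vals.reshape(4, 4)   # NumPy's default reshape order is row-major (C)
```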
+
+### Guardrails Added:
+- Validate parquet schema early (require marker_id, corners with X,Y,Z in meters)
+- Use reprojection error as primary quality metric
+- Require ≥4 markers with sufficient 3D spread (not just coplanar)
+- Whitelist only expected marker IDs (from parquet)
+- Add self-check mode with quantitative quality report
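One way to implement the "sufficient 3D spread" guardrail is to compare singular values of the centred marker corners. The smallest singular value measures spread along the thinnest direction. Dividing it by the largest makes the test independent of scale. Corners from a single face give a ratio near zero, while corners from two box faces do not. This is a sketch. The 0.05 cut-off and the name `has_3d_spread` are assumptions.

```python
import numpy as np

def has_3d_spread(points, min_ratio=0.05):
    """True if the Nx3 object points are not (nearly) coplanar."""
    p = np.asarray(points, float)
    if len(p) < 4:
        return False
    s = np.linalg.svd(p - p.mean(axis=0), compute_uv=False)   # descending order
    return bool(s[2] / s[0] >= min_ratio)
```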
+
+## Scope Boundaries
+
+### INCLUDE
+- SVO file loading with timestamp sync
+- ArUco detection on left camera image
+- Pose estimation using solvePnP
+- Per-frame quality filtering (reprojection error)
+- Multi-frame pose averaging with outlier rejection
+- JSON output with 4x4 pose matrices
+- Preview visualization showing detected markers and axes
+- CLI interface with click
+
+### EXCLUDE
+- Right camera processing (use left only for simplicity)
+- Intrinsic calibration (use pre-calibrated from ZED SDK)
+- Modifying `inside_network.json` in-place
+- GUI-based frame selection
+- Bundle adjustment refinement
+- Depth-based verification
@@ -0,0 +1,55 @@
+# Draft: Depth-Based Extrinsic Verification/Fusion
+
+## Requirements (confirmed)
+
+- **Primary Goal**: Both verify AND refine extrinsics using depth data
+- **Integration**: Add to existing `calibrate_extrinsics.py` CLI (new flags)
+- **Depth Mode**: CLI argument with default to NEURAL_PLUS (or NEURAL)
+- **Target Geometry**: Any markers (from parquet file), not just ArUco box
+
+## Technical Decisions
+
+- Use ZED SDK `cam.retrieve_measure(depth_mat, sl.MEASURE.DEPTH)` for depth maps
+- Extend `SVOReader` to optionally enable depth mode
+- Compute depth residuals at detected marker corner positions
+- Use residual statistics for verification metrics
+- ICP or optimization for refinement (if requested)
+
+## Research Findings
+
+### Depth Residual Formula
+For 3D point P_world with world-to-camera extrinsics (R, t), i.e. the inverse of the stored camera pose `T_world_cam`:
+```
+P_cam = R @ P_world + t
+z_predicted = P_cam[2]
+(u, v) = project(P_cam, K)
+z_measured = depth_map[v, u]
+residual = z_measured - z_predicted
+```
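A vectorized sketch of the residual formula above, assuming a rectified (undistorted) depth map, nearest-pixel lookup, and depth stored as a plain NumPy array in the same unit as `P_world`. Points behind the camera, projections outside the image, and non-finite depth values (NaN/±inf) all yield NaN rather than a residual. `depth_residuals` is a hypothetical name.

```python
import numpy as np

def depth_residuals(P_world, R, t, K, depth_map):
    """z_measured - z_predicted for each 3D point; NaN where it cannot be measured."""
    P_cam = P_world @ R.T + np.ravel(t)                # world -> camera frame
    z_pred = P_cam[:, 2]
    res = np.full(len(P_world), np.nan)
    front = np.flatnonzero(z_pred > 0)                 # points in front of the camera
    uvw = P_cam[front] @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)    # nearest-pixel lookup
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = depth_map.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx, u, v = front[inside], u[inside], v[inside]
    z_meas = depth_map[v, u]
    valid = np.isfinite(z_meas)                        # treat NaN/inf depth as missing
    res[idx[valid]] = z_meas[valid] - z_pred[idx[valid]]
    return res
```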
+
+### Verification Metrics
+- Mean absolute residual
+- RMSE
+- Depth-normalized error: |r| / z_pred
+- Spatial bias detection (residual vs pixel position)
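The metrics above can be computed directly from the residual vector. The sketch below assumes NaN marks unmeasurable points. It adds a signed mean (`bias`) because a consistent sign suggests a systematic depth offset rather than noise. For spatial bias it fits a plane `r ≈ a·u + b·v + c` over pixel coordinates; a clear gradient may hint at a small rotation error. The function and key names are hypothetical.

```python
import numpy as np

def residual_metrics(residuals, z_pred):
    """Summary statistics over finite residuals."""
    r = np.asarray(residuals, float)
    z = np.asarray(z_pred, float)
    ok = np.isfinite(r) & (z > 0)
    r, z = r[ok], z[ok]
    if r.size == 0:
        return {"n": 0}
    return {
        "n": int(r.size),
        "mean_abs": float(np.mean(np.abs(r))),
        "rmse": float(np.sqrt(np.mean(r ** 2))),
        "mean_rel": float(np.mean(np.abs(r) / z)),   # depth-normalized |r| / z_pred
        "bias": float(np.mean(r)),                   # signed mean
    }

def spatial_bias(residuals, u, v):
    """Least-squares plane r ~ a*u + b*v + c over pixel positions; returns (a, b, c)."""
    A = np.column_stack([u, v, np.ones(len(u))])
    coef, *_ = np.linalg.lstsq(A, np.asarray(residuals, float), rcond=None)
    return coef
```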
+
+### Refinement Approach
+- ICP (Iterative Closest Point) on depth points near markers
+- Point-to-plane ICP for better convergence
+- Initialize with ArUco pose, refine with depth
+
+## User Decisions (Round 2)
+
+- **Refinement Method**: Direct optimization (minimize depth residuals to adjust extrinsics)
+- **Verification Output**: Full reporting (console + JSON + optional CSV)
+- **Depth Filtering**: Confidence-based (use ZED confidence threshold + range limits)
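A sketch of the confidence-plus-range filter. It assumes the ZED convention that confidence values lie in [1, 100] and that *larger* values mean *less* certain depth, mirroring how the SDK's `confidence_threshold` rejects pixels; verify this against the SDK version in use. The threshold and the 0.3-8 m range (depth assumed in metres) are placeholders, not tuned values.

```python
import numpy as np

def valid_depth_mask(depth, confidence, max_conf=50.0, z_min=0.3, z_max=8.0):
    """Boolean mask of depth pixels worth using for verification/refinement.

    ASSUMPTION: lower confidence value = more certain depth (ZED-style),
    so pixels with confidence <= max_conf are kept.
    """
    depth = np.asarray(depth, float)
    conf = np.asarray(confidence, float)
    return (np.isfinite(depth) & (depth >= z_min) & (depth <= z_max)
            & (conf <= max_conf))
```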
+
+## Open Questions
+
+- Test strategy: TDD or tests after?
+- Minimum markers/frames for reliable depth verification?
+
+## Scope Boundaries
+
+- INCLUDE: Depth retrieval, residual computation, verification metrics, optional ICP refinement
+- EXCLUDE: Bundle adjustment, SLAM, right camera processing
@@ -0,0 +1,3 @@
+# Draft: SUPERSEDED
+
+This draft has been superseded by the final plan at `.sisyphus/plans/depth-refinement-robust.md`.
@@ -0,0 +1,79 @@
+# Draft: Ground Plane Refinement & Depth Map Persistence
+
+## Requirements (confirmed)
+- **Core problem**: Camera disagreement — different cameras don't agree on where the ground is (floor at different heights/angles)
+- **Depth saving**: Save BOTH pooled depth maps AND raw best-scored frames per camera, so pooling parameters can be re-tuned without re-reading SVOs
+- **Integration**: Post-processing step — a new standalone CLI tool that loads existing extrinsics + saved depth data and refines
+- **Library**: TBD — user wants to understand trade-offs before committing
+
+## Technical Decisions
+- Post-processing approach: non-invasive, loads existing calibration JSON + depth data
+- Depth saving happens inside calibrate_extrinsics.py (or triggered by flag)
+- Ground refinement tool is a NEW script (e.g., `refine_ground_plane.py`)
+
+## Research Findings
+- **Current alignment.py**: Aligns world frame based on marker face normals, NOT actual floor geometry
+- **Current depth_pool.py**: Per-pixel median pooling exists, but result is discarded after use (never saved)
+- **Current depth_refine.py**: Optimizes 6-DOF per camera using depth at marker corners only (sparse)
+- **compare_pose_sets.py**: Has Kabsch `rigid_transform_3d()` for point-set alignment
+- **Available deps**: numpy, scipy, opencv — sufficient for RANSAC plane fitting
+- **Open3D**: Provides ICP, RANSAC, and visualization, but is a heavy dependency (~500MB)
+
+## Open Questions (Resolved)
+- **Camera count**: 2-4 cameras (small setup, likely some floor overlap)
+- **Observation method**: Point clouds don't align when overlayed in world coords
+- **Error magnitude**: Small — 1-3° tilt, <2cm offset (fine-tuning level)
+- **Floor type**: TBD (assumed flat for now)
+- **Library choice**: TBD — recommendation below
+
+## Library Recommendation Analysis
+Given: 2-4 cameras, small errors, flat floor assumption, post-processing tool
+
+**numpy/scipy approach**:
+- RANSAC plane fitting: trivial with numpy (random sample 3 points, fit plane, count inliers)
+- Plane-to-plane alignment: rotation_align_vectors already exists in alignment.py
+- Point cloud generation from depth+intrinsics: simple numpy vectorized operation
+- Kabsch alignment: already exists in compare_pose_sets.py
+- Verdict: **SUFFICIENT for this use case**. No ICP needed since we're fitting to a known target (Y=0 plane).
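To make the "trivial with numpy" claim concrete, here is a sketch of depth unprojection plus a three-point RANSAC plane fit. It finishes with an SVD least-squares refit on the inliers. Assumptions: depth in metres, rectified pinhole intrinsics, a 1 cm inlier threshold and a fixed seed. The function names are hypothetical. This is not the existing `unproject_depth_to_points`, and in the final design Open3D's segmentation replaces the RANSAC part.

```python
import numpy as np

def unproject(depth, fx, fy, cx, cy):
    """Depth map (H x W) -> N x 3 camera-frame points; invalid pixels dropped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    ok = np.isfinite(depth) & (depth > 0)
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x[ok], y[ok], depth[ok]], axis=1)

def to_world(P_cam, T_world_cam):
    """Apply a camera pose (T_world_cam) to camera-frame points."""
    return P_cam @ T_world_cam[:3, :3].T + T_world_cam[:3, 3]

def ransac_plane(points, dist_thresh=0.01, iters=500, seed=0):
    """Fit n . p + d = 0 (unit n) by RANSAC; returns (n, d, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_mask = None
    for _ in range(iters):
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(b - a, c - a)
        norm = np.linalg.norm(n)
        if norm < 1e-12:                       # collinear sample, no plane
            continue
        n = n / norm
        mask = np.abs(points @ n - n @ a) < dist_thresh
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask = mask
    if best_mask is None:
        raise ValueError("no non-degenerate sample found")
    inl = points[best_mask]                    # least-squares refit on inliers
    centroid = inl.mean(axis=0)
    n = np.linalg.svd(inl - centroid, full_matrices=False)[2][-1]
    return n, float(-n @ centroid), best_mask
```

The returned normal has an arbitrary sign. Any later alignment step therefore has to orient it consistently first.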
+
+**Open3D approach**:
+- Overkill for plane fitting + rotation correction
+- Would be useful if we needed dense ICP between overlapping point clouds
+- 500MB dep for what amounts to ~50 lines of numpy code
+- Verdict: **Not needed for the initial version**
+
+**Decision**: Use Open3D for point cloud operations anyway, despite the verdict above, because the user wants it available for future work.
+Also add h5py for HDF5 depth map persistence.
+
+## Confirmed Technical Choices
+- **Library**: Open3D (RANSAC plane segmentation, ICP if needed, point cloud ops)
+- **Depth save format**: HDF5 via h5py (structured, metadata-rich, one file per camera)
+- **Visualization**: Plotly HTML (interactive 3D — floor points per camera, consensus plane, before/after)
+- **Integration**: Standalone post-processing CLI tool (click-based, like existing tools)
+- **Implementation split**: numpy/scipy for math, Open3D for geometry, existing alignment.py patterns
+
+## Algorithm (confirmed via research + codebase analysis)
+1. Load existing extrinsics JSON + saved depth maps (HDF5)
+2. Per camera: unproject depth → world-coord point cloud using extrinsics
+3. Per camera: Open3D RANSAC plane segmentation → extract floor points
+4. Consensus: fit a single plane to ALL floor points from all cameras
+5. Compute correction rotation: align consensus plane normal to [0, -1, 0]
+6. Apply correction to all extrinsics (global rotation, like current alignment.py)
+7. Optionally: per-camera ICP refinement on overlapping floor regions
+8. Save corrected extrinsics JSON + generate diagnostic Plotly visualization
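Steps 5-6 can be sketched with the Rodrigues vector-alignment formula `R = I + [v]× + [v]×² / (1 + c)`, where `v = a × b` and `c = a · b`. The antiparallel case is handled separately as a 180° turn. This sketch is a standalone stand-in, not the existing `rotation_align_vectors` in alignment.py. It flips the RANSAC normal toward the target first, which assumes the current extrinsics are already roughly level (consistent with the 1-3° error estimate). It is rotation-only and does not by itself correct a height offset. The names are hypothetical.

```python
import numpy as np

def rotation_aligning(a, b):
    """Smallest rotation R with R @ a_hat = b_hat."""
    a = np.asarray(a, float) / np.linalg.norm(a)
    b = np.asarray(b, float) / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(a @ b)
    if c < -1.0 + 1e-9:
        # antiparallel: rotate 180 deg about any axis perpendicular to a
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def level_extrinsics(floor_normal, poses, target=(0.0, -1.0, 0.0)):
    """Left-multiply every T_world_cam by one correction so the floor normal maps to target."""
    n = np.asarray(floor_normal, float)
    t = np.asarray(target, float)
    if n @ t < 0:               # RANSAC normals have arbitrary sign
        n = -n
    C = np.eye(4)
    C[:3, :3] = rotation_aligning(n, t)
    return {name: C @ T for name, T in poses.items()}, C
```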
+
+## Final Decisions (all confirmed)
+- **Depth save trigger**: `--save-depth <dir>` flag in calibrate_extrinsics.py
+- **Refinement granularity**: Per-camera refinement (each camera corrected based on its own floor observations)
+- **Test strategy**: TDD — write tests first, following existing test patterns in tests/
+
+## Scope Boundaries
+- INCLUDE: Depth map saving (HDF5), ground plane detection per camera, consensus plane fitting, per-camera extrinsic correction
+- INCLUDE: Standalone post-processing CLI tool (`refine_ground_plane.py`)
+- INCLUDE: Plotly diagnostic visualization
+- INCLUDE: TDD with pytest
+- INCLUDE: New deps: open3d, h5py
+- EXCLUDE: Modifying the core ArUco detection or PnP pipeline
+- EXCLUDE: Real-time / streaming refinement
+- EXCLUDE: Non-flat floor handling (ramps, stairs)
+- EXCLUDE: Dense multi-view reconstruction beyond floor plane
@@ -0,0 +1,93 @@
+# Draft: ICP Registration for Multi-Camera Extrinsic Refinement
+
+## Requirements (confirmed)
+- ICP role: **Complement** existing RANSAC ground-plane — chain after RANSAC leveling
+- Multi-camera strategy: **Global pose-graph optimization** (pairwise ICP → pose graph)
+- Point cloud scope: **Near-floor band** (floor_y to floor_y + band_height, ~30cm default) — includes slight 3D structure (baseboards, table legs) for better ICP constraints
+- DOF constraint: **Gravity-constrained** — ICP refines yaw + XZ translation + small height; pitch/roll regularized (soft penalty) to preserve RANSAC gravity alignment
+
+## Technical Decisions
+- Open3D already a dependency — no new deps needed
+- **Two ICP methods**: Point-to-Plane (default) + GICP (optional via --icp-method)
+- Voxel downsampling for performance (3-5cm voxel size)
+- Reference camera fixed during optimization
+- Robust kernel (Tukey/Huber) for outlier rejection
+- Colored ICP deferred (requires RGB pipeline plumbing — see analysis below)
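Robust kernels work by reweighting residuals inside an iteratively reweighted least-squares loop. The sketch below gives the standard Huber and Tukey weights and applies them to a 1-D offset estimate, the same reweighting a robust-kernel ICP applies to correspondence residuals. Huber only down-weights a gross outlier, while Tukey removes it completely. The threshold `k` and all names are illustrative assumptions. In Open3D the equivalent is to pass `o3d.pipelines.registration.TukeyLoss(k)` or `HuberLoss(k)` to `TransformationEstimationPointToPlane`.

```python
import numpy as np

def huber_weight(r, k):
    """IRLS weight for the Huber loss: 1 inside k, k/|r| outside."""
    a = np.abs(r)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def tukey_weight(r, k):
    """IRLS weight for Tukey's biweight: falls smoothly to 0 at |r| = k."""
    a = np.abs(r)
    return np.where(a <= k, (1.0 - (a / k) ** 2) ** 2, 0.0)

def irls_offset(x, weight_fn, k, iters=20):
    """Robust 1-D location estimate by iteratively reweighted least squares."""
    x = np.asarray(x, float)
    mu = float(np.median(x))                  # robust starting point
    for _ in range(iters):
        w = weight_fn(x - mu, k)
        if w.sum() == 0:                      # every point rejected; keep last estimate
            break
        mu = float(np.sum(w * x) / np.sum(w))
    return mu
```

On offsets such as `[0.010, 0.012, 0.009, 0.011, 0.5]` with `k = 0.05`, the Tukey estimate stays near 0.0105. The plain mean is pulled above 0.1, and Huber lands in between.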
+
+## Research Findings
+- `unproject_depth_to_points` already exists in `aruco/ground_plane.py`
+- `detect_floor_plane` already does RANSAC segmentation → can reuse inlier indices for floor filtering
+- Open3D `registration_icp` + `PoseGraph` + `global_optimization` = full pipeline
+- Multi-scale ICP (coarse→fine voxel) recommended for robustness
+- `get_information_matrix_from_point_clouds` provides edge weights for pose graph
+- Existing pipeline: unproject → RANSAC detect → consensus → correct (pitch/roll/Y only)
+- ICP addition: after RANSAC correction → extract floor points → pairwise ICP → pose graph → refine all 6 DOF
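Voxel downsampling replaces all points that fall in the same cubic cell with their centroid. A coarse-to-fine ICP then runs on a short pyramid of such clouds. The NumPy sketch below is an illustration only. Open3D offers `PointCloud.voxel_down_sample(voxel_size)` for the same job. The 5/4/3 cm schedule simply spans the 3-5 cm range mentioned above and is an assumption.

```python
import numpy as np

def voxel_downsample(points, voxel):
    """Replace all points in the same voxel by their centroid."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()              # inverse shape differs across NumPy versions
    sums = np.zeros((len(counts), 3))
    np.add.at(sums, inverse, points)       # accumulate points per voxel
    return sums / counts[:, None]

def voxel_pyramid(points, sizes=(0.05, 0.04, 0.03)):
    """Coarse-to-fine list of downsampled clouds for multi-scale ICP."""
    return [voxel_downsample(points, s) for s in sizes]
```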
+
+## Resolved Questions
+- Overlap detection: **Bounding-box overlap check** on world XZ projections
+- DOF: **Full 6-DOF** refinement (ICP refines all rotation + translation)
+- CLI integration: **Flag on refine_ground_plane.py** (--icp/--no-icp)
+- CLI complexity: **Minimal flags + defaults** (--icp, maybe --icp-voxel-size, rest uses hardcoded defaults)
+- Test strategy: **Tests-after** (implement ICP, then add tests)
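A sketch of the bounding-box overlap check, assuming each camera's floor points are already in world coordinates with Y as the vertical axis, so the ground footprint is the XZ projection. Only pairs whose boxes share at least `min_area` (a hypothetical 0.25 m² default) become pairwise ICP edges in the pose graph. The names are hypothetical.

```python
import numpy as np

def xz_bbox(points_world):
    """Axis-aligned bounding box of the points projected onto the XZ plane."""
    xz = np.asarray(points_world, float)[:, [0, 2]]
    return xz.min(axis=0), xz.max(axis=0)

def xz_overlap_area(pts_a, pts_b):
    """Area of the intersection of two XZ bounding boxes (0 if disjoint)."""
    (amin, amax), (bmin, bmax) = xz_bbox(pts_a), xz_bbox(pts_b)
    size = np.clip(np.minimum(amax, bmax) - np.maximum(amin, bmin), 0.0, None)
    return float(size[0] * size[1])

def overlapping_pairs(clouds, min_area=0.25):
    """Camera pairs (by name) whose floor footprints overlap enough for ICP."""
    names = list(clouds)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if xz_overlap_area(clouds[a], clouds[b]) >= min_area]
```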
+
+## Open Questions
+- (none remaining)
+
+## Colored ICP Analysis (2025-02-09)
+
+### What Colored ICP Does
+Open3D's `registration_colored_icp` (Park et al., ICCV 2017) optimizes a joint objective:
+`E = λ·E_geom + (1-λ)·E_photo`, where λ (Open3D's `lambda_geometric`) defaults to 0.968, so the geometric term dominates.
+It combines point-to-plane geometric distance with photometric (color) consistency.
+
+### When It Helps
+- **Planar/low-geometry environments**: Floor is exactly this — a flat plane where
+  geometric ICP can "slide" along the tangent plane. Color information "locks" the
+  translation along axes where geometry alone is degenerate.
+- **Sub-millimeter polish**: Color provides a dense signal that geometry misses due to
+  depth quantization in stereo cameras.
+
+### When It Hurts / Failure Modes
+- **Lighting inconsistency**: If cameras have different auto-exposure/white-balance, the
+  photometric term introduces bias instead of helping.
+- **Textureless floors**: Plain concrete/linoleum floors have near-zero color gradient,
+  making the photometric term useless (falls back to geometric ICP anyway).
+- **Computational overhead**: Requires RGB data, color gradient computation, ~2-3x slower.
+
+### Critical Data Pipeline Issue
+**The current HDF5 depth storage pipeline does NOT save RGB images.**
+- `depth_save.py` only stores: `pooled_depth`, `pooled_confidence`, `intrinsics`, `raw_frames`
+- `raw_frames` only contain `depth_map` and `confidence_map` — no `image` field
+- `FrameData` in `svo_sync.py` DOES have an `image` field (BGRA from ZED), but it's
+  discarded when saving to HDF5
+- To enable colored ICP, we'd need to:
+  1. Extend `save_depth_data` to also store RGB images (significant HDF5 size increase)
+  2. Extend `load_depth_data` to return images
+  3. Modify `refine_ground_plane.py` to pass images through the pipeline
+  4. Create RGBD → colored PointCloud conversion using `o3d.geometry.RGBDImage`
+
+### Recommendation
+**Defer colored ICP to a future iteration.** Reasons:
+1. With a floor-only scope, colored ICP only pays off when the floor has texture; on a
+   textureless floor it reduces to geometric ICP. The near-floor band (baseboards, table
+   legs) already adds out-of-plane structure that limits point-to-plane ICP sliding.
+2. Significant plumbing work to save/load/pass RGB through the pipeline.
+3. The initial pose from ArUco markers is already very good (~cm accuracy), so ICP
+   only needs to refine by a few mm — well within geometric ICP's capability.
+4. Can be added later as an enhancement flag (--icp-method color) without redesigning
+   the core ICP module.
+5. If later we expand beyond floor-only to full scene registration, colored ICP becomes
+   much more compelling and worth the investment.
+
+### Alternative: Generalized ICP (GICP)
+- Purely geometric, no RGB needed — same data pipeline as point-to-plane
+- Models local structure as Gaussian distributions ("plane-to-plane")
+- More robust than point-to-plane for noisy stereo data
+- Available as `o3d.pipelines.registration.registration_generalized_icp`
+- **Worth considering as a --icp-method option alongside point-to-plane**
+
+## Scope Boundaries
+- INCLUDE: ICP registration module, pose-graph optimization, CLI integration, tests, docs
+- INCLUDE (stretch): GICP as alternative ICP method option (same data pipeline, no extra plumbing)
+- EXCLUDE: colored ICP (requires RGB pipeline work — future enhancement)
+- EXCLUDE: real-time/streaming ICP