refactor: things

2026-03-06 17:17:59 +08:00
parent 8c6087683f
commit 33ab1a5d9d
171 changed files with 293 additions and 29894 deletions
@@ -0,0 +1,108 @@
# Draft: ArUco-Based Multi-Camera Extrinsic Calibration from SVO
## Requirements (confirmed)
### Goal
Create a CLI tool that reads synchronized SVO recordings from multiple ZED cameras, detects ArUco markers on a 3D calibration box, computes camera extrinsics relative to the marker world origin, and outputs accurate pose matrices to replace the inaccurate ones in `inside_network.json`.
### Calibration Target
- **Type**: 3D box with 6 diamond board faces
- **Object points**: Defined in `aruco/output/standard_box_markers.parquet`
- **Marker dictionary**: `DICT_4X4_50` (from existing code)
- **Minimum markers per frame**: 4+ (one diamond face worth)
### Input
- Multiple SVO2 files (one per camera)
- Frame sampling: Fixed interval + quality filter
- Timestamp-aligned playback (using existing `svo_playback.py` pattern)
### Output
- **New JSON file** with calibrated extrinsics
- Format: Similar to `inside_network.json` but with accurate `pose` field
- Reference frame: **Marker is world origin** (all cameras expressed relative to ArUco box)
### Workflow
- **CLI with preview**: Command-line driven but shows visualization of detected markers
- Example: `uv run calibrate_extrinsics.py --svos *.svo2 --interval 30 --output calibrated.json`
## Technical Decisions
### Intrinsics Source
- Use ZED SDK's pre-calibrated intrinsics from `cam.get_camera_information().camera_configuration.calibration_parameters.left_cam`
- Properties: `fx, fy, cx, cy, disto`
### Pose Estimation
- Use `cv2.solvePnP` with `SOLVEPNP_SQPNP` flag (from existing code)
- Consider `solvePnPRansac` for per-frame robustness
### Outlier Handling (Two-stage)
1. **Per-frame rejection**: Reject frames with high reprojection error (threshold ~2-5 pixels)
2. **RANSAC on pose set**: After collecting all valid poses, use RANSAC-style consensus
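A minimal sketch of the per-frame check in stage 1, assuming rectified images (so a plain pinhole projection without distortion) and a pose `(R, t)` that maps marker-frame points into the camera frame. The names `reprojection_rmse` and `keep_frame` and the 3 px default are illustrative, not taken from the existing code.

```python
import numpy as np

def reprojection_rmse(object_pts, image_pts, R, t, K):
    """RMS pixel error between detected corners and reprojected 3D corners.

    object_pts: (N, 3) marker-frame points, image_pts: (N, 2) detected pixels,
    R, t: marker-to-camera rotation and translation, K: 3x3 intrinsics.
    """
    cam = object_pts @ R.T + t           # marker frame -> camera frame
    uv = cam @ K.T                       # homogeneous pixel coordinates
    uv = uv[:, :2] / uv[:, 2:3]          # perspective divide
    err = np.linalg.norm(uv - image_pts, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

def keep_frame(rmse_px, max_px=3.0):
    """Stage-1 gate: accept the frame only if the error is under the threshold."""
    return rmse_px <= max_px
```

Frames that pass this gate would then feed the pose-set consensus in stage 2.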
### Pose Averaging
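- **Rotation**: Use `scipy.spatial.transform.Rotation.mean()`, which computes the chordal L2 mean (a close approximation of the geodesic mean when rotations are tightly clustered)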
- **Translation**: Use median or weighted mean with MAD-based outlier rejection
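A numpy-only sketch of both averages, for illustration. `chordal_mean_rotation` projects the weighted sum of rotation matrices onto SO(3), which is the same chordal L2 mean that `Rotation.mean()` computes. `robust_translation` is one possible reading of "MAD-based rejection": it uses the median distance to the per-axis median as the scale. Both function names and the `k=3.0` cutoff are assumptions.

```python
import numpy as np

def chordal_mean_rotation(Rs, weights=None):
    """Chordal L2 mean of rotation matrices via SVD projection onto SO(3)."""
    Rs = np.asarray(Rs, dtype=float)                      # (N, 3, 3)
    w = np.ones(len(Rs)) if weights is None else np.asarray(weights, dtype=float)
    M = np.einsum("n,nij->ij", w, Rs)                     # weighted sum
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # keep det = +1
    return U @ D @ Vt

def robust_translation(ts, k=3.0):
    """Mean of translations lying within k robust sigmas of the median."""
    ts = np.asarray(ts, dtype=float)                      # (N, 3)
    med = np.median(ts, axis=0)
    dist = np.linalg.norm(ts - med, axis=1)
    scale = 1.4826 * np.median(dist) + 1e-12              # MAD-style scale
    mask = dist <= k * scale
    return ts[mask].mean(axis=0), mask
```

The returned mask makes it easy to report how many frames were rejected per camera.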
### Math: Camera-to-World Transform
Each camera sees the marker → `T_cam_marker` (marker pose in the camera frame: maps marker-frame points into camera coordinates, as returned by `solvePnP`)
World origin = marker, so camera pose in world = `T_world_cam = inv(T_cam_marker)`
For camera i: `T_world_cam_i = inv(T_cam_i_marker)`
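As a sketch, the inversion can use the closed form for rigid transforms rather than a general matrix inverse. The helper names `make_transform` and `invert_rigid` are hypothetical; `R` and `t` stand for the rotation matrix and translation obtained from the PnP result.

```python
import numpy as np

def make_transform(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.ravel(t)
    return T

def invert_rigid(T):
    """Invert a rigid transform: inv([R t; 0 1]) = [R^T  -R^T t; 0 1]."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# T_world_cam = invert_rigid(T_cam_marker); its last column is the camera
# position expressed in the marker (world) frame.
```

The closed form avoids the small numerical drift that `np.linalg.inv` can introduce in the bottom row.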
## Research Findings
### From Librarian (Multi-camera calibration)
- Relative transform: `T_BA = T_BM @ inv(T_AM)`
- Board-based detection improves robustness to occlusion
- Use `refineDetectedMarkers` for corner accuracy
- Handle missing views by only computing poses when enough markers visible
### From Librarian (Robust averaging)
- Use `scipy.spatial.transform.Rotation.mean(weights=...)` for rotation averaging
- Median/MAD on translation for outlier rejection
- RANSAC over pose set with rotation angle + translation distance thresholds
- Practical thresholds: reject poses deviating from the consensus by more than 2-5° in rotation; the translation threshold depends on scene scale
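One way to sketch the consensus step over a pose set. Because the number of per-frame poses is small, this version tries every pose as the hypothesis instead of sampling randomly, so it is an exhaustive variant of RANSAC rather than RANSAC proper. The function names and the 3° / 2 cm defaults are assumptions to be tuned.

```python
import numpy as np

def rotation_angle_deg(Ra, Rb):
    """Angle of the relative rotation between two rotation matrices, in degrees."""
    cos = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def consensus_inliers(poses, max_angle_deg=3.0, max_dist=0.02):
    """Return the largest set of 4x4 poses that agree with a single candidate."""
    best = np.zeros(len(poses), dtype=bool)
    for Tc in poses:                         # every pose acts as a hypothesis
        mask = np.array([
            rotation_angle_deg(Tc[:3, :3], T[:3, :3]) <= max_angle_deg
            and np.linalg.norm(Tc[:3, 3] - T[:3, 3]) <= max_dist
            for T in poses
        ])
        if mask.sum() > best.sum():
            best = mask
    return best
```

The inlier poses would then go into the rotation and translation averaging.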
### Existing Codebase Patterns
- `find_extrinsic_object.py`: ArUco detection + solvePnP pattern
- `svo_playback.py`: Multi-SVO sync via timestamp alignment
- `aruco_box.py`: Diamond board geometry generation
## Open Questions
- None remaining
## Metis Gap Analysis (Addressed)
### Critical Gaps Resolved:
1. **World frame**: As defined in `standard_box_markers.parquet` (origin at box coordinate system)
2. **Image stream**: Use rectified LEFT view (no distortion coefficients needed)
3. **Transform convention**: Match `inside_network.json` format - appears to be T_world_from_cam (camera pose in world)
   - Format: space-separated 4x4 matrix, row-major
4. **Sync tolerance**: Moderate (<33ms, 1 frame at 30fps)
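A small sketch of reading and writing the `pose` string, assuming the layout really is 16 space-separated values in row-major order as noted in item 3. The helper names are hypothetical and the JSON structure around the field is not shown.

```python
import numpy as np

def pose_to_string(T):
    """Flatten a 4x4 matrix row by row into a space-separated string."""
    flat = np.asarray(T, dtype=float).reshape(-1)   # C order = row-major
    return " ".join(f"{float(v):.9g}" for v in flat)

def pose_from_string(s):
    """Parse 16 space-separated numbers back into a 4x4 matrix."""
    vals = [float(x) for x in s.split()]
    if len(vals) != 16:
        raise ValueError(f"expected 16 values, got {len(vals)}")
    return np.array(vals).reshape(4, 4)
```

A round trip through both helpers is a cheap sanity check on the convention before writing the output file.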
### Guardrails Added:
- Validate parquet schema early (require marker_id, corners with X,Y,Z in meters)
- Use reprojection error as primary quality metric
- Require ≥4 markers with sufficient 3D spread (not just coplanar)
- Whitelist only expected marker IDs (from parquet)
- Add self-check mode with quantitative quality report
## Scope Boundaries
### INCLUDE
- SVO file loading with timestamp sync
- ArUco detection on left camera image
- Pose estimation using solvePnP
- Per-frame quality filtering (reprojection error)
- Multi-frame pose averaging with outlier rejection
- JSON output with 4x4 pose matrices
- Preview visualization showing detected markers and axes
- CLI interface with click
### EXCLUDE
- Right camera processing (use left only for simplicity)
- Intrinsic calibration (use pre-calibrated from ZED SDK)
- Modifying `inside_network.json` in-place
- GUI-based frame selection
- Bundle adjustment refinement
- Depth-based verification
@@ -0,0 +1,55 @@
# Draft: Depth-Based Extrinsic Verification/Fusion
## Requirements (confirmed)
- **Primary Goal**: Both verify AND refine extrinsics using depth data
- **Integration**: Add to existing `calibrate_extrinsics.py` CLI (new flags)
- **Depth Mode**: CLI argument with default to NEURAL_PLUS (or NEURAL)
- **Target Geometry**: Any markers (from parquet file), not just ArUco box
## Technical Decisions
- Use ZED SDK `retrieve_measure(MEASURE.DEPTH)` for depth maps
- Extend `SVOReader` to optionally enable depth mode
- Compute depth residuals at detected marker corner positions
- Use residual statistics for verification metrics
- ICP or optimization for refinement (if requested)
## Research Findings
### Depth Residual Formula
For a 3D point P_world and world-to-camera extrinsics (R, t), i.e. the inverse of the camera pose in world:
```
P_cam = R @ P_world + t
z_predicted = P_cam[2]
(u, v) = project(P_cam, K)
z_measured = depth_map[v, u]
residual = z_measured - z_predicted
```
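The formula above as a runnable numpy sketch. It assumes `(R, t)` maps world points into the camera frame, a pinhole `K` without distortion, and nearest-pixel lookup. The function name and the choice to return `None` for invalid samples are illustrative.

```python
import numpy as np

def depth_residual(P_world, R, t, K, depth_map):
    """Measured minus predicted depth at the pixel where P_world projects.

    Returns None if the point is behind the camera, projects outside the
    image, or the depth map has no valid value there.
    """
    P_cam = R @ np.asarray(P_world, dtype=float) + t
    z_pred = P_cam[2]
    if z_pred <= 0:
        return None
    u = K[0, 0] * P_cam[0] / z_pred + K[0, 2]
    v = K[1, 1] * P_cam[1] / z_pred + K[1, 2]
    ui, vi = int(round(u)), int(round(v))
    h, w = depth_map.shape
    if not (0 <= ui < w and 0 <= vi < h):
        return None
    z_meas = depth_map[vi, ui]               # note: row index is v, column is u
    if not np.isfinite(z_meas) or z_meas <= 0:
        return None
    return float(z_meas - z_pred)
```

Sampling a small window around `(u, v)` and taking the median would be a natural extension if single-pixel depth proves too noisy.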
### Verification Metrics
- Mean absolute residual
- RMSE
- Depth-normalized error: |r| / z_pred
- Spatial bias detection (residual vs pixel position)
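A sketch of how these metrics could be computed from arrays of residuals and predicted depths. The spatial-bias check here is one simple option, a least-squares fit of `r ≈ a·u + b·v + c`; the function names and dictionary keys are assumptions.

```python
import numpy as np

def residual_metrics(residuals, z_pred):
    """Summary statistics for depth residuals (same units as the depth map)."""
    r = np.asarray(residuals, dtype=float)
    z = np.asarray(z_pred, dtype=float)
    return {
        "mae": float(np.mean(np.abs(r))),
        "rmse": float(np.sqrt(np.mean(r ** 2))),
        "mean_rel": float(np.mean(np.abs(r) / z)),   # depth-normalized error
    }

def spatial_bias(residuals, uv):
    """Fit r = a*u + b*v + c; nonzero a or b hints at a tilt-like bias."""
    uv = np.asarray(uv, dtype=float)
    A = np.column_stack([uv[:, 0], uv[:, 1], np.ones(len(uv))])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(residuals, dtype=float), rcond=None)
    return coeffs                                     # (a, b, c)
```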
### Refinement Approach
- ICP (Iterative Closest Point) on depth points near markers
- Point-to-plane ICP for better convergence
- Initialize with ArUco pose, refine with depth
## User Decisions (Round 2)
- **Refinement Method**: Direct optimization (minimize depth residuals to adjust extrinsics)
- **Verification Output**: Full reporting (console + JSON + optional CSV)
- **Depth Filtering**: Confidence-based (use ZED confidence threshold + range limits)
## Open Questions
- Test strategy: TDD or tests after?
- Minimum markers/frames for reliable depth verification?
## Scope Boundaries
- INCLUDE: Depth retrieval, residual computation, verification metrics, optional refinement (direct optimization of depth residuals, per the Round 2 decision)
- EXCLUDE: Bundle adjustment, SLAM, right camera processing
@@ -0,0 +1,3 @@
# Draft: SUPERSEDED
This draft has been superseded by the final plan at `.sisyphus/plans/depth-refinement-robust.md`.
@@ -0,0 +1,79 @@
# Draft: Ground Plane Refinement & Depth Map Persistence
## Requirements (confirmed)
- **Core problem**: Camera disagreement — different cameras don't agree on where the ground is (floor at different heights/angles)
- **Depth saving**: Save BOTH pooled depth maps AND raw best-scored frames per camera, so pooling parameters can be re-tuned without re-reading SVOs
- **Integration**: Post-processing step — a new standalone CLI tool that loads existing extrinsics + saved depth data and refines
- **Library**: TBD — user wants to understand trade-offs before committing
## Technical Decisions
- Post-processing approach: non-invasive, loads existing calibration JSON + depth data
- Depth saving happens inside calibrate_extrinsics.py (or triggered by flag)
- Ground refinement tool is a NEW script (e.g., `refine_ground_plane.py`)
## Research Findings
- **Current alignment.py**: Aligns world frame based on marker face normals, NOT actual floor geometry
- **Current depth_pool.py**: Per-pixel median pooling exists, but result is discarded after use (never saved)
- **Current depth_refine.py**: Optimizes 6-DOF per camera using depth at marker corners only (sparse)
- **compare_pose_sets.py**: Has Kabsch `rigid_transform_3d()` for point-set alignment
- **Available deps**: numpy, scipy, opencv — sufficient for RANSAC plane fitting
- **Open3D**: Provides ICP, RANSAC, visualization but is ~500MB heavy dep
## Open Questions (Resolved)
- **Camera count**: 2-4 cameras (small setup, likely some floor overlap)
- **Observation method**: Point clouds don't align when overlaid in world coords
- **Error magnitude**: Small — 1-3° tilt, <2cm offset (fine-tuning level)
- **Floor type**: TBD (assumed flat for now)
- **Library choice**: TBD — recommendation below
## Library Recommendation Analysis
Given: 2-4 cameras, small errors, flat floor assumption, post-processing tool
**numpy/scipy approach**:
- RANSAC plane fitting: trivial with numpy (random sample 3 points, fit plane, count inliers)
- Plane-to-plane alignment: rotation_align_vectors already exists in alignment.py
- Point cloud generation from depth+intrinsics: simple numpy vectorized operation
- Kabsch alignment: already exists in compare_pose_sets.py
- Verdict: **SUFFICIENT for this use case**. No ICP needed since we're fitting to a known target (Y=0 plane).
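To make the "trivial with numpy" claim concrete, here is a minimal RANSAC plane fit along the lines described (sample 3 points, fit plane, count inliers). The function name, the 1 cm threshold, and the iteration count are illustrative defaults, not values from the codebase.

```python
import numpy as np

def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Fit a plane n·p + d = 0 to (N, 3) points; returns ((n, d), inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(iterations):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-12:                      # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ p0
        mask = np.abs(points @ n + d) < threshold
        if mask.sum() > best_mask.sum():
            best_mask, best_plane = mask, (n, d)
    return best_plane, best_mask
```

A least-squares refit on the inliers (for example via SVD of the centered inlier points) would usually follow; it is omitted here to keep the sketch short.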
**Open3D approach**:
- Overkill for plane fitting + rotation correction
- Would be useful if we needed dense ICP between overlapping point clouds
- 500MB dep for what amounts to ~50 lines of numpy code
- Verdict: **Not needed for the initial version**
**Decision**: Use Open3D for point cloud operations despite the verdicts above (numpy/scipy would suffice, but the user wants Open3D available for future work).
Also add h5py for HDF5 depth map persistence.
## Confirmed Technical Choices
- **Library**: Open3D (RANSAC plane segmentation, ICP if needed, point cloud ops)
- **Depth save format**: HDF5 via h5py (structured, metadata-rich, one file per camera)
- **Visualization**: Plotly HTML (interactive 3D — floor points per camera, consensus plane, before/after)
- **Integration**: Standalone post-processing CLI tool (click-based, like existing tools)
- **Error handling**: numpy/scipy for math, Open3D for geometry, existing alignment.py patterns
## Algorithm (confirmed via research + codebase analysis)
1. Load existing extrinsics JSON + saved depth maps (HDF5)
2. Per camera: unproject depth → world-coord point cloud using extrinsics
3. Per camera: Open3D RANSAC plane segmentation → extract floor points
4. Consensus: fit a single plane to ALL floor points from all cameras
5. Compute correction rotation: align consensus plane normal to [0, -1, 0]
6. Apply correction to all extrinsics (global rotation, like current alignment.py)
7. Optionally: per-camera ICP refinement on overlapping floor regions
8. Save corrected extrinsics JSON + generate diagnostic Plotly visualization
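Steps 2 and 5 can be sketched in plain numpy as below. This assumes a pinhole `K`, a depth map in meters with NaN or non-positive values marking invalid pixels, and `T_world_cam` as the camera pose in world. The function names are hypothetical and are not the existing helpers in the codebase.

```python
import numpy as np

def unproject_depth(depth, K, T_world_cam):
    """Step 2: turn a depth map into an (N, 3) world-coordinate point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=1)
    return pts_cam @ T_world_cam[:3, :3].T + T_world_cam[:3, 3]

def rotation_between(a, b):
    """Step 5: smallest rotation R such that R @ a is parallel to b."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    v, c = np.cross(a, b), float(a @ b)
    if np.linalg.norm(v) < 1e-12:
        if c > 0:
            return np.eye(3)                  # already aligned
        axis = np.cross(a, [1.0, 0.0, 0.0])   # antiparallel: 180° about a normal
        if np.linalg.norm(axis) < 1e-12:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)   # Rodrigues form
```

With `R_corr = rotation_between(consensus_normal, [0, -1, 0])`, step 6 would left-multiply each `T_world_cam` by the 4x4 transform built from `R_corr`; any height offset of the plane would need a separate translation term.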
## Final Decisions (all confirmed)
- **Depth save trigger**: `--save-depth <dir>` flag in calibrate_extrinsics.py
- **Refinement granularity**: Per-camera refinement (each camera corrected based on its floor obs)
- **Test strategy**: TDD — write tests first, following existing test patterns in tests/
## Scope Boundaries
- INCLUDE: Depth map saving (HDF5), ground plane detection per camera, consensus plane fitting, per-camera extrinsic correction
- INCLUDE: Standalone post-processing CLI tool (`refine_ground_plane.py`)
- INCLUDE: Plotly diagnostic visualization
- INCLUDE: TDD with pytest
- INCLUDE: New deps: open3d, h5py
- EXCLUDE: Modifying the core ArUco detection or PnP pipeline
- EXCLUDE: Real-time / streaming refinement
- EXCLUDE: Non-flat floor handling (ramps, stairs)
- EXCLUDE: Dense multi-view reconstruction beyond floor plane
@@ -0,0 +1,93 @@
# Draft: ICP Registration for Multi-Camera Extrinsic Refinement
## Requirements (confirmed)
- ICP role: **Complement** existing RANSAC ground-plane — chain after RANSAC leveling
- Multi-camera strategy: **Global pose-graph optimization** (pairwise ICP → pose graph)
- Point cloud scope: **Near-floor band** (floor_y to floor_y + band_height, ~30cm default) — includes slight 3D structure (baseboards, table legs) for better ICP constraints
- DOF constraint: **Gravity-constrained** — ICP refines yaw + XZ translation + small height; pitch/roll regularized (soft penalty) to preserve RANSAC gravity alignment
## Technical Decisions
- Open3D already a dependency — no new deps needed
- **Two ICP methods**: Point-to-Plane (default) + GICP (optional via --icp-method)
- Voxel downsampling for performance (3-5cm voxel size)
- Reference camera fixed during optimization
- Robust kernel (Tukey/Huber) for outlier rejection
- Colored ICP deferred (requires RGB pipeline plumbing — see analysis below)
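For reference, voxel downsampling amounts to replacing all points in each grid cell by their centroid. Open3D provides this for its point clouds; the numpy sketch below only illustrates the idea, with a 4 cm default chosen from the 3-5 cm range above. The function name is hypothetical.

```python
import numpy as np

def voxel_downsample(points, voxel_size=0.04):
    """Replace the points in each voxel by their centroid. points: (N, 3)."""
    points = np.asarray(points, dtype=float)
    keys = np.floor(points / voxel_size).astype(np.int64)   # integer voxel index
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)          # shape differs across numpy versions
    sums = np.zeros((len(counts), 3))
    np.add.at(sums, inverse, points)       # accumulate points per voxel
    return sums / counts[:, None]
```

Larger voxels speed up ICP at the cost of detail, which is why a coarse-to-fine schedule is often used.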
## Research Findings
- `unproject_depth_to_points` already exists in `aruco/ground_plane.py`
- `detect_floor_plane` already does RANSAC segmentation → can reuse inlier indices for floor filtering
- Open3D `registration_icp` + `PoseGraph` + `global_optimization` = full pipeline
- Multi-scale ICP (coarse→fine voxel) recommended for robustness
- `get_information_matrix_from_point_clouds` provides edge weights for pose graph
- Existing pipeline: unproject → RANSAC detect → consensus → correct (pitch/roll/Y only)
- ICP addition: after RANSAC correction → extract floor points → pairwise ICP → pose graph → refine all 6 DOF
## Resolved Questions
- Overlap detection: **Bounding-box overlap check** on world XZ projections
- DOF: **Full 6-DOF** refinement (ICP refines all rotation + translation)
- CLI integration: **Flag on refine_ground_plane.py** (--icp/--no-icp)
- CLI complexity: **Minimal flags + defaults** (--icp, maybe --icp-voxel-size, rest uses hardcoded defaults)
- Test strategy: **Tests-after** (implement ICP, then add tests)
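The bounding-box overlap check could look like the sketch below: project each camera's points onto the world XZ plane, take axis-aligned bounds, and measure the intersection area. The function names and the optional `margin` parameter are assumptions.

```python
import numpy as np

def xz_bounds(points, margin=0.0):
    """Axis-aligned min/max of the XZ projection of (N, 3) world points."""
    xz = np.asarray(points, dtype=float)[:, [0, 2]]
    return xz.min(axis=0) - margin, xz.max(axis=0) + margin

def xz_overlap_area(points_a, points_b, margin=0.0):
    """Area of the intersection of two XZ bounding boxes (0.0 if disjoint)."""
    lo_a, hi_a = xz_bounds(points_a, margin)
    lo_b, hi_b = xz_bounds(points_b, margin)
    extent = np.minimum(hi_a, hi_b) - np.maximum(lo_a, lo_b)
    return float(np.prod(np.clip(extent, 0.0, None)))
```

Camera pairs whose overlap area falls below some minimum would simply be skipped when building pairwise ICP edges.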
## Open Questions
- (none remaining)
## Colored ICP Analysis (2025-02-09)
### What Colored ICP Does
Open3D's `registration_colored_icp` (Park et al., ICCV 2017) optimizes a joint objective:
`E = λ·E_geom + (1-λ)·E_photo`, where λ is `lambda_geometric` (default 0.968).
It combines point-to-plane geometric distance with photometric (color) consistency.
### When It Helps
- **Planar/low-geometry environments**: Floor is exactly this — a flat plane where
geometric ICP can "slide" along the tangent plane. Color information "locks" the
translation along axes where geometry alone is degenerate.
- **Sub-millimeter polish**: Color provides a dense signal that geometry misses due to
depth quantization in stereo cameras.
### When It Hurts / Failure Modes
- **Lighting inconsistency**: If cameras have different auto-exposure/white-balance, the
photometric term introduces bias instead of helping.
- **Textureless floors**: Plain concrete/linoleum floors have near-zero color gradient,
making the photometric term useless (falls back to geometric ICP anyway).
- **Computational overhead**: Requires RGB data, color gradient computation, ~2-3x slower.
### Critical Data Pipeline Issue
**The current HDF5 depth storage pipeline does NOT save RGB images.**
- `depth_save.py` only stores: `pooled_depth`, `pooled_confidence`, `intrinsics`, `raw_frames`
- `raw_frames` only contain `depth_map` and `confidence_map` — no `image` field
- `FrameData` in `svo_sync.py` DOES have an `image` field (BGRA from ZED), but it's
discarded when saving to HDF5
- To enable colored ICP, we'd need to:
1. Extend `save_depth_data` to also store RGB images (significant HDF5 size increase)
2. Extend `load_depth_data` to return images
3. Modify `refine_ground_plane.py` to pass images through the pipeline
4. Create RGBD → colored PointCloud conversion using `o3d.geometry.RGBDImage`
### Recommendation
**Defer colored ICP to a future iteration.** Reasons:
1. Floor-only scope means we're aligning mostly planar geometry, where point-to-plane
ICP already constrains height and tilt well. Colored ICP adds in-plane constraints only
when the floor HAS texture; when it doesn't, colored ICP is equivalent to geometric ICP.
2. Significant plumbing work to save/load/pass RGB through the pipeline.
3. The initial pose from ArUco markers is already very good (~cm accuracy), so ICP
only needs to refine by a few mm — well within geometric ICP's capability.
4. Can be added later as an enhancement flag (--icp-method color) without redesigning
the core ICP module.
5. If later we expand beyond floor-only to full scene registration, colored ICP becomes
much more compelling and worth the investment.
### Alternative: Generalized ICP (GICP)
- Purely geometric, no RGB needed — same data pipeline as point-to-plane
- Models local structure as Gaussian distributions ("plane-to-plane")
- Often more robust than point-to-plane for noisy stereo data, though the benefit depends on the scene
- Available as `o3d.pipelines.registration.registration_generalized_icp`
- **Worth considering as a --icp-method option alongside point-to-plane**
## Scope Boundaries
- INCLUDE: ICP registration module, pose-graph optimization, CLI integration, tests, docs
- INCLUDE (stretch): GICP as alternative ICP method option (same data pipeline, no extra plumbing)
- EXCLUDE: colored ICP (requires RGB pipeline work — future enhancement)
- EXCLUDE: real-time/streaming ICP