refactor: things
@@ -0,0 +1,108 @@
|
||||
# Draft: ArUco-Based Multi-Camera Extrinsic Calibration from SVO
|
||||
|
||||
## Requirements (confirmed)
|
||||
|
||||
### Goal
|
||||
Create a CLI tool that reads synchronized SVO recordings from multiple ZED cameras, detects ArUco markers on a 3D calibration box, computes camera extrinsics relative to the marker world origin, and outputs accurate pose matrices to replace the inaccurate ones in `inside_network.json`.
|
||||
|
||||
### Calibration Target
|
||||
- **Type**: 3D box with 6 diamond board faces
|
||||
- **Object points**: Defined in `aruco/output/standard_box_markers.parquet`
|
||||
- **Marker dictionary**: `DICT_4X4_50` (from existing code)
|
||||
- **Minimum markers per frame**: 4+ (one diamond face's worth)
|
||||
|
||||
### Input
|
||||
- Multiple SVO2 files (one per camera)
|
||||
- Frame sampling: Fixed interval + quality filter
|
||||
- Timestamp-aligned playback (using existing `svo_playback.py` pattern)
|
||||
|
||||
### Output
|
||||
- **New JSON file** with calibrated extrinsics
|
||||
- Format: Similar to `inside_network.json` but with accurate `pose` field
|
||||
- Reference frame: **Marker is world origin** (all cameras expressed relative to ArUco box)
|
||||
|
||||
### Workflow
|
||||
- **CLI with preview**: Command-line driven but shows visualization of detected markers
|
||||
- Example: `uv run calibrate_extrinsics.py --svos *.svo2 --interval 30 --output calibrated.json`
|
||||
|
||||
## Technical Decisions
|
||||
|
||||
### Intrinsics Source
|
||||
- Use ZED SDK's pre-calibrated intrinsics from `cam.get_camera_information().camera_configuration.calibration_parameters.left_cam`
|
||||
- Properties: `fx, fy, cx, cy, disto`
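
As a rough sketch of how those values might feed `cv2.solvePnP`, the four pinhole parameters can be packed into the usual 3x3 camera matrix. `Intrinsics` here is a hypothetical stand-in for the SDK's `left_cam` object (only the field names listed above are assumed), and the numbers are made up for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Intrinsics:
    """Hypothetical stand-in for the SDK's `left_cam` parameters."""
    fx: float
    fy: float
    cx: float
    cy: float


def camera_matrix(p: Intrinsics) -> np.ndarray:
    """Build the 3x3 pinhole matrix K from focal lengths and principal point."""
    return np.array([[p.fx, 0.0, p.cx],
                     [0.0, p.fy, p.cy],
                     [0.0, 0.0, 1.0]])


# Illustrative values only; real ones come from the camera at runtime.
K = camera_matrix(Intrinsics(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0))
```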
|
||||
|
||||
### Pose Estimation
|
||||
- Use `cv2.solvePnP` with `SOLVEPNP_SQPNP` flag (from existing code)
|
||||
- Consider `solvePnPRansac` for per-frame robustness
|
||||
|
||||
### Outlier Handling (Two-stage)
|
||||
1. **Per-frame rejection**: Reject frames with high reprojection error (threshold ~2-5 pixels)
|
||||
2. **RANSAC on pose set**: After collecting all valid poses, use RANSAC-style consensus to keep the largest mutually consistent subset
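
A minimal sketch of the per-frame stage, assuming a rectified image (so no distortion model) and a pose `(R, t)` that maps marker-frame points into the camera frame. The function names and the 3-pixel default are illustrative, not taken from the existing code.

```python
import numpy as np


def reprojection_rmse(object_pts, image_pts, R, t, K):
    """RMS pixel distance between observed corners and reprojected 3D points."""
    cam = np.asarray(object_pts, dtype=float) @ R.T + t   # marker frame -> camera frame
    proj = cam @ K.T                                      # homogeneous pixel coordinates
    uv = proj[:, :2] / proj[:, 2:3]                       # perspective divide
    diff = uv - np.asarray(image_pts, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def accept_frame(object_pts, image_pts, R, t, K, max_error_px=3.0):
    """Stage 1: keep a frame only if its reprojection RMSE is below the threshold."""
    return reprojection_rmse(object_pts, image_pts, R, t, K) <= max_error_px
```

Frames that pass this check would then go on to the pose-set consensus stage.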
|
||||
|
||||
### Pose Averaging
|
||||
- **Rotation**: Use `scipy.spatial.transform.Rotation.mean()` for geodesic mean
|
||||
- **Translation**: Use median or weighted mean with MAD-based outlier rejection
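
One way the two bullets could fit together is sketched below: translations are screened with a median/MAD test, and the surviving poses are averaged with `Rotation.mean()`. The function name, the `mad_scale` default, and the choice to screen on translation only are assumptions for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def average_poses(rotations, translations, mad_scale=3.0):
    """Average poses after rejecting translation outliers with a median/MAD test.

    rotations: a scipy Rotation holding N rotations; translations: (N, 3) array.
    Returns (mean rotation, mean translation, boolean inlier mask).
    """
    translations = np.asarray(translations, dtype=float)
    dist = np.linalg.norm(translations - np.median(translations, axis=0), axis=1)
    center = np.median(dist)
    mad = np.median(np.abs(dist - center))
    # 1.4826 rescales the MAD to a standard deviation under a Gaussian model.
    limit = center + mad_scale * 1.4826 * max(mad, 1e-9)
    keep = dist <= limit
    idx = np.flatnonzero(keep)
    return rotations[idx].mean(), translations[idx].mean(axis=0), keep
```

A pose whose rotation is off but whose translation looks fine would slip through this sketch, which is one reason to pair it with a consensus step that also checks rotation angle.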
|
||||
|
||||
### Math: Camera-to-World Transform
|
||||
Each camera sees the marker → `T_cam_marker` (the marker-to-camera transform returned by `solvePnP`: it maps marker-frame points into the camera frame)
|
||||
World origin = marker, so camera pose in world = `T_world_cam = inv(T_cam_marker)`
|
||||
|
||||
For camera i: `T_world_cam_i = inv(T_cam_i_marker)`
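
The inversion above can be sketched with plain NumPy. Because the transform is rigid, the inverse has a closed form (`R.T` and `-R.T @ t`), which avoids a general matrix inverse. The example matrix is invented for illustration.

```python
import numpy as np


def invert_rigid(T):
    """Invert a 4x4 rigid transform using R^T instead of a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


# Illustrative pose: marker origin 2 m along the camera's Z axis, with the
# marker frame rotated 90 degrees about Y relative to the camera frame.
T_cam_marker = np.array([[0.0, 0.0, 1.0, 0.0],
                         [0.0, 1.0, 0.0, 0.0],
                         [-1.0, 0.0, 0.0, 2.0],
                         [0.0, 0.0, 0.0, 1.0]])
T_world_cam = invert_rigid(T_cam_marker)
camera_position_in_world = T_world_cam[:3, 3]
```

The last column of `T_world_cam` is the camera position expressed in the marker (world) frame.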
|
||||
|
||||
## Research Findings
|
||||
|
||||
### From Librarian (Multi-camera calibration)
|
||||
- Relative transform: `T_BA = T_BM @ inv(T_AM)`
|
||||
- Board-based detection improves robustness to occlusion
|
||||
- Use `refineDetectedMarkers` for corner accuracy
|
||||
- Handle missing views by only computing poses when enough markers visible
|
||||
|
||||
### From Librarian (Robust averaging)
|
||||
- Use `scipy.spatial.transform.Rotation.mean(weights=...)` for rotation averaging
|
||||
- Median/MAD on translation for outlier rejection
|
||||
- RANSAC over pose set with rotation angle + translation distance thresholds
|
||||
- Practical thresholds: treat a pose as an outlier when its rotation deviates by more than 2-5°; the translation threshold depends on scene scale
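
A sketch of the consensus idea with both thresholds, under the assumption that each pose is a 4x4 matrix. Instead of sampling hypotheses at random as classic RANSAC does, it tries every pose as the hypothesis, which is cheap for the few hundred frames a calibration run is likely to produce. The function names and default thresholds are illustrative.

```python
import numpy as np


def rotation_angle_deg(R_a, R_b):
    """Geodesic angle between two rotation matrices, in degrees."""
    cos_angle = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def pose_consensus(poses, max_angle_deg=3.0, max_dist=0.05):
    """Return the largest inlier mask, trying every pose as the hypothesis."""
    best = np.zeros(len(poses), dtype=bool)
    for T_ref in poses:
        mask = np.array([
            rotation_angle_deg(T_ref[:3, :3], T[:3, :3]) <= max_angle_deg
            and np.linalg.norm(T_ref[:3, 3] - T[:3, 3]) <= max_dist
            for T in poses
        ])
        if mask.sum() > best.sum():
            best = mask
    return best
```

The resulting mask can select the poses that go into the rotation and translation averaging.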
|
||||
|
||||
### Existing Codebase Patterns
|
||||
- `find_extrinsic_object.py`: ArUco detection + solvePnP pattern
|
||||
- `svo_playback.py`: Multi-SVO sync via timestamp alignment
|
||||
- `aruco_box.py`: Diamond board geometry generation
|
||||
|
||||
## Open Questions
|
||||
- None remaining
|
||||
|
||||
## Metis Gap Analysis (Addressed)
|
||||
|
||||
### Critical Gaps Resolved:
|
||||
1. **World frame**: As defined in `standard_box_markers.parquet` (origin at box coordinate system)
|
||||
2. **Image stream**: Use rectified LEFT view (no distortion coefficients needed)
|
||||
3. **Transform convention**: Match `inside_network.json` format - appears to be T_world_from_cam (camera pose in world)
|
||||
- Format: space-separated 4x4 matrix, row-major
|
||||
4. **Sync tolerance**: Moderate (<33ms, 1 frame at 30fps)
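
For the pose format in item 3, a round-trip helper might look like the sketch below. It assumes nothing beyond "16 space-separated numbers, row-major"; the exact number formatting used in `inside_network.json` is not specified here, so `repr(float)` is only a placeholder choice.

```python
import numpy as np


def pose_to_string(T):
    """Flatten a 4x4 pose row by row into 16 space-separated numbers."""
    return " ".join(repr(float(v)) for v in np.asarray(T).reshape(-1))


def string_to_pose(text):
    """Parse 16 space-separated numbers back into a row-major 4x4 matrix."""
    values = np.array([float(v) for v in text.split()])
    if values.size != 16:
        raise ValueError(f"expected 16 values, got {values.size}")
    return values.reshape(4, 4)
```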
|
||||
|
||||
### Guardrails Added:
|
||||
- Validate parquet schema early (require marker_id, corners with X,Y,Z in meters)
|
||||
- Use reprojection error as primary quality metric
|
||||
- Require ≥4 markers with sufficient 3D spread (not just coplanar)
|
||||
- Whitelist only expected marker IDs (from parquet)
|
||||
- Add self-check mode with quantitative quality report
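
The 3D-spread guardrail could be checked with a singular-value test, sketched below. The function name and the `min_ratio` default are assumptions; a real threshold would need tuning against the box dimensions.

```python
import numpy as np


def has_3d_spread(points, min_ratio=0.05):
    """True if the points extend out of their best-fit plane.

    Compares the smallest singular value of the centered point set with the
    largest; a ratio near zero means the points are (nearly) coplanar.
    """
    points = np.asarray(points, dtype=float)
    if len(points) < 4:
        return False
    singular = np.linalg.svd(points - points.mean(axis=0), compute_uv=False)
    if singular[0] == 0.0:          # all points coincide
        return False
    return bool(singular[2] / singular[0] >= min_ratio)
```

Corners from a single box face would fail this test, while corners from two adjacent faces would pass.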
|
||||
|
||||
## Scope Boundaries
|
||||
|
||||
### INCLUDE
|
||||
- SVO file loading with timestamp sync
|
||||
- ArUco detection on left camera image
|
||||
- Pose estimation using solvePnP
|
||||
- Per-frame quality filtering (reprojection error)
|
||||
- Multi-frame pose averaging with outlier rejection
|
||||
- JSON output with 4x4 pose matrices
|
||||
- Preview visualization showing detected markers and axes
|
||||
- CLI interface with click
|
||||
|
||||
### EXCLUDE
|
||||
- Right camera processing (use left only for simplicity)
|
||||
- Intrinsic calibration (use pre-calibrated from ZED SDK)
|
||||
- Modifying `inside_network.json` in-place
|
||||
- GUI-based frame selection
|
||||
- Bundle adjustment refinement
|
||||
- Depth-based verification
|
||||
@@ -0,0 +1,55 @@
|
||||
# Draft: Depth-Based Extrinsic Verification/Fusion
|
||||
|
||||
## Requirements (confirmed)
|
||||
|
||||
- **Primary Goal**: Both verify AND refine extrinsics using depth data
|
||||
- **Integration**: Add to existing `calibrate_extrinsics.py` CLI (new flags)
|
||||
- **Depth Mode**: CLI argument, defaulting to NEURAL_PLUS (or NEURAL)
|
||||
- **Target Geometry**: Any markers (from parquet file), not just ArUco box
|
||||
|
||||
## Technical Decisions
|
||||
|
||||
- Use ZED SDK `retrieve_measure(MEASURE.DEPTH)` for depth maps
|
||||
- Extend `SVOReader` to optionally enable depth mode
|
||||
- Compute depth residuals at detected marker corner positions
|
||||
- Use residual statistics for verification metrics
|
||||
- ICP or optimization for refinement (if requested)
|
||||
|
||||
## Research Findings
|
||||
|
||||
### Depth Residual Formula
|
||||
For a 3D point P_world and camera extrinsics (R, t) that map world coordinates into the camera frame (the inverse of `T_world_cam`):
|
||||
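```
P_cam = R @ P_world + t                      # world frame -> camera frame
z_predicted = P_cam[2]                       # depth along the optical axis
u, v = project(P_cam, K)                     # pinhole projection to pixel coordinates
z_measured = depth_map[round(v), round(u)]   # nearest pixel; rows index v, columns index u
residual = z_measured - z_predicted          # positive when the measured surface is farther
```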
|
||||
|
||||
### Verification Metrics
|
||||
- Mean absolute residual
|
||||
- RMSE
|
||||
- Depth-normalized error: |r| / z_pred
|
||||
- Spatial bias detection (residual vs pixel position)
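
The residual formula and the first three metrics can be sketched in vectorized NumPy as follows. This assumes `(R, t)` maps world coordinates into the camera frame, that the depth map and the points share units, and that invalid depth pixels are NaN or non-positive; the function names are illustrative.

```python
import numpy as np


def depth_residuals(points_world, R, t, K, depth_map):
    """Return (residuals, z_predicted) for points that land on valid depth pixels."""
    P_cam = np.asarray(points_world, dtype=float) @ R.T + t
    P_cam = P_cam[P_cam[:, 2] > 0]                 # drop points behind the camera
    z_pred = P_cam[:, 2]
    uv = P_cam @ K.T
    u = np.rint(uv[:, 0] / uv[:, 2]).astype(int)   # nearest-pixel column
    v = np.rint(uv[:, 1] / uv[:, 2]).astype(int)   # nearest-pixel row
    h, w = depth_map.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    z_meas = depth_map[v[inside], u[inside]]
    ok = np.isfinite(z_meas) & (z_meas > 0)        # skip holes in the depth map
    return z_meas[ok] - z_pred[inside][ok], z_pred[inside][ok]


def residual_metrics(residuals, z_pred):
    """Mean absolute residual, RMSE, and depth-normalized error |r| / z_pred."""
    return {
        "mae": float(np.mean(np.abs(residuals))),
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "normalized": float(np.mean(np.abs(residuals) / z_pred)),
    }
```

Spatial bias detection would additionally need the pixel coordinates of each residual, which this sketch does not return.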
|
||||
|
||||
### Refinement Approach
|
||||
- ICP (Iterative Closest Point) on depth points near markers
|
||||
- Point-to-plane ICP for better convergence
|
||||
- Initialize with ArUco pose, refine with depth
|
||||
|
||||
## User Decisions (Round 2)
|
||||
|
||||
- **Refinement Method**: Direct optimization (minimize depth residuals to adjust extrinsics)
|
||||
- **Verification Output**: Full reporting (console + JSON + optional CSV)
|
||||
- **Depth Filtering**: Confidence-based (use ZED confidence threshold + range limits)
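
A possible shape for the confidence-plus-range filter is sketched below. It assumes that lower confidence values mean more reliable depth and that depth is in meters; both assumptions, along with the default thresholds, should be checked against the ZED SDK settings actually in use.

```python
import numpy as np


def filter_depth(depth, confidence, max_confidence=50.0, min_range=0.3, max_range=10.0):
    """Return a copy of depth with unreliable pixels set to NaN.

    Assumes lower confidence values mean more reliable depth (an assumption to
    verify against the SDK documentation) and that depth is in meters.
    """
    depth = np.asarray(depth, dtype=float)
    valid = (
        np.isfinite(depth)
        & (depth >= min_range)
        & (depth <= max_range)
        & (np.asarray(confidence) <= max_confidence)
    )
    return np.where(valid, depth, np.nan)
```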
|
||||
|
||||
## Open Questions
|
||||
|
||||
- Test strategy: TDD or tests after?
|
||||
- Minimum markers/frames for reliable depth verification?
|
||||
|
||||
## Scope Boundaries
|
||||
|
||||
- INCLUDE: Depth retrieval, residual computation, verification metrics, optional ICP refinement
|
||||
- EXCLUDE: Bundle adjustment, SLAM, right camera processing
|
||||
@@ -0,0 +1,3 @@
|
||||
# Draft: SUPERSEDED
|
||||
|
||||
This draft has been superseded by the final plan at `.sisyphus/plans/depth-refinement-robust.md`.
|
||||
@@ -0,0 +1,79 @@
|
||||
# Draft: Ground Plane Refinement & Depth Map Persistence
|
||||
|
||||
## Requirements (confirmed)
|
||||
- **Core problem**: Camera disagreement — different cameras don't agree on where the ground is (floor at different heights/angles)
|
||||
- **Depth saving**: Save BOTH pooled depth maps AND raw best-scored frames per camera, so pooling parameters can be re-tuned without re-reading SVOs
|
||||
- **Integration**: Post-processing step — a new standalone CLI tool that loads existing extrinsics + saved depth data and refines
|
||||
- **Library**: TBD — user wants to understand trade-offs before committing
|
||||
|
||||
## Technical Decisions
|
||||
- Post-processing approach: non-invasive, loads existing calibration JSON + depth data
|
||||
- Depth saving happens inside calibrate_extrinsics.py (or triggered by flag)
|
||||
- Ground refinement tool is a NEW script (e.g., `refine_ground_plane.py`)
|
||||
|
||||
## Research Findings
|
||||
- **Current alignment.py**: Aligns world frame based on marker face normals, NOT actual floor geometry
|
||||
- **Current depth_pool.py**: Per-pixel median pooling exists, but result is discarded after use (never saved)
|
||||
- **Current depth_refine.py**: Optimizes 6-DOF per camera using depth at marker corners only (sparse)
|
||||
- **compare_pose_sets.py**: Has Kabsch `rigid_transform_3d()` for point-set alignment
|
||||
- **Available deps**: numpy, scipy, opencv — sufficient for RANSAC plane fitting
|
||||
- **Open3D**: Provides ICP, RANSAC, and visualization, but is a heavy dependency (~500MB)
|
||||
|
||||
## Open Questions (Resolved)
|
||||
- **Camera count**: 2-4 cameras (small setup, likely some floor overlap)
|
||||
- **Observation method**: Point clouds don't align when overlaid in world coords
|
||||
- **Error magnitude**: Small — 1-3° tilt, <2cm offset (fine-tuning level)
|
||||
- **Floor type**: TBD (assumed flat for now)
|
||||
- **Library choice**: TBD — recommendation below
|
||||
|
||||
## Library Recommendation Analysis
|
||||
Given: 2-4 cameras, small errors, flat floor assumption, post-processing tool
|
||||
|
||||
**numpy/scipy approach**:
|
||||
- RANSAC plane fitting: trivial with numpy (random sample 3 points, fit plane, count inliers)
|
||||
- Plane-to-plane alignment: rotation_align_vectors already exists in alignment.py
|
||||
- Point cloud generation from depth+intrinsics: simple numpy vectorized operation
|
||||
- Kabsch alignment: already exists in compare_pose_sets.py
|
||||
- Verdict: **SUFFICIENT for this use case**. No ICP needed since we're fitting to a known target (Y=0 plane).
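
To make the "random sample 3 points, fit plane, count inliers" bullet concrete, here is a NumPy-only sketch. The function name, defaults, and the final SVD refit on the inliers are illustrative choices, not taken from the existing code.

```python
import numpy as np


def fit_plane_ransac(points, threshold=0.01, n_iters=200, seed=0):
    """Fit a plane n.p + d = 0 with RANSAC; returns (unit normal, d, inlier mask)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        a, b, c = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(b - a, c - a)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                       # degenerate (collinear) sample
            continue
        normal /= norm
        mask = np.abs((points - a) @ normal) <= threshold
        if mask.sum() > best_mask.sum():
            best_mask = mask
    if best_mask.sum() < 3:
        raise ValueError("no plane found")
    # Refine on all inliers: the normal is the direction of least variance.
    inliers = points[best_mask]
    centroid = inliers.mean(axis=0)
    normal = np.linalg.svd(inliers - centroid, full_matrices=False)[2][-1]
    return normal, float(-normal @ centroid), best_mask
```

The sign of the returned normal is arbitrary, so callers would need to orient it (for example toward the camera) before comparing it with a target direction.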
|
||||
|
||||
**Open3D approach**:
|
||||
- Overkill for plane fitting + rotation correction
|
||||
- Would be useful if we needed dense ICP between overlapping point clouds
|
||||
- 500MB dep for what amounts to ~50 lines of numpy code
|
||||
- Verdict: **Not needed for the initial version**
|
||||
|
||||
**Decision**: Use Open3D for point cloud operations (user wants it available for future work).
|
||||
Also add h5py for HDF5 depth map persistence.
|
||||
|
||||
## Confirmed Technical Choices
|
||||
- **Library**: Open3D (RANSAC plane segmentation, ICP if needed, point cloud ops)
|
||||
- **Depth save format**: HDF5 via h5py (structured, metadata-rich, one file per camera)
|
||||
- **Visualization**: Plotly HTML (interactive 3D — floor points per camera, consensus plane, before/after)
|
||||
- **Integration**: Standalone post-processing CLI tool (click-based, like existing tools)
|
||||
- **Error handling**: numpy/scipy for math, Open3D for geometry, existing alignment.py patterns
|
||||
|
||||
## Algorithm (confirmed via research + codebase analysis)
|
||||
1. Load existing extrinsics JSON + saved depth maps (HDF5)
|
||||
2. Per camera: unproject depth → world-coord point cloud using extrinsics
|
||||
3. Per camera: Open3D RANSAC plane segmentation → extract floor points
|
||||
4. Consensus: fit a single plane to ALL floor points from all cameras
|
||||
5. Compute correction rotation: align consensus plane normal to [0, -1, 0]
|
||||
6. Apply correction to all extrinsics (global rotation, like current alignment.py)
|
||||
7. Optionally: per-camera ICP refinement on overlapping floor regions
|
||||
8. Save corrected extrinsics JSON + generate diagnostic Plotly visualization
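
Steps 2, 5, and 6 are sketched below in plain NumPy, assuming poses are 4x4 `T_world_cam` matrices and the image is rectified. The function names are hypothetical, and the plane segmentation in steps 3-4 (planned for Open3D) is left out.

```python
import numpy as np


def unproject_depth(depth, K, T_world_cam):
    """Step 2: turn an (H, W) depth map into an (N, 3) world-frame point cloud."""
    v, u = np.indices(depth.shape)
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - K[0, 2]) / K[0, 0] * z
    y = (v[valid] - K[1, 2]) / K[1, 1] * z
    points_cam = np.column_stack([x, y, z])
    return points_cam @ T_world_cam[:3, :3].T + T_world_cam[:3, 3]


def rotation_between(a, b):
    """Step 5: smallest rotation matrix that turns direction a onto direction b."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    axis = np.cross(a, b)
    cos_angle = float(a @ b)
    if cos_angle < -1.0 + 1e-9:
        # Opposite directions: rotate 180 degrees about any axis orthogonal to a.
        helper = np.eye(3)[np.argmin(np.abs(a))]
        axis = np.cross(a, helper)
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    skew = np.array([[0.0, -axis[2], axis[1]],
                     [axis[2], 0.0, -axis[0]],
                     [-axis[1], axis[0], 0.0]])
    return np.eye(3) + skew + skew @ skew / (1.0 + cos_angle)


def apply_global_correction(T_world_cam, R_fix):
    """Step 6: rotate a camera pose about the world origin by R_fix."""
    C = np.eye(4)
    C[:3, :3] = R_fix
    return C @ T_world_cam
```

Since a fitted plane normal can point either way, it would make sense to flip it toward `[0, -1, 0]` before calling `rotation_between`; otherwise the "correction" could be a 180 degree flip.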
|
||||
|
||||
## Final Decisions (all confirmed)
|
||||
- **Depth save trigger**: `--save-depth <dir>` flag in calibrate_extrinsics.py
|
||||
- **Refinement granularity**: Per-camera refinement (each camera corrected based on its floor obs)
|
||||
- **Test strategy**: TDD — write tests first, following existing test patterns in tests/
|
||||
|
||||
## Scope Boundaries
|
||||
- INCLUDE: Depth map saving (HDF5), ground plane detection per camera, consensus plane fitting, per-camera extrinsic correction
|
||||
- INCLUDE: Standalone post-processing CLI tool (`refine_ground_plane.py`)
|
||||
- INCLUDE: Plotly diagnostic visualization
|
||||
- INCLUDE: TDD with pytest
|
||||
- INCLUDE: New deps: open3d, h5py
|
||||
- EXCLUDE: Modifying the core ArUco detection or PnP pipeline
|
||||
- EXCLUDE: Real-time / streaming refinement
|
||||
- EXCLUDE: Non-flat floor handling (ramps, stairs)
|
||||
- EXCLUDE: Dense multi-view reconstruction beyond floor plane
|
||||
@@ -0,0 +1,93 @@
|
||||
# Draft: ICP Registration for Multi-Camera Extrinsic Refinement
|
||||
|
||||
## Requirements (confirmed)
|
||||
- ICP role: **Complement** existing RANSAC ground-plane — chain after RANSAC leveling
|
||||
- Multi-camera strategy: **Global pose-graph optimization** (pairwise ICP → pose graph)
|
||||
- Point cloud scope: **Near-floor band** (floor_y to floor_y + band_height, ~30cm default) — includes slight 3D structure (baseboards, table legs) for better ICP constraints
|
||||
- DOF constraint: **Gravity-constrained** — ICP refines yaw + XZ translation + small height; pitch/roll regularized (soft penalty) to preserve RANSAC gravity alignment
|
||||
|
||||
## Technical Decisions
|
||||
- Open3D already a dependency — no new deps needed
|
||||
- **Two ICP methods**: Point-to-Plane (default) + GICP (optional via --icp-method)
|
||||
- Voxel downsampling for performance (3-5cm voxel size)
|
||||
- Reference camera fixed during optimization
|
||||
- Robust kernel (Tukey/Huber) for outlier rejection
|
||||
- Colored ICP deferred (requires RGB pipeline plumbing — see analysis below)
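
To illustrate what the voxel downsampling step does, here is a NumPy-only sketch that replaces the points in each voxel with their centroid. In the actual tool Open3D's `PointCloud.voxel_down_sample` would presumably do this; the function below is only a stand-in to show the idea, and the 4 cm default is an arbitrary pick from the 3-5cm range.

```python
import numpy as np


def voxel_downsample(points, voxel_size=0.04):
    """Replace all points that fall in the same cubic voxel by their centroid."""
    points = np.asarray(points, dtype=float)
    keys = np.floor(points / voxel_size).astype(np.int64)   # integer voxel index per point
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)            # 1-D regardless of NumPy version
    sums = np.zeros((len(counts), 3))
    np.add.at(sums, inverse, points)         # accumulate points per voxel
    return sums / counts[:, None]
```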
|
||||
|
||||
## Research Findings
|
||||
- `unproject_depth_to_points` already exists in `aruco/ground_plane.py`
|
||||
- `detect_floor_plane` already does RANSAC segmentation → can reuse inlier indices for floor filtering
|
||||
- Open3D `registration_icp` + `PoseGraph` + `global_optimization` = full pipeline
|
||||
- Multi-scale ICP (coarse→fine voxel) recommended for robustness
|
||||
- `get_information_matrix_from_point_clouds` provides edge weights for pose graph
|
||||
- Existing pipeline: unproject → RANSAC detect → consensus → correct (pitch/roll/Y only)
|
||||
- ICP addition: after RANSAC correction → extract floor points → pairwise ICP → pose graph → refine all 6 DOF
|
||||
|
||||
## Resolved Questions
|
||||
- Overlap detection: **Bounding-box overlap check** on world XZ projections
|
||||
- DOF: **Full 6-DOF** refinement (ICP refines all rotation + translation)
|
||||
- CLI integration: **Flag on refine_ground_plane.py** (--icp/--no-icp)
|
||||
- CLI complexity: **Minimal flags + defaults** (--icp, maybe --icp-voxel-size, rest uses hardcoded defaults)
|
||||
- Test strategy: **Tests-after** (implement ICP, then add tests)
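
The bounding-box overlap check on XZ projections could be as small as the sketch below. It assumes point clouds are already in world coordinates with Y as the vertical axis; the function names and the optional `margin` parameter are illustrative.

```python
import numpy as np


def xz_bounds(points):
    """Axis-aligned bounding box of the points projected onto the XZ plane."""
    xz = np.asarray(points, dtype=float)[:, [0, 2]]
    return xz.min(axis=0), xz.max(axis=0)


def xz_overlap(points_a, points_b, margin=0.0):
    """True if the two XZ bounding boxes intersect, optionally grown by margin."""
    min_a, max_a = xz_bounds(points_a)
    min_b, max_b = xz_bounds(points_b)
    return bool(np.all(min_a - margin <= max_b) and np.all(min_b - margin <= max_a))
```

Only camera pairs that pass this check would be worth running pairwise ICP on.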
|
||||
|
||||
## Open Questions
|
||||
- (none remaining)
|
||||
|
||||
## Colored ICP Analysis (2025-02-09)
|
||||
|
||||
### What Colored ICP Does
|
||||
Open3D's `registration_colored_icp` (Park et al., ICCV 2017) optimizes a joint objective:
|
||||
`E = λ·E_geom + (1-λ)·E_photo`, where λ is `lambda_geometric` and defaults to 0.968.
|
||||
It combines point-to-plane geometric distance with photometric (color) consistency.
|
||||
|
||||
### When It Helps
|
||||
- **Planar/low-geometry environments**: Floor is exactly this — a flat plane where
|
||||
geometric ICP can "slide" along the tangent plane. Color information "locks" the
|
||||
translation along axes where geometry alone is degenerate.
|
||||
- **Sub-millimeter polish**: Color provides a dense signal that geometry misses due to
|
||||
depth quantization in stereo cameras.
|
||||
|
||||
### When It Hurts / Failure Modes
|
||||
- **Lighting inconsistency**: If cameras have different auto-exposure/white-balance, the
|
||||
photometric term introduces bias instead of helping.
|
||||
- **Textureless floors**: Plain concrete/linoleum floors have near-zero color gradient,
|
||||
making the photometric term useless (falls back to geometric ICP anyway).
|
||||
- **Computational overhead**: Requires RGB data, color gradient computation, ~2-3x slower.
|
||||
|
||||
### Critical Data Pipeline Issue
|
||||
**The current HDF5 depth storage pipeline does NOT save RGB images.**
|
||||
- `depth_save.py` only stores: `pooled_depth`, `pooled_confidence`, `intrinsics`, `raw_frames`
|
||||
- `raw_frames` only contain `depth_map` and `confidence_map` — no `image` field
|
||||
- `FrameData` in `svo_sync.py` DOES have an `image` field (BGRA from ZED), but it's
|
||||
discarded when saving to HDF5
|
||||
- To enable colored ICP, we'd need to:
|
||||
1. Extend `save_depth_data` to also store RGB images (significant HDF5 size increase)
|
||||
2. Extend `load_depth_data` to return images
|
||||
3. Modify `refine_ground_plane.py` to pass images through the pipeline
|
||||
4. Create RGBD → colored PointCloud conversion using `o3d.geometry.RGBDImage`
|
||||
|
||||
### Recommendation
|
||||
**Defer colored ICP to a future iteration.** Reasons:
|
||||
1. Floor-only scope means we're aligning planar geometry — the exact scenario where
|
||||
point-to-plane ICP is already optimal (when floor HAS texture, colored ICP helps;
|
||||
when it doesn't, colored ICP is equivalent to geometric ICP).
|
||||
2. Significant plumbing work to save/load/pass RGB through the pipeline.
|
||||
3. The initial pose from ArUco markers is already very good (~cm accuracy), so ICP
|
||||
only needs to refine by a few mm — well within geometric ICP's capability.
|
||||
4. Can be added later as an enhancement flag (--icp-method color) without redesigning
|
||||
the core ICP module.
|
||||
5. If later we expand beyond floor-only to full scene registration, colored ICP becomes
|
||||
much more compelling and worth the investment.
|
||||
|
||||
### Alternative: Generalized ICP (GICP)
|
||||
- Purely geometric, no RGB needed — same data pipeline as point-to-plane
|
||||
- Models local structure as Gaussian distributions ("plane-to-plane")
|
||||
- More robust than point-to-plane for noisy stereo data
|
||||
- Available as `o3d.pipelines.registration.registration_generalized_icp`
|
||||
- **Worth considering as a --icp-method option alongside point-to-plane**
|
||||
|
||||
## Scope Boundaries
|
||||
- INCLUDE: ICP registration module, pose-graph optimization, CLI integration, tests, docs
|
||||
- INCLUDE (stretch): GICP as alternative ICP method option (same data pipeline, no extra plumbing)
|
||||
- EXCLUDE: colored ICP (requires RGB pipeline work — future enhancement)
|
||||
- EXCLUDE: real-time/streaming ICP
|
||||