9c861105f7
- Add comprehensive work plan for ArUco-based multi-camera calibration - Add recording_multi.py for multi-camera SVO recording - Add streaming_receiver.py for network streaming - Add svo_playback.py for synchronized playback - Add zed_network_utils.py for camera configuration - Add AGENTS.md with project context
Draft: ArUco-Based Multi-Camera Extrinsic Calibration from SVO
Requirements (confirmed)
Goal
Create a CLI tool that reads synchronized SVO recordings from multiple ZED cameras, detects ArUco markers on a 3D calibration box, computes camera extrinsics relative to the marker world origin, and outputs accurate pose matrices to replace the inaccurate ones in inside_network.json.
Calibration Target
- Type: 3D box with 6 diamond board faces
- Object points: defined in `aruco/output/standard_box_markers.parquet`
- Marker dictionary: `DICT_4X4_50` (from existing code)
- Minimum markers per frame: 4+ (one diamond face's worth)
Input
- Multiple SVO2 files (one per camera)
- Frame sampling: Fixed interval + quality filter
- Timestamp-aligned playback (using the existing `svo_playback.py` pattern)
Output
- New JSON file with calibrated extrinsics
- Format: similar to `inside_network.json` but with an accurate `pose` field
- Reference frame: the marker is the world origin (all cameras expressed relative to the ArUco box)
Workflow
- CLI with preview: command-line driven, but shows a visualization of detected markers
- Example: `uv run calibrate_extrinsics.py --svos *.svo2 --interval 30 --output calibrated.json`
Technical Decisions
Intrinsics Source
- Use ZED SDK's pre-calibrated intrinsics from `cam.get_camera_information().camera_configuration.calibration_parameters.left_cam`
- Properties: `fx`, `fy`, `cx`, `cy`, `disto`
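The intrinsics above map directly into the 3x3 pinhole matrix that `solvePnP` expects; a minimal sketch (since the rectified LEFT view is used, the distortion terms are zeros and `disto` can be ignored):

```python
import numpy as np

def intrinsics_to_K(fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Build the 3x3 pinhole camera matrix from the ZED-provided intrinsics."""
    return np.array([
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ])
```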
Pose Estimation
- Use `cv2.solvePnP` with the `SOLVEPNP_SQPNP` flag (from existing code)
- Consider `solvePnPRansac` for per-frame robustness
Outlier Handling (Two-stage)
- Per-frame rejection: Reject frames with high reprojection error (threshold ~2-5 pixels)
- RANSAC on pose set: After collecting all valid poses, use RANSAC-style consensus
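A sketch of the stage-1 filter, assuming undistorted (rectified) image points and the pose given as a 3x3 rotation matrix plus translation; the 3 px default is an assumed middle value from the ~2-5 px range above:

```python
import numpy as np

def reprojection_error(obj_pts, img_pts, R, t, K):
    """Mean reprojection error in pixels under a zero-distortion pinhole
    model (the rectified LEFT view needs no distortion terms)."""
    cam = obj_pts @ R.T + t          # (N, 3) points in the camera frame
    uv = cam @ K.T                   # apply intrinsics
    uv = uv[:, :2] / uv[:, 2:3]      # perspective divide
    return float(np.linalg.norm(uv - img_pts, axis=1).mean())

def keep_frame(err_px: float, threshold: float = 3.0) -> bool:
    """Stage 1: reject frames whose reprojection error exceeds the
    threshold (assumed default of 3 px, within the plan's 2-5 px range)."""
    return err_px < threshold
```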
Pose Averaging
- Rotation: use `scipy.spatial.transform.Rotation.mean()` for the geodesic mean
- Translation: use the median or a weighted mean with MAD-based outlier rejection
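The averaging scheme above can be sketched with SciPy; the `mad_k = 3.0` rejection band is an assumed default, not specified in the plan:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def average_poses(rotations, translations, mad_k=3.0):
    """Fuse per-frame poses: Rotation.mean() for the rotations, and a
    MAD-filtered mean for the translations (assumed mad_k band of 3)."""
    R_mean = Rotation.from_matrix(np.asarray(rotations)).mean()
    t = np.asarray(translations)
    med = np.median(t, axis=0)
    mad = np.median(np.abs(t - med), axis=0) + 1e-12  # avoid zero band
    inliers = np.all(np.abs(t - med) <= mad_k * mad, axis=1)
    t_mean = t[inliers].mean(axis=0)
    return R_mean.as_matrix(), t_mean
```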
Math: Camera-to-World Transform
Each camera sees the marker → `T_cam_marker` (camera-to-marker)
World origin = marker, so the camera pose in world is `T_world_cam = inv(T_cam_marker)`
For camera i: `T_world_cam_i = inv(T_cam_i_marker)`
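Since `T_cam_marker` is rigid, the inverse has a closed form that avoids a general 4x4 matrix inverse; a small sketch:

```python
import numpy as np

def invert_rigid(T: np.ndarray) -> np.ndarray:
    """Invert a 4x4 rigid transform in closed form:
    inv([R t; 0 1]) = [R.T  -R.T @ t; 0 1]."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti
```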
Research Findings
From Librarian (Multi-camera calibration)
- Relative transform: `T_BA = T_BM @ inv(T_AM)`
- Board-based detection improves robustness to occlusion
- Use `refineDetectedMarkers` for corner accuracy
- Handle missing views by only computing poses when enough markers are visible
From Librarian (Robust averaging)
- Use `scipy.spatial.transform.Rotation.mean(weights=...)` for rotation averaging
- Median/MAD on translation for outlier rejection
- RANSAC over the pose set with rotation-angle and translation-distance thresholds
- Practical thresholds: rotation > 2-5°; the translation threshold depends on scene scale
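A minimal sketch of the pose-set consensus: each collected pose serves as a hypothesis, and the largest set of poses agreeing with one hypothesis is kept. The default thresholds (3°, 2 cm) are assumed values within the ranges suggested above:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_consensus(rotations, translations, rot_deg=3.0, trans_m=0.02):
    """RANSAC-style consensus over a collected pose set. Returns a boolean
    inlier mask; thresholds are assumed defaults (plan: rotation 2-5 deg,
    translation scale-dependent)."""
    Rs = Rotation.from_matrix(np.asarray(rotations))
    ts = np.asarray(translations)
    best = np.zeros(len(ts), dtype=bool)
    for i in range(len(ts)):
        # geodesic angle between every rotation and hypothesis i
        ang = (Rs * Rs[i].inv()).magnitude()            # radians, shape (N,)
        dist = np.linalg.norm(ts - ts[i], axis=1)
        inliers = (np.degrees(ang) <= rot_deg) & (dist <= trans_m)
        if inliers.sum() > best.sum():
            best = inliers
    return best
```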
Existing Codebase Patterns
- `find_extrinsic_object.py`: ArUco detection + solvePnP pattern
- `svo_playback.py`: multi-SVO sync via timestamp alignment
- `aruco_box.py`: diamond board geometry generation
Open Questions
- None remaining
Metis Gap Analysis (Addressed)
Critical Gaps Resolved:
- World frame: as defined in `standard_box_markers.parquet` (origin at the box coordinate system)
- Image stream: use the rectified LEFT view (no distortion coefficients needed)
- Transform convention: match the `inside_network.json` format, which appears to be T_world_from_cam (the camera pose in world)
- Format: space-separated 4x4 matrix, row-major
- Sync tolerance: Moderate (<33ms, 1 frame at 30fps)
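Serializing the space-separated, row-major 4x4 format might look like the sketch below; the top-level JSON layout and the `pose` key name are assumptions, since `inside_network.json`'s exact schema isn't reproduced here:

```python
import json
import numpy as np

def pose_to_string(T: np.ndarray) -> str:
    """Flatten a 4x4 pose, row-major, into a space-separated string."""
    return " ".join(f"{v:.9g}" for v in np.asarray(T).reshape(-1))

def write_extrinsics(path, poses):
    """Write {camera_id: 4x4 pose} to JSON. The per-camera 'pose' key
    mirrors inside_network.json by assumption."""
    data = {cam_id: {"pose": pose_to_string(T)} for cam_id, T in poses.items()}
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
```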
Guardrails Added:
- Validate parquet schema early (require marker_id, corners with X,Y,Z in meters)
- Use reprojection error as primary quality metric
- Require ≥4 markers with sufficient 3D spread (not just coplanar)
- Whitelist only expected marker IDs (from parquet)
- Add self-check mode with quantitative quality report
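The "sufficient 3D spread" guardrail can be checked from the singular values of the centered object points: the smallest singular value measures extent along the thinnest axis, so near-zero means near-coplanar. The 1 cm threshold is an assumed value:

```python
import numpy as np

def has_3d_spread(obj_pts, min_thickness_m: float = 0.01) -> bool:
    """True if the detected markers' 3D points are not (near-)coplanar.
    The smallest singular value of the centered cloud, normalized to an
    RMS extent, is compared to an assumed 1 cm thickness threshold."""
    pts = np.asarray(obj_pts, dtype=float)
    if len(pts) < 4:
        return False  # plan requires >= 4 markers per frame
    centered = pts - pts.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return bool(s[-1] / np.sqrt(len(pts)) > min_thickness_m)
```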
Scope Boundaries
INCLUDE
- SVO file loading with timestamp sync
- ArUco detection on left camera image
- Pose estimation using solvePnP
- Per-frame quality filtering (reprojection error)
- Multi-frame pose averaging with outlier rejection
- JSON output with 4x4 pose matrices
- Preview visualization showing detected markers and axes
- CLI interface with click
EXCLUDE
- Right camera processing (use left only for simplicity)
- Intrinsic calibration (use pre-calibrated from ZED SDK)
- Modifying `inside_network.json` in place
- GUI-based frame selection
- Bundle adjustment refinement
- Depth-based verification