# Draft: ArUco-Based Multi-Camera Extrinsic Calibration from SVO

## Requirements (confirmed)

### Goal
Create a CLI tool that reads synchronized SVO recordings from multiple ZED cameras, detects ArUco markers on a 3D calibration box, computes camera extrinsics relative to the marker world origin, and outputs accurate pose matrices to replace the inaccurate ones in `inside_network.json`.

### Calibration Target
- **Type**: 3D box with 6 diamond board faces
- **Object points**: defined in `aruco/output/standard_box_markers.parquet`
- **Marker dictionary**: `DICT_4X4_50` (from existing code)
- **Minimum markers per frame**: 4+ (one diamond face's worth)

### Input
- Multiple SVO2 files (one per camera)
- Frame sampling: fixed interval + quality filter
- Timestamp-aligned playback (using the existing `svo_playback.py` pattern)

### Output
- **New JSON file** with calibrated extrinsics
- Format: similar to `inside_network.json`, but with an accurate `pose` field
- Reference frame: **marker is world origin** (all cameras expressed relative to the ArUco box)

### Workflow
- **CLI with preview**: command-line driven, but shows a visualization of detected markers
- Example: `uv run calibrate_extrinsics.py --svos *.svo2 --interval 30 --output calibrated.json`

## Technical Decisions

### Intrinsics Source
- Use the ZED SDK's pre-calibrated intrinsics from `cam.get_camera_information().camera_configuration.calibration_parameters.left_cam`
- Properties: `fx, fy, cx, cy, disto`

### Pose Estimation
- Use `cv2.solvePnP` with the `SOLVEPNP_SQPNP` flag (from existing code)
- Consider `cv2.solvePnPRansac` for per-frame robustness

### Outlier Handling (Two-stage)
1. **Per-frame rejection**: reject frames with high reprojection error (threshold ~2-5 pixels)
2. **RANSAC on pose set**: after collecting all valid poses, use RANSAC-style consensus

### Pose Averaging
- **Rotation**: use `scipy.spatial.transform.Rotation.mean()` for the geodesic mean
- **Translation**: use the median, or a weighted mean with MAD-based outlier rejection

### Math: Camera-to-World Transform
Each camera sees the marker → `T_cam_marker` (the marker's pose in that camera's frame).
The world origin is the marker, so the camera pose in the world is `T_world_cam = inv(T_cam_marker)`.
For camera i: `T_world_cam_i = inv(T_cam_i_marker)`

## Research Findings

### From Librarian (Multi-camera calibration)
- Relative transform: `T_BA = T_BM @ inv(T_AM)`
- Board-based detection improves robustness to occlusion
- Use `refineDetectedMarkers` for corner accuracy
- Handle missing views by computing poses only when enough markers are visible

### From Librarian (Robust averaging)
- Use `scipy.spatial.transform.Rotation.mean(weights=...)` for rotation averaging
- Median/MAD on translation for outlier rejection
- RANSAC over the pose set with rotation-angle and translation-distance thresholds
- Practical thresholds: rotation >2-5°; translation depends on scene scale

### Existing Codebase Patterns
- `find_extrinsic_object.py`: ArUco detection + solvePnP pattern
- `svo_playback.py`: multi-SVO sync via timestamp alignment
- `aruco_box.py`: diamond board geometry generation

## Open Questions
- None remaining

## Metis Gap Analysis (Addressed)

### Critical Gaps Resolved:
1. **World frame**: as defined in `standard_box_markers.parquet` (origin at the box coordinate system)
2. **Image stream**: use the rectified LEFT view (no distortion coefficients needed)
3. **Transform convention**: match the `inside_network.json` format
   - Appears to be T_world_from_cam (camera pose in world)
   - Format: space-separated 4x4 matrix, row-major
4. **Sync tolerance**: moderate (<33 ms, i.e. 1 frame at 30 fps)

### Guardrails Added:
- Validate the parquet schema early (require `marker_id` and corners with X, Y, Z in meters)
- Use reprojection error as the primary quality metric
- Require ≥4 markers with sufficient 3D spread (not just coplanar)
- Whitelist only the expected marker IDs (from the parquet)
- Add a self-check mode with a quantitative quality report

## Scope Boundaries

### INCLUDE
- SVO file loading with timestamp sync
- ArUco detection on the left camera image
- Pose estimation using solvePnP
- Per-frame quality filtering (reprojection error)
- Multi-frame pose averaging with outlier rejection
- JSON output with 4x4 pose matrices
- Preview visualization showing detected markers and axes
- CLI interface with click

### EXCLUDE
- Right camera processing (use left only, for simplicity)
- Intrinsic calibration (use pre-calibrated values from the ZED SDK)
- Modifying `inside_network.json` in place
- GUI-based frame selection
- Bundle adjustment refinement
- Depth-based verification
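The per-frame step described under Technical Decisions (solvePnP with `SOLVEPNP_SQPNP`, then reprojection-error gating as stage 1 of outlier handling) can be sketched roughly as follows. This is a minimal illustration, not project code: `estimate_frame_pose` and the 3 px default threshold are assumptions for the example.

```python
import cv2
import numpy as np


def estimate_frame_pose(obj_pts, img_pts, K, dist, max_err_px=3.0):
    """Solve PnP on detected marker corners, gating on mean reprojection error.

    Returns T_cam_marker as a 4x4 matrix, or None if PnP fails or the
    frame exceeds the reprojection-error threshold (hypothetical default).
    """
    ok, rvec, tvec = cv2.solvePnP(
        obj_pts, img_pts, K, dist, flags=cv2.SOLVEPNP_SQPNP
    )
    if not ok:
        return None
    # Reproject the 3D object points with the estimated pose and compare.
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist)
    err = np.linalg.norm(
        proj.reshape(-1, 2) - np.asarray(img_pts).reshape(-1, 2), axis=1
    ).mean()
    if err > max_err_px:
        return None  # stage-1 per-frame rejection
    T = np.eye(4)
    T[:3, :3], _ = cv2.Rodrigues(rvec)  # rotation vector -> matrix
    T[:3, 3] = tvec.ravel()
    return T  # T_cam_marker; invert to get T_world_cam
```

The returned matrix is `T_cam_marker`; inverting it per the Math section yields the camera pose in the marker-defined world frame.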
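The pose-inversion and robust-averaging steps (geodesic rotation mean via `scipy.spatial.transform.Rotation.mean`, median translation, threshold-based rejection of deviant poses) might look like this minimal numpy/scipy sketch. `invert_pose`, `average_poses`, and the 5° / 5 cm defaults are illustrative assumptions, not fixed design choices.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def invert_pose(T):
    """Invert a rigid 4x4 transform: inv([R t; 0 1]) = [R.T  -R.T @ t; 0 1]."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti


def average_poses(poses, rot_thresh_deg=5.0, trans_thresh=0.05):
    """Robustly average a list of 4x4 poses (e.g. per-frame T_world_cam).

    Poses far from a reference (geodesic mean rotation, median translation)
    are rejected; the inliers are combined with Rotation.mean and the
    per-axis median translation. Thresholds here are hypothetical defaults.
    """
    Rs = Rotation.from_matrix(np.stack([T[:3, :3] for T in poses]))
    ts = np.stack([T[:3, 3] for T in poses])

    R_ref = Rs.mean()                 # geodesic mean of all rotations
    t_ref = np.median(ts, axis=0)     # robust reference translation

    ang = np.degrees((Rs * R_ref.inv()).magnitude())  # angular distance, deg
    dist = np.linalg.norm(ts - t_ref, axis=1)         # translation distance

    inliers = (ang < rot_thresh_deg) & (dist < trans_thresh)
    if not inliers.any():
        raise ValueError("all poses rejected as outliers")

    T_avg = np.eye(4)
    T_avg[:3, :3] = Rotation.from_matrix(Rs.as_matrix()[inliers]).mean().as_matrix()
    T_avg[:3, 3] = np.median(ts[inliers], axis=0)
    return T_avg, inliers
```

A simple consensus like this may suffice before reaching for full RANSAC over the pose set: with tens of sampled frames and a single reference pass, gross outliers are already rejected by the angle and distance gates.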