9c861105f7
- Add comprehensive work plan for ArUco-based multi-camera calibration
- Add recording_multi.py for multi-camera SVO recording
- Add streaming_receiver.py for network streaming
- Add svo_playback.py for synchronized playback
- Add zed_network_utils.py for camera configuration
- Add AGENTS.md with project context
# Draft: ArUco-Based Multi-Camera Extrinsic Calibration from SVO

## Requirements (confirmed)

### Goal

Create a CLI tool that reads synchronized SVO recordings from multiple ZED cameras, detects ArUco markers on a 3D calibration box, computes camera extrinsics relative to the marker world origin, and outputs accurate pose matrices to replace the inaccurate ones in `inside_network.json`.

### Calibration Target

- **Type**: 3D box with 6 diamond board faces
- **Object points**: Defined in `aruco/output/standard_box_markers.parquet`
- **Marker dictionary**: `DICT_4X4_50` (from existing code)
- **Minimum markers per frame**: 4 or more (the markers of one diamond face)

### Input

- Multiple SVO2 files (one per camera)
- Frame sampling: fixed interval plus a quality filter
- Timestamp-aligned playback (using the existing `svo_playback.py` pattern)
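
The timestamp-aligned playback step amounts to a nearest-neighbor match of each camera's frame timestamps against a reference camera. A minimal sketch (function name and data layout are hypothetical; timestamps assumed to be in nanoseconds, with the tolerance matching one frame at 30 fps):

```python
from bisect import bisect_left

def align_frames(ref_ts, other_ts, tol_ns=33_000_000):
    """For each reference timestamp, pick the nearest timestamp in
    other_ts (assumed sorted ascending); drop pairs outside tol_ns."""
    pairs = []
    for i, t in enumerate(ref_ts):
        j = bisect_left(other_ts, t)
        # Compare the neighbors on either side of the insertion point.
        best = min(
            (c for c in (j - 1, j) if 0 <= c < len(other_ts)),
            key=lambda c: abs(other_ts[c] - t),
        )
        if abs(other_ts[best] - t) <= tol_ns:
            pairs.append((i, best))
    return pairs
```

A reference frame with no partner within tolerance is skipped rather than force-matched.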

### Output

- **New JSON file** with calibrated extrinsics
- Format: similar to `inside_network.json`, but with an accurate `pose` field
- Reference frame: **marker is the world origin** (all cameras expressed relative to the ArUco box)

### Workflow

- **CLI with preview**: Command-line driven, but shows a visualization of the detected markers
- Example: `uv run calibrate_extrinsics.py --svos *.svo2 --interval 30 --output calibrated.json`

## Technical Decisions

### Intrinsics Source

- Use the ZED SDK's pre-calibrated intrinsics from `cam.get_camera_information().camera_configuration.calibration_parameters.left_cam`
- Properties: `fx, fy, cx, cy, disto`
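
Those fields plug directly into the 3x3 camera matrix that `cv2.solvePnP` expects. A minimal sketch (helper name is hypothetical):

```python
import numpy as np

def camera_matrix(fx, fy, cx, cy):
    """Assemble the 3x3 pinhole intrinsic matrix from the left_cam fields."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])
```

Because the plan uses the rectified LEFT view, a zero distortion vector would be passed alongside this matrix instead of `disto`.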

### Pose Estimation

- Use `cv2.solvePnP` with the `SOLVEPNP_SQPNP` flag (from existing code)
- Consider `cv2.solvePnPRansac` for per-frame robustness

### Outlier Handling (Two-stage)

1. **Per-frame rejection**: Reject frames with high reprojection error (threshold ~2-5 pixels)
2. **RANSAC on pose set**: After collecting all valid poses, use RANSAC-style consensus
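
Stage 1 needs a reprojection-error metric. A minimal numpy sketch of the pinhole projection behind it (names are hypothetical; in practice `cv2.projectPoints` with the ZED intrinsics would do the projection):

```python
import numpy as np

def reprojection_error(obj_pts, img_pts, R, t, K):
    """Mean pixel distance between detected corners and the corners
    reprojected through the estimated pose (R, t) and intrinsics K."""
    cam = obj_pts @ R.T + t            # marker frame -> camera frame
    proj = cam @ K.T                   # apply intrinsics
    proj = proj[:, :2] / proj[:, 2:3]  # perspective divide
    return float(np.linalg.norm(proj - img_pts, axis=1).mean())

def keep_frame(err_px, threshold=3.0):
    """Stage 1: reject frames whose error exceeds the ~2-5 px threshold."""
    return err_px <= threshold
```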

### Pose Averaging

- **Rotation**: Use `scipy.spatial.transform.Rotation.mean()` for the geodesic mean
- **Translation**: Use the median, or a weighted mean with MAD-based outlier rejection
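
A sketch of both bullets together, assuming scipy is available (function names are hypothetical):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def mad_mask(translations, k=3.0):
    """Keep translations within k median-absolute-deviations of the median."""
    t = np.stack(translations)
    med = np.median(t, axis=0)
    mad = np.median(np.abs(t - med), axis=0) + 1e-12  # avoid zero MAD
    return np.all(np.abs(t - med) <= k * mad, axis=1)

def average_pose(rotations, translations):
    """Geodesic mean of rotations; per-axis median of translations."""
    R_mean = Rotation.from_matrix(np.stack(rotations)).mean().as_matrix()
    t_med = np.median(np.stack(translations), axis=0)
    return R_mean, t_med
```

`mad_mask` would run first, and `average_pose` would then see only the surviving poses.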

### Math: Camera-to-World Transform

`solvePnP` gives each camera's view of the marker as `T_cam_marker`, the transform that maps marker-frame points into the camera frame.

The marker defines the world origin, so the camera pose in the world is `T_world_cam = inv(T_cam_marker)`.

For camera i: `T_world_cam_i = inv(T_cam_i_marker)`
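
Since the transforms are rigid, the inverse has a closed form that avoids a general matrix inversion. A minimal sketch:

```python
import numpy as np

def invert_pose(T):
    """inv([R t; 0 1]) = [R^T, -R^T t; 0 1] for a rigid 4x4 transform."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti
```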

## Research Findings

### From Librarian (Multi-camera calibration)

- Relative transform: `T_BA = T_BM @ inv(T_AM)`
- Board-based detection improves robustness to occlusion
- Use `refineDetectedMarkers` for corner accuracy
- Handle missing views by computing a pose only when enough markers are visible
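
The relative-transform identity from the first bullet, as a runnable check (helper name hypothetical): with `T_AM` and `T_BM` as each camera's marker pose, `inv(T_AM)` maps camera A back to the marker frame and `T_BM` maps the marker frame into camera B.

```python
import numpy as np

def relative_pose(T_AM, T_BM):
    """T_BA = T_BM @ inv(T_AM): camera A expressed in camera B's frame."""
    return T_BM @ np.linalg.inv(T_AM)
```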

### From Librarian (Robust averaging)

- Use `scipy.spatial.transform.Rotation.mean(weights=...)` for rotation averaging
- Median/MAD on translations for outlier rejection
- RANSAC over the pose set, with rotation-angle and translation-distance thresholds
- Practical thresholds: rotation outliers beyond ~2-5°; the translation threshold depends on scene scale

### Existing Codebase Patterns

- `find_extrinsic_object.py`: ArUco detection + solvePnP pattern
- `svo_playback.py`: multi-SVO sync via timestamp alignment
- `aruco_box.py`: diamond board geometry generation

## Open Questions

- None remaining

## Metis Gap Analysis (Addressed)

### Critical Gaps Resolved:

1. **World frame**: As defined in `standard_box_markers.parquet` (origin at the box coordinate system)
2. **Image stream**: Use the rectified LEFT view (no distortion coefficients needed)
3. **Transform convention**: Match the `inside_network.json` format - appears to be `T_world_from_cam` (camera pose in world)
   - Format: space-separated 4x4 matrix, row-major
4. **Sync tolerance**: Moderate (<33 ms, i.e. 1 frame at 30 fps)
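
The space-separated row-major format from gap 3 round-trips with two small helpers (names are hypothetical, and the exact layout should be confirmed against a real `inside_network.json`):

```python
import numpy as np

def pose_to_str(T):
    """Serialize a 4x4 pose as 16 space-separated values, row-major."""
    return " ".join(f"{v:.9g}" for v in np.asarray(T).ravel())

def pose_from_str(s):
    """Parse the same format back into a 4x4 numpy array."""
    return np.array([float(v) for v in s.split()]).reshape(4, 4)
```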

### Guardrails Added:

- Validate the parquet schema early (require `marker_id` and corner X, Y, Z coordinates in meters)
- Use reprojection error as the primary quality metric
- Require ≥4 markers with sufficient 3D spread (not just coplanar)
- Whitelist only the expected marker IDs (from the parquet)
- Add a self-check mode with a quantitative quality report
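
The first guardrail might look like the following, assuming pandas reads the parquet and assuming flat x/y/z columns - the real schema in `standard_box_markers.parquet` must be checked before relying on these names:

```python
import pandas as pd

REQUIRED = {"marker_id", "x", "y", "z"}  # hypothetical column names

def validate_markers(df: pd.DataFrame) -> None:
    """Fail fast on a missing column or coordinates that can't be meters."""
    missing = REQUIRED - set(df.columns)
    if missing:
        raise ValueError(f"marker parquet missing columns: {sorted(missing)}")
    # A calibration box should span well under 10 m if units are meters.
    if df[["x", "y", "z"]].abs().to_numpy().max() > 10.0:
        raise ValueError("corner coordinates look too large to be meters")
```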

## Scope Boundaries

### INCLUDE

- SVO file loading with timestamp sync
- ArUco detection on the left camera image
- Pose estimation using solvePnP
- Per-frame quality filtering (reprojection error)
- Multi-frame pose averaging with outlier rejection
- JSON output with 4x4 pose matrices
- Preview visualization showing detected markers and axes
- CLI interface with click

### EXCLUDE

- Right camera processing (use the left image only, for simplicity)
- Intrinsic calibration (use the pre-calibrated values from the ZED SDK)
- Modifying `inside_network.json` in-place
- GUI-based frame selection
- Bundle adjustment refinement
- Depth-based verification