File: zed-playground/py_workspace/.sisyphus/drafts/aruco-svo-calibration.md
Commit 9c861105f7 (crosstyan, 2026-02-05): feat: add aruco-svo-calibration plan and utils scripts

  • Add comprehensive work plan for ArUco-based multi-camera calibration
  • Add recording_multi.py for multi-camera SVO recording
  • Add streaming_receiver.py for network streaming
  • Add svo_playback.py for synchronized playback
  • Add zed_network_utils.py for camera configuration
  • Add AGENTS.md with project context

Draft: ArUco-Based Multi-Camera Extrinsic Calibration from SVO

Requirements (confirmed)

Goal

Create a CLI tool that reads synchronized SVO recordings from multiple ZED cameras, detects ArUco markers on a 3D calibration box, computes camera extrinsics relative to the marker world origin, and outputs accurate pose matrices to replace the inaccurate ones in inside_network.json.

Calibration Target

  • Type: 3D box with 6 diamond board faces
  • Object points: Defined in aruco/output/standard_box_markers.parquet
  • Marker dictionary: DICT_4X4_50 (from existing code)
  • Minimum markers per frame: 4+ (one diamond face's worth)
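
The parquet layout used below is an assumption (per-corner rows with `marker_id`, `corner_index`, and `x`/`y`/`z` columns in meters); the real column names should be checked against `aruco/output/standard_box_markers.parquet`. A minimal sketch of grouping rows into the per-marker object-point arrays solvePnP needs:

```python
import numpy as np
import pandas as pd

def load_marker_object_points(df: pd.DataFrame) -> dict[int, np.ndarray]:
    """Group per-corner rows into a {marker_id: (4, 3) float32 array} lookup."""
    points = {}
    for marker_id, group in df.groupby("marker_id"):
        corners = group.sort_values("corner_index")[["x", "y", "z"]].to_numpy(np.float32)
        assert corners.shape == (4, 3), f"marker {marker_id}: expected 4 corners"
        points[int(marker_id)] = corners
    return points

# Stand-in rows mimicking the assumed schema; real data would come from
# pd.read_parquet("aruco/output/standard_box_markers.parquet").
df = pd.DataFrame({
    "marker_id": [0] * 4,
    "corner_index": [0, 1, 2, 3],
    "x": [0.0, 0.05, 0.05, 0.0],
    "y": [0.0, 0.0, 0.05, 0.05],
    "z": [0.0, 0.0, 0.0, 0.0],
})
obj_points = load_marker_object_points(df)
```

The returned dict doubles as the marker-ID whitelist mentioned in the guardrails: any detected ID absent from it can be ignored outright.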

Input

  • Multiple SVO2 files (one per camera)
  • Frame sampling: Fixed interval + quality filter
  • Timestamp-aligned playback (using existing svo_playback.py pattern)

Output

  • New JSON file with calibrated extrinsics
  • Format: Similar to inside_network.json but with accurate pose field
  • Reference frame: Marker is world origin (all cameras expressed relative to ArUco box)

Workflow

  • CLI with preview: Command-line driven, but shows a visualization of the detected markers
  • Example: uv run calibrate_extrinsics.py --svos *.svo2 --interval 30 --output calibrated.json
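
A hedged sketch of the click interface (the draft specifies click under Scope Boundaries). One deviation from the example command: a shell glob like `*.svo2` expands into multiple arguments, which a single `--svos` option would not capture, so the sketch takes the SVO paths as a variadic argument instead:

```python
import click

@click.command()
@click.argument("svos", nargs=-1, required=True, type=click.Path())
@click.option("--interval", default=30, show_default=True,
              help="Sample one frame every N frames.")
@click.option("--output", default="calibrated.json", type=click.Path(),
              help="Destination JSON for the calibrated extrinsics.")
@click.option("--preview/--no-preview", default=True,
              help="Show detected markers during processing.")
def calibrate(svos, interval, output, preview):
    """Calibrate multi-camera extrinsics from synchronized SVO recordings."""
    click.echo(f"cameras={len(svos)} interval={interval} output={output}")

if __name__ == "__main__":
    calibrate()
```

Invocation then becomes `uv run calibrate_extrinsics.py *.svo2 --interval 30 --output calibrated.json`.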

Technical Decisions

Intrinsics Source

  • Use ZED SDK's pre-calibrated intrinsics from cam.get_camera_information().camera_configuration.calibration_parameters.left_cam
  • Properties: fx, fy, cx, cy, disto
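
A minimal sketch of assembling the OpenCV camera matrix from those properties; the ZED SDK access path (from the draft) is shown as a comment since it needs a live camera or an open SVO:

```python
import numpy as np

def camera_matrix(fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Assemble the 3x3 pinhole matrix that OpenCV's solvePnP expects."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

# With an open camera/SVO, the values would come from the ZED SDK:
#   left = cam.get_camera_information() \
#       .camera_configuration.calibration_parameters.left_cam
#   K = camera_matrix(left.fx, left.fy, left.cx, left.cy)
#   dist = np.asarray(left.disto)  # all zeros for the rectified LEFT view
K = camera_matrix(700.0, 700.0, 640.0, 360.0)  # placeholder values
```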

Pose Estimation

  • Use cv2.solvePnP with SOLVEPNP_SQPNP flag (from existing code)
  • Consider solvePnPRansac for per-frame robustness

Outlier Handling (Two-stage)

  1. Per-frame rejection: Reject frames with high reprojection error (threshold ~2-5 pixels)
  2. RANSAC on pose set: After collecting all valid poses, use RANSAC-style consensus

Pose Averaging

  • Rotation: Use scipy.spatial.transform.Rotation.mean() for geodesic mean
  • Translation: Use median or weighted mean with MAD-based outlier rejection
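
A sketch of both steps in one helper. One caveat: scipy's `Rotation.mean()` computes the chordal (quaternion) mean rather than the true geodesic mean, but the two agree closely for tightly clustered poses, which is the case after outlier rejection. The MAD gate below (3 scaled MADs) is an assumed setting:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def average_poses(rotations: Rotation, translations: np.ndarray, mad_k: float = 3.0):
    """Robustly average a stack of poses.

    Translations: keep samples within mad_k scaled MADs of the per-axis
    median, then take the mean of the survivors.
    Rotations: quaternion (chordal) mean over the same inlier set.
    """
    med = np.median(translations, axis=0)
    mad = np.median(np.abs(translations - med), axis=0) + 1e-12
    keep = np.all(np.abs(translations - med) <= mad_k * 1.4826 * mad, axis=1)
    idx = np.flatnonzero(keep)
    return rotations[idx].mean(), translations[idx].mean(axis=0)
```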

Math: Camera-to-World Transform

Each camera observes the marker and yields T_cam_marker (the transform from marker coordinates into that camera's frame, i.e. what solvePnP returns). The marker defines the world origin, so the camera pose in world coordinates is T_world_cam = inv(T_cam_marker).

For camera i: T_world_cam_i = inv(T_cam_i_marker)
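
The inverse of a rigid transform has the closed form inv([R t; 0 1]) = [Rᵀ, -Rᵀt; 0 1], which avoids a general 4x4 inversion; a sketch:

```python
import numpy as np

def invert_se3(T: np.ndarray) -> np.ndarray:
    """Invert a 4x4 rigid transform: inv([R t; 0 1]) = [R.T, -R.T @ t; 0 1]."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

# Camera pose in the marker-defined world frame:
#   T_world_cam = invert_se3(T_cam_marker)
```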

Research Findings

From Librarian (Multi-camera calibration)

  • Relative transform: T_BA = T_BM @ inv(T_AM)
  • Board-based detection improves robustness to occlusion
  • Use refineDetectedMarkers for corner accuracy
  • Handle missing views by computing a pose only when enough markers are visible
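
The relative-transform identity above, written out (here T_AM and T_BM are each camera's marker-to-camera transform from solvePnP, so T_BA maps camera A's coordinates into camera B's frame):

```python
import numpy as np

def relative_transform(T_AM: np.ndarray, T_BM: np.ndarray) -> np.ndarray:
    """T_BA = T_BM @ inv(T_AM): maps points in camera A's frame into
    camera B's frame, given both cameras' views of the same marker."""
    return T_BM @ np.linalg.inv(T_AM)
```

When both cameras see the box in the same synchronized frame, this gives an independent cross-check of the per-camera world poses.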

From Librarian (Robust averaging)

  • Use scipy.spatial.transform.Rotation.mean(weights=...) for rotation averaging
  • Median/MAD on translation for outlier rejection
  • RANSAC over pose set with rotation angle + translation distance thresholds
  • Practical thresholds: reject poses deviating by more than 2-5° in rotation; the translation threshold depends on scene scale
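
A sketch of the distance metric such a consensus step would use. The 3° bound sits inside the draft's 2-5° band; the 2 cm translation bound is an assumed value for this scene scale:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_distance(T1: np.ndarray, T2: np.ndarray):
    """Geodesic rotation angle (degrees) and translation distance between poses."""
    ang = np.degrees(Rotation.from_matrix(T1[:3, :3].T @ T2[:3, :3]).magnitude())
    return float(ang), float(np.linalg.norm(T2[:3, 3] - T1[:3, 3]))

def is_consistent(T1, T2, max_deg=3.0, max_m=0.02):
    # max_deg follows the draft's 2-5 degree band; the 2 cm bound is assumed.
    ang, dist = pose_distance(T1, T2)
    return ang <= max_deg and dist <= max_m
```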

Existing Codebase Patterns

  • find_extrinsic_object.py: ArUco detection + solvePnP pattern
  • svo_playback.py: Multi-SVO sync via timestamp alignment
  • aruco_box.py: Diamond board geometry generation

Open Questions

  • None remaining

Metis Gap Analysis (Addressed)

Critical Gaps Resolved:

  1. World frame: As defined in standard_box_markers.parquet (origin at box coordinate system)
  2. Image stream: Use rectified LEFT view (no distortion coefficients needed)
  3. Transform convention: Match the inside_network.json format, which appears to be T_world_from_cam (camera pose in world)
    • Format: space-separated 4x4 matrix, row-major
  4. Sync tolerance: Moderate (<33ms, 1 frame at 30fps)
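
Serializing the pose field under that convention is a few lines; this assumes the row-major, space-separated format inferred above and should be verified against an actual inside_network.json entry:

```python
import numpy as np

def pose_to_string(T: np.ndarray) -> str:
    """16 space-separated values, row-major (the assumed pose field format)."""
    return " ".join(f"{v:.9g}" for v in np.asarray(T, dtype=float).ravel())

def pose_from_string(s: str) -> np.ndarray:
    vals = np.array(s.split(), dtype=float)
    assert vals.size == 16, f"expected 16 values, got {vals.size}"
    return vals.reshape(4, 4)
```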

Guardrails Added:

  • Validate parquet schema early (require marker_id, corners with X,Y,Z in meters)
  • Use reprojection error as primary quality metric
  • Require ≥4 markers with sufficient 3D spread (not all on a single plane)
  • Whitelist only expected marker IDs (from parquet)
  • Add self-check mode with quantitative quality report
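
The 3D-spread guardrail can be checked via the singular values of the centered corner cloud; the 5 mm floor on the thinnest axis below is an assumed threshold to tune against real detections:

```python
import numpy as np

def has_3d_spread(points: np.ndarray, min_std_m: float = 0.005) -> bool:
    """True if the visible corners are not (near-)coplanar: the standard
    deviation along the thinnest principal axis must exceed min_std_m."""
    P = np.asarray(points, dtype=float)
    P = P - P.mean(axis=0)
    s = np.linalg.svd(P, compute_uv=False)  # descending singular values
    return bool(s[-1] / np.sqrt(max(len(P) - 1, 1)) >= min_std_m)
```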

Scope Boundaries

INCLUDE

  • SVO file loading with timestamp sync
  • ArUco detection on left camera image
  • Pose estimation using solvePnP
  • Per-frame quality filtering (reprojection error)
  • Multi-frame pose averaging with outlier rejection
  • JSON output with 4x4 pose matrices
  • Preview visualization showing detected markers and axes
  • CLI interface with click

EXCLUDE

  • Right camera processing (use left only for simplicity)
  • Intrinsic calibration (use pre-calibrated from ZED SDK)
  • Modifying inside_network.json in-place
  • GUI-based frame selection
  • Bundle adjustment refinement
  • Depth-based verification