zed-playground/workspaces/py_workspace/docs/visualization-conventions.md
2026-03-06 17:17:59 +08:00


Visualization Conventions & Coordinate Frame Reference

Status: Canonical reference as of 2026-02-08. Applies to: visualize_extrinsics.py, calibrate_extrinsics.py, and inside_network.json.


Executive Summary

The visualize_extrinsics.py script went through multiple iterations of coordinate-frame switching (OpenCV ↔ OpenGL), Plotly camera/view hacks, and partial basis transforms that created compounding confusion about whether the visualization was correct. The root cause was conflating Plotly's scene camera settings with actual data-frame transforms: adjusting camera.up, autorange: "reversed", or eye position changes how you look at the data but does not change the coordinate frame the data lives in.

After several rounds of adding and removing --world-basis, --render-space, and --pose-convention flags, the visualizer was simplified to a single convention:

  • All data is in OpenCV convention (+X right, +Y down, +Z forward).
  • No basis switching. The --world-basis flag was removed.
  • Plotly's scene camera is configured with up = {x:0, y:-1, z:0} so that the OpenCV +Y-down axis renders as "down" on screen.

The confusion was never a bug in the calibration math — it was a visualization-layer problem caused by trying to make Plotly (which defaults to Y-up) display OpenCV data (which is Y-down) without a clear separation between "data frame" and "view frame."


Current Policy Checklist (2026-02-09)

For engineers maintaining or using these tools:

  • calibrate_extrinsics.py: Outputs T_world_from_cam (OpenCV). Auto-aligns to Y-down (gravity along +Y). Writes _meta block.
  • visualize_extrinsics.py: Renders raw JSON data. Ignores _meta. Sets view camera up to -Y.
  • apply_calibration_to_fusion_config.py: Direct pose copy only. No --cv-to-opengl conversion.
  • compare_pose_sets.py: Symmetric inputs (--pose-a-json, --pose-b-json). Heuristic parsing.
  • Conventions: Always OpenCV frame (+X Right, +Y Down, +Z Forward). Units in meters.

Ground Truth Conventions

1. Calibration Output: world_from_cam

calibrate_extrinsics.py stores poses as T_world_from_cam (4×4 homogeneous):

T_world_from_cam = invert_transform(T_cam_from_world)

  • solvePnP returns T_cam_from_world (maps world points into camera frame).
  • The script inverts this before saving to JSON.
  • The translation column T[:3, 3] is the camera center in world coordinates.
  • The rotation columns T[:3, :3] are the camera's local axes expressed in world frame.
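
The inversion and the meaning of the resulting columns can be illustrated with a minimal sketch. It uses plain nested lists to stay dependency-free; `invert_rigid_transform` is a hypothetical stand-in for the script's `invert_transform`, and the example matrix is made up.

```python
def invert_rigid_transform(T):
    """Invert a 4x4 rigid transform [R | t; 0 0 0 1] stored as nested lists."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    # For a rotation matrix the inverse is the transpose.
    R_inv = [[R[j][i] for j in range(3)] for i in range(3)]
    # The inverse translation is -R^T t.
    t_inv = [-sum(R_inv[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [R_inv[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# Made-up T_cam_from_world: 90 degree rotation about Z plus a translation.
T_cam_from_world = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
T_world_from_cam = invert_rigid_transform(T_cam_from_world)

# Translation column: the camera center in world coordinates.
camera_center = [row[3] for row in T_world_from_cam[:3]]
# Third rotation column: the camera's +Z (forward) axis in the world frame.
forward_axis = [row[2] for row in T_world_from_cam[:3]]
```

Mapping `camera_center` back through `T_cam_from_world` gives the zero vector, which is a quick way to confirm which direction a stored pose goes.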

JSON format (16 floats, row-major 4×4):

{
  "44289123": {
    "pose": "0.878804 -0.039482 0.475548 -2.155006 0.070301 0.996409 ..."
  }
}
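
A minimal sketch of decoding such a pose string, assuming whitespace-separated values. The helper name `parse_pose` and the sample string are illustrative only, not taken from the scripts.

```python
def parse_pose(pose_str):
    """Parse 16 whitespace-separated floats into a row-major 4x4 nested list."""
    values = [float(v) for v in pose_str.split()]
    if len(values) != 16:
        raise ValueError(f"expected 16 floats, got {len(values)}")
    return [values[r * 4:(r + 1) * 4] for r in range(4)]


# Made-up pose: identity rotation, camera center at (0.5, -1.3, 2.0).
pose = "1 0 0 0.5 0 1 0 -1.3 0 0 1 2.0 0 0 0 1"
T_world_from_cam = parse_pose(pose)

# Row-major layout means the translation is the last entry of rows 0..2.
camera_center = [T_world_from_cam[r][3] for r in range(3)]
```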

2. Camera-Local Axes (OpenCV)

Every camera's local frame follows the OpenCV pinhole convention:

| Axis | Direction | Color in visualizer |
| --- | --- | --- |
| +X | Right | Red |
| +Y | Down | Green |
| +Z | Forward (into scene) | Blue |

The frustum is drawn along the camera's local +Z axis. The four corners of the frustum's far plane are at (±w, ±h, frustum_scale) in camera-local coordinates.
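
As a rough sketch of that geometry, the far-plane corners can be derived from the horizontal FOV and the frustum depth. The 16:9 aspect ratio and both function names are assumptions made for illustration; the actual script may derive `w` and `h` differently (for example from ZED calibration files).

```python
import math


def frustum_far_corners(fov_deg=60.0, frustum_scale=0.5, aspect=16.0 / 9.0):
    """Far-plane corners (+-w, +-h, frustum_scale) in camera-local coordinates."""
    w = frustum_scale * math.tan(math.radians(fov_deg) / 2.0)  # half-width
    h = w / aspect  # half-height; the aspect ratio is an assumption
    signs = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
    return [(sx * w, sy * h, frustum_scale) for sx, sy in signs]


def camera_to_world(T_world_from_cam, p_cam):
    """Apply a 4x4 world_from_cam pose (nested lists) to a camera-local point."""
    return [
        sum(T_world_from_cam[i][j] * p_cam[j] for j in range(3)) + T_world_from_cam[i][3]
        for i in range(3)
    ]


# Made-up pose: identity rotation, camera 1.3 m above the board (Y = -1.3).
T = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, -1.3],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
corners_world = [camera_to_world(T, c) for c in frustum_far_corners()]
```

With an identity rotation the far plane stays at world Z = 0.5, directly in front of the camera along its blue axis.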

3. Metadata & Auto-Alignment (_meta)

calibrate_extrinsics.py now writes a _meta key to the output JSON. This metadata describes the conventions used. It is optional: consumers such as the visualizer may ignore it.

{
  "_meta": {
    "pose_convention": "world_from_cam",
    "frame_convention": "opencv",
    "auto_aligned": true,
    "gravity_direction_world": [0.0, 1.0, 0.0],
    "alignment": { ... }
  },
  "SN123": { ... }
}

  • pose_convention: Always world_from_cam.
  • frame_convention: Always opencv (+X Right, +Y Down, +Z Forward).
  • auto_aligned: true if --auto-align was used.
  • gravity_direction_world: The vector in world space that points "down" (gravity).
    • For Y-down (current default), this is [0, 1, 0].
    • For Y-up (legacy/Fusion), this would be [0, -1, 0].
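
A consumer that wants to use the metadata, or simply skip it safely, might separate it from the camera entries as in the sketch below. The function names and the sample document are hypothetical.

```python
import json


def split_meta(doc):
    """Return (meta, cameras); meta is {} when the file has no _meta block."""
    meta = doc.get("_meta", {})
    cameras = {key: value for key, value in doc.items() if key != "_meta"}
    return meta, cameras


def gravity_is_y_down(meta, tol=1e-6):
    """True when gravity_direction_world is (approximately) [0, 1, 0]."""
    g = meta.get("gravity_direction_world")
    if g is None:
        return False
    return abs(g[0]) < tol and abs(g[1] - 1.0) < tol and abs(g[2]) < tol


# Made-up document in the shape described above.
doc = json.loads("""
{
  "_meta": {
    "pose_convention": "world_from_cam",
    "frame_convention": "opencv",
    "auto_aligned": true,
    "gravity_direction_world": [0.0, 1.0, 0.0]
  },
  "SN123": {"pose": "1 0 0 0 0 1 0 -1.3 0 0 1 0 0 0 0 1"}
}
""")
meta, cameras = split_meta(doc)
```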

Auto-Alignment Target: The --auto-align flag now targets a Y-down world frame by default (target_axis=[0, -1, 0] in internal logic, resulting in gravity pointing +Y).

  • Old behavior: Targeted Y-up (gravity along -Y).
  • New behavior: Targets Y-down (gravity along +Y) to match the OpenCV camera frame convention.
  • Result: The ground plane is still XZ, but "up" is -Y and "down" is +Y.

4. Plotly Scene/Camera Interpretation Pitfalls

Plotly's 3D scene has its own camera model that controls how you view the data:

| Plotly setting | What it does | What it does NOT do |
| --- | --- | --- |
| camera.up | Sets which direction is "up" on screen | Does not transform data coordinates |
| camera.eye | Sets the viewpoint position | Does not change axis orientation |
| yaxis.autorange = "reversed" | Flips the Y axis tick direction | Does not negate Y data values |
| aspectmode = "data" | Preserves metric proportions | Does not imply any convention |

Critical insight: Changing camera.up from {y:1} to {y:-1} makes the plot look like Y-down is rendered correctly, but the underlying Plotly axis still runs bottom-to-top by default. This is purely a view transform — the data coordinates are unchanged.
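
The separation between view and data can be made concrete with a small sketch of a scene layout. The dictionary keys follow Plotly's `layout.scene` schema; the helper name `make_scene_layout`, the eye position, and the sample point are assumptions for illustration.

```python
def make_scene_layout():
    """View-only settings for a Y-down scene; nothing here touches the data."""
    return dict(
        camera=dict(
            up=dict(x=0, y=-1, z=0),           # screen "up" is world -Y
            eye=dict(x=1.5, y=-1.5, z=-1.5),   # arbitrary example viewpoint
        ),
        aspectmode="data",                      # keep metric proportions
        xaxis=dict(title=dict(text="X (Right)")),
        yaxis=dict(title=dict(text="Y (Down)")),
        zaxis=dict(title=dict(text="Z (Forward)")),
    )


# Made-up camera center in OpenCV world coordinates (1.3 m above the board).
points = [(0.0, -1.3, 0.5)]
scene = make_scene_layout()

# With Plotly installed, the layout would be applied roughly like this:
#   fig = plotly.graph_objects.Figure(...)
#   fig.update_layout(scene=scene)
# The traces keep the coordinates in `points` exactly as they are.
```

Because the layout never references `points`, hover values and tick labels continue to report the raw OpenCV coordinates.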


Historical Confusion Timeline

This section documents the sequence of changes that led to confusion, for future reference. All commits are on visualize_extrinsics.py.

Phase 1: Initial Plotly Rewrite (7b9782a)

  • Rewrote the visualizer from matplotlib to Plotly with a --diagnose mode.
  • Used Plotly defaults (Y-up). OpenCV data (Y-down) appeared "upside down."
  • Frustums pointed in the correct direction in data space but looked inverted.

Phase 2: Y-Up Enforcement (a8d3751)

  • Attempted to fix by setting camera.up = {y:1} and using autorange: "reversed".
  • This made the view look correct for some angles but introduced axis-label confusion.
  • The Y axis ticks ran in the opposite direction from the data, misleading users.

Phase 3: Render-Space Option (ab88a24)

  • Added --render-space flag to switch between "cv" and "opengl" rendering.
  • The OpenGL path applied a basis-change matrix diag(1, -1, -1) to all data.
  • This actually transformed the data, not just the view. The approach was correct, but it introduced a second code path that was hard to validate.

Phase 4: Ground Plane & Origin Triad (18e8142, 57f0dff)

  • Added ground plane overlay and world-origin axis triad.
  • These were drawn in the data frame, so they were correct in CV mode but appeared wrong in OpenGL mode (the basis transform was applied to some elements but not others).

Phase 5: --world-basis with Global Transform (79f2ab0)

  • Renamed --render-space to --world-basis with cv and opengl options.
  • Introduced world_to_plot() as a central transform function.
  • In opengl mode: world_to_plot applied diag(1, -1, -1) to all points.
  • Problem: The Plotly camera.up and axis labels were not always updated consistently with the basis choice, leading to "it looks right from one angle but wrong from another" reports.

Phase 6: Restore After Removal (6330e0e)

  • --world-basis was briefly removed, then restored at a user's request.
  • This back-and-forth left the README with stale documentation referencing both the old and new interfaces.

Phase 7: Final Cleanup — CV Only (d07c244)

  • Removed --world-basis entirely.
  • world_to_plot() became a no-op (identity function).
  • Plotly camera set to up = {x:0, y:-1, z:0} to render Y-down correctly.
  • Axis labels explicitly set to X (Right), Y (Down), Z (Forward).
  • Added --origin-axes-scale for independent control of the origin triad size.
  • Removed --diagnose, --pose-convention, and --render-space flags.

This is the current state.


Peculiar Behaviors Catalog

| # | Symptom | Root Cause | Fix / Explanation |
| --- | --- | --- | --- |
| 1 | Frustum appears to point in "-Z" direction | Plotly default camera has Y-up; OpenCV frustum points +Z which looks "backward" when viewed from a Y-up perspective | Set camera.up = {y:-1} (done in current code). The frustum is correct in data space. |
| 2 | Switching to --world-basis opengl makes some elements flip but not others | The world_to_plot() transform was applied to camera traces but not consistently to ground plane or origin triad | Removed --world-basis. Single convention eliminates partial-transform bugs. |
| 3 | yaxis.autorange = "reversed" makes ticks confusing | Plotly reverses the tick labels but the data coordinates stay the same. Users see "0 at top, -2 at bottom" which contradicts Y-down intuition. | Removed autorange: reversed. Use camera.up = {y:-1} instead, which rotates the view without mangling tick labels. |
| 4 | Camera positions don't match inside_network.json | inside_network.json stores poses in the ZED Fusion coordinate frame (gravity-aligned, Y-up). calibrate_extrinsics.py stores poses in the ArUco marker object's frame (Y-down if the marker board is horizontal). These are different world frames. | Not a bug. The two systems use different world origins and orientations. To compare, you must apply the alignment transform between the two frames. See FAQ below. |
| 5 | Origin triad too small or too large relative to cameras | Origin triad defaulted to --scale (camera axis size), which is often much smaller than the camera spread | Use --origin-axes-scale 0.6 (or similar) independently of --scale. |
| 6 | Bird-eye view shows unexpected orientation | --birdseye uses orthographic projection looking down the Y axis. In CV convention, Y is "down" so this is looking from below the scene upward. | Expected behavior. The bird-eye view shows the X-Z plane as seen from the -Y direction (below the cameras). |

Known Pitfalls & Common Confusions

  1. Y-Up vs Y-Down:

    • OpenCV/ArUco: Y is Down. Gravity is [0, 1, 0].
    • OpenGL/Fusion: Y is Up. Gravity is [0, -1, 0].
    • Pitfall: Assuming "Y is vertical" is ambiguous. You must know the sign.
  2. Frame Mismatch vs Origin Mismatch:

    • Two pose sets can have the same convention (e.g., both Y-down) but different world origins (e.g., one at marker, one at camera 1).
    • Fix: Use compare_pose_sets.py to align them rigidly before comparing errors.
  3. Visualizer "Inversion":

    • The visualizer sets camera.up = {y:-1}. This makes the scene look "normal" (floor at bottom) even though the data is Y-down.
    • Pitfall: Don't try to "fix" the data to match the view. The view is already compensating for the data.

Canonical Rules Going Forward

  1. Single convention: All visualization data is in OpenCV frame. No basis switching.
  2. world_to_plot() is identity: It exists as a hook but performs no transform. If a future need arises for basis conversion, it should be the only place it happens.
  3. Plotly camera settings are view-only: Never use autorange: reversed or axis negation to simulate a coordinate change. Use camera.up and camera.eye only.
  4. Poses are world_from_cam: The 4×4 matrix maps camera-local points to world. Translation = camera position in world. Rotation columns = camera axes in world.
  5. Colors are RGB = XYZ: Red = X (right), Green = Y (down), Blue = Z (forward). This applies to both per-camera axis triads and the world-origin triad.
  6. Units are meters: Consistent with marker parquet geometry and calibration output.

Current CLI Behavior

Available Flags

visualize_extrinsics.py
  -i, --input TEXT          [required] Path to JSON extrinsics file
  -o, --output TEXT         Output path (.html or .png)
  --show                    Open interactive Plotly viewer
  --scale FLOAT             Camera axis length (default: 0.2)
  --frustum-scale FLOAT     Frustum depth (default: 0.5)
  --fov FLOAT               Horizontal FOV degrees (default: 60.0)
  --birdseye                Top-down orthographic view
  --show-ground/--no-show-ground    Ground plane toggle
  --ground-y FLOAT          Ground plane Y position (default: 0.0)
  --ground-size FLOAT       Ground plane side length (default: 8.0)
  --show-origin-axes/--no-show-origin-axes  Origin triad toggle (default: on)
  --origin-axes-scale FLOAT Origin triad size (defaults to --scale)
  --zed-configs TEXT         ZED calibration file(s) for accurate frustums
  --resolution [FHD1200|FHD|2K|HD|SVGA|VGA]
  --eye [left|right]

Removed Flags (Historical Only)

| Flag | Removed In | Reason |
| --- | --- | --- |
| --world-basis | d07c244 | Caused partial/inconsistent transforms. Single CV convention is simpler. |
| --pose-convention | d07c244 | Only world_from_cam is supported. No need for a flag. |
| --diagnose | d07c244 | Diagnostic checks moved out of the visualizer. |
| --render-space | 79f2ab0 | Renamed to --world-basis, then removed. |

Note: The README.md still contains stale references to --world-basis, --pose-convention, and --diagnose in the Troubleshooting section. These should be cleaned up to match the current CLI.


Verification Playbook

Quick Sanity Check

# Render with origin triad at 0.6m scale, save as PNG
uv run visualize_extrinsics.py \
  --input output/e2e_refine_depth_smoke_rerun.json \
  --output output/_final_opencv_origin_axes_scaled.png \
  --origin-axes-scale 0.6

Expected result:

  • Origin triad at (0,0,0) with Red→+X (right), Green→+Y (down), Blue→+Z (forward).
  • Camera frustums pointing along each camera's local +Z (blue axis).
  • Camera positions spread out in world space (not bunched at origin).
  • Y values for cameras should be negative (cameras are above the marker board, which is at Y≈0; "above" in CV convention means negative Y).
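
The last expectation can also be checked numerically before rendering. The sketch below assumes the flat `pose` string format described earlier; the helper name, the serials, and the values are made up.

```python
def camera_heights_above_board(doc):
    """Map serial -> height above the Y=0 board plane (meters), i.e. -ty."""
    heights = {}
    for serial, entry in doc.items():
        if serial == "_meta":  # metadata block, not a camera
            continue
        values = [float(v) for v in entry["pose"].split()]
        ty = values[7]  # row-major 4x4: index 7 is row 1, column 3
        heights[serial] = -ty
    return heights


# Made-up extrinsics with two cameras mounted above the marker board.
doc = {
    "_meta": {"frame_convention": "opencv"},
    "CAM_A": {"pose": "1 0 0 0.0 0 1 0 -1.3 0 0 1 0.0 0 0 0 1"},
    "CAM_B": {"pose": "1 0 0 2.0 0 1 0 -1.1 0 0 1 0.5 0 0 0 1"},
}
heights = camera_heights_above_board(doc)
all_above = all(h > 0 for h in heights.values())
```

A camera reporting a negative height here (positive Y) would sit below the board plane, which usually points to a frame or convention mix-up rather than a rendering problem.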

Interactive Validation

# Open interactive HTML for rotation/inspection
uv run visualize_extrinsics.py \
  --input output/e2e_refine_depth_smoke_rerun.json \
  --show \
  --origin-axes-scale 0.6

What to check:

  1. Rotate the view: The origin triad should remain consistent — Red/Green/Blue always point in the same data-space directions regardless of view angle.
  2. Hover over camera centers: Tooltip shows the camera serial number.
  3. Frustum orientation: Each frustum's open end faces away from the camera center along the camera's blue (Z) axis.

Bird-Eye Sanity Check

uv run visualize_extrinsics.py \
  --input output/e2e_refine_depth_smoke_rerun.json \
  --birdseye --show \
  --origin-axes-scale 0.6

Expected: Top-down view of the X-Z plane. Cameras should form a recognizable spatial layout matching the physical installation. The Red (X) axis points right, Blue (Z) axis points "up" on screen (forward in world).


FAQ

"Why does an OpenGL-like view look strange?"

Because the data is in OpenCV convention (Y-down, Z-forward) and Plotly defaults to Y-up. When you try to make Plotly act like an OpenGL viewer (Y-up, Z-backward), you need to either:

  1. Transform all data by applying diag(1, -1, -1) — correct but doubles the code paths and creates consistency risks.
  2. Adjust the Plotly camera — only changes the view, not the data. Axis labels and hover values still show CV coordinates.

We chose option (2) with camera.up = {y:-1}: minimal code, no data transformation, axis labels match the actual coordinate values. The trade-off is that the default Plotly orbit feels "inverted" compared to a Y-up 3D viewer. This is expected.
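
For reference, the data transform from option (1) amounts to negating Y and Z. The sketch below is illustrative only (the function name is hypothetical, and the current visualizer does not do this); it shows why the two options differ in what hover values would report.

```python
def cv_to_gl_point(p):
    """Apply diag(1, -1, -1): OpenCV (Y-down, Z-forward) -> OpenGL-style axes."""
    x, y, z = p
    return (x, -y, -z)


# Made-up camera center 1.3 m above the board, 2 m forward (OpenCV frame).
p_cv = (0.5, -1.3, 2.0)

# Option (1): the stored numbers themselves change.
p_gl = cv_to_gl_point(p_cv)

# diag(1, -1, -1) is its own inverse, so applying it twice restores the input.
p_roundtrip = cv_to_gl_point(p_gl)

# Option (2): the view changes, the numbers do not.
p_view_only = p_cv
```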

"Does flipping axes in the view equal changing the world frame?"

No. Plotly's camera.up, camera.eye, and autorange: reversed are purely view transforms. They change how the data is displayed but not what the coordinates mean. The data always lives in the frame it was computed in (OpenCV/ArUco world frame).

If you set camera.up = {y:1} (Plotly default), the plot will render Y-up on screen, but the data values are still Y-down. This creates a visual inversion that looks like "the cameras are upside down" — they're not; the view is just flipped.

"How do I compare with the C++ viewer and inside_network.json?"

The C++ ZED Fusion viewer and inside_network.json use a different world frame than calibrate_extrinsics.py:

| Property | calibrate_extrinsics.py | ZED Fusion / inside_network.json |
| --- | --- | --- |
| World origin | ArUco marker object center | Gravity-aligned, first camera or user-defined |
| Y direction | Down (OpenCV) | Up (gravity-aligned) |
| Pose meaning | T_world_from_cam | T_world_from_cam (same semantics, different world) |
| Units | Meters | Meters |

To compare numerically:

  1. The relative poses between cameras should match (up to the alignment transform).
  2. The absolute positions will differ because the world origins are different.
  3. To convert: apply the alignment rotation that maps the ArUco world frame to the Fusion world frame. If --auto-align was used with a ground face, the ArUco frame is partially aligned (ground = XZ plane), but the origin and yaw may still differ.
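
Point 1 can be checked without knowing the alignment at all, because a relative pose between two cameras does not depend on the choice of world frame. The sketch below demonstrates this with made-up poses; it assumes both pose sets describe the cameras with the same camera-local axis convention, and all helper names are hypothetical.

```python
def matmul4(A, B):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(T):
    """Invert a 4x4 rigid transform using R^T and -R^T t."""
    R_inv = [[T[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(R_inv[i][j] * T[j][3] for j in range(3)) for i in range(3)]
    return [R_inv[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def relative_pose(T_world_from_a, T_world_from_b):
    """T_a_from_b: maps points in camera b's frame into camera a's frame."""
    return matmul4(invert_rigid(T_world_from_a), T_world_from_b)


# Two made-up camera poses in world frame 1.
T_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, -1.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
T_b = [[0.0, 0.0, 1.0, 2.0], [0.0, 1.0, 0.0, -1.0], [-1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# A made-up change of world frame: flip Y and Z, then shift the origin.
W = [[1.0, 0.0, 0.0, 0.3], [0.0, -1.0, 0.0, 0.0], [0.0, 0.0, -1.0, 5.0], [0.0, 0.0, 0.0, 1.0]]

rel_frame1 = relative_pose(T_a, T_b)
rel_frame2 = relative_pose(matmul4(W, T_a), matmul4(W, T_b))

max_diff = max(abs(rel_frame1[i][j] - rel_frame2[i][j]) for i in range(4) for j in range(4))
```

The absolute poses differ between the two frames, while `max_diff` stays at floating-point noise level.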

Quick visual comparison: Look at the shape of the camera arrangement (distances and angles between cameras), not the absolute positions. If the shape matches, the calibration is consistent.

"Why are camera Y-positions negative?"

In OpenCV convention, +Y is down. Cameras mounted above the marker board (which defines Y≈0) have negative Y values. This is correct. A camera at Y = -1.3 is 1.3 meters above the board.

"What does inside_network.json camera 41831756's pose mean?"

Translation: [0.0, -1.175, 0.0]
Rotation: Identity

This camera is the reference frame origin (identity rotation) positioned 1.175m in the -Y direction. In the Fusion frame (Y-up), this means 1.175m below the world origin. In practice, this is the height offset of the camera relative to the Fusion coordinate system's origin.


Methodology: Comparing Different World Frames

Since inside_network.json (Fusion) and calibrate_extrinsics.py (ArUco) use different world origins, raw coordinate comparison is meaningless. We validated consistency using rigid SE(3) alignment:

  1. Match Serials: Identify cameras present in both JSON files.
  2. Extract Centers: Extract the translation column t from T_world_from_cam for each camera.
    • Crucial: Both systems use T_world_from_cam. It is not cam_from_world.
  3. Compute Alignment: Solve for the rigid transform (R_align, t_align) that minimizes the distance between the two point sets (Kabsch algorithm).
    • Scale is fixed at 1.0 (both systems use meters).
  4. Apply & Compare:
    • Transform Fusion points: P_aligned = R_align * P_fusion + t_align.
    • Position Residual: || P_aruco - P_aligned ||.
    • Orientation Check: Apply R_align to Fusion rotation matrices and compare column vectors (Right/Down/Forward) with ArUco rotations.
  5. Up-Vector Verification:
    • Fusion uses Y-Up (gravity). ArUco uses Y-Down (image).
    • After alignment, the transformed Fusion Y-axis should be approximately parallel to the ArUco -Y axis (or +Y depending on the specific alignment solution found, but they must be collinear with gravity).

Result: The overlay images in output/ were generated using this aligned frame. The low residuals (<2cm) confirm that the internal calibration is consistent, even though the absolute world coordinates differ.
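
The alignment step can be sketched with NumPy as below. This is a textbook Kabsch implementation on synthetic points, not the code from compare_pose_sets.py; the function name, the sample coordinates, and the chosen ground-truth transform are all assumptions.

```python
import numpy as np


def kabsch(P_src, P_dst):
    """Rigid (R, t) minimizing sum ||R @ p_src + t - p_dst||^2, scale fixed at 1."""
    P_src = np.asarray(P_src, dtype=float)
    P_dst = np.asarray(P_dst, dtype=float)
    c_src = P_src.mean(axis=0)
    c_dst = P_dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (P_src - c_src).T @ (P_dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t


# Synthetic "Fusion" camera centers (Y-up), in meters.
P_fusion = np.array([[0.0, 1.2, 0.0], [2.0, 1.0, 0.5], [1.0, 1.5, 3.0], [-1.5, 0.8, 2.0]])

# Synthetic "ArUco" centers: same layout in a Y-down frame with a shifted origin.
R_true = np.diag([1.0, -1.0, -1.0])
t_true = np.array([0.5, 0.2, -1.0])
P_aruco = P_fusion @ R_true.T + t_true

R_align, t_align = kabsch(P_fusion, P_aruco)
P_aligned = P_fusion @ R_align.T + t_align

# Position residual per camera, and the aligned Fusion up-vector.
residuals = np.linalg.norm(P_aruco - P_aligned, axis=1)
fusion_up_in_aruco = R_align @ np.array([0.0, 1.0, 0.0])
```

On real data the residuals will not be zero; the point of the check is that they stay small and that `fusion_up_in_aruco` ends up collinear with the ArUco Y axis.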


compare_pose_sets.py Input Formats

The compare_pose_sets.py tool is designed to be agnostic to the source of the JSON files. It uses a symmetric, heuristic parser for both --pose-a-json and --pose-b-json.

Accepted JSON Schemas

The parser automatically detects and handles either of these two structures for any input file:

1. Flat Format (Standard Output): used by calibrate_extrinsics.py and refine_extrinsics.py.

{
  "SERIAL_NUMBER": {
    "pose": "r00 r01 r02 tx r10 r11 r12 ty r20 r21 r22 tz 0 0 0 1"
  }
}

2. Nested Fusion Format: used by ZED Fusion inside_network.json configuration files.

{
  "SERIAL_NUMBER": {
    "FusionConfiguration": {
      "pose": "r00 r01 r02 tx r10 r11 r12 ty r20 r21 r22 tz 0 0 0 1"
    }
  }
}

Key Behaviors

  1. Interchangeability: You can swap inputs. Comparing A (ArUco) vs B (Fusion) is valid, as is A (Fusion) vs B (ArUco). The script aligns B to A.
  2. Pose Semantics: All poses are interpreted as T_world_from_cam (camera-to-world). The script does not invert matrices; it assumes the input strings are already in the correct convention.
  3. Minimum Overlap: The script requires at least 3 shared camera serials between the two files to compute a rigid alignment.
  4. Heuristic Parsing: For each serial key, the parser looks for FusionConfiguration.pose first, then falls back to pose.
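
The lookup order and the overlap requirement described above could be sketched as follows. This is an illustration of the described behavior, not the tool's actual source; every function name and the sample data are hypothetical.

```python
def extract_pose_string(entry):
    """Look for FusionConfiguration.pose first, then fall back to pose."""
    if not isinstance(entry, dict):
        return None
    fusion = entry.get("FusionConfiguration")
    if isinstance(fusion, dict) and "pose" in fusion:
        return fusion["pose"]
    return entry.get("pose")


def load_poses(doc):
    """Map serial -> 4x4 row-major matrix; entries without a pose are skipped."""
    poses = {}
    for serial, entry in doc.items():
        pose_str = extract_pose_string(entry)
        if pose_str is None:  # e.g. a _meta block
            continue
        values = [float(v) for v in pose_str.split()]
        if len(values) != 16:
            raise ValueError(f"{serial}: expected 16 floats, got {len(values)}")
        poses[serial] = [values[r * 4:(r + 1) * 4] for r in range(4)]
    return poses


def shared_serials(poses_a, poses_b, minimum=3):
    """Serials present in both sets; a rigid alignment needs at least `minimum`."""
    shared = sorted(set(poses_a) & set(poses_b))
    if len(shared) < minimum:
        raise ValueError(f"need at least {minimum} shared cameras, found {len(shared)}")
    return shared


# Made-up inputs: one flat file (with _meta) and one nested Fusion-style file.
IDENTITY = "1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1"
flat = {"_meta": {"frame_convention": "opencv"}, "A": {"pose": IDENTITY},
        "B": {"pose": IDENTITY}, "C": {"pose": IDENTITY}}
nested = {s: {"FusionConfiguration": {"pose": IDENTITY}} for s in ("A", "B", "C", "D")}

common = shared_serials(load_poses(flat), load_poses(nested))
```

Because the same loader handles both files, swapping the two inputs changes only which set is treated as the reference.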

Example: Swapped Inputs

Since the parser is symmetric, you can verify consistency by reversing the alignment direction:

# Align Fusion (B) to ArUco (A)
uv run compare_pose_sets.py \
    --pose-a-json output/e2e_refine_depth.json \
    --pose-b-json ../zed_settings/inside_network.json \
    --report-json output/report_aruco_ref.json

# Align ArUco (B) to Fusion (A)
uv run compare_pose_sets.py \
    --pose-a-json ../zed_settings/inside_network.json \
    --pose-b-json output/e2e_refine_depth.json \
    --report-json output/report_fusion_ref.json

Appendix: Stale README References

The following lines in py_workspace/README.md reference removed flags and should be updated:

  • Line ~104: References --pose-convention (removed).
  • Line ~105: References --world-basis opengl (removed).
  • Line ~116: References --diagnose (removed).

These were left from earlier iterations and do not reflect the current CLI.