# Visualization Conventions & Coordinate Frame Reference
Status: Canonical reference as of 2026-02-08. Applies to `visualize_extrinsics.py`, `calibrate_extrinsics.py`, and `inside_network.json`.
## Executive Summary
The visualize_extrinsics.py script went through multiple iterations of coordinate-frame
switching (OpenCV ↔ OpenGL), Plotly camera/view hacks, and partial basis transforms that
created compounding confusion about whether the visualization was correct. The root cause
was conflating Plotly's scene camera settings with actual data-frame transforms:
adjusting camera.up, autorange: "reversed", or eye position changes how you look at
the data but does not change the coordinate frame the data lives in.
After several rounds of adding and removing --world-basis, --render-space, and
--pose-convention flags, the visualizer was simplified to a single convention:
- All data is in OpenCV convention (+X right, +Y down, +Z forward).
- No basis switching. The `--world-basis` flag was removed.
- Plotly's scene camera is configured with `up = {x:0, y:-1, z:0}` so that the OpenCV +Y-down axis renders as "down" on screen.
The confusion was never a bug in the calibration math. It was a visualization-layer problem caused by trying to make Plotly (whose default scene camera is Z-up, `up = {x:0, y:0, z:1}`) display OpenCV data (which is Y-down) without a clear separation between "data frame" and "view frame."
## Ground Truth Conventions
### 1. Calibration Output: `world_from_cam`
`calibrate_extrinsics.py` stores poses as `T_world_from_cam` (4×4 homogeneous):

```python
T_world_from_cam = invert_transform(T_cam_from_world)
```
- `solvePnP` returns `rvec`/`tvec`, which together define `T_cam_from_world` (maps world points into the camera frame).
- The script inverts this before saving to JSON.
- The translation column `T[:3, 3]` is the camera center in world coordinates.
- The rotation columns `T[:3, :3]` are the camera's local axes expressed in the world frame.
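The inversion above can be sketched in NumPy. This assumes `invert_transform` behaves like the closed-form rigid inverse below; the script's actual helper may be implemented differently. Because the rotation block is orthonormal, no general matrix inverse is needed: the inverse rotation is `R.T` and the inverse translation is `-R.T @ t`, which is exactly the camera center.

```python
import numpy as np


def invert_transform(T: np.ndarray) -> np.ndarray:
    """Invert a 4x4 rigid transform [R | t; 0 0 0 1] in closed form.

    Sketch only: assumes R is a proper rotation (orthonormal, det = +1).
    """
    R = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T         # inverse rotation
    T_inv[:3, 3] = -R.T @ t     # inverse translation (= camera center for cam_from_world input)
    return T_inv


def camera_center(T_world_from_cam: np.ndarray) -> np.ndarray:
    """Camera center in world coordinates is the translation column."""
    return T_world_from_cam[:3, 3]


def camera_axes(T_world_from_cam: np.ndarray) -> dict:
    """Columns of R are the camera's +X (right), +Y (down), +Z (forward) in world."""
    R = T_world_from_cam[:3, :3]
    return {"x_right": R[:, 0], "y_down": R[:, 1], "z_forward": R[:, 2]}
```

A quick self-check: the camera center `c` returned by `camera_center(invert_transform(T_cam_from_world))` should map to the camera-frame origin under `T_cam_from_world`.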
JSON format (16 floats, row-major 4×4):

```json
{
  "44289123": {
    "pose": "0.878804 -0.039482 0.475548 -2.155006 0.070301 0.996409 ..."
  }
}
```
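A minimal sketch of decoding the `pose` field, assuming it holds exactly 16 whitespace-separated floats in row-major order as stated above. The serial and pose in the example are made up (identity rotation, camera at (0.5, -1.2, 3.0)); they are not real calibration output.

```python
import json

import numpy as np


def load_poses(json_text: str) -> dict[str, np.ndarray]:
    """Parse {serial: {"pose": "<16 floats>"}} into 4x4 T_world_from_cam arrays."""
    data = json.loads(json_text)
    poses = {}
    for serial, entry in data.items():
        values = np.array(entry["pose"].split(), dtype=float)
        if values.size != 16:
            raise ValueError(f"{serial}: expected 16 floats, got {values.size}")
        poses[serial] = values.reshape(4, 4)  # row-major: first 4 values form row 0
    return poses


# Hypothetical example entry (not from any real calibration run).
example = json.dumps({
    "00000000": {"pose": "1 0 0 0.5  0 1 0 -1.2  0 0 1 3.0  0 0 0 1"}
})
T = load_poses(example)["00000000"]
print(T[:3, 3])  # camera center in world coordinates
```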
### 2. Camera-Local Axes (OpenCV)
Every camera's local frame follows the OpenCV pinhole convention:
| Axis | Direction | Color in visualizer |
|---|---|---|
| +X | Right | Red |
| +Y | Down | Green |
| +Z | Forward (into scene) | Blue |
The frustum is drawn along the camera's local +Z axis. The four corners of the
frustum's far plane are at (±w, ±h, frustum_scale) in camera-local coordinates.
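The corner formula can be made concrete. This sketch assumes the half-width is derived from the horizontal `--fov` as `w = frustum_scale * tan(fov / 2)` and that `h = w / aspect` for a hypothetical `aspect` (width/height) parameter; the visualizer's actual derivation, especially when `--zed-configs` supplies real intrinsics, may differ.

```python
import numpy as np


def frustum_corners_world(T_world_from_cam, fov_deg=60.0, frustum_scale=0.5, aspect=16 / 9):
    """Return (center, corners) of a camera frustum in world coordinates.

    Corners are built in camera-local OpenCV coordinates (+X right, +Y down,
    +Z forward) at depth frustum_scale, then mapped to world by T_world_from_cam.
    `aspect` (width / height) is an assumed parameter for this sketch.
    """
    w = frustum_scale * np.tan(np.deg2rad(fov_deg) / 2.0)  # half-width at far plane
    h = w / aspect                                          # half-height at far plane
    corners_cam = np.array([
        [-w, -h, frustum_scale],  # top-left (Y is down, so -h is the top of the image)
        [w, -h, frustum_scale],   # top-right
        [w, h, frustum_scale],    # bottom-right
        [-w, h, frustum_scale],   # bottom-left
    ])
    R = T_world_from_cam[:3, :3]
    t = T_world_from_cam[:3, 3]
    corners_world = corners_cam @ R.T + t  # apply p_world = R p_cam + t to each row
    return t.copy(), corners_world
```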
### 3. Plotly Scene/Camera Interpretation Pitfalls
Plotly's 3D scene has its own camera model that controls how you view the data:
| Plotly setting | What it does | What it does NOT do |
|---|---|---|
| `camera.up` | Sets which direction is "up" on screen | Does not transform data coordinates |
| `camera.eye` | Sets the viewpoint position | Does not change axis orientation |
| `yaxis.autorange = "reversed"` | Flips the Y axis tick direction | Does not negate Y data values |
| `aspectmode = "data"` | Preserves metric proportions | Does not imply any convention |
Critical insight: Changing `camera.up` from `{y:1}` to `{y:-1}` makes +Y point down on screen, so OpenCV's Y-down data looks correct. This is purely a view transform: the data coordinates, the axis directions in data space, and the tick values are all unchanged.
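For reference, the view-only settings the current code relies on can be expressed as a plain layout dictionary, which Plotly accepts via `fig.update_layout(scene=...)`. The `eye` value is an arbitrary example and the axis titles mirror the labels listed for Phase 7; neither is copied from the script. Nothing in this dictionary touches trace data.

```python
def opencv_scene_layout(eye=(1.5, -1.5, -1.5)):
    """Plotly `scene` settings for viewing OpenCV-convention data.

    Only view settings are produced: camera.up makes +Y (down in OpenCV)
    point down on screen, and aspectmode="data" keeps metric proportions.
    No axis is reversed, so tick labels match the stored coordinates.
    """
    return {
        "camera": {
            "up": {"x": 0, "y": -1, "z": 0},
            "eye": {"x": eye[0], "y": eye[1], "z": eye[2]},  # arbitrary example viewpoint
        },
        "aspectmode": "data",
        "xaxis": {"title": {"text": "X (Right)"}},
        "yaxis": {"title": {"text": "Y (Down)"}},
        "zaxis": {"title": {"text": "Z (Forward)"}},
    }
```

Usage: `fig.update_layout(scene=opencv_scene_layout())` on a `plotly.graph_objects.Figure`. Keeping these settings in one place makes it easy to confirm that no data-side transform is hiding in the view configuration.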
## Historical Confusion Timeline
This section documents the sequence of changes that led to confusion, for future
reference. All commits are on visualize_extrinsics.py.
### Phase 1: Initial Plotly Rewrite (7b9782a)
- Rewrote the visualizer from matplotlib to Plotly with a `--diagnose` mode.
- Used Plotly's default camera (Z-up). OpenCV data (Y-down) appeared "upside down."
- Frustums pointed in the correct direction in data space but looked inverted.
### Phase 2: Y-Up Enforcement (a8d3751)
- Attempted to fix by setting `camera.up = {y:1}` and using `autorange: "reversed"`.
- This made the view look correct for some angles but introduced axis-label confusion.
- The Y axis ticks ran in the opposite direction from the data, misleading users.
### Phase 3: Render-Space Option (ab88a24)
- Added a `--render-space` flag to switch between "cv" and "opengl" rendering.
- The OpenGL path applied a basis-change matrix `diag(1, -1, -1)` to all data.
- This actually transformed the data, not just the view. It was a correct approach, but it introduced a second code path that was hard to validate.
### Phase 4: Ground Plane & Origin Triad (18e8142, 57f0dff)
- Added ground plane overlay and world-origin axis triad.
- These were drawn in the data frame, so they were correct in CV mode but appeared wrong in OpenGL mode (the basis transform was applied inconsistently to some elements but not others).
### Phase 5: `--world-basis` with Global Transform (79f2ab0)
- Renamed `--render-space` to `--world-basis` with `cv` and `opengl` options.
- Introduced `world_to_plot()` as a central transform function.
- In `opengl` mode, `world_to_plot` applied `diag(1, -1, -1)` to all points.
- Problem: The Plotly `camera.up` and axis labels were not always updated consistently with the basis choice, leading to "it looks right from one angle but wrong from another" reports.
### Phase 6: Restore After Removal (6330e0e)
- `--world-basis` was briefly removed, then restored due to user request.
- This back-and-forth left the README with stale documentation referencing both the old and new interfaces.
### Phase 7: Final Cleanup, CV Only (d07c244)
- Removed `--world-basis` entirely.
- `world_to_plot()` became a no-op (identity function).
- Plotly camera set to `up = {x:0, y:-1, z:0}` to render Y-down correctly.
- Axis labels explicitly set to `X (Right)`, `Y (Down)`, `Z (Forward)`.
- Added `--origin-axes-scale` for independent control of the origin triad size.
- Removed the `--diagnose`, `--pose-convention`, and `--render-space` flags.
This is the current state.
## Peculiar Behaviors Catalog
| # | Symptom | Root Cause | Fix / Explanation |
|---|---|---|---|
| 1 | Frustum appears to point in the "-Z" direction | Plotly's default camera is Z-up (`up = {x:0, y:0, z:1}`), which does not match OpenCV's Y-down convention, so a frustum pointing along +Z is displayed in an orientation that reads as "backward" | Set `camera.up = {x:0, y:-1, z:0}` (done in current code). The frustum is correct in data space. |
| 2 | Switching to `--world-basis opengl` makes some elements flip but not others | The `world_to_plot()` transform was applied to camera traces but not consistently to the ground plane or origin triad | Removed `--world-basis`. A single convention eliminates partial-transform bugs. |
| 3 | `yaxis.autorange = "reversed"` makes ticks confusing | Plotly reverses the tick labels but the data coordinates stay the same. Users see "0 at top, -2 at bottom", which contradicts Y-down intuition. | Removed `autorange: reversed`. Use `camera.up = {y:-1}` instead, which rotates the view without mangling tick labels. |
| 4 | Camera positions don't match `inside_network.json` | `inside_network.json` stores poses in the ZED Fusion coordinate frame (gravity-aligned, Y-up). `calibrate_extrinsics.py` stores poses in the ArUco marker object's frame (Y-down if the marker board is horizontal). These are different world frames. | Not a bug. The two systems use different world origins and orientations. To compare, you must apply the alignment transform between the two frames. See FAQ below. |
| 5 | Origin triad too small or too large relative to cameras | The origin triad defaulted to `--scale` (camera axis size), which is often much smaller than the camera spread | Use `--origin-axes-scale 0.6` (or similar) independently of `--scale`. |
| 6 | Bird's-eye view shows unexpected orientation | `--birdseye` uses orthographic projection looking down the Y axis. In CV convention, Y is "down", so this is looking from below the scene upward. | Expected behavior. The bird's-eye view shows the X-Z plane as seen from the -Y direction (below the cameras). |
## Canonical Rules Going Forward
- Single convention: All visualization data is in OpenCV frame. No basis switching.
- `world_to_plot()` is identity: It exists as a hook but performs no transform. If a future need arises for basis conversion, it should be the only place it happens.
- Plotly camera settings are view-only: Never use `autorange: reversed` or axis negation to simulate a coordinate change. Use `camera.up` and `camera.eye` only.
- Poses are `world_from_cam`: The 4×4 matrix maps camera-local points to world. Translation = camera position in world. Rotation columns = camera axes in world.
- Colors are RGB = XYZ: Red = X (right), Green = Y (down), Blue = Z (forward). This applies to both per-camera axis triads and the world-origin triad.
- Units are meters: Consistent with marker parquet geometry and calibration output.
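A minimal sketch of the single-hook rule, using a hypothetical `prepare_scene_points` helper: every drawable element (camera centers, frustums, ground plane, origin triad) passes through the same `world_to_plot` call, so a future basis change cannot reach only some of them, which was the partial-transform failure seen in Phases 4 and 5. This is illustrative, not the visualizer's actual code.

```python
import numpy as np


def world_to_plot(points):
    """Identity hook: OpenCV world coordinates are plotted as-is.

    If a basis change is ever needed, it belongs here and nowhere else.
    """
    return np.asarray(points, dtype=float)


def prepare_scene_points(camera_centers, frustum_corners, ground_corners, origin_triad):
    """Route every drawable element through world_to_plot (hypothetical helper)."""
    elements = {
        "camera_centers": camera_centers,
        "frustum_corners": frustum_corners,
        "ground_corners": ground_corners,
        "origin_triad": origin_triad,
    }
    return {name: world_to_plot(pts) for name, pts in elements.items()}
```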
## Current CLI Behavior
### Available Flags
```text
visualize_extrinsics.py
  -i, --input TEXT          [required] Path to JSON extrinsics file
  -o, --output TEXT         Output path (.html or .png)
  --show                    Open interactive Plotly viewer
  --scale FLOAT             Camera axis length (default: 0.2)
  --frustum-scale FLOAT     Frustum depth (default: 0.5)
  --fov FLOAT               Horizontal FOV degrees (default: 60.0)
  --birdseye                Top-down orthographic view
  --show-ground/--no-show-ground                Ground plane toggle
  --ground-y FLOAT          Ground plane Y position (default: 0.0)
  --ground-size FLOAT       Ground plane side length (default: 8.0)
  --show-origin-axes/--no-show-origin-axes      Origin triad toggle (default: on)
  --origin-axes-scale FLOAT Origin triad size (defaults to --scale)
  --zed-configs TEXT        ZED calibration file(s) for accurate frustums
  --resolution [FHD1200|FHD|2K|HD|SVGA|VGA]
  --eye [left|right]
```
### Removed Flags (Historical Only)
| Flag | Removed In | Reason |
|---|---|---|
| `--world-basis` | d07c244 | Caused partial/inconsistent transforms. Single CV convention is simpler. |
| `--pose-convention` | d07c244 | Only `world_from_cam` is supported. No need for a flag. |
| `--diagnose` | d07c244 | Diagnostic checks moved out of the visualizer. |
| `--render-space` | 79f2ab0 | Renamed to `--world-basis`, then removed. |
Note: The `README.md` still contains stale references to `--world-basis`, `--pose-convention`, and `--diagnose` in the Troubleshooting section. These should be cleaned up to match the current CLI.
## Verification Playbook
### Quick Sanity Check
```bash
# Render with origin triad at 0.6m scale, save as PNG
uv run visualize_extrinsics.py \
  --input output/e2e_refine_depth_smoke_rerun.json \
  --output output/_final_opencv_origin_axes_scaled.png \
  --origin-axes-scale 0.6
```
Expected result:
- Origin triad at (0,0,0) with Red→+X (right), Green→+Y (down), Blue→+Z (forward).
- Camera frustums pointing along each camera's local +Z (blue axis).
- Camera positions spread out in world space (not bunched at origin).
- Y values for cameras should be negative (cameras are above the marker board, which is at Y≈0; "above" in CV convention means negative Y).
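The numeric items in this checklist can also be checked without rendering. This sketch assumes poses are already parsed into 4×4 `T_world_from_cam` arrays keyed by serial, and that the marker board is roughly horizontal at Y≈0. The 0.5 m spread threshold is an arbitrary example value, not a project setting.

```python
import numpy as np


def sanity_check(poses, min_spread_m=0.5):
    """Return a list of human-readable problems (an empty list means all checks passed).

    poses: {serial: 4x4 T_world_from_cam}. min_spread_m is an arbitrary example
    threshold for "cameras are not bunched at the origin".
    """
    problems = []
    centers = {s: T[:3, 3] for s, T in poses.items()}
    for serial, c in centers.items():
        if c[1] >= 0:  # OpenCV +Y is down, so cameras above the board need Y < 0
            problems.append(f"{serial}: Y = {c[1]:.3f} (expected negative, above the board)")
    pts = np.array(list(centers.values()))
    if len(pts) > 1:
        # Largest distance between any two camera centers.
        spread = np.max(np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1))
        if spread < min_spread_m:
            problems.append(f"max camera separation {spread:.3f} m < {min_spread_m} m")
    return problems
```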
### Interactive Validation
```bash
# Open interactive HTML for rotation/inspection
uv run visualize_extrinsics.py \
  --input output/e2e_refine_depth_smoke_rerun.json \
  --show \
  --origin-axes-scale 0.6
```
What to check:
- Rotate the view: The origin triad should remain consistent — Red/Green/Blue always point in the same data-space directions regardless of view angle.
- Hover over camera centers: Tooltip shows the camera serial number.
- Frustum orientation: Each frustum's open end faces away from the camera center along the camera's blue (Z) axis.
### Bird's-Eye Sanity Check
```bash
uv run visualize_extrinsics.py \
  --input output/e2e_refine_depth_smoke_rerun.json \
  --birdseye --show \
  --origin-axes-scale 0.6
```
Expected: Top-down view of the X-Z plane. Cameras should form a recognizable spatial layout matching the physical installation. The Red (X) axis points right, Blue (Z) axis points "up" on screen (forward in world).
## FAQ
### "Why does an OpenGL-like view look strange?"
Because the data is in OpenCV convention (Y-down, Z-forward), while Plotly's default scene camera is Z-up. When you try to make Plotly act like an OpenGL viewer (Y-up, Z-backward), you need to either:

1. Transform all data by applying `diag(1, -1, -1)`. This is correct, but it doubles the code paths and creates consistency risks.
2. Adjust the Plotly camera. This only changes the view, not the data; axis labels and hover values still show CV coordinates.
We chose option (2) with camera.up = {y:-1}: minimal code, no data transformation,
axis labels match the actual coordinate values. The trade-off is that the default
Plotly orbit feels "inverted" compared to a Y-up 3D viewer. This is expected.
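For comparison, the first option (transforming all data) amounts to the following. `diag(1, -1, -1)` is a 180° rotation about X (determinant +1), so it keeps the frame right-handed and is its own inverse. This sketch re-expresses world points and `T_world_from_cam` poses in an OpenGL-style world basis while leaving each camera's local frame in OpenCV convention; the helper names are hypothetical and this is not code from the visualizer.

```python
import numpy as np

# Basis change from OpenCV-style world axes (Y down, Z forward) to
# OpenGL-style world axes (Y up, Z backward): a 180-degree rotation about X.
B3 = np.diag([1.0, -1.0, -1.0])
B4 = np.diag([1.0, -1.0, -1.0, 1.0])


def cv_points_to_gl(points):
    """Map Nx3 world points from the CV world basis to the GL world basis."""
    return np.asarray(points, dtype=float) @ B3.T


def cv_pose_to_gl_world(T_world_from_cam):
    """Re-express T_world_from_cam in the GL world basis.

    Only the world side changes; the camera-local frame stays OpenCV
    (+Z forward). Converting the camera-local frame too would be B4 @ T @ B4.
    """
    return B4 @ T_world_from_cam
```

Every trace (cameras, frustums, ground plane, origin triad) would have to go through these functions; missing one reproduces the Phase 4 inconsistency, which is the consistency risk the first option carries.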
### "Does flipping axes in the view equal changing the world frame?"
No. Plotly's camera.up, camera.eye, and autorange: reversed are purely
view transforms. They change how the data is displayed but not what the coordinates
mean. The data always lives in the frame it was computed in (OpenCV/ArUco world frame).
If you set `camera.up = {y:1}` (Y-up, as in Phase 2; Plotly's own default is `{z:1}`), the plot will render +Y pointing up on screen, but the data values are still Y-down. This creates a visual inversion that looks like "the cameras are upside down". They are not; the view is just flipped.
### "How do I compare with the C++ viewer and inside_network.json?"
The C++ ZED Fusion viewer and inside_network.json use a different world frame
than calibrate_extrinsics.py:
| Property | `calibrate_extrinsics.py` | ZED Fusion / `inside_network.json` |
|---|---|---|
| World origin | ArUco marker object center | Gravity-aligned, first camera or user-defined |
| Y direction | Down (OpenCV) | Up (gravity-aligned) |
| Pose meaning | `T_world_from_cam` | `T_world_from_cam` (same semantics, different world) |
| Units | Meters | Meters |
To compare numerically:
- The relative poses between cameras should match (up to the alignment transform).
- The absolute positions will differ because the world origins are different.
- To convert: apply the rigid alignment transform (rotation and translation) that maps the ArUco world frame to the Fusion world frame. If `--auto-align` was used with a ground face, the ArUco frame is partially aligned (ground = XZ plane), but the origin and yaw may still differ.
Quick visual comparison: Look at the shape of the camera arrangement (distances and angles between cameras), not the absolute positions. If the shape matches, the calibration is consistent.
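The shape comparison can be made numeric with pairwise camera-center distances, which do not change under any rigid transform and therefore need no alignment. This sketch assumes both inputs are `{serial: center}` dictionaries in meters; the function name is hypothetical. It complements rather than replaces the SE(3) alignment described in the methodology section.

```python
import itertools

import numpy as np


def pairwise_distance_errors(centers_a, centers_b):
    """Compare inter-camera distances between two world frames.

    centers_a, centers_b: {serial: (3,) camera center in meters}.
    Returns {(serial_i, serial_j): |d_a - d_b|} for serials present in both.
    Distances are invariant to rotation and translation, so differing world
    origins do not matter. They are also unchanged by a mirror image, so this
    check cannot detect a reflected layout.
    """
    common = sorted(set(centers_a) & set(centers_b))
    errors = {}
    for s_i, s_j in itertools.combinations(common, 2):
        d_a = np.linalg.norm(np.subtract(centers_a[s_i], centers_a[s_j]))
        d_b = np.linalg.norm(np.subtract(centers_b[s_i], centers_b[s_j]))
        errors[(s_i, s_j)] = abs(d_a - d_b)
    return errors
```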
### "Why are camera Y-positions negative?"
In OpenCV convention, +Y is down. Cameras mounted above the marker board (which defines
Y≈0) have negative Y values. This is correct. A camera at Y = -1.3 is 1.3 meters
above the board.
### "What does inside_network.json camera 41831756's pose mean?"
- Translation: `[0.0, -1.175, 0.0]`
- Rotation: identity

This camera defines the reference orientation (identity rotation, so its axes are aligned with the Fusion world axes), and its center is 1.175 m from the world origin in the -Y direction. In the Fusion frame (Y-up), this means 1.175 m below the world origin. In practice, this is the height offset of the camera relative to the Fusion coordinate system's origin.
## Methodology: Comparing Different World Frames
Since inside_network.json (Fusion) and calibrate_extrinsics.py (ArUco) use different
world origins, raw coordinate comparison is meaningless. We validated consistency using
rigid SE(3) alignment:
- Match Serials: Identify cameras present in both JSON files.
- Extract Centers: Extract the translation column `t` from `T_world_from_cam` for each camera.
  - Crucial: Both systems use `T_world_from_cam`. It is not `cam_from_world`.
- Compute Alignment: Solve for the rigid transform `(R_align, t_align)` that minimizes the distance between the two point sets (Kabsch algorithm).
  - Scale is fixed at 1.0 (both systems use meters).
- Apply & Compare:
  - Transform Fusion points: `P_aligned = R_align * P_fusion + t_align`.
  - Position Residual: `|| P_aruco - P_aligned ||`.
  - Orientation Check: Apply `R_align` to Fusion rotation matrices and compare column vectors (Right/Down/Forward) with ArUco rotations.
- Up-Vector Verification:
  - Fusion uses Y-up (gravity). ArUco uses Y-down (image).
  - After alignment, the transformed Fusion Y-axis should be approximately parallel to the ArUco -Y axis (or +Y, depending on the specific alignment solution found); either way, the two axes must be collinear with gravity.
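The alignment, residual, and up-vector steps can be sketched in NumPy. This is a standard Kabsch/SVD solution with scale fixed at 1 and a reflection guard; the function names are hypothetical and the inputs are assumed to be `{serial: center}` dictionaries in meters. It does not cover the per-camera orientation comparison.

```python
import numpy as np


def kabsch(P_src, P_dst):
    """Rigid (R, t) minimizing sum ||p_dst - (R @ p_src + t)||^2, scale fixed at 1.

    P_src, P_dst: (N, 3) matched points, N >= 3 and not all collinear.
    """
    P_src = np.asarray(P_src, dtype=float)
    P_dst = np.asarray(P_dst, dtype=float)
    mu_src = P_src.mean(axis=0)
    mu_dst = P_dst.mean(axis=0)
    H = (P_src - mu_src).T @ (P_dst - mu_dst)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 would indicate a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                          # proper rotation (det = +1)
    t = mu_dst - R @ mu_src
    return R, t


def align_and_residuals(centers_aruco, centers_fusion):
    """Align Fusion centers onto ArUco centers; return R_align, t_align, per-camera residuals."""
    serials = sorted(set(centers_aruco) & set(centers_fusion))
    P_aruco = np.array([centers_aruco[s] for s in serials], dtype=float)
    P_fusion = np.array([centers_fusion[s] for s in serials], dtype=float)
    R_align, t_align = kabsch(P_fusion, P_aruco)
    P_aligned = P_fusion @ R_align.T + t_align  # P_aligned = R_align * P_fusion + t_align
    residuals = dict(zip(serials, np.linalg.norm(P_aruco - P_aligned, axis=1)))
    return R_align, t_align, residuals


def up_axis_cosine(R_align):
    """Cosine between aligned Fusion +Y (up) and ArUco -Y.

    Values near +1 (or -1) mean the axes are collinear, as described above.
    """
    fusion_up_in_aruco = R_align @ np.array([0.0, 1.0, 0.0])
    return float(fusion_up_in_aruco @ np.array([0.0, -1.0, 0.0]))
```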
Result: The overlay images in output/ were generated using this aligned frame.
The low residuals (<2cm) confirm that the internal calibration is consistent, even
though the absolute world coordinates differ.
## Appendix: Stale README References
The following lines in `py_workspace/README.md` reference removed flags and should be updated:

- Line ~104: References `--pose-convention` (removed).
- Line ~105: References `--world-basis opengl` (removed).
- Line ~116: References `--diagnose` (removed).
These were left from earlier iterations and do not reflect the current CLI.