# Visualization Conventions & Coordinate Frame Reference

> **Status**: Canonical reference as of 2026-02-08.
> **Applies to**: `visualize_extrinsics.py`, `calibrate_extrinsics.py`, and `inside_network.json`.

---

## Executive Summary

The `visualize_extrinsics.py` script went through multiple iterations of coordinate-frame switching (OpenCV ↔ OpenGL), Plotly camera/view hacks, and partial basis transforms that created compounding confusion about whether the visualization was correct.

The root cause was **conflating Plotly's scene camera settings with actual data-frame transforms**: adjusting `camera.up`, `autorange: "reversed"`, or eye position changes *how you look at the data* but does **not** change the coordinate frame the data lives in.

After several rounds of adding and removing `--world-basis`, `--render-space`, and `--pose-convention` flags, the visualizer was simplified to a single convention:

- **All data is in OpenCV convention** (+X right, +Y down, +Z forward).
- **No basis switching**. The `--world-basis` flag was removed.
- **Plotly's scene camera** is configured with `up = {x:0, y:-1, z:0}` so that the OpenCV +Y-down axis renders as "down" on screen.

The confusion was never a bug in the calibration math. It was a visualization-layer problem caused by trying to make Plotly (which defaults to Y-up) display OpenCV data (which is Y-down) without a clear separation between "data frame" and "view frame."

---

## Ground Truth Conventions

### 1. Calibration Output: `world_from_cam`

`calibrate_extrinsics.py` stores poses as **T_world_from_cam** (4×4 homogeneous):

```
T_world_from_cam = invert_transform(T_cam_from_world)
```

- `solvePnP` returns `rvec`/`tvec`, which together define `T_cam_from_world` (maps world points into the camera frame).
- The script **inverts** this before saving to JSON.
- The translation column `T[:3, 3]` is the **camera center in world coordinates**.
- The columns of the rotation block `T[:3, :3]` are the camera's local axes expressed in the world frame.

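
The inversion described above can be sketched as follows. This is a minimal illustration that assumes `invert_transform` uses the closed-form rigid inverse; the helper name `invert_rigid_transform` and the example pose values are hypothetical and are not taken from `calibrate_extrinsics.py`.

```python
import numpy as np


def invert_rigid_transform(T):
    """Invert a 4x4 rigid transform via R^T and -R^T t (hypothetical helper)."""
    R = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


# Made-up cam_from_world pose: 90 degree rotation about Z plus a translation.
T_cam_from_world = np.eye(4)
T_cam_from_world[:3, :3] = np.array([[0.0, -1.0, 0.0],
                                     [1.0, 0.0, 0.0],
                                     [0.0, 0.0, 1.0]])
T_cam_from_world[:3, 3] = [1.0, 2.0, 3.0]

T_world_from_cam = invert_rigid_transform(T_cam_from_world)

# The translation column of world_from_cam is the camera center in world coordinates.
camera_center = T_world_from_cam[:3, 3]

# Sanity check: the camera center must map to the origin of the camera frame.
center_in_cam = T_cam_from_world @ np.append(camera_center, 1.0)
print(camera_center, center_in_cam[:3])
```

Mapping the stored translation back through `T_cam_from_world` and getting the zero vector is a quick way to confirm that a pose really is `world_from_cam` and not `cam_from_world`.
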

**JSON format** (16 floats, row-major 4×4):

```json
{
  "44289123": {
    "pose": "0.878804 -0.039482 0.475548 -2.155006 0.070301 0.996409 ..."
  }
}
```

### 2. Camera-Local Axes (OpenCV)

Every camera's local frame follows the OpenCV pinhole convention:

| Axis | Direction | Color in visualizer |
|------|-----------|---------------------|
| +X | Right | Red |
| +Y | Down | Green |
| +Z | Forward (into scene) | Blue |

The frustum is drawn along the camera's local +Z axis. The four corners of the frustum's far plane are at `(±w, ±h, frustum_scale)` in camera-local coordinates.

### 3. Plotly Scene/Camera Interpretation Pitfalls

Plotly's 3D scene has its own camera model that controls **how you view** the data:

| Plotly setting | What it does | What it does NOT do |
|----------------|--------------|---------------------|
| `camera.up` | Sets which direction is "up" on screen | Does not transform data coordinates |
| `camera.eye` | Sets the viewpoint position | Does not change axis orientation |
| `yaxis.autorange = "reversed"` | Flips the Y axis tick direction | Does not negate Y data values |
| `aspectmode = "data"` | Preserves metric proportions | Does not imply any convention |

**Critical insight**: Changing `camera.up` from `{y:1}` to `{y:-1}` makes the plot *look* like Y-down is rendered correctly, but the underlying Plotly axis still runs bottom-to-top by default. This is purely a view transform: the data coordinates are unchanged.

---

## Historical Confusion Timeline

This section documents the sequence of changes that led to confusion, for future reference. All commits are on `visualize_extrinsics.py`.

### Phase 1: Initial Plotly Rewrite (`7b9782a`)

- Rewrote the visualizer from matplotlib to Plotly with a `--diagnose` mode.
- Used Plotly defaults (Y-up). OpenCV data (Y-down) appeared "upside down."
- Frustums pointed in the correct direction in data space but *looked* inverted.


### Phase 2: Y-Up Enforcement (`a8d3751`)

- Attempted to fix by setting `camera.up = {y:1}` and using `autorange: "reversed"`.
- This made the view *look* correct for some angles but introduced axis-label confusion.
- The Y axis ticks ran in the opposite direction from the data, misleading users.

### Phase 3: Render-Space Option (`ab88a24`)

- Added `--render-space` flag to switch between "cv" and "opengl" rendering.
- The OpenGL path applied a basis-change matrix `diag(1, -1, -1)` to all data.
- This actually transformed the data, not just the view. The approach was correct, but it introduced a second code path that was hard to validate.

### Phase 4: Ground Plane & Origin Triad (`18e8142`, `57f0dff`)

- Added ground plane overlay and world-origin axis triad.
- These were drawn in the *data* frame, so they were correct in CV mode but appeared wrong in OpenGL mode (the basis transform was applied to some elements but not others).

### Phase 5: `--world-basis` with Global Transform (`79f2ab0`)

- Renamed `--render-space` to `--world-basis` with `cv` and `opengl` options.
- Introduced `world_to_plot()` as a central transform function.
- In `opengl` mode: `world_to_plot` applied `diag(1, -1, -1)` to all points.
- **Problem**: The Plotly `camera.up` and axis labels were not always updated consistently with the basis choice, leading to "it looks right from one angle but wrong from another" reports.

### Phase 6: Restore After Removal (`6330e0e`)

- `--world-basis` was briefly removed, then restored at a user's request.
- This back-and-forth left the README with stale documentation referencing both the old and new interfaces.

### Phase 7: Final Cleanup (CV Only) (`d07c244`)

- **Removed `--world-basis` entirely.**
- `world_to_plot()` became a no-op (identity function).
- Plotly camera set to `up = {x:0, y:-1, z:0}` to render Y-down correctly.
- Axis labels explicitly set to `X (Right)`, `Y (Down)`, `Z (Forward)`.
- Added `--origin-axes-scale` for independent control of the origin triad size.
- Removed `--diagnose`, `--pose-convention`, and `--render-space` flags.

**This is the current state.**

---

## Peculiar Behaviors Catalog

| # | Symptom | Root Cause | Fix / Explanation |
|---|---------|------------|-------------------|
| 1 | Frustum appears to point in "-Z" direction | Plotly default camera has Y-up; OpenCV frustum points +Z which looks "backward" when viewed from a Y-up perspective | Set `camera.up = {y:-1}` (done in current code). The frustum is correct in data space. |
| 2 | Switching to `--world-basis opengl` makes some elements flip but not others | The `world_to_plot()` transform was applied to camera traces but not consistently to ground plane or origin triad | Removed `--world-basis`. Single convention eliminates partial-transform bugs. |
| 3 | `yaxis.autorange = "reversed"` makes ticks confusing | Plotly reverses the tick labels but the data coordinates stay the same. Users see "0 at top, -2 at bottom" which contradicts Y-down intuition. | Removed `autorange: reversed`. Use `camera.up = {y:-1}` instead, which rotates the view without mangling tick labels. |
| 4 | Camera positions don't match `inside_network.json` | `inside_network.json` stores poses in the ZED Fusion coordinate frame (gravity-aligned, Y-up). `calibrate_extrinsics.py` stores poses in the ArUco marker object's frame (Y-down if the marker board is horizontal). These are **different world frames**. | Not a bug. The two systems use different world origins and orientations. To compare, you must apply the alignment transform between the two frames. See FAQ below. |
| 5 | Origin triad too small or too large relative to cameras | Origin triad defaulted to `--scale` (camera axis size), which is often much smaller than the camera spread | Use `--origin-axes-scale 0.6` (or similar) independently of `--scale`. |
| 6 | Bird-eye view shows unexpected orientation | `--birdseye` uses orthographic projection looking down the Y axis. In CV convention, Y is "down" so this is looking from below the scene upward. | Expected behavior. The bird-eye view shows the X-Z plane as seen from the -Y direction (below the cameras). |

---

## Canonical Rules Going Forward

1. **Single convention**: All visualization data is in OpenCV frame. No basis switching.
2. **`world_to_plot()` is identity**: It exists as a hook but performs no transform. If a future need arises for basis conversion, it should be the *only* place it happens.
3. **Plotly camera settings are view-only**: Never use `autorange: reversed` or axis negation to simulate a coordinate change. Use `camera.up` and `camera.eye` only.
4. **Poses are `world_from_cam`**: The 4×4 matrix maps camera-local points to world. Translation = camera position in world. Rotation columns = camera axes in world.
5. **Colors are RGB = XYZ**: Red = X (right), Green = Y (down), Blue = Z (forward). This applies to both per-camera axis triads and the world-origin triad.
6. **Units are meters**: Consistent with marker parquet geometry and calibration output.

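
Rules 2 and 3 can be illustrated with a small sketch that keeps view settings and data transforms apart. It assumes `world_to_plot()` takes an array of points (the real signature is not shown in this document), and the camera center is a made-up value. The `scene` dictionary uses standard Plotly layout keys and could be passed to `fig.update_layout(scene=scene)`.

```python
import numpy as np


def world_to_plot(points):
    """Identity hook: plot coordinates equal OpenCV world coordinates (assumed signature)."""
    return np.asarray(points, dtype=float)


# View-only settings: Y-down on screen, metric aspect ratio, explicit axis labels.
scene = {
    "camera": {"up": {"x": 0, "y": -1, "z": 0}},
    "aspectmode": "data",
    "xaxis": {"title": {"text": "X (Right)"}},
    "yaxis": {"title": {"text": "Y (Down)"}},
    "zaxis": {"title": {"text": "Z (Forward)"}},
}

# Made-up camera center, 1.3 m above the board in the OpenCV world frame.
camera_center = np.array([[0.5, -1.3, 2.0]])

# Current behavior: the plotted values are exactly the data values.
plotted = world_to_plot(camera_center)

# For contrast, the removed opengl path changed the data itself.
flipped = camera_center @ np.diag([1.0, -1.0, -1.0])

print(plotted, flipped)
```

Nothing in `scene` touches `plotted`, which is why hover values and tick labels keep showing OpenCV coordinates regardless of how the view is oriented.
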

---

## Current CLI Behavior

### Available Flags

```
visualize_extrinsics.py
  -i, --input TEXT                [required] Path to JSON extrinsics file
  -o, --output TEXT               Output path (.html or .png)
  --show                          Open interactive Plotly viewer
  --scale FLOAT                   Camera axis length (default: 0.2)
  --frustum-scale FLOAT           Frustum depth (default: 0.5)
  --fov FLOAT                     Horizontal FOV degrees (default: 60.0)
  --birdseye                      Top-down orthographic view
  --show-ground/--no-show-ground  Ground plane toggle
  --ground-y FLOAT                Ground plane Y position (default: 0.0)
  --ground-size FLOAT             Ground plane side length (default: 8.0)
  --show-origin-axes/--no-show-origin-axes
                                  Origin triad toggle (default: on)
  --origin-axes-scale FLOAT       Origin triad size (defaults to --scale)
  --zed-configs TEXT              ZED calibration file(s) for accurate frustums
  --resolution [FHD1200|FHD|2K|HD|SVGA|VGA]
  --eye [left|right]
```

### Removed Flags (Historical Only)

| Flag | Removed In | Reason |
|------|------------|--------|
| `--world-basis` | `d07c244` | Caused partial/inconsistent transforms. Single CV convention is simpler. |
| `--pose-convention` | `d07c244` | Only `world_from_cam` is supported. No need for a flag. |
| `--diagnose` | `d07c244` | Diagnostic checks moved out of the visualizer. |
| `--render-space` | `79f2ab0` | Renamed to `--world-basis`, then removed. |

> **Note**: The README.md still contains stale references to `--world-basis`,
> `--pose-convention`, and `--diagnose` in the Troubleshooting section. These should
> be cleaned up to match the current CLI.

---

## Verification Playbook

### Quick Sanity Check

```bash
# Render with origin triad at 0.6m scale, save as PNG
uv run visualize_extrinsics.py \
  --input output/e2e_refine_depth_smoke_rerun.json \
  --output output/_final_opencv_origin_axes_scaled.png \
  --origin-axes-scale 0.6
```

**Expected result**:

- Origin triad at (0,0,0) with Red→+X (right), Green→+Y (down), Blue→+Z (forward).
- Camera frustums pointing along each camera's local +Z (blue axis).
- Camera positions spread out in world space (not bunched at origin).
- Y values for cameras should be negative (cameras are above the marker board, which is at Y≈0; "above" in CV convention means negative Y).

### Interactive Validation

```bash
# Open interactive HTML for rotation/inspection
uv run visualize_extrinsics.py \
  --input output/e2e_refine_depth_smoke_rerun.json \
  --show \
  --origin-axes-scale 0.6
```

**What to check**:

1. **Rotate the view**: The origin triad should remain consistent: Red/Green/Blue always point in the same data-space directions regardless of view angle.
2. **Hover over camera centers**: Tooltip shows the camera serial number.
3. **Frustum orientation**: Each frustum's open end faces away from the camera center along the camera's blue (Z) axis.

### Bird-Eye Sanity Check

```bash
uv run visualize_extrinsics.py \
  --input output/e2e_refine_depth_smoke_rerun.json \
  --birdseye --show \
  --origin-axes-scale 0.6
```

**Expected**: Top-down view of the X-Z plane. Cameras should form a recognizable spatial layout matching the physical installation. The Red (X) axis points right, Blue (Z) axis points "up" on screen (forward in world).

---

## FAQ

### "Why does an OpenGL-like view look strange?"

Because the data is in OpenCV convention (Y-down, Z-forward) and Plotly defaults to Y-up. When you try to make Plotly act like an OpenGL viewer (Y-up, Z-backward), you need to either:

1. **Transform all data** by applying `diag(1, -1, -1)`: correct, but it doubles the code paths and creates consistency risks.
2. **Adjust the Plotly camera**: this only changes the view, not the data. Axis labels and hover values still show CV coordinates.

We chose option (2) with `camera.up = {y:-1}`: minimal code, no data transformation, axis labels match the actual coordinate values. The trade-off is that the default Plotly orbit feels "inverted" compared to a Y-up 3D viewer. This is expected.

### "Does flipping axes in the view equal changing the world frame?"

**No.** Plotly's `camera.up`, `camera.eye`, and `autorange: reversed` are purely view transforms. They change how the data is *displayed* but not what the coordinates *mean*. The data always lives in the frame it was computed in (OpenCV/ArUco world frame).

If you set `camera.up = {y:1}` (Plotly default), the plot will render Y-up on screen, but the data values are still Y-down. This creates a visual inversion that looks like "the cameras are upside down." They are not; the view is just flipped.

### "How do I compare with the C++ viewer and `inside_network.json`?"

The C++ ZED Fusion viewer and `inside_network.json` use a **different world frame** than `calibrate_extrinsics.py`:

| Property | `calibrate_extrinsics.py` | ZED Fusion / `inside_network.json` |
|----------|---------------------------|-------------------------------------|
| World origin | ArUco marker object center | Gravity-aligned, first camera or user-defined |
| Y direction | Down (OpenCV) | Up (gravity-aligned) |
| Pose meaning | `T_world_from_cam` | `T_world_from_cam` (same semantics, different world) |
| Units | Meters | Meters |

To compare numerically:

1. The **relative** poses between cameras should match (up to the alignment transform).
2. The **absolute** positions will differ because the world origins are different.
3. To convert: apply the alignment rotation that maps the ArUco world frame to the Fusion world frame. If `--auto-align` was used with a ground face, the ArUco frame is partially aligned (ground = XZ plane), but the origin and yaw may still differ.

**Quick visual comparison**: Look at the *shape* of the camera arrangement (distances and angles between cameras), not the absolute positions. If the shape matches, the calibration is consistent.

### "Why are camera Y-positions negative?"

In OpenCV convention, +Y is down. Cameras mounted above the marker board (which defines Y≈0) have negative Y values. This is correct. A camera at `Y = -1.3` is 1.3 meters above the board.


### "What does `inside_network.json` camera 41831756's pose mean?"

```
Translation: [0.0, -1.175, 0.0]
Rotation: Identity
```

This camera is the reference frame origin (identity rotation) positioned 1.175m in the -Y direction. In the Fusion frame (Y-up), this means 1.175m *below* the world origin. In practice, this is the height offset of the camera relative to the Fusion coordinate system's origin.

---

## Methodology: Comparing Different World Frames

Since `inside_network.json` (Fusion) and `calibrate_extrinsics.py` (ArUco) use different world origins, raw coordinate comparison is meaningless. We validated consistency using **rigid SE(3) alignment**:

1. **Match Serials**: Identify cameras present in both JSON files.
2. **Extract Centers**: Extract the translation column `t` from `T_world_from_cam` for each camera.
   * **Crucial**: Both systems use `T_world_from_cam`. It is **not** `cam_from_world`.
3. **Compute Alignment**: Solve for the rigid transform `(R_align, t_align)` that minimizes the sum of squared distances between matched camera centers (Kabsch algorithm).
   * Scale is fixed at 1.0 (both systems use meters).
4. **Apply & Compare**:
   * Transform Fusion points: `P_aligned = R_align * P_fusion + t_align`.
   * **Position Residual**: `|| P_aruco - P_aligned ||`.
   * **Orientation Check**: Apply `R_align` to Fusion rotation matrices and compare column vectors (Right/Down/Forward) with ArUco rotations.
5. **Up-Vector Verification**:
   * Fusion uses Y-Up (gravity). ArUco uses Y-Down (image).
   * After alignment, the transformed Fusion Y-axis should be approximately parallel to the ArUco -Y axis (or +Y depending on the specific alignment solution found, but they must be collinear with gravity).

**Result**: The overlay images in `output/` were generated using this aligned frame. The low residuals (<2cm) confirm that the internal calibration is consistent, even though the absolute world coordinates differ.

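
Steps 3 to 5 above can be sketched with an SVD-based Kabsch solve. This is an illustrative sketch, not the code used to produce the overlays: the function name `kabsch_align` and all point values are hypothetical, and the synthetic ArUco centers are generated from the Fusion centers so that the expected alignment is known.

```python
import numpy as np


def kabsch_align(src, dst):
    """Return (R, t) minimizing sum ||R @ src_i + t - dst_i||^2 with det(R) = +1."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t


# Made-up camera centers in a Y-up (Fusion-like) frame.
P_fusion = np.array([[0.0, 1.2, 0.0],
                     [2.0, 1.1, 0.5],
                     [1.5, 1.3, 3.0],
                     [-1.0, 1.0, 2.0]])

# Synthetic Y-down (ArUco-like) centers: 180 degree rotation about X plus an offset.
R_true = np.diag([1.0, -1.0, -1.0])
t_true = np.array([0.3, -0.1, 1.0])
P_aruco = P_fusion @ R_true.T + t_true

R_align, t_align = kabsch_align(P_fusion, P_aruco)
P_aligned = P_fusion @ R_align.T + t_align

# Position residual per camera, and the aligned Fusion up-vector.
residuals = np.linalg.norm(P_aruco - P_aligned, axis=1)
fusion_up_in_aruco = R_align @ np.array([0.0, 1.0, 0.0])
print(residuals.max(), fusion_up_in_aruco)
```

With real data the residuals will not be zero; the point of the check is that they stay small and that the aligned Fusion up-vector stays collinear with the ArUco Y axis.
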

---

## `compare_pose_sets.py` Input Formats

The `compare_pose_sets.py` tool is designed to be agnostic to the source of the JSON files. It uses a **symmetric, heuristic parser** for both `--pose-a-json` and `--pose-b-json`.

### Accepted JSON Schemas

The parser automatically detects and handles either of these two structures for any input file:

**1. Flat Format (Standard Output)**

Used by `calibrate_extrinsics.py` and `refine_extrinsics.py`.

```json
{
  "SERIAL_NUMBER": {
    "pose": "r00 r01 r02 tx r10 r11 r12 ty r20 r21 r22 tz 0 0 0 1"
  }
}
```

**2. Nested Fusion Format**

Used by ZED Fusion `inside_network.json` configuration files.

```json
{
  "SERIAL_NUMBER": {
    "FusionConfiguration": {
      "pose": "r00 r01 r02 tx r10 r11 r12 ty r20 r21 r22 tz 0 0 0 1"
    }
  }
}
```

### Key Behaviors

1. **Interchangeability**: You can swap inputs. Comparing A (ArUco) vs B (Fusion) is valid, as is A (Fusion) vs B (ArUco). The script aligns B to A.
2. **Pose Semantics**: All poses are interpreted as `T_world_from_cam` (camera-to-world). The script does **not** invert matrices; it assumes the input strings are already in the correct convention.
3. **Minimum Overlap**: The script requires at least **3 shared camera serials** between the two files to compute a rigid alignment.
4. **Heuristic Parsing**: For each serial key, the parser looks for `FusionConfiguration.pose` first, then falls back to `pose`.

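
The lookup order and overlap check described above might look roughly like this. The sketch assumes nothing about the actual internals of `compare_pose_sets.py`: the names `parse_pose_entry`, `load_pose_set`, and `pose_string` are hypothetical, as are the serial numbers and pose values.

```python
import numpy as np


def parse_pose_entry(entry):
    """Return a 4x4 T_world_from_cam from a flat or nested entry (hypothetical helper)."""
    fusion = entry.get("FusionConfiguration")
    if isinstance(fusion, dict) and "pose" in fusion:
        pose_str = fusion["pose"]   # nested Fusion format takes priority
    else:
        pose_str = entry["pose"]    # fall back to the flat format
    values = [float(v) for v in pose_str.split()]
    if len(values) != 16:
        raise ValueError(f"expected 16 floats, got {len(values)}")
    return np.array(values).reshape(4, 4)  # row-major 4x4


def load_pose_set(data):
    """Map each serial key to its parsed pose matrix."""
    return {serial: parse_pose_entry(entry) for serial, entry in data.items()}


def pose_string(x, y, z):
    """Build an identity-rotation pose string for the example data."""
    return f"1 0 0 {x} 0 1 0 {y} 0 0 1 {z} 0 0 0 1"


# Made-up inputs: A uses the flat format, B uses the nested Fusion format.
pose_set_a = {
    "1001": {"pose": pose_string(0.5, -1.3, 2.0)},
    "1002": {"pose": pose_string(-1.0, -1.2, 0.0)},
    "1003": {"pose": pose_string(2.0, -1.4, 1.0)},
}
pose_set_b = {
    "1001": {"FusionConfiguration": {"pose": pose_string(0.0, 1.2, 0.0)}},
    "1002": {"FusionConfiguration": {"pose": pose_string(1.5, 1.1, 2.0)}},
    "1003": {"FusionConfiguration": {"pose": pose_string(-1.5, 1.3, 1.0)}},
    "1004": {"FusionConfiguration": {"pose": pose_string(3.0, 1.0, 3.0)}},
}

poses_a = load_pose_set(pose_set_a)
poses_b = load_pose_set(pose_set_b)

# A rigid alignment needs at least 3 cameras present in both files.
shared = sorted(set(poses_a) & set(poses_b))
if len(shared) < 3:
    raise ValueError("need at least 3 shared camera serials")
print(shared)
```

Because the same function parses both inputs, swapping A and B requires no format-specific handling.
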

### Example: Swapped Inputs

Since the parser is symmetric, you can verify consistency by reversing the alignment direction:

```bash
# Align Fusion (B) to ArUco (A)
uv run compare_pose_sets.py \
  --pose-a-json output/e2e_refine_depth.json \
  --pose-b-json ../zed_settings/inside_network.json \
  --report-json output/report_aruco_ref.json

# Align ArUco (B) to Fusion (A)
uv run compare_pose_sets.py \
  --pose-a-json ../zed_settings/inside_network.json \
  --pose-b-json output/e2e_refine_depth.json \
  --report-json output/report_fusion_ref.json
```

---

## Appendix: Stale README References

The following lines in `py_workspace/README.md` reference removed flags and should be updated:

- **Line ~104**: References `--pose-convention` (removed).
- **Line ~105**: References `--world-basis opengl` (removed).
- **Line ~116**: References `--diagnose` (removed).

These were left from earlier iterations and do not reflect the current CLI.