# Visualization Conventions & Coordinate Frame Reference
> **Status**: Canonical reference as of 2026-02-08.
> **Applies to**: `visualize_extrinsics.py`, `calibrate_extrinsics.py`, and `inside_network.json`.
---
## Executive Summary
The `visualize_extrinsics.py` script went through multiple iterations of coordinate-frame
switching (OpenCV ↔ OpenGL), Plotly camera/view hacks, and partial basis transforms that
created compounding confusion about whether the visualization was correct. The root cause
was **conflating Plotly's scene camera settings with actual data-frame transforms**:
adjusting `camera.up`, `autorange: "reversed"`, or eye position changes *how you look at
the data* but does **not** change the coordinate frame the data lives in.
After several rounds of adding and removing `--world-basis`, `--render-space`, and
`--pose-convention` flags, the visualizer was simplified to a single convention:
- **All data is in OpenCV convention** (+X right, +Y down, +Z forward).
- **No basis switching**. The `--world-basis` flag was removed.
- **Plotly's scene camera** is configured with `up = {x:0, y:-1, z:0}` so that the
OpenCV +Y-down axis renders as "down" on screen.
The confusion was never a bug in the calibration math — it was a visualization-layer
problem caused by trying to make Plotly (which defaults to Y-up) display OpenCV data
(which is Y-down) without a clear separation between "data frame" and "view frame."
---
## Ground Truth Conventions
### 1. Calibration Output: `world_from_cam`
`calibrate_extrinsics.py` stores poses as **T_world_from_cam** (4×4 homogeneous):
```
T_world_from_cam = invert_transform(T_cam_from_world)
```
- `solvePnP` returns a rotation vector and a translation vector (`rvec`, `tvec`) that together define `T_cam_from_world` (maps world points into the camera frame).
- The script **inverts** this before saving to JSON.
- The translation column `T[:3, 3]` is the **camera center in world coordinates**.
- The columns of the rotation block `T[:3, :3]` are the camera's local axes expressed in the world frame.
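The inversion and the two readings of the result can be sketched in a few lines of NumPy. This is a minimal sketch, not the code from `calibrate_extrinsics.py`: the body of `invert_transform` is an assumption about how a rigid 4×4 inverse is typically written, and the example pose is made up.

```python
import numpy as np

def invert_transform(T: np.ndarray) -> np.ndarray:
    """Invert a 4x4 rigid transform using R^T rather than a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# Hypothetical cam_from_world pose: 90 degree rotation about Z plus a translation.
T_cam_from_world = np.array([
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 2.0],
    [0.0,  0.0, 1.0, 3.0],
    [0.0,  0.0, 0.0, 1.0],
])

T_world_from_cam = invert_transform(T_cam_from_world)
camera_center_world = T_world_from_cam[:3, 3]   # where the camera sits in world
camera_z_axis_world = T_world_from_cam[:3, 2]   # camera forward axis in world
```

Mapping `camera_center_world` back through `T_cam_from_world` gives the zero vector, which is a quick way to confirm which direction a stored pose goes.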
**JSON format** (16 floats, row-major 4×4):
```json
{
"44289123": {
"pose": "0.878804 -0.039482 0.475548 -2.155006 0.070301 0.996409 ..."
}
}
```
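Reading this format back is a matter of splitting the string and reshaping row-major. The sketch below assumes the layout shown above; the helper name `parse_pose`, the serial `"00000000"`, and the pose values are illustrative and not taken from a real calibration file.

```python
import json
import numpy as np

def parse_pose(pose_str: str) -> np.ndarray:
    """Turn a string of 16 whitespace-separated floats into a row-major 4x4 matrix."""
    values = [float(v) for v in pose_str.split()]
    if len(values) != 16:
        raise ValueError(f"expected 16 floats, got {len(values)}")
    return np.array(values).reshape(4, 4)

# Made-up entry with a placeholder serial number.
raw = json.loads('{"00000000": {"pose": "1 0 0 0.5 0 1 0 -1.3 0 0 1 2.0 0 0 0 1"}}')

poses = {serial: parse_pose(entry["pose"]) for serial, entry in raw.items()}
center = poses["00000000"][:3, 3]   # camera center in world coordinates
```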
### 2. Camera-Local Axes (OpenCV)
Every camera's local frame follows the OpenCV pinhole convention:
| Axis | Direction | Color in visualizer |
|------|-----------|-------------------|
| +X | Right | Red |
| +Y | Down | Green |
| +Z | Forward (into scene) | Blue |
The frustum is drawn along the camera's local +Z axis. The four corners of the
frustum's far plane are at `(±w, ±h, frustum_scale)` in camera-local coordinates.
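As a rough illustration of that geometry, the sketch below builds the four far-plane corners and maps them into the world frame. It assumes `w` is derived from the horizontal FOV as `frustum_scale * tan(fov / 2)` and `h` from a 16:9 aspect ratio; both are assumptions for this example, and the visualizer may compute them differently (for instance from `--zed-configs` intrinsics).

```python
import numpy as np

def frustum_corners_local(frustum_scale=0.5, fov_deg=60.0, aspect=16 / 9):
    """Far-plane corners (±w, ±h, frustum_scale) in camera-local OpenCV coordinates."""
    w = frustum_scale * np.tan(np.radians(fov_deg) / 2.0)  # half-width from horizontal FOV
    h = w / aspect                                          # half-height (assumed aspect)
    d = frustum_scale
    return np.array([[-w, -h, d], [w, -h, d], [w, h, d], [-w, h, d]])

def to_world(T_world_from_cam: np.ndarray, pts_local: np.ndarray) -> np.ndarray:
    """Map Nx3 camera-local points into the world frame."""
    return pts_local @ T_world_from_cam[:3, :3].T + T_world_from_cam[:3, 3]

# Hypothetical camera: axes aligned with world, 1.3 m above the board (Y = -1.3).
T = np.eye(4)
T[:3, 3] = [0.0, -1.3, 0.0]

corners_local = frustum_corners_local()
corners_world = to_world(T, corners_local)
```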
### 3. Plotly Scene/Camera Interpretation Pitfalls
Plotly's 3D scene has its own camera model that controls **how you view** the data:
| Plotly setting | What it does | What it does NOT do |
|----------------|-------------|-------------------|
| `camera.up` | Sets which direction is "up" on screen | Does not transform data coordinates |
| `camera.eye` | Sets the viewpoint position | Does not change axis orientation |
| `yaxis.autorange = "reversed"` | Flips the Y axis tick direction | Does not negate Y data values |
| `aspectmode = "data"` | Preserves metric proportions | Does not imply any convention |
**Critical insight**: Changing `camera.up` from `{y:1}` to `{y:-1}` rotates the view so
that +Y appears toward the bottom of the screen. The axis itself is untouched: tick
values, hover values, and trace coordinates are identical before and after. This is
purely a view transform, and the data coordinates are unchanged.
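A view-only scene configuration along these lines might look like the sketch below. It is written as a plain dictionary so it runs without Plotly installed; the function name `make_scene_layout` is hypothetical, and the exact layout the visualizer builds may differ.

```python
def make_scene_layout() -> dict:
    """Scene settings that change only the view, never the trace coordinates."""
    return dict(
        camera=dict(up=dict(x=0, y=-1, z=0)),       # +Y renders toward the bottom of the screen
        aspectmode="data",                          # preserve metric proportions
        xaxis=dict(title=dict(text="X (Right)")),
        yaxis=dict(title=dict(text="Y (Down)")),    # note: no autorange="reversed"
        zaxis=dict(title=dict(text="Z (Forward)")),
    )

scene = make_scene_layout()
# With Plotly installed, this could be applied as: fig.update_layout(scene=scene)
```

Nothing in this dictionary touches the `x`, `y`, or `z` arrays of any trace, which is the point: the data frame stays OpenCV throughout.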
---
## Historical Confusion Timeline
This section documents the sequence of changes that led to confusion, for future
reference. All commits are on `visualize_extrinsics.py`.
### Phase 1: Initial Plotly Rewrite (`7b9782a`)
- Rewrote the visualizer from matplotlib to Plotly with a `--diagnose` mode.
- Used Plotly defaults (Y-up). OpenCV data (Y-down) appeared "upside down."
- Frustums pointed in the correct direction in data space but *looked* inverted.
### Phase 2: Y-Up Enforcement (`a8d3751`)
- Attempted to fix by setting `camera.up = {y:1}` and using `autorange: "reversed"`.
- This made the view *look* correct for some angles but introduced axis-label confusion.
- The Y axis ticks ran in the opposite direction from the data, misleading users.
### Phase 3: Render-Space Option (`ab88a24`)
- Added `--render-space` flag to switch between "cv" and "opengl" rendering.
- The OpenGL path applied a basis-change matrix `diag(1, -1, -1)` to all data.
- This actually transformed the data, not just the view. The approach was correct, but it
introduced a second code path that was hard to validate.
### Phase 4: Ground Plane & Origin Triad (`18e8142`, `57f0dff`)
- Added ground plane overlay and world-origin axis triad.
- These were drawn in the *data* frame, so they were correct in CV mode but
appeared wrong in OpenGL mode (the basis transform was applied inconsistently
to some elements but not others).
### Phase 5: `--world-basis` with Global Transform (`79f2ab0`)
- Renamed `--render-space` to `--world-basis` with `cv` and `opengl` options.
- Introduced `world_to_plot()` as a central transform function.
- In `opengl` mode: `world_to_plot` applied `diag(1, -1, -1)` to all points.
- **Problem**: The Plotly `camera.up` and axis labels were not always updated
consistently with the basis choice, leading to "it looks right from one angle
but wrong from another" reports.
### Phase 6: Restore After Removal (`6330e0e`)
- `--world-basis` was briefly removed, then restored at a user's request.
- This back-and-forth left the README with stale documentation referencing both
the old and new interfaces.
### Phase 7: Final Cleanup — CV Only (`d07c244`)
- **Removed `--world-basis` entirely.**
- `world_to_plot()` became a no-op (identity function).
- Plotly camera set to `up = {x:0, y:-1, z:0}` to render Y-down correctly.
- Axis labels explicitly set to `X (Right)`, `Y (Down)`, `Z (Forward)`.
- Added `--origin-axes-scale` for independent control of the origin triad size.
- Removed the `--diagnose` and `--pose-convention` flags. (`--render-space` had already been renamed to `--world-basis` in Phase 5, so it went away with it.)
**This is the current state.**
---
## Peculiar Behaviors Catalog
| # | Symptom | Root Cause | Fix / Explanation |
|---|---------|-----------|-------------------|
| 1 | Frustum appears to point in "-Z" direction | Plotly default camera has Y-up; OpenCV frustum points +Z which looks "backward" when viewed from a Y-up perspective | Set `camera.up = {y:-1}` (done in current code). The frustum is correct in data space. |
| 2 | Switching to `--world-basis opengl` makes some elements flip but not others | The `world_to_plot()` transform was applied to camera traces but not consistently to ground plane or origin triad | Removed `--world-basis`. Single convention eliminates partial-transform bugs. |
| 3 | `yaxis.autorange = "reversed"` makes ticks confusing | Plotly reverses the tick labels but the data coordinates stay the same. Users see "0 at top, -2 at bottom" which contradicts Y-down intuition. | Removed `autorange: reversed`. Use `camera.up = {y:-1}` instead, which rotates the view without mangling tick labels. |
| 4 | Camera positions don't match `inside_network.json` | `inside_network.json` stores poses in the ZED Fusion coordinate frame (gravity-aligned, Y-up). `calibrate_extrinsics.py` stores poses in the ArUco marker object's frame (Y-down if the marker board is horizontal). These are **different world frames**. | Not a bug. The two systems use different world origins and orientations. To compare, you must apply the alignment transform between the two frames. See FAQ below. |
| 5 | Origin triad too small or too large relative to cameras | Origin triad defaulted to `--scale` (camera axis size), which is often much smaller than the camera spread | Use `--origin-axes-scale 0.6` (or similar) independently of `--scale`. |
| 6 | Bird-eye view shows unexpected orientation | `--birdseye` uses orthographic projection looking along the Y axis. In CV convention +Y is "down", so the -Y side is physically *above* the scene. | Expected behavior. The bird-eye view shows the X-Z plane as seen from the -Y direction, which is above the cameras, so it is a true top-down view. |
---
## Canonical Rules Going Forward
1. **Single convention**: All visualization data is in OpenCV frame. No basis switching.
2. **`world_to_plot()` is identity**: It exists as a hook but performs no transform.
If a future need arises for basis conversion, it should be the *only* place it happens.
3. **Plotly camera settings are view-only**: Never use `autorange: reversed` or axis
negation to simulate a coordinate change. Use `camera.up` and `camera.eye` only.
4. **Poses are `world_from_cam`**: The 4×4 matrix maps camera-local points to world.
Translation = camera position in world. Rotation columns = camera axes in world.
5. **Colors are RGB = XYZ**: Red = X (right), Green = Y (down), Blue = Z (forward).
This applies to both per-camera axis triads and the world-origin triad.
6. **Units are meters**: Consistent with marker parquet geometry and calibration output.
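Rules 4 and 5 together determine how an axis triad is drawn. A minimal sketch, assuming a `world_from_cam` pose and the default `--scale` of 0.2 m (the function name `axis_triad_segments` is hypothetical):

```python
import numpy as np

AXIS_COLORS = ("red", "green", "blue")  # RGB = XYZ

def axis_triad_segments(T_world_from_cam: np.ndarray, scale: float = 0.2):
    """Return (color, start, end) line segments for a camera's local axes in world."""
    origin = T_world_from_cam[:3, 3]                      # camera position in world
    segments = []
    for i, color in enumerate(AXIS_COLORS):
        tip = origin + scale * T_world_from_cam[:3, i]    # i-th rotation column = i-th axis
        segments.append((color, origin, tip))
    return segments

# Hypothetical camera 1.3 m above the board with axes aligned to the world.
T = np.eye(4)
T[:3, 3] = [0.0, -1.3, 0.0]
segments = axis_triad_segments(T)
```

Passing the identity matrix produces the world-origin triad, so the same function covers both cases named in rule 5.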
---
## Current CLI Behavior
### Available Flags
```
visualize_extrinsics.py
-i, --input TEXT [required] Path to JSON extrinsics file
-o, --output TEXT Output path (.html or .png)
--show Open interactive Plotly viewer
--scale FLOAT Camera axis length (default: 0.2)
--frustum-scale FLOAT Frustum depth (default: 0.5)
--fov FLOAT Horizontal FOV degrees (default: 60.0)
--birdseye Top-down orthographic view
--show-ground/--no-show-ground Ground plane toggle
--ground-y FLOAT Ground plane Y position (default: 0.0)
--ground-size FLOAT Ground plane side length (default: 8.0)
--show-origin-axes/--no-show-origin-axes Origin triad toggle (default: on)
--origin-axes-scale FLOAT Origin triad size (defaults to --scale)
--zed-configs TEXT ZED calibration file(s) for accurate frustums
--resolution [FHD1200|FHD|2K|HD|SVGA|VGA]
--eye [left|right]
```
### Removed Flags (Historical Only)
| Flag | Removed In | Reason |
|------|-----------|--------|
| `--world-basis` | `d07c244` | Caused partial/inconsistent transforms. Single CV convention is simpler. |
| `--pose-convention` | `d07c244` | Only `world_from_cam` is supported. No need for a flag. |
| `--diagnose` | `d07c244` | Diagnostic checks moved out of the visualizer. |
| `--render-space` | `79f2ab0` | Renamed to `--world-basis`, then removed. |
> **Note**: The README.md still contains stale references to `--world-basis`,
> `--pose-convention`, and `--diagnose` in the Troubleshooting section. These should
> be cleaned up to match the current CLI.
---
## Verification Playbook
### Quick Sanity Check
```bash
# Render with origin triad at 0.6m scale, save as PNG
uv run visualize_extrinsics.py \
--input output/e2e_refine_depth_smoke_rerun.json \
--output output/_final_opencv_origin_axes_scaled.png \
--origin-axes-scale 0.6
```
**Expected result**:
- Origin triad at (0,0,0) with Red→+X (right), Green→+Y (down), Blue→+Z (forward).
- Camera frustums pointing along each camera's local +Z (blue axis).
- Camera positions spread out in world space (not bunched at origin).
- Y values for cameras should be negative (cameras are above the marker board,
which is at Y≈0; "above" in CV convention means negative Y).
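The last two expectations can also be checked numerically before looking at the image. The sketch below assumes camera centers have already been extracted from the poses; the function name `check_layout`, the 0.1 m spread threshold, and the sample centers are all illustrative.

```python
import numpy as np

def check_layout(centers: dict, min_spread_m: float = 0.1) -> dict:
    """Summarize whether cameras sit above the board (Y < 0) and are spread out."""
    pts = np.array(list(centers.values()), dtype=float)
    # Diagonal of the bounding box around all camera centers, in meters.
    spread = float(np.linalg.norm(pts.max(axis=0) - pts.min(axis=0)))
    return {
        "all_above_board": bool(np.all(pts[:, 1] < 0.0)),
        "spread_m": spread,
        "not_bunched": spread > min_spread_m,
    }

# Made-up camera centers keyed by placeholder names.
report = check_layout({"cam_a": [-1.0, -1.3, 0.5], "cam_b": [1.0, -1.2, 0.4]})
```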
### Interactive Validation
```bash
# Open interactive HTML for rotation/inspection
uv run visualize_extrinsics.py \
--input output/e2e_refine_depth_smoke_rerun.json \
--show \
--origin-axes-scale 0.6
```
**What to check**:
1. **Rotate the view**: The origin triad should remain consistent — Red/Green/Blue
always point in the same data-space directions regardless of view angle.
2. **Hover over camera centers**: Tooltip shows the camera serial number.
3. **Frustum orientation**: Each frustum's open end faces away from the camera center
along the camera's blue (Z) axis.
### Bird-Eye Sanity Check
```bash
uv run visualize_extrinsics.py \
--input output/e2e_refine_depth_smoke_rerun.json \
--birdseye --show \
--origin-axes-scale 0.6
```
**Expected**: Top-down view of the X-Z plane. Cameras should form a recognizable
spatial layout matching the physical installation. The Red (X) axis points right,
Blue (Z) axis points "up" on screen (forward in world).
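The expected on-screen axes follow from ordinary look-at geometry. The sketch below uses the standard construction (right = forward × up) and assumes an eye on the -Y side with +Z as the screen-up hint; it does not reproduce Plotly's internals, and the eye distance of 5 is arbitrary.

```python
import numpy as np

def screen_axes(eye, up_hint, target=(0.0, 0.0, 0.0)):
    """Return the world-space directions that appear as screen-right and screen-up."""
    eye, up_hint, target = (np.asarray(v, dtype=float) for v in (eye, up_hint, target))
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up_hint)
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    return right, up

# Eye on the -Y side (physically above the board in CV convention), looking toward +Y.
right, up = screen_axes(eye=(0.0, -5.0, 0.0), up_hint=(0.0, 0.0, 1.0))

# The same view from the +Y side mirrors X, which would not match the expected layout.
right_from_below, _ = screen_axes(eye=(0.0, 5.0, 0.0), up_hint=(0.0, 0.0, 1.0))
```

With the eye at -Y, screen-right is +X and screen-up is +Z, matching the expectation above.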
---
## FAQ
### "Why does an OpenGL-like view look strange?"
Because the data is in OpenCV convention (Y-down, Z-forward) and Plotly defaults to
Y-up. When you try to make Plotly act like an OpenGL viewer (Y-up, Z-backward), you
need to either:
1. **Transform all data** by applying `diag(1, -1, -1)` — correct but doubles the
code paths and creates consistency risks.
2. **Adjust the Plotly camera** — only changes the view, not the data. Axis labels
and hover values still show CV coordinates.
We chose option (2) with `camera.up = {y:-1}`: minimal code, no data transformation,
axis labels match the actual coordinate values. The trade-off is that the default
Plotly orbit feels "inverted" compared to a Y-up 3D viewer. This is expected.
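For contrast, option (1) is small in terms of math, which is part of why it was tempting. The sketch below shows what applying `diag(1, -1, -1)` to a point does; the sample point is made up, and this is not code from the current visualizer, which performs no such transform.

```python
import numpy as np

# Basis change between OpenCV (Y-down, Z-forward) and OpenGL-style (Y-up, Z-backward).
B = np.diag([1.0, -1.0, -1.0])

p_cv = np.array([0.5, -1.3, 2.0])   # hypothetical point in the OpenCV frame
p_gl = B @ p_cv                      # same physical point, OpenGL-style coordinates

# B is a proper rotation (180 degrees about X) and its own inverse.
is_rotation = np.isclose(np.linalg.det(B), 1.0)
round_trip = B @ p_gl
```

The cost is not in this matrix but in having to apply it to every trace (cameras, frustums, ground plane, origin triad) without missing one.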
### "Does flipping axes in the view equal changing the world frame?"
**No.** Plotly's `camera.up`, `camera.eye`, and `autorange: reversed` are purely
view transforms. They change how the data is *displayed* but not what the coordinates
*mean*. The data always lives in the frame it was computed in (OpenCV/ArUco world frame).
If you set `camera.up = {y:1}`, the plot will render +Y toward the top of the screen,
but the data values are still Y-down. This creates a visual inversion that looks like
"the cameras are upside down". They are not; the view is just flipped.
### "How do I compare with the C++ viewer and `inside_network.json`?"
The C++ ZED Fusion viewer and `inside_network.json` use a **different world frame**
than `calibrate_extrinsics.py`:
| Property | `calibrate_extrinsics.py` | ZED Fusion / `inside_network.json` |
|----------|--------------------------|-------------------------------------|
| World origin | ArUco marker object center | Gravity-aligned, first camera or user-defined |
| Y direction | Down (OpenCV) | Up (gravity-aligned) |
| Pose meaning | `T_world_from_cam` | `T_world_from_cam` (same semantics, different world) |
| Units | Meters | Meters |
To compare numerically:
1. The **relative** poses between cameras should match (up to the alignment transform).
2. The **absolute** positions will differ because the world origins are different.
3. To convert: apply the rigid alignment transform (rotation and translation) that maps
the ArUco world frame to the Fusion world frame. If `--auto-align` was used with a ground
face, the ArUco frame is partially aligned (ground = XZ plane), but the origin and yaw may still differ.
**Quick visual comparison**: Look at the *shape* of the camera arrangement (distances
and angles between cameras), not the absolute positions. If the shape matches, the
calibration is consistent.
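"Shape" can be made concrete with pairwise distances between camera centers, which do not change under any rigid transform. A minimal sketch with made-up centers and an arbitrary rotation and translation standing in for the unknown frame difference:

```python
import numpy as np

def pairwise_distances(centers: np.ndarray) -> np.ndarray:
    """NxN matrix of Euclidean distances between N camera centers."""
    diff = centers[:, None, :] - centers[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Made-up camera centers in one world frame.
centers_a = np.array([[0.0, -1.3, 0.0], [2.0, -1.2, 0.5], [1.0, -1.1, 3.0]])

# The same rig expressed in another frame: rotate 30 degrees about Y, then translate.
theta = np.radians(30.0)
R = np.array([
    [np.cos(theta), 0.0, np.sin(theta)],
    [0.0, 1.0, 0.0],
    [-np.sin(theta), 0.0, np.cos(theta)],
])
centers_b = centers_a @ R.T + np.array([0.4, 1.0, -2.0])

max_distance_diff = np.abs(pairwise_distances(centers_a) - pairwise_distances(centers_b)).max()
```

If the two distance matrices agree to within the expected calibration noise, the camera arrangement has the same shape in both files, regardless of origin or orientation.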
### "Why are camera Y-positions negative?"
In OpenCV convention, +Y is down. Cameras mounted above the marker board (which defines
Y≈0) have negative Y values. This is correct. A camera at `Y = -1.3` is 1.3 meters
above the board.
### "What does `inside_network.json` camera 41831756's pose mean?"
```
Translation: [0.0, -1.175, 0.0]
Rotation: Identity
```
This camera has identity rotation, so its axes coincide with the Fusion world axes, and
it is translated 1.175 m along -Y. In the Fusion frame (Y-up), that places it 1.175 m
*below* the world origin. In practice, this is the height offset of the camera relative
to the Fusion coordinate system's origin.
---
## Methodology: Comparing Different World Frames
Since `inside_network.json` (Fusion) and `calibrate_extrinsics.py` (ArUco) use different
world origins, raw coordinate comparison is meaningless. We validated consistency using
**rigid SE(3) alignment**:
1. **Match Serials**: Identify cameras present in both JSON files.
2. **Extract Centers**: Extract the translation column `t` from `T_world_from_cam` for
each camera.
* **Crucial**: Both systems use `T_world_from_cam`. It is **not** `cam_from_world`.
3. **Compute Alignment**: Solve for the rigid transform `(R_align, t_align)` that
minimizes the distance between the two point sets (Kabsch algorithm).
* Scale is fixed at 1.0 (both systems use meters).
4. **Apply & Compare**:
* Transform Fusion points: `P_aligned = R_align * P_fusion + t_align`.
* **Position Residual**: `|| P_aruco - P_aligned ||`.
* **Orientation Check**: Apply `R_align` to Fusion rotation matrices and compare
column vectors (Right/Down/Forward) with ArUco rotations.
5. **Up-Vector Verification**:
* Fusion uses Y-Up (gravity). ArUco uses Y-Down (image).
* After alignment, the transformed Fusion Y-axis should be approximately parallel
to the ArUco -Y axis (or +Y depending on the specific alignment solution found,
but they must be collinear with gravity).
**Result**: The overlay images in `output/` were generated using this aligned frame.
The low position residuals (<2 cm) confirm that the internal calibration is consistent,
even though the absolute world coordinates differ.
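Steps 3 to 5 can be sketched with a standard SVD-based Kabsch fit. This is an illustration of the technique under stated assumptions, not the script that produced the overlays: the function name `kabsch` and all point values are made up, and the synthetic "ArUco" points are generated from the "Fusion" points by a known transform so the result can be checked.

```python
import numpy as np

def kabsch(P_src: np.ndarray, P_dst: np.ndarray):
    """Least-squares rigid fit so that R @ p_src + t approximates p_dst (scale fixed at 1)."""
    c_src, c_dst = P_src.mean(axis=0), P_dst.mean(axis=0)
    H = (P_src - c_src).T @ (P_dst - c_dst)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Synthetic camera centers (not real calibration data) in a Y-up, Fusion-like frame.
P_fusion = np.array([[0.0, 1.2, 0.0], [2.0, 1.3, 0.5], [1.0, 1.1, 3.0], [-1.5, 1.4, 2.0]])
R_true = np.diag([1.0, -1.0, -1.0])   # 180 degrees about X: Y-up to Y-down
t_true = np.array([0.3, 0.0, -0.2])
P_aruco = P_fusion @ R_true.T + t_true

R_align, t_align = kabsch(P_fusion, P_aruco)
P_aligned = P_fusion @ R_align.T + t_align
residuals = np.linalg.norm(P_aruco - P_aligned, axis=1)   # position residual per camera

# Up-vector check: where does the Fusion +Y axis land in the ArUco frame?
fusion_up_in_aruco = R_align @ np.array([0.0, 1.0, 0.0])
```

In this synthetic case the Fusion +Y axis maps onto the ArUco -Y axis, which is the behavior step 5 describes. The fit needs at least three non-collinear cameras to determine the rotation uniquely.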
---
## Appendix: Stale README References
The following lines in `py_workspace/README.md` reference removed flags and should be
updated:
- **Line ~104**: References `--pose-convention` (removed).
- **Line ~105**: References `--world-basis opengl` (removed).
- **Line ~116**: References `--diagnose` (removed).
These were left from earlier iterations and do not reflect the current CLI.