refactor: things

2026-03-06 17:17:59 +08:00
parent 8c6087683f
commit 33ab1a5d9d
171 changed files with 293 additions and 29894 deletions
# Visualization Conventions & Coordinate Frame Reference
> **Status**: Canonical reference as of 2026-02-08.
> **Applies to**: `visualize_extrinsics.py`, `calibrate_extrinsics.py`, and `inside_network.json`.
---
## Executive Summary
The `visualize_extrinsics.py` script went through multiple iterations of coordinate-frame
switching (OpenCV ↔ OpenGL), Plotly camera/view hacks, and partial basis transforms that
created compounding confusion about whether the visualization was correct. The root cause
was **conflating Plotly's scene camera settings with actual data-frame transforms**:
adjusting `camera.up`, `autorange: "reversed"`, or eye position changes *how you look at
the data* but does **not** change the coordinate frame the data lives in.
After several rounds of adding and removing `--world-basis`, `--render-space`, and
`--pose-convention` flags, the visualizer was simplified to a single convention:
- **All data is in OpenCV convention** (+X right, +Y down, +Z forward).
- **No basis switching**. The `--world-basis` flag was removed.
- **Plotly's scene camera** is configured with `up = {x:0, y:-1, z:0}` so that the
OpenCV +Y-down axis renders as "down" on screen.
The confusion was never a bug in the calibration math — it was a visualization-layer
problem caused by trying to make Plotly (which defaults to Y-up) display OpenCV data
(which is Y-down) without a clear separation between "data frame" and "view frame."
---
## Current Policy Checklist (2026-02-09)
For engineers maintaining or using these tools:
- [x] **`calibrate_extrinsics.py`**: Outputs `T_world_from_cam` (OpenCV). Auto-aligns to **Y-down** (gravity along +Y). Writes `_meta` block.
- [x] **`visualize_extrinsics.py`**: Renders raw JSON data. Ignores `_meta`. Sets view camera up to `-Y`.
- [x] **`apply_calibration_to_fusion_config.py`**: Direct pose copy only. No `--cv-to-opengl` conversion.
- [x] **`compare_pose_sets.py`**: Symmetric inputs (`--pose-a-json`, `--pose-b-json`). Heuristic parsing.
- [x] **Conventions**: Always OpenCV frame (+X Right, +Y Down, +Z Forward). Units in meters.
---
## Ground Truth Conventions
### 1. Calibration Output: `world_from_cam`
`calibrate_extrinsics.py` stores poses as **T_world_from_cam** (4×4 homogeneous):
```
T_world_from_cam = invert_transform(T_cam_from_world)
```
- `solvePnP` returns `T_cam_from_world` (maps world points into camera frame).
- The script **inverts** this before saving to JSON.
- The translation column `T[:3, 3]` is the **camera center in world coordinates**.
- The rotation columns `T[:3, :3]` are the camera's local axes expressed in world frame.
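The inversion step can be written with the closed form for rigid transforms. A minimal sketch (the actual `invert_transform` in `calibrate_extrinsics.py` may differ in detail):

```python
import numpy as np

def invert_transform(T: np.ndarray) -> np.ndarray:
    """Invert a rigid 4x4 transform using the closed form.

    For T = [R | t; 0 0 0 1], the inverse is [R.T | -R.T @ t],
    which is cheaper and numerically safer than a general inverse.
    """
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

# T_world_from_cam = invert_transform(T_cam_from_world)
# T_world_from_cam[:3, 3] is then the camera center in world coordinates.
```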
**JSON format** (16 floats, row-major 4×4):
```json
{
"44289123": {
"pose": "0.878804 -0.039482 0.475548 -2.155006 0.070301 0.996409 ..."
}
}
```
### 2. Camera-Local Axes (OpenCV)
Every camera's local frame follows the OpenCV pinhole convention:
| Axis | Direction | Color in visualizer |
|------|-----------|-------------------|
| +X | Right | Red |
| +Y | Down | Green |
| +Z | Forward (into scene) | Blue |
The frustum is drawn along the camera's local +Z axis. The four corners of the
frustum's far plane are at `(±w, ±h, frustum_scale)` in camera-local coordinates.
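Under this convention the far-plane corners follow from the horizontal FOV. A hedged sketch (the relationship `h = w / aspect` is an assumption here, not the script's exact code):

```python
import math
import numpy as np

def frustum_far_corners(fov_h_deg: float, aspect: float,
                        frustum_scale: float) -> np.ndarray:
    """Far-plane corners (±w, ±h, frustum_scale) in camera-local OpenCV axes.

    Assumes w comes from the horizontal FOV and h = w / aspect
    (aspect = width / height); the visualizer's formula may differ.
    """
    w = frustum_scale * math.tan(math.radians(fov_h_deg) / 2.0)
    h = w / aspect
    z = frustum_scale  # frustum extends along the camera's local +Z (forward)
    return np.array([[w, h, z], [w, -h, z], [-w, -h, z], [-w, h, z]])
```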
### 3. Metadata & Auto-Alignment (`_meta`)
`calibrate_extrinsics.py` now writes a `_meta` key to the output JSON. This metadata
describes the conventions used, but it is **optional** for consumers: tools like the
visualizer may safely ignore it.
```json
{
"_meta": {
"pose_convention": "world_from_cam",
"frame_convention": "opencv",
"auto_aligned": true,
"gravity_direction_world": [0.0, 1.0, 0.0],
"alignment": { ... }
},
"SN123": { ... }
}
```
- `pose_convention`: Always `world_from_cam`.
- `frame_convention`: Always `opencv` (+X Right, +Y Down, +Z Forward).
- `auto_aligned`: `true` if `--auto-align` was used.
- `gravity_direction_world`: The vector in world space that points "down" (gravity).
- For **Y-down** (current default), this is `[0, 1, 0]`.
- For **Y-up** (legacy/Fusion), this would be `[0, -1, 0]`.
**Auto-Alignment Target**:
The `--auto-align` flag now targets a **Y-down** world frame by default (internally the
"up" target is `target_axis=[0, -1, 0]`, so gravity ends up pointing along +Y).
- **Old behavior**: Targeted Y-up (gravity along -Y).
- **New behavior**: Targets Y-down (gravity along +Y) to match the OpenCV camera frame convention.
- **Result**: The ground plane is still XZ, but "up" is -Y and "down" is +Y.
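One way to realize such an auto-alignment is the smallest rotation that takes the measured gravity direction onto the +Y target. A sketch of the idea only (the script's internal logic is not shown here):

```python
import numpy as np

def rotation_between(a, b) -> np.ndarray:
    """Smallest rotation R with R @ normalize(a) == normalize(b) (Rodrigues)."""
    a = np.asarray(a, float); a = a / np.linalg.norm(a)
    b = np.asarray(b, float); b = b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(a @ b)
    if np.isclose(c, -1.0):
        # Antiparallel: rotate 180° about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1.0 + c)

# e.g. align a measured gravity estimate to the Y-down world target [0, 1, 0]:
# R_align = rotation_between(gravity_measured, [0.0, 1.0, 0.0])
```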
### 4. Plotly Scene/Camera Interpretation Pitfalls
Plotly's 3D scene has its own camera model that controls **how you view** the data:
| Plotly setting | What it does | What it does NOT do |
|----------------|-------------|-------------------|
| `camera.up` | Sets which direction is "up" on screen | Does not transform data coordinates |
| `camera.eye` | Sets the viewpoint position | Does not change axis orientation |
| `yaxis.autorange = "reversed"` | Flips the Y axis tick direction | Does not negate Y data values |
| `aspectmode = "data"` | Preserves metric proportions | Does not imply any convention |
**Critical insight**: Changing `camera.up` from `{y:1}` to `{y:-1}` makes the plot
*look* like Y-down is rendered correctly, but the underlying Plotly axis still runs
bottom-to-top by default. This is purely a view transform — the data coordinates are
unchanged.
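Taken together, the view-only settings look roughly like this. A sketch of the layout dict only (the keys mirror Plotly's `layout.scene` schema; the `eye` position is an assumption, not the script's value):

```python
# This dict would be passed to plotly.graph_objects.Figure(layout=...)
# or fig.update_layout(...). None of it touches data coordinates.
scene_settings = {
    "scene": {
        "aspectmode": "data",  # preserve metric proportions; implies no convention
        "camera": {
            # View transform only: renders OpenCV +Y-down as "down" on screen.
            "up": {"x": 0, "y": -1, "z": 0},
            "eye": {"x": 1.5, "y": -1.0, "z": -1.5},  # assumed viewpoint
        },
        "xaxis": {"title": "X (Right)"},
        "yaxis": {"title": "Y (Down)"},
        "zaxis": {"title": "Z (Forward)"},
    }
}
```

Note the deliberate absence of `autorange: "reversed"`: the view is steered entirely through `camera.up` and `camera.eye`.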
---
## Historical Confusion Timeline
This section documents the sequence of changes that led to confusion, for future
reference. All commits are on `visualize_extrinsics.py`.
### Phase 1: Initial Plotly Rewrite (`7b9782a`)
- Rewrote the visualizer from matplotlib to Plotly with a `--diagnose` mode.
- Used Plotly defaults (Y-up). OpenCV data (Y-down) appeared "upside down."
- Frustums pointed in the correct direction in data space but *looked* inverted.
### Phase 2: Y-Up Enforcement (`a8d3751`)
- Attempted to fix by setting `camera.up = {y:1}` and using `autorange: "reversed"`.
- This made the view *look* correct for some angles but introduced axis-label confusion.
- The Y axis ticks ran in the opposite direction from the data, misleading users.
### Phase 3: Render-Space Option (`ab88a24`)
- Added `--render-space` flag to switch between "cv" and "opengl" rendering.
- The OpenGL path applied a basis-change matrix `diag(1, -1, -1)` to all data.
- This actually transformed the data, not just the view — a correct approach but
introduced a second code path that was hard to validate.
### Phase 4: Ground Plane & Origin Triad (`18e8142`, `57f0dff`)
- Added ground plane overlay and world-origin axis triad.
- These were drawn in the *data* frame, so they were correct in CV mode but
appeared wrong in OpenGL mode (the basis transform was applied inconsistently
to some elements but not others).
### Phase 5: `--world-basis` with Global Transform (`79f2ab0`)
- Renamed `--render-space` to `--world-basis` with `cv` and `opengl` options.
- Introduced `world_to_plot()` as a central transform function.
- In `opengl` mode: `world_to_plot` applied `diag(1, -1, -1)` to all points.
- **Problem**: The Plotly `camera.up` and axis labels were not always updated
consistently with the basis choice, leading to "it looks right from one angle
but wrong from another" reports.
### Phase 6: Restore After Removal (`6330e0e`)
- `--world-basis` was briefly removed, then restored due to user request.
- This back-and-forth left the README with stale documentation referencing both
the old and new interfaces.
### Phase 7: Final Cleanup — CV Only (`d07c244`)
- **Removed `--world-basis` entirely.**
- `world_to_plot()` became a no-op (identity function).
- Plotly camera set to `up = {x:0, y:-1, z:0}` to render Y-down correctly.
- Axis labels explicitly set to `X (Right)`, `Y (Down)`, `Z (Forward)`.
- Added `--origin-axes-scale` for independent control of the origin triad size.
- Removed `--diagnose`, `--pose-convention`, and `--render-space` flags.
**This is the current state.**
---
## Peculiar Behaviors Catalog
| # | Symptom | Root Cause | Fix / Explanation |
|---|---------|-----------|-------------------|
| 1 | Frustum appears to point in "-Z" direction | Plotly default camera has Y-up; OpenCV frustum points +Z which looks "backward" when viewed from a Y-up perspective | Set `camera.up = {y:-1}` (done in current code). The frustum is correct in data space. |
| 2 | Switching to `--world-basis opengl` makes some elements flip but not others | The `world_to_plot()` transform was applied to camera traces but not consistently to ground plane or origin triad | Removed `--world-basis`. Single convention eliminates partial-transform bugs. |
| 3 | `yaxis.autorange = "reversed"` makes ticks confusing | Plotly reverses the tick labels but the data coordinates stay the same. Users see "0 at top, -2 at bottom" which contradicts Y-down intuition. | Removed `autorange: reversed`. Use `camera.up = {y:-1}` instead, which rotates the view without mangling tick labels. |
| 4 | Camera positions don't match `inside_network.json` | `inside_network.json` stores poses in the ZED Fusion coordinate frame (gravity-aligned, Y-up). `calibrate_extrinsics.py` stores poses in the ArUco marker object's frame (Y-down if the marker board is horizontal). These are **different world frames**. | Not a bug. The two systems use different world origins and orientations. To compare, you must apply the alignment transform between the two frames. See FAQ below. |
| 5 | Origin triad too small or too large relative to cameras | Origin triad defaulted to `--scale` (camera axis size), which is often much smaller than the camera spread | Use `--origin-axes-scale 0.6` (or similar) independently of `--scale`. |
| 6 | Bird-eye view shows unexpected orientation | `--birdseye` uses an orthographic projection looking along the Y axis. In CV convention -Y is physically "up", so the eye sits on the -Y side of the scene (above the cameras). | Expected behavior. The bird-eye view is a top-down view of the X-Z plane, matching the `--birdseye` help text ("Top-down orthographic view"). |
---
## Known Pitfalls & Common Confusions
1. **Y-Up vs Y-Down**:
- **OpenCV/ArUco**: Y is **Down**. Gravity is `[0, 1, 0]`.
- **OpenGL/Fusion**: Y is **Up**. Gravity is `[0, -1, 0]`.
- **Pitfall**: Assuming "Y is vertical" is ambiguous. You must know the sign.
2. **Frame Mismatch vs Origin Mismatch**:
- Two pose sets can have the **same convention** (e.g., both Y-down) but different **world origins** (e.g., one at marker, one at camera 1).
- **Fix**: Use `compare_pose_sets.py` to align them rigidly before comparing errors.
3. **Visualizer "Inversion"**:
- The visualizer sets `camera.up = {y:-1}`. This makes the scene look "normal" (floor at bottom) even though the data is Y-down.
- **Pitfall**: Don't try to "fix" the data to match the view. The view is already compensating for the data.
---
## Canonical Rules Going Forward
1. **Single convention**: All visualization data is in OpenCV frame. No basis switching.
2. **`world_to_plot()` is identity**: It exists as a hook but performs no transform.
If a future need arises for basis conversion, it should be the *only* place it happens.
3. **Plotly camera settings are view-only**: Never use `autorange: reversed` or axis
negation to simulate a coordinate change. Use `camera.up` and `camera.eye` only.
4. **Poses are `world_from_cam`**: The 4×4 matrix maps camera-local points to world.
Translation = camera position in world. Rotation columns = camera axes in world.
5. **Colors are RGB = XYZ**: Red = X (right), Green = Y (down), Blue = Z (forward).
This applies to both per-camera axis triads and the world-origin triad.
6. **Units are meters**: Consistent with marker parquet geometry and calibration output.
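Rule 2's hook, as a minimal sketch (the real function lives in `visualize_extrinsics.py`):

```python
import numpy as np

def world_to_plot(points: np.ndarray) -> np.ndarray:
    """Identity hook: data is already in the OpenCV world frame.

    Deliberately a no-op. If a basis conversion is ever needed again,
    this is the single place it should happen (Canonical Rule 2).
    """
    return np.asarray(points)
```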
---
## Current CLI Behavior
### Available Flags
```
visualize_extrinsics.py
-i, --input TEXT [required] Path to JSON extrinsics file
-o, --output TEXT Output path (.html or .png)
--show Open interactive Plotly viewer
--scale FLOAT Camera axis length (default: 0.2)
--frustum-scale FLOAT Frustum depth (default: 0.5)
--fov FLOAT Horizontal FOV degrees (default: 60.0)
--birdseye Top-down orthographic view
--show-ground/--no-show-ground Ground plane toggle
--ground-y FLOAT Ground plane Y position (default: 0.0)
--ground-size FLOAT Ground plane side length (default: 8.0)
--show-origin-axes/--no-show-origin-axes Origin triad toggle (default: on)
--origin-axes-scale FLOAT Origin triad size (defaults to --scale)
--zed-configs TEXT ZED calibration file(s) for accurate frustums
--resolution [FHD1200|FHD|2K|HD|SVGA|VGA]
--eye [left|right]
```
### Removed Flags (Historical Only)
| Flag | Removed In | Reason |
|------|-----------|--------|
| `--world-basis` | `d07c244` | Caused partial/inconsistent transforms. Single CV convention is simpler. |
| `--pose-convention` | `d07c244` | Only `world_from_cam` is supported. No need for a flag. |
| `--diagnose` | `d07c244` | Diagnostic checks moved out of the visualizer. |
| `--render-space` | `79f2ab0` | Renamed to `--world-basis`, then removed. |
> **Note**: The README.md still contains stale references to `--world-basis`,
> `--pose-convention`, and `--diagnose` in the Troubleshooting section. These should
> be cleaned up to match the current CLI.
---
## Verification Playbook
### Quick Sanity Check
```bash
# Render with origin triad at 0.6m scale, save as PNG
uv run visualize_extrinsics.py \
--input output/e2e_refine_depth_smoke_rerun.json \
--output output/_final_opencv_origin_axes_scaled.png \
--origin-axes-scale 0.6
```
**Expected result**:
- Origin triad at (0,0,0) with Red→+X (right), Green→+Y (down), Blue→+Z (forward).
- Camera frustums pointing along each camera's local +Z (blue axis).
- Camera positions spread out in world space (not bunched at origin).
- Y values for cameras should be negative (cameras are above the marker board,
which is at Y≈0; "above" in CV convention means negative Y).
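The negative-Y expectation can be spot-checked directly on the JSON. A sketch assuming the flat schema shown earlier (the commented-out file path is the example's and may not exist on your machine):

```python
import json

def camera_centers(extrinsics: dict) -> dict:
    """Camera centers from the flat schema: 16 space-separated floats,
    row-major 4x4 world_from_cam, so indices 3, 7, 11 hold (tx, ty, tz)."""
    centers = {}
    for serial, entry in extrinsics.items():
        if serial == "_meta":  # metadata block carries no pose
            continue
        v = [float(x) for x in entry["pose"].split()]
        centers[serial] = (v[3], v[7], v[11])
    return centers

# with open("output/e2e_refine_depth_smoke_rerun.json") as f:
#     data = json.load(f)
# assert all(ty < 0 for (_, ty, _) in camera_centers(data).values())
```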
### Interactive Validation
```bash
# Open interactive HTML for rotation/inspection
uv run visualize_extrinsics.py \
--input output/e2e_refine_depth_smoke_rerun.json \
--show \
--origin-axes-scale 0.6
```
**What to check**:
1. **Rotate the view**: The origin triad should remain consistent — Red/Green/Blue
always point in the same data-space directions regardless of view angle.
2. **Hover over camera centers**: Tooltip shows the camera serial number.
3. **Frustum orientation**: Each frustum's open end faces away from the camera center
along the camera's blue (Z) axis.
### Bird-Eye Sanity Check
```bash
uv run visualize_extrinsics.py \
--input output/e2e_refine_depth_smoke_rerun.json \
--birdseye --show \
--origin-axes-scale 0.6
```
**Expected**: Top-down view of the X-Z plane. Cameras should form a recognizable
spatial layout matching the physical installation. The Red (X) axis points right,
Blue (Z) axis points "up" on screen (forward in world).
---
## FAQ
### "Why does an OpenGL-like view look strange?"
Because the data is in OpenCV convention (Y-down, Z-forward) and Plotly defaults to
Y-up. When you try to make Plotly act like an OpenGL viewer (Y-up, Z-backward), you
need to either:
1. **Transform all data** by applying `diag(1, -1, -1)` — correct but doubles the
code paths and creates consistency risks.
2. **Adjust the Plotly camera** — only changes the view, not the data. Axis labels
and hover values still show CV coordinates.
We chose option (2) with `camera.up = {y:-1}`: minimal code, no data transformation,
axis labels match the actual coordinate values. The trade-off is that the default
Plotly orbit feels "inverted" compared to a Y-up 3D viewer. This is expected.
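For reference, option (1) amounts to negating Y and Z of every point. A sketch of the rejected approach, not code from the tool:

```python
import numpy as np

# Basis change from OpenCV (+Y down, +Z forward) to an OpenGL-like
# frame (+Y up, +Z backward): negate the Y and Z components of each point.
CV_TO_GL = np.diag([1.0, -1.0, -1.0])

def cv_points_to_gl(points_cv: np.ndarray) -> np.ndarray:
    """points_cv: (N, 3) array of OpenCV-frame points."""
    return points_cv @ CV_TO_GL.T
```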
### "Does flipping axes in the view equal changing the world frame?"
**No.** Plotly's `camera.up`, `camera.eye`, and `autorange: reversed` are purely
view transforms. They change how the data is *displayed* but not what the coordinates
*mean*. The data always lives in the frame it was computed in (OpenCV/ArUco world frame).
If you set `camera.up = {y:1}` (Plotly default), the plot will render Y-up on screen,
but the data values are still Y-down. This creates a visual inversion that looks like
"the cameras are upside down" — they're not; the view is just flipped.
### "How do I compare with the C++ viewer and `inside_network.json`?"
The C++ ZED Fusion viewer and `inside_network.json` use a **different world frame**
than `calibrate_extrinsics.py`:
| Property | `calibrate_extrinsics.py` | ZED Fusion / `inside_network.json` |
|----------|--------------------------|-------------------------------------|
| World origin | ArUco marker object center | Gravity-aligned, first camera or user-defined |
| Y direction | Down (OpenCV) | Up (gravity-aligned) |
| Pose meaning | `T_world_from_cam` | `T_world_from_cam` (same semantics, different world) |
| Units | Meters | Meters |
To compare numerically:
1. The **relative** poses between cameras should match (up to the alignment transform).
2. The **absolute** positions will differ because the world origins are different.
3. To convert: apply the alignment rotation that maps the ArUco world frame to the
Fusion world frame. If `--auto-align` was used with a ground face, the ArUco frame
is partially aligned (ground = XZ plane), but the origin and yaw may still differ.
**Quick visual comparison**: Look at the *shape* of the camera arrangement (distances
and angles between cameras), not the absolute positions. If the shape matches, the
calibration is consistent.
### "Why are camera Y-positions negative?"
In OpenCV convention, +Y is down. Cameras mounted above the marker board (which defines
Y≈0) have negative Y values. This is correct. A camera at `Y = -1.3` is 1.3 meters
above the board.
### "What does `inside_network.json` camera 41831756's pose mean?"
```
Translation: [0.0, -1.175, 0.0]
Rotation: Identity
```
This camera defines the reference orientation (identity rotation) and sits 1.175m along
-Y. In the Fusion frame (Y-up), that places it 1.175m *below* the world origin. In
practice, this is the height offset of the camera relative to the Fusion coordinate
system's origin.
---
## Methodology: Comparing Different World Frames
Since `inside_network.json` (Fusion) and `calibrate_extrinsics.py` (ArUco) use different
world origins, raw coordinate comparison is meaningless. We validated consistency using
**rigid SE(3) alignment**:
1. **Match Serials**: Identify cameras present in both JSON files.
2. **Extract Centers**: Extract the translation column `t` from `T_world_from_cam` for
each camera.
* **Crucial**: Both systems use `T_world_from_cam`. It is **not** `cam_from_world`.
3. **Compute Alignment**: Solve for the rigid transform `(R_align, t_align)` that
minimizes the distance between the two point sets (Kabsch algorithm).
* Scale is fixed at 1.0 (both systems use meters).
4. **Apply & Compare**:
* Transform Fusion points: `P_aligned = R_align * P_fusion + t_align`.
* **Position Residual**: `|| P_aruco - P_aligned ||`.
* **Orientation Check**: Apply `R_align` to Fusion rotation matrices and compare
column vectors (Right/Down/Forward) with ArUco rotations.
5. **Up-Vector Verification**:
* Fusion uses Y-Up (gravity). ArUco uses Y-Down (image).
* After alignment, the transformed Fusion Y-axis should be approximately parallel
to the ArUco -Y axis (or +Y depending on the specific alignment solution found,
but they must be collinear with gravity).
**Result**: The overlay images in `output/` were generated using this aligned frame.
The low residuals (<2cm) confirm that the internal calibration is consistent, even
though the absolute world coordinates differ.
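Steps 2 through 4 can be sketched with the standard Kabsch solution (a minimal sketch; `compare_pose_sets.py` may differ in details such as weighting or reporting):

```python
import numpy as np

def kabsch_align(P_b: np.ndarray, P_a: np.ndarray):
    """Rigid (R, t), scale fixed at 1, minimizing ||(R @ P_b.T).T + t - P_a||.

    P_b, P_a: (N, 3) matched camera centers (N >= 3, non-collinear).
    """
    cb, ca = P_b.mean(axis=0), P_a.mean(axis=0)
    H = (P_b - cb).T @ (P_a - ca)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ca - R @ cb
    return R, t

# Position residuals after alignment (step 4):
# residuals = np.linalg.norm((R @ P_fusion.T).T + t - P_aruco, axis=1)
```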
---
## `compare_pose_sets.py` Input Formats
The `compare_pose_sets.py` tool is designed to be agnostic to the source of the JSON files.
It uses a **symmetric, heuristic parser** for both `--pose-a-json` and `--pose-b-json`.
### Accepted JSON Schemas
The parser automatically detects and handles either of these two structures for any input file:
**1. Flat Format (Standard Output)**
Used by `calibrate_extrinsics.py` and `refine_extrinsics.py`.
```json
{
"SERIAL_NUMBER": {
"pose": "r00 r01 r02 tx r10 r11 r12 ty r20 r21 r22 tz 0 0 0 1"
}
}
```
**2. Nested Fusion Format**
Used by ZED Fusion `inside_network.json` configuration files.
```json
{
"SERIAL_NUMBER": {
"FusionConfiguration": {
"pose": "r00 r01 r02 tx r10 r11 r12 ty r20 r21 r22 tz 0 0 0 1"
}
}
}
```
### Key Behaviors
1. **Interchangeability**: You can swap inputs. Comparing A (ArUco) vs B (Fusion) is valid,
as is A (Fusion) vs B (ArUco). The script aligns B to A.
2. **Pose Semantics**: All poses are interpreted as `T_world_from_cam` (camera-to-world).
The script does **not** invert matrices; it assumes the input strings are already in the
correct convention.
3. **Minimum Overlap**: The script requires at least **3 shared camera serials** between
the two files to compute a rigid alignment.
4. **Heuristic Parsing**: For each serial key, the parser looks for `FusionConfiguration.pose`
first, then falls back to `pose`.
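The fallback order in behavior 4 can be sketched as follows (hypothetical helper names; the script's internals may differ):

```python
import numpy as np

def extract_pose_string(entry: dict):
    """Heuristic per-serial parse: FusionConfiguration.pose first, then pose."""
    fusion = entry.get("FusionConfiguration")
    if isinstance(fusion, dict) and "pose" in fusion:
        return fusion["pose"]
    return entry.get("pose")

def pose_to_matrix(pose_str: str) -> np.ndarray:
    """16 space-separated floats, row-major, interpreted as T_world_from_cam.

    No inversion is applied (behavior 2): the string is trusted as-is.
    """
    vals = [float(v) for v in pose_str.split()]
    assert len(vals) == 16, "expected a 4x4 pose"
    return np.array(vals).reshape(4, 4)
```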
### Example: Swapped Inputs
Since the parser is symmetric, you can verify consistency by reversing the alignment direction:
```bash
# Align Fusion (B) to ArUco (A)
uv run compare_pose_sets.py \
--pose-a-json output/e2e_refine_depth.json \
--pose-b-json ../zed_settings/inside_network.json \
--report-json output/report_aruco_ref.json
# Align ArUco (B) to Fusion (A)
uv run compare_pose_sets.py \
--pose-a-json ../zed_settings/inside_network.json \
--pose-b-json output/e2e_refine_depth.json \
--report-json output/report_fusion_ref.json
```
---
## Appendix: Stale README References
The following lines in `py_workspace/README.md` reference removed flags and should be
updated:
- **Line ~104**: References `--pose-convention` (removed).
- **Line ~105**: References `--world-basis opengl` (removed).
- **Line ~116**: References `--diagnose` (removed).
These were left from earlier iterations and do not reflect the current CLI.