diff --git a/note.md b/note.md
index c821d20..d430191 100644
--- a/note.md
+++ b/note.md
@@ -107,6 +107,56 @@ The actual sequence is:
 
 Thank you for catching this! It's crucial to understand the correct order of operations, especially when dealing with coordinate system transformations.
 
+Let me break this down more clearly:
+
+## Why do we need these conversions?
+
+1. **World Coordinate Conversion (Z-up to Y-up)**:
+   - Your original data uses a Z-up coordinate system (Z is "up" in the real world)
+   - Three.js uses a Y-up coordinate system (Y is "up" in Three.js)
+   - We need to convert between these two world coordinate systems
+
+2. **Camera Coordinate Conversion (OpenCV to OpenGL)**:
+   - Camera coordinate systems define how the camera "sees" the world
+   - OpenCV uses a right-handed camera frame with +X right, +Y down, and +Z pointing forward from the camera
+   - OpenGL/Three.js uses a right-handed camera frame with +X right, +Y up, and the camera looking down -Z
+   - Converting between these camera conventions means flipping the Y and Z axes
+
+## What's the difference?
+
+- **World conversion** (Z-up to Y-up) affects how we represent points in the world
+- **Camera conversion** (OpenCV to OpenGL) affects how the camera interprets what it sees
+
+These are completely separate transforms that operate on different coordinate spaces:
+- The world conversion acts on world-space points, before the extrinsic matrix is applied
+- The camera conversion acts on camera-space points, after the extrinsic matrix is applied
+
+## Why this specific order?
+
+The order matters because we're dealing with a transformation pipeline:
+
+1. First, we want to convert the world from Z-up to Y-up (`worldCvt`)
+   - This makes the world match Three.js's expectations, and it fixes the frame that `Rt` in the next step must accept as input
+
+2. Then, apply the camera's extrinsic matrix (`Rt`)
+   - This transforms world points into the camera's coordinate system
+   - At this point, we're in OpenCV camera space
+
+3. Finally, convert from OpenCV to OpenGL camera conventions (`cameraCvt`)
+   - This converts the camera space to match Three.js's expectations
+
+If we changed the order:
+- Applying `cameraCvt` before `Rt` would flip the axes of points that are still in world space, where the OpenCV/OpenGL camera conventions do not apply
+- Applying `worldCvt` after `Rt` would apply an up-axis change to points that are already in camera space, where "up" is defined by the camera rather than the world
+
+Think of it as a pipeline:
+1. Fix the world coordinates (world conversion)
+2. View the world through the camera (camera extrinsic)
+3. Adjust how the camera interprets what it sees (camera conversion)
+
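+Because matrix multiplication is associative, the three matrices can be premultiplied into a single matrix. It is not commutative, though, so the order must be preserved: with the column-vector convention that Three.js uses, the rightmost matrix is applied first, giving `cameraCvt * Rt * worldCvt`.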
+
+
 ---
 
 [Matrix4.multiply](https://threejs.org/docs/#api/en/math/Matrix4.multiply)