I'll write down the transformation sequence using LaTeX notation. Let me break down the complete operation:

For a point P in the original world space, the final transformation can be written as:


P_{final} = (C_{GL} \cdot E \cdot W_{Y})^{-1} \cdot P

Where:

  • E is your original extrinsic matrix (world-to-camera transform)
  • W_{Y} is the Z-up to Y-up world conversion matrix
  • C_{GL} is the OpenCV to OpenGL camera conversion matrix
  • ^{-1} denotes matrix inversion

Breaking down each matrix:


W_{Y} = \begin{bmatrix} 
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}

C_{GL} = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
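
As a quick sanity check of the two matrices above, here is a minimal plain-JavaScript sketch that applies them to a few axis vectors. The `applyToPoint` helper and the row-major nested-array layout are assumptions made for illustration; they are not part of Three.js, whose `Matrix4` stores its elements column-major.

```javascript
// Row-major 4x4 matrices as nested arrays (illustrative layout only).
const W_Y = [
  [1, 0, 0, 0],
  [0, 0, 1, 0],
  [0, -1, 0, 0],
  [0, 0, 0, 1],
];
const C_GL = [
  [1, 0, 0, 0],
  [0, -1, 0, 0],
  [0, 0, -1, 0],
  [0, 0, 0, 1],
];

// Hypothetical helper: multiply a 4x4 matrix by a homogeneous column vector.
function applyToPoint(m, p) {
  return m.map((row) => row.reduce((sum, v, i) => sum + v * p[i], 0));
}

// The Z-up world's "up" axis (+Z) should land on +Y.
const zUpAxis = applyToPoint(W_Y, [0, 0, 1, 1]);
// OpenCV's forward axis (+Z) should land on OpenGL's forward axis (-Z).
const cvForward = applyToPoint(C_GL, [0, 0, 1, 1]);
// OpenCV's downward axis (+Y) should land on -Y, since OpenGL's +Y points up.
const cvDown = applyToPoint(C_GL, [0, 1, 0, 1]);

console.log(zUpAxis, cvForward, cvDown);
```

W_Y sends +Z to +Y, and C_GL flips the Y and Z axes while leaving X alone, which is what the names of the two matrices suggest.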

The order of operations (from right to left) is:

  1. Convert world from Z-up to Y-up (W_{Y})
  2. Apply camera transform (E)
  3. Convert camera space to OpenGL convention (C_{GL})
  4. Invert the whole transformation (^{-1})

This matches our code where we write:

finalTransform
  .multiply(cameraConversion)    // C_GL
  .multiply(worldToCamera)       // E
  .multiply(worldZupToYup)       // W_Y
  .invert()                      // ^{-1}

Let me explain the reasoning behind this order:

  1. First, W_Y (Z-up to Y-up):

    • We need to convert the world coordinates BEFORE applying the camera transform
    • Because your original extrinsic matrix E was defined in Z-up world coordinates
    • If we didn't do this first, the camera transform would be applied in the wrong coordinate system
  2. Then, E (your extrinsic matrix):

    • This transforms points from the (now Y-up) world space into camera space
    • The camera space at this point is still in OpenCV convention
  3. Finally, C_{GL} (OpenCV to OpenGL):

    • This converts the camera space from OpenCV convention to OpenGL convention
    • It must be applied AFTER the camera transform because it's converting the camera's view space
    • If we applied it before, we'd be incorrectly modifying the world coordinates

Think about it like this:

  1. We first need to fix the world coordinate system to match Three.js (W_Y)
  2. Then we can apply your camera's view of this world (E)
  3. Finally, we adapt how the camera "sees" to match OpenGL conventions (C_{GL})

If we did it in a different order:

  • Applying C_{GL} before E would incorrectly transform the world coordinates instead of the camera space
  • Applying W_Y after E would transform the camera space instead of the world coordinates
  • The transformations would effectively be applied to the wrong spaces
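
That the order matters can be checked numerically. The sketch below is plain JavaScript; the `matMul` helper and the example extrinsic `E` (identity rotation, translation (1, 2, 3)) are made up for illustration and do not come from your data.

```javascript
// Hypothetical helper: product of two row-major 4x4 matrices.
function matMul(a, b) {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0))
  );
}

const C_GL = [
  [1, 0, 0, 0],
  [0, -1, 0, 0],
  [0, 0, -1, 0],
  [0, 0, 0, 1],
];
// Made-up extrinsic: identity rotation, translation (1, 2, 3).
const E = [
  [1, 0, 0, 1],
  [0, 1, 0, 2],
  [0, 0, 1, 3],
  [0, 0, 0, 1],
];

const cameraSideFlip = matMul(C_GL, E); // C_GL acts after E (on camera space)
const worldSideFlip = matMul(E, C_GL);  // C_GL acts before E (on world space)

// Read the translation column out of each product.
const translationOf = (m) => [m[0][3], m[1][3], m[2][3]];
console.log(translationOf(cameraSideFlip), translationOf(worldSideFlip));
```

The two products have different translation columns, so swapping the factors gives a different transform.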

The inversion at the end (^{-1}) is needed because a Three.js camera, like any Object3D, stores its pose as a camera-to-world transform, whereas an extrinsic matrix is a world-to-camera transform.
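
To make the inversion concrete, here is a worked expansion. It assumes E is a rigid transform with rotation R and translation t; both symbols are introduced here for illustration. Inverting a product reverses the order of its factors, W_Y is orthogonal, and C_{GL} is its own inverse:

```latex
(C_{GL} \cdot E \cdot W_{Y})^{-1} = W_{Y}^{-1} \cdot E^{-1} \cdot C_{GL}^{-1},
\qquad
W_{Y}^{-1} = W_{Y}^{\top}, \quad C_{GL}^{-1} = C_{GL}

E = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}
\;\Rightarrow\;
E^{-1} = \begin{bmatrix} R^{\top} & -R^{\top} t \\ 0 & 1 \end{bmatrix}
```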

Ah yes, you're absolutely right! I made a mistake in my explanation. In matrix multiplication, the operations are indeed applied from right to left.

When we write:


P_{final} = (C_{GL} \cdot E \cdot W_{Y})^{-1} \cdot P

The actual order of operations on point P (before inversion) would be:

  1. W_Y \cdot P: Convert from Z-up to Y-up world coordinates
  2. E \cdot (W_Y \cdot P): Apply camera transform
  3. C_{GL} \cdot (E \cdot (W_Y \cdot P)): Convert to OpenGL camera convention
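
The three steps above can be checked against the composed matrix. In this plain-JavaScript sketch the helpers `matMul` and `applyToPoint`, the example extrinsic `E` (identity rotation, translation (1, 2, 3)), and the point `P` are all made up for illustration.

```javascript
// Hypothetical helpers on row-major nested arrays (not the Three.js API).
function matMul(a, b) {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0))
  );
}
function applyToPoint(m, p) {
  return m.map((row) => row.reduce((sum, v, i) => sum + v * p[i], 0));
}

const W_Y = [
  [1, 0, 0, 0],
  [0, 0, 1, 0],
  [0, -1, 0, 0],
  [0, 0, 0, 1],
];
const C_GL = [
  [1, 0, 0, 0],
  [0, -1, 0, 0],
  [0, 0, -1, 0],
  [0, 0, 0, 1],
];
// Made-up extrinsic: identity rotation, translation (1, 2, 3).
const E = [
  [1, 0, 0, 1],
  [0, 1, 0, 2],
  [0, 0, 1, 3],
  [0, 0, 0, 1],
];

const P = [4, 5, 6, 1];

// Apply the matrices one at a time, rightmost factor first.
const stepwise = applyToPoint(C_GL, applyToPoint(E, applyToPoint(W_Y, P)));
// Build C_GL * E * W_Y once, then apply it to P.
const composed = applyToPoint(matMul(matMul(C_GL, E), W_Y), P);

console.log(stepwise, composed); // both are [5, -8, 2, 1]
```

Both routes give the same point, which is why the rightmost factor of the product is the first one to act on P.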

In our code:

finalTransform
  .multiply(cameraConversion)    // C_GL
  .multiply(worldToCamera)       // E
  .multiply(worldZupToYup)       // W_Y

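Three.js's multiply() method post-multiplies: a.multiply(b) sets a = a \cdot b (premultiply() is the method that sets a = b \cdot a). Assuming finalTransform starts as the identity, the chained calls therefore build C_{GL} \cdot E \cdot W_Y in exactly the order written, which is why our code worked. What my explanation blurred is that the order of the multiply() calls is the reverse of the order in which the matrices act on a point.

The order of the multiply() calls is:

  1. C_{GL}: Convert camera space to OpenGL convention
  2. E: Transform from world to camera
  3. W_Y: Convert world from Z-up to Y-up

The order in which they act on P is the reverse: W_Y first, then E, then C_{GL}.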
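
A small stand-in class makes the multiplication order concrete. It is not Three.js. It is a hypothetical `Mat4` that assumes the semantics given in the Three.js documentation: `multiply(m)` post-multiplies (this = this · m) and `premultiply(m)` pre-multiplies (this = m · this). The example extrinsic `E` is made up.

```javascript
// Hypothetical stand-in for THREE.Matrix4, using row-major nested arrays.
class Mat4 {
  constructor() {
    this.rows = [
      [1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 0],
      [0, 0, 0, 1],
    ];
  }
  static from(rows) {
    const m = new Mat4();
    m.rows = rows.map((r) => r.slice());
    return m;
  }
  static product(a, b) {
    return a.map((row) =>
      b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0))
    );
  }
  // Post-multiply: this = this * m
  multiply(m) {
    this.rows = Mat4.product(this.rows, m.rows);
    return this;
  }
  // Pre-multiply: this = m * this
  premultiply(m) {
    this.rows = Mat4.product(m.rows, this.rows);
    return this;
  }
}

const W_Y = Mat4.from([[1, 0, 0, 0], [0, 0, 1, 0], [0, -1, 0, 0], [0, 0, 0, 1]]);
const C_GL = Mat4.from([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]]);
// Made-up extrinsic: identity rotation, translation (1, 2, 3).
const E = Mat4.from([[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3], [0, 0, 0, 1]]);

// Chained multiply() builds C_GL * E * W_Y, matching the written formula.
const post = new Mat4().multiply(C_GL).multiply(E).multiply(W_Y);
// The same chain with premultiply() builds W_Y * E * C_GL instead.
const pre = new Mat4().premultiply(C_GL).premultiply(E).premultiply(W_Y);

const expectedPost = Mat4.product(Mat4.product(C_GL.rows, E.rows), W_Y.rows);
const expectedPre = Mat4.product(Mat4.product(W_Y.rows, E.rows), C_GL.rows);
console.log(post.rows, pre.rows);
```

With these semantics, a chain of `multiply()` calls lists the factors from left to right as they appear in the formula, so the last call in the chain supplies the matrix that acts on a point first.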

Thank you for catching this! It's crucial to understand the correct order of operations, especially when dealing with coordinate system transformations.

