2025-12-09 15:54:50 +08:00

Homework 01: Survey & Design of 3D Multi-view Human Pose Estimation System

Survey existing multi-view single-person and multi-view multi-person human pose estimation systems and pipelines.

Reference systems

EasyMocap

Note

It has obvious problems of its own; identifying them is left as an exercise for students.

FreeMocap

Free Motion Capture for Everyone

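A personal project, small in scale.

OpenSim family

A Stanford University project. Its authors are probably better at architecture than I am, but its goals differ from what I am trying to achieve.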

Pose2Sim

Pose2Sim stands for "OpenPose to OpenSim", as it originally used OpenPose inputs (2D keypoints coordinates) and led to an OpenSim result (full-body 3D joint angles).

  • Camera synchronization
  • Multi-person identification
  • Robust triangulation
  • 3D coordinates filtering
  • Marker augmentation

OpenCap

Appears to attempt something similar to Pose2Sim, but the pipeline it provides runs on just two iPhones and is more mature.

Sports2D

Does something similar to Splyza Motion.

Single view

Important

Note that the motion must lie in the sagittal or frontal plane.

Depth is presumably estimated only once; that is, motion toward or away from the camera along the line of sight cannot be estimated.

Note

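Note that this only works with a single camera; otherwise there are no camera extrinsics, in other words the origin is unknown. A single-camera implementation, like SMPLify-style approaches, only needs to estimate depth.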
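To make the limitation concrete, here is a minimal sketch of how a single depth estimate might be obtained under a pinhole camera model: if a subject of known height H (in meters) spans h pixels, then Z ≈ f·H/h, and in-plane pixel displacements convert to meters at that fixed depth. The function names, focal length, and heights are illustrative assumptions and are not taken from Sports2D or SMPLify.

```python
def depth_from_known_height(focal_px, height_m, height_px):
    """Pinhole estimate of subject distance: Z = f * H / h."""
    if height_px <= 0:
        raise ValueError("height_px must be positive")
    return focal_px * height_m / height_px


def pixels_to_meters(delta_px, focal_px, depth_m):
    """Convert an in-plane pixel displacement to meters at a fixed depth."""
    return delta_px * depth_m / focal_px


# Hypothetical numbers: 1000 px focal length, 1.75 m subject spanning 350 px.
Z = depth_from_known_height(1000.0, 1.75, 350.0)   # 5.0 m
step = pixels_to_meters(100.0, 1000.0, Z)          # 0.5 m
```

Because Z is held fixed after this one estimate, displacement along the optical axis is never recovered, which is why the motion has to stay in a plane parallel to the image.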

Things to note

The devil is in the details.

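For every data stream: where does its source come from? In what format is it transported (tensor? image? encoded video? byte stream?)? What is the transport protocol (socket? shared memory? file IO?)? What is the transmission rate (evenly spaced? If so, what is the sampling rate in Hz? If not, where do the timestamps come from, and against what time base?)? What is the estimated bandwidth? (This in turn determines the choice of transport protocol.)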
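A back-of-the-envelope bandwidth estimate shows why the payload format drives the choice of transport. The sketch below compares uncompressed frames with a keypoint stream of 133 keypoints per person (the count used in the diagram later in this document). The resolution, frame rate, camera count, and float32 encoding are assumptions chosen for illustration.

```python
def raw_video_bytes_per_s(width, height, channels, fps):
    """Uncompressed frames with 8-bit channels."""
    return width * height * channels * fps


def keypoint_bytes_per_s(num_keypoints, values_per_keypoint, bytes_per_value, fps):
    """One person's keypoints per frame, e.g. (x, y, confidence) as float32."""
    return num_keypoints * values_per_keypoint * bytes_per_value * fps


# Assumed setup: 4 cameras, 1920x1080 RGB at 30 fps, one person per view.
cameras = 4
raw = cameras * raw_video_bytes_per_s(1920, 1080, 3, 30)
kpt = cameras * keypoint_bytes_per_s(133, 3, 4, 30)

print(f"raw video : {raw / 1e6:.1f} MB/s")   # 746.5 MB/s
print(f"keypoints : {kpt / 1e3:.1f} kB/s")   # 191.5 kB/s
```

Under these assumptions the raw stream is several thousand times larger than the keypoint stream, so raw frames push toward shared memory or an encoded video format, while keypoints fit comfortably on an ordinary socket.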

There must be a source and a sink; the parts in between still need to be filled in (the more detail, the better).
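One way to force this level of detail is to write each data stream down as an explicit record and check that the records chain from source to sink. The following is a sketch only: `StreamSpec`, `is_connected`, and every stage name and value in `pipeline` are hypothetical and do not describe any of the reference systems.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StreamSpec:
    """Contract for one edge of the pipeline (all fields are illustrative)."""
    name: str
    producer: str
    consumer: str
    payload: str               # e.g. "encoded video", "float32 tensor"
    transport: str             # e.g. "socket", "shared memory", "file IO"
    rate_hz: Optional[float]   # None means unevenly spaced samples
    time_base: str             # clock that the timestamps refer to


def is_connected(specs: List[StreamSpec], source: str, sink: str) -> bool:
    """Check that the edges form a path from source to sink."""
    reachable = {source}
    changed = True
    while changed:
        changed = False
        for s in specs:
            if s.producer in reachable and s.consumer not in reachable:
                reachable.add(s.consumer)
                changed = True
    return sink in reachable


pipeline = [
    StreamSpec("frames", "camera", "detector",
               "encoded video", "socket", 30.0, "camera clock"),
    StreamSpec("keypoints", "detector", "triangulator",
               "float32 tensor", "shared memory", 30.0, "camera clock"),
    StreamSpec("poses", "triangulator", "renderer",
               "float32 tensor", "socket", None, "host monotonic clock"),
]

print(is_connected(pipeline, "camera", "renderer"))
```

A field that cannot be filled in, such as the time base of an unevenly sampled stream, is exactly the kind of gap the questions above are meant to expose.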

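A single process or multiple processes? (The answer should be obvious.) Can it be distributed? (A possible extension toward edge computing.)

The presentation layer is usually a sink, and rendering a 3D representation is often a workload equivalent to building a (game) engine.

You may use an LLM, but can you tell whether it is talking nonsense?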

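Reference answer

An AI-generated architecture diagram; it does not represent my actual thinking at this moment, which changes every moment.

Can you fill in the details for every connecting line (edge)? (It does contain many oversights.)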

flowchart TD
  %% =========================
  %% Multi-view 2D stage
  %% =========================
  subgraph VIEWS["Per-view input (cameras 1..N)"]
    direction LR
    C1["Cam 1\n2D detections: 133×2 (+conf)"] --> T1["2D latest tracking cache\n(view 1)"]
    C2["Cam 2\n2D detections: 133×2 (+conf)"] --> T2["2D latest tracking cache\n(view 2)"]
    C3["Cam 3\n2D detections: 133×2 (+conf)"] --> T3["2D latest tracking cache\n(view 3)"]
    C4["Cam 4\n2D detections: 133×2 (+conf)"] --> T4["2D latest tracking cache\n(view 4)"]
  end

  %% =========================
  %% Cross-view association
  %% =========================
  subgraph ASSOC["Cross-view data association (epipolar)"]
    direction TB
    EPI["Epipolar constraint\n(Sampson / point-to-epiline)"]:::core
    CYCLE["Cycle consistency / view-graph pruning"]:::core
    GROUP["Assemble per-target multi-view observation set\n{view_id → 133×2}"]:::core
    EPI --> CYCLE --> GROUP
  end

  T1 --> EPI
  T2 --> EPI
  T3 --> EPI
  T4 --> EPI

  %% =========================
  %% Geometry / lifting
  %% =========================
  subgraph GEOM["3D measurement construction"]
    direction TB
    RT["Camera models\nK, [R|t], SO(3)/SE(3)"]:::meta
    DLT["DLT / triangulation (init)"]:::core
    NN["Optional NN lifting / completion"]:::core
    BA["Optional reprojection refinement\n(15 iters)"]:::core
    Y["3D measurement y(t)\nJ×3 positions (+quality / R / cov)"]:::out
    RT --> DLT --> NN --> BA --> Y
  end

  GROUP --> DLT

  %% =========================
  %% Tracking filter + lifecycle
  %% =========================
  subgraph FILTER["Tracking filter (per target)"]
    direction TB
    GATE["Gating\n(Mahalanobis / per-joint + global)"]:::core
    IMM["IMM (motion model bank)\n(CV/CA or low/med/high Q)"]:::core
    PRED["Predict\nΔt, self-propagate"]:::core
    UPD["Update\nKF (linear)\nstate: [p(3J), v(3J)]"]:::core
    MISS["Miss handling & track lifecycle\n(tentative → confirmed → deleted)"]:::meta

    GATE --> IMM --> PRED --> UPD --> MISS
  end

  Y --> GATE

  %% Optional inertial fusion
  IMU["IMU (optional)"]:::meta --> INERT["EKF/UKF branch (optional)\nwhen augmenting state with orientation"]:::meta --> IMM

  %% =========================
  %% IK + optional feedback
  %% =========================
  subgraph IKSTAGE["IK stage (constraint / anatomy)"]
    direction TB
    IK["IK optimization target\n(minimize joint position error,\nadd bone length / joint limits)"]:::core
    FB["Optional feedback to filter\npseudo-measurement z_IK with large R"]:::meta
    IK --> FB
  end

  UPD --> IK
  FB -.-> GATE

  %% =========================
  %% SMPL / mesh fitting
  %% =========================
  subgraph SMPLSTAGE["SMPL / SMPL-X fitting"]
    direction TB
    VP["VPoser / pose prior"]:::core
    SMPL["SMPL(θ, β, root)\nfit to joints / reprojection"]:::core
    JR["JR: Joint Regressor\nmesh → joints (loop closure)"]:::core
    OUT["Outputs\nmesh + joints + pose params"]:::out
    VP --> SMPL --> JR --> OUT
    JR -. residual / reproject .-> SMPL
  end

  IK --> SMPL

  classDef core fill:#0b1020,stroke:#5eead4,color:#e5e7eb,stroke-width:1.2px;
  classDef meta fill:#111827,stroke:#93c5fd,color:#e5e7eb,stroke-dasharray: 4 3;
  classDef out fill:#052e2b,stroke:#34d399,color:#ecfeff,stroke-width:1.4px;

If the diagram does not render correctly, see fig/fig.svg
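The tracking-filter stage in the diagram names Mahalanobis gating and a linear KF with state [p, v]. As a minimal sketch of what that could look like, the code below reduces the state to a single coordinate with a constant-velocity model. The class name, noise variances, and gate threshold are assumptions, and the IMM model bank is left out. In this sketch prediction runs before gating, because the gate is evaluated against the predicted state and its innovation variance.

```python
class ConstantVelocityKF1D:
    """Linear Kalman filter on one coordinate with state [position, velocity]."""

    def __init__(self, p0, v0=0.0, var_p=1.0, var_v=10.0,
                 accel_var=1.0, meas_var=0.01):
        self.p, self.v = p0, v0
        # Entries of the symmetric 2x2 covariance matrix.
        self.P00, self.P01, self.P11 = var_p, 0.0, var_v
        self.q = accel_var      # process noise: white acceleration variance
        self.r = meas_var       # measurement noise variance

    def predict(self, dt):
        """Propagate state and covariance: x <- F x, P <- F P F^T + Q."""
        self.p += dt * self.v
        P00, P01, P11 = self.P00, self.P01, self.P11
        self.P00 = P00 + 2.0 * dt * P01 + dt * dt * P11 + self.q * dt**4 / 4.0
        self.P01 = P01 + dt * P11 + self.q * dt**3 / 2.0
        self.P11 = P11 + self.q * dt**2

    def update(self, z, gate=6.63):
        """Gate on squared Mahalanobis distance, then apply the KF update.

        Returns False (state unchanged) when the measurement is rejected.
        The default gate is roughly the 99% chi-square quantile for 1 DoF.
        """
        y = z - self.p                  # innovation
        S = self.P00 + self.r           # innovation variance
        if y * y / S > gate:
            return False
        K0, K1 = self.P00 / S, self.P01 / S   # Kalman gain
        self.p += K0 * y
        self.v += K1 * y
        P00, P01, P11 = self.P00, self.P01, self.P11
        self.P00 = (1.0 - K0) * P00
        self.P01 = (1.0 - K0) * P01
        self.P11 = P11 - K1 * P01
        return True


# Synthetic track: true position 1.0 + 2.0 * t, sampled at 10 Hz.
dt = 0.1
kf = ConstantVelocityKF1D(p0=1.0)
for k in range(1, 51):
    kf.predict(dt)
    kf.update(1.0 + 2.0 * k * dt)

kf.predict(dt)
outlier_accepted = kf.update(1.0 + 2.0 * 51 * dt + 5.0)  # gross outlier
```

A full-body version would stack 3J coordinates into one state, and a rejected measurement would feed the miss-handling logic instead of being silently dropped.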
