Homework 01: Survey & Design of 3D Multi-view Human Pose Estimation System

Survey existing multi-view single-person and multi-view multi-person human pose estimation systems/pipelines.

Reference systems

EasyMocap

Note

It has its own obvious problems; identifying them is left as an exercise for students.

FreeMocap

Free Motion Capture for Everyone

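A personal project, small in scale.

OpenSim family

A Stanford University project. Its architects are probably more capable than I am, but its goals differ from what I am trying to achieve.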

Pose2Sim

Pose2Sim stands for "OpenPose to OpenSim", as it originally used OpenPose inputs (2D keypoints coordinates) and led to an OpenSim result (full-body 3D joint angles).

  • Camera synchronization
  • Multi-person identification
  • Robust triangulation
  • 3D coordinates filtering
  • Marker augmentation

OpenCap

It appears to attempt something similar to Pose2Sim, but the pipeline it provides works with just two iPhones and leans more commercial.

Note

My guess is that OpenCap chose the iPhone because ARKit is capable of self-calibration (speculation only).

Paper: OpenCap: Human movement dynamics from smartphone videos

Sports2D

Splyza Motion does something similar.

Single view

Important

note that the motion must lie in the sagittal or frontal plane.

Depth is presumably estimated only once; that is, motion toward or away from the camera along the line of sight cannot be estimated.

Note

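Note that this works with a single camera only; otherwise there are no camera extrinsics, in other words the origin is unknown. A single-camera implementation in the style of SMPLify only needs to estimate depth.

Attention

The devil is in the details. Drawing the architecture diagram does not mean you are done: the real problems all hide in the connecting lines.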

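For every data flow in the system, you must be able to state clearly:

  • Where the data comes from, where it goes, and which stages it passes through in between (do not just draw one "big box").
  • What exactly is transmitted: images, compressed video, tensors, keypoints, or a byte stream? How are the fields defined?
  • How it is transmitted: socket, shared memory, file, or message queue? Why did you choose it? How do you handle latency, throughput, and backpressure?
  • How fast it is transmitted: at a fixed rate in Hz, or out of order and with jitter? Who assigns the timestamps, and on what time base? How are multiple cameras aligned?
  • Roughly how large it is: write out the bandwidth estimate, because it in turn determines whether a given protocol is usable at all.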
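As an illustration of the last bullet, here is a minimal sketch of a back-of-the-envelope bandwidth estimate. It compares shipping uncompressed frames against shipping only 2D keypoints (133 per person with a confidence value, as in the diagram further below). The 1920×1080 resolution, 30 fps, four cameras, float32 encoding, and 16-byte header are all assumptions made for the example, and `StreamSpec` and the helper functions are hypothetical names.

```python
from dataclasses import dataclass

NUM_KEYPOINTS = 133  # matches the "133×2 (+conf)" detections in the diagram


@dataclass(frozen=True)
class StreamSpec:
    """Hypothetical description of one per-camera data stream."""
    name: str
    bytes_per_frame: int
    fps: float

    def bits_per_second(self) -> float:
        return self.bytes_per_frame * 8 * self.fps


def raw_rgb_stream(width: int, height: int, fps: float) -> StreamSpec:
    # Uncompressed 8-bit RGB: 3 bytes per pixel.
    return StreamSpec("raw RGB", width * height * 3, fps)


def keypoint_stream(fps: float, header_bytes: int = 16) -> StreamSpec:
    # Per keypoint: x, y, confidence as float32 (3 * 4 bytes), one person.
    # header_bytes is an assumed budget for camera id, frame index, timestamp.
    payload = NUM_KEYPOINTS * 3 * 4
    return StreamSpec("keypoints", header_bytes + payload, fps)


def total_mbps(spec: StreamSpec, num_cameras: int) -> float:
    """Aggregate rate over all cameras, in Mbit/s (10^6 bits per second)."""
    return spec.bits_per_second() * num_cameras / 1e6


if __name__ == "__main__":
    for spec in (raw_rgb_stream(1920, 1080, 30.0), keypoint_stream(30.0)):
        print(f"{spec.name}: {total_mbps(spec, 4):.2f} Mbit/s for 4 cameras")
```

Under these assumptions, raw frames from four cameras need roughly 6 Gbit/s, while keypoints for one person need about 1.5 Mbit/s. That gap is why the place where 2D detection runs largely dictates which transports are even feasible.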

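Also, do not forget: every system has a source and a sink. You still have to fill in every segment in between, in as much detail as possible.

One step further up: is your system single-process, multi-process on a single machine, or inherently distributed? (The answer is often obvious.) When you later move to edge computing, what can be pushed down to the edge, and what must stay at the center?

A final reminder: the presentation layer is usually the endpoint, but the work of 3D rendering/interaction often amounts to building half a (game) engine. Do not expect to "just draw it in passing".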

Note

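Using an LLM to assist with the survey and writing is allowed, but you must be able to point out the evidence, assumptions, and uncertainties behind its conclusions; in other words, you must demonstrate that you can recognize answers that "sound plausible but do not actually hold".

Reference answer

An AI-generated architecture diagram. It does not represent what I actually think right now, which changes every moment.

Can you fill in the details for every connecting line? (Rereading the diagram while writing this document, I found that it does contain many flaws, which I have no intention of fixing for now.)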

flowchart TD
  %% =========================
  %% Multi-view 2D stage
  %% =========================
  subgraph VIEWS["Per-view input (cameras 1..N)"]
    direction LR
    C1["Cam 1\n2D detections: 133×2 (+conf)"] --> T1["2D latest tracking cache\n(view 1)"]
    C2["Cam 2\n2D detections: 133×2 (+conf)"] --> T2["2D latest tracking cache\n(view 2)"]
    C3["Cam 3\n2D detections: 133×2 (+conf)"] --> T3["2D latest tracking cache\n(view 3)"]
    C4["Cam 4\n2D detections: 133×2 (+conf)"] --> T4["2D latest tracking cache\n(view 4)"]
  end

  %% =========================
  %% Cross-view association
  %% =========================
  subgraph ASSOC["Cross-view data association (epipolar)"]
    direction TB
    EPI["Epipolar constraint\n(Sampson / point-to-epiline)"]:::core
    CYCLE["Cycle consistency / view-graph pruning"]:::core
    GROUP["Assemble per-target multi-view observation set\n{view_id → 133×2}"]:::core
    EPI --> CYCLE --> GROUP
  end

  T1 --> EPI
  T2 --> EPI
  T3 --> EPI
  T4 --> EPI

  %% =========================
  %% Geometry / lifting
  %% =========================
  subgraph GEOM["3D measurement construction"]
    direction TB
    RT["Camera models\nK, [R|t], SO(3)/SE(3)"]:::meta
    DLT["DLT / triangulation (init)"]:::core
    NN["Optional NN lifting / completion"]:::core
    BA["Optional reprojection refinement\n(15 iters)"]:::core
    Y["3D measurement y(t)\nJ×3 positions (+quality / R / cov)"]:::out
    RT --> DLT --> NN --> BA --> Y
  end

  GROUP --> DLT

  %% =========================
  %% Tracking filter + lifecycle
  %% =========================
  subgraph FILTER["Tracking filter (per target)"]
    direction TB
    GATE["Gating\n(Mahalanobis / per-joint + global)"]:::core
    IMM["IMM (motion model bank)\n(CV/CA or low/med/high Q)"]:::core
    PRED["Predict\nΔt, self-propagate"]:::core
    UPD["Update\nKF (linear)\nstate: [p(3J), v(3J)]"]:::core
    MISS["Miss handling & track lifecycle\n(tentative → confirmed → deleted)"]:::meta

    GATE --> IMM --> PRED --> UPD --> MISS
  end

  Y --> GATE

  %% Optional inertial fusion
  IMU["IMU (optional)"]:::meta --> INERT["EKF/UKF branch (optional)\nwhen augmenting state with orientation"]:::meta --> IMM

  %% =========================
  %% IK + optional feedback
  %% =========================
  subgraph IKSTAGE["IK stage (constraint / anatomy)"]
    direction TB
    IK["IK optimization target\n(minimize joint position error,\nadd bone length / joint limits)"]:::core
    FB["Optional feedback to filter\npseudo-measurement z_IK with large R"]:::meta
    IK --> FB
  end

  UPD --> IK
  FB -.-> GATE

  %% =========================
  %% SMPL / mesh fitting
  %% =========================
  subgraph SMPLSTAGE["SMPL / SMPL-X fitting"]
    direction TB
    VP["VPoser / pose prior"]:::core
    SMPL["SMPL(θ, β, root)\nfit to joints / reprojection"]:::core
    JR["JR: Joint Regressor\nmesh → joints (loop closure)"]:::core
    OUT["Outputs\nmesh + joints + pose params"]:::out
    VP --> SMPL --> JR --> OUT
    JR -. residual / reproject .-> SMPL
  end

  IK --> SMPL

  classDef core fill:#0b1020,stroke:#5eead4,color:#e5e7eb,stroke-width:1.2px;
  classDef meta fill:#111827,stroke:#93c5fd,color:#e5e7eb,stroke-dasharray: 4 3;
  classDef out fill:#052e2b,stroke:#34d399,color:#ecfeff,stroke-width:1.4px;

If the diagram does not render correctly, see fig/fig.svg.
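To make one box of the diagram concrete, the "Epipolar constraint (Sampson / point-to-epiline)" step can be sketched as below. This is a minimal pure-Python illustration, not the method of any system listed above. It assumes points are already in normalized image coordinates (intrinsics removed) and that the relative pose follows the convention X2 = R·X1 + t; all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = List[List[float]]


def skew(t: Vec3) -> Mat3:
    """Cross-product matrix [t]x, so that [t]x v equals t × v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a: Mat3, v: Vec3) -> List[float]:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential_matrix(R: Mat3, t: Vec3) -> Mat3:
    """E = [t]x R for the assumed convention X2 = R X1 + t."""
    return matmul(skew(t), R)


def point_to_epiline(E: Mat3, x1: Sequence[float], x2: Sequence[float]) -> float:
    """Distance from x2 (view 2) to the epipolar line of x1 (view 1).

    Both points are (x, y) in normalized image coordinates.
    Undefined (division by zero) if x1 coincides with the epipole.
    """
    a, b, c = matvec(E, [x1[0], x1[1], 1.0])
    return abs(a * x2[0] + b * x2[1] + c) / math.hypot(a, b)


if __name__ == "__main__":
    # Assumed toy rig: view 2 is view 1 shifted 0.5 units along +x.
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    E = essential_matrix(identity, (-0.5, 0.0, 0.0))
    # The 3D point (0.2, 0.1, 2.0) projects to x1 in view 1 and to
    # (-0.15, 0.05) in view 2.
    x1 = (0.1, 0.05)
    print(point_to_epiline(E, x1, (-0.15, 0.05)))  # true correspondence
    print(point_to_epiline(E, x1, (-0.15, 0.15)))  # wrong candidate
```

A true correspondence lies on the epipolar line (distance near zero), while a wrong candidate does not. In a real association step this distance would presumably be accumulated over joints, weighted by confidence, and thresholded; that policy is left open here.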