Pose2Sim stands for "OpenPose to OpenSim": it originally took OpenPose output (2D keypoint coordinates) as input and produced an OpenSim result (full-body 3D joint angles).

Camera synchronization
Multi-person identification
Robust triangulation
3D coordinates filtering
Marker augmentation
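
The triangulation step listed above builds on linear triangulation. The following is a minimal sketch of plain DLT triangulation, not Pose2Sim's actual implementation; the function names, the optional confidence weighting, and the synthetic cameras are all assumptions made for illustration, and robustness measures such as outlier rejection are not shown.

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d, weights=None):
    """Linear (DLT) triangulation of one 3D point from N >= 2 views.

    proj_mats: sequence of 3x4 projection matrices P = K [R | t]
    points_2d: sequence of (u, v) pixel observations, one per view
    weights:   optional per-view weights (hypothetical use of 2D confidence)
    """
    if weights is None:
        weights = [1.0] * len(proj_mats)
    rows = []
    for P, (u, v), w in zip(proj_mats, points_2d, weights):
        P = np.asarray(P, dtype=float)
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 projection matrix, returning pixels."""
    x = np.asarray(P, dtype=float) @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two synthetic cameras (assumed values): identical intrinsics,
# the second camera is shifted 0.5 m along +x, so t = -R C = (-0.5, 0, 0).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 3.0])
X_hat = triangulate_dlt([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

With noise-free observations `X_hat` matches `X_true` up to floating-point error; with real detections the linear estimate is usually only an initialization for a reprojection-error refinement.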

OpenCap

It appears to aim at much the same thing as Pose2Sim, but the pipeline it provides works with just two iPhones and is more commercially oriented.

Note

My guess is that OpenCap chose the iPhone because its ARKit is capable of self-calibration (pure speculation).

Paper: OpenCap: Human movement dynamics from smartphone videos

Sports2D

Does something similar to Splyza Motion.

Single view.

Important

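Note that the motion must lie in the sagittal or frontal plane.

Depth is presumably estimated only once; in other words, motion toward or away from the camera along the line of sight cannot be estimated.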
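
To make the planar restriction concrete, here is a small sketch under a pinhole camera model. It is not Sports2D's code; the intrinsics, the frozen depth, and the function names are hypothetical values chosen for illustration.

```python
def project(x, y, z, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (meters) to pixels."""
    return fx * x / z + cx, fy * y / z + cy

def pixel_to_plane(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel to metric (x, y), assuming the point lies at depth_m."""
    return (u - cx) * depth_m / fx, (v - cy) * depth_m / fy

# Hypothetical intrinsics and a depth that is estimated once and then frozen.
INTRINSICS = dict(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
FROZEN_DEPTH_M = 4.0

# Case 1: the joint stays in the plane at 4 m, 0.5 m to the side.
u, v = project(0.5, 0.0, 4.0, **INTRINSICS)
x_in_plane, _ = pixel_to_plane(u, v, FROZEN_DEPTH_M, **INTRINSICS)

# Case 2: same lateral offset, but the joint has moved 1 m toward the camera.
u, v = project(0.5, 0.0, 3.0, **INTRINSICS)
x_out_of_plane, _ = pixel_to_plane(u, v, FROZEN_DEPTH_M, **INTRINSICS)
```

In this setup the in-plane case recovers 0.5 m, while the out-of-plane case reports about 0.67 m: the approach toward the camera is misread as lateral motion, because a single view with a fixed depth cannot tell the two apart.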

Note

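Note that this works with a single camera only; with more cameras there are no camera extrinsics, that is, the origin is unknown. A single-camera approach in the style of SMPLify only needs to estimate depth.

Caveats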

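The devil is in the details. Finishing the architecture diagram does not mean you are done: the real problems all hide in the connecting lines.

For every data flow in the system, you must be able to state clearly: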

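Where the data comes from, where it goes, and which stages it passes through in between (do not just draw one "big box").
What exactly is transmitted: images? compressed video? tensors? keypoints? a byte stream? How are the fields defined?
How it is transmitted: socket, shared memory, files, a message queue? Why did you choose it? How do you handle latency, throughput, and backpressure?
How fast it is transmitted: at a fixed rate in Hz, or with reordering and jitter? Who assigns the timestamps, and in which time base? How do you align multiple cameras?
Roughly how big it is: write out the bandwidth estimate, because it in turn determines whether you can use a given protocol at all.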
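
As one possible way to answer the "what is transmitted" and "how big" questions above, the sketch below defines a fixed binary layout for one person's 2D detections and derives a bandwidth estimate from it. The 133 keypoints with confidence come from the diagram in this document; the header fields, the 30 fps rate, the four cameras, and the raw 1080p comparison are assumptions for illustration only.

```python
import struct

NUM_KEYPOINTS = 133       # 133 x 2 (+conf), as in the diagram
FLOATS_PER_KEYPOINT = 3   # x, y, confidence

# Hypothetical header: camera_id (uint16), frame_index (uint32), timestamp_ns (int64).
# "<" means little-endian with no padding, so sizes are exactly predictable.
HEADER = struct.Struct("<HIq")
BODY = struct.Struct(f"<{NUM_KEYPOINTS * FLOATS_PER_KEYPOINT}f")
PACKET_SIZE = HEADER.size + BODY.size

def pack_detection(camera_id, frame_index, timestamp_ns, keypoints):
    """Serialize one person's keypoints [(x, y, conf), ...] as float32 values."""
    flat = [value for keypoint in keypoints for value in keypoint]
    return HEADER.pack(camera_id, frame_index, timestamp_ns) + BODY.pack(*flat)

def unpack_detection(payload):
    """Inverse of pack_detection."""
    camera_id, frame_index, timestamp_ns = HEADER.unpack_from(payload, 0)
    flat = BODY.unpack_from(payload, HEADER.size)
    keypoints = [tuple(flat[i:i + FLOATS_PER_KEYPOINT])
                 for i in range(0, len(flat), FLOATS_PER_KEYPOINT)]
    return camera_id, frame_index, timestamp_ns, keypoints

def bandwidth_mbit_per_s(bytes_per_frame, fps, num_cameras):
    """Aggregate bandwidth in Mbit/s (1 Mbit = 1e6 bits)."""
    return bytes_per_frame * 8 * fps * num_cameras / 1e6

# Round trip with synthetic keypoints (values chosen to be exact in float32).
keypoints = [(float(i), float(2 * i), 0.5) for i in range(NUM_KEYPOINTS)]
payload = pack_detection(2, 41, 1_700_000_000_000_000_000, keypoints)
decoded = unpack_detection(payload)

# Keypoints for one person vs. uncompressed 1080p RGB frames, 4 cameras at 30 fps.
keypoint_mbps = bandwidth_mbit_per_s(PACKET_SIZE, fps=30, num_cameras=4)
raw_video_mbps = bandwidth_mbit_per_s(1920 * 1080 * 3, fps=30, num_cameras=4)
```

Under these assumptions a packet is 1610 bytes, so keypoints cost about 1.55 Mbit/s in total, whereas raw 1080p RGB frames would need about 5972 Mbit/s. That gap of more than three orders of magnitude is the kind of figure that decides where pose detection has to run and which transport is viable.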

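Also, do not forget: every system necessarily has a source and a sink, and you still have to fill in every segment in between, the more detailed the better.

One level up: is your system a single process, multiple processes on one machine, or inherently distributed? (The answer is often obvious.) If you later move to edge computing, what can be pushed down to the edge, and what must stay in the center?

A final reminder: the presentation layer is usually the end point, but the work involved in 3D rendering and interaction is often equivalent to building half a (game) engine. Do not expect to "just draw it on the side".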

Note

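Note: using an LLM to assist with research and writing is allowed, but you must be able to point out the basis, assumptions, and uncertainties of its conclusions; in other words, you must show that you can recognize answers that "look plausible but do not actually hold".

Reference answer

An AI-generated architecture diagram; it does not represent what I actually think at this moment, which changes every moment.

Can you fill in the details of every connecting line? (Looking back at it while writing this document, I did find quite a few flaws in it, which I do not intend to fix for now.)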

flowchart TD
  %% =========================
  %% Multi-view 2D stage
  %% =========================
  subgraph VIEWS["Per-view input (cameras 1..N)"]
    direction LR
    C1["Cam 1\n2D detections: 133×2 (+conf)"] --> T1["2D latest tracking cache\n(view 1)"]
    C2["Cam 2\n2D detections: 133×2 (+conf)"] --> T2["2D latest tracking cache\n(view 2)"]
    C3["Cam 3\n2D detections: 133×2 (+conf)"] --> T3["2D latest tracking cache\n(view 3)"]
    C4["Cam 4\n2D detections: 133×2 (+conf)"] --> T4["2D latest tracking cache\n(view 4)"]
  end

  %% =========================
  %% Cross-view association
  %% =========================
  subgraph ASSOC["Cross-view data association (epipolar)"]
    direction TB
    EPI["Epipolar constraint\n(Sampson / point-to-epiline)"]:::core
    CYCLE["Cycle consistency / view-graph pruning"]:::core
    GROUP["Assemble per-target multi-view observation set\n{view_id → 133×2}"]:::core
    EPI --> CYCLE --> GROUP
  end

  T1 --> EPI
  T2 --> EPI
  T3 --> EPI
  T4 --> EPI

  %% =========================
  %% Geometry / lifting
  %% =========================
  subgraph GEOM["3D measurement construction"]
    direction TB
    RT["Camera models\nK, [R|t], SO(3)/SE(3)"]:::meta
    DLT["DLT / triangulation (init)"]:::core
    NN["Optional NN lifting / completion"]:::core
    BA["Optional reprojection refinement\n(1–5 iters)"]:::core
    Y["3D measurement y(t)\nJ×3 positions (+quality / R / cov)"]:::out
    RT --> DLT --> NN --> BA --> Y
  end

  GROUP --> DLT

  %% =========================
  %% Tracking filter + lifecycle
  %% =========================
  subgraph FILTER["Tracking filter (per target)"]
    direction TB
    GATE["Gating\n(Mahalanobis / per-joint + global)"]:::core
    IMM["IMM (motion model bank)\n(CV/CA or low/med/high Q)"]:::core
    PRED["Predict\nΔt, self-propagate"]:::core
    UPD["Update\nKF (linear)\nstate: [p(3J), v(3J)]"]:::core
    MISS["Miss handling & track lifecycle\n(tentative → confirmed → deleted)"]:::meta

    GATE --> IMM --> PRED --> UPD --> MISS
  end

  Y --> GATE

  %% Optional inertial fusion
  IMU["IMU (optional)"]:::meta --> INERT["EKF/UKF branch (optional)\nwhen augmenting state with orientation"]:::meta --> IMM

  %% =========================
  %% IK + optional feedback
  %% =========================
  subgraph IKSTAGE["IK stage (constraint / anatomy)"]
    direction TB
    IK["IK optimization target\n(minimize joint position error,\nadd bone length / joint limits)"]:::core
    FB["Optional feedback to filter\npseudo-measurement z_IK with large R"]:::meta
    IK --> FB
  end

  UPD --> IK
  FB -.-> GATE

  %% =========================
  %% SMPL / mesh fitting
  %% =========================
  subgraph SMPLSTAGE["SMPL / SMPL-X fitting"]
    direction TB
    VP["VPoser / pose prior"]:::core
    SMPL["SMPL(θ, β, root)\nfit to joints / reprojection"]:::core
    JR["JR: Joint Regressor\nmesh → joints (loop closure)"]:::core
    OUT["Outputs\nmesh + joints + pose params"]:::out
    VP --> SMPL --> JR --> OUT
    JR -. residual / reproject .-> SMPL
  end

  IK --> SMPL

  classDef core fill:#0b1020,stroke:#5eead4,color:#e5e7eb,stroke-width:1.2px;
  classDef meta fill:#111827,stroke:#93c5fd,color:#e5e7eb,stroke-dasharray: 4 3;
  classDef out fill:#052e2b,stroke:#34d399,color:#ecfeff,stroke-width:1.4px;

If the image does not preview correctly, see fig/fig.svg
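
The "Tracking filter" subgraph of the diagram (Gating, Predict, Update) can be illustrated with a heavily simplified sketch: a single constant-velocity model for one joint, instead of the IMM bank and the full [p(3J), v(3J)] state shown in the diagram. All function names and noise values below are assumptions for illustration; the gate threshold 11.345 is the 99% chi-square quantile for 3 degrees of freedom.

```python
import numpy as np

def cv_matrices(dt, q, r):
    """Constant-velocity model for one joint: state [p(3), v(3)], measurement p."""
    I3, Z3 = np.eye(3), np.zeros((3, 3))
    F = np.block([[I3, dt * I3], [Z3, I3]])
    H = np.hstack([I3, Z3])
    # Process noise for white-noise acceleration with intensity q.
    Q = q * np.block([[dt**4 / 4 * I3, dt**3 / 2 * I3],
                      [dt**3 / 2 * I3, dt**2 * I3]])
    R = r * I3
    return F, H, Q, R

def predict(x, P, F, Q):
    """Propagate state and covariance over one time step."""
    return F @ x, F @ P @ F.T + Q

def gate_and_update(x, P, z, H, R, gate=11.345):
    """Mahalanobis gate followed by a linear Kalman update.

    Returns (x, P, accepted); a rejected measurement leaves x and P unchanged.
    """
    innovation = z - H @ x
    S = H @ P @ H.T + R
    d2 = float(innovation @ np.linalg.solve(S, innovation))
    if d2 > gate:
        return x, P, False
    K = P @ H.T @ np.linalg.inv(S)
    x_new = x + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new, True

# Assumed values: 30 Hz updates, unit prior covariance, 0.1 m measurement std.
F, H, Q, R = cv_matrices(dt=1 / 30, q=1.0, r=0.01)
x_pred, P_pred = predict(np.zeros(6), np.eye(6), F, Q)

z_good = np.array([0.1, 0.0, 0.0])    # plausible triangulated joint position
z_bad = np.array([10.0, 0.0, 0.0])    # gross outlier, e.g. a wrong association

x_good, P_good, accepted_good = gate_and_update(x_pred, P_pred, z_good, H, R)
x_bad, P_bad, accepted_bad = gate_and_update(x_pred, P_pred, z_bad, H, R)
```

The plausible measurement passes the gate and pulls the position estimate toward it, while the outlier is rejected and the predicted state is kept. In a full tracker a rejected measurement would feed the miss handling and track lifecycle logic rather than simply being dropped.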


Homework 01: Survey & Design of 3D Multi-view Human Pose Estimation System

Reference systems

EasyMocap

FreeMocap

OpenSim family

Pose2Sim

OpenCap

Sports2D

Caveats

Reference answer
