# Homework 01: Survey & Design of a 3D Multi-view Human Pose Estimation System

Survey existing multi-view single-person (multiple views of a single person) and multi-view multi-person human pose estimation systems/pipelines.

## Reference systems

### [EasyMocap](https://chingswy.github.io/easymocap-public-doc/)

- GitHub: [zju3dv/EasyMocap](https://github.com/zju3dv/EasyMocap)
- IK fitting: [easymocap/multistage/fitting.py](https://github.com/zju3dv/EasyMocap/blob/master/easymocap/multistage/fitting.py)
- Pose prior: [easymocap/multistage/gmm.py](https://github.com/zju3dv/EasyMocap/blob/master/easymocap/multistage/gmm.py)
- Tracking: [Fast and Robust Multi-Person 3D Pose Estimation from Multiple Views](https://arxiv.org/abs/1901.04111)
  - Implementation: [easymocap/assignment/track.py](https://github.com/zju3dv/EasyMocap/blob/master/easymocap/assignment/track.py#L199); note that the assignment step is implemented in C++ (matchSVT, see [matchSVT.hpp](https://github.com/zju3dv/EasyMocap/blob/master/library/pymatch/include/matchSVT.hpp))
  - Demo page: [mvmp.md](https://github.com/zju3dv/EasyMocap/blob/master/doc/mvmp.md) or [04 multiperson](https://chingswy.github.io/easymocap-public-doc/develop/04_multiperson.html)
- Visualization: [Open3D](https://www.open3d.org/) (see [apps/vis/vis_client.py](https://github.com/zju3dv/EasyMocap/blob/master/apps/vis/vis_client.py), a very plain, naive TCP socket client-server architecture)
  - See also [doc/realtime_visualization.md](https://github.com/zju3dv/EasyMocap/blob/master/doc/realtime_visualization.md)

> [!NOTE]
> It too has its obvious problems; finding them is left as an exercise for you.

### [FreeMocap](https://freemocap.org)

> Free Motion Capture for Everyone

- GitHub: [freemocap/freemocap](https://github.com/freemocap/freemocap)
- YouTube: [@jonmatthis](https://www.youtube.com/@jonmatthis) — apparently a university lecturer who shares lectures

A personal project, modest in scale.

### The [OpenSim](https://opensim.stanford.edu) family

A Stanford project; its architectural capability is probably beyond mine, but its goals differ from what I am trying to achieve.

#### [Pose2Sim](https://github.com/perfanalytics/pose2sim)

> Pose2Sim stands for "OpenPose to OpenSim", as it originally used OpenPose inputs (2D
keypoints coordinates) and led to an OpenSim result (full-body 3D joint angles).

- Camera synchronization
- Multi-person identification
- Robust triangulation
- 3D coordinates filtering
- Marker augmentation

#### [OpenCap](https://www.opencap.ai)

Seems to aim at the same thing as [Pose2Sim](https://github.com/perfanalytics/pose2sim), but ships a more mature pipeline that needs only two iPhones.

#### [Sports2D](https://github.com/davidpagnon/Sports2D)

Does something similar to [Splyza Motion](https://motion.products.splyza.com); single-view.

> [!IMPORTANT]
> Note that the motion must lie in the sagittal or frontal plane.
>
> Depth should be estimated only once; that is, motion along the viewing direction (toward or away from the camera) cannot be estimated.

> [!NOTE]
> Note that this only works with a single camera; otherwise there are no camera extrinsics, i.e. you do not know where the origin is. Single-camera implementations along the lines of [SMPLify](https://smplify.is.tue.mpg.de) only need to estimate depth.

## Notes

The devil is in the details. For every data stream, ask:

- Where does its source come from, and in what format (tensor? image? encoded video? byte stream?) is it transmitted?
- Over what transport (socket? shared memory? file IO?)?
- At what rate? (Equally spaced? If so, what sampling rate in Hz? If not, where do the timestamps come from, and against what time base?)
- What is the estimated bandwidth? (This in turn drives the choice of transport.)

There is necessarily a source and a sink, and the parts in between still need to be filled in (the more detail, the better). Is it a single process or multiple processes? (The answer should be obvious.) Can it be distributed? (A possible extension toward edge computing.)

The presentation layer is usually the sink, and rendering a 3D representation is usually an amount of work equivalent to a (game) engine.

You may use an LLM, but can you tell when it is spouting nonsense?

## Reference answer

An AI-generated architecture diagram. It does not represent my actual thinking at this moment, which changes every moment. Can you fill in the details of every edge?
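As one example of the detail a single edge hides, here is a minimal sketch of linear DLT triangulation — the step that turns associated per-view 2D keypoints into a 3D measurement. This is a toy illustration under my own assumptions (identity intrinsics, noise-free observations), not the implementation used by any of the systems above.

```python
import numpy as np

def triangulate_dlt(points_2d, projections):
    """Triangulate one 3D point from >= 2 views via linear DLT.

    points_2d:   list of (u, v) pixel observations, one per view
    projections: list of 3x4 camera projection matrices P = K [R|t]
    Returns the 3D point X (shape (3,)) minimizing the algebraic error.
    """
    rows = []
    for (u, v), P in zip(points_2d, projections):
        # Each view contributes two rows of the homogeneous system A X = 0.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.asarray(rows)
    # Solution: right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]

# Two toy cameras with identity intrinsics, 1 m apart along x.
K = np.eye(3)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 4.0])

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

obs = [project(P1, X_true), project(P2, X_true)]
X_est = triangulate_dlt(obs, [P1, P2])
print(np.allclose(X_est, X_true, atol=1e-6))  # → True
```

With real detections you would weight each row by the keypoint confidence and follow up with the reprojection refinement the diagram's `BA` box alludes to; with noise, the algebraic solution is only an initialization.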
(There are indeed many flaws.)

```mermaid
flowchart TD
    %% =========================
    %% Multi-view 2D stage
    %% =========================
    subgraph VIEWS["Per-view input (cameras 1..N)"]
        direction LR
        C1["Cam 1\n2D detections: 133×2 (+conf)"] --> T1["2D latest tracking cache\n(view 1)"]
        C2["Cam 2\n2D detections: 133×2 (+conf)"] --> T2["2D latest tracking cache\n(view 2)"]
        C3["Cam 3\n2D detections: 133×2 (+conf)"] --> T3["2D latest tracking cache\n(view 3)"]
        C4["Cam 4\n2D detections: 133×2 (+conf)"] --> T4["2D latest tracking cache\n(view 4)"]
    end

    %% =========================
    %% Cross-view association
    %% =========================
    subgraph ASSOC["Cross-view data association (epipolar)"]
        direction TB
        EPI["Epipolar constraint\n(Sampson / point-to-epiline)"]:::core
        CYCLE["Cycle consistency / view-graph pruning"]:::core
        GROUP["Assemble per-target multi-view observation set\n{view_id → 133×2}"]:::core
        EPI --> CYCLE --> GROUP
    end

    T1 --> EPI
    T2 --> EPI
    T3 --> EPI
    T4 --> EPI

    %% =========================
    %% Geometry / lifting
    %% =========================
    subgraph GEOM["3D measurement construction"]
        direction TB
        RT["Camera models\nK, [R|t], SO(3)/SE(3)"]:::meta
        DLT["DLT / triangulation (init)"]:::core
        NN["Optional NN lifting / completion"]:::core
        BA["Optional reprojection refinement\n(1–5 iters)"]:::core
        Y["3D measurement y(t)\nJ×3 positions (+quality / R / cov)"]:::out
        RT --> DLT --> NN --> BA --> Y
    end

    GROUP --> DLT

    %% =========================
    %% Tracking filter + lifecycle
    %% =========================
    subgraph FILTER["Tracking filter (per target)"]
        direction TB
        GATE["Gating\n(Mahalanobis / per-joint + global)"]:::core
        IMM["IMM (motion model bank)\n(CV/CA or low/med/high Q)"]:::core
        PRED["Predict\nΔt, self-propagate"]:::core
        UPD["Update\nKF (linear)\nstate: [p(3J), v(3J)]"]:::core
        MISS["Miss handling & track lifecycle\n(tentative → confirmed → deleted)"]:::meta
        GATE --> IMM --> PRED --> UPD --> MISS
    end

    Y --> GATE

    %% Optional inertial fusion
    IMU["IMU (optional)"]:::meta --> INERT["EKF/UKF branch (optional)\nwhen augmenting state with orientation"]:::meta --> IMM

    %% =========================
    %% IK + optional feedback
    %% =========================
    subgraph IKSTAGE["IK stage (constraint / anatomy)"]
        direction TB
        IK["IK optimization target\n(minimize joint position error,\nadd bone length / joint limits)"]:::core
        FB["Optional feedback to filter\npseudo-measurement z_IK with large R"]:::meta
        IK --> FB
    end

    UPD --> IK
    FB -.-> GATE

    %% =========================
    %% SMPL / mesh fitting
    %% =========================
    subgraph SMPLSTAGE["SMPL / SMPL-X fitting"]
        direction TB
        VP["VPoser / pose prior"]:::core
        SMPL["SMPL(θ, β, root)\nfit to joints / reprojection"]:::core
        JR["JR: Joint Regressor\nmesh → joints (loop closure)"]:::core
        OUT["Outputs\nmesh + joints + pose params"]:::out
        VP --> SMPL --> JR --> OUT
        JR -. residual / reproject .-> SMPL
    end

    IK --> SMPL

    classDef core fill:#0b1020,stroke:#5eead4,color:#e5e7eb,stroke-width:1.2px;
    classDef meta fill:#111827,stroke:#93c5fd,color:#e5e7eb,stroke-dasharray: 4 3;
    classDef out fill:#052e2b,stroke:#34d399,color:#ecfeff,stroke-width:1.4px;
```

If the diagram does not render correctly, see [fig/fig.svg](fig/fig.svg).
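To make the tracking-filter stage of the diagram concrete, below is a minimal per-joint sketch of the `Predict`, `Update`, and `Gating` boxes, assuming a constant-velocity model with state `[p, v]` and a triangulated 3D position as the measurement. The parameter names `q`, `r` and the chi-square gate are my own assumptions, not any project's code.

```python
import numpy as np

def kf_predict(x, P, dt, q=1.0):
    """Constant-velocity predict for one joint: x = [px,py,pz,vx,vy,vz]."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)
    # Discretized white-noise-acceleration process noise, strength q.
    Q = np.zeros((6, 6))
    Q[:3, :3] = (dt**4 / 4) * np.eye(3)
    Q[:3, 3:] = Q[3:, :3] = (dt**3 / 2) * np.eye(3)
    Q[3:, 3:] = dt**2 * np.eye(3)
    return F @ x, F @ P @ F.T + q * Q

def kf_update(x, P, z, r=1e-2):
    """Update with a triangulated 3D position measurement z (shape (3,))."""
    H = np.hstack([np.eye(3), np.zeros((3, 3))])   # observe position only
    R = r * np.eye(3)
    y = z - H @ x                                  # innovation
    S = H @ P @ H.T + R                            # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    return x + K @ y, (np.eye(6) - K @ H) @ P, y, S

# Gating: accept z only if the Mahalanobis distance of the innovation
# stays under a chi-square threshold (3 dof, ~7.81 at 95%).
x, P = np.zeros(6), np.eye(6)
x, P = kf_predict(x, P, dt=1 / 30)
z = np.array([0.1, 0.0, 2.0])
_, _, y, S = kf_update(x, P, z)
d2 = y @ np.linalg.inv(S) @ y
accepted = d2 < 7.81
```

A full tracker would run one such filter bank per target (the IMM box switches between several `q` levels), carry a per-joint `R` derived from triangulation quality, and feed the gate decision into the track lifecycle.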