crosstyan 061d5b4592 build: add reproducible mmdet patch script
Add an idempotent patch-mmdet-version-gate entrypoint that rewrites the installed mmdet mmcv compatibility assert into a warning so the local rebuilt mmcv wheel can be reused after uv sync.

Cover the source rewrite with focused tests and document the post-sync command in the README so the local environment patch is reproducible instead of being a one-off manual edit inside .venv.
2026-03-26 16:35:56 +08:00


# pose_tracking_exp
Offline multiview body tracking experiments built around:
- `RapidPoseTriangulation` for geometric birth proposals
- a typed replay format for recorded per-camera detections
- a recursive active/lost tracker with fixed bone lengths
## Install
```bash
uv sync --group dev
```
## Run
```bash
uv run pose-tracking-exp run_tracking data/scene.json data/replay.jsonl
```
`scene.json` may declare camera extrinsics in either format:
- `opencv_world_to_camera`: OpenCV `solvePnP` / `cv2.projectPoints` convention. This is the default.
- `rpt_camera_pose`: camera pose in world coordinates, which is what `RapidPoseTriangulation` expects internally.
The loader normalizes both to OpenCV extrinsics for reprojection and converts to RPT pose only when building the triangulation config.
If you already have an older hand-authored scene file that stored RPT camera pose directly, set `extrinsic_format` explicitly to `rpt_camera_pose`.
## Convert cvmmap Pose Payload Records
```bash
uv run pose-tracking-exp convert-cvmmap-pose input.jsonl output.jsonl
```
The current cvmmap `.pose` wire format is fixed to `COCO-WholeBody-133` keypoints.
That is a transport compatibility constraint, not a tracker limitation: the tracker-side normalizer accepts both `coco17` and `coco_wholebody133`, because the first 17 body joints share the standard COCO ordering.
References:
- https://mmpose.readthedocs.io/en/latest/dataset_zoo/2d_wholebody_keypoint.html
- https://github.com/jin-s13/COCO-WholeBody
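The keypoint normalization described above can be sketched as follows. This is a minimal illustration, not the repository's normalizer: the function name `to_coco17_body` and the `[x, y, score]` row layout are assumptions, and it relies only on the stated fact that the first 17 whole-body joints follow the standard COCO body ordering.

```python
from typing import Sequence

COCO17_COUNT = 17
WHOLEBODY133_COUNT = 133


def to_coco17_body(keypoints: Sequence[Sequence[float]]) -> list[list[float]]:
    """Return the 17 body joints from a coco17 or coco_wholebody133 skeleton.

    Hypothetical helper: each row is assumed to be [x, y, score].
    """
    count = len(keypoints)
    if count not in (COCO17_COUNT, WHOLEBODY133_COUNT):
        raise ValueError(f"expected 17 or 133 keypoints, got {count}")
    # The body joints come first in both layouts, so a prefix slice is enough.
    return [list(row) for row in keypoints[:COCO17_COUNT]]


# Synthetic whole-body skeleton: joint i sits at (i, i) with score 1.0.
wholebody = [[float(i), float(i), 1.0] for i in range(WHOLEBODY133_COUNT)]
body = to_coco17_body(wholebody)
```

Rejecting any other keypoint count keeps a mislabeled payload from being silently truncated.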
## Run Detection
```bash
uv sync --group dev --group detection
uv run patch-mmdet-version-gate
uv run pose-tracking-exp run_detection --config detection.toml camera0 camera1
uv run pose-tracking-exp run_detection --source video --output-dir data/detections --config detection.toml cam0=/data/cam0.mp4 cam1=/data/cam1.mp4
```
The embedded 2D detection module is organized as a swappable shim:
- `FrameSource`: where images come from
- `PoseShim`: object detection + pose estimation backend
- `PoseSink`: where structured detections are published or stored
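One way to picture the three roles listed above is as structural interfaces. The sketch below is an assumption about shape only: the role names come from this README, but the method names (`frames`, `infer`, `publish`), the `Frame` and `Detection` records, and `run_pipeline` are hypothetical and may not match the actual module.

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol


@dataclass
class Frame:
    camera: str
    index: int
    image: object = None  # placeholder for an image array


@dataclass
class Detection:
    camera: str
    frame_index: int
    keypoints: list = field(default_factory=list)


class FrameSource(Protocol):
    def frames(self) -> Iterator[Frame]: ...


class PoseShim(Protocol):
    def infer(self, frame: Frame) -> list[Detection]: ...


class PoseSink(Protocol):
    def publish(self, detections: list[Detection]) -> None: ...


def run_pipeline(source: FrameSource, shim: PoseShim, sink: PoseSink) -> int:
    """Push every frame through the backend and hand the result to the sink."""
    processed = 0
    for frame in source.frames():
        sink.publish(shim.infer(frame))
        processed += 1
    return processed


# Stand-in implementations, useful for tests without a GPU backend.
class ListSource:
    def __init__(self, items: list[Frame]) -> None:
        self.items = items

    def frames(self) -> Iterator[Frame]:
        return iter(self.items)


class EmptyShim:
    def infer(self, frame: Frame) -> list[Detection]:
        return [Detection(frame.camera, frame.index)]


class MemorySink:
    def __init__(self) -> None:
        self.batches: list[list[Detection]] = []

    def publish(self, detections: list[Detection]) -> None:
        self.batches.append(detections)


sink = MemorySink()
processed = run_pipeline(ListSource([Frame("camera0", 0), Frame("camera0", 1)]), EmptyShim(), sink)
```

Because the interfaces are structural, a replacement backend only needs matching methods and does not have to inherit from anything.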
The default backend is `yolo_rtmpose`, and the heavy runtime dependencies live in the optional `detection` dependency group.
Checkpoint paths are explicit config fields; the code does not hardcode local checkpoint locations.
The only inferred path is the MMPose config path, which is resolved relative to the installed `mmpose` package when `pose_config_path` is omitted.
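The fallback described above could look roughly like the sketch below. The helper names and the `default_relative` argument are hypothetical; the actual location of the config inside the installed `mmpose` package is not specified here, so it stays a parameter.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional


def resolve_in_package(package: str, relative: str) -> Path:
    """Resolve a path relative to the directory of an installed package."""
    spec = find_spec(package)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"package {package!r} is not installed")
    # spec.origin points at the package's __init__.py.
    return Path(spec.origin).parent / relative


def resolve_pose_config(explicit: Optional[str], default_relative: str) -> Path:
    """Prefer the explicit config field; otherwise look inside mmpose."""
    if explicit is not None:
        return Path(explicit)
    return resolve_in_package("mmpose", default_relative)
```

An explicit `pose_config_path` always wins, so the package lookup only runs when the field is omitted.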
For offline video runs, the default sink is parquet and writes one `*_detected.parquet` file per source. `run_tracking` can consume that directory directly as replay input.
`uv run patch-mmdet-version-gate` is an idempotent local-environment patch for the current `mmdet` compatibility assert against the rebuilt `mmcv` wheel. Re-run it after `uv sync` if the environment is recreated.
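To illustrate what "idempotent" means for a source rewrite like this, here is a simplified sketch. It is not the shipped script: the marker text, the regular expression, and the sample source are assumptions, and it only handles a single-line `assert` whose condition contains no commas, whereas the real `mmdet` check may span several lines.

```python
import re

# Hypothetical marker used to detect an already-patched file.
MARKER = "# patched: mmcv version gate downgraded to warning"

# Matches a single-line `assert <condition>, <message>` statement.
ASSERT_RE = re.compile(r"^assert\s+(?P<cond>.+?),\s*(?P<msg>.+)$", re.MULTILINE)


def patch_source(source: str) -> str:
    """Rewrite the first matching assert into a warning; a no-op if already patched."""
    if MARKER in source:
        return source

    def replace(match: re.Match) -> str:
        return (
            f"{MARKER}\n"
            f"if not ({match['cond']}):\n"
            f"    import warnings\n"
            f"    warnings.warn({match['msg']})"
        )

    patched, count = ASSERT_RE.subn(replace, source, count=1)
    if count == 0:
        raise ValueError("no version assert found to patch")
    return patched


# Simplified stand-in for the version gate, not the real mmdet source.
SAMPLE = (
    "mmcv_version = 3\n"
    "max_version = 2\n"
    'assert mmcv_version < max_version, "incompatible mmcv"\n'
)
once = patch_source(SAMPLE)
twice = patch_source(once)
```

Checking for the marker before rewriting is what makes a second run harmless.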
Example `detection.toml`:
```toml
instances = ["camera0", "camera1"]
device = "cuda"
yolo_checkpoint = "/path/to/yolo_checkpoint.pt"
pose_checkpoint = "/path/to/coco_wholebody_pose_checkpoint.pth"
```
## Actual Test Helper
```bash
uv run --group dev --group detection python -m tests.support.actual_test /mnt/hddl/data/ActualTest_WeiHua --segment Segment_2 --frame-start 1100 --max-frames 120
```
`actual_test` is a test/support helper, not part of the public installed CLI surface.
It keeps the union of per-camera frame indices and fills missing camera rows with empty detections, so later 2-camera stretches are still usable instead of being dropped by a 4-camera intersection.
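The union-versus-intersection behavior can be shown with a small sketch. The function name `align_by_union` and the nested-dict layout are assumptions made for illustration and are not the helper's actual data model.

```python
def align_by_union(
    per_camera: dict[str, dict[int, list]],
) -> dict[int, dict[str, list]]:
    """Build one row per frame index seen by any camera.

    Cameras with no record for a frame get an empty detection list.
    """
    all_frames = sorted(set().union(*(frames.keys() for frames in per_camera.values())))
    return {
        index: {camera: list(frames.get(index, [])) for camera, frames in per_camera.items()}
        for index in all_frames
    }


# cam0 covers frames 0-1 and cam1 covers frames 1-2.
detections = {
    "cam0": {0: ["a"], 1: ["b"]},
    "cam1": {1: ["c"], 2: ["d"]},
}
aligned = align_by_union(detections)
```

An intersection would keep only frame 1 in this example, while the union keeps frames 0 through 2 with empty lists where a camera is missing.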
## Actual Test Calibration Caveat
`ActualTest_WeiHua/camera_params.parquet` appears to store raw OpenCV extrinsics from the ChArUco pipeline, not camera poses. The tracker now converts those values before calling `RapidPoseTriangulation`, because RPT expects camera centers and camera-to-world rotation.
In repo terms:
- OpenCV reprojection keeps `R`, `T`, and `rvec` as world-to-camera extrinsics.
- RPT export uses the derived camera pose `pose_R = R^T` and `pose_T = -R^T T`.
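The two conventions above can be checked numerically with a short NumPy sketch. The function names are hypothetical and not taken from the repository; the formulas follow from `x_cam = R @ x_world + T`, where the camera center is the world point that maps to the camera-frame origin.

```python
import numpy as np


def opencv_to_camera_pose(R, T):
    """World-to-camera extrinsics (R, T) -> camera pose (pose_R, pose_T)."""
    R = np.asarray(R, dtype=float).reshape(3, 3)
    T = np.asarray(T, dtype=float).reshape(3)
    pose_R = R.T        # camera-to-world rotation
    pose_T = -R.T @ T   # camera center in world coordinates
    return pose_R, pose_T


def camera_pose_to_opencv(pose_R, pose_T):
    """Camera pose (pose_R, pose_T) -> world-to-camera extrinsics (R, T)."""
    pose_R = np.asarray(pose_R, dtype=float).reshape(3, 3)
    pose_T = np.asarray(pose_T, dtype=float).reshape(3)
    R = pose_R.T
    T = -pose_R.T @ pose_T
    return R, T


# Example extrinsics: a 90 degree rotation about Z plus a translation.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
T = np.array([0.1, -0.2, 2.0])

pose_R, pose_T = opencv_to_camera_pose(R, T)
R_back, T_back = camera_pose_to_opencv(pose_R, pose_T)

# Projecting the camera center with the OpenCV extrinsics gives the origin.
center_in_camera = R @ pose_T + T
```

The round trip is exact up to floating-point error because the conversion is its own inverse.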
There is still one upstream caveat: the ParaJumping calibration notebook averages `rvec` samples component-wise before writing the parquet. That is a rough approximation for rotations and can introduce some bias even when the convention is handled correctly.
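To show why component-wise `rvec` averaging is only an approximation, the sketch below compares it with a chordal mean (averaging rotation matrices and projecting back onto a rotation via SVD). The chordal mean is a swapped-in alternative for illustration, not what the notebook does, and the two sample rotations are a deliberately extreme wrap-around case near 180 degrees; typical calibration samples would show a much smaller difference.

```python
import numpy as np


def rvec_to_matrix(rvec):
    """Rodrigues formula: axis-angle vector -> 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def rotation_angle(Ra, Rb):
    """Angle in radians of the relative rotation between Ra and Rb."""
    cos_angle = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


def chordal_mean(rotations):
    """Average rotation matrices, then project onto the nearest rotation."""
    M = np.mean(np.asarray(rotations, dtype=float), axis=0)
    U, _, Vt = np.linalg.svd(M)
    # Flip the last axis if needed so the result has determinant +1.
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt


# Two nearly identical rotations whose rvecs have opposite signs.
rvec_a = np.array([0.0, 0.0, np.pi - 0.01])
rvec_b = -rvec_a
R_a, R_b = rvec_to_matrix(rvec_a), rvec_to_matrix(rvec_b)

naive = rvec_to_matrix((rvec_a + rvec_b) / 2.0)  # component-wise rvec mean
robust = chordal_mean([R_a, R_b])

naive_error = rotation_angle(naive, R_a)
robust_error = rotation_angle(robust, R_a)
```

Here the component-wise mean collapses to the identity rotation, almost 180 degrees from either sample, while the chordal mean stays within about 0.01 rad of both.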