crosstyan 061d5b4592 build: add reproducible mmdet patch script
Add an idempotent patch-mmdet-version-gate entrypoint that rewrites the installed mmdet mmcv compatibility assert into a warning so the local rebuilt mmcv wheel can be reused after uv sync.

Cover the source rewrite with focused tests and document the post-sync command in the README so the local environment patch is reproducible instead of being a one-off manual edit inside .venv.
2026-03-26 16:35:56 +08:00

pose_tracking_exp

Offline multiview body tracking experiments built around:

  • RapidPoseTriangulation for geometric birth proposals
  • a typed replay format for recorded per-camera detections
  • a recursive active/lost tracker with fixed bone lengths

Install

uv sync --group dev

Run

uv run pose-tracking-exp run_tracking data/scene.json data/replay.jsonl

scene.json may declare camera extrinsics in either format:

  • opencv_world_to_camera: OpenCV solvePnP / cv2.projectPoints convention. This is the default.
  • rpt_camera_pose: camera pose in world coordinates, which is what RapidPoseTriangulation expects internally.

The loader normalizes both to OpenCV extrinsics for reprojection and converts to RPT pose only when building the triangulation config. If you already have an older hand-authored scene file that stored RPT camera pose directly, set extrinsic_format explicitly to rpt_camera_pose.
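The normalization step can be sketched as follows. This is a minimal illustration, assuming rotations are 3x3 matrices and translations are 3-vectors; the function name `to_opencv_extrinsics` is hypothetical and is not taken from the repository.

```python
import numpy as np


def to_opencv_extrinsics(extrinsic_format, rotation, translation):
    """Return (R, t) such that x_cam = R @ x_world + t.

    Hypothetical helper for illustration; the real loader may differ.
    """
    R = np.asarray(rotation, dtype=float).reshape(3, 3)
    t = np.asarray(translation, dtype=float).reshape(3)
    if extrinsic_format == "opencv_world_to_camera":
        # Already in the OpenCV convention: pass through unchanged.
        return R, t
    if extrinsic_format == "rpt_camera_pose":
        # Input is a camera-to-world rotation plus the camera center in
        # world coordinates, so invert the rigid transform.
        return R.T, -R.T @ t
    raise ValueError(f"unknown extrinsic_format: {extrinsic_format!r}")


# A camera centered at (1, 2, 3), rotated 90 degrees about the world z axis.
pose_R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
center = np.array([1.0, 2.0, 3.0])
R, t = to_opencv_extrinsics("rpt_camera_pose", pose_R, center)

# The camera center must map to the origin of the camera frame.
center_in_camera = R @ center + t
```

Rejecting unknown format strings, rather than silently falling back to the default, makes a misspelled extrinsic_format in a hand-authored scene file fail early.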

Convert cvmmap Pose Payload Records

uv run pose-tracking-exp convert-cvmmap-pose input.jsonl output.jsonl

The current cvmmap .pose wire format is fixed to COCO-WholeBody-133 keypoints. That is a transport compatibility constraint, not a tracker limitation: the tracker-side normalizer accepts both coco17 and coco_wholebody133, because the first 17 body joints share the standard COCO ordering.
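Because both layouts share the first 17 joints, the tracker-side normalization can in principle be a length check plus a slice. The sketch below assumes each person is a plain list of keypoints; `to_body17` and `SUPPORTED_LAYOUTS` are hypothetical names, not the repository's API.

```python
BODY_JOINT_COUNT = 17

# Layout name -> number of keypoints per person (names taken from the text above).
SUPPORTED_LAYOUTS = {"coco17": 17, "coco_wholebody133": 133}


def to_body17(keypoints, layout):
    """Return the 17 COCO body joints from either supported layout."""
    expected = SUPPORTED_LAYOUTS.get(layout)
    if expected is None:
        raise ValueError(f"unsupported keypoint layout: {layout!r}")
    if len(keypoints) != expected:
        raise ValueError(
            f"{layout} expects {expected} keypoints, got {len(keypoints)}"
        )
    # The body joints come first in both layouts, so slicing is enough.
    return list(keypoints[:BODY_JOINT_COUNT])


# Fake (x, y, confidence) triples, just to exercise the function.
wholebody = [(float(i), float(i), 1.0) for i in range(133)]
body = to_body17(wholebody, "coco_wholebody133")
```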

Run Detection

uv sync --group dev --group detection
uv run patch-mmdet-version-gate
uv run pose-tracking-exp run_detection --config detection.toml camera0 camera1
uv run pose-tracking-exp run_detection --source video --output-dir data/detections --config detection.toml cam0=/data/cam0.mp4 cam1=/data/cam1.mp4

The embedded 2D detection module is organized as a swappable shim:

  • FrameSource: where images come from
  • PoseShim: object detection + pose estimation backend
  • PoseSink: where structured detections are published or stored
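One way to picture the three roles is as structural interfaces wired together by a small loop. The sketch below is an assumption for illustration only: the method names (`frames`, `infer`, `publish`), the `Frame` and `PoseDetections` dataclasses, and the stub classes are hypothetical and may not match the module's real signatures.

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol


@dataclass
class Frame:
    camera: str
    index: int
    image: object  # e.g. a decoded image array


@dataclass
class PoseDetections:
    camera: str
    frame_index: int
    keypoints: list = field(default_factory=list)


class FrameSource(Protocol):
    def frames(self) -> Iterator[Frame]: ...


class PoseShim(Protocol):
    def infer(self, frame: Frame) -> PoseDetections: ...


class PoseSink(Protocol):
    def publish(self, detections: PoseDetections) -> None: ...


def run_pipeline(source: FrameSource, shim: PoseShim, sink: PoseSink) -> int:
    """Push every frame through the backend and into the sink."""
    count = 0
    for frame in source.frames():
        sink.publish(shim.infer(frame))
        count += 1
    return count


# Stub implementations, only to show that any conforming object can be swapped in.
class ListSource:
    def __init__(self, frames):
        self._frames = frames

    def frames(self):
        return iter(self._frames)


class EmptyShim:
    def infer(self, frame):
        return PoseDetections(camera=frame.camera, frame_index=frame.index)


class ListSink:
    def __init__(self):
        self.items = []

    def publish(self, detections):
        self.items.append(detections)


sink = ListSink()
processed = run_pipeline(
    ListSource([Frame("camera0", 0, None), Frame("camera0", 1, None)]),
    EmptyShim(),
    sink,
)
```

With structural typing, a new backend or sink only needs the right method; it does not have to inherit from a base class.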

The default backend is yolo_rtmpose, and the heavy runtime dependencies live in the optional detection dependency group. Checkpoint paths are explicit config fields; the code does not hardcode local checkpoint locations. The only inferred path is the MMPose config path, which is resolved relative to the installed mmpose package when pose_config_path is omitted.

For offline video runs, the default sink is parquet, which writes one *_detected.parquet file per source. run_tracking can consume that directory directly as replay input.

uv run patch-mmdet-version-gate is an idempotent local-environment patch that rewrites the mmcv compatibility assert in the installed mmdet into a warning, so the locally rebuilt mmcv wheel can be used. Re-run it after uv sync whenever the environment is recreated.
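What makes a source patch idempotent is the order of the checks: look for the already-patched text first, and only then for the original. The sketch below illustrates that idea on plain strings. It is not the script's actual implementation, and the `OLD` and `NEW` snippets are simplified placeholders rather than the real mmdet source.

```python
def patch_source(source: str, old: str, new: str):
    """Replace `old` with `new` once; do nothing if `new` is already present.

    Returns (text, changed). Raises if neither snippet is found, so an
    unexpected upstream change is reported instead of silently ignored.
    """
    if new in source:
        return source, False
    if old not in source:
        raise ValueError("expected snippet not found; refusing to patch")
    return source.replace(old, new, 1), True


# Simplified placeholder snippets (assumptions, not the real mmdet code).
OLD = "assert version_ok, message\n"
NEW = (
    "import warnings\n"
    "if not version_ok:\n"
    "    warnings.warn(message)\n"
)

original = "version_ok = check()\n" + OLD + "print('loaded')\n"
patched_once, changed_first = patch_source(original, OLD, NEW)
patched_twice, changed_second = patch_source(patched_once, OLD, NEW)
```

Running the function a second time leaves the text untouched, which is the property that makes re-running the command after every uv sync safe.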

Example detection.toml:

instances = ["camera0", "camera1"]
device = "cuda"
yolo_checkpoint = "/path/to/yolo_checkpoint.pt"
pose_checkpoint = "/path/to/coco_wholebody_pose_checkpoint.pth"

Actual Test Helper

uv run --group dev --group detection python -m tests.support.actual_test /mnt/hddl/data/ActualTest_WeiHua --segment Segment_2 --frame-start 1100 --max-frames 120

actual_test is a test/support helper, not part of the public installed CLI surface. It keeps the union of per-camera frame indices and fills missing camera rows with empty detections, so later 2-camera stretches are still usable instead of being dropped by a 4-camera intersection.
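The union-and-fill behavior can be sketched as below, assuming detections are held as a mapping from camera name to frame index to a list of detections. The function name `align_union` and this data layout are hypothetical; the helper's real structures may differ.

```python
def align_union(per_camera):
    """Align cameras on the union of their frame indices.

    `per_camera` maps camera name -> {frame index -> list of detections}.
    Cameras that lack a frame get an empty detection list for it.
    """
    all_frames = sorted(set().union(*(d.keys() for d in per_camera.values())))
    return {
        frame: {
            camera: list(detections.get(frame, []))
            for camera, detections in per_camera.items()
        }
        for frame in all_frames
    }


per_camera = {
    "cam0": {0: ["a"], 1: ["b"]},
    "cam1": {1: ["c"], 2: ["d"]},
}
aligned = align_union(per_camera)
# An intersection would keep only frame 1; the union keeps frames 0, 1, and 2.
```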

Actual Test Calibration Caveat

ActualTest_WeiHua/camera_params.parquet appears to store raw OpenCV extrinsics from the ChArUco pipeline, not camera poses. The tracker now converts those values before calling RapidPoseTriangulation, because RPT expects camera centers and camera-to-world rotation.

In repo terms:

  • OpenCV reprojection keeps R, T, and rvec as world-to-camera extrinsics.
  • RPT export uses the derived camera pose pose_R = R^T and pose_T = -R^T t, where ^T is the matrix transpose and t is the translation vector stored as T.
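A small numeric check of the export formula, as a sketch: `opencv_to_rpt_pose` is a hypothetical name, and the example camera is made up for illustration.

```python
import numpy as np


def opencv_to_rpt_pose(R, t):
    """Convert world-to-camera extrinsics (R, t) to a camera pose.

    Returns (pose_R, pose_T): the camera-to-world rotation and the camera
    center in world coordinates.
    """
    R = np.asarray(R, dtype=float).reshape(3, 3)
    t = np.asarray(t, dtype=float).reshape(3)
    return R.T, -R.T @ t


# World-to-camera extrinsics: no rotation, world origin 5 units along
# the camera's +z axis.
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
pose_R, pose_T = opencv_to_rpt_pose(R, t)

# The camera therefore sits at z = -5 in world coordinates, and projecting
# that center back through the extrinsics gives the camera-frame origin.
center_in_camera = R @ pose_T + t
```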

There is still one upstream caveat: the ParaJumping calibration notebook averages rvec samples component-wise before writing the parquet. Rotation vectors do not average linearly, so this is only a reasonable approximation when the samples are tightly clustered, and it can introduce some bias even when the convention is handled correctly.
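To show why component-wise averaging can go wrong, the sketch below compares it with a chordal mean, which averages the rotation matrices and projects the result back onto a rotation via SVD. The chordal mean is a swapped-in alternative for illustration, not what the notebook does, and the two sample rotations are deliberately extreme.

```python
import numpy as np


def rvec_to_matrix(rvec):
    """Rodrigues formula: rotation vector (axis * angle) to a 3x3 matrix."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def chordal_mean(rotations):
    """Average rotation matrices, then project back onto SO(3) with SVD."""
    M = np.mean(np.stack(rotations), axis=0)
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))  # keep a proper rotation (det = +1)
    return U @ np.diag([1.0, 1.0, d]) @ Vt


# Two nearly identical rotations about z, just either side of 180 degrees.
angle = np.pi - 0.1
rvecs = [np.array([0.0, 0.0, angle]), np.array([0.0, 0.0, -angle])]

# Component-wise averaging cancels the two vectors and yields the identity.
naive_R = rvec_to_matrix(np.mean(rvecs, axis=0))

# The chordal mean stays near the samples: a 180 degree turn about z.
mean_R = chordal_mean([rvec_to_matrix(r) for r in rvecs])
```

For tightly clustered samples away from the 180 degree wrap, the two approaches give nearly the same answer, so the practical bias depends on how scattered the calibration samples are.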