feat!: reorganize detection and tracking pipeline

Refactor the package into common, schema, detection, and tracking namespaces and move dataset-specific ActualTest utilities into tests/support.

Add a pluggable detection stack with typed protocols, pydantic-settings config, loguru-based runner logging, cvmmap and headless video sources, NATS and parquet sinks, and a structured coco-wholebody133 payload path.

Teach tracking replay loading to consume parquet detection directories directly, preserve empty frames, and keep the video-to-parquet-to-tracking workflow usable for offline E2E runs.

Vendor the local mmcv and xtcocotools wheels under Git LFS, update uv sources/lock state, and refresh the mmcv build so mmcv.ops loads successfully with the current torch+cu130 environment.
2026-03-26 16:24:27 +08:00
parent f1a2372b3c
commit 2c0d51ab31
56 changed files with 5179 additions and 889 deletions
+1
@@ -0,0 +1 @@
vendor/wheels/*.whl filter=lfs diff=lfs merge=lfs -text
+50 -5
@@ -9,13 +9,13 @@ Offline multiview body tracking experiments built around:
## Install
```bash
-uv sync --extra dev
uv sync --group dev
```
## Run
```bash
-uv run pose-tracking-exp run data/scene.json data/replay.jsonl
uv run pose-tracking-exp run_tracking data/scene.json data/replay.jsonl
```
`scene.json` may declare camera extrinsics in either format:
@@ -26,13 +26,58 @@ uv run pose-tracking-exp run data/scene.json data/replay.jsonl
The loader normalizes both to OpenCV extrinsics for reprojection and converts to RPT pose only when building the triangulation config.
If you already have an older hand-authored scene file that stored RPT camera pose directly, set `extrinsic_format` explicitly to `rpt_camera_pose`.
-## Convert ParaJumping Payload Records
## Convert cvmmap Pose Payload Records
```bash
-uv run pose-tracking-exp convert-parajumping input.jsonl output.jsonl
uv run pose-tracking-exp convert-cvmmap-pose input.jsonl output.jsonl
```
-## ActualTest Calibration Caveat
The current cvmmap `.pose` wire format is fixed to `COCO-WholeBody-133` keypoints.
That is a transport compatibility constraint, not a tracker limitation: the tracker-side normalizer accepts both `coco17` and `coco_wholebody133`, because the first 17 body joints share the standard COCO ordering.
References:
- https://mmpose.readthedocs.io/en/latest/dataset_zoo/2d_wholebody_keypoint.html
- https://github.com/jin-s13/COCO-WholeBody
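Because the first 17 body joints line up, a consumer can infer the schema from the joint count alone and slice the body joints out of a whole-body array. A minimal sketch of that idea (these function names are illustrative, not the package's actual API):

```python
import numpy as np

def detect_coco_schema(keypoints_xy: np.ndarray) -> str:
    """Infer the keypoint schema purely from the joint count."""
    if keypoints_xy.shape == (17, 2):
        return "coco17"
    if keypoints_xy.shape == (133, 2):
        return "coco_wholebody133"
    raise ValueError(f"Not a COCO-compatible layout: {keypoints_xy.shape}")

def body17_view(keypoints_xy: np.ndarray) -> np.ndarray:
    # The first 17 rows of a wholebody-133 array are the standard COCO body
    # joints, so body-only consumers can slice instead of remapping indices.
    return keypoints_xy[:17]

wholebody = np.zeros((133, 2))
assert detect_coco_schema(wholebody) == "coco_wholebody133"
assert body17_view(wholebody).shape == (17, 2)
```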
## Run Detection
```bash
uv sync --group dev --group detection
uv run pose-tracking-exp run_detection --config detection.toml camera0 camera1
uv run pose-tracking-exp run_detection --source video --output-dir data/detections --config detection.toml cam0=/data/cam0.mp4 cam1=/data/cam1.mp4
```
The embedded 2D detection module is organized as a stack of swappable shims:
- `FrameSource`: where images come from
- `PoseShim`: object detection + pose estimation backend
- `PoseSink`: where structured detections are published or stored
The default backend is `yolo_rtmpose`, and the heavy runtime dependencies live in the optional `detection` dependency group.
Checkpoint paths are explicit config fields; the code does not hardcode local checkpoint locations.
The only inferred path is the MMPose config path, which is resolved relative to the installed `mmpose` package when `pose_config_path` is omitted.
For offline video runs, the default sink is parquet and writes one `*_detected.parquet` file per source. `run_tracking` can consume that directory directly as replay input.
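The three seams behave like structural interfaces: any object with matching methods can be swapped in, no inheritance required. A stripped-down sketch of the idea using `typing.Protocol`, with simplified signatures that are not the package's exact definitions:

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol

@dataclass
class Frame:
    source_name: str
    frame_index: int
    image: object  # decoded image array in the real code

class FrameSource(Protocol):
    def frames(self) -> Iterator[Frame]: ...

class PoseShim(Protocol):
    def infer(self, frames: list[Frame]) -> list[dict]: ...

class PoseSink(Protocol):
    def publish(self, detections: dict) -> None: ...

# Any class with a matching `publish` satisfies PoseSink structurally,
# which is what makes the source/shim/sink stack swappable.
@dataclass
class ListSink:
    rows: list[dict] = field(default_factory=list)
    def publish(self, detections: dict) -> None:
        self.rows.append(detections)

sink: PoseSink = ListSink()
sink.publish({"frame_index": 0, "boxes": []})
```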
Example `detection.toml`:
```toml
instances = ["camera0", "camera1"]
device = "cuda"
yolo_checkpoint = "/path/to/yolo_checkpoint.pt"
pose_checkpoint = "/path/to/coco_wholebody_pose_checkpoint.pth"
```
## Actual Test Helper
```bash
uv run --group dev --group detection python -m tests.support.actual_test /mnt/hddl/data/ActualTest_WeiHua --segment Segment_2 --frame-start 1100 --max-frames 120
```
`actual_test` is a test/support helper, not part of the public installed CLI surface.
It keeps the union of per-camera frame indices and fills missing camera rows with empty detections, so later 2-camera stretches are still usable instead of being dropped by a 4-camera intersection.
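The union-with-empty-fill behaviour can be sketched with plain dicts (toy data structures, not the helper's real types):

```python
def union_bundles(per_camera: dict[str, dict[int, list]]) -> dict[int, dict[str, list]]:
    """Keep every frame index seen by any camera; absent cameras get empty detections."""
    all_frames = sorted(set().union(*(frames.keys() for frames in per_camera.values())))
    return {
        index: {cam: frames.get(index, []) for cam, frames in per_camera.items()}
        for index in all_frames
    }

per_camera = {
    "cam0": {100: ["det"], 101: ["det"]},
    "cam1": {101: ["det"], 102: ["det"]},
}
bundles = union_bundles(per_camera)
assert sorted(bundles) == [100, 101, 102]  # union, not intersection
assert bundles[100]["cam1"] == []          # missing camera row filled with empties
```

With an intersection, frame 100 and 102 would be dropped entirely; the union keeps stretches where only a subset of cameras saw the subject.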
## Actual Test Calibration Caveat
`ActualTest_WeiHua/camera_params.parquet` appears to store raw OpenCV extrinsics from the ChArUco pipeline, not camera poses. The tracker now converts those values before calling `RapidPoseTriangulation`, because RPT expects camera centers and camera-to-world rotation.
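The underlying conversion is the standard extrinsics inversion: for OpenCV extrinsics `x_cam = R @ x_world + t`, the camera center is `C = -R.T @ t` and the camera-to-world rotation is `R.T`. A small numpy sketch of that math (the tracker's actual helper names differ):

```python
import numpy as np

def extrinsics_to_camera_pose(R_w2c: np.ndarray, t_w2c: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Invert OpenCV extrinsics x_cam = R @ x_world + t into a camera pose."""
    R_c2w = R_w2c.T            # rotation is orthonormal, so the transpose inverts it
    center = -R_w2c.T @ t_w2c  # camera center expressed in world coordinates
    return R_c2w, center

# Sanity check: the camera center must map to the origin of camera coordinates.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])
R_c2w, C = extrinsics_to_camera_pose(R, t)
assert np.allclose(R @ C + t, np.zeros(3))
```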
+22 -2
@@ -7,13 +7,14 @@ name = "pose-tracking-exp"
version = "0.1.0"
description = "Offline multiview pose tracking experiment with RPT-backed proposal births"
readme = "README.md"
-requires-python = ">=3.12"
requires-python = ">=3.12,<3.13"
dependencies = [
"anyio>=4.11.0",
"beartype>=0.19.0",
"click>=8.2.1",
"jaxtyping>=0.3.2",
"numpy>=2.1.0",
-"opencv-python>=4.12.0.88",
"opencv-python-headless>=4.12.0.88",
"pyarrow>=21.0.0",
"rapid-pose-triangulation",
"scipy>=1.15.0",
@@ -22,8 +23,24 @@ dependencies = [
[dependency-groups]
dev = [
"basedpyright>=1.31.0",
"jupyterlab>=4.5.6",
"pytest>=8.4.0",
]
detection = [
"cvmmap-client",
"loguru>=0.7.3",
"mmcv",
"mmdet>=3.3.0",
"mmengine>=0.10.7",
"mmpose>=1.3.2",
"nats-py>=2.11.0",
"pydantic>=2.11.7",
"pydantic-settings>=2.0.0",
"torch>=2.7.0",
"torchvision>=0.22.0",
"ultralytics>=8.3.166",
"xtcocotools",
]
[project.scripts]
pose-tracking-exp = "pose_tracking_exp.cli:main"
@@ -33,6 +50,9 @@ packages = ["src/pose_tracking_exp"]
[tool.uv.sources]
rapid-pose-triangulation = { path = "../RapidPoseTriangulation", editable = true }
cvmmap-client = { path = "../cvmmap-python-client", editable = true }
mmcv = { path = "vendor/wheels/mmcv-2.2.0-cp312-cp312-linux_x86_64.whl" }
xtcocotools = { path = "vendor/wheels/xtcocotools-1.14.3-cp312-cp312-linux_x86_64.whl" }
[tool.pytest.ini_options]
testpaths = ["tests"]
+12 -15
@@ -1,40 +1,37 @@
-from pose_tracking_exp.joints import BODY20_JOINT_NAMES, BODY20_OBSERVATION_COUNT
from pose_tracking_exp.common.joints import BODY20_JOINT_NAMES, BODY20_OBSERVATION_COUNT
-from pose_tracking_exp.models import (
from pose_tracking_exp.detection.cvmmap_payload import CvmmapPosePayloadCodec, decode_pose_payload
-ActiveTrackState,
from pose_tracking_exp.schema import (
CameraCalibration,
CameraFrame,
FrameBundle,
PoseDetection,
-ProposalCluster,
ReplaySequence,
SceneConfig,
-TentativeTrackState,
TrackerConfig,
-TrackedFrameResult,
)
-from pose_tracking_exp.parajumping import decode_pose_payload
from pose_tracking_exp.tracking import (
-from pose_tracking_exp.replay import load_replay_file, load_scene_file
PoseTracker,
-from pose_tracking_exp.sync import synchronize_frames
load_parquet_replay_dir,
-from pose_tracking_exp.tracker import PoseTracker
load_replay_file,
load_scene_file,
synchronize_frames,
)
__all__ = [
"BODY20_JOINT_NAMES",
"BODY20_OBSERVATION_COUNT",
-"ActiveTrackState",
"CameraCalibration",
"CameraFrame",
"CvmmapPosePayloadCodec",
"FrameBundle",
"PoseDetection",
"PoseTracker",
-"ProposalCluster",
"load_parquet_replay_dir",
"ReplaySequence",
"SceneConfig",
-"TentativeTrackState",
-"TrackedFrameResult",
"TrackerConfig",
"decode_pose_payload",
"load_replay_file",
"load_scene_file",
"synchronize_frames",
]
+110 -88
@@ -1,15 +1,12 @@
import json
import sys
from pathlib import Path
-from typing import Literal, cast
import click
-from pose_tracking_exp.actualtest import load_actualtest_scene, load_actualtest_segment_bundles
from pose_tracking_exp.detection.cvmmap_payload import convert_payload_jsonl_lines
-from pose_tracking_exp.models import TrackerConfig
from pose_tracking_exp.schema import TrackerConfig
-from pose_tracking_exp.parajumping import convert_payload_jsonl_lines
from pose_tracking_exp.tracking import PoseTracker, load_replay_file, load_scene_file, synchronize_frames
-from pose_tracking_exp.replay import load_replay_file, load_scene_file
-from pose_tracking_exp.sync import synchronize_frames
-from pose_tracking_exp.tracker import PoseTracker
@click.group()
@@ -17,19 +14,120 @@ def main() -> None:
"""Offline multiview pose tracking experiment CLI."""
-@main.command("convert-parajumping")
@main.command("convert-cvmmap-pose")
@click.argument("input_path", type=click.Path(path_type=Path, exists=True, dir_okay=False))
@click.argument("output_path", type=click.Path(path_type=Path, dir_okay=False))
-def convert_parajumping(input_path: Path, output_path: Path) -> None:
def convert_cvmmap_pose(input_path: Path, output_path: Path) -> None:
lines = input_path.read_text(encoding="utf-8").splitlines()
converted = convert_payload_jsonl_lines(lines)
output_path.write_text("\n".join(converted) + ("\n" if converted else ""), encoding="utf-8")
-@main.command("run")
@main.command("run_detection")
@click.argument("inputs", nargs=-1, type=str, required=False)
@click.option(
"--config",
"config_path",
type=click.Path(dir_okay=False, path_type=Path),
default=None,
help="Optional TOML detection runner config file.",
)
@click.option(
"--source",
"source_kind",
type=click.Choice(("cvmmap", "video")),
default="cvmmap",
show_default=True,
help="Frame source implementation to use.",
)
@click.option(
"--sink",
"sink_kind",
type=click.Choice(("auto", "nats", "parquet")),
default="auto",
show_default=True,
help="Output sink. `auto` picks nats for cvmmap and parquet for video.",
)
@click.option(
"--output-dir",
type=click.Path(file_okay=False, path_type=Path),
default=None,
help="Required for parquet sink output.",
)
@click.option(
"--log-level",
default="INFO",
show_default=True,
type=click.Choice(("DEBUG", "INFO", "WARNING", "ERROR")),
)
def run_detection(
inputs: tuple[str, ...],
config_path: Path | None,
source_kind: str,
sink_kind: str,
output_dir: Path | None,
log_level: str,
) -> None:
import anyio
from loguru import logger
from pose_tracking_exp.detection import (
CvmmapFrameSource,
NatsPoseSink,
ParquetPoseSink,
VideoFrameSource,
build_pose_shim,
load_detection_runner_config,
parse_video_input_specs,
resolve_instances,
run_detection_runner,
)
logger.remove()
logger.add(
sys.stderr,
level=log_level,
format="{time:YYYY-MM-DD HH:mm:ss} | {level} | {name}:{function}:{line} | {message}",
)
config = load_detection_runner_config(config_path)
config.validate_runtime_paths()
if source_kind == "cvmmap":
resolved_instances = resolve_instances(inputs, config.instances)
config = config.model_copy(update={"instances": resolved_instances})
sources = tuple(CvmmapFrameSource(instance) for instance in resolved_instances)
else:
video_inputs = parse_video_input_specs(inputs)
sources = tuple(
VideoFrameSource(video_path, source_name=source_name)
for source_name, video_path in video_inputs
)
pose_shim = build_pose_shim(config)
resolved_sink_kind = sink_kind
if resolved_sink_kind == "auto":
resolved_sink_kind = "nats" if source_kind == "cvmmap" else "parquet"
if resolved_sink_kind == "nats":
pose_sink = NatsPoseSink(config.nats_host)
else:
if output_dir is None:
raise click.ClickException("--output-dir is required for parquet sink output.")
pose_sink = ParquetPoseSink(output_dir)
anyio.run(
run_detection_runner,
sources,
pose_shim,
pose_sink,
config,
)
@main.command("run_tracking")
@click.argument("scene_path", type=click.Path(path_type=Path, exists=True, dir_okay=False))
-@click.argument("replay_path", type=click.Path(path_type=Path, exists=True, dir_okay=False))
@click.argument("replay_path", type=click.Path(path_type=Path, exists=True))
-def run(scene_path: Path, replay_path: Path) -> None:
def run_tracking(scene_path: Path, replay_path: Path) -> None:
scene = load_scene_file(scene_path)
replay = load_replay_file(scene_path, replay_path)
config = TrackerConfig()
@@ -52,79 +150,3 @@ def run(scene_path: Path, replay_path: Path) -> None:
for result in results
]
click.echo(json.dumps(payload, indent=2))
@main.command("run-actualtest")
@click.argument("root_path", type=click.Path(path_type=Path, exists=True, file_okay=False))
@click.option("--segment", "segment_name", default="Segment_1", show_default=True)
@click.option("--frame-start", default=690, type=int, show_default=True)
@click.option("--frame-stop", type=int)
@click.option("--max-frames", type=int)
@click.option("--mode", type=click.Choice(["single_person", "general"]), default="single_person", show_default=True)
@click.option("--proposal-min-score", default=0.5, type=float, show_default=True)
@click.option("--tentative-min-age", default=2, type=int, show_default=True)
@click.option("--tentative-hits-required", default=2, type=int, show_default=True)
@click.option("--tentative-promote-score", default=1.2, type=float, show_default=True)
def run_actualtest(
root_path: Path,
segment_name: str,
frame_start: int,
frame_stop: int | None,
max_frames: int | None,
mode: str,
proposal_min_score: float,
tentative_min_age: int,
tentative_hits_required: int,
tentative_promote_score: float,
) -> None:
tracker_mode = cast(Literal["general", "single_person"], mode)
scene = load_actualtest_scene(root_path)
bundles = load_actualtest_segment_bundles(
root_path,
segment_name,
frame_start=frame_start,
frame_stop=frame_stop,
max_frames=max_frames,
)
config = TrackerConfig(
mode=tracker_mode,
proposal_min_score=proposal_min_score,
tentative_min_age=tentative_min_age,
tentative_hits_required=tentative_hits_required,
tentative_promote_score=tentative_promote_score,
)
tracker = PoseTracker(scene, config)
results = tracker.run(bundles)
diagnostics = tracker.diagnostics_snapshot()
payload = {
"segment": segment_name,
"mode": tracker_mode,
"bundle_count": len(results),
"active_track_frames": sum(1 for result in results if result.active_tracks),
"proposal_frames": sum(1 for result in results if result.proposals),
"max_active_tracks": max((len(result.active_tracks) for result in results), default=0),
"diagnostics": {
"match_existing_calls": diagnostics.match_existing_calls,
"match_existing_seconds": diagnostics.match_existing_seconds,
"proposal_build_calls": diagnostics.proposal_build_calls,
"proposal_build_seconds": diagnostics.proposal_build_seconds,
"promotions": diagnostics.promotions,
"reacquisitions": diagnostics.reacquisitions,
"active_updates": diagnostics.active_updates,
"seed_initializations": diagnostics.seed_initializations,
"nonlinear_refinements": diagnostics.nonlinear_refinements,
},
"results": [
{
"bundle_index": result.bundle_index,
"source_frame_index": bundle.views[0].frame_index if bundle.views else -1,
"timestamp_unix_ns": result.timestamp_unix_ns,
"tentative_track_ids": [track.track_id for track in result.tentative_tracks],
"active_track_ids": [track.track_id for track in result.active_tracks],
"lost_track_ids": [track.track_id for track in result.lost_tracks],
"proposal_count": len(result.proposals),
}
for bundle, result in zip(bundles, results, strict=True)
],
}
click.echo(json.dumps(payload, indent=2))
+33
@@ -0,0 +1,33 @@
from pose_tracking_exp.common.camera_math import project_pose
from pose_tracking_exp.common.joints import (
BODY20_INDEX_BY_NAME,
BODY20_JOINT_NAMES,
BODY20_OBSERVATION_COUNT,
COCO_BODY17_INDEX_BY_NAME,
COCO_BODY17_NAMES,
CORE_JOINT_INDICES,
CORE_JOINT_NAMES,
)
from pose_tracking_exp.common.normalization import (
core_reprojection_distance,
infer_bbox_from_keypoints,
normalize_coco_body20,
normalize_openpose25_body20,
normalize_rtmpose_body20,
)
__all__ = [
"BODY20_INDEX_BY_NAME",
"BODY20_JOINT_NAMES",
"BODY20_OBSERVATION_COUNT",
"COCO_BODY17_INDEX_BY_NAME",
"COCO_BODY17_NAMES",
"CORE_JOINT_INDICES",
"CORE_JOINT_NAMES",
"core_reprojection_distance",
"infer_bbox_from_keypoints",
"normalize_coco_body20",
"normalize_openpose25_body20",
"normalize_rtmpose_body20",
"project_pose",
]
@@ -1,8 +1,8 @@
import cv2
import numpy as np
-from pose_tracking_exp.models import CameraCalibration
from pose_tracking_exp.common.tensor_types import Pose3D
-from pose_tracking_exp.tensor_types import Pose3D
from pose_tracking_exp.schema.camera import CameraCalibration
def project_pose(camera: CameraCalibration, pose3d: Pose3D) -> np.ndarray:
@@ -0,0 +1,43 @@
from pathlib import Path
import pyarrow as pa
from pose_tracking_exp.schema.detection import PoseDetections
DETECTED_PARQUET_SUFFIX = "_detected.parquet"
DETECTION_PARQUET_SCHEMA = pa.schema(
[
pa.field("frame_index", pa.int64()),
pa.field("timestamp_unix_ns", pa.int64()),
pa.field("source_width", pa.int32()),
pa.field("source_height", pa.int32()),
pa.field("boxes", pa.list_(pa.list_(pa.float32()))),
pa.field("box_scores", pa.list_(pa.float32())),
pa.field("kps", pa.list_(pa.list_(pa.list_(pa.float32())))),
pa.field("kps_scores", pa.list_(pa.list_(pa.float32()))),
pa.field("keypoint_schema", pa.string()),
]
)
def detection_parquet_path(output_dir: Path, source_name: str) -> Path:
return output_dir / f"{source_name}{DETECTED_PARQUET_SUFFIX}"
def pose_detections_to_row(detections: PoseDetections) -> dict[str, object]:
if detections.box_scores is None:
raise ValueError("Parquet sink requires box_scores to be present.")
if detections.keypoint_scores is None:
raise ValueError("Parquet sink requires keypoint_scores to be present.")
return {
"frame_index": int(detections.frame_index),
"timestamp_unix_ns": int(detections.timestamp_unix_ns),
"source_width": int(detections.source_size[0]),
"source_height": int(detections.source_size[1]),
"boxes": detections.boxes_xyxy.astype("float32", copy=False).tolist(),
"box_scores": detections.box_scores.astype("float32", copy=False).tolist(),
"kps": detections.keypoints_xy.astype("float32", copy=False).tolist(),
"kps_scores": detections.keypoint_scores.astype("float32", copy=False).tolist(),
"keypoint_schema": detections.keypoint_schema,
}
@@ -39,7 +39,7 @@ CORE_JOINT_NAMES: tuple[str, ...] = (
CORE_JOINT_INDICES: tuple[int, ...] = tuple(BODY20_INDEX_BY_NAME[name] for name in CORE_JOINT_NAMES)
-RTMPOSE_BODY17_INDEX_BY_NAME = {
COCO_BODY17_INDEX_BY_NAME = {
"nose": 0,
"eye_left": 1,
"eye_right": 2,
@@ -59,5 +59,8 @@ RTMPOSE_BODY17_INDEX_BY_NAME = {
"ankle_right": 16,
}
-RTMPOSE_BODY17_NAMES = tuple(RTMPOSE_BODY17_INDEX_BY_NAME.keys())
COCO_BODY17_NAMES = tuple(COCO_BODY17_INDEX_BY_NAME.keys())
# RTMPose whole-body uses the standard COCO body-17 ordering for the first 17 joints.
RTMPOSE_BODY17_INDEX_BY_NAME = COCO_BODY17_INDEX_BY_NAME
RTMPOSE_BODY17_NAMES = COCO_BODY17_NAMES
@@ -1,12 +1,51 @@
import math
from collections.abc import Mapping
from typing import Literal
import numpy as np
from beartype import beartype
from jaxtyping import jaxtyped
-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME, BODY20_OBSERVATION_COUNT, RTMPOSE_BODY17_INDEX_BY_NAME
from pose_tracking_exp.common.joints import (
-from pose_tracking_exp.tensor_types import FloatArray, JointXY, Pose2D
BODY20_INDEX_BY_NAME,
BODY20_OBSERVATION_COUNT,
COCO_BODY17_INDEX_BY_NAME,
)
from pose_tracking_exp.common.tensor_types import FloatArray, JointXY, Pose2D
def _validate_coco_shape(
keypoints_xy: FloatArray,
confidences: FloatArray,
*,
keypoint_schema: Literal["coco17", "coco_wholebody133"] | None,
) -> Literal["coco17", "coco_wholebody133"]:
if keypoints_xy.ndim != 2 or keypoints_xy.shape[1] != 2:
raise ValueError(
f"Expected keypoints with shape (N, 2), got {keypoints_xy.shape}."
)
if confidences.ndim != 1 or confidences.shape[0] != keypoints_xy.shape[0]:
raise ValueError(
"Expected confidences with shape matching keypoint count. "
f"Got {confidences.shape} for {keypoints_xy.shape}."
)
detected_schema: Literal["coco17", "coco_wholebody133"]
if keypoints_xy.shape[0] == 17:
detected_schema = "coco17"
elif keypoints_xy.shape[0] == 133:
detected_schema = "coco_wholebody133"
else:
raise ValueError(
"Expected COCO-compatible keypoints with 17 or 133 joints, "
f"got {keypoints_xy.shape[0]}."
)
if keypoint_schema is not None and keypoint_schema != detected_schema:
raise ValueError(
f"Expected {keypoint_schema} keypoints, got shape {keypoints_xy.shape}."
)
return detected_schema
def _visible_mean(points: list[tuple[np.ndarray, float]], fallback_xy: np.ndarray) -> tuple[np.ndarray, float]:
@@ -68,18 +107,20 @@ def _normalize_named_keypoints(
@jaxtyped(typechecker=beartype)
-def normalize_rtmpose_body20(
def normalize_coco_body20(
keypoints_xy: FloatArray,
confidences: FloatArray,
*,
keypoint_schema: Literal["coco17", "coco_wholebody133"] | None = None,
) -> Pose2D:
-if keypoints_xy.shape != (133, 2):
_validate_coco_shape(
-raise ValueError(f"Expected RTMPose keypoints with shape (133, 2), got {keypoints_xy.shape}.")
keypoints_xy,
-if confidences.shape != (133,):
confidences,
-raise ValueError(f"Expected RTMPose confidences with shape (133,), got {confidences.shape}.")
keypoint_schema=keypoint_schema,
)
keypoint_map = {
name: (keypoints_xy[source_index], float(confidences[source_index]))
-for name, source_index in RTMPOSE_BODY17_INDEX_BY_NAME.items()
for name, source_index in COCO_BODY17_INDEX_BY_NAME.items()
}
return _normalize_named_keypoints(
keypoint_map,
@@ -89,6 +130,18 @@ def normalize_rtmpose_body20(
)
@jaxtyped(typechecker=beartype)
def normalize_rtmpose_body20(
keypoints_xy: FloatArray,
confidences: FloatArray,
) -> Pose2D:
return normalize_coco_body20(
keypoints_xy,
confidences,
keypoint_schema="coco_wholebody133",
)
@jaxtyped(typechecker=beartype)
def normalize_openpose25_body20(keypoints: FloatArray) -> Pose2D:
if keypoints.shape != (25, 3):
@@ -0,0 +1,58 @@
from pose_tracking_exp.detection.config import (
DEFAULT_BACKEND,
DetectionRunnerConfig,
load_detection_runner_config,
resolve_default_pose_config,
resolve_instances,
)
from pose_tracking_exp.detection.factory import build_pose_shim
from pose_tracking_exp.detection.runner import (
SimpleMovingAverage,
SourceSlot,
run_detection_runner,
store_latest_frame,
take_pending_batch,
)
from pose_tracking_exp.detection.sinks import NatsPoseSink, ParquetPoseSink
from pose_tracking_exp.detection.sources import (
CvmmapFrameSource,
IteratorFrameSource,
VideoFrameSource,
parse_video_input_specs,
)
from pose_tracking_exp.schema.detection import BoxDetections, CocoKeypointSchema, PoseBatchRequest, PoseDetections, SourceFrame
from pose_tracking_exp.detection.yolo_rtmpose import (
WholeBodyPoseEstimator,
YoloRtmposeShim,
build_yolo_rtmpose_shim,
legacy_torch_checkpoint_loading,
)
__all__ = [
"BoxDetections",
"CocoKeypointSchema",
"CvmmapFrameSource",
"DEFAULT_BACKEND",
"DetectionRunnerConfig",
"IteratorFrameSource",
"NatsPoseSink",
"ParquetPoseSink",
"PoseBatchRequest",
"PoseDetections",
"SimpleMovingAverage",
"SourceFrame",
"SourceSlot",
"VideoFrameSource",
"WholeBodyPoseEstimator",
"YoloRtmposeShim",
"build_pose_shim",
"build_yolo_rtmpose_shim",
"legacy_torch_checkpoint_loading",
"load_detection_runner_config",
"parse_video_input_specs",
"resolve_default_pose_config",
"resolve_instances",
"run_detection_runner",
"store_latest_frame",
"take_pending_batch",
]
+147
@@ -0,0 +1,147 @@
import tomllib
from pathlib import Path
from typing import Any, Literal, cast
import click
from pydantic import (
PositiveFloat,
PositiveInt,
ValidationError,
field_validator,
model_validator,
)
from pydantic_settings import (
BaseSettings,
PydanticBaseSettingsSource,
SettingsConfigDict,
)
DEFAULT_BACKEND = "yolo_rtmpose"
ENV_PREFIX = "POSE_TRACKING_EXP_DETECTION_"
POSE_CONFIG_RELATIVE_PATH = Path(
"wholebody_2d_keypoint/rtmpose/ubody/rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py"
)
def resolve_default_pose_config() -> Path:
import mmpose
module_file = getattr(mmpose, "__file__", None)
if module_file is None:
raise FileNotFoundError("Could not locate the installed mmpose package.")
config_path = (
Path(module_file).resolve().parent
/ ".mim"
/ "configs"
/ POSE_CONFIG_RELATIVE_PATH
)
if not config_path.exists():
raise FileNotFoundError(f"Default pose config is missing: {config_path}")
return config_path
class DetectionRunnerConfig(BaseSettings):
model_config = SettingsConfigDict(
env_prefix=ENV_PREFIX,
extra="forbid",
)
instances: tuple[str, ...] = ()
backend: Literal["yolo_rtmpose"] = DEFAULT_BACKEND
device: str = "cuda"
nats_host: str = "nats://localhost:4222"
yolo_checkpoint: Path
yolo_conf_threshold: float = 0.6
pose_checkpoint: Path
pose_config_path: Path | None = None
bbox_area_threshold: PositiveInt = 50 * 50
max_batch_frames: PositiveInt = 8
max_batch_wait_ms: int = 4
slow_frame_budget_seconds: PositiveFloat = 1 / 22
@classmethod
def settings_customise_sources(
cls,
settings_cls: type[BaseSettings],
init_settings: PydanticBaseSettingsSource,
env_settings: PydanticBaseSettingsSource,
dotenv_settings: PydanticBaseSettingsSource,
file_secret_settings: PydanticBaseSettingsSource,
) -> tuple[PydanticBaseSettingsSource, ...]:
return (
env_settings,
init_settings,
dotenv_settings,
file_secret_settings,
)
@field_validator("instances", mode="before")
@classmethod
def _parse_instances(cls, value: object) -> object:
if isinstance(value, str):
return tuple(item.strip() for item in value.split(",") if item.strip())
return value
@field_validator("max_batch_wait_ms")
@classmethod
def _validate_wait_ms(cls, value: int) -> int:
if value < 0:
raise ValueError("max_batch_wait_ms must be non-negative.")
return value
@model_validator(mode="after")
def _resolve_pose_config(self) -> "DetectionRunnerConfig":
if self.pose_config_path is None:
self.pose_config_path = resolve_default_pose_config()
return self
def validate_runtime_paths(self) -> None:
missing: list[Path] = []
for candidate in (
self.yolo_checkpoint,
self.pose_checkpoint,
self.pose_config_path,
):
if candidate is None:
raise FileNotFoundError
if not candidate.exists():
missing.append(candidate)
if missing:
formatted = ", ".join(str(path) for path in missing)
raise click.ClickException(f"Missing runtime assets: {formatted}")
def load_detection_runner_config(config_path: Path | None) -> DetectionRunnerConfig:
config_data: dict[str, object] = {}
if config_path is not None:
with config_path.open("rb") as handle:
parsed = tomllib.load(handle)
if not isinstance(parsed, dict):
raise click.ClickException("Detection runner config must be a TOML table.")
config_data = parsed
try:
# TOML/env values are validated by Pydantic at construction.
return DetectionRunnerConfig(**cast(dict[str, Any], config_data))
except (ValidationError, ValueError, FileNotFoundError) as exc:
raise click.ClickException(str(exc)) from exc
def resolve_instances(
cli_instances: tuple[str, ...],
configured_instances: tuple[str, ...],
) -> tuple[str, ...]:
selected = cli_instances or configured_instances
if not selected:
raise click.ClickException(
"Provide at least one instance on the command line or via config `instances = [...]`."
)
unique_instances: list[str] = []
seen: set[str] = set()
for instance in selected:
if instance in seen:
raise click.ClickException(f"Duplicate instance requested: {instance}")
unique_instances.append(instance)
seen.add(instance)
return tuple(unique_instances)
@@ -0,0 +1,237 @@
"""cvmmap pose payload helpers.
The current `.pose` wire format is fixed-width for COCO-WholeBody-133 keypoints.
That is a protocol compatibility choice, not a tracker limitation: the tracker
normalizer accepts either `coco17` or `coco_wholebody133` because the first
17 body joints share the standard COCO ordering.
References:
- https://mmpose.readthedocs.io/en/latest/dataset_zoo/2d_wholebody_keypoint.html
- https://github.com/jin-s13/COCO-WholeBody
"""
import base64
import json
from dataclasses import dataclass
import numpy as np
from beartype import beartype
from pose_tracking_exp.common.normalization import normalize_coco_body20
from pose_tracking_exp.schema import CameraFrame, PoseDetection
from pose_tracking_exp.schema.detection import PoseDetections
PROTOCOL_HEADER = bytes([0x80]) + b"POSE"
COCO_WHOLEBODY_KEYPOINT_COUNT = 133
@dataclass(slots=True)
class DecodedPosePayload:
frame_index: int
reference_size: tuple[int, int]
timestamp_unix_ns: int
detections: tuple[PoseDetection, ...]
class CvmmapPosePayloadCodec:
def encode(self, detections: PoseDetections) -> bytes:
return encode_pose_payload(detections)
def _read_u8(payload: memoryview, offset: int) -> tuple[int, int]:
return int(payload[offset]), offset + 1
def _read_u16_array(payload: memoryview, offset: int, count: int) -> tuple[np.ndarray, int]:
size = count * 2
array = np.frombuffer(payload[offset : offset + size], dtype="<u2", count=count).astype(np.float64)
return array, offset + size
@beartype
def decode_pose_payload(payload: bytes) -> DecodedPosePayload:
if not payload.startswith(PROTOCOL_HEADER):
raise ValueError("Invalid cvmmap pose payload header.")
view = memoryview(payload)
offset = len(PROTOCOL_HEADER)
frame_index = int.from_bytes(view[offset : offset + 4], "little")
offset += 4
reference_size = tuple(int(x) for x in np.frombuffer(view[offset : offset + 4], dtype="<u2", count=2))
offset += 4
num_bbox = int(view[offset])
offset += 1
bbox_raw, offset = _read_u16_array(view, offset, num_bbox * 4)
bboxes = bbox_raw.reshape(num_bbox, 4) if num_bbox > 0 else np.zeros((0, 4), dtype=np.float64)
num_bbox_conf = int(view[offset])
offset += 1
bbox_confidence = np.frombuffer(view[offset : offset + num_bbox_conf], dtype=np.uint8, count=num_bbox_conf)
offset += num_bbox_conf
num_keypoints = int(view[offset])
offset += 1
keypoints_raw, offset = _read_u16_array(
view,
offset,
num_keypoints * COCO_WHOLEBODY_KEYPOINT_COUNT * 2,
)
keypoints_xy = (
keypoints_raw.reshape(num_keypoints, COCO_WHOLEBODY_KEYPOINT_COUNT, 2)
if num_keypoints > 0
else np.zeros((0, COCO_WHOLEBODY_KEYPOINT_COUNT, 2), dtype=np.float64)
)
num_keypoint_conf = int(view[offset])
offset += 1
keypoint_confidence_count = num_keypoint_conf * COCO_WHOLEBODY_KEYPOINT_COUNT
keypoint_confidence = (
np.frombuffer(
view[offset : offset + keypoint_confidence_count],
dtype=np.uint8,
count=keypoint_confidence_count,
).astype(np.float64)
/ 255.0
)
offset += keypoint_confidence_count
timestamp_unix_ns = int.from_bytes(view[offset : offset + 8], "little")
if num_keypoint_conf > 0 and num_keypoint_conf != num_keypoints:
raise ValueError("Unexpected keypoint confidence set count.")
detection_items: list[PoseDetection] = []
confidences = (
keypoint_confidence.reshape(num_keypoints, COCO_WHOLEBODY_KEYPOINT_COUNT)
if num_keypoints > 0
else np.zeros((0, COCO_WHOLEBODY_KEYPOINT_COUNT), dtype=np.float64)
)
for index in range(num_keypoints):
normalized = normalize_coco_body20(
keypoints_xy[index],
confidences[index],
keypoint_schema="coco_wholebody133",
)
bbox_score = float(bbox_confidence[index] / 255.0) if index < bbox_confidence.shape[0] else 0.0
bbox = bboxes[index] if index < bboxes.shape[0] else np.zeros(4, dtype=np.float64)
detection_items.append(
PoseDetection(
bbox=np.asarray(bbox, dtype=np.float64),
bbox_confidence=bbox_score,
keypoints=np.asarray(normalized, dtype=np.float64),
)
)
return DecodedPosePayload(
frame_index=frame_index,
reference_size=(reference_size[0], reference_size[1]),
timestamp_unix_ns=timestamp_unix_ns,
detections=tuple(detection_items),
)
@beartype
def encode_pose_payload(detections: PoseDetections) -> bytes:
detections.validate()
if detections.keypoint_schema != "coco_wholebody133":
raise ValueError(
"The cvmmap `.pose` payload currently requires `coco_wholebody133` keypoints."
)
frame_index_bytes = int(detections.frame_index).to_bytes(4, "little")
reference_size_bytes = np.asarray(detections.source_size, dtype=np.dtype("<u2")).tobytes()
num_bbox = int(detections.boxes_xyxy.shape[0])
num_bbox_bytes = num_bbox.to_bytes(1, "little")
bbox_bytes = np.ascontiguousarray(
detections.boxes_xyxy.astype(np.uint16),
dtype=np.dtype("<u2"),
).tobytes()
num_bbox_confidence_bytes = bytes([0])
bbox_confidence_bytes = bytes()
if detections.box_scores is not None:
num_bbox_confidence_bytes = int(detections.box_scores.shape[0]).to_bytes(1, "little")
bbox_confidence_bytes = np.ascontiguousarray(
np.clip(detections.box_scores * np.iinfo(np.uint8).max, 0, 255).astype(np.uint8),
dtype=np.dtype("<u1"),
).tobytes()
num_keypoints = int(detections.keypoints_xy.shape[0])
num_keypoints_bytes = num_keypoints.to_bytes(1, "little")
keypoints_bytes = np.ascontiguousarray(
detections.keypoints_xy.astype(np.uint16),
dtype=np.dtype("<u2"),
).tobytes()
num_keypoint_confidence_bytes = bytes([0])
keypoint_confidence_bytes = bytes()
if detections.keypoint_scores is not None:
num_keypoint_confidence_bytes = int(detections.keypoint_scores.shape[0]).to_bytes(1, "little")
keypoint_confidence_bytes = np.ascontiguousarray(
np.clip(detections.keypoint_scores * np.iinfo(np.uint8).max, 0, 255).astype(np.uint8),
dtype=np.dtype("<u1"),
).tobytes()
timestamp_unix_ns_bytes = int(detections.timestamp_unix_ns).to_bytes(8, "little")
return (
PROTOCOL_HEADER
+ frame_index_bytes
+ reference_size_bytes
+ num_bbox_bytes
+ bbox_bytes
+ num_bbox_confidence_bytes
+ bbox_confidence_bytes
+ num_keypoints_bytes
+ keypoints_bytes
+ num_keypoint_confidence_bytes
+ keypoint_confidence_bytes
+ timestamp_unix_ns_bytes
)
@beartype
def frame_from_payload(camera_name: str, payload: bytes) -> CameraFrame:
decoded = decode_pose_payload(payload)
return CameraFrame(
camera_name=camera_name,
frame_index=decoded.frame_index,
timestamp_unix_ns=decoded.timestamp_unix_ns,
detections=decoded.detections,
source_size=decoded.reference_size,
)
@beartype
def convert_payload_record(record: dict[str, object]) -> dict[str, object]:
camera_name = str(record["camera"])
payload_b64 = str(record["payload_b64"])
frame = frame_from_payload(camera_name, base64.b64decode(payload_b64))
return {
"camera": frame.camera_name,
"frame_index": frame.frame_index,
"timestamp_unix_ns": frame.timestamp_unix_ns,
"source_size": list(frame.source_size),
"detections": [
{
"bbox": detection.bbox.tolist(),
"bbox_confidence": detection.bbox_confidence,
"keypoints": detection.keypoints.tolist(),
}
for detection in frame.detections
],
}
@beartype
def convert_payload_jsonl_lines(lines: list[str]) -> list[str]:
output_lines: list[str] = []
for line in lines:
if not line.strip():
continue
record = json.loads(line)
converted = convert_payload_record(record)
output_lines.append(json.dumps(converted))
return output_lines
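# Illustrative round trip, assuming `pose_detections` is a valid
# `coco_wholebody133` PoseDetections instance:
#
#   payload = encode_pose_payload(pose_detections)
#   decoded = decode_pose_payload(payload)
#   assert decoded.frame_index == pose_detections.frame_index
#   assert len(decoded.detections) == pose_detections.keypoints_xy.shape[0]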
@@ -0,0 +1,3 @@
from pose_tracking_exp.detection.sources.cvmmap import CvmmapFrameSource
__all__ = ["CvmmapFrameSource"]
@@ -0,0 +1,21 @@
import click
from pose_tracking_exp.detection.config import DEFAULT_BACKEND, DetectionRunnerConfig
from pose_tracking_exp.detection.protocols import PoseShim
from pose_tracking_exp.detection.yolo_rtmpose import build_yolo_rtmpose_shim
def build_pose_shim(config: DetectionRunnerConfig) -> PoseShim:
if config.backend == DEFAULT_BACKEND:
if config.pose_config_path is None:
raise click.ClickException("pose_config_path must be resolved before building the backend.")
return build_yolo_rtmpose_shim(
yolo_checkpoint=config.yolo_checkpoint,
yolo_conf_threshold=config.yolo_conf_threshold,
pose_checkpoint=config.pose_checkpoint,
pose_config_path=config.pose_config_path,
device=config.device,
max_batch_frames=config.max_batch_frames,
bbox_area_threshold=config.bbox_area_threshold,
)
raise click.ClickException(f"Unsupported detection backend: {config.backend}")
@@ -0,0 +1,3 @@
from pose_tracking_exp.detection.sinks.nats import NatsPoseSink
__all__ = ["NatsPoseSink"]
@@ -0,0 +1,49 @@
from collections.abc import AsyncIterator, Sequence
from typing import Protocol
import numpy as np
from pose_tracking_exp.schema.detection import BoxDetections, PoseBatchRequest, PoseDetections, SourceFrame
class FrameSource(Protocol):
source_name: str
def frames(self) -> AsyncIterator[SourceFrame]:
...
class ObjectDetector(Protocol):
def detect_many(
self,
frames_rgb: Sequence[np.ndarray],
*,
classes: Sequence[int] | None = None,
) -> list[BoxDetections]:
...
class PoseEstimator(Protocol):
def estimate_batch(
self,
requests: Sequence[PoseBatchRequest],
) -> list[tuple[np.ndarray, np.ndarray]]:
...
class PoseShim(Protocol):
def process_many(self, frames: Sequence[SourceFrame]) -> list[PoseDetections]:
...
class PosePayloadCodec(Protocol):
def encode(self, detections: PoseDetections) -> bytes:
...
class PoseSink(Protocol):
async def publish_pose(self, detections: PoseDetections) -> None:
...
async def aclose(self) -> None:
...
@@ -0,0 +1,238 @@
from dataclasses import dataclass
from time import perf_counter
import anyio
from anyio.to_thread import run_sync as to_thread_run_sync
from loguru import logger
from pose_tracking_exp.detection.config import DetectionRunnerConfig
from pose_tracking_exp.detection.protocols import FrameSource, PoseShim, PoseSink
from pose_tracking_exp.schema.detection import SourceFrame
PERFORMANCE_WINDOW = 60
@dataclass(slots=True)
class PendingFrame:
source_name: str
frame: SourceFrame
@dataclass(slots=True)
class SourceSlot:
source_name: str
pending_frame: PendingFrame | None = None
last_seen_frame_index: int | None = None
received_frames: int = 0
dropped_frames: int = 0
processed_frames: int = 0
published_frames: int = 0
closed: bool = False
def store_latest_frame(slot: SourceSlot, frame: SourceFrame) -> None:
slot.received_frames += 1
# Latest-wins: overwrite any unprocessed frame and count it as dropped.
if slot.pending_frame is not None:
slot.dropped_frames += 1
slot.pending_frame = PendingFrame(source_name=slot.source_name, frame=frame)
def pending_source_count(slots: dict[str, SourceSlot]) -> int:
return sum(slot.pending_frame is not None for slot in slots.values())
def take_pending_batch(
slots: dict[str, SourceSlot],
max_batch_frames: int,
) -> list[PendingFrame]:
batch: list[PendingFrame] = []
for slot in slots.values():
if slot.pending_frame is None:
continue
batch.append(slot.pending_frame)
slot.pending_frame = None
if len(batch) >= max_batch_frames:
break
return batch
def all_sources_closed_and_idle(slots: dict[str, SourceSlot]) -> bool:
return all(slot.closed and slot.pending_frame is None for slot in slots.values())
class SimpleMovingAverage:
"""Cumulative mean for the first `window` samples, then a running (exponential-style) approximation of a windowed mean that avoids buffering samples."""
def __init__(self, window: int) -> None:
self._window = window
self._sum = 0.0
self._size = 0
self._value: float | None = None
def next(self, value: float) -> float:
if self._size < self._window:
self._sum += value
self._size += 1
self._value = self._sum / self._size
else:
self._sum -= self._sum / self._window
self._sum += value
self._value = self._sum / self._window
return float(self._value)
def get(self) -> float | None:
return self._value
async def run_detection_runner(
sources: tuple[FrameSource, ...],
pose_shim: PoseShim,
pose_sink: PoseSink,
config: DetectionRunnerConfig,
) -> None:
performance_sma = SimpleMovingAverage(PERFORMANCE_WINDOW)
batch_size_sma = SimpleMovingAverage(PERFORMANCE_WINDOW)
scheduler_condition = anyio.Condition()
slots = {
source.source_name: SourceSlot(source_name=source.source_name) for source in sources
}
inference_limiter = anyio.CapacityLimiter(1)
async def ingest_loop(source: FrameSource) -> None:
logger.info(
"[{}] source initialized; waiting for first frame metadata",
source.source_name,
)
try:
async for frame in source.frames():
should_log_init = False
previous_frame_index: int | None = None
async with scheduler_condition:
slot = slots[source.source_name]
previous_frame_index = slot.last_seen_frame_index
should_log_init = previous_frame_index is None
slot.last_seen_frame_index = frame.frame_index
store_latest_frame(slot, frame)
scheduler_condition.notify_all()
if should_log_init:
logger.info(
"[{}] initialized with frame shape={}x{} frame_index={}",
source.source_name,
frame.image_bgr.shape[1],
frame.image_bgr.shape[0],
frame.frame_index,
)
elif previous_frame_index is not None and frame.frame_index != previous_frame_index + 1:
logger.warning(
"[{}] skip frame detected: {} -> {}",
source.source_name,
previous_frame_index,
frame.frame_index,
)
finally:
async with scheduler_condition:
slots[source.source_name].closed = True
scheduler_condition.notify_all()
logger.info("[{}] source closed", source.source_name)
async def scheduler_loop() -> None:
while True:
async with scheduler_condition:
while pending_source_count(slots) == 0:
if all_sources_closed_and_idle(slots):
return
await scheduler_condition.wait()
if (
pending_source_count(slots) < config.max_batch_frames
and config.max_batch_wait_ms > 0
and not all_sources_closed_and_idle(slots)
):
with anyio.move_on_after(config.max_batch_wait_ms / 1000):
while (
pending_source_count(slots) < config.max_batch_frames
and not all_sources_closed_and_idle(slots)
):
await scheduler_condition.wait()
batch = take_pending_batch(slots, config.max_batch_frames)
start = perf_counter()
pose_infos = await to_thread_run_sync(
pose_shim.process_many,
[item.frame for item in batch],
limiter=inference_limiter,
)
elapsed = perf_counter() - start
average_elapsed = elapsed / len(batch)
performance_sma.next(average_elapsed)
batch_size_sma.next(float(len(batch)))
if average_elapsed > config.slow_frame_budget_seconds:
logger.warning(
"slow batch: size={} total={:.2f}ms avg={:.2f}ms",
len(batch),
elapsed * 1000,
average_elapsed * 1000,
)
for pending_frame, pose_info in zip(batch, pose_infos, strict=True):
slot = slots[pending_frame.source_name]
slot.processed_frames += 1
await pose_sink.publish_pose(pose_info)
slot.published_frames += 1
if pose_info.boxes_xyxy.shape[0] == 0:
logger.debug(
"[{}:{}] no detections",
pending_frame.source_name,
pending_frame.frame.frame_index,
)
async def log_performance() -> None:
while True:
await anyio.sleep(5)
async with scheduler_condition:
if all_sources_closed_and_idle(slots):
return
slot_snapshot = {
source_name: (
slot.received_frames,
slot.dropped_frames,
slot.processed_frames,
slot.published_frames,
)
for source_name, slot in slots.items()
}
per_source = " ".join(
(
f"[{source_name}]"
f" recv={received}"
f" drop={dropped}"
f" proc={processed}"
f" pub={published}"
)
for source_name, (received, dropped, processed, published) in slot_snapshot.items()
)
if value := performance_sma.get():
batch_size = batch_size_sma.get() or 1.0
logger.info(
"{:.2f}it/s ({:.2f}ms/frame) batch={:.2f} {}",
1 / value,
value * 1000,
batch_size,
per_source,
)
else:
logger.info("warming up {}", per_source)
try:
async with anyio.create_task_group() as task_group:
for source in sources:
task_group.start_soon(ingest_loop, source)
task_group.start_soon(log_performance)
await scheduler_loop()
task_group.cancel_scope.cancel()
finally:
await pose_sink.aclose()
@@ -0,0 +1,7 @@
from pose_tracking_exp.detection.sinks.nats import NatsPoseSink
from pose_tracking_exp.detection.sinks.parquet import ParquetPoseSink
__all__ = [
"NatsPoseSink",
"ParquetPoseSink",
]
@@ -0,0 +1,29 @@
from pose_tracking_exp.detection.cvmmap_payload import CvmmapPosePayloadCodec
from pose_tracking_exp.schema.detection import PoseDetections
class NatsPoseSink:
def __init__(self, nats_host: str) -> None:
self._nats_host = nats_host
self._client = None
self._codec = CvmmapPosePayloadCodec()
async def _client_or_connect(self):
if self._client is None:
from nats.aio.client import Client as NatsClient
client = NatsClient()
await client.connect(servers=[self._nats_host])
self._client = client
return self._client
async def publish_pose(self, detections: PoseDetections) -> None:
client = await self._client_or_connect()
payload = self._codec.encode(detections)
await client.publish(f"{detections.source_name}.pose", payload)
async def aclose(self) -> None:
if self._client is None:
return
await self._client.drain()
self._client = None
@@ -0,0 +1,51 @@
from pathlib import Path
import pyarrow as pa
import pyarrow.parquet as pq
from pose_tracking_exp.common.detection_parquet import (
DETECTION_PARQUET_SCHEMA,
detection_parquet_path,
pose_detections_to_row,
)
from pose_tracking_exp.schema.detection import PoseDetections
class ParquetPoseSink:
def __init__(self, output_dir: Path, *, flush_rows: int = 64) -> None:
self._output_dir = output_dir
self._flush_rows = flush_rows
self._buffers: dict[str, list[dict[str, object]]] = {}
self._writers: dict[str, pq.ParquetWriter] = {}
self._output_dir.mkdir(parents=True, exist_ok=True)
def _writer_for(self, source_name: str) -> pq.ParquetWriter:
writer = self._writers.get(source_name)
if writer is not None:
return writer
path = detection_parquet_path(self._output_dir, source_name)
writer = pq.ParquetWriter(path, DETECTION_PARQUET_SCHEMA)
self._writers[source_name] = writer
return writer
def _flush_source(self, source_name: str) -> None:
rows = self._buffers.get(source_name)
if not rows:
return
table = pa.Table.from_pylist(rows, schema=DETECTION_PARQUET_SCHEMA)
self._writer_for(source_name).write_table(table)
rows.clear()
async def publish_pose(self, detections: PoseDetections) -> None:
rows = self._buffers.setdefault(detections.source_name, [])
rows.append(pose_detections_to_row(detections))
if len(rows) >= self._flush_rows:
self._flush_source(detections.source_name)
async def aclose(self) -> None:
for source_name in tuple(self._buffers):
self._flush_source(source_name)
for writer in self._writers.values():
writer.close()
self._writers.clear()
@@ -0,0 +1,10 @@
from pose_tracking_exp.detection.sources.adapters import IteratorFrameSource
from pose_tracking_exp.detection.sources.cvmmap import CvmmapFrameSource
from pose_tracking_exp.detection.sources.video import VideoFrameSource, parse_video_input_specs
__all__ = [
"CvmmapFrameSource",
"IteratorFrameSource",
"VideoFrameSource",
"parse_video_input_specs",
]
@@ -0,0 +1,47 @@
from collections.abc import AsyncIterator, Callable, Iterator
from typing import Protocol
from anyio.to_thread import run_sync as to_thread_run_sync
from pose_tracking_exp.schema.detection import SourceFrame
class BlockingFrameProducer(Protocol):
source_name: str
def iter_frames(self) -> Iterator[SourceFrame]:
...
def _next_or_none(iterator: Iterator[SourceFrame]) -> SourceFrame | None:
return next(iterator, None)
class IteratorFrameSource:
def __init__(
self,
source_name: str,
iterator_factory: Callable[[], Iterator[SourceFrame]],
) -> None:
self.source_name = source_name
self._iterator_factory = iterator_factory
async def frames(self) -> AsyncIterator[SourceFrame]:
iterator = self._iterator_factory()
try:
while True:
frame = await to_thread_run_sync(_next_or_none, iterator)
if frame is None:
return
yield frame
finally:
close = getattr(iterator, "close", None)
if callable(close):
await to_thread_run_sync(close)
def wrap_blocking_source(producer: BlockingFrameProducer) -> IteratorFrameSource:
return IteratorFrameSource(
source_name=producer.source_name,
iterator_factory=producer.iter_frames,
)
@@ -0,0 +1,22 @@
from collections.abc import AsyncIterator
import numpy as np
from pose_tracking_exp.schema.detection import SourceFrame
class CvmmapFrameSource:
def __init__(self, source_name: str) -> None:
self.source_name = source_name
async def frames(self) -> AsyncIterator[SourceFrame]:
from cvmmap import CvMmapClient
client = CvMmapClient(self.source_name)
async for frame, meta in client:
yield SourceFrame(
source_name=self.source_name,
image_bgr=np.array(frame, copy=True),
frame_index=meta.frame_count,
timestamp_unix_ns=meta.timestamp_ns,
)
@@ -0,0 +1,83 @@
from collections.abc import AsyncIterator, Iterator, Sequence
from pathlib import Path
import click
import cv2
import numpy as np
from pose_tracking_exp.detection.sources.adapters import wrap_blocking_source
from pose_tracking_exp.schema.detection import SourceFrame
_DEFAULT_VIDEO_FPS = 30.0
def parse_video_input_specs(specs: Sequence[str]) -> tuple[tuple[str, Path], ...]:
inputs: list[tuple[str, Path]] = []
seen: set[str] = set()
for spec in specs:
source_name, separator, raw_path = spec.partition("=")
if separator == "" or not source_name or not raw_path:
raise click.ClickException(
f"Video input must be in source=path form, got: {spec!r}"
)
if source_name in seen:
raise click.ClickException(f"Duplicate video source requested: {source_name}")
path = Path(raw_path).expanduser().resolve()
if not path.exists():
raise click.ClickException(f"Missing video input: {path}")
inputs.append((source_name, path))
seen.add(source_name)
if not inputs:
raise click.ClickException("Provide at least one --input source=path entry.")
return tuple(inputs)
class VideoFrameSource:
def __init__(
self,
video_path: Path,
*,
source_name: str | None = None,
default_fps: float = _DEFAULT_VIDEO_FPS,
) -> None:
self.video_path = video_path
self.source_name = source_name or video_path.stem
self._default_fps = default_fps
self._adapter = wrap_blocking_source(self)
async def frames(self) -> AsyncIterator[SourceFrame]:
async for frame in self._adapter.frames():
yield frame
def iter_frames(self) -> Iterator[SourceFrame]:
capture = cv2.VideoCapture(str(self.video_path))
if not capture.isOpened():
capture.release()
raise click.ClickException(f"Could not open video input: {self.video_path}")
fps = float(capture.get(cv2.CAP_PROP_FPS))
if not np.isfinite(fps) or fps <= 0:
fps = self._default_fps
frame_index = 0
try:
while True:
success, frame = capture.read()
if not success or frame is None:
return
pos_msec = float(capture.get(cv2.CAP_PROP_POS_MSEC))
if np.isfinite(pos_msec) and (pos_msec > 0.0 or frame_index == 0):
timestamp_unix_ns = int(round(pos_msec * 1_000_000.0))
else:
timestamp_unix_ns = int(round((frame_index / fps) * 1_000_000_000.0))
yield SourceFrame(
source_name=self.source_name,
image_bgr=np.ascontiguousarray(frame),
frame_index=frame_index,
timestamp_unix_ns=timestamp_unix_ns,
)
frame_index += 1
finally:
capture.release()
@@ -0,0 +1,263 @@
from contextlib import contextmanager
from collections.abc import Sequence
from pathlib import Path
from typing import Any, cast
import cv2
import numpy as np
from pose_tracking_exp.detection.protocols import ObjectDetector, PoseEstimator
from pose_tracking_exp.schema.detection import BoxDetections, PoseBatchRequest, PoseDetections, SourceFrame
COCO_PERSON_CLASS_ID = 0
class YoloObjectDetector:
def __init__(
self,
checkpoint: Path,
*,
device: str,
conf_threshold: float,
max_batch_frames: int,
) -> None:
import ultralytics
self._model: Any = ultralytics.YOLO(str(checkpoint))
self._device = device
self._conf_threshold = conf_threshold
self._max_batch_frames = max_batch_frames
def detect_many(
self,
frames_rgb: Sequence[np.ndarray],
*,
classes: Sequence[int] | None = None,
) -> list[BoxDetections]:
if not frames_rgb:
return []
frames_list = list(frames_rgb)
results = self._model(
frames_list,
conf=self._conf_threshold,
device=self._device,
classes=classes,
batch=min(self._max_batch_frames, len(frames_list)),
verbose=False,
)
detections: list[BoxDetections] = []
for frame_rgb, result in zip(frames_list, results, strict=True):
boxes = result.boxes
if boxes is None:
detections.append(
BoxDetections(
boxes_xyxy=np.empty((0, 4), dtype=np.float32),
scores=np.empty((0,), dtype=np.float32),
reference_frame_shape=(frame_rgb.shape[0], frame_rgb.shape[1]),
)
)
continue
detections.append(
BoxDetections(
boxes_xyxy=boxes.xyxy.cpu().numpy(),
scores=boxes.conf.cpu().numpy(),
reference_frame_shape=(frame_rgb.shape[0], frame_rgb.shape[1]),
)
)
return detections
@contextmanager
def legacy_torch_checkpoint_loading():
"""Temporarily make torch.load default to weights_only=False so legacy pickled checkpoints still load under newer torch defaults."""
import torch
original_torch_load = torch.load
def patched_torch_load(*args, **kwargs):
kwargs.setdefault("weights_only", False)
return original_torch_load(*args, **kwargs)
torch.load = patched_torch_load
try:
yield
finally:
torch.load = original_torch_load
class WholeBodyPoseEstimator:
def __init__(self, config_path: Path, checkpoint_path: Path, *, device: str) -> None:
from mmengine.dataset import Compose, pseudo_collate
from mmengine.registry import init_default_scope
from mmpose.apis import init_model
self._compose = Compose
self._pseudo_collate = pseudo_collate
self._init_default_scope = init_default_scope
with legacy_torch_checkpoint_loading():
self._model: Any = init_model(str(config_path), str(checkpoint_path), device=device)
model_cfg = cast(Any, self._model.cfg)
self._scope = cast(str | None, model_cfg.get("default_scope", "mmpose"))
self._pipeline = self._compose(cast(Any, model_cfg.test_dataloader.dataset.pipeline))
def estimate_batch(
self,
requests: Sequence[PoseBatchRequest],
) -> list[tuple[np.ndarray, np.ndarray]]:
import torch
if not requests:
return []
if self._scope is not None:
self._init_default_scope(self._scope)
torch_module = cast(Any, torch)
data_list = []
detection_counts: list[int] = []
for request in requests:
boxes = np.asarray(request.boxes_xyxy, dtype=np.float32)
detections = int(boxes.shape[0])
detection_counts.append(detections)
for bbox in boxes:
data_info = {
"img": request.image_rgb,
"bbox": bbox[None],
"bbox_score": np.ones(1, dtype=np.float32),
}
data_info.update(cast(Any, self._model.dataset_meta))
data_list.append(self._pipeline(data_info))
samples = []
if data_list:
batch = self._pseudo_collate(data_list)
with torch_module.no_grad():
samples = self._model.test_step(batch)
outputs: list[tuple[np.ndarray, np.ndarray]] = []
offset = 0
for detections in detection_counts:
keypoints = np.zeros((detections, 133, 2), dtype=np.float32)
scores = np.zeros((detections, 133), dtype=np.float32)
for index in range(detections):
pred_instances = samples[offset + index].pred_instances
try:
keypoints[index] = np.asarray(pred_instances.keypoints[0], dtype=np.float32)
scores[index] = np.asarray(
pred_instances.keypoint_scores[0],
dtype=np.float32,
)
except IndexError:
continue
outputs.append((keypoints, scores))
offset += detections
return outputs
class YoloRtmposeShim:
def __init__(
self,
object_detector: ObjectDetector,
pose_estimator: PoseEstimator,
*,
bbox_area_threshold: int,
) -> None:
self._object_detector = object_detector
self._pose_estimator = pose_estimator
self._bbox_area_threshold = bbox_area_threshold
def process_many(self, frames: Sequence[SourceFrame]) -> list[PoseDetections]:
if not frames:
return []
frames_rgb = [
cv2.cvtColor(frame.image_bgr, cv2.COLOR_BGR2RGB)
for frame in frames
]
detections = self._object_detector.detect_many(
frames_rgb,
classes=[COCO_PERSON_CLASS_ID],
)
results = [
PoseDetections(
source_name=frame.source_name,
frame_index=frame.frame_index,
source_size=(frame.image_bgr.shape[1], frame.image_bgr.shape[0]),
boxes_xyxy=np.empty((0, 4), dtype=np.float32),
box_scores=np.empty((0,), dtype=np.float32),
keypoints_xy=np.empty((0, 133, 2), dtype=np.float32),
keypoint_scores=np.empty((0, 133), dtype=np.float32),
timestamp_unix_ns=frame.timestamp_unix_ns,
keypoint_schema="coco_wholebody133",
)
for frame in frames
]
pose_requests: list[PoseBatchRequest] = []
detection_mapping: list[tuple[int, BoxDetections]] = []
for index, (frame, frame_rgb, detection_result) in enumerate(
zip(frames, frames_rgb, detections, strict=True)
):
filtered_result = detection_result.filter_by_area(self._bbox_area_threshold)
if filtered_result.boxes_num == 0:
continue
pose_requests.append(
PoseBatchRequest(
image_rgb=frame_rgb,
boxes_xyxy=filtered_result.boxes_xyxy,
)
)
detection_mapping.append((index, filtered_result))
pose_outputs = self._pose_estimator.estimate_batch(pose_requests)
for (frame_index, detection_result), (keypoints, keypoint_scores) in zip(
detection_mapping,
pose_outputs,
strict=True,
):
source_frame = frames[frame_index]
results[frame_index] = PoseDetections(
source_name=source_frame.source_name,
frame_index=source_frame.frame_index,
source_size=detection_result.reference_size,
boxes_xyxy=detection_result.boxes_xyxy,
box_scores=detection_result.scores,
keypoints_xy=keypoints,
keypoint_scores=keypoint_scores,
timestamp_unix_ns=source_frame.timestamp_unix_ns,
keypoint_schema="coco_wholebody133",
)
return results
def build_yolo_rtmpose_shim(
*,
yolo_checkpoint: Path,
yolo_conf_threshold: float,
pose_checkpoint: Path,
pose_config_path: Path,
device: str,
max_batch_frames: int,
bbox_area_threshold: int,
) -> YoloRtmposeShim:
object_detector = YoloObjectDetector(
yolo_checkpoint,
device=device,
conf_threshold=yolo_conf_threshold,
max_batch_frames=max_batch_frames,
)
pose_estimator = WholeBodyPoseEstimator(
pose_config_path,
pose_checkpoint,
device=device,
)
return YoloRtmposeShim(
object_detector,
pose_estimator,
bbox_area_threshold=bbox_area_threshold,
)
@@ -1,224 +0,0 @@
from dataclasses import dataclass, field
from pathlib import Path
from typing import Literal
import cv2
import numpy as np
from pose_tracking_exp.tensor_types import Matrix3, Pose2D, Pose3D, Vector3
@dataclass(slots=True)
class CameraCalibration:
name: str
width: int
height: int
K: Matrix3
DC: np.ndarray
# Canonical in-repo convention: OpenCV world->camera extrinsics.
R: Matrix3
T: Vector3
model: str = "pinhole"
rvec: np.ndarray | None = None
pose_R: Matrix3 = field(init=False)
pose_T: Vector3 = field(init=False)
def __post_init__(self) -> None:
self.K = np.asarray(self.K, dtype=np.float64).reshape(3, 3)
self.DC = np.asarray(self.DC, dtype=np.float64).reshape(-1)
self.R = np.asarray(self.R, dtype=np.float64).reshape(3, 3)
self.T = np.asarray(self.T, dtype=np.float64).reshape(3)
if self.rvec is None:
rvec, _ = cv2.Rodrigues(self.R)
self.rvec = np.asarray(rvec, dtype=np.float64).reshape(3)
else:
self.rvec = np.asarray(self.rvec, dtype=np.float64).reshape(3)
self.pose_R = self.R.T
self.pose_T = -(self.pose_R @ self.T)
@classmethod
def from_opencv_extrinsics(
cls,
*,
name: str,
width: int,
height: int,
K: Matrix3,
DC: np.ndarray,
R: Matrix3,
T: Vector3,
model: str = "pinhole",
rvec: np.ndarray | None = None,
) -> "CameraCalibration":
return cls(
name=name,
width=width,
height=height,
K=K,
DC=DC,
R=R,
T=T,
model=model,
rvec=rvec,
)
@classmethod
def from_rpt_pose(
cls,
*,
name: str,
width: int,
height: int,
K: Matrix3,
DC: np.ndarray,
R: Matrix3,
T: Vector3,
model: str = "pinhole",
) -> "CameraCalibration":
pose_R = np.asarray(R, dtype=np.float64).reshape(3, 3)
pose_T = np.asarray(T, dtype=np.float64).reshape(3)
rotation = pose_R.T
translation = -(rotation @ pose_T)
rvec, _ = cv2.Rodrigues(rotation)
return cls(
name=name,
width=width,
height=height,
K=K,
DC=DC,
R=rotation,
T=translation,
model=model,
rvec=np.asarray(rvec, dtype=np.float64).reshape(3),
)
@dataclass(slots=True)
class SceneConfig:
room_size: Vector3
room_center: Vector3
cameras: tuple[CameraCalibration, ...]
@dataclass(slots=True)
class PoseDetection:
bbox: np.ndarray
bbox_confidence: float
keypoints: Pose2D
@dataclass(slots=True)
class CameraFrame:
camera_name: str
frame_index: int
timestamp_unix_ns: int
detections: tuple[PoseDetection, ...]
source_size: tuple[int, int]
@dataclass(slots=True)
class FrameBundle:
bundle_index: int
timestamp_unix_ns: int
views: tuple[CameraFrame, ...]
@dataclass(slots=True)
class ReplaySequence:
scene_path: Path
replay_path: Path
frames_by_camera: dict[str, list[CameraFrame]]
@dataclass(slots=True)
class ProposalCluster:
pose3d: Pose3D
root: Vector3
source_views: frozenset[str]
support_size: int
mean_score: float
@dataclass(slots=True)
class SkeletonState:
parameters: np.ndarray
beta: np.ndarray
pose3d: Pose3D
@dataclass(slots=True)
class TentativeTrackState:
track_id: int
state: Literal["tentative"] = "tentative"
age: int = 0
misses: int = 0
score: float = 0.0
last_bundle_index: int = -1
root: Vector3 = field(default_factory=lambda: np.zeros(3, dtype=np.float64))
pose3d: Pose3D = field(default_factory=lambda: np.zeros((20, 4), dtype=np.float64))
evidence_buffer: list[Pose3D] = field(default_factory=list)
@dataclass(slots=True)
class ActiveTrackState:
track_id: int
status: Literal["active", "lost"] = "active"
misses: int = 0
lost_age: int = 0
score: float = 0.0
last_bundle_index: int = -1
skeleton: SkeletonState = field(
default_factory=lambda: SkeletonState(
parameters=np.zeros(31, dtype=np.float64),
beta=np.ones(8, dtype=np.float64),
pose3d=np.zeros((20, 4), dtype=np.float64),
)
)
noise_scale: np.ndarray = field(default_factory=lambda: np.full((20,), 9.0, dtype=np.float64))
TrackState = TentativeTrackState | ActiveTrackState
@dataclass(slots=True)
class TrackedFrameResult:
bundle_index: int
timestamp_unix_ns: int
tentative_tracks: tuple[TentativeTrackState, ...]
active_tracks: tuple[ActiveTrackState, ...]
lost_tracks: tuple[ActiveTrackState, ...]
proposals: tuple[ProposalCluster, ...]
@dataclass(slots=True)
class TrackerDiagnostics:
match_existing_calls: int = 0
match_existing_seconds: float = 0.0
proposal_build_calls: int = 0
proposal_build_seconds: float = 0.0
promotions: int = 0
reacquisitions: int = 0
active_updates: int = 0
seed_initializations: int = 0
nonlinear_refinements: int = 0
@dataclass(slots=True)
class TrackerConfig:
mode: Literal["general", "single_person"] = "general"
min_bundle_views: int = 2
max_sync_skew_ns: int = 12_000_000
tentative_buffer_size: int = 5
tentative_min_age: int = 3
tentative_hits_required: int = 3
tentative_promote_score: float = 3.0
tentative_max_misses: int = 2
active_min_views: int = 2
active_core_gate_px: float = 80.0
active_joint_gate_px: float = 120.0
active_miss_to_lost: int = 3
lost_delete_age: int = 15
proposal_match_distance_m: float = 0.45
noise_ema: float = 0.85
proposal_min_score: float = 0.9
proposal_min_group_size: int = 1
@@ -1,147 +0,0 @@
import base64
import json
from dataclasses import dataclass
import numpy as np
from beartype import beartype
from pose_tracking_exp.models import CameraFrame, PoseDetection
from pose_tracking_exp.normalization import normalize_rtmpose_body20
PROTOCOL_HEADER = bytes([0x80]) + b"POSE"
POSE_JOINT_COUNT = 133
@dataclass(slots=True)
class DecodedPosePayload:
frame_index: int
reference_size: tuple[int, int]
timestamp_unix_ns: int
detections: tuple[PoseDetection, ...]
def _read_u8(payload: memoryview, offset: int) -> tuple[int, int]:
return int(payload[offset]), offset + 1
def _read_u16_array(payload: memoryview, offset: int, count: int) -> tuple[np.ndarray, int]:
size = count * 2
array = np.frombuffer(payload[offset : offset + size], dtype="<u2", count=count).astype(np.float64)
return array, offset + size
@beartype
def decode_pose_payload(payload: bytes) -> DecodedPosePayload:
if not payload.startswith(PROTOCOL_HEADER):
raise ValueError("Invalid ParaJumping pose payload header.")
view = memoryview(payload)
offset = len(PROTOCOL_HEADER)
frame_index = int.from_bytes(view[offset : offset + 4], "little")
offset += 4
reference_size = tuple(int(x) for x in np.frombuffer(view[offset : offset + 4], dtype="<u2", count=2))
offset += 4
num_bbox = int(view[offset])
offset += 1
bbox_raw, offset = _read_u16_array(view, offset, num_bbox * 4)
bboxes = bbox_raw.reshape(num_bbox, 4) if num_bbox > 0 else np.zeros((0, 4), dtype=np.float64)
num_bbox_conf = int(view[offset])
offset += 1
bbox_confidence = np.frombuffer(view[offset : offset + num_bbox_conf], dtype=np.uint8, count=num_bbox_conf)
offset += num_bbox_conf
num_keypoints = int(view[offset])
offset += 1
keypoints_raw, offset = _read_u16_array(view, offset, num_keypoints * POSE_JOINT_COUNT * 2)
keypoints_xy = (
keypoints_raw.reshape(num_keypoints, POSE_JOINT_COUNT, 2)
if num_keypoints > 0
else np.zeros((0, POSE_JOINT_COUNT, 2), dtype=np.float64)
)
num_keypoint_conf = int(view[offset])
offset += 1
keypoint_confidence = (
np.frombuffer(view[offset : offset + num_keypoint_conf], dtype=np.uint8, count=num_keypoint_conf).astype(np.float64)
/ 255.0
)
offset += num_keypoint_conf
timestamp_unix_ns = int.from_bytes(view[offset : offset + 8], "little")
if num_keypoint_conf > 0 and num_keypoint_conf != num_keypoints * POSE_JOINT_COUNT:
raise ValueError("Unexpected keypoint confidence payload length.")
detection_items: list[PoseDetection] = []
confidences = (
keypoint_confidence.reshape(num_keypoints, POSE_JOINT_COUNT)
if num_keypoints > 0
else np.zeros((0, POSE_JOINT_COUNT), dtype=np.float64)
)
for index in range(num_keypoints):
normalized = normalize_rtmpose_body20(keypoints_xy[index], confidences[index])
bbox_score = float(bbox_confidence[index] / 255.0) if index < bbox_confidence.shape[0] else 0.0
bbox = bboxes[index] if index < bboxes.shape[0] else np.zeros(4, dtype=np.float64)
detection_items.append(
PoseDetection(
bbox=np.asarray(bbox, dtype=np.float64),
bbox_confidence=bbox_score,
keypoints=np.asarray(normalized, dtype=np.float64),
)
)
return DecodedPosePayload(
frame_index=frame_index,
reference_size=(reference_size[0], reference_size[1]),
timestamp_unix_ns=timestamp_unix_ns,
detections=tuple(detection_items),
)
@beartype
def frame_from_payload(camera_name: str, payload: bytes) -> CameraFrame:
decoded = decode_pose_payload(payload)
return CameraFrame(
camera_name=camera_name,
frame_index=decoded.frame_index,
timestamp_unix_ns=decoded.timestamp_unix_ns,
detections=decoded.detections,
source_size=decoded.reference_size,
)
@beartype
def convert_payload_record(record: dict[str, object]) -> dict[str, object]:
camera_name = str(record["camera"])
payload_b64 = str(record["payload_b64"])
frame = frame_from_payload(camera_name, base64.b64decode(payload_b64))
return {
"camera": frame.camera_name,
"frame_index": frame.frame_index,
"timestamp_unix_ns": frame.timestamp_unix_ns,
"source_size": list(frame.source_size),
"detections": [
{
"bbox": detection.bbox.tolist(),
"bbox_confidence": detection.bbox_confidence,
"keypoints": detection.keypoints.tolist(),
}
for detection in frame.detections
],
}
@beartype
def convert_payload_jsonl_lines(lines: list[str]) -> list[str]:
output_lines: list[str] = []
for line in lines:
if not line.strip():
continue
record = json.loads(line)
converted = convert_payload_record(record)
output_lines.append(json.dumps(converted))
return output_lines
-108
View File
@@ -1,108 +0,0 @@
import json
from pathlib import Path
import numpy as np
from beartype import beartype
from pose_tracking_exp.models import CameraCalibration, CameraFrame, PoseDetection, ReplaySequence, SceneConfig
_OPENCV_EXTRINSICS = "opencv_world_to_camera"
_RPT_POSE = "rpt_camera_pose"
def _as_float_array(values: object, shape: tuple[int, ...]) -> np.ndarray:
array = np.asarray(values, dtype=np.float64)
if array.shape != shape:
raise ValueError(f"Expected shape {shape}, got {array.shape}.")
return array
@beartype
def load_scene_file(path: Path) -> SceneConfig:
payload = json.loads(path.read_text(encoding="utf-8"))
default_extrinsic_format = str(payload.get("extrinsic_format", _OPENCV_EXTRINSICS))
cameras: list[CameraCalibration] = []
for camera_payload in payload["cameras"]:
extrinsic_format = str(camera_payload.get("extrinsic_format", default_extrinsic_format))
name = str(camera_payload["name"])
width = int(camera_payload["width"])
height = int(camera_payload["height"])
K = _as_float_array(camera_payload["K"], (3, 3))
DC = np.asarray(camera_payload.get("DC", [0.0, 0.0, 0.0, 0.0, 0.0]), dtype=np.float64)
R = _as_float_array(camera_payload["R"], (3, 3))
T = _as_float_array(camera_payload["T"], (3, 1)).reshape(3)
model = str(camera_payload.get("model", "pinhole"))
if extrinsic_format == _OPENCV_EXTRINSICS:
cameras.append(
CameraCalibration.from_opencv_extrinsics(
name=name,
width=width,
height=height,
K=K,
DC=DC,
R=R,
T=T,
model=model,
rvec=np.asarray(camera_payload["rvec"], dtype=np.float64).reshape(3)
if "rvec" in camera_payload
else None,
)
)
elif extrinsic_format == _RPT_POSE:
cameras.append(
CameraCalibration.from_rpt_pose(
name=name,
width=width,
height=height,
K=K,
DC=DC,
R=R,
T=T,
model=model,
)
)
else:
raise ValueError(
f"Unsupported extrinsic format {extrinsic_format!r}. "
f"Expected {_OPENCV_EXTRINSICS!r} or {_RPT_POSE!r}."
)
return SceneConfig(
room_size=_as_float_array(payload["room_size"], (3,)),
room_center=_as_float_array(payload["room_center"], (3,)),
cameras=tuple(cameras),
)
@beartype
def load_replay_file(scene_path: Path, replay_path: Path) -> ReplaySequence:
frames_by_camera: dict[str, list[CameraFrame]] = {}
for raw_line in replay_path.read_text(encoding="utf-8").splitlines():
if not raw_line.strip():
continue
payload = json.loads(raw_line)
camera_name = str(payload["camera"])
detections: list[PoseDetection] = []
for detection_payload in payload["detections"]:
detections.append(
PoseDetection(
bbox=np.asarray(detection_payload["bbox"], dtype=np.float64),
bbox_confidence=float(detection_payload["bbox_confidence"]),
keypoints=np.asarray(detection_payload["keypoints"], dtype=np.float64),
)
)
frames_by_camera.setdefault(camera_name, []).append(
CameraFrame(
camera_name=camera_name,
frame_index=int(payload["frame_index"]),
timestamp_unix_ns=int(payload["timestamp_unix_ns"]),
detections=tuple(detections),
source_size=(
int(payload["source_size"][0]),
int(payload["source_size"][1]),
),
)
)
for frames in frames_by_camera.values():
frames.sort(key=lambda item: (item.timestamp_unix_ns, item.frame_index))
return ReplaySequence(scene_path=scene_path, replay_path=replay_path, frames_by_camera=frames_by_camera)
+50
View File
@@ -0,0 +1,50 @@
from pose_tracking_exp.schema.camera import (
CameraCalibration,
CameraModel,
PINHOLE_CAMERA_MODEL,
SceneConfig,
parse_camera_model,
)
from pose_tracking_exp.schema.detection import (
BoxDetections,
CocoKeypointSchema,
PoseBatchRequest,
PoseDetections,
SourceFrame,
)
from pose_tracking_exp.schema.observation import CameraFrame, FrameBundle, PoseDetection, ReplaySequence
from pose_tracking_exp.schema.tracking import (
ActiveTrackState,
ProposalCluster,
SkeletonState,
TentativeTrackState,
TrackState,
TrackerConfig,
TrackerDiagnostics,
TrackedFrameResult,
)
__all__ = [
"ActiveTrackState",
"BoxDetections",
"CameraCalibration",
"CameraFrame",
"CameraModel",
"CocoKeypointSchema",
"FrameBundle",
"PINHOLE_CAMERA_MODEL",
"PoseBatchRequest",
"PoseDetection",
"PoseDetections",
"ProposalCluster",
"ReplaySequence",
"SceneConfig",
"SkeletonState",
"TentativeTrackState",
"TrackState",
"TrackerConfig",
"TrackerDiagnostics",
"TrackedFrameResult",
"SourceFrame",
"parse_camera_model",
]
+106
View File
@@ -0,0 +1,106 @@
from dataclasses import dataclass, field
from typing import Literal
import cv2
import numpy as np
from pose_tracking_exp.common.tensor_types import Matrix3, Vector3
CameraModel = Literal["pinhole"]
PINHOLE_CAMERA_MODEL: CameraModel = "pinhole"
def parse_camera_model(model: str) -> CameraModel:
if model != PINHOLE_CAMERA_MODEL:
raise ValueError(
f"Unsupported camera model {model!r}. Expected {PINHOLE_CAMERA_MODEL!r}."
)
return PINHOLE_CAMERA_MODEL
@dataclass(slots=True)
class CameraCalibration:
name: str
width: int
height: int
K: Matrix3
DC: np.ndarray
R: Matrix3
T: Vector3
model: CameraModel = PINHOLE_CAMERA_MODEL
rvec: np.ndarray | None = None
pose_R: Matrix3 = field(init=False)
pose_T: Vector3 = field(init=False)
def __post_init__(self) -> None:
self.K = np.asarray(self.K, dtype=np.float64).reshape(3, 3)
self.DC = np.asarray(self.DC, dtype=np.float64).reshape(-1)
self.R = np.asarray(self.R, dtype=np.float64).reshape(3, 3)
self.T = np.asarray(self.T, dtype=np.float64).reshape(3)
self.model = parse_camera_model(self.model)
if self.rvec is None:
rvec, _ = cv2.Rodrigues(self.R)
self.rvec = np.asarray(rvec, dtype=np.float64).reshape(3)
else:
self.rvec = np.asarray(self.rvec, dtype=np.float64).reshape(3)
self.pose_R = self.R.T
self.pose_T = -(self.pose_R @ self.T)
@staticmethod
def from_opencv_extrinsics(
name: str,
width: int,
height: int,
K: Matrix3,
DC: np.ndarray,
R: Matrix3,
T: Vector3,
model: CameraModel = PINHOLE_CAMERA_MODEL,
rvec: np.ndarray | None = None,
) -> "CameraCalibration":
return CameraCalibration(
name=name,
width=width,
height=height,
K=K,
DC=DC,
R=R,
T=T,
model=model,
rvec=rvec,
)
@staticmethod
def from_rpt_pose(
name: str,
width: int,
height: int,
K: Matrix3,
DC: np.ndarray,
R: Matrix3,
T: Vector3,
model: CameraModel = PINHOLE_CAMERA_MODEL,
) -> "CameraCalibration":
pose_R = np.asarray(R, dtype=np.float64).reshape(3, 3)
pose_T = np.asarray(T, dtype=np.float64).reshape(3)
rotation = pose_R.T
translation = -(rotation @ pose_T)
rvec, _ = cv2.Rodrigues(rotation)
return CameraCalibration(
name=name,
width=width,
height=height,
K=K,
DC=DC,
R=rotation,
T=translation,
model=model,
rvec=np.asarray(rvec, dtype=np.float64).reshape(3),
)
@dataclass(slots=True)
class SceneConfig:
room_size: Vector3
room_center: Vector3
cameras: tuple[CameraCalibration, ...]
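The pose fields computed in `__post_init__` and inverted by `from_rpt_pose` are the standard relationship between world-to-camera extrinsics and the camera pose in world coordinates. A minimal numpy sketch of that round trip (standalone illustration, not package code; the example rotation and translation are arbitrary):

```python
import numpy as np

def opencv_to_pose(R: np.ndarray, T: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # world-to-camera extrinsics -> camera pose, as in CameraCalibration.__post_init__
    pose_R = R.T
    pose_T = -(pose_R @ T)
    return pose_R, pose_T

def pose_to_opencv(pose_R: np.ndarray, pose_T: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # camera pose -> world-to-camera extrinsics, as in CameraCalibration.from_rpt_pose
    R = pose_R.T
    T = -(R @ pose_T)
    return R, T

# Round trip with a 90-degree yaw and an offset translation.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
T = np.array([1.0, 2.0, 3.0])
pose_R, pose_T = opencv_to_pose(R, T)
R2, T2 = pose_to_opencv(pose_R, pose_T)
assert np.allclose(R2, R) and np.allclose(T2, T)
```

Because the same inversion is applied in both directions, a scene file in either `opencv_world_to_camera` or `rpt_camera_pose` format ends up with identical `R`/`T` and `pose_R`/`pose_T` fields on the calibration.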
+116
View File
@@ -0,0 +1,116 @@
"""Shared 2D detection schema.
`coco_wholebody133` matches the COCO-WholeBody dataset terminology used by
MMPose and the official dataset repo. The first 17 joints follow the standard
COCO body ordering, so it is body-compatible with `coco17`.
References:
- https://mmpose.readthedocs.io/en/latest/dataset_zoo/2d_wholebody_keypoint.html
- https://github.com/jin-s13/COCO-WholeBody
"""
from dataclasses import dataclass
from typing import Literal
import numpy as np
CocoKeypointSchema = Literal["coco17", "coco_wholebody133"]
def expected_keypoint_count(schema: CocoKeypointSchema) -> int:
if schema == "coco17":
return 17
return 133
@dataclass(slots=True)
class SourceFrame:
source_name: str
image_bgr: np.ndarray
frame_index: int
timestamp_unix_ns: int
@dataclass(slots=True)
class BoxDetections:
boxes_xyxy: np.ndarray
scores: np.ndarray
reference_frame_shape: tuple[int, int]
@property
def reference_size(self) -> tuple[int, int]:
return (self.reference_frame_shape[1], self.reference_frame_shape[0])
@property
def boxes_num(self) -> int:
return int(self.boxes_xyxy.shape[0])
def filter_by_area(self, area_threshold: int) -> "BoxDetections":
if area_threshold <= 0:
raise ValueError("Area threshold must be positive.")
areas = np.abs(
(self.boxes_xyxy[:, 2] - self.boxes_xyxy[:, 0])
* (self.boxes_xyxy[:, 3] - self.boxes_xyxy[:, 1])
)
mask = areas >= area_threshold
return BoxDetections(
boxes_xyxy=self.boxes_xyxy[mask],
scores=self.scores[mask],
reference_frame_shape=self.reference_frame_shape,
)
@dataclass(slots=True)
class PoseBatchRequest:
image_rgb: np.ndarray
boxes_xyxy: np.ndarray
@dataclass(slots=True)
class PoseDetections:
source_name: str
frame_index: int
source_size: tuple[int, int]
boxes_xyxy: np.ndarray
box_scores: np.ndarray | None
keypoints_xy: np.ndarray
keypoint_scores: np.ndarray | None
timestamp_unix_ns: int
keypoint_schema: CocoKeypointSchema = "coco_wholebody133"
def validate(self) -> None:
if self.boxes_xyxy.ndim != 2 or self.boxes_xyxy.shape[1] != 4:
raise ValueError(
f"Expected boxes with shape (N, 4), got {self.boxes_xyxy.shape}."
)
if self.keypoints_xy.ndim != 3 or self.keypoints_xy.shape[2] != 2:
raise ValueError(
"Expected keypoints with shape (N, K, 2), "
f"got {self.keypoints_xy.shape}."
)
expected_count = expected_keypoint_count(self.keypoint_schema)
if self.keypoints_xy.shape[1] != expected_count:
raise ValueError(
f"Expected {self.keypoint_schema} keypoints with {expected_count} joints, "
f"got {self.keypoints_xy.shape[1]}."
)
detection_count = int(self.keypoints_xy.shape[0])
if self.boxes_xyxy.shape[0] != detection_count:
raise ValueError(
"Expected box and keypoint detection counts to match, "
f"got {self.boxes_xyxy.shape[0]} and {detection_count}."
)
if self.box_scores is not None and self.box_scores.shape != (detection_count,):
raise ValueError(
f"Expected box scores with shape ({detection_count},), got {self.box_scores.shape}."
)
if self.keypoint_scores is not None and self.keypoint_scores.shape != (
detection_count,
expected_count,
):
raise ValueError(
"Expected keypoint scores with shape "
f"({detection_count}, {expected_count}), got {self.keypoint_scores.shape}."
)
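`BoxDetections.filter_by_area` keeps only boxes whose absolute area meets the threshold. A small standalone numpy sketch of the same masking, with hypothetical box values:

```python
import numpy as np

# Hypothetical detections: (x1, y1, x2, y2) boxes with per-box scores.
boxes_xyxy = np.array([
    [0.0, 0.0, 10.0, 10.0],  # area 100
    [0.0, 0.0, 2.0, 2.0],    # area 4
    [5.0, 5.0, 25.0, 45.0],  # area 800
])
scores = np.array([0.9, 0.4, 0.8])
area_threshold = 50

# Same area computation and mask as BoxDetections.filter_by_area.
areas = np.abs(
    (boxes_xyxy[:, 2] - boxes_xyxy[:, 0]) * (boxes_xyxy[:, 3] - boxes_xyxy[:, 1])
)
mask = areas >= area_threshold
kept_boxes, kept_scores = boxes_xyxy[mask], scores[mask]
assert kept_boxes.shape == (2, 4)
```

The boolean mask is applied to boxes and scores together, so the two arrays stay index-aligned after filtering.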
@@ -0,0 +1,36 @@
from dataclasses import dataclass
from pathlib import Path
import numpy as np
from pose_tracking_exp.common.tensor_types import Pose2D
@dataclass(slots=True)
class PoseDetection:
bbox: np.ndarray
bbox_confidence: float
keypoints: Pose2D
@dataclass(slots=True)
class CameraFrame:
camera_name: str
frame_index: int
timestamp_unix_ns: int
detections: tuple[PoseDetection, ...]
source_size: tuple[int, int]
@dataclass(slots=True)
class FrameBundle:
bundle_index: int
timestamp_unix_ns: int
views: tuple[CameraFrame, ...]
@dataclass(slots=True)
class ReplaySequence:
scene_path: Path
replay_path: Path
frames_by_camera: dict[str, list[CameraFrame]]
+102
View File
@@ -0,0 +1,102 @@
from dataclasses import dataclass, field
from typing import Literal
import numpy as np
from pose_tracking_exp.common.tensor_types import Pose3D, Vector3
@dataclass(slots=True)
class ProposalCluster:
pose3d: Pose3D
root: Vector3
source_views: frozenset[str]
support_size: int
mean_score: float
@dataclass(slots=True)
class SkeletonState:
parameters: np.ndarray
beta: np.ndarray
pose3d: Pose3D
@dataclass(slots=True)
class TentativeTrackState:
track_id: int
state: Literal["tentative"] = "tentative"
age: int = 0
misses: int = 0
score: float = 0.0
last_bundle_index: int = -1
root: Vector3 = field(default_factory=lambda: np.zeros(3, dtype=np.float64))
pose3d: Pose3D = field(default_factory=lambda: np.zeros((20, 4), dtype=np.float64))
evidence_buffer: list[Pose3D] = field(default_factory=list)
@dataclass(slots=True)
class ActiveTrackState:
track_id: int
status: Literal["active", "lost"] = "active"
misses: int = 0
lost_age: int = 0
score: float = 0.0
last_bundle_index: int = -1
skeleton: SkeletonState = field(
default_factory=lambda: SkeletonState(
parameters=np.zeros(31, dtype=np.float64),
beta=np.ones(8, dtype=np.float64),
pose3d=np.zeros((20, 4), dtype=np.float64),
)
)
noise_scale: np.ndarray = field(
default_factory=lambda: np.full((20,), 9.0, dtype=np.float64)
)
TrackState = TentativeTrackState | ActiveTrackState
@dataclass(slots=True)
class TrackedFrameResult:
bundle_index: int
timestamp_unix_ns: int
tentative_tracks: tuple[TentativeTrackState, ...]
active_tracks: tuple[ActiveTrackState, ...]
lost_tracks: tuple[ActiveTrackState, ...]
proposals: tuple[ProposalCluster, ...]
@dataclass(slots=True)
class TrackerDiagnostics:
match_existing_calls: int = 0
match_existing_seconds: float = 0.0
proposal_build_calls: int = 0
proposal_build_seconds: float = 0.0
promotions: int = 0
reacquisitions: int = 0
active_updates: int = 0
seed_initializations: int = 0
nonlinear_refinements: int = 0
@dataclass(slots=True)
class TrackerConfig:
max_active_tracks: int | None = None
min_bundle_views: int = 2
max_sync_skew_ns: int = 12_000_000
tentative_buffer_size: int = 5
tentative_min_age: int = 3
tentative_hits_required: int = 3
tentative_promote_score: float = 3.0
tentative_max_misses: int = 2
active_min_views: int = 2
active_core_gate_px: float = 80.0
active_joint_gate_px: float = 120.0
active_miss_to_lost: int = 3
lost_delete_age: int = 15
proposal_match_distance_m: float = 0.45
noise_ema: float = 0.85
proposal_min_score: float = 0.9
proposal_min_group_size: int = 1
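With `max_active_tracks` replacing the old single-person mode, the tracker prunes each track table down to the N highest-ranked entries rather than a single best one. A standalone sketch of that pruning, using a plain score dict to stand in for the track table and its rank function (ids and scores hypothetical):

```python
def keep_best(tracks: dict[int, float], limit: int) -> dict[int, float]:
    # Keep the `limit` highest-ranked track ids, mirroring the tracker's
    # _keep_best_active_tracks; a score dict stands in for the rank function.
    if len(tracks) <= limit:
        return dict(tracks)
    ranked_ids = sorted(tracks, key=lambda track_id: tracks[track_id], reverse=True)
    keep_ids = set(ranked_ids[:limit])
    return {track_id: score for track_id, score in tracks.items() if track_id in keep_ids}

pruned = keep_best({1: 0.2, 2: 0.9, 3: 0.5}, limit=2)
assert set(pruned) == {2, 3}
```

Setting `max_active_tracks = 1` reduces this to the previous single-person behavior; `None` disables pruning entirely.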
@@ -0,0 +1,15 @@
from pose_tracking_exp.tracking.kinematics import seed_state_from_pose3d, update_noise_scale, update_state_from_multiview
from pose_tracking_exp.tracking.replay_io import load_parquet_replay_dir, load_replay_file, load_scene_file
from pose_tracking_exp.tracking.sync import synchronize_frames
from pose_tracking_exp.tracking.tracker import PoseTracker
__all__ = [
"PoseTracker",
"load_parquet_replay_dir",
"load_replay_file",
"load_scene_file",
"seed_state_from_pose3d",
"synchronize_frames",
"update_noise_scale",
"update_state_from_multiview",
]
@@ -4,10 +4,10 @@ import numpy as np
 from beartype import beartype
 from scipy.optimize import least_squares
-from pose_tracking_exp.camera_math import project_pose
-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME
-from pose_tracking_exp.models import CameraCalibration, PoseDetection, SkeletonState
-from pose_tracking_exp.tensor_types import Pose3D
+from pose_tracking_exp.common.camera_math import project_pose
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.common.tensor_types import Pose3D
+from pose_tracking_exp.schema import CameraCalibration, PoseDetection, SkeletonState
 PARAMETER_DIMENSION = 31
 SHAPE_DIMENSION = 8
+221
View File
@@ -0,0 +1,221 @@
import json
from pathlib import Path
from typing import cast
import numpy as np
import pyarrow.parquet as pq
from beartype import beartype
from pose_tracking_exp.common.detection_parquet import DETECTED_PARQUET_SUFFIX
from pose_tracking_exp.common.normalization import infer_bbox_from_keypoints, normalize_coco_body20
from pose_tracking_exp.schema import (
CameraCalibration,
CameraFrame,
CocoKeypointSchema,
PoseDetection,
ReplaySequence,
SceneConfig,
parse_camera_model,
)
_OPENCV_EXTRINSICS = "opencv_world_to_camera"
_RPT_POSE = "rpt_camera_pose"
def _as_float_array(values: object, shape: tuple[int, ...]) -> np.ndarray:
array = np.asarray(values, dtype=np.float64)
if array.shape != shape:
raise ValueError(f"Expected shape {shape}, got {array.shape}.")
return array
@beartype
def load_scene_file(path: Path) -> SceneConfig:
payload = json.loads(path.read_text(encoding="utf-8"))
default_extrinsic_format = str(payload.get("extrinsic_format", _OPENCV_EXTRINSICS))
cameras: list[CameraCalibration] = []
for camera_payload in payload["cameras"]:
extrinsic_format = str(
camera_payload.get("extrinsic_format", default_extrinsic_format)
)
name = str(camera_payload["name"])
width = int(camera_payload["width"])
height = int(camera_payload["height"])
K = _as_float_array(camera_payload["K"], (3, 3))
DC = np.asarray(
camera_payload.get("DC", [0.0, 0.0, 0.0, 0.0, 0.0]), dtype=np.float64
)
R = _as_float_array(camera_payload["R"], (3, 3))
T = _as_float_array(camera_payload["T"], (3, 1)).reshape(3)
model = parse_camera_model(camera_payload.get("model", "pinhole"))
if extrinsic_format == _OPENCV_EXTRINSICS:
cameras.append(
CameraCalibration.from_opencv_extrinsics(
name=name,
width=width,
height=height,
K=K,
DC=DC,
R=R,
T=T,
model=model,
rvec=np.asarray(camera_payload["rvec"], dtype=np.float64).reshape(3)
if "rvec" in camera_payload
else None,
)
)
elif extrinsic_format == _RPT_POSE:
cameras.append(
CameraCalibration.from_rpt_pose(
name=name,
width=width,
height=height,
K=K,
DC=DC,
R=R,
T=T,
model=model,
)
)
else:
raise ValueError(
f"Unsupported extrinsic format {extrinsic_format!r}. "
f"Expected {_OPENCV_EXTRINSICS!r} or {_RPT_POSE!r}."
)
return SceneConfig(
room_size=_as_float_array(payload["room_size"], (3,)),
room_center=_as_float_array(payload["room_center"], (3,)),
cameras=tuple(cameras),
)
@beartype
def load_replay_file(scene_path: Path, replay_path: Path) -> ReplaySequence:
if replay_path.is_dir():
return load_parquet_replay_dir(scene_path, replay_path)
frames_by_camera: dict[str, list[CameraFrame]] = {}
for raw_line in replay_path.read_text(encoding="utf-8").splitlines():
if not raw_line.strip():
continue
payload = json.loads(raw_line)
camera_name = str(payload["camera"])
detections: list[PoseDetection] = []
for detection_payload in payload["detections"]:
detections.append(
PoseDetection(
bbox=np.asarray(detection_payload["bbox"], dtype=np.float64),
bbox_confidence=float(detection_payload["bbox_confidence"]),
keypoints=np.asarray(
detection_payload["keypoints"], dtype=np.float64
),
)
)
frames_by_camera.setdefault(camera_name, []).append(
CameraFrame(
camera_name=camera_name,
frame_index=int(payload["frame_index"]),
timestamp_unix_ns=int(payload["timestamp_unix_ns"]),
detections=tuple(detections),
source_size=(
int(payload["source_size"][0]),
int(payload["source_size"][1]),
),
)
)
for frames in frames_by_camera.values():
frames.sort(key=lambda item: (item.timestamp_unix_ns, item.frame_index))
return ReplaySequence(
scene_path=scene_path,
replay_path=replay_path,
frames_by_camera=frames_by_camera,
)
def _pose_detections_from_parquet_row(row: dict[str, object]) -> tuple[PoseDetection, ...]:
boxes = np.asarray(row.get("boxes", []), dtype=np.float64)
if boxes.size == 0:
boxes = np.empty((0, 4), dtype=np.float64)
box_scores = np.asarray(row.get("box_scores", []), dtype=np.float64)
keypoints_xy = np.asarray(row.get("kps", []), dtype=np.float64)
if keypoints_xy.size == 0:
keypoints_xy = np.empty((0, 133, 2), dtype=np.float64)
keypoint_scores = np.asarray(row.get("kps_scores", []), dtype=np.float64)
if keypoint_scores.size == 0:
keypoint_scores = np.empty((0, 133), dtype=np.float64)
raw_keypoint_schema = row.get("keypoint_schema", "coco_wholebody133")
if raw_keypoint_schema not in {"coco17", "coco_wholebody133"}:
raise ValueError(f"Unsupported keypoint schema in parquet replay: {raw_keypoint_schema!r}")
keypoint_schema = cast(CocoKeypointSchema, raw_keypoint_schema)
if keypoints_xy.shape[0] != keypoint_scores.shape[0]:
raise ValueError(
"Expected matching keypoint coordinate and score counts in parquet replay row."
)
detections: list[PoseDetection] = []
for detection_index in range(int(keypoints_xy.shape[0])):
normalized = normalize_coco_body20(
keypoints_xy[detection_index],
keypoint_scores[detection_index],
keypoint_schema=keypoint_schema,
)
bbox = (
boxes[detection_index]
if detection_index < boxes.shape[0]
else infer_bbox_from_keypoints(normalized)
)
visible = normalized[:, 2] > 0.0
bbox_confidence = (
float(box_scores[detection_index])
if detection_index < box_scores.shape[0]
else float(np.mean(normalized[visible, 2]))
if np.any(visible)
else 0.0
)
detections.append(
PoseDetection(
bbox=np.asarray(bbox, dtype=np.float64),
bbox_confidence=bbox_confidence,
keypoints=np.asarray(normalized, dtype=np.float64),
)
)
return tuple(detections)
@beartype
def load_parquet_replay_dir(scene_path: Path, replay_root: Path) -> ReplaySequence:
parquet_paths = sorted(replay_root.glob(f"*{DETECTED_PARQUET_SUFFIX}"))
if not parquet_paths:
raise FileNotFoundError(
f"No detection parquet files matching *{DETECTED_PARQUET_SUFFIX} under {replay_root}."
)
frames_by_camera: dict[str, list[CameraFrame]] = {}
for parquet_path in parquet_paths:
camera_name = parquet_path.name.removesuffix(DETECTED_PARQUET_SUFFIX)
frames: list[CameraFrame] = []
for row in pq.read_table(parquet_path).to_pylist():
frames.append(
CameraFrame(
camera_name=camera_name,
frame_index=int(row["frame_index"]),
timestamp_unix_ns=int(row["timestamp_unix_ns"]),
detections=_pose_detections_from_parquet_row(row),
source_size=(
int(row.get("source_width", 0)),
int(row.get("source_height", 0)),
),
)
)
frames.sort(key=lambda item: (item.timestamp_unix_ns, item.frame_index))
frames_by_camera[camera_name] = frames
return ReplaySequence(
scene_path=scene_path,
replay_path=replay_root,
frames_by_camera=frames_by_camera,
)
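When a parquet row carries no box scores, `_pose_detections_from_parquet_row` falls back to the mean confidence of the visible joints. A standalone sketch of that fallback, with hypothetical keypoint scores:

```python
import numpy as np

def fallback_bbox_confidence(normalized_keypoints: np.ndarray) -> float:
    # Mean confidence of visible joints (score > 0), or 0.0 when none are
    # visible, mirroring the box-score fallback in _pose_detections_from_parquet_row.
    visible = normalized_keypoints[:, 2] > 0.0
    if not np.any(visible):
        return 0.0
    return float(np.mean(normalized_keypoints[visible, 2]))

keypoints = np.zeros((20, 3))               # (x, y, score) per joint
keypoints[:4, 2] = [0.8, 0.6, 0.0, 0.2]     # three visible joints
confidence = fallback_bbox_confidence(keypoints)
```

The same guard against empty visibility is what lets fully empty frames survive the parquet round trip with zero-confidence detections instead of NaNs.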
@@ -1,16 +1,19 @@
-from typing import Any
 import numpy as np
 import rpt
 from beartype import beartype
+from rpt._core import TriangulationConfig, TriangulationTrace  # type: ignore[reportMissingModuleSource]
-from pose_tracking_exp.joints import BODY20_JOINT_NAMES, BODY20_OBSERVATION_COUNT, BODY20_INDEX_BY_NAME
-from pose_tracking_exp.models import CameraFrame, ProposalCluster, SceneConfig
-from pose_tracking_exp.tensor_types import Pose2D
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME, BODY20_JOINT_NAMES, BODY20_OBSERVATION_COUNT
+from pose_tracking_exp.common.tensor_types import Pose2D
+from pose_tracking_exp.schema import CameraFrame, ProposalCluster, SceneConfig
-@beartype
-def build_rpt_config(scene: SceneConfig, *, min_match_score: float, min_group_size: int) -> Any:
+def build_rpt_config(
+    scene: SceneConfig,
+    *,
+    min_match_score: float,
+    min_group_size: int,
+) -> TriangulationConfig:
     cameras = [
         {
             "name": camera.name,
@@ -50,7 +53,7 @@ def pack_view_detections(frames: tuple[CameraFrame, ...], unmatched_indices: dic
 @beartype
 def extract_clusters(
-    trace: Any,
+    trace: TriangulationTrace,
     camera_names: tuple[str, ...],
 ) -> tuple[ProposalCluster, ...]:
     clusters: list[ProposalCluster] = []
@@ -2,7 +2,7 @@ from collections.abc import Iterable
 from beartype import beartype
-from pose_tracking_exp.models import CameraFrame, FrameBundle, ReplaySequence
+from pose_tracking_exp.schema import CameraFrame, FrameBundle, ReplaySequence
 @beartype
@@ -50,4 +50,3 @@ def synchronize_frames(
         )
     )
     return bundles
@@ -5,10 +5,10 @@ import numpy as np
 from beartype import beartype
 from scipy.optimize import linear_sum_assignment
-from pose_tracking_exp.camera_math import project_pose
-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME, CORE_JOINT_INDICES
-from pose_tracking_exp.kinematics import seed_state_from_pose3d, update_noise_scale, update_state_from_multiview
-from pose_tracking_exp.models import (
+from pose_tracking_exp.common.camera_math import project_pose
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME, CORE_JOINT_INDICES
+from pose_tracking_exp.common.normalization import core_reprojection_distance
+from pose_tracking_exp.schema import (
     ActiveTrackState,
     FrameBundle,
     PoseDetection,
@@ -20,8 +20,8 @@ from pose_tracking_exp.models import (
     TrackerConfig,
     TrackerDiagnostics,
 )
-from pose_tracking_exp.normalization import core_reprojection_distance
-from pose_tracking_exp.rpt_adapter import build_rpt_config, extract_clusters, pack_view_detections
+from pose_tracking_exp.tracking.kinematics import seed_state_from_pose3d, update_noise_scale, update_state_from_multiview
+from pose_tracking_exp.tracking.rpt_adapter import build_rpt_config, extract_clusters, pack_view_detections
 CORE_JOINT_MASK = np.zeros((20,), dtype=bool)
 CORE_JOINT_MASK[list(CORE_JOINT_INDICES)] = True
@@ -78,20 +78,24 @@ class PoseTracker:
         return replace(self._diagnostics)
     def run(self, bundles: list[FrameBundle]) -> list[TrackedFrameResult]:
+        self._tentative.clear()
+        self._active.clear()
+        self._lost.clear()
+        self._next_track_id = 1
         self._diagnostics = TrackerDiagnostics()
         return [self.step(bundle) for bundle in bundles]
     def step(self, bundle: FrameBundle) -> TrackedFrameResult:
-        self._enforce_single_person_constraints()
+        self._enforce_track_limits()
         matches, unmatched = self._match_existing_tracks(bundle)
         self._update_active_tracks(bundle, matches)
         self._update_lost_tracks(bundle, matches)
-        proposals = self._refresh_single_person_track_from_proposals(bundle, self._build_proposals(bundle, unmatched))
+        proposals = self._refresh_capped_single_track_from_proposals(bundle, self._build_proposals(bundle, unmatched))
         self._update_tentative_tracks(bundle, self._birth_candidate_proposals(proposals))
         self._promote_tentative_tracks(bundle)
         self._reacquire_lost_tracks(bundle, proposals)
         self._delete_expired_tracks()
-        self._enforce_single_person_constraints()
+        self._enforce_track_limits()
         return TrackedFrameResult(
             bundle_index=bundle.bundle_index,
             timestamp_unix_ns=bundle.timestamp_unix_ns,
@@ -101,46 +105,58 @@ class PoseTracker:
             proposals=proposals,
         )
-    def _single_person_mode(self) -> bool:
-        return self._config.mode == "single_person"
+    def _track_limit(self) -> int | None:
+        return self._config.max_active_tracks
+    def _single_track_cap_enabled(self) -> bool:
+        return self._config.max_active_tracks == 1
-    def _keep_best_active_track(self) -> None:
-        if len(self._active) <= 1:
+    def _keep_best_active_tracks(self, limit: int) -> None:
+        if len(self._active) <= limit:
             return
-        best_id = max(self._active, key=lambda track_id: _active_track_rank(self._active[track_id]))
+        ranked_ids = sorted(self._active, key=lambda track_id: _active_track_rank(self._active[track_id]), reverse=True)
+        keep_ids = set(ranked_ids[:limit])
         for track_id in list(self._active):
-            if track_id != best_id:
+            if track_id not in keep_ids:
                 self._active.pop(track_id, None)
-    def _keep_best_lost_track(self) -> None:
-        if len(self._lost) <= 1:
+    def _keep_best_lost_tracks(self, limit: int) -> None:
+        if len(self._lost) <= limit:
             return
-        best_id = max(self._lost, key=lambda track_id: _lost_track_rank(self._lost[track_id]))
+        ranked_ids = sorted(self._lost, key=lambda track_id: _lost_track_rank(self._lost[track_id]), reverse=True)
+        keep_ids = set(ranked_ids[:limit])
         for track_id in list(self._lost):
-            if track_id != best_id:
+            if track_id not in keep_ids:
                 self._lost.pop(track_id, None)
-    def _keep_best_tentative_track(self) -> None:
-        if len(self._tentative) <= 1:
+    def _keep_best_tentative_tracks(self, limit: int) -> None:
+        if len(self._tentative) <= limit:
             return
-        best_id = max(self._tentative, key=lambda track_id: _tentative_track_rank(self._tentative[track_id]))
+        ranked_ids = sorted(
+            self._tentative,
+            key=lambda track_id: _tentative_track_rank(self._tentative[track_id]),
+            reverse=True,
+        )
+        keep_ids = set(ranked_ids[:limit])
         for track_id in list(self._tentative):
-            if track_id != best_id:
+            if track_id not in keep_ids:
                 self._tentative.pop(track_id, None)
-    def _enforce_single_person_constraints(self) -> None:
-        if not self._single_person_mode():
+    def _enforce_track_limits(self) -> None:
+        limit = self._track_limit()
+        if limit is None:
+            return
+        self._keep_best_active_tracks(limit)
+        self._keep_best_lost_tracks(limit)
+        self._keep_best_tentative_tracks(limit)
+        if not self._single_track_cap_enabled():
             return
-        self._keep_best_active_track()
         if self._active:
             self._lost.clear()
             self._tentative.clear()
             return
-        self._keep_best_lost_track()
         if self._lost:
             self._tentative.clear()
-            return
-        self._keep_best_tentative_track()
     def _predicted_pose_by_track(self) -> dict[int, np.ndarray]:
         result: dict[int, np.ndarray] = {}
@@ -278,7 +294,7 @@ class PoseTracker:
self._diagnostics.proposal_build_seconds += perf_counter() - started_at self._diagnostics.proposal_build_seconds += perf_counter() - started_at
def _birth_candidate_proposals(self, proposals: tuple[ProposalCluster, ...]) -> tuple[ProposalCluster, ...]: def _birth_candidate_proposals(self, proposals: tuple[ProposalCluster, ...]) -> tuple[ProposalCluster, ...]:
if not self._single_person_mode(): if not self._single_track_cap_enabled():
return proposals return proposals
if self._active or self._lost: if self._active or self._lost:
return () return ()
@@ -286,12 +302,12 @@ class PoseTracker:
return () return ()
return (max(proposals, key=_proposal_rank),) return (max(proposals, key=_proposal_rank),)
def _refresh_single_person_track_from_proposals( def _refresh_capped_single_track_from_proposals(
self, self,
bundle: FrameBundle, bundle: FrameBundle,
proposals: tuple[ProposalCluster, ...], proposals: tuple[ProposalCluster, ...],
) -> tuple[ProposalCluster, ...]: ) -> tuple[ProposalCluster, ...]:
if not self._single_person_mode() or not proposals: if not self._single_track_cap_enabled() or not proposals:
return proposals return proposals
remaining = list(proposals) remaining = list(proposals)
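The refactor above collapses three near-identical keep-exactly-one helpers into rank-then-truncate helpers parameterized by a limit. A minimal standalone sketch of that generalized cap (the `rank` scores and the `keep_best` name are illustrative, not the tracker's API):

```python
# Hedged sketch of the keep-the-N-best-ranked-tracks cap: `tracks` maps
# track id -> rank score, higher rank wins, mirroring the
# sorted(..., reverse=True)[:limit] pattern in _keep_best_*_tracks.
def keep_best(tracks: dict[int, float], limit: int) -> dict[int, float]:
    if len(tracks) <= limit:
        return dict(tracks)
    ranked_ids = sorted(tracks, key=lambda track_id: tracks[track_id], reverse=True)
    keep_ids = set(ranked_ids[:limit])
    return {track_id: rank for track_id, rank in tracks.items() if track_id in keep_ids}


tracks = {7: 0.9, 3: 0.4, 5: 0.7}
print(sorted(keep_best(tracks, 2)))  # the two highest-ranked ids survive: [5, 7]
```

With `limit=1` this degenerates to the old single-person behavior, which is why `max_active_tracks=1` can replace `mode="single_person"` without a separate code path.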
+1
```diff
@@ -0,0 +1 @@
+"""Test package for support helpers and test-local utilities."""
```
+1
```diff
@@ -0,0 +1 @@
+"""Test-only support helpers."""
```
```diff
@@ -1,18 +1,21 @@
 from pathlib import Path
 
+import click
 import cv2
 import numpy as np
 import pyarrow.parquet as pq
 from beartype import beartype
+from loguru import logger
 
-from pose_tracking_exp.models import CameraCalibration, CameraFrame, FrameBundle, PoseDetection, SceneConfig
-from pose_tracking_exp.normalization import infer_bbox_from_keypoints, normalize_rtmpose_body20
+from pose_tracking_exp.common.normalization import infer_bbox_from_keypoints, normalize_rtmpose_body20
+from pose_tracking_exp.schema import CameraCalibration, CameraFrame, FrameBundle, PoseDetection, SceneConfig, TrackerConfig
+from pose_tracking_exp.tracking import PoseTracker
 
 _NOMINAL_FRAME_PERIOD_NS = 33_333_333
 
 
 @beartype
-def load_actualtest_scene(root: Path) -> SceneConfig:
+def load_actual_test_scene(root: Path) -> SceneConfig:
     # ActualTest parquet comes from the ChArUco/OpenCV side, so `rvec` / `tvec`
     # are world->camera extrinsics. The RPT-facing camera pose is derived later
     # from this canonical OpenCV form.
@@ -40,13 +43,14 @@ def load_actualtest_segment_bundles(
 @beartype
-def load_actualtest_segment_bundles(
+def load_actual_test_segment_bundles(
     root: Path,
     segment_name: str,
     *,
     frame_start: int = 690,
     frame_stop: int | None = None,
     max_frames: int | None = None,
+    min_cameras_with_rows: int = 1,
     min_visible_joints: int = 6,
 ) -> list[FrameBundle]:
     segment_root = root / segment_name
@@ -98,24 +102,31 @@ def load_actualtest_segment_bundles(
     if not by_camera:
         return []
 
-    common_frames = sorted(set.intersection(*(set(frames) for frames in by_camera.values())))
+    candidate_frames = sorted(set().union(*(set(frames) for frames in by_camera.values())))
+    if min_cameras_with_rows > 1:
+        candidate_frames = [
+            frame_index
+            for frame_index in candidate_frames
+            if sum(frame_index in frames for frames in by_camera.values()) >= min_cameras_with_rows
+        ]
     if max_frames is not None:
-        common_frames = common_frames[:max_frames]
+        candidate_frames = candidate_frames[:max_frames]
 
-    scene = load_actualtest_scene(root)
+    scene = load_actual_test_scene(root)
     camera_by_name = {camera.name: camera for camera in scene.cameras}
     bundles: list[FrameBundle] = []
-    for bundle_index, frame_index in enumerate(common_frames):
+    ordered_camera_names = [camera.name for camera in scene.cameras]
+    for bundle_index, frame_index in enumerate(candidate_frames):
         timestamp_unix_ns = bundle_index * _NOMINAL_FRAME_PERIOD_NS
         views: list[CameraFrame] = []
-        for camera_name in sorted(by_camera):
+        for camera_name in ordered_camera_names:
             camera = camera_by_name[camera_name]
             views.append(
                 CameraFrame(
                     camera_name=camera_name,
                     frame_index=frame_index,
                     timestamp_unix_ns=timestamp_unix_ns,
-                    detections=by_camera[camera_name][frame_index],
+                    detections=by_camera.get(camera_name, {}).get(frame_index, ()),
                     source_size=(camera.width, camera.height),
                 )
             )
@@ -127,3 +138,49 @@ def load_actualtest_segment_bundles(
         )
     )
     return bundles
+
+
+@click.command()
+@click.argument("root_path", type=click.Path(path_type=Path, exists=True, file_okay=False))
+@click.option("--segment", "segment_name", default="Segment_1", show_default=True)
+@click.option("--frame-start", default=690, type=int, show_default=True)
+@click.option("--frame-stop", type=int)
+@click.option("--max-frames", type=click.IntRange(min=1))
+@click.option("--min-camera-rows", default=1, type=click.IntRange(min=1), show_default=True)
+@click.option("--max-active-tracks", default=1, type=click.IntRange(min=1), show_default=True)
+def main(
+    root_path: Path,
+    segment_name: str,
+    frame_start: int,
+    frame_stop: int | None,
+    max_frames: int | None,
+    min_camera_rows: int,
+    max_active_tracks: int,
+) -> None:
+    logger.remove()
+    logger.add(
+        click.get_text_stream("stderr"),
+        level="INFO",
+        format="{time:YYYY-MM-DD HH:mm:ss} | {level} | {message}",
+    )
+    scene = load_actual_test_scene(root_path)
+    bundles = load_actual_test_segment_bundles(
+        root_path,
+        segment_name,
+        frame_start=frame_start,
+        frame_stop=frame_stop,
+        max_frames=max_frames,
+        min_cameras_with_rows=min_camera_rows,
+    )
+    tracker = PoseTracker(scene, TrackerConfig(max_active_tracks=max_active_tracks))
+    results = tracker.run(bundles)
+    logger.info(
+        "actual_test bundles={} active_frames={} proposal_frames={}",
+        len(results),
+        sum(1 for result in results if result.active_tracks),
+        sum(1 for result in results if result.proposals),
+    )
+
+
+if __name__ == "__main__":
+    main()
```
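The frame-selection change in `load_actual_test_segment_bundles` is the heart of this file: a per-camera intersection becomes a union plus an optional camera-count threshold, so partial frames survive. The selection logic can be exercised standalone; the `by_camera` table here is made-up illustration, not the parquet loader:

```python
# Sketch of the new frame selection: union of frame indices across cameras,
# then optionally keep only frames with rows from at least
# `min_cameras_with_rows` cameras (the old behavior was a hard intersection).
def select_frames(by_camera: dict[str, set[int]], min_cameras_with_rows: int) -> list[int]:
    candidate_frames = sorted(set().union(*(frames for frames in by_camera.values())))
    if min_cameras_with_rows > 1:
        candidate_frames = [
            frame_index
            for frame_index in candidate_frames
            if sum(frame_index in frames for frames in by_camera.values()) >= min_cameras_with_rows
        ]
    return candidate_frames


by_camera = {"5602": {690, 691}, "5603": {690}}
print(select_frames(by_camera, 1))  # union keeps the partial frame: [690, 691]
print(select_frames(by_camera, 2))  # threshold drops it: [690]
```

Missing per-camera rows then surface downstream as `detections == ()` via the `by_camera.get(...).get(frame_index, ())` lookup rather than raising a `KeyError`.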
```diff
@@ -4,8 +4,8 @@ import numpy as np
 import pyarrow as pa
 import pyarrow.parquet as pq
 
-from pose_tracking_exp.actualtest import load_actualtest_scene, load_actualtest_segment_bundles
-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from tests.support.actual_test import load_actual_test_scene, load_actual_test_segment_bundles
 
 
 def _write_parquet(path: Path, rows: list[dict[str, object]]) -> None:
@@ -25,7 +25,7 @@ def _sample_rtmpose_detection() -> tuple[list[float], list[list[float]], list[fl
     return [8.0, 4.0, 32.0, 64.0], keypoints_xy.tolist(), scores.tolist()
 
 
-def test_load_actualtest_parquet_scene_and_segment(tmp_path: Path) -> None:
+def test_load_actual_test_parquet_scene_and_segment(tmp_path: Path) -> None:
     root = tmp_path / "ActualTest_WeiHua"
     _write_parquet(
         root / "camera_params" / "camera_params.parquet",
@@ -62,8 +62,8 @@ def test_load_actualtest_parquet_scene_and_segment(tmp_path: Path) -> None:
         ],
     )
 
-    scene = load_actualtest_scene(root)
-    bundles = load_actualtest_segment_bundles(root, "Segment_1", frame_start=690, max_frames=1)
+    scene = load_actual_test_scene(root)
+    bundles = load_actual_test_segment_bundles(root, "Segment_1", frame_start=690, max_frames=1)
 
     assert [camera.name for camera in scene.cameras] == ["5602", "5603"]
     np.testing.assert_allclose(scene.cameras[0].pose_T, [0.0, 0.0, 0.0])
@@ -75,3 +75,53 @@ def test_load_actualtest_parquet_scene_and_segment(tmp_path: Path) -> None:
         bundles[0].views[0].detections[0].keypoints[BODY20_INDEX_BY_NAME["hip_middle"], :2],
         [20.0, 60.0],
     )
+
+
+def test_load_actual_test_keeps_partial_camera_frames(tmp_path: Path) -> None:
+    root = tmp_path / "ActualTest_WeiHua"
+    _write_parquet(
+        root / "camera_params" / "camera_params.parquet",
+        [
+            {
+                "name": "AF_02",
+                "port": 5602,
+                "intrinsic": {
+                    "camera_matrix": [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]],
+                    "distortion_coefficients": [0.0, 0.0, 0.0, 0.0, 0.0],
+                },
+                "extrinsic": {"rvec": [0.0, 0.0, 0.0], "tvec": [0.0, 0.0, 0.0]},
+                "resolution": {"width": 640, "height": 480},
+            },
+            {
+                "name": "AF_03",
+                "port": 5603,
+                "intrinsic": {
+                    "camera_matrix": [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]],
+                    "distortion_coefficients": [0.0, 0.0, 0.0, 0.0, 0.0],
+                },
+                "extrinsic": {"rvec": [0.0, 0.0, 0.0], "tvec": [1.0, 0.0, 0.0]},
+                "resolution": {"width": 640, "height": 480},
+            },
+        ],
+    )
+    box, keypoints_xy, scores = _sample_rtmpose_detection()
+    _write_parquet(
+        root / "Segment_1" / "5602_detected.parquet",
+        [
+            {"frame_index": 690, "boxes": [box], "kps": [keypoints_xy], "kps_scores": [scores]},
+            {"frame_index": 691, "boxes": [box], "kps": [keypoints_xy], "kps_scores": [scores]},
+        ],
+    )
+    _write_parquet(
+        root / "Segment_1" / "5603_detected.parquet",
+        [
+            {"frame_index": 690, "boxes": [box], "kps": [keypoints_xy], "kps_scores": [scores]},
+        ],
+    )
+
+    bundles = load_actual_test_segment_bundles(root, "Segment_1", frame_start=690)
+
+    assert [bundle.views[0].frame_index for bundle in bundles] == [690, 691]
+    assert [view.camera_name for view in bundles[1].views] == ["5602", "5603"]
+    assert len(bundles[1].views[0].detections) == 1
+    assert bundles[1].views[1].detections == ()
```
+6 -6
```diff
@@ -8,9 +8,9 @@ import pytest
 
 pytest.importorskip("rpt")
 
-from pose_tracking_exp.models import CameraCalibration, SceneConfig
-from pose_tracking_exp.replay import load_scene_file
-from pose_tracking_exp.rpt_adapter import build_rpt_config
+from pose_tracking_exp.schema import CameraCalibration, CameraModel, SceneConfig, parse_camera_model
+from pose_tracking_exp.tracking.replay_io import load_scene_file
+from pose_tracking_exp.tracking.rpt_adapter import build_rpt_config
 
 
 class _CameraArgs(NamedTuple):
@@ -19,7 +19,7 @@ class _CameraArgs(NamedTuple):
    height: int
    K: np.ndarray
    DC: np.ndarray
-    model: str
+    model: CameraModel
 
 
 def _camera_args() -> _CameraArgs:
@@ -29,7 +29,7 @@ def _camera_args() -> _CameraArgs:
         height=480,
         K=np.asarray([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]], dtype=np.float64),
         DC=np.zeros(5, dtype=np.float64),
-        model="pinhole",
+        model=parse_camera_model("pinhole"),
     )
@@ -139,7 +139,7 @@ def test_build_rpt_config_uses_pose_convention(monkeypatch: pytest.MonkeyPatch)
         captured["min_group_size"] = min_group_size
         return captured
 
-    monkeypatch.setattr("pose_tracking_exp.rpt_adapter.rpt.make_triangulation_config", fake_make_triangulation_config)
+    monkeypatch.setattr("pose_tracking_exp.tracking.rpt_adapter.rpt.make_triangulation_config", fake_make_triangulation_config)
 
     build_rpt_config(scene, min_match_score=0.5, min_group_size=2)
```
+223
```diff
@@ -0,0 +1,223 @@
+from collections.abc import AsyncIterator, Sequence
+from pathlib import Path
+
+import anyio
+import numpy as np
+import pytest
+
+from pose_tracking_exp.detection.config import (
+    DetectionRunnerConfig,
+    load_detection_runner_config,
+    resolve_instances,
+)
+from pose_tracking_exp.detection.runner import (
+    PendingFrame,
+    SourceSlot,
+    run_detection_runner,
+    store_latest_frame,
+    take_pending_batch,
+)
+from pose_tracking_exp.schema.detection import PoseDetections, SourceFrame
+
+
+def test_load_detection_runner_config_from_toml_and_env(
+    monkeypatch: pytest.MonkeyPatch,
+    tmp_path: Path,
+) -> None:
+    config_path = tmp_path / "runner.toml"
+    config_path.write_text(
+        "\n".join(
+            [
+                'instances = ["front_left", "front_right"]',
+                'device = "cuda:1"',
+                'nats_host = "nats://localhost:4222"',
+                'yolo_checkpoint = "checkpoint/yolo/yolo11_mix_epoch10.pt"',
+                'pose_checkpoint = "checkpoint/dwpose/best_coco-wholebody_AP_epoch_50.pth"',
+                "bbox_area_threshold = 2500",
+                "max_batch_frames = 6",
+                "max_batch_wait_ms = 3",
+            ]
+        ),
+        encoding="utf-8",
+    )
+    monkeypatch.setenv("POSE_TRACKING_EXP_DETECTION_DEVICE", "cpu")
+
+    config = load_detection_runner_config(config_path)
+
+    assert config.instances == ("front_left", "front_right")
+    assert config.device == "cpu"
+    assert config.nats_host == "nats://localhost:4222"
+    assert config.bbox_area_threshold == 2500
+    assert config.max_batch_frames == 6
+    assert config.max_batch_wait_ms == 3
+
+
+def test_resolve_instances_prefers_cli_values() -> None:
+    assert resolve_instances(("cli_a", "cli_b"), ("cfg_a",)) == ("cli_a", "cli_b")
+
+
+def test_resolve_instances_falls_back_to_config_values() -> None:
+    assert resolve_instances((), ("cfg_a", "cfg_b")) == ("cfg_a", "cfg_b")
+
+
+def test_store_latest_frame_overwrites_pending_frame() -> None:
+    slot = SourceSlot(source_name="front_left")
+    first = SourceFrame(
+        source_name="front_left",
+        image_bgr=np.zeros((1, 1, 3), dtype=np.uint8),
+        frame_index=1,
+        timestamp_unix_ns=100,
+    )
+    second = SourceFrame(
+        source_name="front_left",
+        image_bgr=np.ones((1, 1, 3), dtype=np.uint8),
+        frame_index=2,
+        timestamp_unix_ns=200,
+    )
+
+    store_latest_frame(slot, first)
+    store_latest_frame(slot, second)
+
+    assert slot.received_frames == 2
+    assert slot.dropped_frames == 1
+    assert slot.pending_frame is not None
+    assert slot.pending_frame.frame is second
+
+
+def test_take_pending_batch_collects_at_most_one_frame_per_source() -> None:
+    slots = {
+        "front_left": SourceSlot(
+            source_name="front_left",
+            pending_frame=PendingFrame(
+                source_name="front_left",
+                frame=SourceFrame(
+                    source_name="front_left",
+                    image_bgr=np.zeros((1, 1, 3), dtype=np.uint8),
+                    frame_index=11,
+                    timestamp_unix_ns=110,
+                ),
+            ),
+        ),
+        "front_right": SourceSlot(
+            source_name="front_right",
+            pending_frame=PendingFrame(
+                source_name="front_right",
+                frame=SourceFrame(
+                    source_name="front_right",
+                    image_bgr=np.zeros((1, 1, 3), dtype=np.uint8),
+                    frame_index=22,
+                    timestamp_unix_ns=220,
+                ),
+            ),
+        ),
+        "rear": SourceSlot(
+            source_name="rear",
+            pending_frame=PendingFrame(
+                source_name="rear",
+                frame=SourceFrame(
+                    source_name="rear",
+                    image_bgr=np.zeros((1, 1, 3), dtype=np.uint8),
+                    frame_index=33,
+                    timestamp_unix_ns=330,
+                ),
+            ),
+        ),
+    }
+
+    batch = take_pending_batch(slots, max_batch_frames=2)
+
+    assert [frame.source_name for frame in batch] == ["front_left", "front_right"]
+    assert slots["front_left"].pending_frame is None
+    assert slots["front_right"].pending_frame is None
+    assert slots["rear"].pending_frame is not None
+
+
+class StubSource:
+    def __init__(self, source_name: str, frames: tuple[SourceFrame, ...]) -> None:
+        self.source_name = source_name
+        self._frames = frames
+
+    async def frames(self) -> AsyncIterator[SourceFrame]:
+        for frame in self._frames:
+            yield frame
+
+
+class StubPoseShim:
+    def process_many(self, frames: Sequence[SourceFrame]) -> list[PoseDetections]:
+        detections: list[PoseDetections] = []
+        for frame in frames:
+            detections.append(
+                PoseDetections(
+                    source_name=frame.source_name,
+                    frame_index=frame.frame_index,
+                    source_size=(frame.image_bgr.shape[1], frame.image_bgr.shape[0]),
+                    boxes_xyxy=np.asarray([[0.0, 0.0, 10.0, 10.0]], dtype=np.float32),
+                    box_scores=np.asarray([1.0], dtype=np.float32),
+                    keypoints_xy=np.zeros((1, 133, 2), dtype=np.float32),
+                    keypoint_scores=np.ones((1, 133), dtype=np.float32),
+                    timestamp_unix_ns=frame.timestamp_unix_ns,
+                    keypoint_schema="coco_wholebody133",
+                )
+            )
+        return detections
+
+
+class StubSink:
+    def __init__(self) -> None:
+        self.messages: list[PoseDetections] = []
+        self.closed = False
+
+    async def publish_pose(self, detections: PoseDetections) -> None:
+        self.messages.append(detections)
+
+    async def aclose(self) -> None:
+        self.closed = True
+
+
+def test_run_detection_runner_publishes_payloads() -> None:
+    sink = StubSink()
+    sources = (
+        StubSource(
+            "cam0",
+            (
+                SourceFrame(
+                    source_name="cam0",
+                    image_bgr=np.zeros((2, 3, 3), dtype=np.uint8),
+                    frame_index=1,
+                    timestamp_unix_ns=100,
+                ),
+            ),
+        ),
+        StubSource(
+            "cam1",
+            (
+                SourceFrame(
+                    source_name="cam1",
+                    image_bgr=np.zeros((2, 3, 3), dtype=np.uint8),
+                    frame_index=2,
+                    timestamp_unix_ns=200,
+                ),
+            ),
+        ),
+    )
+    config = DetectionRunnerConfig(
+        instances=("cam0", "cam1"),
+        pose_config_path=Path(__file__),
+        yolo_checkpoint=Path(__file__),
+        pose_checkpoint=Path(__file__),
+        max_batch_frames=2,
+    )
+
+    anyio.run(
+        run_detection_runner,
+        sources,
+        StubPoseShim(),
+        sink,
+        config,
+    )
+
+    assert sink.closed is True
+    assert [(item.source_name, item.frame_index, item.timestamp_unix_ns) for item in sink.messages] == [
+        ("cam0", 1, 100),
+        ("cam1", 2, 200),
+    ]
```
+137
```diff
@@ -0,0 +1,137 @@
+import json
+from pathlib import Path
+
+import anyio
+import cv2
+import numpy as np
+import pyarrow.parquet as pq
+
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.detection.sinks import ParquetPoseSink
+from pose_tracking_exp.detection.sources import VideoFrameSource
+from pose_tracking_exp.schema.detection import PoseDetections
+from pose_tracking_exp.tracking import load_replay_file
+
+
+def _write_synthetic_video(path: Path) -> None:
+    writer = cv2.VideoWriter(
+        str(path),
+        cv2.VideoWriter.fourcc(*"MJPG"),
+        10.0,
+        (8, 6),
+    )
+    if not writer.isOpened():
+        raise RuntimeError("Could not open synthetic video writer.")
+    try:
+        for frame_index in range(3):
+            frame = np.full((6, 8, 3), frame_index * 32, dtype=np.uint8)
+            writer.write(frame)
+    finally:
+        writer.release()
+
+
+def _sample_wholebody_detection(*, source_name: str, frame_index: int) -> PoseDetections:
+    keypoints_xy = np.zeros((1, 133, 2), dtype=np.float32)
+    keypoint_scores = np.zeros((1, 133), dtype=np.float32)
+    keypoints_xy[0, 5] = [10.0, 20.0]
+    keypoints_xy[0, 6] = [30.0, 20.0]
+    keypoints_xy[0, 11] = [12.0, 60.0]
+    keypoints_xy[0, 12] = [28.0, 60.0]
+    keypoints_xy[0, 0] = [20.0, 8.0]
+    keypoint_scores[0, [0, 5, 6, 11, 12]] = 1.0
+    return PoseDetections(
+        source_name=source_name,
+        frame_index=frame_index,
+        source_size=(640, 480),
+        boxes_xyxy=np.asarray([[8.0, 4.0, 32.0, 64.0]], dtype=np.float32),
+        box_scores=np.asarray([0.9], dtype=np.float32),
+        keypoints_xy=keypoints_xy,
+        keypoint_scores=keypoint_scores,
+        timestamp_unix_ns=frame_index * 100_000_000,
+        keypoint_schema="coco_wholebody133",
+    )
+
+
+def test_video_frame_source_reads_frames(tmp_path: Path) -> None:
+    video_path = tmp_path / "cam0.avi"
+    _write_synthetic_video(video_path)
+    source = VideoFrameSource(video_path, source_name="cam0")
+
+    async def collect() -> list[tuple[str, int, int, tuple[int, int, int]]]:
+        frames: list[tuple[str, int, int, tuple[int, int, int]]] = []
+        async for frame in source.frames():
+            frames.append(
+                (
+                    frame.source_name,
+                    frame.frame_index,
+                    frame.timestamp_unix_ns,
+                    frame.image_bgr.shape,
+                )
+            )
+        return frames
+
+    frames = anyio.run(collect)
+
+    assert [item[0] for item in frames] == ["cam0", "cam0", "cam0"]
+    assert [item[1] for item in frames] == [0, 1, 2]
+    assert [item[3] for item in frames] == [(6, 8, 3), (6, 8, 3), (6, 8, 3)]
+    assert frames[0][2] <= frames[1][2] <= frames[2][2]
+
+
+def test_parquet_sink_round_trips_into_tracking_replay(tmp_path: Path) -> None:
+    output_dir = tmp_path / "detections"
+    sink = ParquetPoseSink(output_dir, flush_rows=1)
+
+    async def write_rows() -> None:
+        await sink.publish_pose(_sample_wholebody_detection(source_name="cam0", frame_index=0))
+        await sink.publish_pose(
+            PoseDetections(
+                source_name="cam0",
+                frame_index=1,
+                source_size=(640, 480),
+                boxes_xyxy=np.empty((0, 4), dtype=np.float32),
+                box_scores=np.empty((0,), dtype=np.float32),
+                keypoints_xy=np.empty((0, 133, 2), dtype=np.float32),
+                keypoint_scores=np.empty((0, 133), dtype=np.float32),
+                timestamp_unix_ns=100_000_000,
+                keypoint_schema="coco_wholebody133",
+            )
+        )
+        await sink.aclose()
+
+    anyio.run(write_rows)
+
+    parquet_path = output_dir / "cam0_detected.parquet"
+    assert parquet_path.exists()
+    assert pq.read_table(parquet_path).num_rows == 2
+
+    scene_path = tmp_path / "scene.json"
+    scene_path.write_text(
+        json.dumps(
+            {
+                "room_size": [6.0, 4.0, 3.0],
+                "room_center": [0.0, 0.0, 1.0],
+                "cameras": [
+                    {
+                        "name": "cam0",
+                        "width": 640,
+                        "height": 480,
+                        "K": [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]],
+                        "DC": [0.0, 0.0, 0.0, 0.0, 0.0],
+                        "R": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
+                        "T": [[0.0], [0.0], [0.0]],
+                    }
+                ],
+            }
+        ),
+        encoding="utf-8",
+    )
+
+    replay = load_replay_file(scene_path, output_dir)
+    frames = replay.frames_by_camera["cam0"]
+
+    assert [frame.frame_index for frame in frames] == [0, 1]
+    assert frames[1].detections == ()
+    np.testing.assert_allclose(
+        frames[0].detections[0].keypoints[BODY20_INDEX_BY_NAME["hip_middle"], :2],
+        [20.0, 60.0],
+    )
```
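The round trip above hinges on one invariant of the parquet path: a frame with no detections is still written as a row with empty list columns, so replay loading reproduces it as a frame with `detections == ()` instead of a gap in the index. A plain-dict sketch of that invariant (the row shape is illustrative, not the sink's actual parquet schema):

```python
# Each frame becomes one row even when nothing was detected; empty list
# columns round-trip to an empty detections tuple rather than a missing frame,
# which keeps downstream frame synchronization aligned.
rows = [
    {"frame_index": 0, "boxes": [[8.0, 4.0, 32.0, 64.0]]},
    {"frame_index": 1, "boxes": []},  # empty frame is kept as a row
]
frames = {row["frame_index"]: tuple(map(tuple, row["boxes"])) for row in rows}
print(sorted(frames))  # both frame indices survive: [0, 1]
print(frames[1])       # ()
```

Dropping empty rows instead would silently renumber frames, and the video-to-parquet-to-tracking workflow would lose the correspondence between detection frames and video frames.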
+3 -3
```diff
@@ -1,7 +1,7 @@
 import numpy as np
 
-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME
-from pose_tracking_exp.kinematics import seed_state_from_pose3d
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.tracking.kinematics import seed_state_from_pose3d
 
 
 def _sample_pose3d() -> np.ndarray:
@@ -38,7 +38,7 @@ def test_seed_state_from_pose3d_does_not_call_least_squares(monkeypatch) -> None
     def fail_least_squares(*args: object, **kwargs: object) -> object:
         raise AssertionError("seed_state_from_pose3d should not call scipy.optimize.least_squares")
 
-    monkeypatch.setattr("pose_tracking_exp.kinematics.least_squares", fail_least_squares)
+    monkeypatch.setattr("pose_tracking_exp.tracking.kinematics.least_squares", fail_least_squares)
 
     state = seed_state_from_pose3d(_sample_pose3d())
     assert state.parameters.shape == (31,)
```
+49 -7
```diff
@@ -4,11 +4,17 @@ from pathlib import Path
 
 import numpy as np
 
-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME
-from pose_tracking_exp.normalization import normalize_rtmpose_body20
-from pose_tracking_exp.parajumping import PROTOCOL_HEADER, convert_payload_record, decode_pose_payload
-from pose_tracking_exp.replay import load_replay_file, load_scene_file
-from pose_tracking_exp.sync import synchronize_frames
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.common.normalization import normalize_coco_body20, normalize_rtmpose_body20
+from pose_tracking_exp.detection.cvmmap_payload import (
+    COCO_WHOLEBODY_KEYPOINT_COUNT,
+    PROTOCOL_HEADER,
+    CvmmapPosePayloadCodec,
+    convert_payload_record,
+    decode_pose_payload,
+)
+from pose_tracking_exp.schema.detection import PoseDetections
+from pose_tracking_exp.tracking import load_replay_file, load_scene_file, synchronize_frames
 
 
 def _encode_payload(
@@ -31,7 +37,7 @@ def _encode_payload(
         + np.asarray(box_scores, dtype=np.uint8).tobytes()
         + int(keypoints_xy.shape[0]).to_bytes(1, "little")
         + np.asarray(keypoints_xy, dtype="<u2").tobytes()
-        + int(keypoint_scores.size).to_bytes(1, "little")
+        + int(keypoint_scores.shape[0]).to_bytes(1, "little")
         + np.asarray(keypoint_scores, dtype=np.uint8).reshape(-1).tobytes()
         + int(timestamp_unix_ns).to_bytes(8, "little")
     )
@@ -54,6 +60,23 @@ def test_normalize_rtmpose_body20_derives_midpoints_and_head():
     np.testing.assert_allclose(normalized[BODY20_INDEX_BY_NAME["head"], :2], [20.0, 8.0])
 
 
+def test_normalize_coco17_body20_derives_midpoints_and_head():
+    keypoints = np.zeros((17, 2), dtype=np.float64)
+    scores = np.zeros((17,), dtype=np.float64)
+    keypoints[5] = [10.0, 20.0]
+    keypoints[6] = [30.0, 20.0]
+    keypoints[11] = [12.0, 60.0]
+    keypoints[12] = [28.0, 60.0]
+    keypoints[0] = [20.0, 8.0]
+    scores[[0, 5, 6, 11, 12]] = 1.0
+
+    normalized = normalize_coco_body20(keypoints, scores, keypoint_schema="coco17")
+
+    np.testing.assert_allclose(normalized[BODY20_INDEX_BY_NAME["hip_middle"], :2], [20.0, 60.0])
+    np.testing.assert_allclose(normalized[BODY20_INDEX_BY_NAME["shoulder_middle"], :2], [20.0, 20.0])
+    np.testing.assert_allclose(normalized[BODY20_INDEX_BY_NAME["head"], :2], [20.0, 8.0])
+
+
 def test_decode_payload_and_convert_record():
     keypoints_xy = np.zeros((1, 133, 2), dtype=np.uint16)
     keypoint_scores = np.zeros((1, 133), dtype=np.uint8)
@@ -87,6 +110,26 @@ def test_decode_payload_and_convert_record():
     assert converted["frame_index"] == 7
 
 
+def test_encode_pose_payload_requires_coco_wholebody133():
+    codec = CvmmapPosePayloadCodec()
+    detections = PoseDetections(
+        source_name="cam0",
+        frame_index=1,
+        source_size=(640, 480),
+        boxes_xyxy=np.zeros((1, 4), dtype=np.float32),
+        box_scores=np.ones((1,), dtype=np.float32),
+        keypoints_xy=np.zeros((1, COCO_WHOLEBODY_KEYPOINT_COUNT, 2), dtype=np.float32),
+        keypoint_scores=np.ones((1, COCO_WHOLEBODY_KEYPOINT_COUNT), dtype=np.float32),
+        timestamp_unix_ns=123,
+        keypoint_schema="coco_wholebody133",
+    )
+
+    payload = codec.encode(detections)
+    decoded = decode_pose_payload(payload)
+
+    assert decoded.frame_index == 1
+    assert decoded.reference_size == (640, 480)
+
+
 def test_load_replay_and_synchronize(tmp_path: Path):
     scene_path = tmp_path / "scene.json"
     replay_path = tmp_path / "replay.jsonl"
@@ -153,4 +196,3 @@ def test_load_replay_and_synchronize(tmp_path: Path):
     bundles = synchronize_frames(replay, max_skew_ns=20, min_views=2)
     assert len(bundles) == 1
     assert {frame.camera_name for frame in bundles[0].views} == {"cam0", "cam1"}
-
```
+5 -5
```diff
@@ -5,9 +5,9 @@ import pytest
 
 pytest.importorskip("rpt")
 
-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME
-from pose_tracking_exp.models import CameraCalibration, CameraFrame, FrameBundle, ProposalCluster, SceneConfig, TrackerConfig
-from pose_tracking_exp.tracker import PoseTracker
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.schema import CameraCalibration, CameraFrame, FrameBundle, ProposalCluster, SceneConfig, TrackerConfig
+from pose_tracking_exp.tracking import PoseTracker
 
 
 def _make_scene() -> SceneConfig:
@@ -96,7 +96,7 @@ def test_single_person_mode_caps_active_tracks(monkeypatch) -> None:
     tracker = PoseTracker(
         _make_scene(),
         TrackerConfig(
-            mode="single_person",
+            max_active_tracks=1,
             tentative_min_age=1,
             tentative_hits_required=1,
             tentative_promote_score=0.0,
@@ -127,7 +127,7 @@ def test_single_person_mode_reuses_lost_track_id(monkeypatch) -> None:
     tracker = PoseTracker(
         _make_scene(),
         TrackerConfig(
-            mode="single_person",
+            max_active_tracks=1,
             tentative_min_age=1,
             tentative_hits_required=1,
             tentative_promote_score=0.0,
```
+3 -3
```diff
@@ -6,9 +6,9 @@ import pytest
 
 pytest.importorskip("rpt")
 
-from pose_tracking_exp.models import CameraFrame, FrameBundle, PoseDetection, TrackerConfig
-from pose_tracking_exp.replay import load_scene_file
-from pose_tracking_exp.tracker import PoseTracker
+from pose_tracking_exp.schema import CameraFrame, FrameBundle, PoseDetection, TrackerConfig
+from pose_tracking_exp.tracking import PoseTracker
+from pose_tracking_exp.tracking.replay_io import load_scene_file
 
 RPT_ROOT = Path("/home/crosstyan/Code/RapidPoseTriangulation")
```
Generated
+2304 -199
(Generated file; diff suppressed because it is too large.)
+1 -3
```diff
@@ -1,3 +1 @@
-[[index]]
-url = "https://pypi.org/simple"
-default = true
+no-build-isolation-package = ["chumpy", "xtcocotools"]
```
Binary file not shown.
Binary file not shown.