feat!: reorganize detection and tracking pipeline
Refactor the package into common, schema, detection, and tracking namespaces and move dataset-specific ActualTest utilities into tests/support. Add a pluggable detection stack with typed protocols, pydantic-settings config, loguru-based runner logging, cvmmap and headless video sources, NATS and parquet sinks, and a structured coco-wholebody133 payload path. Teach tracking replay loading to consume parquet detection directories directly, preserve empty frames, and keep the video-to-parquet-to-tracking workflow usable for offline E2E runs. Vendor the local mmcv and xtcocotools wheels under Git LFS, update uv sources/lock state, and refresh the mmcv build so mmcv.ops loads successfully with the current torch+cu130 environment.
@@ -0,0 +1 @@
+vendor/wheels/*.whl filter=lfs diff=lfs merge=lfs -text
@@ -9,13 +9,13 @@ Offline multiview body tracking experiments built around:
 ## Install
 
 ```bash
-uv sync --extra dev
+uv sync --group dev
 ```
 
 ## Run
 
 ```bash
-uv run pose-tracking-exp run data/scene.json data/replay.jsonl
+uv run pose-tracking-exp run_tracking data/scene.json data/replay.jsonl
 ```
 
 `scene.json` may declare camera extrinsics in either format:
@@ -26,13 +26,58 @@ uv run pose-tracking-exp run data/scene.json data/replay.jsonl
 The loader normalizes both to OpenCV extrinsics for reprojection and converts to RPT pose only when building the triangulation config.
 If you already have an older hand-authored scene file that stored RPT camera pose directly, set `extrinsic_format` explicitly to `rpt_camera_pose`.
 
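The OpenCV-to-RPT conversion the loader performs can be sketched as follows. The helper name is illustrative, not the package's actual API; the math itself is the standard pinhole relation: given world-to-camera extrinsics `(R, t)` with `x_cam = R @ x_world + t`, the camera center is `C = -Rᵀ t` and the camera-to-world rotation is `Rᵀ`, which is the form RPT-style triangulation expects.

```python
import numpy as np


def opencv_extrinsics_to_camera_pose(R: np.ndarray, t: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Convert world-to-camera OpenCV extrinsics into a camera pose.

    Returns (camera_center, camera_to_world_rotation).
    """
    R_c2w = R.T          # inverse of a rotation matrix is its transpose
    center = -R.T @ t    # x_cam = R @ x_world + t  =>  C = -R^T t
    return center, R_c2w


# Sanity check: the camera center must map to the camera-frame origin.
R = np.eye(3)
t = np.array([1.0, 2.0, 3.0])
center, R_c2w = opencv_extrinsics_to_camera_pose(R, t)
```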
-## Convert ParaJumping Payload Records
+## Convert cvmmap Pose Payload Records
 
 ```bash
-uv run pose-tracking-exp convert-parajumping input.jsonl output.jsonl
+uv run pose-tracking-exp convert-cvmmap-pose input.jsonl output.jsonl
 ```
 
-## ActualTest Calibration Caveat
+The current cvmmap `.pose` wire format is fixed to `COCO-WholeBody-133` keypoints.
+That is a transport compatibility constraint, not a tracker limitation: the tracker-side normalizer accepts both `coco17` and `coco_wholebody133`, because the first 17 body joints share the standard COCO ordering.
+
+References:
+
+- https://mmpose.readthedocs.io/en/latest/dataset_zoo/2d_wholebody_keypoint.html
+- https://github.com/jin-s13/COCO-WholeBody
+
+## Run Detection
+
+```bash
+uv sync --group dev --group detection
+uv run pose-tracking-exp run_detection --config detection.toml camera0 camera1
+uv run pose-tracking-exp run_detection --source video --output-dir data/detections --config detection.toml cam0=/data/cam0.mp4 cam1=/data/cam1.mp4
+```
+
+The embedded 2D detection module is organized as a swappable shim:
+
+- `FrameSource`: where images come from
+- `PoseShim`: object detection + pose estimation backend
+- `PoseSink`: where structured detections are published or stored
+
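A minimal sketch of what such a three-part shim split can look like with `typing.Protocol`; the method names and signatures here are illustrative assumptions, not the package's actual interfaces:

```python
from typing import Iterator, Protocol


class FrameSource(Protocol):
    """Where images come from (camera stream, video file, ...)."""

    def frames(self) -> Iterator[bytes]: ...


class PoseShim(Protocol):
    """Object detection + pose estimation backend."""

    def infer(self, frame: bytes) -> list[dict]: ...


class PoseSink(Protocol):
    """Where structured detections are published or stored."""

    def publish(self, detections: list[dict]) -> None: ...


def run_pipeline(source: FrameSource, shim: PoseShim, sink: PoseSink) -> int:
    """Wire the three pieces together; returns the number of frames processed."""
    count = 0
    for frame in source.frames():
        sink.publish(shim.infer(frame))
        count += 1
    return count


# Structural typing: any objects with matching methods satisfy the protocols.
class ListSource:
    def frames(self) -> Iterator[bytes]:
        return iter([b"frame0", b"frame1"])


class EchoShim:
    def infer(self, frame: bytes) -> list[dict]:
        return [{"frame": frame}]


class CollectSink:
    def __init__(self) -> None:
        self.published: list[list[dict]] = []

    def publish(self, detections: list[dict]) -> None:
        self.published.append(detections)


sink = CollectSink()
processed = run_pipeline(ListSource(), EchoShim(), sink)
```

Swapping backends then only means providing another object with the same methods, which is what keeps the heavy detection dependencies optional.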
+The default backend is `yolo_rtmpose`, and the heavy runtime dependencies live in the optional `detection` dependency group.
+Checkpoint paths are explicit config fields; the code does not hardcode local checkpoint locations.
+The only inferred path is the MMPose config path, which is resolved relative to the installed `mmpose` package when `pose_config_path` is omitted.
+For offline video runs, the default sink is parquet and writes one `*_detected.parquet` file per source. `run_tracking` can consume that directory directly as replay input.
+
+Example `detection.toml`:
+
+```toml
+instances = ["camera0", "camera1"]
+device = "cuda"
+yolo_checkpoint = "/path/to/yolo_checkpoint.pt"
+pose_checkpoint = "/path/to/coco_wholebody_pose_checkpoint.pth"
+```
+
+## Actual Test Helper
+
+```bash
+uv run --group dev --group detection python -m tests.support.actual_test /mnt/hddl/data/ActualTest_WeiHua --segment Segment_2 --frame-start 1100 --max-frames 120
+```
+
+`actual_test` is a test/support helper, not part of the public installed CLI surface.
+It keeps the union of per-camera frame indices and fills missing camera rows with empty detections, so later 2-camera stretches are still usable instead of being dropped by a 4-camera intersection.
+
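The union-and-fill behavior can be sketched like this (a simplified standalone model: detections per camera keyed by frame index, with an empty list standing in for an empty detection row):

```python
def union_fill(per_camera: dict[str, dict[int, list]]) -> dict[int, dict[str, list]]:
    """Keep the union of per-camera frame indices; missing cameras get empty rows."""
    all_frames = sorted(set().union(*(frames.keys() for frames in per_camera.values())))
    return {
        frame: {cam: frames.get(frame, []) for cam, frames in per_camera.items()}
        for frame in all_frames
    }


# Camera "c2" drops out after frame 1; its row for frame 2 is filled empty
# instead of frame 2 being discarded by an all-camera intersection.
bundles = union_fill({
    "c1": {1: ["det"], 2: ["det"]},
    "c2": {1: ["det"]},
})
```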
+## Actual Test Calibration Caveat
+
 `ActualTest_WeiHua/camera_params.parquet` appears to store raw OpenCV extrinsics from the ChArUco pipeline, not camera poses. The tracker now converts those values before calling `RapidPoseTriangulation`, because RPT expects camera centers and camera-to-world rotation.
 
@@ -7,13 +7,14 @@ name = "pose-tracking-exp"
 version = "0.1.0"
 description = "Offline multiview pose tracking experiment with RPT-backed proposal births"
 readme = "README.md"
-requires-python = ">=3.12"
+requires-python = ">=3.12,<3.13"
 dependencies = [
+    "anyio>=4.11.0",
     "beartype>=0.19.0",
     "click>=8.2.1",
     "jaxtyping>=0.3.2",
     "numpy>=2.1.0",
-    "opencv-python>=4.12.0.88",
+    "opencv-python-headless>=4.12.0.88",
     "pyarrow>=21.0.0",
     "rapid-pose-triangulation",
     "scipy>=1.15.0",
@@ -22,8 +23,24 @@ dependencies = [
 [dependency-groups]
 dev = [
     "basedpyright>=1.31.0",
+    "jupyterlab>=4.5.6",
     "pytest>=8.4.0",
 ]
+detection = [
+    "cvmmap-client",
+    "loguru>=0.7.3",
+    "mmcv",
+    "mmdet>=3.3.0",
+    "mmengine>=0.10.7",
+    "mmpose>=1.3.2",
+    "nats-py>=2.11.0",
+    "pydantic>=2.11.7",
+    "pydantic-settings>=2.0.0",
+    "torch>=2.7.0",
+    "torchvision>=0.22.0",
+    "ultralytics>=8.3.166",
+    "xtcocotools",
+]
 
 [project.scripts]
 pose-tracking-exp = "pose_tracking_exp.cli:main"
@@ -33,6 +50,9 @@ packages = ["src/pose_tracking_exp"]
 
 [tool.uv.sources]
 rapid-pose-triangulation = { path = "../RapidPoseTriangulation", editable = true }
+cvmmap-client = { path = "../cvmmap-python-client", editable = true }
+mmcv = { path = "vendor/wheels/mmcv-2.2.0-cp312-cp312-linux_x86_64.whl" }
+xtcocotools = { path = "vendor/wheels/xtcocotools-1.14.3-cp312-cp312-linux_x86_64.whl" }
 
 [tool.pytest.ini_options]
 testpaths = ["tests"]
@@ -1,40 +1,37 @@
-from pose_tracking_exp.joints import BODY20_JOINT_NAMES, BODY20_OBSERVATION_COUNT
-from pose_tracking_exp.models import (
-    ActiveTrackState,
+from pose_tracking_exp.common.joints import BODY20_JOINT_NAMES, BODY20_OBSERVATION_COUNT
+from pose_tracking_exp.detection.cvmmap_payload import CvmmapPosePayloadCodec, decode_pose_payload
+from pose_tracking_exp.schema import (
     CameraCalibration,
     CameraFrame,
     FrameBundle,
     PoseDetection,
-    ProposalCluster,
     ReplaySequence,
     SceneConfig,
-    TentativeTrackState,
     TrackerConfig,
-    TrackedFrameResult,
 )
-from pose_tracking_exp.parajumping import decode_pose_payload
-from pose_tracking_exp.replay import load_replay_file, load_scene_file
-from pose_tracking_exp.sync import synchronize_frames
-from pose_tracking_exp.tracker import PoseTracker
+from pose_tracking_exp.tracking import (
+    PoseTracker,
+    load_parquet_replay_dir,
+    load_replay_file,
+    load_scene_file,
+    synchronize_frames,
+)
 
 __all__ = [
     "BODY20_JOINT_NAMES",
     "BODY20_OBSERVATION_COUNT",
-    "ActiveTrackState",
     "CameraCalibration",
     "CameraFrame",
+    "CvmmapPosePayloadCodec",
     "FrameBundle",
     "PoseDetection",
     "PoseTracker",
-    "ProposalCluster",
+    "load_parquet_replay_dir",
     "ReplaySequence",
     "SceneConfig",
-    "TentativeTrackState",
-    "TrackedFrameResult",
     "TrackerConfig",
     "decode_pose_payload",
     "load_replay_file",
    "load_scene_file",
     "synchronize_frames",
 ]
@@ -1,15 +1,12 @@
 import json
+import sys
 from pathlib import Path
-from typing import Literal, cast
 
 import click
 
-from pose_tracking_exp.actualtest import load_actualtest_scene, load_actualtest_segment_bundles
-from pose_tracking_exp.models import TrackerConfig
-from pose_tracking_exp.parajumping import convert_payload_jsonl_lines
-from pose_tracking_exp.replay import load_replay_file, load_scene_file
-from pose_tracking_exp.sync import synchronize_frames
-from pose_tracking_exp.tracker import PoseTracker
+from pose_tracking_exp.detection.cvmmap_payload import convert_payload_jsonl_lines
+from pose_tracking_exp.schema import TrackerConfig
+from pose_tracking_exp.tracking import PoseTracker, load_replay_file, load_scene_file, synchronize_frames
 
 
 @click.group()
@@ -17,19 +14,120 @@ def main() -> None:
     """Offline multiview pose tracking experiment CLI."""
 
 
-@main.command("convert-parajumping")
+@main.command("convert-cvmmap-pose")
 @click.argument("input_path", type=click.Path(path_type=Path, exists=True, dir_okay=False))
 @click.argument("output_path", type=click.Path(path_type=Path, dir_okay=False))
-def convert_parajumping(input_path: Path, output_path: Path) -> None:
+def convert_cvmmap_pose(input_path: Path, output_path: Path) -> None:
     lines = input_path.read_text(encoding="utf-8").splitlines()
     converted = convert_payload_jsonl_lines(lines)
     output_path.write_text("\n".join(converted) + ("\n" if converted else ""), encoding="utf-8")
 
 
-@main.command("run")
+@main.command("run_detection")
+@click.argument("inputs", nargs=-1, type=str, required=False)
+@click.option(
+    "--config",
+    "config_path",
+    type=click.Path(dir_okay=False, path_type=Path),
+    default=None,
+    help="Optional TOML detection runner config file.",
+)
+@click.option(
+    "--source",
+    "source_kind",
+    type=click.Choice(("cvmmap", "video")),
+    default="cvmmap",
+    show_default=True,
+    help="Frame source implementation to use.",
+)
+@click.option(
+    "--sink",
+    "sink_kind",
+    type=click.Choice(("auto", "nats", "parquet")),
+    default="auto",
+    show_default=True,
+    help="Output sink. `auto` picks nats for cvmmap and parquet for video.",
+)
+@click.option(
+    "--output-dir",
+    type=click.Path(file_okay=False, path_type=Path),
+    default=None,
+    help="Required for parquet sink output.",
+)
+@click.option(
+    "--log-level",
+    default="INFO",
+    show_default=True,
+    type=click.Choice(("DEBUG", "INFO", "WARNING", "ERROR")),
+)
+def run_detection(
+    inputs: tuple[str, ...],
+    config_path: Path | None,
+    source_kind: str,
+    sink_kind: str,
+    output_dir: Path | None,
+    log_level: str,
+) -> None:
+    import anyio
+    from loguru import logger
+
+    from pose_tracking_exp.detection import (
+        CvmmapFrameSource,
+        NatsPoseSink,
+        ParquetPoseSink,
+        VideoFrameSource,
+        build_pose_shim,
+        load_detection_runner_config,
+        parse_video_input_specs,
+        resolve_instances,
+        run_detection_runner,
+    )
+
+    logger.remove()
+    logger.add(
+        sys.stderr,
+        level=log_level,
+        format="{time:YYYY-MM-DD HH:mm:ss} | {level} | {name}:{function}:{line} | {message}",
+    )
+    config = load_detection_runner_config(config_path)
+    config.validate_runtime_paths()
+
+    if source_kind == "cvmmap":
+        resolved_instances = resolve_instances(inputs, config.instances)
+        config = config.model_copy(update={"instances": resolved_instances})
+        sources = tuple(CvmmapFrameSource(instance) for instance in resolved_instances)
+    else:
+        video_inputs = parse_video_input_specs(inputs)
+        sources = tuple(
+            VideoFrameSource(video_path, source_name=source_name)
+            for source_name, video_path in video_inputs
+        )
+
+    pose_shim = build_pose_shim(config)
+    resolved_sink_kind = sink_kind
+    if resolved_sink_kind == "auto":
+        resolved_sink_kind = "nats" if source_kind == "cvmmap" else "parquet"
+
+    if resolved_sink_kind == "nats":
+        pose_sink = NatsPoseSink(config.nats_host)
+    else:
+        if output_dir is None:
+            raise click.ClickException("--output-dir is required for parquet sink output.")
+        pose_sink = ParquetPoseSink(output_dir)
+
+    anyio.run(
+        run_detection_runner,
+        sources,
+        pose_shim,
+        pose_sink,
+        config,
+    )
+
+
+@main.command("run_tracking")
 @click.argument("scene_path", type=click.Path(path_type=Path, exists=True, dir_okay=False))
-@click.argument("replay_path", type=click.Path(path_type=Path, exists=True, dir_okay=False))
-def run(scene_path: Path, replay_path: Path) -> None:
+@click.argument("replay_path", type=click.Path(path_type=Path, exists=True))
+def run_tracking(scene_path: Path, replay_path: Path) -> None:
     scene = load_scene_file(scene_path)
     replay = load_replay_file(scene_path, replay_path)
     config = TrackerConfig()
@@ -52,79 +150,3 @@ def run(scene_path: Path, replay_path: Path) -> None:
         for result in results
     ]
     click.echo(json.dumps(payload, indent=2))
-
-
-@main.command("run-actualtest")
-@click.argument("root_path", type=click.Path(path_type=Path, exists=True, file_okay=False))
-@click.option("--segment", "segment_name", default="Segment_1", show_default=True)
-@click.option("--frame-start", default=690, type=int, show_default=True)
-@click.option("--frame-stop", type=int)
-@click.option("--max-frames", type=int)
-@click.option("--mode", type=click.Choice(["single_person", "general"]), default="single_person", show_default=True)
-@click.option("--proposal-min-score", default=0.5, type=float, show_default=True)
-@click.option("--tentative-min-age", default=2, type=int, show_default=True)
-@click.option("--tentative-hits-required", default=2, type=int, show_default=True)
-@click.option("--tentative-promote-score", default=1.2, type=float, show_default=True)
-def run_actualtest(
-    root_path: Path,
-    segment_name: str,
-    frame_start: int,
-    frame_stop: int | None,
-    max_frames: int | None,
-    mode: str,
-    proposal_min_score: float,
-    tentative_min_age: int,
-    tentative_hits_required: int,
-    tentative_promote_score: float,
-) -> None:
-    tracker_mode = cast(Literal["general", "single_person"], mode)
-    scene = load_actualtest_scene(root_path)
-    bundles = load_actualtest_segment_bundles(
-        root_path,
-        segment_name,
-        frame_start=frame_start,
-        frame_stop=frame_stop,
-        max_frames=max_frames,
-    )
-    config = TrackerConfig(
-        mode=tracker_mode,
-        proposal_min_score=proposal_min_score,
-        tentative_min_age=tentative_min_age,
-        tentative_hits_required=tentative_hits_required,
-        tentative_promote_score=tentative_promote_score,
-    )
-    tracker = PoseTracker(scene, config)
-    results = tracker.run(bundles)
-    diagnostics = tracker.diagnostics_snapshot()
-    payload = {
-        "segment": segment_name,
-        "mode": tracker_mode,
-        "bundle_count": len(results),
-        "active_track_frames": sum(1 for result in results if result.active_tracks),
-        "proposal_frames": sum(1 for result in results if result.proposals),
-        "max_active_tracks": max((len(result.active_tracks) for result in results), default=0),
-        "diagnostics": {
-            "match_existing_calls": diagnostics.match_existing_calls,
-            "match_existing_seconds": diagnostics.match_existing_seconds,
-            "proposal_build_calls": diagnostics.proposal_build_calls,
-            "proposal_build_seconds": diagnostics.proposal_build_seconds,
-            "promotions": diagnostics.promotions,
-            "reacquisitions": diagnostics.reacquisitions,
-            "active_updates": diagnostics.active_updates,
-            "seed_initializations": diagnostics.seed_initializations,
-            "nonlinear_refinements": diagnostics.nonlinear_refinements,
-        },
-        "results": [
-            {
-                "bundle_index": result.bundle_index,
-                "source_frame_index": bundle.views[0].frame_index if bundle.views else -1,
-                "timestamp_unix_ns": result.timestamp_unix_ns,
-                "tentative_track_ids": [track.track_id for track in result.tentative_tracks],
-                "active_track_ids": [track.track_id for track in result.active_tracks],
-                "lost_track_ids": [track.track_id for track in result.lost_tracks],
-                "proposal_count": len(result.proposals),
-            }
-            for bundle, result in zip(bundles, results, strict=True)
-        ],
-    }
-    click.echo(json.dumps(payload, indent=2))
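The `name=path` video argument form accepted by `run_detection` can be parsed along these lines; this standalone sketch is an assumption about what `parse_video_input_specs` does, not its actual implementation:

```python
from pathlib import Path


def parse_video_specs(specs: tuple[str, ...]) -> list[tuple[str, Path]]:
    """Parse `name=/path/to/video` arguments into (source_name, path) pairs."""
    parsed: list[tuple[str, Path]] = []
    for spec in specs:
        name, sep, raw_path = spec.partition("=")
        if not sep or not name or not raw_path:
            raise ValueError(f"Expected 'name=path', got {spec!r}.")
        parsed.append((name, Path(raw_path)))
    return parsed


pairs = parse_video_specs(("cam0=/data/cam0.mp4", "cam1=/data/cam1.mp4"))
```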
@@ -0,0 +1,33 @@
+from pose_tracking_exp.common.camera_math import project_pose
+from pose_tracking_exp.common.joints import (
+    BODY20_INDEX_BY_NAME,
+    BODY20_JOINT_NAMES,
+    BODY20_OBSERVATION_COUNT,
+    COCO_BODY17_INDEX_BY_NAME,
+    COCO_BODY17_NAMES,
+    CORE_JOINT_INDICES,
+    CORE_JOINT_NAMES,
+)
+from pose_tracking_exp.common.normalization import (
+    core_reprojection_distance,
+    infer_bbox_from_keypoints,
+    normalize_coco_body20,
+    normalize_openpose25_body20,
+    normalize_rtmpose_body20,
+)
+
+__all__ = [
+    "BODY20_INDEX_BY_NAME",
+    "BODY20_JOINT_NAMES",
+    "BODY20_OBSERVATION_COUNT",
+    "COCO_BODY17_INDEX_BY_NAME",
+    "COCO_BODY17_NAMES",
+    "CORE_JOINT_INDICES",
+    "CORE_JOINT_NAMES",
+    "core_reprojection_distance",
+    "infer_bbox_from_keypoints",
+    "normalize_coco_body20",
+    "normalize_openpose25_body20",
+    "normalize_rtmpose_body20",
+    "project_pose",
+]
@@ -1,8 +1,8 @@
 import cv2
 import numpy as np
 
-from pose_tracking_exp.models import CameraCalibration
-from pose_tracking_exp.tensor_types import Pose3D
+from pose_tracking_exp.common.tensor_types import Pose3D
+from pose_tracking_exp.schema.camera import CameraCalibration
 
 
 def project_pose(camera: CameraCalibration, pose3d: Pose3D) -> np.ndarray:
@@ -0,0 +1,43 @@
+from pathlib import Path
+
+import pyarrow as pa
+
+from pose_tracking_exp.schema.detection import PoseDetections
+
+DETECTED_PARQUET_SUFFIX = "_detected.parquet"
+DETECTION_PARQUET_SCHEMA = pa.schema(
+    [
+        pa.field("frame_index", pa.int64()),
+        pa.field("timestamp_unix_ns", pa.int64()),
+        pa.field("source_width", pa.int32()),
+        pa.field("source_height", pa.int32()),
+        pa.field("boxes", pa.list_(pa.list_(pa.float32()))),
+        pa.field("box_scores", pa.list_(pa.float32())),
+        pa.field("kps", pa.list_(pa.list_(pa.list_(pa.float32())))),
+        pa.field("kps_scores", pa.list_(pa.list_(pa.float32()))),
+        pa.field("keypoint_schema", pa.string()),
+    ]
+)
+
+
+def detection_parquet_path(output_dir: Path, source_name: str) -> Path:
+    return output_dir / f"{source_name}{DETECTED_PARQUET_SUFFIX}"
+
+
+def pose_detections_to_row(detections: PoseDetections) -> dict[str, object]:
+    if detections.box_scores is None:
+        raise ValueError("Parquet sink requires box_scores to be present.")
+    if detections.keypoint_scores is None:
+        raise ValueError("Parquet sink requires keypoint_scores to be present.")
+
+    return {
+        "frame_index": int(detections.frame_index),
+        "timestamp_unix_ns": int(detections.timestamp_unix_ns),
+        "source_width": int(detections.source_size[0]),
+        "source_height": int(detections.source_size[1]),
+        "boxes": detections.boxes_xyxy.astype("float32", copy=False).tolist(),
+        "box_scores": detections.box_scores.astype("float32", copy=False).tolist(),
+        "kps": detections.keypoints_xy.astype("float32", copy=False).tolist(),
+        "kps_scores": detections.keypoint_scores.astype("float32", copy=False).tolist(),
+        "keypoint_schema": detections.keypoint_schema,
+    }
@@ -39,7 +39,7 @@ CORE_JOINT_NAMES: tuple[str, ...] = (
 
 CORE_JOINT_INDICES: tuple[int, ...] = tuple(BODY20_INDEX_BY_NAME[name] for name in CORE_JOINT_NAMES)
 
-RTMPOSE_BODY17_INDEX_BY_NAME = {
+COCO_BODY17_INDEX_BY_NAME = {
     "nose": 0,
     "eye_left": 1,
     "eye_right": 2,
@@ -59,5 +59,8 @@ RTMPOSE_BODY17_INDEX_BY_NAME = {
     "ankle_right": 16,
 }
 
-RTMPOSE_BODY17_NAMES = tuple(RTMPOSE_BODY17_INDEX_BY_NAME.keys())
+COCO_BODY17_NAMES = tuple(COCO_BODY17_INDEX_BY_NAME.keys())
+
+# RTMPose whole-body uses the standard COCO body-17 ordering for the first 17 joints.
+RTMPOSE_BODY17_INDEX_BY_NAME = COCO_BODY17_INDEX_BY_NAME
+RTMPOSE_BODY17_NAMES = COCO_BODY17_NAMES
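The aliasing relies on the COCO-WholeBody layout keeping the 17 body joints as a prefix of the 133-joint array, so the body-17 view is just a leading slice (illustrative data):

```python
import numpy as np

# Hypothetical wholebody output: 133 joints of (x, y).
wholebody_xy = np.zeros((133, 2), dtype=np.float32)
wholebody_xy[0] = (100.0, 50.0)  # nose is index 0 in both layouts

# Because the first 17 joints share the standard COCO ordering,
# no index remapping is needed to recover body-17.
body17_xy = wholebody_xy[:17]
```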
@@ -1,12 +1,51 @@
 import math
 from collections.abc import Mapping
+from typing import Literal
 
 import numpy as np
 from beartype import beartype
 from jaxtyping import jaxtyped
 
-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME, BODY20_OBSERVATION_COUNT, RTMPOSE_BODY17_INDEX_BY_NAME
-from pose_tracking_exp.tensor_types import FloatArray, JointXY, Pose2D
+from pose_tracking_exp.common.joints import (
+    BODY20_INDEX_BY_NAME,
+    BODY20_OBSERVATION_COUNT,
+    COCO_BODY17_INDEX_BY_NAME,
+)
+from pose_tracking_exp.common.tensor_types import FloatArray, JointXY, Pose2D
+
+
+def _validate_coco_shape(
+    keypoints_xy: FloatArray,
+    confidences: FloatArray,
+    *,
+    keypoint_schema: Literal["coco17", "coco_wholebody133"] | None,
+) -> Literal["coco17", "coco_wholebody133"]:
+    if keypoints_xy.ndim != 2 or keypoints_xy.shape[1] != 2:
+        raise ValueError(
+            f"Expected keypoints with shape (N, 2), got {keypoints_xy.shape}."
+        )
+    if confidences.ndim != 1 or confidences.shape[0] != keypoints_xy.shape[0]:
+        raise ValueError(
+            "Expected confidences with shape matching keypoint count. "
+            f"Got {confidences.shape} for {keypoints_xy.shape}."
+        )
+
+    detected_schema: Literal["coco17", "coco_wholebody133"]
+    if keypoints_xy.shape[0] == 17:
+        detected_schema = "coco17"
+    elif keypoints_xy.shape[0] == 133:
+        detected_schema = "coco_wholebody133"
+    else:
+        raise ValueError(
+            "Expected COCO-compatible keypoints with 17 or 133 joints, "
+            f"got {keypoints_xy.shape[0]}."
+        )
+
+    if keypoint_schema is not None and keypoint_schema != detected_schema:
+        raise ValueError(
+            f"Expected {keypoint_schema} keypoints, got shape {keypoints_xy.shape}."
+        )
+    return detected_schema
+
+
 def _visible_mean(points: list[tuple[np.ndarray, float]], fallback_xy: np.ndarray) -> tuple[np.ndarray, float]:
@@ -68,18 +107,20 @@ def _normalize_named_keypoints(
 
 
 @jaxtyped(typechecker=beartype)
-def normalize_rtmpose_body20(
+def normalize_coco_body20(
     keypoints_xy: FloatArray,
     confidences: FloatArray,
+    *,
+    keypoint_schema: Literal["coco17", "coco_wholebody133"] | None = None,
 ) -> Pose2D:
-    if keypoints_xy.shape != (133, 2):
-        raise ValueError(f"Expected RTMPose keypoints with shape (133, 2), got {keypoints_xy.shape}.")
-    if confidences.shape != (133,):
-        raise ValueError(f"Expected RTMPose confidences with shape (133,), got {confidences.shape}.")
+    _validate_coco_shape(
+        keypoints_xy,
+        confidences,
+        keypoint_schema=keypoint_schema,
+    )
     keypoint_map = {
         name: (keypoints_xy[source_index], float(confidences[source_index]))
-        for name, source_index in RTMPOSE_BODY17_INDEX_BY_NAME.items()
+        for name, source_index in COCO_BODY17_INDEX_BY_NAME.items()
     }
     return _normalize_named_keypoints(
         keypoint_map,
@@ -89,6 +130,18 @@ def normalize_rtmpose_body20(
     )
 
 
+@jaxtyped(typechecker=beartype)
+def normalize_rtmpose_body20(
+    keypoints_xy: FloatArray,
+    confidences: FloatArray,
+) -> Pose2D:
+    return normalize_coco_body20(
+        keypoints_xy,
+        confidences,
+        keypoint_schema="coco_wholebody133",
+    )
+
+
 @jaxtyped(typechecker=beartype)
 def normalize_openpose25_body20(keypoints: FloatArray) -> Pose2D:
     if keypoints.shape != (25, 3):
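Stripped of the project's typing decorators and confidence checks, the joint-count dispatch added here behaves like this standalone sketch:

```python
import numpy as np


def detect_coco_schema(keypoints_xy: np.ndarray) -> str:
    """Classify a keypoint array as coco17 or coco_wholebody133 by joint count."""
    if keypoints_xy.ndim != 2 or keypoints_xy.shape[1] != 2:
        raise ValueError(f"Expected shape (N, 2), got {keypoints_xy.shape}.")
    if keypoints_xy.shape[0] == 17:
        return "coco17"
    if keypoints_xy.shape[0] == 133:
        return "coco_wholebody133"
    raise ValueError(f"Expected 17 or 133 joints, got {keypoints_xy.shape[0]}.")
```

This is what lets the same body-20 normalizer accept either a body-only detector or a wholebody detector, since only the shared 17-joint prefix is consumed.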
@@ -0,0 +1,58 @@
+from pose_tracking_exp.detection.config import (
+    DEFAULT_BACKEND,
+    DetectionRunnerConfig,
+    load_detection_runner_config,
+    resolve_default_pose_config,
+    resolve_instances,
+)
+from pose_tracking_exp.detection.factory import build_pose_shim
+from pose_tracking_exp.detection.runner import (
+    SimpleMovingAverage,
+    SourceSlot,
+    run_detection_runner,
+    store_latest_frame,
+    take_pending_batch,
+)
+from pose_tracking_exp.detection.sinks import NatsPoseSink, ParquetPoseSink
+from pose_tracking_exp.detection.sources import (
+    CvmmapFrameSource,
+    IteratorFrameSource,
+    VideoFrameSource,
+    parse_video_input_specs,
+)
+from pose_tracking_exp.schema.detection import BoxDetections, CocoKeypointSchema, PoseBatchRequest, PoseDetections, SourceFrame
+from pose_tracking_exp.detection.yolo_rtmpose import (
+    WholeBodyPoseEstimator,
+    YoloRtmposeShim,
+    build_yolo_rtmpose_shim,
+    legacy_torch_checkpoint_loading,
+)
+
+__all__ = [
+    "BoxDetections",
+    "CocoKeypointSchema",
+    "CvmmapFrameSource",
+    "DEFAULT_BACKEND",
+    "DetectionRunnerConfig",
+    "IteratorFrameSource",
+    "NatsPoseSink",
+    "ParquetPoseSink",
+    "PoseBatchRequest",
+    "PoseDetections",
+    "SimpleMovingAverage",
+    "SourceFrame",
+    "SourceSlot",
+    "VideoFrameSource",
+    "WholeBodyPoseEstimator",
+    "YoloRtmposeShim",
+    "build_pose_shim",
+    "build_yolo_rtmpose_shim",
+    "legacy_torch_checkpoint_loading",
+    "load_detection_runner_config",
+    "parse_video_input_specs",
+    "resolve_default_pose_config",
+    "resolve_instances",
+    "run_detection_runner",
+    "store_latest_frame",
+    "take_pending_batch",
+]
@@ -0,0 +1,147 @@
import tomllib
from pathlib import Path
from typing import Any, Literal, cast

import click
from pydantic import (
    PositiveFloat,
    PositiveInt,
    ValidationError,
    field_validator,
    model_validator,
)
from pydantic_settings import (
    BaseSettings,
    PydanticBaseSettingsSource,
    SettingsConfigDict,
)

DEFAULT_BACKEND = "yolo_rtmpose"
ENV_PREFIX = "POSE_TRACKING_EXP_DETECTION_"
POSE_CONFIG_RELATIVE_PATH = Path(
    "wholebody_2d_keypoint/rtmpose/ubody/rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py"
)


def resolve_default_pose_config() -> Path:
    import mmpose

    module_file = getattr(mmpose, "__file__", None)
    if module_file is None:
        raise FileNotFoundError("Could not locate the installed mmpose package.")
    config_path = (
        Path(module_file).resolve().parent
        / ".mim"
        / "configs"
        / POSE_CONFIG_RELATIVE_PATH
    )
    if not config_path.exists():
        raise FileNotFoundError(f"Default pose config is missing: {config_path}")
    return config_path


class DetectionRunnerConfig(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix=ENV_PREFIX,
        extra="forbid",
    )

    instances: tuple[str, ...] = ()
    backend: Literal["yolo_rtmpose"] = DEFAULT_BACKEND
    device: str = "cuda"
    nats_host: str = "nats://localhost:4222"
    yolo_checkpoint: Path
    yolo_conf_threshold: float = 0.6
    pose_checkpoint: Path
    pose_config_path: Path | None = None
    bbox_area_threshold: PositiveInt = 50 * 50
    max_batch_frames: PositiveInt = 8
    max_batch_wait_ms: int = 4
    slow_frame_budget_seconds: PositiveFloat = 1 / 22

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls: type[BaseSettings],
        init_settings: PydanticBaseSettingsSource,
        env_settings: PydanticBaseSettingsSource,
        dotenv_settings: PydanticBaseSettingsSource,
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        return (
            env_settings,
            init_settings,
            dotenv_settings,
            file_secret_settings,
        )

    @field_validator("instances", mode="before")
    @classmethod
    def _parse_instances(cls, value: object) -> object:
        if isinstance(value, str):
            return tuple(item.strip() for item in value.split(",") if item.strip())
        return value

    @field_validator("max_batch_wait_ms")
    @classmethod
    def _validate_wait_ms(cls, value: int) -> int:
        if value < 0:
            raise ValueError("max_batch_wait_ms must be non-negative.")
        return value

    @model_validator(mode="after")
    def _resolve_pose_config(self) -> "DetectionRunnerConfig":
        if self.pose_config_path is None:
            self.pose_config_path = resolve_default_pose_config()
        return self

    def validate_runtime_paths(self) -> None:
        missing: list[Path] = []
        for candidate in (
            self.yolo_checkpoint,
            self.pose_checkpoint,
            self.pose_config_path,
        ):
            if candidate is None:
                raise FileNotFoundError("pose_config_path was not resolved.")
            if not candidate.exists():
                missing.append(candidate)
        if missing:
            formatted = ", ".join(str(path) for path in missing)
            raise click.ClickException(f"Missing runtime assets: {formatted}")


def load_detection_runner_config(config_path: Path | None) -> DetectionRunnerConfig:
    config_data: dict[str, object] = {}
    if config_path is not None:
        with config_path.open("rb") as handle:
            parsed = tomllib.load(handle)
        if not isinstance(parsed, dict):
            raise click.ClickException("Detection runner config must be a TOML table.")
        config_data = parsed

    try:
        # TOML/env values are validated by Pydantic at construction.
        return DetectionRunnerConfig(**cast(dict[str, Any], config_data))
    except (ValidationError, ValueError, FileNotFoundError) as exc:
        raise click.ClickException(str(exc)) from exc


def resolve_instances(
    cli_instances: tuple[str, ...],
    configured_instances: tuple[str, ...],
) -> tuple[str, ...]:
    selected = cli_instances or configured_instances
    if not selected:
        raise click.ClickException(
            "Provide at least one instance on the command line or via config `instances = [...]`."
        )

    unique_instances: list[str] = []
    seen: set[str] = set()
    for instance in selected:
        if instance in seen:
            raise click.ClickException(f"Duplicate instance requested: {instance}")
        unique_instances.append(instance)
        seen.add(instance)
    return tuple(unique_instances)
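The instance-selection rules above (CLI arguments win over the config file, the empty selection and duplicates are rejected) can be sketched standalone; this re-statement uses plain `ValueError` in place of `click.ClickException` so it runs without click installed:

```python
def resolve_instances(
    cli_instances: tuple[str, ...],
    configured_instances: tuple[str, ...],
) -> tuple[str, ...]:
    # CLI instances take precedence; config instances are only a fallback.
    selected = cli_instances or configured_instances
    if not selected:
        raise ValueError("no instances selected")
    unique: list[str] = []
    seen: set[str] = set()
    for instance in selected:
        if instance in seen:
            # Duplicates are an error rather than being silently deduplicated.
            raise ValueError(f"duplicate instance: {instance}")
        unique.append(instance)
        seen.add(instance)
    return tuple(unique)


print(resolve_instances(("cam0", "cam1"), ("cam2",)))  # CLI overrides config
print(resolve_instances((), ("cam2",)))                # falls back to config
```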
@@ -0,0 +1,237 @@
"""cvmmap pose payload helpers.

The current `.pose` wire format is fixed-width for COCO-WholeBody-133 keypoints.
That is a protocol compatibility choice, not a tracker limitation: the tracker
normalizer accepts either `coco17` or `coco_wholebody133` because the first
17 body joints share the standard COCO ordering.

References:
- https://mmpose.readthedocs.io/en/latest/dataset_zoo/2d_wholebody_keypoint.html
- https://github.com/jin-s13/COCO-WholeBody
"""

import base64
import json
from dataclasses import dataclass

import numpy as np
from beartype import beartype

from pose_tracking_exp.common.normalization import normalize_coco_body20
from pose_tracking_exp.schema import CameraFrame, PoseDetection
from pose_tracking_exp.schema.detection import PoseDetections

PROTOCOL_HEADER = bytes([0x80]) + b"POSE"
COCO_WHOLEBODY_KEYPOINT_COUNT = 133


@dataclass(slots=True)
class DecodedPosePayload:
    frame_index: int
    reference_size: tuple[int, int]
    timestamp_unix_ns: int
    detections: tuple[PoseDetection, ...]


class CvmmapPosePayloadCodec:
    def encode(self, detections: PoseDetections) -> bytes:
        return encode_pose_payload(detections)


def _read_u8(payload: memoryview, offset: int) -> tuple[int, int]:
    return int(payload[offset]), offset + 1


def _read_u16_array(payload: memoryview, offset: int, count: int) -> tuple[np.ndarray, int]:
    size = count * 2
    array = np.frombuffer(payload[offset : offset + size], dtype="<u2", count=count).astype(np.float64)
    return array, offset + size


@beartype
def decode_pose_payload(payload: bytes) -> DecodedPosePayload:
    if not payload.startswith(PROTOCOL_HEADER):
        raise ValueError("Invalid cvmmap pose payload header.")

    view = memoryview(payload)
    offset = len(PROTOCOL_HEADER)
    frame_index = int.from_bytes(view[offset : offset + 4], "little")
    offset += 4
    reference_size = tuple(int(x) for x in np.frombuffer(view[offset : offset + 4], dtype="<u2", count=2))
    offset += 4

    num_bbox = int(view[offset])
    offset += 1
    bbox_raw, offset = _read_u16_array(view, offset, num_bbox * 4)
    bboxes = bbox_raw.reshape(num_bbox, 4) if num_bbox > 0 else np.zeros((0, 4), dtype=np.float64)

    num_bbox_conf = int(view[offset])
    offset += 1
    bbox_confidence = np.frombuffer(view[offset : offset + num_bbox_conf], dtype=np.uint8, count=num_bbox_conf)
    offset += num_bbox_conf

    num_keypoints = int(view[offset])
    offset += 1
    keypoints_raw, offset = _read_u16_array(
        view,
        offset,
        num_keypoints * COCO_WHOLEBODY_KEYPOINT_COUNT * 2,
    )
    keypoints_xy = (
        keypoints_raw.reshape(num_keypoints, COCO_WHOLEBODY_KEYPOINT_COUNT, 2)
        if num_keypoints > 0
        else np.zeros((0, COCO_WHOLEBODY_KEYPOINT_COUNT, 2), dtype=np.float64)
    )

    num_keypoint_conf = int(view[offset])
    offset += 1
    keypoint_confidence_count = num_keypoint_conf * COCO_WHOLEBODY_KEYPOINT_COUNT
    keypoint_confidence = (
        np.frombuffer(
            view[offset : offset + keypoint_confidence_count],
            dtype=np.uint8,
            count=keypoint_confidence_count,
        ).astype(np.float64)
        / 255.0
    )
    offset += keypoint_confidence_count
    timestamp_unix_ns = int.from_bytes(view[offset : offset + 8], "little")

    if num_keypoint_conf > 0 and num_keypoint_conf != num_keypoints:
        raise ValueError("Unexpected keypoint confidence set count.")

    detection_items: list[PoseDetection] = []
    confidences = (
        keypoint_confidence.reshape(num_keypoints, COCO_WHOLEBODY_KEYPOINT_COUNT)
        if num_keypoints > 0
        else np.zeros((0, COCO_WHOLEBODY_KEYPOINT_COUNT), dtype=np.float64)
    )

    for index in range(num_keypoints):
        normalized = normalize_coco_body20(
            keypoints_xy[index],
            confidences[index],
            keypoint_schema="coco_wholebody133",
        )
        bbox_score = float(bbox_confidence[index] / 255.0) if index < bbox_confidence.shape[0] else 0.0
        bbox = bboxes[index] if index < bboxes.shape[0] else np.zeros(4, dtype=np.float64)
        detection_items.append(
            PoseDetection(
                bbox=np.asarray(bbox, dtype=np.float64),
                bbox_confidence=bbox_score,
                keypoints=np.asarray(normalized, dtype=np.float64),
            )
        )

    return DecodedPosePayload(
        frame_index=frame_index,
        reference_size=(reference_size[0], reference_size[1]),
        timestamp_unix_ns=timestamp_unix_ns,
        detections=tuple(detection_items),
    )


@beartype
def encode_pose_payload(detections: PoseDetections) -> bytes:
    detections.validate()
    if detections.keypoint_schema != "coco_wholebody133":
        raise ValueError(
            "The cvmmap `.pose` payload currently requires `coco_wholebody133` keypoints."
        )

    frame_index_bytes = int(detections.frame_index).to_bytes(4, "little")
    reference_size_bytes = np.asarray(detections.source_size, dtype=np.dtype("<u2")).tobytes()

    num_bbox = int(detections.boxes_xyxy.shape[0])
    num_bbox_bytes = num_bbox.to_bytes(1, "little")
    bbox_bytes = np.ascontiguousarray(
        detections.boxes_xyxy.astype(np.uint16),
        dtype=np.dtype("<u2"),
    ).tobytes()

    num_bbox_confidence_bytes = bytes([0])
    bbox_confidence_bytes = bytes()
    if detections.box_scores is not None:
        num_bbox_confidence_bytes = int(detections.box_scores.shape[0]).to_bytes(1, "little")
        bbox_confidence_bytes = np.ascontiguousarray(
            np.clip(detections.box_scores * np.iinfo(np.uint8).max, 0, 255).astype(np.uint8),
            dtype=np.dtype("<u1"),
        ).tobytes()

    num_keypoints = int(detections.keypoints_xy.shape[0])
    num_keypoints_bytes = num_keypoints.to_bytes(1, "little")
    keypoints_bytes = np.ascontiguousarray(
        detections.keypoints_xy.astype(np.uint16),
        dtype=np.dtype("<u2"),
    ).tobytes()

    num_keypoint_confidence_bytes = bytes([0])
    keypoint_confidence_bytes = bytes()
    if detections.keypoint_scores is not None:
        num_keypoint_confidence_bytes = int(detections.keypoint_scores.shape[0]).to_bytes(1, "little")
        keypoint_confidence_bytes = np.ascontiguousarray(
            np.clip(detections.keypoint_scores * np.iinfo(np.uint8).max, 0, 255).astype(np.uint8),
            dtype=np.dtype("<u1"),
        ).tobytes()

    timestamp_unix_ns_bytes = int(detections.timestamp_unix_ns).to_bytes(8, "little")
    return (
        PROTOCOL_HEADER
        + frame_index_bytes
        + reference_size_bytes
        + num_bbox_bytes
        + bbox_bytes
        + num_bbox_confidence_bytes
        + bbox_confidence_bytes
        + num_keypoints_bytes
        + keypoints_bytes
        + num_keypoint_confidence_bytes
        + keypoint_confidence_bytes
        + timestamp_unix_ns_bytes
    )


@beartype
def frame_from_payload(camera_name: str, payload: bytes) -> CameraFrame:
    decoded = decode_pose_payload(payload)
    return CameraFrame(
        camera_name=camera_name,
        frame_index=decoded.frame_index,
        timestamp_unix_ns=decoded.timestamp_unix_ns,
        detections=decoded.detections,
        source_size=decoded.reference_size,
    )


@beartype
def convert_payload_record(record: dict[str, object]) -> dict[str, object]:
    camera_name = str(record["camera"])
    payload_b64 = str(record["payload_b64"])
    frame = frame_from_payload(camera_name, base64.b64decode(payload_b64))

    return {
        "camera": frame.camera_name,
        "frame_index": frame.frame_index,
        "timestamp_unix_ns": frame.timestamp_unix_ns,
        "source_size": list(frame.source_size),
        "detections": [
            {
                "bbox": detection.bbox.tolist(),
                "bbox_confidence": detection.bbox_confidence,
                "keypoints": detection.keypoints.tolist(),
            }
            for detection in frame.detections
        ],
    }


@beartype
def convert_payload_jsonl_lines(lines: list[str]) -> list[str]:
    output_lines: list[str] = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        converted = convert_payload_record(record)
        output_lines.append(json.dumps(converted))
    return output_lines
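The fixed-width framing read by `decode_pose_payload` can be illustrated with stdlib `struct` alone. This sketch builds a frame with zero detections, so the four one-byte section counts (boxes, box confidences, keypoint sets, keypoint confidence sets) sit back to back between the header fields and the trailing timestamp; the example values are invented for illustration:

```python
import struct

HEADER = bytes([0x80]) + b"POSE"

payload = (
    HEADER
    + struct.pack("<IHH", 7, 1920, 1080)  # u32 frame index, u16 x u16 reference size
    + bytes(4)                            # four zero counts: no boxes or keypoints
    + struct.pack("<Q", 123)              # u64 unix-nanosecond timestamp
)

assert payload.startswith(HEADER)
frame_index, width, height = struct.unpack_from("<IHH", payload, len(HEADER))
# Timestamp sits after the 8 header-field bytes and the 4 count bytes.
(timestamp_ns,) = struct.unpack_from("<Q", payload, len(HEADER) + 8 + 4)
print(frame_index, (width, height), timestamp_ns)  # 7 (1920, 1080) 123
```

With detections present, each count byte is instead followed by its variable-length section, which is why the decoder walks an explicit `offset` rather than using one fixed `struct` format.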
@@ -0,0 +1,3 @@
from pose_tracking_exp.detection.sources.cvmmap import CvmmapFrameSource

__all__ = ["CvmmapFrameSource"]
@@ -0,0 +1,21 @@
import click

from pose_tracking_exp.detection.config import DEFAULT_BACKEND, DetectionRunnerConfig
from pose_tracking_exp.detection.protocols import PoseShim
from pose_tracking_exp.detection.yolo_rtmpose import build_yolo_rtmpose_shim


def build_pose_shim(config: DetectionRunnerConfig) -> PoseShim:
    if config.backend == DEFAULT_BACKEND:
        if config.pose_config_path is None:
            raise click.ClickException("pose_config_path must be resolved before building the backend.")
        return build_yolo_rtmpose_shim(
            yolo_checkpoint=config.yolo_checkpoint,
            yolo_conf_threshold=config.yolo_conf_threshold,
            pose_checkpoint=config.pose_checkpoint,
            pose_config_path=config.pose_config_path,
            device=config.device,
            max_batch_frames=config.max_batch_frames,
            bbox_area_threshold=config.bbox_area_threshold,
        )
    raise click.ClickException(f"Unsupported detection backend: {config.backend}")
@@ -0,0 +1,3 @@
from pose_tracking_exp.detection.sinks.nats import NatsPoseSink

__all__ = ["NatsPoseSink"]
@@ -0,0 +1,49 @@
from collections.abc import AsyncIterator, Sequence
from typing import Protocol

import numpy as np

from pose_tracking_exp.schema.detection import BoxDetections, PoseBatchRequest, PoseDetections, SourceFrame


class FrameSource(Protocol):
    source_name: str

    def frames(self) -> AsyncIterator[SourceFrame]:
        ...


class ObjectDetector(Protocol):
    def detect_many(
        self,
        frames_rgb: Sequence[np.ndarray],
        *,
        classes: Sequence[int] | None = None,
    ) -> list[BoxDetections]:
        ...


class PoseEstimator(Protocol):
    def estimate_batch(
        self,
        requests: Sequence[PoseBatchRequest],
    ) -> list[tuple[np.ndarray, np.ndarray]]:
        ...


class PoseShim(Protocol):
    def process_many(self, frames: Sequence[SourceFrame]) -> list[PoseDetections]:
        ...


class PosePayloadCodec(Protocol):
    def encode(self, detections: PoseDetections) -> bytes:
        ...


class PoseSink(Protocol):
    async def publish_pose(self, detections: PoseDetections) -> None:
        ...

    async def aclose(self) -> None:
        ...
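These are structural (`typing.Protocol`) contracts: a sink or source conforms by shape, with no inheritance from the protocol class. A minimal sketch, using a simplified `PoseSink` marked `@runtime_checkable` (the protocols above are not) and a hypothetical `NullSink`:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PoseSink(Protocol):
    async def publish_pose(self, detections: object) -> None: ...
    async def aclose(self) -> None: ...


class NullSink:
    """Discards everything; conforms to PoseSink purely by matching method names."""

    async def publish_pose(self, detections: object) -> None:
        pass

    async def aclose(self) -> None:
        pass


# runtime_checkable isinstance() only verifies the methods exist, not signatures.
print(isinstance(NullSink(), PoseSink))  # True
```

This is why `run_detection_runner` can accept `NatsPoseSink`, `ParquetPoseSink`, or any test double without a shared base class.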
@@ -0,0 +1,238 @@
from dataclasses import dataclass
from time import perf_counter

import anyio
from anyio.to_thread import run_sync as to_thread_run_sync
from loguru import logger

from pose_tracking_exp.detection.config import DetectionRunnerConfig
from pose_tracking_exp.detection.protocols import FrameSource, PoseShim, PoseSink
from pose_tracking_exp.schema.detection import SourceFrame

PERFORMANCE_WINDOW = 60


@dataclass(slots=True)
class PendingFrame:
    source_name: str
    frame: SourceFrame


@dataclass(slots=True)
class SourceSlot:
    source_name: str
    pending_frame: PendingFrame | None = None
    last_seen_frame_index: int | None = None
    received_frames: int = 0
    dropped_frames: int = 0
    processed_frames: int = 0
    published_frames: int = 0
    closed: bool = False


def store_latest_frame(slot: SourceSlot, frame: SourceFrame) -> None:
    slot.received_frames += 1
    if slot.pending_frame is not None:
        slot.dropped_frames += 1
    slot.pending_frame = PendingFrame(source_name=slot.source_name, frame=frame)


def pending_source_count(slots: dict[str, SourceSlot]) -> int:
    return sum(slot.pending_frame is not None for slot in slots.values())


def take_pending_batch(
    slots: dict[str, SourceSlot],
    max_batch_frames: int,
) -> list[PendingFrame]:
    batch: list[PendingFrame] = []
    for slot in slots.values():
        if slot.pending_frame is None:
            continue
        batch.append(slot.pending_frame)
        slot.pending_frame = None
        if len(batch) >= max_batch_frames:
            break
    return batch


def all_sources_closed_and_idle(slots: dict[str, SourceSlot]) -> bool:
    return all(slot.closed and slot.pending_frame is None for slot in slots.values())


class SimpleMovingAverage:
    def __init__(self, window: int) -> None:
        self._window = window
        self._sum = 0.0
        self._size = 0
        self._value: float | None = None

    def next(self, value: float) -> float:
        if self._size < self._window:
            self._sum += value
            self._size += 1
            self._value = self._sum / self._size
        else:
            self._sum -= self._sum / self._window
            self._sum += value
            self._value = self._sum / self._window
        return float(self._value)

    def get(self) -> float | None:
        return self._value


async def run_detection_runner(
    sources: tuple[FrameSource, ...],
    pose_shim: PoseShim,
    pose_sink: PoseSink,
    config: DetectionRunnerConfig,
) -> None:
    performance_sma = SimpleMovingAverage(PERFORMANCE_WINDOW)
    batch_size_sma = SimpleMovingAverage(PERFORMANCE_WINDOW)
    scheduler_condition = anyio.Condition()
    slots = {
        source.source_name: SourceSlot(source_name=source.source_name) for source in sources
    }
    inference_limiter = anyio.CapacityLimiter(1)

    async def ingest_loop(source: FrameSource) -> None:
        logger.info(
            "[{}] source initialized; waiting for first frame metadata",
            source.source_name,
        )
        try:
            async for frame in source.frames():
                should_log_init = False
                previous_frame_index: int | None = None

                async with scheduler_condition:
                    slot = slots[source.source_name]
                    previous_frame_index = slot.last_seen_frame_index
                    should_log_init = previous_frame_index is None
                    slot.last_seen_frame_index = frame.frame_index
                    store_latest_frame(slot, frame)
                    scheduler_condition.notify_all()

                if should_log_init:
                    logger.info(
                        "[{}] initialized with frame shape={}x{} frame_index={}",
                        source.source_name,
                        frame.image_bgr.shape[1],
                        frame.image_bgr.shape[0],
                        frame.frame_index,
                    )
                elif previous_frame_index is not None and frame.frame_index != previous_frame_index + 1:
                    logger.warning(
                        "[{}] skip frame detected: {} -> {}",
                        source.source_name,
                        previous_frame_index,
                        frame.frame_index,
                    )
        finally:
            async with scheduler_condition:
                slots[source.source_name].closed = True
                scheduler_condition.notify_all()
            logger.info("[{}] source closed", source.source_name)

    async def scheduler_loop() -> None:
        while True:
            async with scheduler_condition:
                while pending_source_count(slots) == 0:
                    if all_sources_closed_and_idle(slots):
                        return
                    await scheduler_condition.wait()

                if (
                    pending_source_count(slots) < config.max_batch_frames
                    and config.max_batch_wait_ms > 0
                    and not all_sources_closed_and_idle(slots)
                ):
                    with anyio.move_on_after(config.max_batch_wait_ms / 1000):
                        while (
                            pending_source_count(slots) < config.max_batch_frames
                            and not all_sources_closed_and_idle(slots)
                        ):
                            await scheduler_condition.wait()

                batch = take_pending_batch(slots, config.max_batch_frames)

            start = perf_counter()
            pose_infos = await to_thread_run_sync(
                pose_shim.process_many,
                [item.frame for item in batch],
                limiter=inference_limiter,
            )
            elapsed = perf_counter() - start
            average_elapsed = elapsed / len(batch)
            performance_sma.next(average_elapsed)
            batch_size_sma.next(float(len(batch)))

            if average_elapsed > config.slow_frame_budget_seconds:
                logger.warning(
                    "slow batch: size={} total={:.2f}ms avg={:.2f}ms",
                    len(batch),
                    elapsed * 1000,
                    average_elapsed * 1000,
                )

            for pending_frame, pose_info in zip(batch, pose_infos, strict=True):
                slot = slots[pending_frame.source_name]
                slot.processed_frames += 1
                await pose_sink.publish_pose(pose_info)
                slot.published_frames += 1
                if pose_info.boxes_xyxy.shape[0] == 0:
                    logger.debug(
                        "[{}:{}] no detections",
                        pending_frame.source_name,
                        pending_frame.frame.frame_index,
                    )

    async def log_performance() -> None:
        while True:
            await anyio.sleep(5)

            async with scheduler_condition:
                if all_sources_closed_and_idle(slots):
                    return
                slot_snapshot = {
                    source_name: (
                        slot.received_frames,
                        slot.dropped_frames,
                        slot.processed_frames,
                        slot.published_frames,
                    )
                    for source_name, slot in slots.items()
                }

            per_source = " ".join(
                (
                    f"[{source_name}]"
                    f" recv={received}"
                    f" drop={dropped}"
                    f" proc={processed}"
                    f" pub={published}"
                )
                for source_name, (received, dropped, processed, published) in slot_snapshot.items()
            )
            if value := performance_sma.get():
                batch_size = batch_size_sma.get() or 1.0
                logger.info(
                    "{:.2f}it/s ({:.2f}ms/frame) batch={:.2f} {}",
                    1 / value,
                    value * 1000,
                    batch_size,
                    per_source,
                )
            else:
                logger.info("warming up {}", per_source)

    try:
        async with anyio.create_task_group() as task_group:
            for source in sources:
                task_group.start_soon(ingest_loop, source)
            task_group.start_soon(log_performance)
            await scheduler_loop()
            task_group.cancel_scope.cancel()
    finally:
        await pose_sink.aclose()
@@ -0,0 +1,7 @@
from pose_tracking_exp.detection.sinks.nats import NatsPoseSink
from pose_tracking_exp.detection.sinks.parquet import ParquetPoseSink

__all__ = [
    "NatsPoseSink",
    "ParquetPoseSink",
]
@@ -0,0 +1,29 @@
from pose_tracking_exp.detection.cvmmap_payload import CvmmapPosePayloadCodec
from pose_tracking_exp.schema.detection import PoseDetections


class NatsPoseSink:
    def __init__(self, nats_host: str) -> None:
        self._nats_host = nats_host
        self._client = None
        self._codec = CvmmapPosePayloadCodec()

    async def _client_or_connect(self):
        if self._client is None:
            from nats.aio.client import Client as NatsClient

            client = NatsClient()
            await client.connect(servers=[self._nats_host])
            self._client = client
        return self._client

    async def publish_pose(self, detections: PoseDetections) -> None:
        client = await self._client_or_connect()
        payload = self._codec.encode(detections)
        await client.publish(f"{detections.source_name}.pose", payload)

    async def aclose(self) -> None:
        if self._client is None:
            return
        await self._client.drain()
        self._client = None
@@ -0,0 +1,51 @@
from pathlib import Path

import pyarrow as pa
import pyarrow.parquet as pq

from pose_tracking_exp.common.detection_parquet import (
    DETECTION_PARQUET_SCHEMA,
    detection_parquet_path,
    pose_detections_to_row,
)
from pose_tracking_exp.schema.detection import PoseDetections


class ParquetPoseSink:
    def __init__(self, output_dir: Path, *, flush_rows: int = 64) -> None:
        self._output_dir = output_dir
        self._flush_rows = flush_rows
        self._buffers: dict[str, list[dict[str, object]]] = {}
        self._writers: dict[str, pq.ParquetWriter] = {}
        self._output_dir.mkdir(parents=True, exist_ok=True)

    def _writer_for(self, source_name: str) -> pq.ParquetWriter:
        writer = self._writers.get(source_name)
        if writer is not None:
            return writer

        path = detection_parquet_path(self._output_dir, source_name)
        writer = pq.ParquetWriter(path, DETECTION_PARQUET_SCHEMA)
        self._writers[source_name] = writer
        return writer

    def _flush_source(self, source_name: str) -> None:
        rows = self._buffers.get(source_name)
        if not rows:
            return
        table = pa.Table.from_pylist(rows, schema=DETECTION_PARQUET_SCHEMA)
        self._writer_for(source_name).write_table(table)
        rows.clear()

    async def publish_pose(self, detections: PoseDetections) -> None:
        rows = self._buffers.setdefault(detections.source_name, [])
        rows.append(pose_detections_to_row(detections))
        if len(rows) >= self._flush_rows:
            self._flush_source(detections.source_name)

    async def aclose(self) -> None:
        for source_name in tuple(self._buffers):
            self._flush_source(source_name)
        for writer in self._writers.values():
            writer.close()
        self._writers.clear()
@@ -0,0 +1,10 @@
from pose_tracking_exp.detection.sources.adapters import IteratorFrameSource
from pose_tracking_exp.detection.sources.cvmmap import CvmmapFrameSource
from pose_tracking_exp.detection.sources.video import VideoFrameSource, parse_video_input_specs

__all__ = [
    "CvmmapFrameSource",
    "IteratorFrameSource",
    "VideoFrameSource",
    "parse_video_input_specs",
]
@@ -0,0 +1,47 @@
from collections.abc import AsyncIterator, Callable, Iterator
from typing import Protocol

from anyio.to_thread import run_sync as to_thread_run_sync

from pose_tracking_exp.schema.detection import SourceFrame


class BlockingFrameProducer(Protocol):
    source_name: str

    def iter_frames(self) -> Iterator[SourceFrame]:
        ...


def _next_or_none(iterator: Iterator[SourceFrame]) -> SourceFrame | None:
    return next(iterator, None)


class IteratorFrameSource:
    def __init__(
        self,
        source_name: str,
        iterator_factory: Callable[[], Iterator[SourceFrame]],
    ) -> None:
        self.source_name = source_name
        self._iterator_factory = iterator_factory

    async def frames(self) -> AsyncIterator[SourceFrame]:
        iterator = self._iterator_factory()
        try:
            while True:
                frame = await to_thread_run_sync(_next_or_none, iterator)
                if frame is None:
                    return
                yield frame
        finally:
            close = getattr(iterator, "close", None)
            if callable(close):
                await to_thread_run_sync(close)


def wrap_blocking_source(producer: BlockingFrameProducer) -> IteratorFrameSource:
    return IteratorFrameSource(
        source_name=producer.source_name,
        iterator_factory=producer.iter_frames,
    )
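The adapter above drives a blocking iterator from a worker thread via anyio so the event loop never stalls on a slow `next()`. The same idea can be sketched with only the standard library (`asyncio.to_thread`), with the `SourceFrame` payload replaced by plain ints for illustration:

```python
import asyncio
from collections.abc import AsyncIterator, Iterator


async def iterate_in_thread(iterator: Iterator[int]) -> AsyncIterator[int]:
    # One thread hop per item; next(iterator, None) signals exhaustion
    # without raising StopIteration across the thread boundary.
    while True:
        item = await asyncio.to_thread(next, iterator, None)
        if item is None:
            return
        yield item


async def main() -> list[int]:
    return [item async for item in iterate_in_thread(iter([1, 2, 3]))]


print(asyncio.run(main()))  # → [1, 2, 3]
```

The `None` sentinel mirrors `_next_or_none` in the adapter; a real source whose frames could legitimately be `None` would need a dedicated sentinel object instead.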
@@ -0,0 +1,22 @@
from collections.abc import AsyncIterator

import numpy as np

from pose_tracking_exp.schema.detection import SourceFrame


class CvmmapFrameSource:
    def __init__(self, source_name: str) -> None:
        self.source_name = source_name

    async def frames(self) -> AsyncIterator[SourceFrame]:
        from cvmmap import CvMmapClient

        client = CvMmapClient(self.source_name)
        async for frame, meta in client:
            yield SourceFrame(
                source_name=self.source_name,
                image_bgr=np.array(frame, copy=True),
                frame_index=meta.frame_count,
                timestamp_unix_ns=meta.timestamp_ns,
            )
@@ -0,0 +1,83 @@
from collections.abc import AsyncIterator, Iterator, Sequence
from pathlib import Path

import click
import cv2
import numpy as np

from pose_tracking_exp.detection.sources.adapters import wrap_blocking_source
from pose_tracking_exp.schema.detection import SourceFrame

_DEFAULT_VIDEO_FPS = 30.0


def parse_video_input_specs(specs: Sequence[str]) -> tuple[tuple[str, Path], ...]:
    inputs: list[tuple[str, Path]] = []
    seen: set[str] = set()
    for spec in specs:
        source_name, separator, raw_path = spec.partition("=")
        if separator == "" or not source_name or not raw_path:
            raise click.ClickException(
                f"Video input must be in source=path form, got: {spec!r}"
            )
        if source_name in seen:
            raise click.ClickException(f"Duplicate video source requested: {source_name}")
        path = Path(raw_path).expanduser().resolve()
        if not path.exists():
            raise click.ClickException(f"Missing video input: {path}")
        inputs.append((source_name, path))
        seen.add(source_name)
    if not inputs:
        raise click.ClickException("Provide at least one --input source=path entry.")
    return tuple(inputs)


class VideoFrameSource:
    def __init__(
        self,
        video_path: Path,
        *,
        source_name: str | None = None,
        default_fps: float = _DEFAULT_VIDEO_FPS,
    ) -> None:
        self.video_path = video_path
        self.source_name = source_name or video_path.stem
        self._default_fps = default_fps
        self._adapter = wrap_blocking_source(self)

    async def frames(self) -> AsyncIterator[SourceFrame]:
        async for frame in self._adapter.frames():
            yield frame

    def iter_frames(self) -> Iterator[SourceFrame]:
        capture = cv2.VideoCapture(str(self.video_path))
        if not capture.isOpened():
            capture.release()
            raise click.ClickException(f"Could not open video input: {self.video_path}")

        fps = float(capture.get(cv2.CAP_PROP_FPS))
        if not np.isfinite(fps) or fps <= 0:
            fps = self._default_fps

        frame_index = 0
        try:
            while True:
                success, frame = capture.read()
                if not success or frame is None:
                    return

                pos_msec = float(capture.get(cv2.CAP_PROP_POS_MSEC))
                if np.isfinite(pos_msec) and (pos_msec > 0.0 or frame_index == 0):
                    timestamp_unix_ns = int(round(pos_msec * 1_000_000.0))
                else:
                    timestamp_unix_ns = int(round((frame_index / fps) * 1_000_000_000.0))

                yield SourceFrame(
                    source_name=self.source_name,
                    image_bgr=np.ascontiguousarray(frame),
                    frame_index=frame_index,
                    timestamp_unix_ns=timestamp_unix_ns,
                )
                frame_index += 1
        finally:
            capture.release()
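The timestamp logic in the video source above prefers the decoder's `CAP_PROP_POS_MSEC` and falls back to `frame_index / fps` when the position is unusable. That decision reduces to a small pure function; this standalone helper is illustrative, not part of the module:

```python
import math


def frame_timestamp_ns(pos_msec: float, frame_index: int, fps: float) -> int:
    """Decoder milliseconds win when finite and usable (nonzero, or the very
    first frame); otherwise derive the time from frame_index / fps.
    Both branches return integer nanoseconds."""
    if math.isfinite(pos_msec) and (pos_msec > 0.0 or frame_index == 0):
        return int(round(pos_msec * 1_000_000.0))
    return int(round((frame_index / fps) * 1_000_000_000.0))


print(frame_timestamp_ns(33.3667, 1, 30.0))       # decoder timestamp → 33366700
print(frame_timestamp_ns(float("nan"), 3, 30.0))  # fallback → 100000000
```

The `frame_index == 0` carve-out lets a legitimate 0.0 ms position through for the first frame only; a zero position on any later frame is treated as a broken decoder and triggers the fps fallback.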
@@ -0,0 +1,263 @@
from contextlib import contextmanager
from collections.abc import Sequence
from pathlib import Path
from typing import Any, cast

import cv2
import numpy as np

from pose_tracking_exp.detection.protocols import ObjectDetector, PoseEstimator
from pose_tracking_exp.schema.detection import BoxDetections, PoseBatchRequest, PoseDetections, SourceFrame

COCO_PERSON_CLASS_ID = 0


class YoloObjectDetector:
    def __init__(
        self,
        checkpoint: Path,
        *,
        device: str,
        conf_threshold: float,
        max_batch_frames: int,
    ) -> None:
        import ultralytics

        yolo_ctor = getattr(ultralytics, "YOLO")
        self._model: Any = yolo_ctor(str(checkpoint))
        self._device = device
        self._conf_threshold = conf_threshold
        self._max_batch_frames = max_batch_frames

    def detect_many(
        self,
        frames_rgb: Sequence[np.ndarray],
        *,
        classes: Sequence[int] | None = None,
    ) -> list[BoxDetections]:
        if not frames_rgb:
            return []

        frames_list = list(frames_rgb)
        results = self._model(
            frames_list,
            conf=self._conf_threshold,
            device=self._device,
            classes=classes,
            batch=min(self._max_batch_frames, len(frames_list)),
            verbose=False,
        )

        detections: list[BoxDetections] = []
        for frame_rgb, result in zip(frames_list, results, strict=True):
            boxes = result.boxes
            if boxes is None:
                detections.append(
                    BoxDetections(
                        boxes_xyxy=np.empty((0, 4), dtype=np.float32),
                        scores=np.empty((0,), dtype=np.float32),
                        reference_frame_shape=(frame_rgb.shape[0], frame_rgb.shape[1]),
                    )
                )
                continue

            detections.append(
                BoxDetections(
                    boxes_xyxy=boxes.xyxy.cpu().numpy(),
                    scores=boxes.conf.cpu().numpy(),
                    reference_frame_shape=(frame_rgb.shape[0], frame_rgb.shape[1]),
                )
            )
        return detections


@contextmanager
def legacy_torch_checkpoint_loading():
    import torch

    original_torch_load = torch.load

    def patched_torch_load(*args, **kwargs):
        kwargs.setdefault("weights_only", False)
        return original_torch_load(*args, **kwargs)

    torch.load = patched_torch_load
    try:
        yield
    finally:
        torch.load = original_torch_load


class WholeBodyPoseEstimator:
    def __init__(self, config_path: Path, checkpoint_path: Path, *, device: str) -> None:
        from mmengine.dataset import Compose, pseudo_collate
        from mmengine.registry import init_default_scope
        from mmpose.apis import init_model

        self._compose = Compose
        self._pseudo_collate = pseudo_collate
        self._init_default_scope = init_default_scope

        with legacy_torch_checkpoint_loading():
            self._model: Any = init_model(str(config_path), str(checkpoint_path), device=device)

        model_cfg = cast(Any, self._model.cfg)
        self._scope = cast(str | None, model_cfg.get("default_scope", "mmpose"))
        self._pipeline = self._compose(cast(Any, model_cfg.test_dataloader.dataset.pipeline))

    def estimate_batch(
        self,
        requests: Sequence[PoseBatchRequest],
    ) -> list[tuple[np.ndarray, np.ndarray]]:
        import torch

        if not requests:
            return []

        if self._scope is not None:
            self._init_default_scope(self._scope)

        torch_module = cast(Any, torch)
        data_list = []
        detection_counts: list[int] = []
        for request in requests:
            boxes = np.asarray(request.boxes_xyxy, dtype=np.float32)
            detections = int(boxes.shape[0])
            detection_counts.append(detections)
            for bbox in boxes:
                data_info = {
                    "img": request.image_rgb,
                    "bbox": bbox[None],
                    "bbox_score": np.ones(1, dtype=np.float32),
                }
                data_info.update(cast(Any, self._model.dataset_meta))
                data_list.append(self._pipeline(data_info))

        samples = []
        if data_list:
            batch = self._pseudo_collate(data_list)
            with torch_module.no_grad():
                samples = self._model.test_step(batch)

        outputs: list[tuple[np.ndarray, np.ndarray]] = []
        offset = 0
        for detections in detection_counts:
            keypoints = np.zeros((detections, 133, 2), dtype=np.float32)
            scores = np.zeros((detections, 133), dtype=np.float32)
            for index in range(detections):
                pred_instances = samples[offset + index].pred_instances
                try:
                    keypoints[index] = np.asarray(pred_instances.keypoints[0], dtype=np.float32)
                    scores[index] = np.asarray(
                        pred_instances.keypoint_scores[0],
                        dtype=np.float32,
                    )
                except IndexError:
                    continue
            outputs.append((keypoints, scores))
            offset += detections
        return outputs


class YoloRtmposeShim:
    def __init__(
        self,
        object_detector: ObjectDetector,
        pose_estimator: PoseEstimator,
        *,
        bbox_area_threshold: int,
    ) -> None:
        self._object_detector = object_detector
        self._pose_estimator = pose_estimator
        self._bbox_area_threshold = bbox_area_threshold

    def process_many(self, frames: Sequence[SourceFrame]) -> list[PoseDetections]:
        if not frames:
            return []

        frames_rgb = [
            cv2.cvtColor(frame.image_bgr, cv2.COLOR_BGR2RGB)
            for frame in frames
        ]
        detections = self._object_detector.detect_many(
            frames_rgb,
            classes=[COCO_PERSON_CLASS_ID],
        )

        results = [
            PoseDetections(
                source_name=frame.source_name,
                frame_index=frame.frame_index,
                source_size=(frame.image_bgr.shape[1], frame.image_bgr.shape[0]),
                boxes_xyxy=np.empty((0, 4), dtype=np.float32),
                box_scores=np.empty((0,), dtype=np.float32),
                keypoints_xy=np.empty((0, 133, 2), dtype=np.float32),
                keypoint_scores=np.empty((0, 133), dtype=np.float32),
                timestamp_unix_ns=frame.timestamp_unix_ns,
                keypoint_schema="coco_wholebody133",
            )
            for frame in frames
        ]
        pose_requests: list[PoseBatchRequest] = []
        detection_mapping: list[tuple[int, BoxDetections]] = []
        for index, (frame, frame_rgb, detection_result) in enumerate(
            zip(frames, frames_rgb, detections, strict=True)
        ):
            filtered_result = detection_result.filter_by_area(self._bbox_area_threshold)
            if filtered_result.boxes_num == 0:
                continue
            pose_requests.append(
                PoseBatchRequest(
                    image_rgb=frame_rgb,
                    boxes_xyxy=filtered_result.boxes_xyxy,
                )
            )
            detection_mapping.append((index, filtered_result))

        pose_outputs = self._pose_estimator.estimate_batch(pose_requests)
        for (frame_index, detection_result), (keypoints, keypoint_scores) in zip(
            detection_mapping,
            pose_outputs,
            strict=True,
        ):
            source_frame = frames[frame_index]
            results[frame_index] = PoseDetections(
                source_name=source_frame.source_name,
                frame_index=source_frame.frame_index,
                source_size=detection_result.reference_size,
                boxes_xyxy=detection_result.boxes_xyxy,
                box_scores=detection_result.scores,
                keypoints_xy=keypoints,
                keypoint_scores=keypoint_scores,
                timestamp_unix_ns=source_frame.timestamp_unix_ns,
                keypoint_schema="coco_wholebody133",
            )
        return results


def build_yolo_rtmpose_shim(
    *,
    yolo_checkpoint: Path,
    yolo_conf_threshold: float,
    pose_checkpoint: Path,
    pose_config_path: Path,
    device: str,
    max_batch_frames: int,
    bbox_area_threshold: int,
) -> YoloRtmposeShim:
    object_detector = YoloObjectDetector(
        yolo_checkpoint,
        device=device,
        conf_threshold=yolo_conf_threshold,
        max_batch_frames=max_batch_frames,
    )
    pose_estimator = WholeBodyPoseEstimator(
        pose_config_path,
        pose_checkpoint,
        device=device,
    )
    return YoloRtmposeShim(
        object_detector,
        pose_estimator,
        bbox_area_threshold=bbox_area_threshold,
    )
@@ -1,224 +0,0 @@
from dataclasses import dataclass, field
from pathlib import Path
from typing import Literal

import cv2
import numpy as np

from pose_tracking_exp.tensor_types import Matrix3, Pose2D, Pose3D, Vector3


@dataclass(slots=True)
class CameraCalibration:
    name: str
    width: int
    height: int
    K: Matrix3
    DC: np.ndarray
    # Canonical in-repo convention: OpenCV world->camera extrinsics.
    R: Matrix3
    T: Vector3
    model: str = "pinhole"
    rvec: np.ndarray | None = None
    pose_R: Matrix3 = field(init=False)
    pose_T: Vector3 = field(init=False)

    def __post_init__(self) -> None:
        self.K = np.asarray(self.K, dtype=np.float64).reshape(3, 3)
        self.DC = np.asarray(self.DC, dtype=np.float64).reshape(-1)
        self.R = np.asarray(self.R, dtype=np.float64).reshape(3, 3)
        self.T = np.asarray(self.T, dtype=np.float64).reshape(3)
        if self.rvec is None:
            rvec, _ = cv2.Rodrigues(self.R)
            self.rvec = np.asarray(rvec, dtype=np.float64).reshape(3)
        else:
            self.rvec = np.asarray(self.rvec, dtype=np.float64).reshape(3)
        self.pose_R = self.R.T
        self.pose_T = -(self.pose_R @ self.T)

    @classmethod
    def from_opencv_extrinsics(
        cls,
        *,
        name: str,
        width: int,
        height: int,
        K: Matrix3,
        DC: np.ndarray,
        R: Matrix3,
        T: Vector3,
        model: str = "pinhole",
        rvec: np.ndarray | None = None,
    ) -> "CameraCalibration":
        return cls(
            name=name,
            width=width,
            height=height,
            K=K,
            DC=DC,
            R=R,
            T=T,
            model=model,
            rvec=rvec,
        )

    @classmethod
    def from_rpt_pose(
        cls,
        *,
        name: str,
        width: int,
        height: int,
        K: Matrix3,
        DC: np.ndarray,
        R: Matrix3,
        T: Vector3,
        model: str = "pinhole",
    ) -> "CameraCalibration":
        pose_R = np.asarray(R, dtype=np.float64).reshape(3, 3)
        pose_T = np.asarray(T, dtype=np.float64).reshape(3)
        rotation = pose_R.T
        translation = -(rotation @ pose_T)
        rvec, _ = cv2.Rodrigues(rotation)
        return cls(
            name=name,
            width=width,
            height=height,
            K=K,
            DC=DC,
            R=rotation,
            T=translation,
            model=model,
            rvec=np.asarray(rvec, dtype=np.float64).reshape(3),
        )


@dataclass(slots=True)
class SceneConfig:
    room_size: Vector3
    room_center: Vector3
    cameras: tuple[CameraCalibration, ...]


@dataclass(slots=True)
class PoseDetection:
    bbox: np.ndarray
    bbox_confidence: float
    keypoints: Pose2D


@dataclass(slots=True)
class CameraFrame:
    camera_name: str
    frame_index: int
    timestamp_unix_ns: int
    detections: tuple[PoseDetection, ...]
    source_size: tuple[int, int]


@dataclass(slots=True)
class FrameBundle:
    bundle_index: int
    timestamp_unix_ns: int
    views: tuple[CameraFrame, ...]


@dataclass(slots=True)
class ReplaySequence:
    scene_path: Path
    replay_path: Path
    frames_by_camera: dict[str, list[CameraFrame]]


@dataclass(slots=True)
class ProposalCluster:
    pose3d: Pose3D
    root: Vector3
    source_views: frozenset[str]
    support_size: int
    mean_score: float


@dataclass(slots=True)
class SkeletonState:
    parameters: np.ndarray
    beta: np.ndarray
    pose3d: Pose3D


@dataclass(slots=True)
class TentativeTrackState:
    track_id: int
    state: Literal["tentative"] = "tentative"
    age: int = 0
    misses: int = 0
    score: float = 0.0
    last_bundle_index: int = -1
    root: Vector3 = field(default_factory=lambda: np.zeros(3, dtype=np.float64))
    pose3d: Pose3D = field(default_factory=lambda: np.zeros((20, 4), dtype=np.float64))
    evidence_buffer: list[Pose3D] = field(default_factory=list)


@dataclass(slots=True)
class ActiveTrackState:
    track_id: int
    status: Literal["active", "lost"] = "active"
    misses: int = 0
    lost_age: int = 0
    score: float = 0.0
    last_bundle_index: int = -1
    skeleton: SkeletonState = field(
        default_factory=lambda: SkeletonState(
            parameters=np.zeros(31, dtype=np.float64),
            beta=np.ones(8, dtype=np.float64),
            pose3d=np.zeros((20, 4), dtype=np.float64),
        )
    )
    noise_scale: np.ndarray = field(default_factory=lambda: np.full((20,), 9.0, dtype=np.float64))


TrackState = TentativeTrackState | ActiveTrackState


@dataclass(slots=True)
class TrackedFrameResult:
    bundle_index: int
    timestamp_unix_ns: int
    tentative_tracks: tuple[TentativeTrackState, ...]
    active_tracks: tuple[ActiveTrackState, ...]
    lost_tracks: tuple[ActiveTrackState, ...]
    proposals: tuple[ProposalCluster, ...]


@dataclass(slots=True)
class TrackerDiagnostics:
    match_existing_calls: int = 0
    match_existing_seconds: float = 0.0
    proposal_build_calls: int = 0
    proposal_build_seconds: float = 0.0
    promotions: int = 0
    reacquisitions: int = 0
    active_updates: int = 0
    seed_initializations: int = 0
    nonlinear_refinements: int = 0


@dataclass(slots=True)
class TrackerConfig:
    mode: Literal["general", "single_person"] = "general"
    min_bundle_views: int = 2
    max_sync_skew_ns: int = 12_000_000
    tentative_buffer_size: int = 5
    tentative_min_age: int = 3
    tentative_hits_required: int = 3
    tentative_promote_score: float = 3.0
    tentative_max_misses: int = 2
    active_min_views: int = 2
    active_core_gate_px: float = 80.0
    active_joint_gate_px: float = 120.0
    active_miss_to_lost: int = 3
    lost_delete_age: int = 15
    proposal_match_distance_m: float = 0.45
    noise_ema: float = 0.85
    proposal_min_score: float = 0.9
    proposal_min_group_size: int = 1
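The `CameraCalibration` dataclass above derives camera pose from OpenCV world-to-camera extrinsics as `pose_R = R.T`, `pose_T = -(pose_R @ T)`, and `from_rpt_pose` applies the same involution in reverse. That round trip can be checked in isolation with numpy alone (the helper names here are illustrative, not the repo's API):

```python
import numpy as np


def extrinsics_to_pose(R: np.ndarray, T: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # OpenCV world->camera (R, T) to camera pose in world coordinates.
    pose_R = R.T
    pose_T = -(pose_R @ T)
    return pose_R, pose_T


def pose_to_extrinsics(pose_R: np.ndarray, pose_T: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # The conversion is its own inverse: transpose and re-negate.
    R = pose_R.T
    T = -(R @ pose_T)
    return R, T


# 90 degree rotation about Z plus a translation.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
T = np.array([1.0, 2.0, 3.0])
pose_R, pose_T = extrinsics_to_pose(R, T)
R2, T2 = pose_to_extrinsics(pose_R, pose_T)
assert np.allclose(R, R2) and np.allclose(T, T2)
```

Algebraically, `R2 = (R.T).T = R` and `T2 = -(R @ (-(R.T @ T))) = R @ R.T @ T = T` for any rotation matrix, which is why a single conversion function can serve both directions.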
@@ -1,147 +0,0 @@
import base64
import json
from dataclasses import dataclass

import numpy as np
from beartype import beartype

from pose_tracking_exp.models import CameraFrame, PoseDetection
from pose_tracking_exp.normalization import normalize_rtmpose_body20

PROTOCOL_HEADER = bytes([0x80]) + b"POSE"
POSE_JOINT_COUNT = 133


@dataclass(slots=True)
class DecodedPosePayload:
    frame_index: int
    reference_size: tuple[int, int]
    timestamp_unix_ns: int
    detections: tuple[PoseDetection, ...]


def _read_u8(payload: memoryview, offset: int) -> tuple[int, int]:
    return int(payload[offset]), offset + 1


def _read_u16_array(payload: memoryview, offset: int, count: int) -> tuple[np.ndarray, int]:
    size = count * 2
    array = np.frombuffer(payload[offset : offset + size], dtype="<u2", count=count).astype(np.float64)
    return array, offset + size


@beartype
def decode_pose_payload(payload: bytes) -> DecodedPosePayload:
    if not payload.startswith(PROTOCOL_HEADER):
        raise ValueError("Invalid ParaJumping pose payload header.")

    view = memoryview(payload)
    offset = len(PROTOCOL_HEADER)
    frame_index = int.from_bytes(view[offset : offset + 4], "little")
    offset += 4
    reference_size = tuple(int(x) for x in np.frombuffer(view[offset : offset + 4], dtype="<u2", count=2))
    offset += 4

    num_bbox = int(view[offset])
    offset += 1
    bbox_raw, offset = _read_u16_array(view, offset, num_bbox * 4)
    bboxes = bbox_raw.reshape(num_bbox, 4) if num_bbox > 0 else np.zeros((0, 4), dtype=np.float64)

    num_bbox_conf = int(view[offset])
    offset += 1
    bbox_confidence = np.frombuffer(view[offset : offset + num_bbox_conf], dtype=np.uint8, count=num_bbox_conf)
    offset += num_bbox_conf

    num_keypoints = int(view[offset])
    offset += 1
    keypoints_raw, offset = _read_u16_array(view, offset, num_keypoints * POSE_JOINT_COUNT * 2)
    keypoints_xy = (
        keypoints_raw.reshape(num_keypoints, POSE_JOINT_COUNT, 2)
        if num_keypoints > 0
        else np.zeros((0, POSE_JOINT_COUNT, 2), dtype=np.float64)
    )

    num_keypoint_conf = int(view[offset])
    offset += 1
    keypoint_confidence = (
        np.frombuffer(view[offset : offset + num_keypoint_conf], dtype=np.uint8, count=num_keypoint_conf).astype(np.float64)
        / 255.0
    )
    offset += num_keypoint_conf
    timestamp_unix_ns = int.from_bytes(view[offset : offset + 8], "little")

    if num_keypoint_conf > 0 and num_keypoint_conf != num_keypoints * POSE_JOINT_COUNT:
        raise ValueError("Unexpected keypoint confidence payload length.")

    detection_items: list[PoseDetection] = []
    confidences = (
        keypoint_confidence.reshape(num_keypoints, POSE_JOINT_COUNT)
        if num_keypoints > 0
        else np.zeros((0, POSE_JOINT_COUNT), dtype=np.float64)
    )

    for index in range(num_keypoints):
        normalized = normalize_rtmpose_body20(keypoints_xy[index], confidences[index])
        bbox_score = float(bbox_confidence[index] / 255.0) if index < bbox_confidence.shape[0] else 0.0
        bbox = bboxes[index] if index < bboxes.shape[0] else np.zeros(4, dtype=np.float64)
        detection_items.append(
            PoseDetection(
                bbox=np.asarray(bbox, dtype=np.float64),
                bbox_confidence=bbox_score,
                keypoints=np.asarray(normalized, dtype=np.float64),
            )
        )

    return DecodedPosePayload(
        frame_index=frame_index,
        reference_size=(reference_size[0], reference_size[1]),
        timestamp_unix_ns=timestamp_unix_ns,
        detections=tuple(detection_items),
    )


@beartype
def frame_from_payload(camera_name: str, payload: bytes) -> CameraFrame:
    decoded = decode_pose_payload(payload)
    return CameraFrame(
        camera_name=camera_name,
        frame_index=decoded.frame_index,
        timestamp_unix_ns=decoded.timestamp_unix_ns,
        detections=decoded.detections,
        source_size=decoded.reference_size,
    )


@beartype
def convert_payload_record(record: dict[str, object]) -> dict[str, object]:
    camera_name = str(record["camera"])
    payload_b64 = str(record["payload_b64"])
    frame = frame_from_payload(camera_name, base64.b64decode(payload_b64))

    return {
        "camera": frame.camera_name,
        "frame_index": frame.frame_index,
        "timestamp_unix_ns": frame.timestamp_unix_ns,
        "source_size": list(frame.source_size),
        "detections": [
            {
                "bbox": detection.bbox.tolist(),
                "bbox_confidence": detection.bbox_confidence,
                "keypoints": detection.keypoints.tolist(),
            }
            for detection in frame.detections
        ],
    }


@beartype
def convert_payload_jsonl_lines(lines: list[str]) -> list[str]:
    output_lines: list[str] = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        converted = convert_payload_record(record)
        output_lines.append(json.dumps(converted))
    return output_lines
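The payload reader above decodes little-endian u16 arrays via `np.frombuffer` with the `<u2` dtype. The same decode can be written with only the standard library's `struct`, which makes the byte layout explicit (illustrative helper, not the repo's API):

```python
import struct


def read_u16_array(payload: bytes, offset: int, count: int) -> tuple[list[int], int]:
    # '<' forces little-endian regardless of host byte order;
    # 'H' is an unsigned 16-bit integer, so each value occupies 2 bytes.
    values = list(struct.unpack_from(f"<{count}H", payload, offset))
    return values, offset + count * 2


data = struct.pack("<4H", 10, 20, 30, 40)
values, offset = read_u16_array(data, 0, 4)
print(values, offset)  # → [10, 20, 30, 40] 8
```

Returning the advanced offset alongside the values mirrors `_read_u16_array` above and keeps the caller's cursor arithmetic in one place.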
@@ -1,108 +0,0 @@
|
|||||||
import json
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
import numpy as np
|
|
||||||
from beartype import beartype
|
|
||||||
|
|
||||||
from pose_tracking_exp.models import CameraCalibration, CameraFrame, PoseDetection, ReplaySequence, SceneConfig
|
|
||||||
|
|
||||||
_OPENCV_EXTRINSICS = "opencv_world_to_camera"
|
|
||||||
_RPT_POSE = "rpt_camera_pose"
|
|
||||||
|
|
||||||
|
|
||||||
def _as_float_array(values: object, shape: tuple[int, ...]) -> np.ndarray:
|
|
||||||
array = np.asarray(values, dtype=np.float64)
|
|
||||||
if array.shape != shape:
|
|
||||||
raise ValueError(f"Expected shape {shape}, got {array.shape}.")
|
|
||||||
return array
|
|
||||||
|
|
||||||
|
|
||||||
@beartype
|
|
||||||
def load_scene_file(path: Path) -> SceneConfig:
|
|
||||||
payload = json.loads(path.read_text(encoding="utf-8"))
|
|
||||||
default_extrinsic_format = str(payload.get("extrinsic_format", _OPENCV_EXTRINSICS))
|
|
||||||
cameras: list[CameraCalibration] = []
|
|
||||||
for camera_payload in payload["cameras"]:
|
|
||||||
extrinsic_format = str(camera_payload.get("extrinsic_format", default_extrinsic_format))
|
|
||||||
name = str(camera_payload["name"])
|
|
||||||
width = int(camera_payload["width"])
|
|
||||||
height = int(camera_payload["height"])
|
|
||||||
K = _as_float_array(camera_payload["K"], (3, 3))
|
|
||||||
        DC = np.asarray(camera_payload.get("DC", [0.0, 0.0, 0.0, 0.0, 0.0]), dtype=np.float64)
        R = _as_float_array(camera_payload["R"], (3, 3))
        T = _as_float_array(camera_payload["T"], (3, 1)).reshape(3)
        model = str(camera_payload.get("model", "pinhole"))
        if extrinsic_format == _OPENCV_EXTRINSICS:
            cameras.append(
                CameraCalibration.from_opencv_extrinsics(
                    name=name,
                    width=width,
                    height=height,
                    K=K,
                    DC=DC,
                    R=R,
                    T=T,
                    model=model,
                    rvec=np.asarray(camera_payload["rvec"], dtype=np.float64).reshape(3)
                    if "rvec" in camera_payload
                    else None,
                )
            )
        elif extrinsic_format == _RPT_POSE:
            cameras.append(
                CameraCalibration.from_rpt_pose(
                    name=name,
                    width=width,
                    height=height,
                    K=K,
                    DC=DC,
                    R=R,
                    T=T,
                    model=model,
                )
            )
        else:
            raise ValueError(
                f"Unsupported extrinsic format {extrinsic_format!r}. "
                f"Expected {_OPENCV_EXTRINSICS!r} or {_RPT_POSE!r}."
            )
    return SceneConfig(
        room_size=_as_float_array(payload["room_size"], (3,)),
        room_center=_as_float_array(payload["room_center"], (3,)),
        cameras=tuple(cameras),
    )


@beartype
def load_replay_file(scene_path: Path, replay_path: Path) -> ReplaySequence:
    frames_by_camera: dict[str, list[CameraFrame]] = {}
    for raw_line in replay_path.read_text(encoding="utf-8").splitlines():
        if not raw_line.strip():
            continue
        payload = json.loads(raw_line)
        camera_name = str(payload["camera"])
        detections: list[PoseDetection] = []
        for detection_payload in payload["detections"]:
            detections.append(
                PoseDetection(
                    bbox=np.asarray(detection_payload["bbox"], dtype=np.float64),
                    bbox_confidence=float(detection_payload["bbox_confidence"]),
                    keypoints=np.asarray(detection_payload["keypoints"], dtype=np.float64),
                )
            )
        frames_by_camera.setdefault(camera_name, []).append(
            CameraFrame(
                camera_name=camera_name,
                frame_index=int(payload["frame_index"]),
                timestamp_unix_ns=int(payload["timestamp_unix_ns"]),
                detections=tuple(detections),
                source_size=(
                    int(payload["source_size"][0]),
                    int(payload["source_size"][1]),
                ),
            )
        )

    for frames in frames_by_camera.values():
        frames.sort(key=lambda item: (item.timestamp_unix_ns, item.frame_index))
    return ReplaySequence(scene_path=scene_path, replay_path=replay_path, frames_by_camera=frames_by_camera)
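Each replay JSONL line carries one camera frame. A sample record shaped the way the loader above reads it (field names taken from `load_replay_file`; the values are made up for illustration):

```python
import json

import numpy as np

# One JSONL record: camera name, frame index, nanosecond timestamp,
# source resolution, and a list of detections with bbox + keypoints.
line = json.dumps({
    "camera": "cam_front",
    "frame_index": 12,
    "timestamp_unix_ns": 1_700_000_000_000_000_000,
    "source_size": [1920, 1080],
    "detections": [
        {
            "bbox": [100.0, 120.0, 300.0, 560.0],
            "bbox_confidence": 0.92,
            "keypoints": [[150.0, 200.0, 0.9]] * 20,  # (x, y, score) per joint
        }
    ],
})

payload = json.loads(line)
keypoints = np.asarray(payload["detections"][0]["keypoints"], dtype=np.float64)
assert payload["camera"] == "cam_front" and keypoints.shape == (20, 3)
```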
@@ -0,0 +1,50 @@
from pose_tracking_exp.schema.camera import (
    CameraCalibration,
    CameraModel,
    PINHOLE_CAMERA_MODEL,
    SceneConfig,
    parse_camera_model,
)
from pose_tracking_exp.schema.detection import (
    BoxDetections,
    CocoKeypointSchema,
    PoseBatchRequest,
    PoseDetections,
    SourceFrame,
)
from pose_tracking_exp.schema.observation import CameraFrame, FrameBundle, PoseDetection, ReplaySequence
from pose_tracking_exp.schema.tracking import (
    ActiveTrackState,
    ProposalCluster,
    SkeletonState,
    TentativeTrackState,
    TrackState,
    TrackerConfig,
    TrackerDiagnostics,
    TrackedFrameResult,
)

__all__ = [
    "ActiveTrackState",
    "BoxDetections",
    "CameraCalibration",
    "CameraFrame",
    "CameraModel",
    "CocoKeypointSchema",
    "FrameBundle",
    "PINHOLE_CAMERA_MODEL",
    "PoseBatchRequest",
    "PoseDetection",
    "PoseDetections",
    "ProposalCluster",
    "ReplaySequence",
    "SceneConfig",
    "SkeletonState",
    "TentativeTrackState",
    "TrackState",
    "TrackerConfig",
    "TrackerDiagnostics",
    "TrackedFrameResult",
    "SourceFrame",
    "parse_camera_model",
]
@@ -0,0 +1,106 @@
from dataclasses import dataclass, field
from typing import Literal

import cv2
import numpy as np

from pose_tracking_exp.common.tensor_types import Matrix3, Vector3

CameraModel = Literal["pinhole"]
PINHOLE_CAMERA_MODEL: CameraModel = "pinhole"


def parse_camera_model(model: str) -> CameraModel:
    if model != PINHOLE_CAMERA_MODEL:
        raise ValueError(
            f"Unsupported camera model {model!r}. Expected {PINHOLE_CAMERA_MODEL!r}."
        )
    return PINHOLE_CAMERA_MODEL


@dataclass(slots=True)
class CameraCalibration:
    name: str
    width: int
    height: int
    K: Matrix3
    DC: np.ndarray
    R: Matrix3
    T: Vector3
    model: CameraModel = PINHOLE_CAMERA_MODEL
    rvec: np.ndarray | None = None
    pose_R: Matrix3 = field(init=False)
    pose_T: Vector3 = field(init=False)

    def __post_init__(self) -> None:
        self.K = np.asarray(self.K, dtype=np.float64).reshape(3, 3)
        self.DC = np.asarray(self.DC, dtype=np.float64).reshape(-1)
        self.R = np.asarray(self.R, dtype=np.float64).reshape(3, 3)
        self.T = np.asarray(self.T, dtype=np.float64).reshape(3)
        self.model = parse_camera_model(self.model)
        if self.rvec is None:
            rvec, _ = cv2.Rodrigues(self.R)
            self.rvec = np.asarray(rvec, dtype=np.float64).reshape(3)
        else:
            self.rvec = np.asarray(self.rvec, dtype=np.float64).reshape(3)
        self.pose_R = self.R.T
        self.pose_T = -(self.pose_R @ self.T)

    @staticmethod
    def from_opencv_extrinsics(
        name: str,
        width: int,
        height: int,
        K: Matrix3,
        DC: np.ndarray,
        R: Matrix3,
        T: Vector3,
        model: CameraModel = PINHOLE_CAMERA_MODEL,
        rvec: np.ndarray | None = None,
    ) -> "CameraCalibration":
        return CameraCalibration(
            name=name,
            width=width,
            height=height,
            K=K,
            DC=DC,
            R=R,
            T=T,
            model=model,
            rvec=rvec,
        )

    @staticmethod
    def from_rpt_pose(
        name: str,
        width: int,
        height: int,
        K: Matrix3,
        DC: np.ndarray,
        R: Matrix3,
        T: Vector3,
        model: CameraModel = PINHOLE_CAMERA_MODEL,
    ) -> "CameraCalibration":
        pose_R = np.asarray(R, dtype=np.float64).reshape(3, 3)
        pose_T = np.asarray(T, dtype=np.float64).reshape(3)
        rotation = pose_R.T
        translation = -(rotation @ pose_T)
        rvec, _ = cv2.Rodrigues(rotation)
        return CameraCalibration(
            name=name,
            width=width,
            height=height,
            K=K,
            DC=DC,
            R=rotation,
            T=translation,
            model=model,
            rvec=np.asarray(rvec, dtype=np.float64).reshape(3),
        )


@dataclass(slots=True)
class SceneConfig:
    room_size: Vector3
    room_center: Vector3
    cameras: tuple[CameraCalibration, ...]
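`from_rpt_pose` takes a camera-to-world pose and converts it to OpenCV world-to-camera extrinsics via `R_cv = R_pose.T` and `T_cv = -(R_pose.T @ T_pose)`; `__post_init__` computes the inverse mapping. A standalone NumPy sketch of that round trip, independent of the dataclass above:

```python
import numpy as np

def rpt_pose_to_opencv(pose_R: np.ndarray, pose_T: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Camera-to-world pose -> OpenCV world-to-camera extrinsics (as in from_rpt_pose).
    R = pose_R.T
    T = -(R @ pose_T)
    return R, T

def opencv_to_rpt_pose(R: np.ndarray, T: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # OpenCV world-to-camera extrinsics -> camera-to-world pose (as in __post_init__).
    pose_R = R.T
    pose_T = -(pose_R @ T)
    return pose_R, pose_T

# Example: a camera whose optical axis points along -x, placed 2 m from the origin.
pose_R = np.array([[0.0, 0.0, -1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
pose_T = np.array([2.0, 0.0, 1.5])
R, T = rpt_pose_to_opencv(pose_R, pose_T)
back_R, back_T = opencv_to_rpt_pose(R, T)
assert np.allclose(back_R, pose_R) and np.allclose(back_T, pose_T)
```

The round trip is exact for any orthonormal `pose_R`, which is why the loader can normalize both formats to OpenCV extrinsics without losing information.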
@@ -0,0 +1,116 @@
"""Shared 2D detection schema.

`coco_wholebody133` matches the COCO-WholeBody dataset terminology used by
MMPose and the official dataset repo. The first 17 joints follow the standard
COCO body ordering, so it is body-compatible with `coco17`.

References:
- https://mmpose.readthedocs.io/en/latest/dataset_zoo/2d_wholebody_keypoint.html
- https://github.com/jin-s13/COCO-WholeBody
"""

from dataclasses import dataclass
from typing import Literal

import numpy as np

CocoKeypointSchema = Literal["coco17", "coco_wholebody133"]


def expected_keypoint_count(schema: CocoKeypointSchema) -> int:
    if schema == "coco17":
        return 17
    return 133


@dataclass(slots=True)
class SourceFrame:
    source_name: str
    image_bgr: np.ndarray
    frame_index: int
    timestamp_unix_ns: int


@dataclass(slots=True)
class BoxDetections:
    boxes_xyxy: np.ndarray
    scores: np.ndarray
    reference_frame_shape: tuple[int, int]

    @property
    def reference_size(self) -> tuple[int, int]:
        return (self.reference_frame_shape[1], self.reference_frame_shape[0])

    @property
    def boxes_num(self) -> int:
        return int(self.boxes_xyxy.shape[0])

    def filter_by_area(self, area_threshold: int) -> "BoxDetections":
        if area_threshold <= 0:
            raise ValueError("Area threshold must be positive.")
        areas = np.abs(
            (self.boxes_xyxy[:, 2] - self.boxes_xyxy[:, 0])
            * (self.boxes_xyxy[:, 3] - self.boxes_xyxy[:, 1])
        )
        mask = areas >= area_threshold
        return BoxDetections(
            boxes_xyxy=self.boxes_xyxy[mask],
            scores=self.scores[mask],
            reference_frame_shape=self.reference_frame_shape,
        )


@dataclass(slots=True)
class PoseBatchRequest:
    image_rgb: np.ndarray
    boxes_xyxy: np.ndarray


@dataclass(slots=True)
class PoseDetections:
    source_name: str
    frame_index: int
    source_size: tuple[int, int]
    boxes_xyxy: np.ndarray
    box_scores: np.ndarray | None
    keypoints_xy: np.ndarray
    keypoint_scores: np.ndarray | None
    timestamp_unix_ns: int
    keypoint_schema: CocoKeypointSchema = "coco_wholebody133"

    def validate(self) -> None:
        if self.boxes_xyxy.ndim != 2 or self.boxes_xyxy.shape[1] != 4:
            raise ValueError(
                f"Expected boxes with shape (N, 4), got {self.boxes_xyxy.shape}."
            )
        if self.keypoints_xy.ndim != 3 or self.keypoints_xy.shape[2] != 2:
            raise ValueError(
                "Expected keypoints with shape (N, K, 2), "
                f"got {self.keypoints_xy.shape}."
            )

        expected_count = expected_keypoint_count(self.keypoint_schema)
        if self.keypoints_xy.shape[1] != expected_count:
            raise ValueError(
                f"Expected {self.keypoint_schema} keypoints with {expected_count} joints, "
                f"got {self.keypoints_xy.shape[1]}."
            )

        detection_count = int(self.keypoints_xy.shape[0])
        if self.boxes_xyxy.shape[0] != detection_count:
            raise ValueError(
                "Expected box and keypoint detection counts to match, "
                f"got {self.boxes_xyxy.shape[0]} and {detection_count}."
            )
        if self.box_scores is not None and self.box_scores.shape != (detection_count,):
            raise ValueError(
                f"Expected box scores with shape ({detection_count},), got {self.box_scores.shape}."
            )
        if self.keypoint_scores is not None and self.keypoint_scores.shape != (
            detection_count,
            expected_count,
        ):
            raise ValueError(
                "Expected keypoint scores with shape "
                f"({detection_count}, {expected_count}), got {self.keypoint_scores.shape}."
            )
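`BoxDetections.filter_by_area` keeps boxes whose absolute area clears the threshold. The same masking logic in isolation, with a tiny worked example:

```python
import numpy as np

def filter_boxes_by_area(
    boxes_xyxy: np.ndarray, scores: np.ndarray, area_threshold: float
) -> tuple[np.ndarray, np.ndarray]:
    # |width * height| per box; keep the rows whose area meets the threshold.
    areas = np.abs(
        (boxes_xyxy[:, 2] - boxes_xyxy[:, 0]) * (boxes_xyxy[:, 3] - boxes_xyxy[:, 1])
    )
    mask = areas >= area_threshold
    return boxes_xyxy[mask], scores[mask]

boxes = np.array([[0, 0, 10, 10], [0, 0, 2, 2]], dtype=np.float64)  # areas 100 and 4
scores = np.array([0.9, 0.8])
kept_boxes, kept_scores = filter_boxes_by_area(boxes, scores, area_threshold=50)
assert kept_boxes.shape == (1, 4) and kept_scores.tolist() == [0.9]
```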
@@ -0,0 +1,36 @@
from dataclasses import dataclass
from pathlib import Path

import numpy as np

from pose_tracking_exp.common.tensor_types import Pose2D


@dataclass(slots=True)
class PoseDetection:
    bbox: np.ndarray
    bbox_confidence: float
    keypoints: Pose2D


@dataclass(slots=True)
class CameraFrame:
    camera_name: str
    frame_index: int
    timestamp_unix_ns: int
    detections: tuple[PoseDetection, ...]
    source_size: tuple[int, int]


@dataclass(slots=True)
class FrameBundle:
    bundle_index: int
    timestamp_unix_ns: int
    views: tuple[CameraFrame, ...]


@dataclass(slots=True)
class ReplaySequence:
    scene_path: Path
    replay_path: Path
    frames_by_camera: dict[str, list[CameraFrame]]
@@ -0,0 +1,102 @@
from dataclasses import dataclass, field
from typing import Literal

import numpy as np

from pose_tracking_exp.common.tensor_types import Pose3D, Vector3


@dataclass(slots=True)
class ProposalCluster:
    pose3d: Pose3D
    root: Vector3
    source_views: frozenset[str]
    support_size: int
    mean_score: float


@dataclass(slots=True)
class SkeletonState:
    parameters: np.ndarray
    beta: np.ndarray
    pose3d: Pose3D


@dataclass(slots=True)
class TentativeTrackState:
    track_id: int
    state: Literal["tentative"] = "tentative"
    age: int = 0
    misses: int = 0
    score: float = 0.0
    last_bundle_index: int = -1
    root: Vector3 = field(default_factory=lambda: np.zeros(3, dtype=np.float64))
    pose3d: Pose3D = field(default_factory=lambda: np.zeros((20, 4), dtype=np.float64))
    evidence_buffer: list[Pose3D] = field(default_factory=list)


@dataclass(slots=True)
class ActiveTrackState:
    track_id: int
    status: Literal["active", "lost"] = "active"
    misses: int = 0
    lost_age: int = 0
    score: float = 0.0
    last_bundle_index: int = -1
    skeleton: SkeletonState = field(
        default_factory=lambda: SkeletonState(
            parameters=np.zeros(31, dtype=np.float64),
            beta=np.ones(8, dtype=np.float64),
            pose3d=np.zeros((20, 4), dtype=np.float64),
        )
    )
    noise_scale: np.ndarray = field(
        default_factory=lambda: np.full((20,), 9.0, dtype=np.float64)
    )


TrackState = TentativeTrackState | ActiveTrackState


@dataclass(slots=True)
class TrackedFrameResult:
    bundle_index: int
    timestamp_unix_ns: int
    tentative_tracks: tuple[TentativeTrackState, ...]
    active_tracks: tuple[ActiveTrackState, ...]
    lost_tracks: tuple[ActiveTrackState, ...]
    proposals: tuple[ProposalCluster, ...]


@dataclass(slots=True)
class TrackerDiagnostics:
    match_existing_calls: int = 0
    match_existing_seconds: float = 0.0
    proposal_build_calls: int = 0
    proposal_build_seconds: float = 0.0
    promotions: int = 0
    reacquisitions: int = 0
    active_updates: int = 0
    seed_initializations: int = 0
    nonlinear_refinements: int = 0


@dataclass(slots=True)
class TrackerConfig:
    max_active_tracks: int | None = None
    min_bundle_views: int = 2
    max_sync_skew_ns: int = 12_000_000
    tentative_buffer_size: int = 5
    tentative_min_age: int = 3
    tentative_hits_required: int = 3
    tentative_promote_score: float = 3.0
    tentative_max_misses: int = 2
    active_min_views: int = 2
    active_core_gate_px: float = 80.0
    active_joint_gate_px: float = 120.0
    active_miss_to_lost: int = 3
    lost_delete_age: int = 15
    proposal_match_distance_m: float = 0.45
    noise_ema: float = 0.85
    proposal_min_score: float = 0.9
    proposal_min_group_size: int = 1
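The `tentative_*` fields in `TrackerConfig` gate when a tentative track is promoted to active or dropped. A deliberately simplified, hypothetical sketch of such a gate using the defaults above (the real promotion logic lives in `PoseTracker` and is not shown in this diff):

```python
from dataclasses import dataclass

@dataclass
class Tentative:
    age: int = 0
    hits: int = 0
    misses: int = 0
    score: float = 0.0

def promote_or_drop(
    t: Tentative,
    *,
    min_age: int = 3,          # tentative_min_age
    hits_required: int = 3,    # tentative_hits_required
    promote_score: float = 3.0,  # tentative_promote_score
    max_misses: int = 2,       # tentative_max_misses
) -> str:
    # Drop once the miss budget is exhausted; promote only after enough
    # age, supporting hits, and accumulated score; otherwise keep waiting.
    if t.misses > max_misses:
        return "drop"
    if t.age >= min_age and t.hits >= hits_required and t.score >= promote_score:
        return "promote"
    return "keep"

assert promote_or_drop(Tentative(age=3, hits=3, misses=0, score=3.5)) == "promote"
assert promote_or_drop(Tentative(age=1, hits=1, misses=3)) == "drop"
assert promote_or_drop(Tentative(age=2, hits=2, misses=1, score=4.0)) == "keep"
```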
@@ -0,0 +1,15 @@
from pose_tracking_exp.tracking.kinematics import seed_state_from_pose3d, update_noise_scale, update_state_from_multiview
from pose_tracking_exp.tracking.replay_io import load_parquet_replay_dir, load_replay_file, load_scene_file
from pose_tracking_exp.tracking.sync import synchronize_frames
from pose_tracking_exp.tracking.tracker import PoseTracker

__all__ = [
    "PoseTracker",
    "load_parquet_replay_dir",
    "load_replay_file",
    "load_scene_file",
    "seed_state_from_pose3d",
    "synchronize_frames",
    "update_noise_scale",
    "update_state_from_multiview",
]
@@ -4,10 +4,10 @@ import numpy as np
 from beartype import beartype
 from scipy.optimize import least_squares
 
-from pose_tracking_exp.camera_math import project_pose
-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME
-from pose_tracking_exp.models import CameraCalibration, PoseDetection, SkeletonState
-from pose_tracking_exp.tensor_types import Pose3D
+from pose_tracking_exp.common.camera_math import project_pose
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.common.tensor_types import Pose3D
+from pose_tracking_exp.schema import CameraCalibration, PoseDetection, SkeletonState
 
 PARAMETER_DIMENSION = 31
 SHAPE_DIMENSION = 8
@@ -0,0 +1,221 @@
import json
from pathlib import Path
from typing import cast

import numpy as np
import pyarrow.parquet as pq
from beartype import beartype

from pose_tracking_exp.common.detection_parquet import DETECTED_PARQUET_SUFFIX
from pose_tracking_exp.common.normalization import infer_bbox_from_keypoints, normalize_coco_body20
from pose_tracking_exp.schema import (
    CameraCalibration,
    CameraFrame,
    CocoKeypointSchema,
    PoseDetection,
    ReplaySequence,
    SceneConfig,
    parse_camera_model,
)

_OPENCV_EXTRINSICS = "opencv_world_to_camera"
_RPT_POSE = "rpt_camera_pose"


def _as_float_array(values: object, shape: tuple[int, ...]) -> np.ndarray:
    array = np.asarray(values, dtype=np.float64)
    if array.shape != shape:
        raise ValueError(f"Expected shape {shape}, got {array.shape}.")
    return array


@beartype
def load_scene_file(path: Path) -> SceneConfig:
    payload = json.loads(path.read_text(encoding="utf-8"))
    default_extrinsic_format = str(payload.get("extrinsic_format", _OPENCV_EXTRINSICS))
    cameras: list[CameraCalibration] = []
    for camera_payload in payload["cameras"]:
        extrinsic_format = str(
            camera_payload.get("extrinsic_format", default_extrinsic_format)
        )
        name = str(camera_payload["name"])
        width = int(camera_payload["width"])
        height = int(camera_payload["height"])
        K = _as_float_array(camera_payload["K"], (3, 3))
        DC = np.asarray(
            camera_payload.get("DC", [0.0, 0.0, 0.0, 0.0, 0.0]), dtype=np.float64
        )
        R = _as_float_array(camera_payload["R"], (3, 3))
        T = _as_float_array(camera_payload["T"], (3, 1)).reshape(3)
        model = parse_camera_model(camera_payload.get("model", "pinhole"))
        if extrinsic_format == _OPENCV_EXTRINSICS:
            cameras.append(
                CameraCalibration.from_opencv_extrinsics(
                    name=name,
                    width=width,
                    height=height,
                    K=K,
                    DC=DC,
                    R=R,
                    T=T,
                    model=model,
                    rvec=np.asarray(camera_payload["rvec"], dtype=np.float64).reshape(3)
                    if "rvec" in camera_payload
                    else None,
                )
            )
        elif extrinsic_format == _RPT_POSE:
            cameras.append(
                CameraCalibration.from_rpt_pose(
                    name=name,
                    width=width,
                    height=height,
                    K=K,
                    DC=DC,
                    R=R,
                    T=T,
                    model=model,
                )
            )
        else:
            raise ValueError(
                f"Unsupported extrinsic format {extrinsic_format!r}. "
                f"Expected {_OPENCV_EXTRINSICS!r} or {_RPT_POSE!r}."
            )
    return SceneConfig(
        room_size=_as_float_array(payload["room_size"], (3,)),
        room_center=_as_float_array(payload["room_center"], (3,)),
        cameras=tuple(cameras),
    )


@beartype
def load_replay_file(scene_path: Path, replay_path: Path) -> ReplaySequence:
    if replay_path.is_dir():
        return load_parquet_replay_dir(scene_path, replay_path)

    frames_by_camera: dict[str, list[CameraFrame]] = {}
    for raw_line in replay_path.read_text(encoding="utf-8").splitlines():
        if not raw_line.strip():
            continue
        payload = json.loads(raw_line)
        camera_name = str(payload["camera"])
        detections: list[PoseDetection] = []
        for detection_payload in payload["detections"]:
            detections.append(
                PoseDetection(
                    bbox=np.asarray(detection_payload["bbox"], dtype=np.float64),
                    bbox_confidence=float(detection_payload["bbox_confidence"]),
                    keypoints=np.asarray(
                        detection_payload["keypoints"], dtype=np.float64
                    ),
                )
            )
        frames_by_camera.setdefault(camera_name, []).append(
            CameraFrame(
                camera_name=camera_name,
                frame_index=int(payload["frame_index"]),
                timestamp_unix_ns=int(payload["timestamp_unix_ns"]),
                detections=tuple(detections),
                source_size=(
                    int(payload["source_size"][0]),
                    int(payload["source_size"][1]),
                ),
            )
        )

    for frames in frames_by_camera.values():
        frames.sort(key=lambda item: (item.timestamp_unix_ns, item.frame_index))
    return ReplaySequence(
        scene_path=scene_path,
        replay_path=replay_path,
        frames_by_camera=frames_by_camera,
    )


def _pose_detections_from_parquet_row(row: dict[str, object]) -> tuple[PoseDetection, ...]:
    boxes = np.asarray(row.get("boxes", []), dtype=np.float64)
    if boxes.size == 0:
        boxes = np.empty((0, 4), dtype=np.float64)

    box_scores = np.asarray(row.get("box_scores", []), dtype=np.float64)
    keypoints_xy = np.asarray(row.get("kps", []), dtype=np.float64)
    if keypoints_xy.size == 0:
        keypoints_xy = np.empty((0, 133, 2), dtype=np.float64)

    keypoint_scores = np.asarray(row.get("kps_scores", []), dtype=np.float64)
    if keypoint_scores.size == 0:
        keypoint_scores = np.empty((0, 133), dtype=np.float64)

    raw_keypoint_schema = row.get("keypoint_schema", "coco_wholebody133")
    if raw_keypoint_schema not in {"coco17", "coco_wholebody133"}:
        raise ValueError(f"Unsupported keypoint schema in parquet replay: {raw_keypoint_schema!r}")
    keypoint_schema = cast(CocoKeypointSchema, raw_keypoint_schema)
    if keypoints_xy.shape[0] != keypoint_scores.shape[0]:
        raise ValueError(
            "Expected matching keypoint coordinate and score counts in parquet replay row."
        )

    detections: list[PoseDetection] = []
    for detection_index in range(int(keypoints_xy.shape[0])):
        normalized = normalize_coco_body20(
            keypoints_xy[detection_index],
            keypoint_scores[detection_index],
            keypoint_schema=keypoint_schema,
        )
        bbox = (
            boxes[detection_index]
            if detection_index < boxes.shape[0]
            else infer_bbox_from_keypoints(normalized)
        )
        visible = normalized[:, 2] > 0.0
        bbox_confidence = (
            float(box_scores[detection_index])
            if detection_index < box_scores.shape[0]
            else float(np.mean(normalized[visible, 2]))
            if np.any(visible)
            else 0.0
        )
        detections.append(
            PoseDetection(
                bbox=np.asarray(bbox, dtype=np.float64),
                bbox_confidence=bbox_confidence,
                keypoints=np.asarray(normalized, dtype=np.float64),
            )
        )
    return tuple(detections)


@beartype
def load_parquet_replay_dir(scene_path: Path, replay_root: Path) -> ReplaySequence:
    parquet_paths = sorted(replay_root.glob(f"*{DETECTED_PARQUET_SUFFIX}"))
    if not parquet_paths:
        raise FileNotFoundError(
            f"No detection parquet files matching *{DETECTED_PARQUET_SUFFIX} under {replay_root}."
        )

    frames_by_camera: dict[str, list[CameraFrame]] = {}
    for parquet_path in parquet_paths:
        camera_name = parquet_path.name.removesuffix(DETECTED_PARQUET_SUFFIX)
        frames: list[CameraFrame] = []
        for row in pq.read_table(parquet_path).to_pylist():
            frames.append(
                CameraFrame(
                    camera_name=camera_name,
                    frame_index=int(row["frame_index"]),
                    timestamp_unix_ns=int(row["timestamp_unix_ns"]),
                    detections=_pose_detections_from_parquet_row(row),
                    source_size=(
                        int(row.get("source_width", 0)),
                        int(row.get("source_height", 0)),
                    ),
                )
            )
        frames.sort(key=lambda item: (item.timestamp_unix_ns, item.frame_index))
        frames_by_camera[camera_name] = frames

    return ReplaySequence(
        scene_path=scene_path,
        replay_path=replay_root,
        frames_by_camera=frames_by_camera,
    )
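When a parquet row carries no box score for a detection, `_pose_detections_from_parquet_row` falls back to the mean score of the visible joints. That fallback in isolation (function name here is illustrative, not the module's API):

```python
import numpy as np

def fallback_confidence(keypoints: np.ndarray) -> float:
    # Mean score over joints with score > 0; 0.0 when nothing is visible,
    # mirroring the bbox_confidence fallback in the parquet reader above.
    visible = keypoints[:, 2] > 0.0
    if not np.any(visible):
        return 0.0
    return float(np.mean(keypoints[visible, 2]))

kps = np.array([[10.0, 20.0, 0.8], [30.0, 40.0, 0.6], [0.0, 0.0, 0.0]])
assert abs(fallback_confidence(kps) - 0.7) < 1e-9   # mean of 0.8 and 0.6
assert fallback_confidence(np.zeros((20, 3))) == 0.0
```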
@@ -1,16 +1,19 @@
-from typing import Any
 
 import numpy as np
 import rpt
 from beartype import beartype
+from rpt._core import TriangulationConfig, TriangulationTrace  # type: ignore[reportMissingModuleSource]
 
-from pose_tracking_exp.joints import BODY20_JOINT_NAMES, BODY20_OBSERVATION_COUNT, BODY20_INDEX_BY_NAME
-from pose_tracking_exp.models import CameraFrame, ProposalCluster, SceneConfig
-from pose_tracking_exp.tensor_types import Pose2D
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME, BODY20_JOINT_NAMES, BODY20_OBSERVATION_COUNT
+from pose_tracking_exp.common.tensor_types import Pose2D
+from pose_tracking_exp.schema import CameraFrame, ProposalCluster, SceneConfig
 
 
 @beartype
-def build_rpt_config(scene: SceneConfig, *, min_match_score: float, min_group_size: int) -> Any:
+def build_rpt_config(
+    scene: SceneConfig,
+    *,
+    min_match_score: float,
+    min_group_size: int,
+) -> TriangulationConfig:
     cameras = [
         {
             "name": camera.name,
@@ -50,7 +53,7 @@ def pack_view_detections(frames: tuple[CameraFrame, ...], unmatched_indices: dic
 
 @beartype
 def extract_clusters(
-    trace: Any,
+    trace: TriangulationTrace,
     camera_names: tuple[str, ...],
 ) -> tuple[ProposalCluster, ...]:
     clusters: list[ProposalCluster] = []
@@ -2,7 +2,7 @@ from collections.abc import Iterable
 
 from beartype import beartype
 
-from pose_tracking_exp.models import CameraFrame, FrameBundle, ReplaySequence
+from pose_tracking_exp.schema import CameraFrame, FrameBundle, ReplaySequence
 
 
 @beartype
@@ -50,4 +50,3 @@
         )
     )
     return bundles
-
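`synchronize_frames` only swaps its schema import here; its job is to group per-camera frames into `FrameBundle`s whose timestamps agree within the sync window (cf. `max_sync_skew_ns` in `TrackerConfig`). A simplified, hypothetical sketch of such timestamp bucketing, not the module's actual algorithm:

```python
def bundle_by_timestamp(frames: list[tuple[str, int]], max_skew_ns: int) -> list[list[tuple[str, int]]]:
    # frames: (camera_name, timestamp_unix_ns) pairs. A frame joins the current
    # bundle if it is within max_skew_ns of the bundle's anchor (first) frame,
    # otherwise it starts a new bundle.
    bundles: list[list[tuple[str, int]]] = []
    for camera, ts in sorted(frames, key=lambda item: item[1]):
        if bundles and ts - bundles[-1][0][1] <= max_skew_ns:
            bundles[-1].append((camera, ts))
        else:
            bundles.append([(camera, ts)])
    return bundles

frames = [("cam_a", 0), ("cam_b", 5_000_000), ("cam_a", 40_000_000), ("cam_b", 44_000_000)]
bundles = bundle_by_timestamp(frames, max_skew_ns=12_000_000)
assert [len(b) for b in bundles] == [2, 2]  # two bundles of two synchronized views
```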
@@ -5,10 +5,10 @@ import numpy as np
 from beartype import beartype
 from scipy.optimize import linear_sum_assignment
 
-from pose_tracking_exp.camera_math import project_pose
-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME, CORE_JOINT_INDICES
-from pose_tracking_exp.kinematics import seed_state_from_pose3d, update_noise_scale, update_state_from_multiview
-from pose_tracking_exp.models import (
+from pose_tracking_exp.common.camera_math import project_pose
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME, CORE_JOINT_INDICES
+from pose_tracking_exp.common.normalization import core_reprojection_distance
+from pose_tracking_exp.schema import (
     ActiveTrackState,
     FrameBundle,
     PoseDetection,
@@ -20,8 +20,8 @@ from pose_tracking_exp.models import (
     TrackerConfig,
     TrackerDiagnostics,
 )
-from pose_tracking_exp.normalization import core_reprojection_distance
-from pose_tracking_exp.rpt_adapter import build_rpt_config, extract_clusters, pack_view_detections
+from pose_tracking_exp.tracking.kinematics import seed_state_from_pose3d, update_noise_scale, update_state_from_multiview
+from pose_tracking_exp.tracking.rpt_adapter import build_rpt_config, extract_clusters, pack_view_detections
 
 CORE_JOINT_MASK = np.zeros((20,), dtype=bool)
 CORE_JOINT_MASK[list(CORE_JOINT_INDICES)] = True
@@ -78,20 +78,24 @@ class PoseTracker:
         return replace(self._diagnostics)
 
     def run(self, bundles: list[FrameBundle]) -> list[TrackedFrameResult]:
+        self._tentative.clear()
+        self._active.clear()
+        self._lost.clear()
+        self._next_track_id = 1
         self._diagnostics = TrackerDiagnostics()
         return [self.step(bundle) for bundle in bundles]
 
     def step(self, bundle: FrameBundle) -> TrackedFrameResult:
-        self._enforce_single_person_constraints()
+        self._enforce_track_limits()
         matches, unmatched = self._match_existing_tracks(bundle)
         self._update_active_tracks(bundle, matches)
         self._update_lost_tracks(bundle, matches)
-        proposals = self._refresh_single_person_track_from_proposals(bundle, self._build_proposals(bundle, unmatched))
+        proposals = self._refresh_capped_single_track_from_proposals(bundle, self._build_proposals(bundle, unmatched))
         self._update_tentative_tracks(bundle, self._birth_candidate_proposals(proposals))
        self._promote_tentative_tracks(bundle)
         self._reacquire_lost_tracks(bundle, proposals)
         self._delete_expired_tracks()
-        self._enforce_single_person_constraints()
+        self._enforce_track_limits()
         return TrackedFrameResult(
             bundle_index=bundle.bundle_index,
             timestamp_unix_ns=bundle.timestamp_unix_ns,
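The `step` refactor replaces the hard single-person constraint with a configurable cap, enforced by keeping only the top-ranked tracks. The retention rule behind `_keep_best_active_tracks` can be sketched standalone (rank reduced to a plain score for illustration):

```python
def keep_best_tracks(tracks: dict[int, float], limit: int) -> dict[int, float]:
    # Keep the `limit` highest-ranked tracks; here rank is just the stored score.
    if len(tracks) <= limit:
        return dict(tracks)
    ranked_ids = sorted(tracks, key=lambda track_id: tracks[track_id], reverse=True)
    keep_ids = set(ranked_ids[:limit])
    return {track_id: score for track_id, score in tracks.items() if track_id in keep_ids}

tracks = {1: 0.4, 2: 0.9, 3: 0.7}
assert set(keep_best_tracks(tracks, limit=2)) == {2, 3}  # drops track 1
assert keep_best_tracks(tracks, limit=5) == tracks       # under the cap: unchanged
```

With `max_active_tracks=1` this degenerates to the old keep-best-one behavior, which is why `_single_track_cap_enabled` below can special-case that mode.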
@@ -101,46 +105,58 @@ class PoseTracker:
             proposals=proposals,
         )
 
-    def _single_person_mode(self) -> bool:
-        return self._config.mode == "single_person"
+    def _track_limit(self) -> int | None:
+        return self._config.max_active_tracks
 
-    def _keep_best_active_track(self) -> None:
-        if len(self._active) <= 1:
+    def _single_track_cap_enabled(self) -> bool:
+        return self._config.max_active_tracks == 1
+
+    def _keep_best_active_tracks(self, limit: int) -> None:
+        if len(self._active) <= limit:
             return
-        best_id = max(self._active, key=lambda track_id: _active_track_rank(self._active[track_id]))
+        ranked_ids = sorted(self._active, key=lambda track_id: _active_track_rank(self._active[track_id]), reverse=True)
+        keep_ids = set(ranked_ids[:limit])
         for track_id in list(self._active):
-            if track_id != best_id:
+            if track_id not in keep_ids:
                 self._active.pop(track_id, None)
 
-    def _keep_best_lost_track(self) -> None:
-        if len(self._lost) <= 1:
+    def _keep_best_lost_tracks(self, limit: int) -> None:
+        if len(self._lost) <= limit:
             return
-        best_id = max(self._lost, key=lambda track_id: _lost_track_rank(self._lost[track_id]))
+        ranked_ids = sorted(self._lost, key=lambda track_id: _lost_track_rank(self._lost[track_id]), reverse=True)
+        keep_ids = set(ranked_ids[:limit])
         for track_id in list(self._lost):
-            if track_id != best_id:
+            if track_id not in keep_ids:
                 self._lost.pop(track_id, None)
 
-    def _keep_best_tentative_track(self) -> None:
-        if len(self._tentative) <= 1:
+    def _keep_best_tentative_tracks(self, limit: int) -> None:
+        if len(self._tentative) <= limit:
             return
-        best_id = max(self._tentative, key=lambda track_id: _tentative_track_rank(self._tentative[track_id]))
+        ranked_ids = sorted(
|
self._tentative,
|
||||||
|
key=lambda track_id: _tentative_track_rank(self._tentative[track_id]),
|
||||||
|
reverse=True,
|
||||||
|
)
|
||||||
|
keep_ids = set(ranked_ids[:limit])
|
||||||
for track_id in list(self._tentative):
|
for track_id in list(self._tentative):
|
||||||
if track_id != best_id:
|
if track_id not in keep_ids:
|
||||||
self._tentative.pop(track_id, None)
|
self._tentative.pop(track_id, None)
|
||||||
|
|
||||||
def _enforce_single_person_constraints(self) -> None:
|
def _enforce_track_limits(self) -> None:
|
||||||
if not self._single_person_mode():
|
limit = self._track_limit()
|
||||||
|
if limit is None:
|
||||||
|
return
|
||||||
|
self._keep_best_active_tracks(limit)
|
||||||
|
self._keep_best_lost_tracks(limit)
|
||||||
|
self._keep_best_tentative_tracks(limit)
|
||||||
|
if not self._single_track_cap_enabled():
|
||||||
return
|
return
|
||||||
self._keep_best_active_track()
|
|
||||||
if self._active:
|
if self._active:
|
||||||
self._lost.clear()
|
self._lost.clear()
|
||||||
self._tentative.clear()
|
self._tentative.clear()
|
||||||
return
|
return
|
||||||
self._keep_best_lost_track()
|
|
||||||
if self._lost:
|
if self._lost:
|
||||||
self._tentative.clear()
|
self._tentative.clear()
|
||||||
return
|
|
||||||
self._keep_best_tentative_track()
|
|
||||||
|
|
||||||
def _predicted_pose_by_track(self) -> dict[int, np.ndarray]:
|
def _predicted_pose_by_track(self) -> dict[int, np.ndarray]:
|
||||||
result: dict[int, np.ndarray] = {}
|
result: dict[int, np.ndarray] = {}
|
||||||
@@ -278,7 +294,7 @@ class PoseTracker:
|
|||||||
self._diagnostics.proposal_build_seconds += perf_counter() - started_at
|
self._diagnostics.proposal_build_seconds += perf_counter() - started_at
|
||||||
|
|
||||||
def _birth_candidate_proposals(self, proposals: tuple[ProposalCluster, ...]) -> tuple[ProposalCluster, ...]:
|
def _birth_candidate_proposals(self, proposals: tuple[ProposalCluster, ...]) -> tuple[ProposalCluster, ...]:
|
||||||
if not self._single_person_mode():
|
if not self._single_track_cap_enabled():
|
||||||
return proposals
|
return proposals
|
||||||
if self._active or self._lost:
|
if self._active or self._lost:
|
||||||
return ()
|
return ()
|
||||||
@@ -286,12 +302,12 @@ class PoseTracker:
|
|||||||
return ()
|
return ()
|
||||||
return (max(proposals, key=_proposal_rank),)
|
return (max(proposals, key=_proposal_rank),)
|
||||||
|
|
||||||
def _refresh_single_person_track_from_proposals(
|
def _refresh_capped_single_track_from_proposals(
|
||||||
self,
|
self,
|
||||||
bundle: FrameBundle,
|
bundle: FrameBundle,
|
||||||
proposals: tuple[ProposalCluster, ...],
|
proposals: tuple[ProposalCluster, ...],
|
||||||
) -> tuple[ProposalCluster, ...]:
|
) -> tuple[ProposalCluster, ...]:
|
||||||
if not self._single_person_mode() or not proposals:
|
if not self._single_track_cap_enabled() or not proposals:
|
||||||
return proposals
|
return proposals
|
||||||
|
|
||||||
remaining = list(proposals)
|
remaining = list(proposals)
|
||||||
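The three `_keep_best_*_tracks` helpers above share one pruning pattern: rank the tracks, keep the top `limit`, evict the rest. A minimal standalone sketch of that pattern, where a plain score dict stands in for the real track objects and `_*_track_rank` functions:

```python
def keep_best(tracks: dict[int, float], limit: int) -> dict[int, float]:
    """Keep at most `limit` tracks, ranked by score (higher is better).

    Hypothetical standalone sketch of the pruning used by
    _keep_best_active_tracks and friends; the real code ranks track objects.
    """
    if len(tracks) <= limit:
        return dict(tracks)
    # Sort track ids best-first, then keep only the first `limit` of them.
    ranked_ids = sorted(tracks, key=lambda track_id: tracks[track_id], reverse=True)
    keep_ids = set(ranked_ids[:limit])
    return {track_id: score for track_id, score in tracks.items() if track_id in keep_ids}
```

With `limit=1` this degenerates to the previous keep-single-best behaviour, which is why `max_active_tracks=1` reproduces the old single-person mode.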
@@ -0,0 +1 @@
+"""Test package for support helpers and test-local utilities."""
@@ -0,0 +1 @@
+"""Test-only support helpers."""
@@ -1,18 +1,21 @@
 from pathlib import Path

+import click
 import cv2
 import numpy as np
 import pyarrow.parquet as pq
 from beartype import beartype
+from loguru import logger

-from pose_tracking_exp.models import CameraCalibration, CameraFrame, FrameBundle, PoseDetection, SceneConfig
-from pose_tracking_exp.normalization import infer_bbox_from_keypoints, normalize_rtmpose_body20
+from pose_tracking_exp.common.normalization import infer_bbox_from_keypoints, normalize_rtmpose_body20
+from pose_tracking_exp.schema import CameraCalibration, CameraFrame, FrameBundle, PoseDetection, SceneConfig, TrackerConfig
+from pose_tracking_exp.tracking import PoseTracker

 _NOMINAL_FRAME_PERIOD_NS = 33_333_333


 @beartype
-def load_actualtest_scene(root: Path) -> SceneConfig:
+def load_actual_test_scene(root: Path) -> SceneConfig:
     # ActualTest parquet comes from the ChArUco/OpenCV side, so `rvec` / `tvec`
     # are world->camera extrinsics. The RPT-facing camera pose is derived later
     # from this canonical OpenCV form.
@@ -40,13 +43,14 @@ def load_actualtest_scene(root: Path) -> SceneConfig:


 @beartype
-def load_actualtest_segment_bundles(
+def load_actual_test_segment_bundles(
     root: Path,
     segment_name: str,
     *,
     frame_start: int = 690,
     frame_stop: int | None = None,
     max_frames: int | None = None,
+    min_cameras_with_rows: int = 1,
     min_visible_joints: int = 6,
 ) -> list[FrameBundle]:
     segment_root = root / segment_name
@@ -98,24 +102,31 @@ def load_actualtest_segment_bundles(
     if not by_camera:
         return []

-    common_frames = sorted(set.intersection(*(set(frames) for frames in by_camera.values())))
+    candidate_frames = sorted(set().union(*(set(frames) for frames in by_camera.values())))
+    if min_cameras_with_rows > 1:
+        candidate_frames = [
+            frame_index
+            for frame_index in candidate_frames
+            if sum(frame_index in frames for frames in by_camera.values()) >= min_cameras_with_rows
+        ]
     if max_frames is not None:
-        common_frames = common_frames[:max_frames]
+        candidate_frames = candidate_frames[:max_frames]

-    scene = load_actualtest_scene(root)
+    scene = load_actual_test_scene(root)
     camera_by_name = {camera.name: camera for camera in scene.cameras}
     bundles: list[FrameBundle] = []
-    for bundle_index, frame_index in enumerate(common_frames):
+    ordered_camera_names = [camera.name for camera in scene.cameras]
+    for bundle_index, frame_index in enumerate(candidate_frames):
         timestamp_unix_ns = bundle_index * _NOMINAL_FRAME_PERIOD_NS
         views: list[CameraFrame] = []
-        for camera_name in sorted(by_camera):
+        for camera_name in ordered_camera_names:
             camera = camera_by_name[camera_name]
             views.append(
                 CameraFrame(
                     camera_name=camera_name,
                     frame_index=frame_index,
                     timestamp_unix_ns=timestamp_unix_ns,
-                    detections=by_camera[camera_name][frame_index],
+                    detections=by_camera.get(camera_name, {}).get(frame_index, ()),
                     source_size=(camera.width, camera.height),
                 )
             )
@@ -127,3 +138,49 @@ def load_actualtest_segment_bundles(
         )
     )
     return bundles
+
+
+@click.command()
+@click.argument("root_path", type=click.Path(path_type=Path, exists=True, file_okay=False))
+@click.option("--segment", "segment_name", default="Segment_1", show_default=True)
+@click.option("--frame-start", default=690, type=int, show_default=True)
+@click.option("--frame-stop", type=int)
+@click.option("--max-frames", type=click.IntRange(min=1))
+@click.option("--min-camera-rows", default=1, type=click.IntRange(min=1), show_default=True)
+@click.option("--max-active-tracks", default=1, type=click.IntRange(min=1), show_default=True)
+def main(
+    root_path: Path,
+    segment_name: str,
+    frame_start: int,
+    frame_stop: int | None,
+    max_frames: int | None,
+    min_camera_rows: int,
+    max_active_tracks: int,
+) -> None:
+    logger.remove()
+    logger.add(
+        click.get_text_stream("stderr"),
+        level="INFO",
+        format="{time:YYYY-MM-DD HH:mm:ss} | {level} | {message}",
+    )
+    scene = load_actual_test_scene(root_path)
+    bundles = load_actual_test_segment_bundles(
+        root_path,
+        segment_name,
+        frame_start=frame_start,
+        frame_stop=frame_stop,
+        max_frames=max_frames,
+        min_cameras_with_rows=min_camera_rows,
+    )
+    tracker = PoseTracker(scene, TrackerConfig(max_active_tracks=max_active_tracks))
+    results = tracker.run(bundles)
+    logger.info(
+        "actual_test bundles={} active_frames={} proposal_frames={}",
+        len(results),
+        sum(1 for result in results if result.active_tracks),
+        sum(1 for result in results if result.proposals),
+    )
+
+
+if __name__ == "__main__":
+    main()
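The loader above now unions per-camera frame indices instead of intersecting them, with an optional coverage floor via `min_cameras_with_rows`. The selection logic in isolation (hypothetical helper name; `by_camera` maps camera name to the set of frame indices that have detection rows):

```python
def candidate_frames(
    by_camera: dict[str, set[int]],
    min_cameras_with_rows: int = 1,
) -> list[int]:
    """Union of all per-camera frame indices, optionally requiring coverage.

    Unlike the old set-intersection, a frame present in only some cameras
    survives; it simply gets empty views for the missing cameras.
    """
    frames = sorted(set().union(*by_camera.values())) if by_camera else []
    if min_cameras_with_rows > 1:
        # Keep only frames that at least N cameras actually observed.
        frames = [
            frame_index
            for frame_index in frames
            if sum(frame_index in camera_frames for camera_frames in by_camera.values())
            >= min_cameras_with_rows
        ]
    return frames
```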
@@ -4,8 +4,8 @@ import numpy as np
 import pyarrow as pa
 import pyarrow.parquet as pq

-from pose_tracking_exp.actualtest import load_actualtest_scene, load_actualtest_segment_bundles
-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from tests.support.actual_test import load_actual_test_scene, load_actual_test_segment_bundles


 def _write_parquet(path: Path, rows: list[dict[str, object]]) -> None:
@@ -25,7 +25,7 @@ def _sample_rtmpose_detection() -> tuple[list[float], list[list[float]], list[fl
     return [8.0, 4.0, 32.0, 64.0], keypoints_xy.tolist(), scores.tolist()


-def test_load_actualtest_parquet_scene_and_segment(tmp_path: Path) -> None:
+def test_load_actual_test_parquet_scene_and_segment(tmp_path: Path) -> None:
     root = tmp_path / "ActualTest_WeiHua"
     _write_parquet(
         root / "camera_params" / "camera_params.parquet",
@@ -62,8 +62,8 @@ def test_load_actualtest_parquet_scene_and_segment(tmp_path: Path) -> None:
         ],
     )

-    scene = load_actualtest_scene(root)
-    bundles = load_actualtest_segment_bundles(root, "Segment_1", frame_start=690, max_frames=1)
+    scene = load_actual_test_scene(root)
+    bundles = load_actual_test_segment_bundles(root, "Segment_1", frame_start=690, max_frames=1)

     assert [camera.name for camera in scene.cameras] == ["5602", "5603"]
     np.testing.assert_allclose(scene.cameras[0].pose_T, [0.0, 0.0, 0.0])
@@ -75,3 +75,53 @@ def test_load_actualtest_parquet_scene_and_segment(tmp_path: Path) -> None:
         bundles[0].views[0].detections[0].keypoints[BODY20_INDEX_BY_NAME["hip_middle"], :2],
         [20.0, 60.0],
     )
+
+
+def test_load_actual_test_keeps_partial_camera_frames(tmp_path: Path) -> None:
+    root = tmp_path / "ActualTest_WeiHua"
+    _write_parquet(
+        root / "camera_params" / "camera_params.parquet",
+        [
+            {
+                "name": "AF_02",
+                "port": 5602,
+                "intrinsic": {
+                    "camera_matrix": [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]],
+                    "distortion_coefficients": [0.0, 0.0, 0.0, 0.0, 0.0],
+                },
+                "extrinsic": {"rvec": [0.0, 0.0, 0.0], "tvec": [0.0, 0.0, 0.0]},
+                "resolution": {"width": 640, "height": 480},
+            },
+            {
+                "name": "AF_03",
+                "port": 5603,
+                "intrinsic": {
+                    "camera_matrix": [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]],
+                    "distortion_coefficients": [0.0, 0.0, 0.0, 0.0, 0.0],
+                },
+                "extrinsic": {"rvec": [0.0, 0.0, 0.0], "tvec": [1.0, 0.0, 0.0]},
+                "resolution": {"width": 640, "height": 480},
+            },
+        ],
+    )
+    box, keypoints_xy, scores = _sample_rtmpose_detection()
+    _write_parquet(
+        root / "Segment_1" / "5602_detected.parquet",
+        [
+            {"frame_index": 690, "boxes": [box], "kps": [keypoints_xy], "kps_scores": [scores]},
+            {"frame_index": 691, "boxes": [box], "kps": [keypoints_xy], "kps_scores": [scores]},
+        ],
+    )
+    _write_parquet(
+        root / "Segment_1" / "5603_detected.parquet",
+        [
+            {"frame_index": 690, "boxes": [box], "kps": [keypoints_xy], "kps_scores": [scores]},
+        ],
+    )
+
+    bundles = load_actual_test_segment_bundles(root, "Segment_1", frame_start=690)
+
+    assert [bundle.views[0].frame_index for bundle in bundles] == [690, 691]
+    assert [view.camera_name for view in bundles[1].views] == ["5602", "5603"]
+    assert len(bundles[1].views[0].detections) == 1
+    assert bundles[1].views[1].detections == ()
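The new test pins down why frame 691 survives with an empty view for camera 5603: the loader's detection lookup is a chained `dict.get` that degrades to an empty tuple instead of raising. Isolated as a tiny sketch (hypothetical helper name):

```python
def detections_for(
    by_camera: dict[str, dict[int, tuple]],
    camera_name: str,
    frame_index: int,
) -> tuple:
    """Chained .get() lookup mirroring the loader: a camera with no row for
    this frame contributes an empty detections tuple rather than a KeyError."""
    return by_camera.get(camera_name, {}).get(frame_index, ())
```

This is what keeps partial-coverage frames usable downstream: the tracker sees every camera in every bundle, just sometimes with zero detections.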
@@ -8,9 +8,9 @@ import pytest

 pytest.importorskip("rpt")

-from pose_tracking_exp.models import CameraCalibration, SceneConfig
-from pose_tracking_exp.replay import load_scene_file
-from pose_tracking_exp.rpt_adapter import build_rpt_config
+from pose_tracking_exp.schema import CameraCalibration, CameraModel, SceneConfig, parse_camera_model
+from pose_tracking_exp.tracking.replay_io import load_scene_file
+from pose_tracking_exp.tracking.rpt_adapter import build_rpt_config


 class _CameraArgs(NamedTuple):
@@ -19,7 +19,7 @@ class _CameraArgs(NamedTuple):
     height: int
     K: np.ndarray
     DC: np.ndarray
-    model: str
+    model: CameraModel


 def _camera_args() -> _CameraArgs:
@@ -29,7 +29,7 @@ def _camera_args() -> _CameraArgs:
         height=480,
         K=np.asarray([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]], dtype=np.float64),
         DC=np.zeros(5, dtype=np.float64),
-        model="pinhole",
+        model=parse_camera_model("pinhole"),
     )


@@ -139,7 +139,7 @@ def test_build_rpt_config_uses_pose_convention(monkeypatch: pytest.MonkeyPatch)
         captured["min_group_size"] = min_group_size
         return captured

-    monkeypatch.setattr("pose_tracking_exp.rpt_adapter.rpt.make_triangulation_config", fake_make_triangulation_config)
+    monkeypatch.setattr("pose_tracking_exp.tracking.rpt_adapter.rpt.make_triangulation_config", fake_make_triangulation_config)

     build_rpt_config(scene, min_match_score=0.5, min_group_size=2)

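`parse_camera_model` narrows a free-form string into the `CameraModel` type before it reaches `CameraCalibration`. A plausible sketch of such a helper, assuming a `Literal`-based `CameraModel`; the actual variants defined in `pose_tracking_exp.schema` may differ from the two assumed here:

```python
from typing import Literal, get_args

# Assumed variants for illustration only; the real CameraModel may differ.
CameraModel = Literal["pinhole", "fisheye"]


def parse_camera_model(value: str) -> CameraModel:
    """Validate a plain string against the CameraModel literal values."""
    if value not in get_args(CameraModel):
        raise ValueError(f"unknown camera model: {value!r}")
    return value  # narrowed by the membership check above
```

Doing the narrowing at the parse boundary lets the rest of the code trust the `CameraModel` annotation instead of re-checking strings.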
@@ -0,0 +1,223 @@
+from collections.abc import AsyncIterator, Sequence
+from pathlib import Path
+
+import anyio
+import numpy as np
+import pytest
+
+from pose_tracking_exp.detection.config import (
+    DetectionRunnerConfig,
+    load_detection_runner_config,
+    resolve_instances,
+)
+from pose_tracking_exp.detection.runner import (
+    PendingFrame,
+    SourceSlot,
+    run_detection_runner,
+    store_latest_frame,
+    take_pending_batch,
+)
+from pose_tracking_exp.schema.detection import PoseDetections, SourceFrame
+
+
+def test_load_detection_runner_config_from_toml_and_env(
+    monkeypatch: pytest.MonkeyPatch,
+    tmp_path: Path,
+) -> None:
+    config_path = tmp_path / "runner.toml"
+    config_path.write_text(
+        "\n".join(
+            [
+                'instances = ["front_left", "front_right"]',
+                'device = "cuda:1"',
+                'nats_host = "nats://localhost:4222"',
+                'yolo_checkpoint = "checkpoint/yolo/yolo11_mix_epoch10.pt"',
+                'pose_checkpoint = "checkpoint/dwpose/best_coco-wholebody_AP_epoch_50.pth"',
+                "bbox_area_threshold = 2500",
+                "max_batch_frames = 6",
+                "max_batch_wait_ms = 3",
+            ]
+        ),
+        encoding="utf-8",
+    )
+
+    monkeypatch.setenv("POSE_TRACKING_EXP_DETECTION_DEVICE", "cpu")
+    config = load_detection_runner_config(config_path)
+
+    assert config.instances == ("front_left", "front_right")
+    assert config.device == "cpu"
+    assert config.nats_host == "nats://localhost:4222"
+    assert config.bbox_area_threshold == 2500
+    assert config.max_batch_frames == 6
+    assert config.max_batch_wait_ms == 3
+
+
+def test_resolve_instances_prefers_cli_values() -> None:
+    assert resolve_instances(("cli_a", "cli_b"), ("cfg_a",)) == ("cli_a", "cli_b")
+
+
+def test_resolve_instances_falls_back_to_config_values() -> None:
+    assert resolve_instances((), ("cfg_a", "cfg_b")) == ("cfg_a", "cfg_b")
+
+
+def test_store_latest_frame_overwrites_pending_frame() -> None:
+    slot = SourceSlot(source_name="front_left")
+    first = SourceFrame(
+        source_name="front_left",
+        image_bgr=np.zeros((1, 1, 3), dtype=np.uint8),
+        frame_index=1,
+        timestamp_unix_ns=100,
+    )
+    second = SourceFrame(
+        source_name="front_left",
+        image_bgr=np.ones((1, 1, 3), dtype=np.uint8),
+        frame_index=2,
+        timestamp_unix_ns=200,
+    )
+
+    store_latest_frame(slot, first)
+    store_latest_frame(slot, second)
+
+    assert slot.received_frames == 2
+    assert slot.dropped_frames == 1
+    assert slot.pending_frame is not None
+    assert slot.pending_frame.frame is second
+
+
+def test_take_pending_batch_collects_at_most_one_frame_per_source() -> None:
+    slots = {
+        "front_left": SourceSlot(
+            source_name="front_left",
+            pending_frame=PendingFrame(
+                source_name="front_left",
+                frame=SourceFrame(
+                    source_name="front_left",
+                    image_bgr=np.zeros((1, 1, 3), dtype=np.uint8),
+                    frame_index=11,
+                    timestamp_unix_ns=110,
+                ),
+            ),
+        ),
+        "front_right": SourceSlot(
+            source_name="front_right",
+            pending_frame=PendingFrame(
+                source_name="front_right",
+                frame=SourceFrame(
+                    source_name="front_right",
+                    image_bgr=np.zeros((1, 1, 3), dtype=np.uint8),
+                    frame_index=22,
+                    timestamp_unix_ns=220,
+                ),
+            ),
+        ),
+        "rear": SourceSlot(
+            source_name="rear",
+            pending_frame=PendingFrame(
+                source_name="rear",
+                frame=SourceFrame(
+                    source_name="rear",
+                    image_bgr=np.zeros((1, 1, 3), dtype=np.uint8),
+                    frame_index=33,
+                    timestamp_unix_ns=330,
+                ),
+            ),
+        ),
+    }
+
+    batch = take_pending_batch(slots, max_batch_frames=2)
+
+    assert [frame.source_name for frame in batch] == ["front_left", "front_right"]
+    assert slots["front_left"].pending_frame is None
+    assert slots["front_right"].pending_frame is None
+    assert slots["rear"].pending_frame is not None
+
+
+class StubSource:
+    def __init__(self, source_name: str, frames: tuple[SourceFrame, ...]) -> None:
+        self.source_name = source_name
+        self._frames = frames
+
+    async def frames(self) -> AsyncIterator[SourceFrame]:
+        for frame in self._frames:
+            yield frame
+
+
+class StubPoseShim:
+    def process_many(self, frames: Sequence[SourceFrame]) -> list[PoseDetections]:
+        detections: list[PoseDetections] = []
+        for frame in frames:
+            detections.append(
+                PoseDetections(
+                    source_name=frame.source_name,
+                    frame_index=frame.frame_index,
+                    source_size=(frame.image_bgr.shape[1], frame.image_bgr.shape[0]),
+                    boxes_xyxy=np.asarray([[0.0, 0.0, 10.0, 10.0]], dtype=np.float32),
+                    box_scores=np.asarray([1.0], dtype=np.float32),
+                    keypoints_xy=np.zeros((1, 133, 2), dtype=np.float32),
+                    keypoint_scores=np.ones((1, 133), dtype=np.float32),
+                    timestamp_unix_ns=frame.timestamp_unix_ns,
+                    keypoint_schema="coco_wholebody133",
+                )
+            )
+        return detections
+
+
+class StubSink:
+    def __init__(self) -> None:
+        self.messages: list[PoseDetections] = []
+        self.closed = False
+
+    async def publish_pose(self, detections: PoseDetections) -> None:
+        self.messages.append(detections)
+
+    async def aclose(self) -> None:
+        self.closed = True
+
+
+def test_run_detection_runner_publishes_payloads() -> None:
+    sink = StubSink()
+    sources = (
+        StubSource(
+            "cam0",
+            (
+                SourceFrame(
+                    source_name="cam0",
+                    image_bgr=np.zeros((2, 3, 3), dtype=np.uint8),
+                    frame_index=1,
+                    timestamp_unix_ns=100,
+                ),
+            ),
+        ),
+        StubSource(
+            "cam1",
+            (
+                SourceFrame(
+                    source_name="cam1",
+                    image_bgr=np.zeros((2, 3, 3), dtype=np.uint8),
+                    frame_index=2,
+                    timestamp_unix_ns=200,
+                ),
+            ),
+        ),
+    )
+    config = DetectionRunnerConfig(
+        instances=("cam0", "cam1"),
+        pose_config_path=Path(__file__),
+        yolo_checkpoint=Path(__file__),
+        pose_checkpoint=Path(__file__),
+        max_batch_frames=2,
+    )
+
+    anyio.run(
+        run_detection_runner,
+        sources,
+        StubPoseShim(),
+        sink,
+        config,
+    )
+
+    assert sink.closed is True
+    assert [(item.source_name, item.frame_index, item.timestamp_unix_ns) for item in sink.messages] == [
+        ("cam0", 1, 100),
+        ("cam1", 2, 200),
+    ]
@@ -0,0 +1,137 @@
+import json
+from pathlib import Path
+
+import anyio
+import cv2
+import numpy as np
+import pyarrow.parquet as pq
+
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.detection.sinks import ParquetPoseSink
+from pose_tracking_exp.detection.sources import VideoFrameSource
+from pose_tracking_exp.schema.detection import PoseDetections
+from pose_tracking_exp.tracking import load_replay_file
+
+
+def _write_synthetic_video(path: Path) -> None:
+    writer = cv2.VideoWriter(
+        str(path),
+        cv2.VideoWriter.fourcc(*"MJPG"),
+        10.0,
+        (8, 6),
+    )
+    if not writer.isOpened():
+        raise RuntimeError("Could not open synthetic video writer.")
+    try:
+        for frame_index in range(3):
+            frame = np.full((6, 8, 3), frame_index * 32, dtype=np.uint8)
+            writer.write(frame)
+    finally:
+        writer.release()
+
+
+def _sample_wholebody_detection(*, source_name: str, frame_index: int) -> PoseDetections:
+    keypoints_xy = np.zeros((1, 133, 2), dtype=np.float32)
+    keypoint_scores = np.zeros((1, 133), dtype=np.float32)
+    keypoints_xy[0, 5] = [10.0, 20.0]
+    keypoints_xy[0, 6] = [30.0, 20.0]
+    keypoints_xy[0, 11] = [12.0, 60.0]
+    keypoints_xy[0, 12] = [28.0, 60.0]
+    keypoints_xy[0, 0] = [20.0, 8.0]
+    keypoint_scores[0, [0, 5, 6, 11, 12]] = 1.0
+    return PoseDetections(
+        source_name=source_name,
+        frame_index=frame_index,
+        source_size=(640, 480),
+        boxes_xyxy=np.asarray([[8.0, 4.0, 32.0, 64.0]], dtype=np.float32),
+        box_scores=np.asarray([0.9], dtype=np.float32),
+        keypoints_xy=keypoints_xy,
+        keypoint_scores=keypoint_scores,
+        timestamp_unix_ns=frame_index * 100_000_000,
+        keypoint_schema="coco_wholebody133",
+    )
+
+
+def test_video_frame_source_reads_frames(tmp_path: Path) -> None:
+    video_path = tmp_path / "cam0.avi"
+    _write_synthetic_video(video_path)
+    source = VideoFrameSource(video_path, source_name="cam0")
+
+    async def collect() -> list[tuple[str, int, int, tuple[int, int, int]]]:
+        frames: list[tuple[str, int, int, tuple[int, int, int]]] = []
+        async for frame in source.frames():
+            frames.append(
+                (
+                    frame.source_name,
+                    frame.frame_index,
+                    frame.timestamp_unix_ns,
+                    frame.image_bgr.shape,
+                )
+            )
+        return frames
+
+    frames = anyio.run(collect)
+
+    assert [item[0] for item in frames] == ["cam0", "cam0", "cam0"]
+    assert [item[1] for item in frames] == [0, 1, 2]
+    assert [item[3] for item in frames] == [(6, 8, 3), (6, 8, 3), (6, 8, 3)]
+    assert frames[0][2] <= frames[1][2] <= frames[2][2]
+
+
+def test_parquet_sink_round_trips_into_tracking_replay(tmp_path: Path) -> None:
+    output_dir = tmp_path / "detections"
+    sink = ParquetPoseSink(output_dir, flush_rows=1)
+
+    async def write_rows() -> None:
+        await sink.publish_pose(_sample_wholebody_detection(source_name="cam0", frame_index=0))
+        await sink.publish_pose(
+            PoseDetections(
+                source_name="cam0",
+                frame_index=1,
+                source_size=(640, 480),
+                boxes_xyxy=np.empty((0, 4), dtype=np.float32),
+                box_scores=np.empty((0,), dtype=np.float32),
+                keypoints_xy=np.empty((0, 133, 2), dtype=np.float32),
+                keypoint_scores=np.empty((0, 133), dtype=np.float32),
+                timestamp_unix_ns=100_000_000,
+                keypoint_schema="coco_wholebody133",
+            )
+        )
+        await sink.aclose()
+
+    anyio.run(write_rows)
+
+    parquet_path = output_dir / "cam0_detected.parquet"
+    assert parquet_path.exists()
+    assert pq.read_table(parquet_path).num_rows == 2
+
+    scene_path = tmp_path / "scene.json"
+    scene_path.write_text(
+        json.dumps(
+            {
+                "room_size": [6.0, 4.0, 3.0],
+                "room_center": [0.0, 0.0, 1.0],
+                "cameras": [
+                    {
+                        "name": "cam0",
+                        "width": 640,
+                        "height": 480,
+                        "K": [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]],
+                        "DC": [0.0, 0.0, 0.0, 0.0, 0.0],
+                        "R": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
+                        "T": [[0.0], [0.0], [0.0]],
+                    }
+                ],
+            }
+        ),
+        encoding="utf-8",
+    )
+
+    replay = load_replay_file(scene_path, output_dir)
+    frames = replay.frames_by_camera["cam0"]
+    assert [frame.frame_index for frame in frames] == [0, 1]
+    assert frames[1].detections == ()
+    np.testing.assert_allclose(
+        frames[0].detections[0].keypoints[BODY20_INDEX_BY_NAME["hip_middle"], :2],
+        [20.0, 60.0],
+    )
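The round-trip above relies on the `<camera>_detected.parquet` naming convention (`cam0_detected.parquet`, `5602_detected.parquet`) to route sink output back to scene cameras when the replay loader consumes a detection directory. A hypothetical helper showing just the filename mapping; the real loader may implement it differently:

```python
from pathlib import Path


def camera_name_from_detection_file(path: Path) -> str:
    """Map a detection parquet filename back to its camera name,
    following the `<camera>_detected.parquet` convention used by the sink."""
    stem = path.stem  # e.g. "cam0_detected"
    suffix = "_detected"
    if not stem.endswith(suffix):
        raise ValueError(f"not a detection parquet file: {path.name}")
    return stem[: -len(suffix)]
```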
@@ -1,7 +1,7 @@
 import numpy as np

-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME
-from pose_tracking_exp.kinematics import seed_state_from_pose3d
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.tracking.kinematics import seed_state_from_pose3d


 def _sample_pose3d() -> np.ndarray:
@@ -38,7 +38,7 @@ def test_seed_state_from_pose3d_does_not_call_least_squares(monkeypatch) -> None
     def fail_least_squares(*args: object, **kwargs: object) -> object:
         raise AssertionError("seed_state_from_pose3d should not call scipy.optimize.least_squares")

-    monkeypatch.setattr("pose_tracking_exp.kinematics.least_squares", fail_least_squares)
+    monkeypatch.setattr("pose_tracking_exp.tracking.kinematics.least_squares", fail_least_squares)
     state = seed_state_from_pose3d(_sample_pose3d())

     assert state.parameters.shape == (31,)
@@ -4,11 +4,17 @@ from pathlib import Path
 import numpy as np

-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME
-from pose_tracking_exp.normalization import normalize_rtmpose_body20
-from pose_tracking_exp.parajumping import PROTOCOL_HEADER, convert_payload_record, decode_pose_payload
-from pose_tracking_exp.replay import load_replay_file, load_scene_file
-from pose_tracking_exp.sync import synchronize_frames
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.common.normalization import normalize_coco_body20, normalize_rtmpose_body20
+from pose_tracking_exp.detection.cvmmap_payload import (
+    COCO_WHOLEBODY_KEYPOINT_COUNT,
+    PROTOCOL_HEADER,
+    CvmmapPosePayloadCodec,
+    convert_payload_record,
+    decode_pose_payload,
+)
+from pose_tracking_exp.schema.detection import PoseDetections
+from pose_tracking_exp.tracking import load_replay_file, load_scene_file, synchronize_frames


 def _encode_payload(
@@ -31,7 +37,7 @@ def _encode_payload(
         + np.asarray(box_scores, dtype=np.uint8).tobytes()
         + int(keypoints_xy.shape[0]).to_bytes(1, "little")
         + np.asarray(keypoints_xy, dtype="<u2").tobytes()
-        + int(keypoint_scores.size).to_bytes(1, "little")
+        + int(keypoint_scores.shape[0]).to_bytes(1, "little")
         + np.asarray(keypoint_scores, dtype=np.uint8).reshape(-1).tobytes()
         + int(timestamp_unix_ns).to_bytes(8, "little")
     )
@@ -54,6 +60,23 @@ def test_normalize_rtmpose_body20_derives_midpoints_and_head():
     np.testing.assert_allclose(normalized[BODY20_INDEX_BY_NAME["head"], :2], [20.0, 8.0])


+def test_normalize_coco17_body20_derives_midpoints_and_head():
+    keypoints = np.zeros((17, 2), dtype=np.float64)
+    scores = np.zeros((17,), dtype=np.float64)
+    keypoints[5] = [10.0, 20.0]
+    keypoints[6] = [30.0, 20.0]
+    keypoints[11] = [12.0, 60.0]
+    keypoints[12] = [28.0, 60.0]
+    keypoints[0] = [20.0, 8.0]
+    scores[[0, 5, 6, 11, 12]] = 1.0
+
+    normalized = normalize_coco_body20(keypoints, scores, keypoint_schema="coco17")
+
+    np.testing.assert_allclose(normalized[BODY20_INDEX_BY_NAME["hip_middle"], :2], [20.0, 60.0])
+    np.testing.assert_allclose(normalized[BODY20_INDEX_BY_NAME["shoulder_middle"], :2], [20.0, 20.0])
+    np.testing.assert_allclose(normalized[BODY20_INDEX_BY_NAME["head"], :2], [20.0, 8.0])
+
+
 def test_decode_payload_and_convert_record():
     keypoints_xy = np.zeros((1, 133, 2), dtype=np.uint16)
     keypoint_scores = np.zeros((1, 133), dtype=np.uint8)
@@ -87,6 +110,26 @@ def test_decode_payload_and_convert_record():
     assert converted["frame_index"] == 7


+def test_encode_pose_payload_requires_coco_wholebody133():
+    codec = CvmmapPosePayloadCodec()
+    detections = PoseDetections(
+        source_name="cam0",
+        frame_index=1,
+        source_size=(640, 480),
+        boxes_xyxy=np.zeros((1, 4), dtype=np.float32),
+        box_scores=np.ones((1,), dtype=np.float32),
+        keypoints_xy=np.zeros((1, COCO_WHOLEBODY_KEYPOINT_COUNT, 2), dtype=np.float32),
+        keypoint_scores=np.ones((1, COCO_WHOLEBODY_KEYPOINT_COUNT), dtype=np.float32),
+        timestamp_unix_ns=123,
+        keypoint_schema="coco_wholebody133",
+    )
+
+    payload = codec.encode(detections)
+    decoded = decode_pose_payload(payload)
+    assert decoded.frame_index == 1
+    assert decoded.reference_size == (640, 480)
+
+
 def test_load_replay_and_synchronize(tmp_path: Path):
     scene_path = tmp_path / "scene.json"
     replay_path = tmp_path / "replay.jsonl"
@@ -153,4 +196,3 @@ def test_load_replay_and_synchronize(tmp_path: Path):
     bundles = synchronize_frames(replay, max_skew_ns=20, min_views=2)
     assert len(bundles) == 1
     assert {frame.camera_name for frame in bundles[0].views} == {"cam0", "cam1"}
-
@@ -5,9 +5,9 @@ import pytest
 pytest.importorskip("rpt")

-from pose_tracking_exp.joints import BODY20_INDEX_BY_NAME
-from pose_tracking_exp.models import CameraCalibration, CameraFrame, FrameBundle, ProposalCluster, SceneConfig, TrackerConfig
-from pose_tracking_exp.tracker import PoseTracker
+from pose_tracking_exp.common.joints import BODY20_INDEX_BY_NAME
+from pose_tracking_exp.schema import CameraCalibration, CameraFrame, FrameBundle, ProposalCluster, SceneConfig, TrackerConfig
+from pose_tracking_exp.tracking import PoseTracker


 def _make_scene() -> SceneConfig:
@@ -96,7 +96,7 @@ def test_single_person_mode_caps_active_tracks(monkeypatch) -> None
     tracker = PoseTracker(
         _make_scene(),
         TrackerConfig(
-            mode="single_person",
+            max_active_tracks=1,
             tentative_min_age=1,
             tentative_hits_required=1,
             tentative_promote_score=0.0,
@@ -127,7 +127,7 @@ def test_single_person_mode_reuses_lost_track_id(monkeypatch) -> None
     tracker = PoseTracker(
         _make_scene(),
         TrackerConfig(
-            mode="single_person",
+            max_active_tracks=1,
             tentative_min_age=1,
             tentative_hits_required=1,
             tentative_promote_score=0.0,
@@ -6,9 +6,9 @@ import pytest
 pytest.importorskip("rpt")

-from pose_tracking_exp.models import CameraFrame, FrameBundle, PoseDetection, TrackerConfig
-from pose_tracking_exp.replay import load_scene_file
-from pose_tracking_exp.tracker import PoseTracker
+from pose_tracking_exp.schema import CameraFrame, FrameBundle, PoseDetection, TrackerConfig
+from pose_tracking_exp.tracking import PoseTracker
+from pose_tracking_exp.tracking.replay_io import load_scene_file

 RPT_ROOT = Path("/home/crosstyan/Code/RapidPoseTriangulation")
@@ -1,3 +1 @@
-[[index]]
-url = "https://pypi.org/simple"
-default = true
+no-build-isolation-package = ["chumpy", "xtcocotools"]
|||||||
BIN
Binary file not shown.
BIN
Binary file not shown.