Real-Time Scoliosis Screening Pipeline (ScoNet)
TL;DR
Quick Summary: Build a production-oriented, real-time scoliosis screening pipeline inside the OpenGait repo. The pipeline reads video from cv-mmap shared memory or OpenCV, runs YOLO11n-seg for person detection/segmentation/tracking, extracts 64×44 binary silhouettes, feeds a sliding window of 30 frames into a DDP-free ScoNet wrapper, and publishes classification results (positive/neutral/negative) as JSON over NATS.
Deliverables:
- `ScoNetDemo` — standalone `nn.Module` wrapper for ScoNet inference (no DDP)
- Silhouette preprocessing module — deterministic mask→64×44 float tensor pipeline
- Ring buffer / sliding window manager — per-track frame accumulation with reset logic
- Input adapters — cv-mmap async client + OpenCV VideoCapture fallback
- NATS publisher — JSON result output
- Main pipeline application — orchestrates all components
- pytest test suite — preprocessing, windowing, single-person policy, recovery
- Sample video for smoke testing
Estimated Effort: Large
Parallel Execution: YES — 4 waves
Critical Path: Task 1 (Scaffolding) → Task 2 (ScoNetDemo) → Task 3 (Silhouette Preprocess) → Task 5 (Ring Buffer) → Task 9 (Pipeline App) → Task 12 (Integration Tests)
Context
Original Request
Build a camera-to-classification pipeline for scoliosis screening using OpenGait's ScoNet model, targeting Jetson AGX Orin deployment, with cv-mmap shared-memory video framework integration.
Interview Summary
Key Discussions:
- Input: Both camera (via cv-mmap) and video files (OpenCV fallback). Single person only.
- CV Stack: YOLO11n-seg with built-in ByteTrack (replaces paper's BYTETracker + PP-HumanSeg)
- Inference: Sliding window of 30 frames, continuous classification
- Output: JSON over NATS (decided over binary protocol — simpler, cross-language)
- DDP Bypass: Create `ScoNetDemo(nn.Module)` following All-in-One-Gait's `BaselineDemo` pattern
- Build Location: Inside repo (opengait lacks `__init__.py`, config system hardcodes paths)
- Test Strategy: pytest, tests after implementation
- Hardware: Dev on i7-14700KF + RTX 5070 Ti/3090; deploy on Jetson AGX Orin
Research Findings:
- ScoNet input: `[N, 1, S, 64, 44]` float32 in [0,1]. Output: `logits [N, 3, 16]` → `argmax(mean(-1))` → class index
- `.pkl` preprocessing: grayscale → crop vertical → resize h=64 → center-crop w=64 → cut 10px sides → /255.0
- `BaseSilCuttingTransform`: cuts `int(W // 64) * 10` px each side + divides by 255
- All-in-One-Gait `BaselineDemo`: extends `nn.Module`, uses `torch.load()` + `load_state_dict()`, `training=False`
- YOLO11n-seg: 6MB, ~50-60 FPS, `model.track(frame, persist=True)` → bbox + mask + track_id
- cv-mmap Python client: `async for im, meta in CvMmapClient("name")` — zero-copy numpy
Metis Review
Identified Gaps (addressed):
- Single-person policy undefined → Defined: largest-bbox selection, ignore others, reset window on ID change
- Sliding window stride undefined → Defined: stride=1 (every frame updates buffer), classify every N frames (configurable)
- No-detection / empty mask handling → Defined: skip frame, don't reset window unless gap exceeds threshold
- Mask quality / partial body → Defined: minimum mask area threshold to accept frame
- Track ID reset / re-identification → Defined: reset ring buffer on track ID change
- YOLO letterboxing → Defined: use `result.masks.data` in original frame coords, not letterboxed
- Async/sync impedance → Defined: synchronous pull-process-publish loop (no async queues in MVP)
- Scope creep lockdown → Explicitly excluded: TensorRT, DeepStream, multi-person, GUI, recording, auto-tuning
Work Objectives
Core Objective
Create a self-contained scoliosis screening pipeline that runs standalone (no DDP, no OpenGait training infrastructure) while reusing ScoNet's trained weights and matching its exact preprocessing contract.
Prerequisites (already present in repo)
- Checkpoint: `./ckpt/ScoNet-20000.pt` — trained ScoNet weights (verified working, 80.88% accuracy on Scoliosis1K eval). Already exists in the repo, no download needed.
- Config: `./configs/sconet/sconet_scoliosis1k.yaml` — ScoNet architecture config. Already exists.
Concrete Deliverables
- `opengait/demo/sconet_demo.py` — ScoNetDemo nn.Module wrapper
- `opengait/demo/preprocess.py` — Silhouette extraction and normalization
- `opengait/demo/window.py` — Sliding window / ring buffer manager
- `opengait/demo/input.py` — Input adapters (cv-mmap + OpenCV)
- `opengait/demo/output.py` — NATS JSON publisher
- `opengait/demo/pipeline.py` — Main pipeline orchestrator
- `opengait/demo/__main__.py` — CLI entry point
- `tests/demo/test_preprocess.py` — Preprocessing unit tests
- `tests/demo/test_window.py` — Ring buffer + single-person policy tests
- `tests/demo/test_pipeline.py` — Integration / smoke tests
Definition of Done
- `uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --max-frames 120` exits 0 and prints predictions (no NATS by default when `--nats-url` not provided)
- `uv run pytest tests/demo/ -q` passes all tests
- Pipeline processes ≥15 FPS on desktop GPU with 720p input
- JSON schema validated: `{"frame": int, "track_id": int, "label": str, "confidence": float, "window": int, "timestamp_ns": int}`
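For reference, a payload matching this schema can be built with the standard library (field values here are illustrative, not taken from a real run):

```python
import json
import time

# Illustrative payload matching the schema above (values are made up).
payload = {
    "frame": 120,
    "track_id": 1,
    "label": "neutral",          # one of 'positive' | 'neutral' | 'negative'
    "confidence": 0.87,
    "window": 30,
    "timestamp_ns": time.time_ns(),
}
message = json.dumps(payload).encode()  # NATS publishes raw bytes
```

Keeping the schema flat and typed like this makes it trivial to validate on the subscriber side in any language.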
Must Have
- Deterministic preprocessing matching ScoNet training data exactly (64×44, float32, [0,1])
- Single-person selection (largest bbox) with consistent tracking
- Sliding window of 30 frames with reset on track loss/ID change
- Graceful handling of: no detection, end of video, cv-mmap disconnect
- CLI with `--source`, `--checkpoint`, `--device`, `--window`, `--stride`, `--nats-url`, `--max-frames` flags (using `click`)
- Works without NATS server when `--nats-url` is omitted (console output fallback)
- All tensor/array function signatures annotated with `jaxtyping` types (e.g., `Float[Tensor, 'batch 1 seq 64 44']`) and checked at runtime with `beartype` via `@jaxtyped(typechecker=beartype)` decorators
- Generator-based input adapters — any `Iterable[tuple[np.ndarray, dict]]` works as a source
Must NOT Have (Guardrails)
- No DDP: Demo must never import or call anything from `torch.distributed`
- No BaseModel subclassing: ScoNetDemo extends `nn.Module` directly
- No repo restructuring: Don't touch existing opengait training/eval/data code
- No TensorRT/DeepStream: Jetson acceleration is out of MVP scope
- No multi-person: Single tracked person only
- No GUI/visualization: Output is JSON, not rendered frames
- No dataset recording/auto-labeling: This is inference only
- No OpenCV GStreamer builds: Use pip-installed OpenCV
- No magic preprocessing: Every transform step must be explicit and testable
- No unbounded buffers: Every queue/buffer has a max size and drop policy
Verification Strategy
ZERO HUMAN INTERVENTION — ALL verification is agent-executed. No exceptions.
Test Decision
- Infrastructure exists: NO (creating with this plan)
- Automated tests: Tests after implementation (pytest)
- Framework: pytest (via `uv run pytest`)
- Setup: Add pytest to dev dependencies in pyproject.toml
QA Policy
Every task MUST include agent-executed QA scenarios.
Evidence saved to .sisyphus/evidence/task-{N}-{scenario-slug}.{ext}.
- CLI/Pipeline: Use Bash — run pipeline with sample video, validate output
- Unit Tests: Use Bash — `uv run pytest` specific test files
- NATS Integration: Use Bash — start NATS container, run pipeline, subscribe and validate JSON
Execution Strategy
Parallel Execution Waves
Wave 1 (Foundation — all independent, start immediately):
├── Task 1: Project scaffolding (opengait/demo/, tests/demo/, deps) [quick]
├── Task 2: ScoNetDemo nn.Module wrapper [deep]
├── Task 3: Silhouette preprocessing module [deep]
└── Task 4: Input adapters (cv-mmap + OpenCV) [unspecified-high]
Wave 2 (Core logic — depends on Wave 1 foundations):
├── Task 5: Ring buffer / sliding window manager (depends: 3) [unspecified-high]
├── Task 6: NATS JSON publisher (depends: 1) [quick]
├── Task 7: Unit tests — preprocessing (depends: 3) [unspecified-high]
└── Task 8: Unit tests — ScoNetDemo forward pass (depends: 2) [unspecified-high]
Wave 3 (Integration — combines all components):
├── Task 9: Main pipeline application + CLI (depends: 2,3,4,5,6) [deep]
├── Task 10: Single-person policy tests (depends: 5) [unspecified-high]
└── Task 11: Sample video acquisition (depends: 1) [quick]
Wave 4 (Verification — end-to-end):
├── Task 12: Integration tests + smoke test (depends: 9,11) [deep]
└── Task 13: NATS integration test (depends: 9,6) [unspecified-high]
Wave FINAL (Independent review — 4 parallel):
├── Task F1: Plan compliance audit (oracle)
├── Task F2: Code quality review (unspecified-high)
├── Task F3: Real manual QA (unspecified-high)
└── Task F4: Scope fidelity check (deep)
Critical Path: Task 1 → Task 2 → Task 3 → Task 5 → Task 9 → Task 12 → F1-F4
Parallel Speedup: ~60% faster than sequential
Max Concurrent: 4 (Waves 1 & 2)
Dependency Matrix
| Task | Depends On | Blocks | Wave |
|---|---|---|---|
| 1 | — | 6, 11 | 1 |
| 2 | — | 8, 9 | 1 |
| 3 | — | 5, 7, 9 | 1 |
| 4 | — | 9 | 1 |
| 5 | 3 | 9, 10 | 2 |
| 6 | 1 | 9, 13 | 2 |
| 7 | 3 | — | 2 |
| 8 | 2 | — | 2 |
| 9 | 2, 3, 4, 5, 6 | 12, 13 | 3 |
| 10 | 5 | — | 3 |
| 11 | 1 | 12 | 3 |
| 12 | 9, 11 | F1-F4 | 4 |
| 13 | 9, 6 | F1-F4 | 4 |
| F1-F4 | 12, 13 | — | FINAL |
Agent Dispatch Summary
- Wave 1: 4 — T1 → `quick`, T2 → `deep`, T3 → `deep`, T4 → `unspecified-high`
- Wave 2: 4 — T5 → `unspecified-high`, T6 → `quick`, T7 → `unspecified-high`, T8 → `unspecified-high`
- Wave 3: 3 — T9 → `deep`, T10 → `unspecified-high`, T11 → `quick`
- Wave 4: 2 — T12 → `deep`, T13 → `unspecified-high`
- FINAL: 4 — F1 → `oracle`, F2 → `unspecified-high`, F3 → `unspecified-high`, F4 → `deep`
TODOs
Implementation + Test = ONE Task. Never separate. EVERY task MUST have: Recommended Agent Profile + Parallelization info + QA Scenarios.
1. Project Scaffolding + Dependencies
What to do:
- Create `opengait/demo/__init__.py` (empty, makes it a package)
- Create `opengait/demo/__main__.py` (stub: `from .pipeline import main; main()`)
- Create `tests/demo/__init__.py` and `tests/__init__.py` if missing
- Create `tests/demo/conftest.py` with shared fixtures (sample tensor, mock frame)
- Add dev dependencies to `pyproject.toml`: `pytest`, `nats-py`, `ultralytics`, `jaxtyping`, `beartype`, `click`
- Verify: `uv sync --extra torch` succeeds with new deps
- Verify: `uv run python -c "import ultralytics; import nats; import jaxtyping; import beartype; import click"` works
Must NOT do:
- Don't modify existing opengait code or imports
- Don't add runtime deps that aren't needed (no flask, no fastapi, etc.)
Recommended Agent Profile:
- Category: `quick`
- Reason: Boilerplate file creation and dependency management, no complex logic
- Skills: []
- Skills Evaluated but Omitted:
  - `explore`: Not needed — we know exactly what files to create
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 1 (with Tasks 2, 3, 4)
- Blocks: Tasks 6, 11
- Blocked By: None (can start immediately)
References:
Pattern References:
- `opengait/modeling/models/__init__.py` — Example of package init in this repo
- `pyproject.toml` — Current dependency structure; add to `[project.optional-dependencies]` or `[dependency-groups]`
External References:
- ultralytics pip package: `pip install ultralytics` (includes YOLO + ByteTrack)
- nats-py: `pip install nats-py` (async NATS client)
WHY Each Reference Matters:
- `pyproject.toml`: Must match existing dep management style (uv + groups) to avoid breaking `uv sync`
- `opengait/modeling/models/__init__.py`: Shows the repo's package init convention (dynamic imports vs empty)
Acceptance Criteria:
- `opengait/demo/__init__.py` exists
- `opengait/demo/__main__.py` exists with stub entry point
- `tests/demo/conftest.py` exists with at least one fixture
- `uv sync` succeeds without errors
- `uv run python -c "import ultralytics; import nats; import jaxtyping; import beartype; import click; print('OK')"` prints OK
QA Scenarios:
Scenario: Dependencies install correctly
- Tool: Bash
- Preconditions: Clean uv environment
- Steps:
  1. Run `uv sync --extra torch`
  2. Run `uv run python -c "import ultralytics; import nats; import pytest; import jaxtyping; import beartype; import click; print('ALL_DEPS_OK')"`
- Expected Result: Both commands exit 0, second prints 'ALL_DEPS_OK'
- Failure Indicators: ImportError, uv sync failure, missing package
- Evidence: .sisyphus/evidence/task-1-deps-install.txt

Scenario: Package structure is importable
- Tool: Bash
- Preconditions: uv sync completed
- Steps:
  1. Run `uv run python -c "from opengait.demo import __main__; print('IMPORT_OK')"`
- Expected Result: Prints 'IMPORT_OK' without errors
- Failure Indicators: ModuleNotFoundError, ImportError
- Evidence: .sisyphus/evidence/task-1-import-check.txt

Commit: YES
- Message: `chore(demo): scaffold demo package and test infrastructure`
- Files: `opengait/demo/__init__.py`, `opengait/demo/__main__.py`, `tests/demo/conftest.py`, `tests/demo/__init__.py`, `tests/__init__.py`, `pyproject.toml`
- Pre-commit: `uv sync && uv run python -c "import ultralytics; import nats; import jaxtyping; import beartype; import click"`
2. ScoNetDemo — DDP-Free Inference Wrapper
What to do:
- Create `opengait/demo/sconet_demo.py`
- Class `ScoNetDemo(nn.Module)` — NOT a BaseModel subclass
- Constructor takes `cfg_path: str` and `checkpoint_path: str`
- Use `config_loader` from `opengait/utils/common.py` to parse YAML config
- Build the ScoNet architecture layers directly:
  - Backbone (`ResNet9` from `opengait/modeling/backbones/resnet.py`)
  - `TemporalPool` (from `opengait/modeling/modules.py`)
  - `HorizontalPoolingPyramid` (from `opengait/modeling/modules.py`)
  - `SeparateFCs` (from `opengait/modeling/modules.py`)
  - `SeparateBNNecks` (from `opengait/modeling/modules.py`)
- Load checkpoint: `torch.load(checkpoint_path, map_location=device)` → extract state_dict → `load_state_dict()`
- Handle checkpoint format: may be `{'model': state_dict, ...}` or plain state_dict
- Strip `module.` prefix from DDP-wrapped keys if present
- All public methods decorated with `@jaxtyped(typechecker=beartype)` for runtime shape checking
  - Use jaxtyping: `from jaxtyping import Float, Int, jaxtyped`
  - Use beartype: `from beartype import beartype`
- `forward(sils: Float[Tensor, 'batch 1 seq 64 44']) -> dict` where seq=30 (window size)
  - Returns `{'logits': Float[Tensor, 'batch 3 16'], 'label': int, 'confidence': float}`
- `predict(sils: Float[Tensor, 'batch 1 seq 64 44']) -> tuple[str, float]` convenience method: returns `('positive'|'neutral'|'negative', confidence)`
- Prediction logic: `argmax(logits.mean(dim=-1), dim=-1)` → index → label string
- Confidence: `softmax(logits.mean(dim=-1)).max()` — probability of chosen class
- Class mapping: `{0: 'negative', 1: 'neutral', 2: 'positive'}`
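The prediction and confidence logic can be illustrated standalone with synthetic logits (`decode` is a hypothetical helper for this sketch, not part of the planned API; NumPy stands in for torch):

```python
import numpy as np

LABELS = {0: 'negative', 1: 'neutral', 2: 'positive'}

def decode(logits: np.ndarray) -> tuple[str, float]:
    """Map ScoNet logits [3, 16] to (label, confidence) via mean-over-parts."""
    part_mean = logits.mean(axis=-1)            # [3]: average the 16 horizontal parts
    shifted = part_mean - part_mean.max()       # numerically stable softmax
    probs = np.exp(shifted) / np.exp(shifted).sum()
    idx = int(probs.argmax())
    return LABELS[idx], float(probs[idx])

logits = np.zeros((3, 16), dtype=np.float32)
logits[2, :] = 5.0                              # make class 2 ('positive') dominate
label, conf = decode(logits)
```

Averaging over the 16 parts before argmax matches the evaluator's aggregation; softmax is applied after the mean so the confidence reflects the aggregated score.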
Must NOT do:
- Do NOT import anything from `torch.distributed`
- Do NOT subclass `BaseModel`
- Do NOT use `ddp_all_gather` or `get_ddp_module`
- Do NOT modify `sconet.py` or any existing model file
Recommended Agent Profile:
- Category: `deep`
- Reason: Requires understanding ScoNet's architecture, checkpoint format, and careful weight loading — high complexity, correctness-critical
- Skills: []
- Skills Evaluated but Omitted:
  - `explore`: Agent should read referenced files directly, not search broadly
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 1 (with Tasks 1, 3, 4)
- Blocks: Tasks 8, 9
- Blocked By: None (can start immediately)
References:
Pattern References:
- `opengait/modeling/models/sconet.py` — ScoNet model definition. Study `__init__` to see which submodules are built and how `forward()` assembles the pipeline. Lines ~10-54.
- `opengait/modeling/base_model.py` — BaseModel class. Study `__init__` (lines ~30-80) to see how it builds backbone/heads from config. Replicate the build logic WITHOUT DDP calls.
- All-in-One-Gait `BaselineDemo` pattern: extends `nn.Module` directly, uses `torch.load()` + `load_state_dict()` with `training=False`
API/Type References:
- `opengait/modeling/backbones/resnet.py` — ResNet9 backbone class. Constructor signature and forward signature.
- `opengait/modeling/modules.py` — `TemporalPool`, `HorizontalPoolingPyramid`, `SeparateFCs`, `SeparateBNNecks` classes. Constructor args come from config YAML.
- `opengait/utils/common.py::config_loader` — Loads YAML config, merges with default.yaml. Returns dict.
Config References:
- `configs/sconet/sconet_scoliosis1k.yaml` — ScoNet config specifying backbone, head, loss params. The `model_cfg` section defines architecture hyperparams.
- `configs/default.yaml` — Default config merged by config_loader
Checkpoint Reference:
- `./ckpt/ScoNet-20000.pt` — Trained ScoNet checkpoint. Verify format: `torch.load()` and inspect keys.
Inference Logic Reference:
- `opengait/evaluation/evaluator.py:evaluate_scoliosis()` (line ~418) — Shows `argmax(logits.mean(-1))` prediction logic and label mapping
WHY Each Reference Matters:
- `sconet.py`: Defines the exact forward pass we must replicate — TP → backbone → HPP → FCs → BNNecks
- `base_model.py`: Shows how config is parsed into submodule instantiation — we copy this logic minus DDP
- `modules.py`: Constructor signatures tell us what config keys to extract
- `evaluator.py`: The prediction aggregation (mean over parts, argmax) is the canonical inference logic
- `sconet_scoliosis1k.yaml`: Contains the exact hyperparams (channels, num_parts, etc.) for building layers
Acceptance Criteria:
- `opengait/demo/sconet_demo.py` exists with `ScoNetDemo(nn.Module)` class
- No `torch.distributed` imports in the file
- `ScoNetDemo` does not inherit from `BaseModel`
- `uv run python -c "from opengait.demo.sconet_demo import ScoNetDemo; print('OK')"` works
QA Scenarios:
Scenario: ScoNetDemo loads checkpoint and produces correct output shape
- Tool: Bash
- Preconditions: ./ckpt/ScoNet-20000.pt exists, CUDA available
- Steps:
  1. Run the following via `uv run python`:

```python
import torch
from opengait.demo.sconet_demo import ScoNetDemo

model = ScoNetDemo('./configs/sconet/sconet_scoliosis1k.yaml', './ckpt/ScoNet-20000.pt', device='cuda:0')
model.eval()
dummy = torch.rand(1, 1, 30, 64, 44, device='cuda:0')
with torch.no_grad():
    result = model(dummy)
assert result['logits'].shape == (1, 3, 16), f'Bad shape: {result["logits"].shape}'
label, conf = model.predict(dummy)
assert label in ('negative', 'neutral', 'positive'), f'Bad label: {label}'
assert 0.0 <= conf <= 1.0, f'Bad confidence: {conf}'
print(f'SCONET_OK label={label} conf={conf:.3f}')
```

- Expected Result: Prints 'SCONET_OK label=... conf=...' with valid label and confidence
- Failure Indicators: ImportError, state_dict key mismatch, shape error, CUDA error
- Evidence: .sisyphus/evidence/task-2-sconet-forward.txt

Scenario: ScoNetDemo rejects DDP-wrapped usage
- Tool: Bash
- Preconditions: File exists
- Steps:
  1. Run `grep -c 'torch.distributed' opengait/demo/sconet_demo.py`
  2. Run `grep -c 'BaseModel' opengait/demo/sconet_demo.py`
- Expected Result: Both commands output '0'
- Failure Indicators: Any count > 0
- Evidence: .sisyphus/evidence/task-2-no-ddp.txt

Commit: YES
- Message: `feat(demo): add ScoNetDemo DDP-free inference wrapper`
- Files: `opengait/demo/sconet_demo.py`
- Pre-commit: `uv run python -c "from opengait.demo.sconet_demo import ScoNetDemo"`
3. Silhouette Preprocessing Module
What to do:
- Create `opengait/demo/preprocess.py`
- All public functions decorated with `@jaxtyped(typechecker=beartype)` for runtime shape checking
- Function `mask_to_silhouette(mask: UInt8[ndarray, 'h w'], bbox: tuple[int,int,int,int]) -> Float[ndarray, '64 44'] | None`:
  - Uses jaxtyping: `from jaxtyping import Float, UInt8, jaxtyped` and `from numpy import ndarray`
  - Input: binary mask (H, W) uint8 from YOLO, bounding box (x1, y1, x2, y2)
  - Crop mask to bbox region
  - Find vertical extent of foreground pixels (top/bottom rows with nonzero)
  - Crop to tight vertical bounding box (remove empty rows above/below)
  - Resize height to 64, maintaining aspect ratio
  - Center-crop or center-pad width to 64
  - Cut 10px from each side → final 64×44
  - Return float32 array [0.0, 1.0] (divide by 255)
  - Return `None` if mask area below `MIN_MASK_AREA` threshold (default: 500 pixels)
- Function `frame_to_person_mask(result, min_area: int = 500) -> tuple[UInt8[ndarray, 'h w'], tuple[int,int,int,int]] | None`:
  - Extract single-person mask + bbox from YOLO result object
  - Uses `result.masks.data` and `result.boxes.xyxy`
  - Returns `None` if no valid detection
- Constants: `SIL_HEIGHT = 64`, `SIL_WIDTH = 44`, `SIL_FULL_WIDTH = 64`, `SIDE_CUT = 10`, `MIN_MASK_AREA = 500`
- Each step must match the preprocessing in `datasets/pretreatment.py` (grayscale → crop → resize → center) and `BaseSilCuttingTransform` (cut sides → /255)
Must NOT do:
- Don't import or modify `datasets/pretreatment.py`
- Don't add color/texture features — binary silhouettes only
- Don't resize to arbitrary sizes — must be exactly 64×44 output
Recommended Agent Profile:
- Category: `deep`
- Reason: Correctness-critical — must exactly match training data preprocessing. Subtle pixel-level bugs will silently degrade accuracy.
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 1 (with Tasks 1, 2, 4)
- Blocks: Tasks 5, 7, 9
- Blocked By: None
References:
Pattern References:
- `datasets/pretreatment.py:18-96` (function `imgs2pickle`) — The canonical preprocessing pipeline. Study lines 45-80 carefully: `cv2.imread(GRAYSCALE)` → find contours → crop to person bbox → `cv2.resize(img, (int(64 * ratio), 64))` → center-crop width. This is the EXACT sequence to replicate for live masks.
- `opengait/data/transform.py:46-58` (`BaseSilCuttingTransform`) — The runtime transform applied during training/eval. `cutting = int(w // 64) * 10` then slices `[:, :, cutting:-cutting]` then divides by 255.0. For w=64 input, cutting=10, output width=44.
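The cutting arithmetic is small enough to verify inline (a sketch for a single 64-px-wide silhouette, 2-D instead of the transform's batched 3-D input):

```python
import numpy as np

# BaseSilCuttingTransform arithmetic for a 64-px-wide silhouette:
w = 64
cutting = int(w // 64) * 10        # 1 * 10 = 10 px removed from each side
sil = np.full((64, w), 255.0, dtype=np.float32)
out = sil[:, cutting:-cutting] / 255.0
print(out.shape)                   # (64, 44)
```

So 64 − 2·10 = 44: the width constant in the plan falls directly out of this transform.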
API/Type References:
- Ultralytics `Results` object: `result.masks.data` → `Tensor[N, H, W]` binary masks; `result.boxes.xyxy` → `Tensor[N, 4]` bounding boxes; `result.boxes.id` → track IDs (may be None)
WHY Each Reference Matters:
- `pretreatment.py`: Defines the ground truth preprocessing. If our live pipeline differs by even 1 pixel of padding, ScoNet accuracy degrades.
- `BaseSilCuttingTransform`: The 10px side cut + /255 is applied at runtime during training. We must apply the SAME transform.
- Ultralytics masks: Need to know exact API to extract binary masks from YOLO output
Acceptance Criteria:
- `opengait/demo/preprocess.py` exists
- `mask_to_silhouette()` returns `np.ndarray` of shape `(64, 44)`, dtype `float32`, with values in `[0, 1]`
- Returns `None` for masks below MIN_MASK_AREA
QA Scenarios:
Scenario: Preprocessing produces correct output shape and range
- Tool: Bash
- Preconditions: Module importable
- Steps:
  1. Run the following via `uv run python`:

```python
import numpy as np
from opengait.demo.preprocess import mask_to_silhouette

# Create a synthetic person-shaped blob
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:400, 250:400] = 255  # person region
bbox = (250, 100, 400, 400)
sil = mask_to_silhouette(mask, bbox)
assert sil is not None, 'Should not be None for valid mask'
assert sil.shape == (64, 44), f'Bad shape: {sil.shape}'
assert sil.dtype == np.float32, f'Bad dtype: {sil.dtype}'
assert 0.0 <= sil.min() and sil.max() <= 1.0, f'Bad range: [{sil.min()}, {sil.max()}]'
assert sil.max() > 0, 'Should have nonzero pixels'
print('PREPROCESS_OK')
```

- Expected Result: Prints 'PREPROCESS_OK'
- Failure Indicators: Shape mismatch, dtype error, range error
- Evidence: .sisyphus/evidence/task-3-preprocess-shape.txt

Scenario: Small masks are rejected
- Tool: Bash
- Preconditions: Module importable
- Steps:
  1. Run the following via `uv run python`:

```python
import numpy as np
from opengait.demo.preprocess import mask_to_silhouette

# Tiny mask: only 10x10 = 100 pixels (below MIN_MASK_AREA=500)
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:110, 100:110] = 255
bbox = (100, 100, 110, 110)
sil = mask_to_silhouette(mask, bbox)
assert sil is None, f'Should be None for tiny mask, got {type(sil)}'
print('SMALL_MASK_REJECTED_OK')
```

- Expected Result: Prints 'SMALL_MASK_REJECTED_OK'
- Failure Indicators: Returns non-None for tiny mask
- Evidence: .sisyphus/evidence/task-3-small-mask-reject.txt

Commit: YES
- Message: `feat(demo): add silhouette preprocessing module`
- Files: `opengait/demo/preprocess.py`
- Pre-commit: `uv run python -c "from opengait.demo.preprocess import mask_to_silhouette"`
4. Input Adapters (cv-mmap + OpenCV)
What to do:
- Create `opengait/demo/input.py`
- The pipeline contract is simple: it consumes any `Iterable[tuple[np.ndarray, dict]]` — any generator or iterator that yields `(frame_bgr_uint8, metadata_dict)` works
- Type alias: `FrameStream = Iterable[tuple[np.ndarray, dict]]`
- Generator function `opencv_source(path: str | int, max_frames: int | None = None) -> Generator[tuple[np.ndarray, dict], None, None]`:
  - `path` can be video file path or camera index (int)
  - Opens `cv2.VideoCapture(path)`
  - Yields `(frame, {'frame_count': int, 'timestamp_ns': int})` tuples
  - Handles end-of-video gracefully (just returns)
  - Handles camera disconnect (log warning, return)
  - Respects `max_frames` limit
- Generator function `cvmmap_source(name: str, max_frames: int | None = None) -> Generator[tuple[np.ndarray, dict], None, None]`:
  - Wraps `CvMmapClient` from `/home/crosstyan/Code/cv-mmap/client/cvmmap/`
  - Since cv-mmap is async (anyio), this adapter must bridge async→sync:
    - Run anyio event loop in a background thread, drain frames via `queue.Queue`
    - Or use `anyio.from_thread` / `asyncio.run()` with `async for` internally
    - Choose simplest correct approach
  - Yields same `(frame, metadata_dict)` tuple format as opencv_source
  - Handles cv-mmap disconnect/offline events gracefully
  - Import should be conditional (try/except ImportError) so pipeline works without cv-mmap installed
- Factory function `create_source(source: str, max_frames: int | None = None) -> FrameStream`:
  - If source starts with `cvmmap://` → `cvmmap_source(name)`
  - If source is a digit string → `opencv_source(int(source))` (camera index)
  - Otherwise → `opencv_source(source)` (file path)
- The key design point: any user-written generator that yields `(np.ndarray, dict)` plugs in directly — no class inheritance needed
Must NOT do:
- Don't build GStreamer pipelines
- Don't add async to the main pipeline loop — keep synchronous pull model
- Don't use abstract base classes or heavy OOP — plain generator functions are the interface
- Don't buffer frames internally (no unbounded queue between source and consumer)
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Integration with external library (cv-mmap) requires careful async→sync bridging
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 1 (with Tasks 1, 2, 3)
- Blocks: Task 9
- Blocked By: None
References:
Pattern References:
- `/home/crosstyan/Code/cv-mmap/client/cvmmap/__init__.py` — `CvMmapClient` class. Async iterator: `async for im, meta in client`. Understand the `__aiter__`/`__anext__` protocol.
- `/home/crosstyan/Code/cv-mmap/client/test_cvmmap.py` — Example consumer pattern using `anyio.run()`
- `/home/crosstyan/Code/cv-mmap/client/cvmmap/msg.py` — `FrameMetadata` and `FrameInfo` dataclasses. Fields: `frame_count`, `timestamp_ns`, `info.width`, `info.height`, `info.pixel_format`
API/Type References:
- `cv2.VideoCapture` — OpenCV video capture. `cap.read()` returns `(bool, np.ndarray)`. `cap.get(cv2.CAP_PROP_FRAME_COUNT)` for total frames.
WHY Each Reference Matters:
- `CvMmapClient`: The async iterator yields `(numpy_array, FrameMetadata)` — need to know exact types for sync bridging
- `msg.py`: Metadata fields must be mapped to our generic `dict` metadata format
- `test_cvmmap.py`: Shows the canonical consumer pattern we must wrap
Acceptance Criteria:
- `opengait/demo/input.py` exists with `opencv_source`, `cvmmap_source`, `create_source` as functions (not classes)
- `create_source('./some/video.mp4')` returns a generator/iterable
- `create_source('cvmmap://default')` returns a generator (or raises if cv-mmap not installed)
- `create_source('0')` returns a generator for camera index 0
- Any custom generator `def my_source(): yield (frame, meta)` can be used directly by the pipeline
QA Scenarios:
Scenario: opencv_source reads frames from a video file
- Tool: Bash
- Preconditions: Any .mp4 or .avi file available (use sample video or create synthetic one)
- Steps:
  1. Create a short test video if none exists: `uv run python -c "import cv2; import numpy as np; w=cv2.VideoWriter('/tmp/test.avi', cv2.VideoWriter_fourcc(*'MJPG'), 15, (640,480)); [w.write(np.random.randint(0,255,(480,640,3),dtype=np.uint8)) for _ in range(30)]; w.release(); print('VIDEO_CREATED')"`
  2. Run the following via `uv run python`:

```python
from opengait.demo.input import create_source

src = create_source('/tmp/test.avi', max_frames=10)
count = 0
for frame, meta in src:
    assert frame.shape[2] == 3, f'Not BGR: {frame.shape}'
    assert 'frame_count' in meta
    count += 1
assert count == 10, f'Expected 10 frames, got {count}'
print('OPENCV_SOURCE_OK')
```

- Expected Result: Prints 'OPENCV_SOURCE_OK'
- Failure Indicators: Shape error, missing metadata, wrong frame count
- Evidence: .sisyphus/evidence/task-4-opencv-source.txt

Scenario: Custom generator works as pipeline input
- Tool: Bash
- Steps:
  1. Run the following via `uv run python`:

```python
import numpy as np

# Any generator works — no class needed
def my_source():
    for i in range(5):
        yield np.zeros((480, 640, 3), dtype=np.uint8), {'frame_count': i}

frames = list(my_source())
assert len(frames) == 5
print('CUSTOM_GENERATOR_OK')
```

- Expected Result: Prints 'CUSTOM_GENERATOR_OK'
- Failure Indicators: Type error, protocol mismatch
- Evidence: .sisyphus/evidence/task-4-custom-gen.txt

Commit: YES
- Message: `feat(demo): add generator-based input adapters for cv-mmap and OpenCV`
- Files: `opengait/demo/input.py`
- Pre-commit: `uv run python -c "from opengait.demo.input import create_source"`
5. Sliding Window / Ring Buffer Manager
What to do:
- Create `opengait/demo/window.py`
- Class `SilhouetteWindow`:
  - Constructor: `__init__(self, window_size: int = 30, stride: int = 1, gap_threshold: int = 15)`
  - Internal storage: `collections.deque(maxlen=window_size)` of `np.ndarray` (64×44 float32)
  - `push(sil: np.ndarray, frame_idx: int, track_id: int) -> None`:
    - If `track_id` differs from current tracked ID → reset buffer, update tracked ID
    - If `frame_idx - last_frame_idx > gap_threshold` → reset buffer (too many missed frames)
    - Append silhouette to deque
    - Increment internal frame counter
  - `is_ready() -> bool`: returns `len(buffer) == window_size`
  - `should_classify() -> bool`: returns `is_ready() and (frames_since_last_classify >= stride)`
  - `get_tensor(device: str = 'cpu') -> torch.Tensor`:
    - Stack buffer into `np.array` shape `[window_size, 64, 44]`
    - Convert to `torch.Tensor` shape `[1, 1, window_size, 64, 44]` on `device`
    - This is the exact input shape for ScoNetDemo
  - `reset() -> None`: clear buffer and counters
  - `mark_classified() -> None`: reset frames_since_last_classify counter
  - Properties: `current_track_id`, `frame_count`, `fill_level` (len/window_size as float)
- Single-person selection policy (function or small helper): `select_person(results) -> tuple[np.ndarray, tuple, int] | None`
  - From YOLO results, select the detection with the largest bounding box area
  - Return `(mask, bbox, track_id)` or `None` if no valid detection
  - If `result.boxes.id` is None (tracker not yet initialized), skip frame
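The reset policy in `push` is the subtle part; a minimal sketch under the rules above (stride bookkeeping, `get_tensor`, and the torch conversion omitted, so this is an illustration rather than the full class):

```python
from collections import deque

import numpy as np

class SilhouetteWindow:
    """Bounded per-track frame buffer with the reset rules described above."""

    def __init__(self, window_size: int = 30, gap_threshold: int = 15):
        self.window_size = window_size
        self.gap_threshold = gap_threshold
        self.buffer = deque(maxlen=window_size)  # bounded by construction
        self.current_track_id = None
        self.last_frame_idx = None

    def push(self, sil: np.ndarray, frame_idx: int, track_id: int) -> None:
        if track_id != self.current_track_id:
            self.buffer.clear()                  # new person: start over
        elif (self.last_frame_idx is not None
              and frame_idx - self.last_frame_idx > self.gap_threshold):
            self.buffer.clear()                  # too many missed frames
        self.current_track_id = track_id
        self.last_frame_idx = frame_idx
        self.buffer.append(sil)

    def is_ready(self) -> bool:
        return len(self.buffer) == self.window_size

    @property
    def frame_count(self) -> int:
        return len(self.buffer)
```

Using `deque(maxlen=...)` satisfies the "no unbounded buffers" guardrail for free: once full, each append drops the oldest silhouette, which is exactly stride-1 sliding-window behaviour.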
Must NOT do:
- No unbounded buffers — deque with maxlen enforces this
- No multi-person tracking — single person only, select largest bbox
- No time-based windowing — frame-count based only
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Data structure + policy logic with edge cases (ID changes, gaps, resets)
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 2 (with Tasks 6, 7, 8)
- Blocks: Tasks 9, 10
- Blocked By: Task 3 (needs silhouette shape constants from preprocess.py)
References:
Pattern References:
- `opengait/demo/preprocess.py` (Task 3) — `SIL_HEIGHT`, `SIL_WIDTH` constants. The window stores arrays of this shape.
- `opengait/data/dataset.py` — Shows how OpenGait's DataSet samples fixed-length sequences. The `seqL` parameter controls sequence length (our window_size=30).
API/Type References:
- Ultralytics `Results.boxes.id` — Track IDs tensor, may be `None` if tracker hasn't assigned IDs yet
- Ultralytics `Results.boxes.xyxy` — Bounding boxes `[N, 4]` for area calculation
- Ultralytics `Results.masks.data` — Binary masks `[N, H, W]`
WHY Each Reference Matters:
- `preprocess.py`: Window must store silhouettes of the exact shape produced by preprocessing
- `dataset.py`: Understanding how training samples sequences helps ensure our window matches
- Ultralytics API: Need to handle `None` track IDs and extract correct tensors
Acceptance Criteria:
- `opengait/demo/window.py` exists with `SilhouetteWindow` class and `select_person` function
- Buffer is bounded (deque with maxlen)
- `get_tensor()` returns shape `[1, 1, 30, 64, 44]` when full
- Track ID change triggers reset
- Gap exceeding threshold triggers reset
QA Scenarios:
Scenario: Window fills and produces correct tensor shape
Tool: Bash
Preconditions: Module importable
Steps:
1. Run `uv run python -c` with:
```python
import numpy as np
from opengait.demo.window import SilhouetteWindow
win = SilhouetteWindow(window_size=30, stride=1)
for i in range(30):
    sil = np.random.rand(64, 44).astype(np.float32)
    win.push(sil, frame_idx=i, track_id=1)
assert win.is_ready(), 'Window should be ready after 30 frames'
t = win.get_tensor()
assert t.shape == (1, 1, 30, 64, 44), f'Bad shape: {t.shape}'
assert t.dtype.is_floating_point, f'Bad dtype: {t.dtype}'
print('WINDOW_FILL_OK')
```
Expected Result: Prints 'WINDOW_FILL_OK'
Failure Indicators: Shape mismatch, not ready after 30 pushes
Evidence: .sisyphus/evidence/task-5-window-fill.txt

Scenario: Track ID change resets buffer
Tool: Bash
Steps:
1. Run `uv run python -c` with:
```python
import numpy as np
from opengait.demo.window import SilhouetteWindow
win = SilhouetteWindow(window_size=30)
for i in range(20):
    win.push(np.random.rand(64, 44).astype(np.float32), frame_idx=i, track_id=1)
assert win.frame_count == 20
# Switch track ID — should reset
win.push(np.random.rand(64, 44).astype(np.float32), frame_idx=20, track_id=2)
assert win.frame_count == 1, f'Should reset to 1, got {win.frame_count}'
assert win.current_track_id == 2
print('TRACK_RESET_OK')
```
Expected Result: Prints 'TRACK_RESET_OK'
Failure Indicators: Buffer not reset, wrong track ID
Evidence: .sisyphus/evidence/task-5-track-reset.txt

Commit: YES
- Message: `feat(demo): add sliding window manager with single-person selection`
- Files: `opengait/demo/window.py`
- Pre-commit: `uv run python -c "from opengait.demo.window import SilhouetteWindow"`
6. NATS JSON Publisher
What to do:
- Create `opengait/demo/output.py`
- Class `ResultPublisher` (Protocol) — any object with `publish(result: dict) -> None`
- Function `console_publisher() -> Generator` or simple class `ConsolePublisher`:
  - Prints JSON to stdout (default when `--nats-url` is not provided)
  - Format: one JSON object per line (JSONL)
- Class `NatsPublisher`:
  - Constructor: `__init__(self, url: str = 'nats://127.0.0.1:4222', subject: str = 'scoliosis.result')`
  - Uses `nats-py` async client, bridged to a sync `publish()` method
  - Handles connection failures gracefully (log warning, retry with backoff, don't crash pipeline)
  - Handles reconnection automatically (nats-py does this by default)
  - `publish(result: dict) -> None`: serializes to JSON, publishes to subject
  - `close() -> None`: drain and close NATS connection
  - Context manager support (`__enter__`/`__exit__`)
- JSON schema for results: `{ "frame": 1234, "track_id": 1, "label": "positive", "confidence": 0.82, "window": 30, "timestamp_ns": 1234567890000 }`
- Factory: `create_publisher(nats_url: str | None, subject: str = 'scoliosis.result') -> ResultPublisher`
  - If `nats_url` is None → `ConsolePublisher`
  - Otherwise → `NatsPublisher(url, subject)`
Must NOT do:
- Don't use JetStream (plain NATS PUB/SUB is sufficient)
- Don't build custom binary protocol
- Don't buffer/batch results — publish immediately
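A minimal sketch of the console path of the factory described above; the `NatsPublisher` branch is stubbed out here since it needs a running broker, and the real factory would return `NatsPublisher(nats_url, subject)` instead:

```python
import json
import sys

class ConsolePublisher:
    """Default sink: one JSON object per line (JSONL) on stdout."""

    def publish(self, result: dict) -> None:
        sys.stdout.write(json.dumps(result) + "\n")

    def close(self) -> None:
        pass  # nothing to release for stdout

def create_publisher(nats_url=None, subject="scoliosis.result"):
    # Sketch: only the console branch is implemented; the real factory
    # would construct a NatsPublisher when a URL is supplied.
    if nats_url is None:
        return ConsolePublisher()
    raise NotImplementedError("NatsPublisher sketch omitted")
```

Keeping the factory as the only construction point means the pipeline never branches on the transport itself.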
Recommended Agent Profile:
- Category: `quick`
- Reason: Straightforward JSON serialization + NATS client wrapper, well-documented library
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 2 (with Tasks 5, 7, 8)
- Blocks: Tasks 9, 13
- Blocked By: Task 1 (needs project scaffolding for nats-py dependency)
References:
External References:
- nats-py docs: `import nats; nc = await nats.connect(); await nc.publish(subject, data)` — async API
- `/home/crosstyan/Code/cv-mmap-gui/` — Uses NATS.c for messaging; our Python publisher sends to the same broker
WHY Each Reference Matters:
- nats-py: Need to bridge the async NATS client to a sync `publish()` call
- cv-mmap-gui: Confirms NATS is the right transport for this ecosystem
Acceptance Criteria:
- `opengait/demo/output.py` exists with `ConsolePublisher`, `NatsPublisher`, `create_publisher`
- ConsolePublisher prints valid JSON to stdout
- NatsPublisher connects and publishes without crashing (when NATS available)
- NatsPublisher logs a warning and doesn't crash when NATS unavailable
QA Scenarios:
Scenario: ConsolePublisher outputs valid JSONL
Tool: Bash
Steps:
1. Run `uv run python -c` with:
```python
import json, io, sys
from opengait.demo.output import create_publisher
pub = create_publisher(nats_url=None)
result = {'frame': 100, 'track_id': 1, 'label': 'positive', 'confidence': 0.85, 'window': 30, 'timestamp_ns': 0}
pub.publish(result)  # should print to stdout
print('CONSOLE_PUB_OK')
```
Expected Result: Prints a JSON line followed by 'CONSOLE_PUB_OK'
Failure Indicators: Invalid JSON, missing fields, crash
Evidence: .sisyphus/evidence/task-6-console-pub.txt

Scenario: NatsPublisher handles missing server gracefully
Tool: Bash
Steps:
1. Run `uv run python -c` with:
```python
from opengait.demo.output import create_publisher
try:
    pub = create_publisher(nats_url='nats://127.0.0.1:14222')  # wrong port, no server
    pub.publish({'frame': 0, 'label': 'test'})
except SystemExit:
    print('SHOULD_NOT_EXIT')
    raise
print('NATS_GRACEFUL_OK')
```
Expected Result: Prints 'NATS_GRACEFUL_OK' (warning logged, no crash)
Failure Indicators: Unhandled exception, SystemExit, hang
Evidence: .sisyphus/evidence/task-6-nats-graceful.txt

Commit: YES
- Message: `feat(demo): add NATS JSON publisher and console fallback`
- Files: `opengait/demo/output.py`
- Pre-commit: `uv run python -c "from opengait.demo.output import create_publisher"`
7. Unit Tests — Silhouette Preprocessing
What to do:
- Create `tests/demo/test_preprocess.py`
- Test `mask_to_silhouette()` with:
  - Valid mask (person-shaped blob) → output shape (64, 44), dtype float32, range [0, 1]
  - Tiny mask below MIN_MASK_AREA → returns None
  - Empty mask (all zeros) → returns None
  - Full-frame mask (all 255) → produces valid output (edge case: very wide person)
  - Tall narrow mask → verify aspect ratio handling (resize h=64, center-crop width)
  - Wide short mask → verify handling (should still produce 64×44)
- Test determinism: same input always produces same output
- Test against a reference `.pkl` sample if available:
  - Load a known `.pkl` file from Scoliosis1K
  - Extract one frame
  - Compare our preprocessing output to the stored frame (should be close/identical)
- Verify jaxtyping annotations are present and beartype checks fire on wrong shapes
Must NOT do:
- Don't test YOLO integration here — only test the `mask_to_silhouette` function in isolation
- Don't require GPU — all preprocessing is CPU numpy ops
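The contract these tests pin down can be sketched numpy-only. The real module would likely use `cv2.resize`; the `MIN_MASK_AREA` value, the nearest-neighbour resize, and recomputing the crop from the mask (ignoring `bbox`) are all illustrative assumptions here:

```python
from typing import Optional
import numpy as np

SIL_HEIGHT, SIL_WIDTH = 64, 44
MIN_MASK_AREA = 50  # assumption: illustrative threshold value

def mask_to_silhouette(mask: np.ndarray, bbox) -> Optional[np.ndarray]:
    """Crop the mask to its nonzero extent, resize to height 64,
    centre-crop or zero-pad width to 44, return float32 in [0, 1]."""
    ys, xs = np.nonzero(mask)
    if ys.size < MIN_MASK_AREA:
        return None  # tiny or empty mask
    crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    new_w = max(1, round(w * SIL_HEIGHT / h))
    # nearest-neighbour resize via index sampling (cv2-free for the sketch)
    rows = (np.arange(SIL_HEIGHT) * h / SIL_HEIGHT).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = crop[rows][:, cols]
    if new_w >= SIL_WIDTH:
        x0 = (new_w - SIL_WIDTH) // 2
        out = resized[:, x0:x0 + SIL_WIDTH]  # centre-crop width
    else:
        out = np.zeros((SIL_HEIGHT, SIL_WIDTH), dtype=resized.dtype)
        x0 = (SIL_WIDTH - new_w) // 2
        out[:, x0:x0 + new_w] = resized  # centre-pad width
    return (out > 0).astype(np.float32)
```

Being a pure function of the mask, determinism (same input, same output) falls out for free, which is exactly what the determinism test asserts.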
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Must verify pixel-level correctness against training data contract, multiple edge cases
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 2 (with Tasks 5, 6, 8)
- Blocks: None (verification task)
- Blocked By: Task 3 (preprocess module must exist)
References:
Pattern References:
- `opengait/demo/preprocess.py` (Task 3) — The module under test
- `datasets/pretreatment.py:18-96` — Reference preprocessing to validate against
- `opengait/data/transform.py:46-58` — `BaseSilCuttingTransform` for expected output contract
WHY Each Reference Matters:
- `preprocess.py`: Direct test target
- `pretreatment.py`: Ground truth for what a correct silhouette looks like
- `BaseSilCuttingTransform`: Defines the 64→44 cut + /255 contract we must match
Acceptance Criteria:
- `tests/demo/test_preprocess.py` exists with ≥5 test cases
- `uv run pytest tests/demo/test_preprocess.py -q` passes
- Tests cover: valid mask, tiny mask, empty mask, determinism
QA Scenarios:
Scenario: All preprocessing tests pass
Tool: Bash
Preconditions: Task 3 (preprocess.py) is complete
Steps:
1. Run `uv run pytest tests/demo/test_preprocess.py -v`
Expected Result: All tests pass (≥5 tests), exit code 0
Failure Indicators: Any assertion failure, import error
Evidence: .sisyphus/evidence/task-7-preprocess-tests.txt

Scenario: Jaxtyping annotation enforcement works
Tool: Bash
Steps:
1. Run `uv run python -c` with:
```python
import numpy as np
from opengait.demo.preprocess import mask_to_silhouette
# Intentionally wrong type to verify beartype catches it
try:
    mask_to_silhouette('not_an_array', (0, 0, 10, 10))
    print('BEARTYPE_MISSED')  # should not reach here
except Exception as e:
    if 'beartype' in type(e).__module__ or 'BeartypeCallHint' in type(e).__name__:
        print('BEARTYPE_OK')
    else:
        print(f'WRONG_ERROR: {type(e).__name__}: {e}')
```
Expected Result: Prints 'BEARTYPE_OK'
Failure Indicators: Prints 'BEARTYPE_MISSED' or 'WRONG_ERROR'
Evidence: .sisyphus/evidence/task-7-beartype-check.txt

Commit: YES (groups with Task 8)
- Message: `test(demo): add preprocessing and model unit tests`
- Files: `tests/demo/test_preprocess.py`
- Pre-commit: `uv run pytest tests/demo/test_preprocess.py -q`
8. Unit Tests — ScoNetDemo Forward Pass
What to do:
- Create `tests/demo/test_sconet_demo.py`
- Test `ScoNetDemo` construction:
  - Loads config from YAML
  - Loads checkpoint weights
  - Model is in eval mode
- Test `forward()` with dummy tensor:
  - Input: `torch.rand(1, 1, 30, 64, 44)` on available device
  - Output logits shape: `(1, 3, 16)`
  - Output dtype: float32
- Test `predict()` convenience method:
  - Returns `(label_str, confidence_float)`
  - `label_str` is one of `{'negative', 'neutral', 'positive'}`
  - `confidence` is in `[0.0, 1.0]`
- Test with various batch sizes: N=1, N=2
- Test with various sequence lengths if model supports it (should work with 30)
- Verify no `torch.distributed` calls are made (mock `torch.distributed` to raise if called)
- Verify jaxtyping shape annotations on forward/predict signatures
Must NOT do:
- Don't test with real video data — dummy tensors only for unit tests
- Don't modify the checkpoint
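One way to implement the "mock `torch.distributed` to raise if called" check is a generic attribute-forbidding helper; it is demonstrated here on a stand-in module so the sketch runs without torch, and the attribute list to forbid (`all_gather`, `get_rank`, etc.) is an assumption:

```python
import sys
import types
from unittest import mock

def forbid(module_name: str, attrs):
    """Context manager: replace the named attrs of an already-imported
    module with stubs that raise, so any call fails the test loudly.
    In the real suite this targets 'torch.distributed'."""
    module = sys.modules[module_name]
    patches = {
        name: mock.MagicMock(side_effect=AssertionError(f"{module_name}.{name} called"))
        for name in attrs
    }
    return mock.patch.multiple(module, **patches)
```

Usage in a test would be `with forbid("torch.distributed", ["all_gather", "get_rank"]): model.predict(x)`; any DDP call inside the block turns into an assertion failure.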
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Requires CUDA, checkpoint loading, shape validation — moderate complexity
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 2 (with Tasks 5, 6, 7)
- Blocks: None (verification task)
- Blocked By: Task 2 (ScoNetDemo must exist)
References:
Pattern References:
- `opengait/demo/sconet_demo.py` (Task 1) — The module under test
- `opengait/evaluation/evaluator.py:evaluate_scoliosis()` (line ~418) — Canonical prediction logic to validate against
Config/Checkpoint References:
- `configs/sconet/sconet_scoliosis1k.yaml` — Config file to pass to ScoNetDemo
- `./ckpt/ScoNet-20000.pt` — Trained checkpoint
WHY Each Reference Matters:
- `sconet_demo.py`: Direct test target
- `evaluator.py`: Defines expected prediction behavior (argmax of mean logits)
Acceptance Criteria:
- `tests/demo/test_sconet_demo.py` exists with ≥4 test cases
- `uv run pytest tests/demo/test_sconet_demo.py -q` passes
- Tests cover: construction, forward shape, predict output, no-DDP enforcement
QA Scenarios:
Scenario: All ScoNetDemo tests pass
Tool: Bash
Preconditions: Task 1 (sconet_demo.py) is complete, checkpoint exists, CUDA available
Steps:
1. Run `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_sconet_demo.py -v`
Expected Result: All tests pass (≥4 tests), exit code 0
Failure Indicators: state_dict key mismatch, shape error, CUDA OOM
Evidence: .sisyphus/evidence/task-8-sconet-tests.txt

Scenario: No DDP leakage in ScoNetDemo
Tool: Bash
Steps:
1. Run `grep -rn 'torch.distributed' opengait/demo/sconet_demo.py`
2. Run `grep -rn 'ddp_all_gather\|get_ddp_module\|get_rank' opengait/demo/sconet_demo.py`
Expected Result: Both commands produce no output (exit code 1 = no matches)
Failure Indicators: Any match found
Evidence: .sisyphus/evidence/task-8-no-ddp.txt

Commit: YES (groups with Task 7)
- Message: `test(demo): add preprocessing and model unit tests`
- Files: `tests/demo/test_sconet_demo.py`
- Pre-commit: `uv run pytest tests/demo/test_sconet_demo.py -q`
9. Main Pipeline Application + CLI
What to do:
- Create `opengait/demo/pipeline.py` — the main orchestrator
- Create `opengait/demo/__main__.py` — CLI entry point (replace stub from Task 4)
- Pipeline class `ScoliosisPipeline`:
  - Constructor: `__init__(self, source: FrameStream, model: ScoNetDemo, publisher: ResultPublisher, window: SilhouetteWindow, yolo_model_path: str = 'yolo11n-seg.pt', device: str = 'cuda:0')`
  - Uses jaxtyping annotations for all tensor-bearing methods: `from jaxtyping import Float, UInt8, jaxtyped`, `from beartype import beartype`, `from torch import Tensor`, `import numpy as np`, `from numpy import ndarray`
  - `run() -> None` — main loop:
    - Load YOLO model: `ultralytics.YOLO(yolo_model_path)`
    - For each `(frame, meta)` from source:
      a. Run `yolo_model.track(frame, persist=True, verbose=False)` → results
      b. `select_person(results)` → `(mask, bbox, track_id)` or None → skip if None
      c. `mask_to_silhouette(mask, bbox)` → `sil` or None → skip if None
      d. `window.push(sil, meta['frame_count'], track_id)`
      e. If `window.should_classify()`:
         - `tensor = window.get_tensor(device=self.device)`
         - `label, confidence = self.model.predict(tensor)`
         - `publisher.publish({...})` with JSON schema fields
         - `window.mark_classified()`
    - Log FPS every 100 frames
    - Cleanup on exit (close publisher, release resources)
  - Graceful shutdown on KeyboardInterrupt / SIGTERM
- CLI via `__main__.py` using `click`:
  - `--source` (required): video path, camera index, or `cvmmap://name`
  - `--checkpoint` (required): path to ScoNet checkpoint
  - `--config` (default: `./configs/sconet/sconet_scoliosis1k.yaml`): ScoNet config YAML
  - `--device` (default: `cuda:0`): torch device
  - `--yolo-model` (default: `yolo11n-seg.pt`): YOLO model path (auto-downloads)
  - `--window` (default: 30): sliding window size
  - `--stride` (default: 30): classify every N frames after window is full
  - `--nats-url` (default: None): NATS server URL, None = console output
  - `--nats-subject` (default: `scoliosis.result`): NATS subject
  - `--max-frames` (default: None): stop after N frames
  - `--help`: print usage
- Entrypoint: `uv run python -m opengait.demo ...`
Must NOT do:
- No async in the main loop — synchronous pull-process-publish
- No multi-threading for inference — single-threaded pipeline
- No GUI / frame display / cv2.imshow
- No unbounded accumulation — ring buffer handles memory
- No auto-download of ScoNet checkpoint — user must provide path
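The steps a–e above can be sketched with duck-typed collaborators so the skeleton runs without YOLO or torch; `detect` here stands in for the `model.track` + `select_person` pair, and the FPS log is reduced to a print for brevity (the real code would use logging):

```python
import time

class ScoliosisPipeline:
    """Skeleton of the synchronous pull-process-publish loop (Task 9).
    Collaborators are duck-typed; the real class holds a YOLO model and
    calls model.track() per frame."""

    def __init__(self, source, detect, preprocess, window, model, publisher):
        self.source, self.detect = source, detect
        self.preprocess, self.window = preprocess, window
        self.model, self.publisher = model, publisher

    def run(self) -> None:
        n = 0
        t0 = time.monotonic()
        try:
            for frame, meta in self.source:
                n += 1
                det = self.detect(frame)           # (mask, bbox, track_id) or None
                if det is None:
                    continue  # no valid person this frame
                mask, bbox, track_id = det
                sil = self.preprocess(mask, bbox)  # (64, 44) float32 or None
                if sil is None:
                    continue  # mask too small to use
                self.window.push(sil, meta["frame_count"], track_id)
                if self.window.should_classify():
                    label, conf = self.model.predict(self.window.get_tensor())
                    self.publisher.publish({
                        "frame": meta["frame_count"], "track_id": track_id,
                        "label": label, "confidence": conf,
                        "window": self.window.window_size,
                        "timestamp_ns": time.time_ns(),
                    })
                    self.window.mark_classified()
                if n % 100 == 0:  # real code: logging, not print
                    print(f"processed {n} frames, {n / (time.monotonic() - t0):.1f} FPS")
        finally:
            self.publisher.close()  # cleanup runs even on interrupt
```

The `try/finally` gives the graceful-shutdown guarantee: `KeyboardInterrupt` propagates, but the publisher is always closed.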
Recommended Agent Profile:
- Category: `deep`
- Reason: Integration of all components, error handling, CLI design, FPS logging — largest single task
- Skills: []
Parallelization:
- Can Run In Parallel: NO
- Parallel Group: Wave 3 (sequential — depends on most Wave 1+2 tasks)
- Blocks: Tasks 12, 13
- Blocked By: Tasks 2, 3, 4, 5, 6 (all components must exist)
References:
Pattern References:
- `opengait/demo/sconet_demo.py` (Task 1) — `ScoNetDemo` class, `predict()` method
- `opengait/demo/preprocess.py` (Task 3) — `mask_to_silhouette()`, `frame_to_person_mask()`
- `opengait/demo/window.py` (Task 5) — `SilhouetteWindow`, `select_person()`
- `opengait/demo/input.py` (Task 2) — `create_source()`, `FrameStream` type alias
- `opengait/demo/output.py` (Task 6) — `create_publisher()`, `ResultPublisher`
External References:
- Ultralytics tracking API: `model.track(frame, persist=True)` — returns `Results` list
- Ultralytics result object: `results[0].masks.data`, `results[0].boxes.xyxy`, `results[0].boxes.id`
WHY Each Reference Matters:
- All Task refs: This task composes every component — must know each API surface
- Ultralytics: The YOLO `.track()` call is the only external API used directly in this file
Acceptance Criteria:
- `opengait/demo/pipeline.py` exists with `ScoliosisPipeline` class
- `opengait/demo/__main__.py` exists with click CLI
- `uv run python -m opengait.demo --help` prints usage without errors
- All public methods have jaxtyping annotations where tensor/array args are involved
QA Scenarios:
Scenario: CLI --help works
Tool: Bash
Steps:
1. Run `uv run python -m opengait.demo --help`
Expected Result: Prints usage showing --source, --checkpoint, --device, --window, --stride, --nats-url, --max-frames
Failure Indicators: ImportError, missing arguments, crash
Evidence: .sisyphus/evidence/task-9-help.txt

Scenario: Pipeline runs with sample video (no NATS)
Tool: Bash
Preconditions: sample.mp4 exists (from Task 11), checkpoint exists, CUDA available
Steps:
1. Run `CUDA_VISIBLE_DEVICES=0 uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --device cuda:0 --max-frames 120 2>&1 | tee /tmp/pipeline_output.txt`
2. Count prediction lines: `grep -c '"label"' /tmp/pipeline_output.txt`
Expected Result: Exit code 0, at least 1 prediction line with valid JSON containing "label" field
Failure Indicators: Crash, no predictions, invalid JSON, CUDA error
Evidence: .sisyphus/evidence/task-9-pipeline-run.txt

Scenario: Pipeline handles missing video gracefully
Tool: Bash
Steps:
1. Run `uv run python -m opengait.demo --source /nonexistent/video.mp4 --checkpoint ./ckpt/ScoNet-20000.pt 2>&1; echo "EXIT_CODE=$?"`
Expected Result: Prints error message about missing file, exits with non-zero code (no traceback dump)
Failure Indicators: Unhandled exception with full traceback, exit code 0
Evidence: .sisyphus/evidence/task-9-missing-video.txt

Commit: YES
- Message: `feat(demo): add main pipeline application with CLI entry point`
- Files: `opengait/demo/pipeline.py`, `opengait/demo/__main__.py`
- Pre-commit: `uv run python -m opengait.demo --help`
10. Unit Tests — Single-Person Policy + Window Reset
What to do:
- Create `tests/demo/test_window.py`
- Test `SilhouetteWindow`:
  - Fill to capacity → `is_ready()` returns True
  - Underfilled → `is_ready()` returns False
  - Track ID change resets buffer
  - Frame gap exceeding threshold resets buffer
  - `get_tensor()` returns correct shape `[1, 1, window_size, 64, 44]`
  - `should_classify()` respects stride
- Test `select_person()`:
  - Single detection → returns it
  - Multiple detections → returns largest bbox area
  - No detections → returns None
  - Detections without track IDs (tracker not initialized) → returns None
- Use mock YOLO results (don't require actual YOLO model)
Must NOT do:
- Don't require GPU — window tests are CPU-only (get_tensor can use cpu device)
- Don't require YOLO model file — mock the results
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Multiple edge cases to cover, mock YOLO results require understanding ultralytics API
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 3 (with Tasks 9, 11)
- Blocks: None (verification task)
- Blocked By: Task 5 (window module must exist)
References:
Pattern References:
- `opengait/demo/window.py` (Task 5) — Module under test
WHY Each Reference Matters:
- Direct test target
Acceptance Criteria:
- `tests/demo/test_window.py` exists with ≥6 test cases
- `uv run pytest tests/demo/test_window.py -q` passes
QA Scenarios:
Scenario: All window and single-person tests pass
Tool: Bash
Steps:
1. Run `uv run pytest tests/demo/test_window.py -v`
Expected Result: All tests pass (≥6 tests), exit code 0
Failure Indicators: Assertion failures, import errors
Evidence: .sisyphus/evidence/task-10-window-tests.txt

Commit: YES
- Message: `test(demo): add window manager and single-person policy tests`
- Files: `tests/demo/test_window.py`
- Pre-commit: `uv run pytest tests/demo/test_window.py -q`
11. Sample Video for Smoke Testing
What to do:
- Acquire or create a short sample video for pipeline smoke testing
- Options (in order of preference):
  - Extract a short clip (5-10 seconds, ~120 frames at 15fps) from an existing Scoliosis1K raw video if accessible
  - Record a short clip using webcam via `cv2.VideoCapture(0)`
  - Generate a synthetic video with a person-shaped blob moving across frames
- Save to `./assets/sample.mp4` (or `./assets/sample.avi`)
- Requirements: contains at least one person walking, 720p or lower, ≥60 frames
- If no real video is available, create a synthetic one:
  - 120 frames, 640×480, 15fps
  - White rectangle (simulating person silhouette) moving across dark background
  - This won't test YOLO detection quality but will verify pipeline doesn't crash
- Add `assets/sample.mp4` to `.gitignore` if it's large (>10MB)
Must NOT do:
- Don't use any Scoliosis1K dataset files that are symlinked (user constraint)
- Don't commit large video files to git
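If the synthetic fallback is needed, a sketch along the lines described above (the exact rectangle geometry and drift speed are assumptions; `cv2` is only imported when actually writing the file):

```python
import numpy as np

W, H, N_FRAMES, FPS = 640, 480, 120, 15

def synth_frame(i: int) -> np.ndarray:
    """Dark background with a white person-sized rectangle drifting right."""
    frame = np.zeros((H, W, 3), dtype=np.uint8)
    x = 40 + (i * 4) % (W - 160)   # walk across, wrap before the edge
    frame[140:420, x:x + 80] = 255  # ~80x280 px "person" blob
    return frame

if __name__ == "__main__":
    import cv2  # deferred: only needed to encode the file
    out = cv2.VideoWriter("./assets/sample.mp4",
                          cv2.VideoWriter_fourcc(*"mp4v"), FPS, (W, H))
    for i in range(N_FRAMES):
        out.write(synth_frame(i))
    out.release()
```

As the plan notes, this exercises the pipeline plumbing only; YOLO will not necessarily detect a plain rectangle as a person.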
Recommended Agent Profile:
- Category: `quick`
- Reason: Simple file creation/acquisition task
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 3 (with Tasks 9, 10)
- Blocks: Task 12
- Blocked By: Task 1 (needs OpenCV dependency from scaffolding)
References: None needed — standalone task
Acceptance Criteria:
- `./assets/sample.mp4` (or `.avi`) exists
- Video has ≥60 frames
- Playable with `uv run python -c "import cv2; cap=cv2.VideoCapture('./assets/sample.mp4'); print(f'frames={int(cap.get(7))}'); cap.release()"`
QA Scenarios:
Scenario: Sample video is valid
Tool: Bash
Steps:
1. Run `uv run python -c` with:
```python
import cv2
cap = cv2.VideoCapture('./assets/sample.mp4')
assert cap.isOpened(), 'Cannot open video'
n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
assert n >= 60, f'Too few frames: {n}'
ret, frame = cap.read()
assert ret and frame is not None, 'Cannot read first frame'
h, w = frame.shape[:2]
assert h >= 240 and w >= 320, f'Too small: {w}x{h}'
cap.release()
print(f'SAMPLE_VIDEO_OK frames={n} size={w}x{h}')
```
Expected Result: Prints 'SAMPLE_VIDEO_OK' with frame count ≥60
Failure Indicators: Cannot open, too few frames, too small
Evidence: .sisyphus/evidence/task-11-sample-video.txt

Commit: YES
- Message: `chore(demo): add sample video for smoke testing`
- Files: `assets/sample.mp4` (or add to .gitignore and document)
- Pre-commit: none
12. Integration Tests — End-to-End Smoke Test
What to do:
- Create `tests/demo/test_pipeline.py`
- Integration test: run the full pipeline with sample video, no NATS
  - Uses `subprocess.run()` to invoke `python -m opengait.demo`
  - Captures stdout, parses JSON predictions
  - Asserts: exit code 0, ≥1 prediction, valid JSON schema
- Test graceful exit on end-of-video
- Test `--max-frames` flag: run with max_frames=60, verify it stops
- Test error handling: invalid source path → non-zero exit, error message
- Test error handling: invalid checkpoint path → non-zero exit, error message
- FPS benchmark (informational, not a hard assertion):
  - Run pipeline on sample video, measure wall time, compute FPS
  - Log FPS to evidence file (target: ≥15 FPS on desktop GPU)
Must NOT do:
- Don't require NATS server for this test — use console publisher
- Don't hardcode CUDA device — use `--device cuda:0` only if CUDA available, else skip
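A helper the integration test might use to pull schema-valid predictions out of captured stdout; the required field set mirrors the Task 6 JSON schema, and treating non-`{`-prefixed lines as ignorable logs is an assumption about the pipeline's output:

```python
import json

REQUIRED = {"frame", "track_id", "label", "confidence", "window", "timestamp_ns"}
LABELS = {"negative", "neutral", "positive"}

def parse_predictions(stdout: str) -> list:
    """Extract and schema-check JSONL prediction lines from pipeline
    stdout, skipping log lines that are not JSON objects."""
    preds = []
    for line in stdout.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # FPS logs etc.
        obj = json.loads(line)
        assert REQUIRED <= obj.keys(), f"missing fields: {REQUIRED - obj.keys()}"
        assert obj["label"] in LABELS, f"bad label: {obj['label']}"
        assert 0.0 <= obj["confidence"] <= 1.0
        preds.append(obj)
    return preds
```

The happy-path test then reduces to `assert len(parse_predictions(proc.stdout)) >= 1` after `subprocess.run(...)`.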
Recommended Agent Profile:
- Category: `deep`
- Reason: Full integration test requiring all components working together, subprocess management, JSON parsing
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 4 (with Task 13)
- Blocks: F1-F4 (Final verification)
- Blocked By: Tasks 9 (pipeline), 11 (sample video)
References:
Pattern References:
- `opengait/demo/__main__.py` (Task 9) — CLI flags to invoke
- `opengait/demo/output.py` (Task 6) — JSON schema to validate
WHY Each Reference Matters:
- `__main__.py`: Need exact CLI flag names for subprocess invocation
- `output.py`: Need JSON schema to assert against
Acceptance Criteria:
- `tests/demo/test_pipeline.py` exists with ≥4 test cases
- `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_pipeline.py -q` passes
- Tests cover: happy path, max-frames, invalid source, invalid checkpoint
QA Scenarios:
Scenario: Full pipeline integration test passes
Tool: Bash
Preconditions: All components built, sample video exists, CUDA available
Steps:
1. Run `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_pipeline.py -v --timeout=120`
Expected Result: All tests pass (≥4), exit code 0
Failure Indicators: Subprocess crash, JSON parse error, timeout
Evidence: .sisyphus/evidence/task-12-integration.txt

Scenario: FPS benchmark
Tool: Bash
Steps:
1. Run `CUDA_VISIBLE_DEVICES=0 uv run python -c` with:
```python
import subprocess, time
start = time.monotonic()
result = subprocess.run(
    ['uv', 'run', 'python', '-m', 'opengait.demo',
     '--source', './assets/sample.mp4',
     '--checkpoint', './ckpt/ScoNet-20000.pt',
     '--device', 'cuda:0', '--nats-url', ''],
    capture_output=True, text=True, timeout=120)
elapsed = time.monotonic() - start
import cv2
cap = cv2.VideoCapture('./assets/sample.mp4')
n_frames = int(cap.get(7)); cap.release()
fps = n_frames / elapsed if elapsed > 0 else 0
print(f'FPS_BENCHMARK frames={n_frames} elapsed={elapsed:.1f}s fps={fps:.1f}')
assert fps >= 5, f'FPS too low: {fps}'  # conservative threshold
```
Expected Result: Prints FPS benchmark, ≥5 FPS (conservative)
Failure Indicators: Timeout, crash, FPS < 5
Evidence: .sisyphus/evidence/task-12-fps-benchmark.txt

Commit: YES
- Message: `test(demo): add integration and end-to-end smoke tests`
- Files: `tests/demo/test_pipeline.py`
- Pre-commit: `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_pipeline.py -q`
13. NATS Integration Test
What to do:
- Create `tests/demo/test_nats.py`
- Test requires NATS server (use Docker: `docker run -d --rm --name nats-test -p 4222:4222 nats:2`)
- Mark tests with `@pytest.mark.skipif` if Docker/NATS not available
- Test flow:
  - Start NATS container
  - Start a `nats-py` subscriber on `scoliosis.result`
  - Run pipeline with `--nats-url nats://127.0.0.1:4222 --max-frames 60`
  - Collect received messages
  - Assert: ≥1 message received, valid JSON, correct schema
  - Stop NATS container
- Test NatsPublisher reconnection: start pipeline, stop NATS, restart NATS — pipeline should recover
- JSON schema validation:
  - `frame`: int
  - `track_id`: int
  - `label`: str in {"negative", "neutral", "positive"}
  - `confidence`: float in [0, 1]
  - `window`: int (should equal window_size)
  - `timestamp_ns`: int
Must NOT do:
- Don't leave Docker containers running after test
- Don't hardcode NATS port — use a fixture that finds an open port
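The skip-if-no-Docker and no-hardcoded-port requirements can be sketched as two small helpers the test module would wrap in fixtures; using `docker info` as the daemon liveness probe is an assumption:

```python
import shutil
import socket
import subprocess

def docker_available() -> bool:
    """True only if the docker CLI exists and the daemon answers."""
    if shutil.which("docker") is None:
        return False
    try:
        return subprocess.run(["docker", "info"], capture_output=True,
                              timeout=5).returncode == 0
    except Exception:
        return False

def free_port() -> int:
    """Ask the OS for an ephemeral port instead of hardcoding 4222."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

# In the test module, roughly:
# pytestmark = pytest.mark.skipif(not docker_available(), reason="docker unavailable")
# port = free_port()  # then: docker run ... -p {port}:4222 nats:2
```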
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Docker orchestration + async NATS subscriber + subprocess pipeline — moderate complexity
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 4 (with Task 12)
- Blocks: F1-F4 (Final verification)
- Blocked By: Tasks 9 (pipeline), 6 (NATS publisher)
References:
Pattern References:
- `opengait/demo/output.py` (Task 6) — `NatsPublisher` class, JSON schema
External References:
- nats-py subscriber: `sub = await nc.subscribe('scoliosis.result'); msg = await sub.next_msg(timeout=10)`
- Docker NATS: `docker run -d --rm --name nats-test -p 4222:4222 nats:2`
WHY Each Reference Matters:
- `output.py`: Need to match the exact subject and JSON schema the publisher produces
- nats-py: Need subscriber API to consume and validate messages
Acceptance Criteria:
- `tests/demo/test_nats.py` exists with ≥2 test cases
- Tests are skippable when Docker/NATS not available
- `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_nats.py -q` passes (when Docker available)
QA Scenarios:
Scenario: NATS receives valid prediction JSON
Tool: Bash
Preconditions: Docker available, CUDA available, sample video exists
Steps:
1. Run `docker run -d --rm --name nats-test -p 4222:4222 nats:2`
2. Run `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_nats.py -v --timeout=60`
3. Run `docker stop nats-test`
Expected Result: Tests pass, at least one valid JSON message received on scoliosis.result
Failure Indicators: No messages, invalid JSON, schema mismatch, timeout
Evidence: .sisyphus/evidence/task-13-nats-integration.txt

Scenario: NATS test is skipped when Docker unavailable
Tool: Bash
Preconditions: Docker NOT running or not installed
Steps:
1. Run `uv run pytest tests/demo/test_nats.py -v -k 'nats' 2>&1 | head -20`
Expected Result: Tests show as SKIPPED (not FAILED)
Failure Indicators: Test fails instead of skipping
Evidence: .sisyphus/evidence/task-13-nats-skip.txt

Commit: YES
- Message: `test(demo): add NATS integration tests`
- Files: `tests/demo/test_nats.py`
- Pre-commit: `uv run pytest tests/demo/test_nats.py -q` (skips if no Docker)
Final Verification Wave (MANDATORY — after ALL implementation tasks)
4 review agents run in PARALLEL. ALL must APPROVE. Rejection → fix → re-run.
- F1. Plan Compliance Audit — `oracle`
  Read the plan end-to-end. For each "Must Have": verify implementation exists (read file, run command). For each "Must NOT Have": search codebase for forbidden patterns (torch.distributed imports in demo/, BaseModel subclassing). Check evidence files exist in .sisyphus/evidence/. Compare deliverables against plan. Output: `Must Have [N/N] | Must NOT Have [N/N] | Tasks [N/N] | VERDICT: APPROVE/REJECT`
- F2. Code Quality Review — `unspecified-high`
  Run linter + `uv run pytest tests/demo/ -q`. Review all new files in `opengait/demo/` for: `as any`/`type:ignore`, empty catches, print statements used instead of logging, commented-out code, unused imports. Check AI slop: excessive comments, over-abstraction, generic variable names. Output: `Tests [N pass/N fail] | Files [N clean/N issues] | VERDICT`
- F3. Real Manual QA — `unspecified-high`
  Start from clean state. Run pipeline with sample video: `uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --max-frames 120`. Verify predictions are printed to console (no `--nats-url` = console output). Run with NATS: start container, run pipeline with `--nats-url nats://127.0.0.1:4222`, subscribe and validate JSON schema. Test edge cases: missing video file (graceful error), no checkpoint (graceful error), --help flag. Output: `Scenarios [N/N pass] | Edge Cases [N tested] | VERDICT`
- F4. Scope Fidelity Check — `deep`
  For each task: read "What to do", read actual files created. Verify 1:1 — everything in spec was built (no missing), nothing beyond spec was built (no creep). Check "Must NOT do" compliance: no torch.distributed in demo/, no BaseModel subclass, no TensorRT code, no multi-person logic. Flag unaccounted changes. Output: `Tasks [N/N compliant] | Scope [CLEAN/N issues] | VERDICT`
Commit Strategy
- Wave 1: `feat(demo): add ScoNetDemo inference wrapper` — sconet_demo.py
- Wave 1: `feat(demo): add input adapters and silhouette preprocessing` — input.py, preprocess.py
- Wave 1: `chore(demo): scaffold demo package and test infrastructure` — __init__.py, conftest, pyproject.toml
- Wave 2: `feat(demo): add sliding window manager and NATS publisher` — window.py, output.py
- Wave 2: `test(demo): add preprocessing and model unit tests` — test_preprocess.py, test_sconet_demo.py
- Wave 3: `feat(demo): add main pipeline application with CLI` — pipeline.py, __main__.py
- Wave 3: `test(demo): add window manager and single-person policy tests` — test_window.py
- Wave 4: `test(demo): add integration and NATS tests` — test_pipeline.py, test_nats.py
Success Criteria
Verification Commands
# Smoke test (no NATS)
uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --max-frames 120
# Expected: exits 0, prints ≥3 prediction lines with label in {negative, neutral, positive}
# Unit tests
uv run pytest tests/demo/ -q
# Expected: all tests pass
# Help flag
uv run python -m opengait.demo --help
# Expected: prints usage with --source, --checkpoint, --device, --window, --stride, --nats-url, --max-frames
Final Checklist
- All "Must Have" present
- All "Must NOT Have" absent
- All tests pass
- Pipeline runs at ≥15 FPS on desktop GPU
- JSON schema matches spec
- No torch.distributed imports in opengait/demo/