Persist orchestration artifacts, including plan definition, progress state, decisions, issues, and learnings gathered during delegated execution and QA gates. This preserves implementation rationale and auditability without coupling documentation snapshots to runtime logic commits.
Real-Time Scoliosis Screening Pipeline (ScoNet)
TL;DR
Quick Summary: Build a production-oriented, real-time scoliosis screening pipeline inside the OpenGait repo. The pipeline reads video from cv-mmap shared memory or OpenCV, runs YOLO11n-seg for person detection/segmentation/tracking, extracts 64×44 binary silhouettes, feeds a sliding window of 30 frames into a DDP-free ScoNet wrapper, and publishes classification results (positive/neutral/negative) as JSON over NATS.
Deliverables:
- `ScoNetDemo` — standalone `nn.Module` wrapper for ScoNet inference (no DDP)
- Silhouette preprocessing module — deterministic mask→64×44 float tensor pipeline
- Ring buffer / sliding window manager — per-track frame accumulation with reset logic
- Input adapters — cv-mmap async client + OpenCV VideoCapture fallback
- NATS publisher — JSON result output
- Main pipeline application — orchestrates all components
- pytest test suite — preprocessing, windowing, single-person policy, recovery
- Sample video for smoke testing
Estimated Effort: Large
Parallel Execution: YES — 4 waves
Critical Path: Task 1 (ScoNetDemo) → Task 3 (Silhouette Preprocess) → Task 5 (Ring Buffer) → Task 7 (Pipeline App) → Task 10 (Integration Tests)
Context
Original Request
Build a camera-to-classification pipeline for scoliosis screening using OpenGait's ScoNet model, targeting Jetson AGX Orin deployment, with cv-mmap shared-memory video framework integration.
Interview Summary
Key Discussions:
- Input: Both camera (via cv-mmap) and video files (OpenCV fallback). Single person only.
- CV Stack: YOLO11n-seg with built-in ByteTrack (replaces paper's BYTETracker + PP-HumanSeg)
- Inference: Sliding window of 30 frames, continuous classification
- Output: JSON over NATS (decided over binary protocol — simpler, cross-language)
- DDP Bypass: Create `ScoNetDemo(nn.Module)` following All-in-One-Gait's `BaselineDemo` pattern
- Build Location: Inside repo (opengait lacks `__init__.py`, config system hardcodes paths)
- Test Strategy: pytest, tests after implementation
- Hardware: Dev on i7-14700KF + RTX 5070 Ti/3090; deploy on Jetson AGX Orin
Research Findings:
- ScoNet input: `[N, 1, S, 64, 44]` float32 [0,1]. Output: `logits [N, 3, 16]` → `argmax(mean(-1))` → class index
- `.pkl` preprocessing: grayscale → crop vertical → resize h=64 → center-crop w=64 → cut 10px sides → /255.0
- `BaseSilCuttingTransform`: cuts `int(W // 64) * 10` px each side + divides by 255
- All-in-One-Gait `BaselineDemo`: extends `nn.Module`, uses `torch.load()` + `load_state_dict()`, `training=False`
- YOLO11n-seg: 6 MB, ~50-60 FPS, `model.track(frame, persist=True)` → bbox + mask + track_id
- cv-mmap Python client: `async for im, meta in CvMmapClient("name")` — zero-copy numpy
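The output aggregation noted above (mean over the 16 horizontal parts, then argmax over the 3 classes) can be sketched as follows. `aggregate_prediction` and `LABELS` are illustrative names, not OpenGait API:

```python
import torch

# Class index -> label mapping, per the plan's research findings.
LABELS = {0: 'negative', 1: 'neutral', 2: 'positive'}

def aggregate_prediction(logits: torch.Tensor) -> tuple[str, float]:
    """logits: [N, 3, 16] = 3 classes x 16 horizontal parts.

    Mean over parts gives per-class scores; argmax picks the label;
    softmax over the pooled scores gives a confidence for that label.
    """
    per_class = logits.mean(dim=-1)           # [N, 3]
    probs = torch.softmax(per_class, dim=-1)  # [N, 3]
    idx = int(per_class.argmax(dim=-1)[0])
    return LABELS[idx], float(probs[0, idx])
```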
Metis Review
Identified Gaps (addressed):
- Single-person policy undefined → Defined: largest-bbox selection, ignore others, reset window on ID change
- Sliding window stride undefined → Defined: stride=1 (every frame updates buffer), classify every N frames (configurable)
- No-detection / empty mask handling → Defined: skip frame, don't reset window unless gap exceeds threshold
- Mask quality / partial body → Defined: minimum mask area threshold to accept frame
- Track ID reset / re-identification → Defined: reset ring buffer on track ID change
- YOLO letterboxing → Defined: use `result.masks.data` in original frame coords, not letterboxed
- Async/sync impedance → Defined: synchronous pull-process-publish loop (no async queues in MVP)
- Scope creep lockdown → Explicitly excluded: TensorRT, DeepStream, multi-person, GUI, recording, auto-tuning
Work Objectives
Core Objective
Create a self-contained scoliosis screening pipeline that runs standalone (no DDP, no OpenGait training infrastructure) while reusing ScoNet's trained weights and matching its exact preprocessing contract.
Prerequisites (already present in repo)
- Checkpoint: `./ckpt/ScoNet-20000.pt` — trained ScoNet weights (verified working, 80.88% accuracy on Scoliosis1K eval). Already exists in the repo, no download needed.
- Config: `./configs/sconet/sconet_scoliosis1k.yaml` — ScoNet architecture config. Already exists.
Concrete Deliverables
- `opengait/demo/sconet_demo.py` — ScoNetDemo nn.Module wrapper
- `opengait/demo/preprocess.py` — Silhouette extraction and normalization
- `opengait/demo/window.py` — Sliding window / ring buffer manager
- `opengait/demo/input.py` — Input adapters (cv-mmap + OpenCV)
- `opengait/demo/output.py` — NATS JSON publisher
- `opengait/demo/pipeline.py` — Main pipeline orchestrator
- `opengait/demo/__main__.py` — CLI entry point
- `tests/demo/test_preprocess.py` — Preprocessing unit tests
- `tests/demo/test_window.py` — Ring buffer + single-person policy tests
- `tests/demo/test_pipeline.py` — Integration / smoke tests
Definition of Done
- `uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --max-frames 120` exits 0 and prints predictions (no NATS by default when `--nats-url` not provided)
- `uv run pytest tests/demo/ -q` passes all tests
- Pipeline processes ≥15 FPS on desktop GPU with 720p input
- JSON schema validated:
{"frame": int, "track_id": int, "label": str, "confidence": float, "window": int, "timestamp_ns": int}
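A minimal sketch of assembling and serializing a message that matches this schema. The function name and the rounding choice are illustrative, not part of the plan:

```python
import json
import time

def make_result_message(frame: int, track_id: int, label: str,
                        confidence: float, window: int) -> bytes:
    """Build a NATS payload matching the plan's JSON schema."""
    payload = {
        "frame": frame,
        "track_id": track_id,
        "label": label,
        "confidence": round(confidence, 4),  # rounding is a sketch choice
        "window": window,
        "timestamp_ns": time.time_ns(),
    }
    return json.dumps(payload).encode("utf-8")
```

A subscriber in any language can then `json.loads` the bytes and rely on the fixed key set.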
Must Have
- Deterministic preprocessing matching ScoNet training data exactly (64×44, float32, [0,1])
- Single-person selection (largest bbox) with consistent tracking
- Sliding window of 30 frames with reset on track loss/ID change
- Graceful handling of: no detection, end of video, cv-mmap disconnect
- CLI with `--source`, `--checkpoint`, `--device`, `--window`, `--stride`, `--nats-url`, `--max-frames` flags (using `click`)
- Works without NATS server when `--nats-url` is omitted (console output fallback)
- All tensor/array function signatures annotated with `jaxtyping` types (e.g., `Float[Tensor, 'batch 1 seq 64 44']`) and checked at runtime with `beartype` via `@jaxtyped(typechecker=beartype)` decorators
- Generator-based input adapters — any `Iterable[tuple[np.ndarray, dict]]` works as a source
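The runtime-checked signature style required above can be sketched as below. The try/except fallback is an assumption (mirroring the plan's conditional-import approach for cv-mmap) so the sketch stays importable when the two packages are absent:

```python
import numpy as np

try:
    # With these installed, annotations like Float[np.ndarray, '64 44']
    # are enforced at call time and shape violations raise immediately.
    from jaxtyping import jaxtyped
    from beartype import beartype

    def typed(fn):
        return jaxtyped(typechecker=beartype)(fn)
except ImportError:
    def typed(fn):  # fallback so the sketch runs without the deps
        return fn

@typed
def normalize_sil(sil: np.ndarray) -> np.ndarray:
    """Scale a uint8 silhouette into float32 [0, 1]."""
    return sil.astype(np.float32) / 255.0
```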
Must NOT Have (Guardrails)
- No DDP: Demo must never import or call anything from `torch.distributed`
- No BaseModel subclassing: ScoNetDemo extends `nn.Module` directly
- No repo restructuring: Don't touch existing opengait training/eval/data code
- No TensorRT/DeepStream: Jetson acceleration is out of MVP scope
- No multi-person: Single tracked person only
- No GUI/visualization: Output is JSON, not rendered frames
- No dataset recording/auto-labeling: This is inference only
- No OpenCV GStreamer builds: Use pip-installed OpenCV
- No magic preprocessing: Every transform step must be explicit and testable
- No unbounded buffers: Every queue/buffer has a max size and drop policy
Verification Strategy
ZERO HUMAN INTERVENTION — ALL verification is agent-executed. No exceptions.
Test Decision
- Infrastructure exists: NO (creating with this plan)
- Automated tests: Tests after implementation (pytest)
- Framework: pytest (via `uv run pytest`)
- Setup: Add pytest to dev dependencies in pyproject.toml
QA Policy
Every task MUST include agent-executed QA scenarios.
Evidence saved to .sisyphus/evidence/task-{N}-{scenario-slug}.{ext}.
- CLI/Pipeline: Use Bash — run pipeline with sample video, validate output
- Unit Tests: Use Bash — `uv run pytest` specific test files
- NATS Integration: Use Bash — start NATS container, run pipeline, subscribe and validate JSON
Execution Strategy
Parallel Execution Waves
Wave 1 (Foundation — all independent, start immediately):
├── Task 1: Project scaffolding (opengait/demo/, tests/demo/, deps) [quick]
├── Task 2: ScoNetDemo nn.Module wrapper [deep]
├── Task 3: Silhouette preprocessing module [deep]
└── Task 4: Input adapters (cv-mmap + OpenCV) [unspecified-high]
Wave 2 (Core logic — depends on Wave 1 foundations):
├── Task 5: Ring buffer / sliding window manager (depends: 3) [unspecified-high]
├── Task 6: NATS JSON publisher (depends: 1) [quick]
├── Task 7: Unit tests — preprocessing (depends: 3) [unspecified-high]
└── Task 8: Unit tests — ScoNetDemo forward pass (depends: 2) [unspecified-high]
Wave 3 (Integration — combines all components):
├── Task 9: Main pipeline application + CLI (depends: 2,3,4,5,6) [deep]
├── Task 10: Single-person policy tests (depends: 5) [unspecified-high]
└── Task 11: Sample video acquisition (depends: 1) [quick]
Wave 4 (Verification — end-to-end):
├── Task 12: Integration tests + smoke test (depends: 9,11) [deep]
└── Task 13: NATS integration test (depends: 9,6) [unspecified-high]
Wave FINAL (Independent review — 4 parallel):
├── Task F1: Plan compliance audit (oracle)
├── Task F2: Code quality review (unspecified-high)
├── Task F3: Real manual QA (unspecified-high)
└── Task F4: Scope fidelity check (deep)
Critical Path: Task 1 → Task 2 → Task 3 → Task 5 → Task 9 → Task 12 → F1-F4
Parallel Speedup: ~60% faster than sequential
Max Concurrent: 4 (Waves 1 & 2)
Dependency Matrix
| Task | Depends On | Blocks | Wave |
|---|---|---|---|
| 1 | — | 6, 11 | 1 |
| 2 | — | 8, 9 | 1 |
| 3 | — | 5, 7, 9 | 1 |
| 4 | — | 9 | 1 |
| 5 | 3 | 9, 10 | 2 |
| 6 | 1 | 9, 13 | 2 |
| 7 | 3 | — | 2 |
| 8 | 2 | — | 2 |
| 9 | 2, 3, 4, 5, 6 | 12, 13 | 3 |
| 10 | 5 | — | 3 |
| 11 | 1 | 12 | 3 |
| 12 | 9, 11 | F1-F4 | 4 |
| 13 | 9, 6 | F1-F4 | 4 |
| F1-F4 | 12, 13 | — | FINAL |
Agent Dispatch Summary
- Wave 1: 4 — T1 → `quick`, T2 → `deep`, T3 → `deep`, T4 → `unspecified-high`
- Wave 2: 4 — T5 → `unspecified-high`, T6 → `quick`, T7 → `unspecified-high`, T8 → `unspecified-high`
- Wave 3: 3 — T9 → `deep`, T10 → `unspecified-high`, T11 → `quick`
- Wave 4: 2 — T12 → `deep`, T13 → `unspecified-high`
- FINAL: 4 — F1 → `oracle`, F2 → `unspecified-high`, F3 → `unspecified-high`, F4 → `deep`
TODOs
Implementation + Test = ONE Task. Never separate. EVERY task MUST have: Recommended Agent Profile + Parallelization info + QA Scenarios.
1. Project Scaffolding + Dependencies
What to do:
- Create `opengait/demo/__init__.py` (empty, makes it a package)
- Create `opengait/demo/__main__.py` (stub: `from .pipeline import main; main()`)
- Create `tests/demo/__init__.py` and `tests/__init__.py` if missing
- Create `tests/demo/conftest.py` with shared fixtures (sample tensor, mock frame)
- Add dev dependencies to `pyproject.toml`: `pytest`, `nats-py`, `ultralytics`, `jaxtyping`, `beartype`, `click`
- Verify: `uv sync --extra torch` succeeds with new deps
- Verify: `uv run python -c "import ultralytics; import nats; import jaxtyping; import beartype; import click"` works
Must NOT do:
- Don't modify existing opengait code or imports
- Don't add runtime deps that aren't needed (no flask, no fastapi, etc.)
Recommended Agent Profile:
- Category: `quick`
- Reason: Boilerplate file creation and dependency management, no complex logic
- Skills: []
- Skills Evaluated but Omitted: `explore` — Not needed; we know exactly what files to create
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 1 (with Tasks 2, 3, 4)
- Blocks: Tasks 6, 11
- Blocked By: None (can start immediately)
References:
Pattern References:
- `opengait/modeling/models/__init__.py` — Example of package init in this repo
- `pyproject.toml` — Current dependency structure; add to `[project.optional-dependencies]` or `[dependency-groups]`
External References:
- ultralytics pip package: `pip install ultralytics` (includes YOLO + ByteTrack)
- nats-py: `pip install nats-py` (async NATS client)
WHY Each Reference Matters:
- `pyproject.toml`: Must match existing dep management style (uv + groups) to avoid breaking `uv sync`
- `opengait/modeling/models/__init__.py`: Shows the repo's package init convention (dynamic imports vs empty)
Acceptance Criteria:
- `opengait/demo/__init__.py` exists
- `opengait/demo/__main__.py` exists with stub entry point
- `tests/demo/conftest.py` exists with at least one fixture
- `uv sync` succeeds without errors
- `uv run python -c "import ultralytics; import nats; import jaxtyping; import beartype; import click; print('OK')"` prints OK
QA Scenarios:
Scenario: Dependencies install correctly
  Tool: Bash
  Preconditions: Clean uv environment
  Steps:
    1. Run `uv sync --extra torch`
    2. Run `uv run python -c "import ultralytics; import nats; import pytest; import jaxtyping; import beartype; import click; print('ALL_DEPS_OK')"`
  Expected Result: Both commands exit 0, second prints 'ALL_DEPS_OK'
  Failure Indicators: ImportError, uv sync failure, missing package
  Evidence: .sisyphus/evidence/task-4-deps-install.txt

Scenario: Package structure is importable
  Tool: Bash
  Preconditions: uv sync completed
  Steps:
    1. Run `uv run python -c "from opengait.demo import __main__; print('IMPORT_OK')"`
  Expected Result: Prints 'IMPORT_OK' without errors
  Failure Indicators: ModuleNotFoundError, ImportError
  Evidence: .sisyphus/evidence/task-4-import-check.txt

Commit: YES
- Message: `chore(demo): scaffold demo package and test infrastructure`
- Files: `opengait/demo/__init__.py`, `opengait/demo/__main__.py`, `tests/demo/conftest.py`, `tests/demo/__init__.py`, `tests/__init__.py`, `pyproject.toml`
- Pre-commit: `uv sync && uv run python -c "import ultralytics; import nats; import jaxtyping; import beartype; import click"`
2. ScoNetDemo — DDP-Free Inference Wrapper
What to do:
- Create `opengait/demo/sconet_demo.py`
- Class `ScoNetDemo(nn.Module)` — NOT a BaseModel subclass
- Constructor takes `cfg_path: str` and `checkpoint_path: str`
- Use `config_loader` from `opengait/utils/common.py` to parse YAML config
- Build the ScoNet architecture layers directly:
  - Backbone (ResNet9 from `opengait/modeling/backbones/resnet.py`)
  - `TemporalPool` (from `opengait/modeling/modules.py`)
  - `HorizontalPoolingPyramid` (from `opengait/modeling/modules.py`)
  - `SeparateFCs` (from `opengait/modeling/modules.py`)
  - `SeparateBNNecks` (from `opengait/modeling/modules.py`)
- Load checkpoint: `torch.load(checkpoint_path, map_location=device)` → extract state_dict → `load_state_dict()`
- Handle checkpoint format: may be `{'model': state_dict, ...}` or plain state_dict
- Strip `module.` prefix from DDP-wrapped keys if present
- All public methods decorated with `@jaxtyped(typechecker=beartype)` for runtime shape checking
  - Use jaxtyping: `from jaxtyping import Float, Int, jaxtyped`
  - Use beartype: `from beartype import beartype`
- `forward(sils: Float[Tensor, 'batch 1 seq 64 44']) -> dict` where seq=30 (window size)
  - Returns `{'logits': Float[Tensor, 'batch 3 16'], 'label': int, 'confidence': float}`
- `predict(sils: Float[Tensor, 'batch 1 seq 64 44']) -> tuple[str, float]` convenience method: returns `('positive'|'neutral'|'negative', confidence)`
  - Prediction logic: `argmax(logits.mean(dim=-1), dim=-1)` → index → label string
  - Confidence: `softmax(logits.mean(dim=-1)).max()` — probability of chosen class
  - Class mapping: `{0: 'negative', 1: 'neutral', 2: 'positive'}`
Must NOT do:
- Do NOT import anything from `torch.distributed`
- Do NOT subclass `BaseModel`
- Do NOT use `ddp_all_gather` or `get_ddp_module`
- Do NOT modify `sconet.py` or any existing model file
Recommended Agent Profile:
- Category: `deep`
- Reason: Requires understanding ScoNet's architecture, checkpoint format, and careful weight loading — high complexity, correctness-critical
- Skills: []
- Skills Evaluated but Omitted: `explore` — Agent should read referenced files directly, not search broadly
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 1 (with Tasks 1, 3, 4)
- Blocks: Tasks 8, 9
- Blocked By: None (can start immediately)
References:
Pattern References:
- `opengait/modeling/models/sconet.py` — ScoNet model definition. Study `__init__` to see which submodules are built and how `forward()` assembles the pipeline. Lines ~10-54.
- `opengait/modeling/base_model.py` — BaseModel class. Study `__init__` (lines ~30-80) to see how it builds backbone/heads from config. Replicate the build logic WITHOUT DDP calls.
- All-in-One-Gait `BaselineDemo` pattern: extends `nn.Module` directly, uses `torch.load()` + `load_state_dict()` with `training=False`
API/Type References:
- `opengait/modeling/backbones/resnet.py` — ResNet9 backbone class. Constructor signature and forward signature.
- `opengait/modeling/modules.py` — `TemporalPool`, `HorizontalPoolingPyramid`, `SeparateFCs`, `SeparateBNNecks` classes. Constructor args come from config YAML.
- `opengait/utils/common.py::config_loader` — Loads YAML config, merges with default.yaml. Returns dict.
Config References:
- `configs/sconet/sconet_scoliosis1k.yaml` — ScoNet config specifying backbone, head, loss params. The `model_cfg` section defines architecture hyperparams.
- `configs/default.yaml` — Default config merged by config_loader
Checkpoint Reference:
- `./ckpt/ScoNet-20000.pt` — Trained ScoNet checkpoint. Verify format: `torch.load()` and inspect keys.
Inference Logic Reference:
- `opengait/evaluation/evaluator.py:evaluate_scoliosis()` (line ~418) — Shows `argmax(logits.mean(-1))` prediction logic and label mapping
WHY Each Reference Matters:
- `sconet.py`: Defines the exact forward pass we must replicate — TP → backbone → HPP → FCs → BNNecks
- `base_model.py`: Shows how config is parsed into submodule instantiation — we copy this logic minus DDP
- `modules.py`: Constructor signatures tell us what config keys to extract
- `evaluator.py`: The prediction aggregation (mean over parts, argmax) is the canonical inference logic
- `sconet_scoliosis1k.yaml`: Contains the exact hyperparams (channels, num_parts, etc.) for building layers
Acceptance Criteria:
- `opengait/demo/sconet_demo.py` exists with `ScoNetDemo(nn.Module)` class
- No `torch.distributed` imports in the file
- `ScoNetDemo` does not inherit from `BaseModel`
- `uv run python -c "from opengait.demo.sconet_demo import ScoNetDemo; print('OK')"` works
QA Scenarios:
Scenario: ScoNetDemo loads checkpoint and produces correct output shape
  Tool: Bash
  Preconditions: ./ckpt/ScoNet-20000.pt exists, CUDA available
  Steps:
    1. Run the following with `uv run python`:
```python
import torch
from opengait.demo.sconet_demo import ScoNetDemo
model = ScoNetDemo('./configs/sconet/sconet_scoliosis1k.yaml', './ckpt/ScoNet-20000.pt', device='cuda:0')
model.eval()
dummy = torch.rand(1, 1, 30, 64, 44, device='cuda:0')
with torch.no_grad():
    result = model(dummy)
assert result['logits'].shape == (1, 3, 16), f'Bad shape: {result["logits"].shape}'
label, conf = model.predict(dummy)
assert label in ('negative', 'neutral', 'positive'), f'Bad label: {label}'
assert 0.0 <= conf <= 1.0, f'Bad confidence: {conf}'
print(f'SCONET_OK label={label} conf={conf:.3f}')
```
  Expected Result: Prints 'SCONET_OK label=... conf=...' with valid label and confidence
  Failure Indicators: ImportError, state_dict key mismatch, shape error, CUDA error
  Evidence: .sisyphus/evidence/task-1-sconet-forward.txt

Scenario: ScoNetDemo rejects DDP-wrapped usage
  Tool: Bash
  Preconditions: File exists
  Steps:
    1. Run `grep -c 'torch.distributed' opengait/demo/sconet_demo.py`
    2. Run `grep -c 'BaseModel' opengait/demo/sconet_demo.py`
  Expected Result: Both commands output '0'
  Failure Indicators: Any count > 0
  Evidence: .sisyphus/evidence/task-1-no-ddp.txt

Commit: YES
- Message: `feat(demo): add ScoNetDemo DDP-free inference wrapper`
- Files: `opengait/demo/sconet_demo.py`
- Pre-commit: `uv run python -c "from opengait.demo.sconet_demo import ScoNetDemo"`
3. Silhouette Preprocessing Module
What to do:
- Create `opengait/demo/preprocess.py`
- All public functions decorated with `@jaxtyped(typechecker=beartype)` for runtime shape checking
- Function `mask_to_silhouette(mask: UInt8[ndarray, 'h w'], bbox: tuple[int,int,int,int]) -> Float[ndarray, '64 44'] | None`:
  - Uses jaxtyping: `from jaxtyping import Float, UInt8, jaxtyped` and `from numpy import ndarray`
  - Input: binary mask (H, W) uint8 from YOLO, bounding box (x1, y1, x2, y2)
  - Crop mask to bbox region
  - Find vertical extent of foreground pixels (top/bottom rows with nonzero)
  - Crop to tight vertical bounding box (remove empty rows above/below)
  - Resize height to 64, maintaining aspect ratio
  - Center-crop or center-pad width to 64
  - Cut 10px from each side → final 64×44
  - Return float32 array [0.0, 1.0] (divide by 255)
  - Return `None` if mask area below `MIN_MASK_AREA` threshold (default: 500 pixels)
- Function `frame_to_person_mask(result, min_area: int = 500) -> tuple[UInt8[ndarray, 'h w'], tuple[int,int,int,int]] | None`:
  - Extract single-person mask + bbox from YOLO result object
  - Uses `result.masks.data` and `result.boxes.xyxy`
  - Returns `None` if no valid detection
- Constants: `SIL_HEIGHT = 64`, `SIL_WIDTH = 44`, `SIL_FULL_WIDTH = 64`, `SIDE_CUT = 10`, `MIN_MASK_AREA = 500`
- Each step must match the preprocessing in `datasets/pretreatment.py` (grayscale → crop → resize → center) and `BaseSilCuttingTransform` (cut sides → /255)
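The mask→64×44 steps above can be walked through with a numpy-only sketch. Nearest-neighbour index resizing stands in for `cv2.resize` here, so the real module should still use cv2 to match `pretreatment.py` bit-for-bit; the function name is hypothetical:

```python
import numpy as np

def mask_to_silhouette_sketch(mask: np.ndarray, bbox, min_area: int = 500):
    """Sketch of the plan's pipeline: crop bbox -> tight vertical crop ->
    resize h=64 -> center width to 64 -> cut 10 px sides -> float [0, 1]."""
    x1, y1, x2, y2 = bbox
    crop = mask[y1:y2, x1:x2]
    if int((crop > 0).sum()) < min_area:
        return None                               # reject small masks
    rows = np.flatnonzero((crop > 0).any(axis=1))
    crop = crop[rows[0]:rows[-1] + 1]             # tight vertical crop
    h, w = crop.shape
    new_w = max(1, round(w * 64 / h))             # keep aspect ratio
    yi = (np.arange(64) * h / 64).astype(int)     # nearest-neighbour resize,
    xi = (np.arange(new_w) * w / new_w).astype(int)  # stand-in for cv2.resize
    sil = crop[yi][:, xi]
    if new_w >= 64:                               # center-crop width to 64
        off = (new_w - 64) // 2
        sil = sil[:, off:off + 64]
    else:                                         # center-pad width to 64
        pad = 64 - new_w
        sil = np.pad(sil, ((0, 0), (pad // 2, pad - pad // 2)))
    sil = sil[:, 10:-10]                          # cut 10 px each side -> 44
    return sil.astype(np.float32) / 255.0
```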
Must NOT do:
- Don't import or modify `datasets/pretreatment.py`
- Don't add color/texture features — binary silhouettes only
- Don't resize to arbitrary sizes — must be exactly 64×44 output
Recommended Agent Profile:
- Category: `deep`
- Reason: Correctness-critical — must exactly match training data preprocessing. Subtle pixel-level bugs will silently degrade accuracy.
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 1 (with Tasks 1, 2, 4)
- Blocks: Tasks 5, 7, 9
- Blocked By: None
References:
Pattern References:
- `datasets/pretreatment.py:18-96` (function `imgs2pickle`) — The canonical preprocessing pipeline. Study lines 45-80 carefully: `cv2.imread(GRAYSCALE)` → find contours → crop to person bbox → `cv2.resize(img, (int(64 * ratio), 64))` → center-crop width. This is the EXACT sequence to replicate for live masks.
- `opengait/data/transform.py:46-58` (`BaseSilCuttingTransform`) — The runtime transform applied during training/eval. `cutting = int(w // 64) * 10` then slices `[:, :, cutting:-cutting]` then divides by 255.0. For w=64 input, cutting=10, output width=44.
API/Type References:
- Ultralytics `Results` object: `result.masks.data` → `Tensor[N, H, W]` binary masks; `result.boxes.xyxy` → `Tensor[N, 4]` bounding boxes; `result.boxes.id` → track IDs (may be None)
WHY Each Reference Matters:
- `pretreatment.py`: Defines the ground truth preprocessing. If our live pipeline differs by even 1 pixel of padding, ScoNet accuracy degrades.
- `BaseSilCuttingTransform`: The 10px side cut + /255 is applied at runtime during training. We must apply the SAME transform.
- Ultralytics masks: Need to know exact API to extract binary masks from YOLO output
Acceptance Criteria:
- `opengait/demo/preprocess.py` exists
- `mask_to_silhouette()` returns `np.ndarray` of shape `(64, 44)` dtype `float32` with values in `[0, 1]`
- Returns `None` for masks below MIN_MASK_AREA
QA Scenarios:
Scenario: Preprocessing produces correct output shape and range
  Tool: Bash
  Preconditions: Module importable
  Steps:
    1. Run the following with `uv run python`:
```python
import numpy as np
from opengait.demo.preprocess import mask_to_silhouette
# Create a synthetic mask: 300x150 person-shaped blob
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:400, 250:400] = 255  # person region
bbox = (250, 100, 400, 400)
sil = mask_to_silhouette(mask, bbox)
assert sil is not None, 'Should not be None for valid mask'
assert sil.shape == (64, 44), f'Bad shape: {sil.shape}'
assert sil.dtype == np.float32, f'Bad dtype: {sil.dtype}'
assert 0.0 <= sil.min() and sil.max() <= 1.0, f'Bad range: [{sil.min()}, {sil.max()}]'
assert sil.max() > 0, 'Should have nonzero pixels'
print('PREPROCESS_OK')
```
  Expected Result: Prints 'PREPROCESS_OK'
  Failure Indicators: Shape mismatch, dtype error, range error
  Evidence: .sisyphus/evidence/task-3-preprocess-shape.txt

Scenario: Small masks are rejected
  Tool: Bash
  Preconditions: Module importable
  Steps:
    1. Run the following with `uv run python`:
```python
import numpy as np
from opengait.demo.preprocess import mask_to_silhouette
# Tiny mask: only 10x10 = 100 pixels (below MIN_MASK_AREA=500)
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:110, 100:110] = 255
bbox = (100, 100, 110, 110)
sil = mask_to_silhouette(mask, bbox)
assert sil is None, f'Should be None for tiny mask, got {type(sil)}'
print('SMALL_MASK_REJECTED_OK')
```
  Expected Result: Prints 'SMALL_MASK_REJECTED_OK'
  Failure Indicators: Returns non-None for tiny mask
  Evidence: .sisyphus/evidence/task-3-small-mask-reject.txt

Commit: YES
- Message: `feat(demo): add silhouette preprocessing module`
- Files: `opengait/demo/preprocess.py`
- Pre-commit: `uv run python -c "from opengait.demo.preprocess import mask_to_silhouette"`
4. Input Adapters (cv-mmap + OpenCV)
What to do:
- Create `opengait/demo/input.py`
- The pipeline contract is simple: it consumes any `Iterable[tuple[np.ndarray, dict]]` — any generator or iterator that yields `(frame_bgr_uint8, metadata_dict)` works
- Type alias: `FrameStream = Iterable[tuple[np.ndarray, dict]]`
- Generator function `opencv_source(path: str | int, max_frames: int | None = None) -> Generator[tuple[np.ndarray, dict], None, None]`:
  - `path` can be video file path or camera index (int)
  - Opens `cv2.VideoCapture(path)`
  - Yields `(frame, {'frame_count': int, 'timestamp_ns': int})` tuples
  - Handles end-of-video gracefully (just returns)
  - Handles camera disconnect (log warning, return)
  - Respects `max_frames` limit
- Generator function `cvmmap_source(name: str, max_frames: int | None = None) -> Generator[tuple[np.ndarray, dict], None, None]`:
  - Wraps `CvMmapClient` from `/home/crosstyan/Code/cv-mmap/client/cvmmap/`
  - Since cv-mmap is async (anyio), this adapter must bridge async→sync:
    - Run anyio event loop in a background thread, drain frames via `queue.Queue`
    - Or use `anyio.from_thread` / `asyncio.run()` with `async for` internally
    - Choose simplest correct approach
  - Yields same `(frame, metadata_dict)` tuple format as opencv_source
  - Handles cv-mmap disconnect/offline events gracefully
  - Import should be conditional (try/except ImportError) so pipeline works without cv-mmap installed
- Factory function `create_source(source: str, max_frames: int | None = None) -> FrameStream`:
  - If source starts with `cvmmap://` → `cvmmap_source(name)`
  - If source is a digit string → `opencv_source(int(source))` (camera index)
  - Otherwise → `opencv_source(source)` (file path)
- The key design point: any user-written generator that yields `(np.ndarray, dict)` plugs in directly — no class inheritance needed
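One way to realize the background-thread bridge described above is sketched here with stdlib `asyncio` (the plan also allows anyio). `drain_async_iterable` is a hypothetical helper; the bounded queue supplies backpressure rather than a drop policy, which the real adapter would need to decide:

```python
import asyncio
import queue
import threading

def drain_async_iterable(make_aiter, maxsize: int = 4):
    """Bridge an async iterator into a plain sync generator.

    Runs an asyncio event loop in a daemon thread and hands items across
    via a bounded queue. `make_aiter` is a zero-arg callable returning an
    async iterable, a stand-in for something like CvMmapClient(name).
    """
    q: queue.Queue = queue.Queue(maxsize=maxsize)
    _DONE = object()  # sentinel marking end of stream

    async def _pump():
        async for item in make_aiter():
            q.put(item)          # blocks when full -> backpressure
        q.put(_DONE)

    threading.Thread(target=lambda: asyncio.run(_pump()), daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item
```

Any consumer can then iterate the result like a normal generator, matching the `(frame, metadata_dict)` contract.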
Must NOT do:
- Don't build GStreamer pipelines
- Don't add async to the main pipeline loop — keep synchronous pull model
- Don't use abstract base classes or heavy OOP — plain generator functions are the interface
- Don't buffer frames internally (no unbounded queue between source and consumer)
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Integration with external library (cv-mmap) requires careful async→sync bridging
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 1 (with Tasks 1, 2, 3)
- Blocks: Task 9
- Blocked By: None
References:
Pattern References:
- `/home/crosstyan/Code/cv-mmap/client/cvmmap/__init__.py` — `CvMmapClient` class. Async iterator: `async for im, meta in client`. Understand the `__aiter__`/`__anext__` protocol.
- `/home/crosstyan/Code/cv-mmap/client/test_cvmmap.py` — Example consumer pattern using `anyio.run()`
- `/home/crosstyan/Code/cv-mmap/client/cvmmap/msg.py` — `FrameMetadata` and `FrameInfo` dataclasses. Fields: `frame_count`, `timestamp_ns`, `info.width`, `info.height`, `info.pixel_format`
API/Type References:
- `cv2.VideoCapture` — OpenCV video capture. `cap.read()` returns `(bool, np.ndarray)`. `cap.get(cv2.CAP_PROP_FRAME_COUNT)` for total frames.
WHY Each Reference Matters:
- `CvMmapClient`: The async iterator yields `(numpy_array, FrameMetadata)` — need to know exact types for sync bridging
- `msg.py`: Metadata fields must be mapped to our generic `dict` metadata format
- `test_cvmmap.py`: Shows the canonical consumer pattern we must wrap
Acceptance Criteria:
- `opengait/demo/input.py` exists with `opencv_source`, `cvmmap_source`, `create_source` as functions (not classes)
- `create_source('./some/video.mp4')` returns a generator/iterable
- `create_source('cvmmap://default')` returns a generator (or raises if cv-mmap not installed)
- `create_source('0')` returns a generator for camera index 0
- Any custom generator `def my_source(): yield (frame, meta)` can be used directly by the pipeline
QA Scenarios:
Scenario: opencv_source reads frames from a video file
  Tool: Bash
  Preconditions: Any .mp4 or .avi file available (use sample video or create synthetic one)
  Steps:
    1. Create a short test video if none exists: `uv run python -c "import cv2; import numpy as np; w=cv2.VideoWriter('/tmp/test.avi', cv2.VideoWriter_fourcc(*'MJPG'), 15, (640,480)); [w.write(np.random.randint(0,255,(480,640,3),dtype=np.uint8)) for _ in range(30)]; w.release(); print('VIDEO_CREATED')"`
    2. Run the following with `uv run python`:
```python
from opengait.demo.input import create_source
src = create_source('/tmp/test.avi', max_frames=10)
count = 0
for frame, meta in src:
    assert frame.shape[2] == 3, f'Not BGR: {frame.shape}'
    assert 'frame_count' in meta
    count += 1
assert count == 10, f'Expected 10 frames, got {count}'
print('OPENCV_SOURCE_OK')
```
  Expected Result: Prints 'OPENCV_SOURCE_OK'
  Failure Indicators: Shape error, missing metadata, wrong frame count
  Evidence: .sisyphus/evidence/task-2-opencv-source.txt

Scenario: Custom generator works as pipeline input
  Tool: Bash
  Steps:
    1. Run the following with `uv run python`:
```python
import numpy as np

# Any generator works — no class needed
def my_source():
    for i in range(5):
        yield np.zeros((480, 640, 3), dtype=np.uint8), {'frame_count': i}

frames = list(my_source())
assert len(frames) == 5
print('CUSTOM_GENERATOR_OK')
```
  Expected Result: Prints 'CUSTOM_GENERATOR_OK'
  Failure Indicators: Type error, protocol mismatch
  Evidence: .sisyphus/evidence/task-2-custom-gen.txt

Commit: YES
- Message: `feat(demo): add generator-based input adapters for cv-mmap and OpenCV`
- Files: `opengait/demo/input.py`
- Pre-commit: `uv run python -c "from opengait.demo.input import create_source"`
5. Sliding Window / Ring Buffer Manager
What to do:
- Create `opengait/demo/window.py`
- Class `SilhouetteWindow`:
  - Constructor: `__init__(self, window_size: int = 30, stride: int = 1, gap_threshold: int = 15)`
  - Internal storage: `collections.deque(maxlen=window_size)` of `np.ndarray` (64×44 float32)
  - `push(sil: np.ndarray, frame_idx: int, track_id: int) -> None`:
    - If `track_id` differs from current tracked ID → reset buffer, update tracked ID
    - If `frame_idx - last_frame_idx > gap_threshold` → reset buffer (too many missed frames)
    - Append silhouette to deque
    - Increment internal frame counter
  - `is_ready() -> bool`: returns `len(buffer) == window_size`
  - `should_classify() -> bool`: returns `is_ready() and (frames_since_last_classify >= stride)`
  - `get_tensor(device: str = 'cpu') -> torch.Tensor`:
    - Stack buffer into `np.array` shape `[window_size, 64, 44]`
    - Convert to `torch.Tensor` shape `[1, 1, window_size, 64, 44]` on `device`
    - This is the exact input shape for ScoNetDemo
  - `reset() -> None`: clear buffer and counters
  - `mark_classified() -> None`: reset frames_since_last_classify counter
  - Properties: `current_track_id`, `frame_count`, `fill_level` (len/window_size as float)
- Single-person selection policy (function or small helper): `select_person(results) -> tuple[np.ndarray, tuple, int] | None`
  - From YOLO results, select the detection with the largest bounding box area
  - Return `(mask, bbox, track_id)` or `None` if no valid detection
  - If `result.boxes.id` is None (tracker not yet initialized), skip frame
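A condensed sketch of the reset and gap policy above, kept numpy-only for clarity; a real `get_tensor()` would convert the stacked array to a `torch.Tensor` on the target device, and `get_array` here is an illustrative stand-in:

```python
from collections import deque

import numpy as np

class SilhouetteWindow:
    """Bounded ring buffer sketch: resets on track-ID change or frame gap."""

    def __init__(self, window_size: int = 30, stride: int = 1,
                 gap_threshold: int = 15):
        self.window_size = window_size
        self.stride = stride
        self.gap_threshold = gap_threshold
        self.reset()

    def reset(self) -> None:
        self._buf: deque = deque(maxlen=self.window_size)  # bounded by design
        self.current_track_id = None
        self._last_frame_idx = None
        self._since_classify = 0

    def push(self, sil: np.ndarray, frame_idx: int, track_id: int) -> None:
        if track_id != self.current_track_id:
            self.reset()                     # new person -> start over
            self.current_track_id = track_id
        elif (self._last_frame_idx is not None
              and frame_idx - self._last_frame_idx > self.gap_threshold):
            self.reset()                     # too many missed frames
            self.current_track_id = track_id
        self._buf.append(sil)
        self._last_frame_idx = frame_idx
        self._since_classify += 1

    @property
    def frame_count(self) -> int:
        return len(self._buf)

    def is_ready(self) -> bool:
        return len(self._buf) == self.window_size

    def should_classify(self) -> bool:
        return self.is_ready() and self._since_classify >= self.stride

    def mark_classified(self) -> None:
        self._since_classify = 0

    def get_array(self) -> np.ndarray:
        # [1, 1, S, 64, 44]: the exact ScoNetDemo input layout
        return np.stack(list(self._buf))[None, None].astype(np.float32)
```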
Must NOT do:
- No unbounded buffers — deque with maxlen enforces this
- No multi-person tracking — single person only, select largest bbox
- No time-based windowing — frame-count based only
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Data structure + policy logic with edge cases (ID changes, gaps, resets)
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 2 (with Tasks 6, 7, 8)
- Blocks: Tasks 9, 10
- Blocked By: Task 3 (needs silhouette shape constants from preprocess.py)
References:
Pattern References:
- `opengait/demo/preprocess.py` (Task 3) — `SIL_HEIGHT`, `SIL_WIDTH` constants. The window stores arrays of this shape.
- `opengait/data/dataset.py` — Shows how OpenGait's DataSet samples fixed-length sequences. The `seqL` parameter controls sequence length (our window_size=30).
API/Type References:
- Ultralytics `Results.boxes.id` — Track IDs tensor, may be `None` if tracker hasn't assigned IDs yet
- Ultralytics `Results.boxes.xyxy` — Bounding boxes `[N, 4]` for area calculation
- Ultralytics `Results.masks.data` — Binary masks `[N, H, W]`
WHY Each Reference Matters:
- `preprocess.py`: Window must store silhouettes of the exact shape produced by preprocessing
- `dataset.py`: Understanding how training samples sequences helps ensure our window matches
- Ultralytics API: Need to handle `None` track IDs and extract correct tensors
Acceptance Criteria:
- `opengait/demo/window.py` exists with `SilhouetteWindow` class and `select_person` function
- Buffer is bounded (deque with maxlen)
- `get_tensor()` returns shape `[1, 1, 30, 64, 44]` when full
- Track ID change triggers reset
- Gap exceeding threshold triggers reset
QA Scenarios:
Scenario: Window fills and produces correct tensor shape
- Tool: Bash
- Preconditions: Module importable
- Steps:
  1. Run via `uv run python`:

```python
import numpy as np
from opengait.demo.window import SilhouetteWindow

win = SilhouetteWindow(window_size=30, stride=1)
for i in range(30):
    sil = np.random.rand(64, 44).astype(np.float32)
    win.push(sil, frame_idx=i, track_id=1)
assert win.is_ready(), 'Window should be ready after 30 frames'
t = win.get_tensor()
assert t.shape == (1, 1, 30, 64, 44), f'Bad shape: {t.shape}'
assert t.dtype.is_floating_point, f'Bad dtype: {t.dtype}'
print('WINDOW_FILL_OK')
```

- Expected Result: Prints 'WINDOW_FILL_OK'
- Failure Indicators: Shape mismatch, not ready after 30 pushes
- Evidence: .sisyphus/evidence/task-5-window-fill.txt

Scenario: Track ID change resets buffer
- Tool: Bash
- Steps:
  1. Run via `uv run python`:

```python
import numpy as np
from opengait.demo.window import SilhouetteWindow

win = SilhouetteWindow(window_size=30)
for i in range(20):
    win.push(np.random.rand(64, 44).astype(np.float32), frame_idx=i, track_id=1)
assert win.frame_count == 20
# Switch track ID — should reset
win.push(np.random.rand(64, 44).astype(np.float32), frame_idx=20, track_id=2)
assert win.frame_count == 1, f'Should reset to 1, got {win.frame_count}'
assert win.current_track_id == 2
print('TRACK_RESET_OK')
```

- Expected Result: Prints 'TRACK_RESET_OK'
- Failure Indicators: Buffer not reset, wrong track ID
- Evidence: .sisyphus/evidence/task-5-track-reset.txt

Commit: YES
- Message: `feat(demo): add sliding window manager with single-person selection`
- Files: `opengait/demo/window.py`
- Pre-commit: `uv run python -c "from opengait.demo.window import SilhouetteWindow"`
6. NATS JSON Publisher
What to do:
- Create `opengait/demo/output.py`
- Class `ResultPublisher(Protocol)` — any object with `publish(result: dict) -> None`
- Function `console_publisher() -> Generator` or simple class `ConsolePublisher`:
  - Prints JSON to stdout (default when `--nats-url` is not provided)
  - Format: one JSON object per line (JSONL)
- Class `NatsPublisher`:
  - Constructor: `__init__(self, url: str = 'nats://127.0.0.1:4222', subject: str = 'scoliosis.result')`
  - Uses `nats-py` async client, bridged to sync `publish()` method
  - Handles connection failures gracefully (log warning, retry with backoff, don't crash pipeline)
  - Handles reconnection automatically (nats-py does this by default)
  - `publish(result: dict) -> None`: serializes to JSON, publishes to subject
  - `close() -> None`: drain and close NATS connection
  - Context manager support (`__enter__`/`__exit__`)
- JSON schema for results: `{ "frame": 1234, "track_id": 1, "label": "positive", "confidence": 0.82, "window": 30, "timestamp_ns": 1234567890000 }`
- Factory: `create_publisher(nats_url: str | None, subject: str = 'scoliosis.result') -> ResultPublisher`
  - If `nats_url` is None → ConsolePublisher
  - Otherwise → NatsPublisher(url, subject)
Must NOT do:
- Don't use JetStream (plain NATS PUB/SUB is sufficient)
- Don't build custom binary protocol
- Don't buffer/batch results — publish immediately
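A minimal sketch of the console path and factory (names follow the spec above; the `NatsPublisher` body is elided, since it depends on the nats-py async bridge):

```python
import json
import sys
from typing import Protocol, TextIO


class ResultPublisher(Protocol):
    def publish(self, result: dict) -> None: ...


class ConsolePublisher:
    """Writes one JSON object per line (JSONL) to the given stream."""

    def __init__(self, stream: TextIO = sys.stdout):
        self.stream = stream

    def publish(self, result: dict) -> None:
        self.stream.write(json.dumps(result) + '\n')
        self.stream.flush()  # publish immediately, no batching


def create_publisher(nats_url, subject: str = 'scoliosis.result') -> ResultPublisher:
    if nats_url is None:
        return ConsolePublisher()
    raise NotImplementedError('NatsPublisher(nats_url, subject) goes here — see spec above')
```

Injecting the stream (instead of hardcoding `sys.stdout`) keeps the console path trivially testable with `io.StringIO`.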
Recommended Agent Profile:
- Category: `quick`
- Reason: Straightforward JSON serialization + NATS client wrapper, well-documented library
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 2 (with Tasks 5, 7, 8)
- Blocks: Tasks 9, 13
- Blocked By: Task 1 (needs project scaffolding for nats-py dependency)
References:
External References:
- nats-py docs: `import nats; nc = await nats.connect(); await nc.publish(subject, data)` — async API
- `/home/crosstyan/Code/cv-mmap-gui/` — Uses NATS.c for messaging; our Python publisher sends to the same broker
WHY Each Reference Matters:
- nats-py: Need to bridge async NATS client to sync `publish()` call
- cv-mmap-gui: Confirms NATS is the right transport for this ecosystem
Acceptance Criteria:
- `opengait/demo/output.py` exists with `ConsolePublisher`, `NatsPublisher`, `create_publisher`
- ConsolePublisher prints valid JSON to stdout
- NatsPublisher connects and publishes without crashing (when NATS available)
- NatsPublisher logs warning and doesn't crash when NATS unavailable
QA Scenarios:
Scenario: ConsolePublisher outputs valid JSONL
- Tool: Bash
- Steps:
  1. Run via `uv run python`:

```python
from opengait.demo.output import create_publisher

pub = create_publisher(nats_url=None)
result = {'frame': 100, 'track_id': 1, 'label': 'positive',
          'confidence': 0.85, 'window': 30, 'timestamp_ns': 0}
pub.publish(result)  # should print one JSON line to stdout
print('CONSOLE_PUB_OK')
```

- Expected Result: Prints a JSON line followed by 'CONSOLE_PUB_OK'
- Failure Indicators: Invalid JSON, missing fields, crash
- Evidence: .sisyphus/evidence/task-6-console-pub.txt

Scenario: NatsPublisher handles missing server gracefully
- Tool: Bash
- Steps:
  1. Run via `uv run python`:

```python
from opengait.demo.output import create_publisher

try:
    pub = create_publisher(nats_url='nats://127.0.0.1:14222')  # wrong port, no server
    pub.publish({'frame': 0, 'label': 'test'})
except SystemExit:
    print('SHOULD_NOT_EXIT')
    raise
print('NATS_GRACEFUL_OK')
```

- Expected Result: Prints 'NATS_GRACEFUL_OK' (warning logged, no crash)
- Failure Indicators: Unhandled exception, SystemExit, hang
- Evidence: .sisyphus/evidence/task-6-nats-graceful.txt

Commit: YES
- Message: `feat(demo): add NATS JSON publisher and console fallback`
- Files: `opengait/demo/output.py`
- Pre-commit: `uv run python -c "from opengait.demo.output import create_publisher"`
7. Unit Tests — Silhouette Preprocessing
What to do:
- Create `tests/demo/test_preprocess.py`
- Test `mask_to_silhouette()` with:
  - Valid mask (person-shaped blob) → output shape (64, 44), dtype float32, range [0, 1]
- Tiny mask below MIN_MASK_AREA → returns None
- Empty mask (all zeros) → returns None
- Full-frame mask (all 255) → produces valid output (edge case: very wide person)
- Tall narrow mask → verify aspect ratio handling (resize h=64, center-crop width)
- Wide short mask → verify handling (should still produce 64×44)
- Test determinism: same input always produces same output
- Test against a reference `.pkl` sample if available:
  - Load a known `.pkl` file from Scoliosis1K
  - Extract one frame
  - Compare our preprocessing output to the stored frame (should be close/identical)
- Verify jaxtyping annotations are present and beartype checks fire on wrong shapes
Must NOT do:
- Don't test YOLO integration here — only test the `mask_to_silhouette` function in isolation
- Don't require GPU — all preprocessing is CPU numpy ops
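For reference, the geometric contract those tests pin down (resize to height 64, center-crop or pad width to 44, scale into [0, 1]) can be illustrated with a numpy-only stand-in. `normalize_silhouette` is a hypothetical name; the real `mask_to_silhouette` uses cv2 resizing plus bbox handling, so treat this as a sketch of the expected geometry only:

```python
import numpy as np

SIL_HEIGHT, SIL_WIDTH = 64, 44


def normalize_silhouette(mask: np.ndarray) -> np.ndarray:
    """Scale a 2-D mask to height 64, then center-crop/pad width to 44, values in [0, 1]."""
    h, w = mask.shape
    scale = SIL_HEIGHT / h
    new_w = max(1, round(w * scale))
    # Nearest-neighbor resize (the real module would use cv2.resize)
    rows = (np.arange(SIL_HEIGHT) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = mask[rows][:, cols].astype(np.float32)
    # Center-crop or zero-pad width to SIL_WIDTH
    if new_w >= SIL_WIDTH:
        left = (new_w - SIL_WIDTH) // 2
        out = resized[:, left:left + SIL_WIDTH]
    else:
        out = np.zeros((SIL_HEIGHT, SIL_WIDTH), np.float32)
        left = (SIL_WIDTH - new_w) // 2
        out[:, left:left + new_w] = resized
    return out / max(out.max(), 1.0)  # e.g. 0/255 masks → [0, 1]
```

Because every operation here is a pure numpy function of the input, determinism (same input, same output) falls out for free, which is exactly what the determinism test asserts.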
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Must verify pixel-level correctness against training data contract, multiple edge cases
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 2 (with Tasks 5, 6, 8)
- Blocks: None (verification task)
- Blocked By: Task 3 (preprocess module must exist)
References:
Pattern References:
- `opengait/demo/preprocess.py` (Task 3) — The module under test
- `datasets/pretreatment.py:18-96` — Reference preprocessing to validate against
- `opengait/data/transform.py:46-58` — `BaseSilCuttingTransform` for expected output contract
WHY Each Reference Matters:
- `preprocess.py`: Direct test target
- `pretreatment.py`: Ground truth for what a correct silhouette looks like
- `BaseSilCuttingTransform`: Defines the 64→44 cut + /255 contract we must match
Acceptance Criteria:
- `tests/demo/test_preprocess.py` exists with ≥5 test cases
- `uv run pytest tests/demo/test_preprocess.py -q` passes
- Tests cover: valid mask, tiny mask, empty mask, determinism
QA Scenarios:
Scenario: All preprocessing tests pass
- Tool: Bash
- Preconditions: Task 3 (preprocess.py) is complete
- Steps:
  1. Run `uv run pytest tests/demo/test_preprocess.py -v`
- Expected Result: All tests pass (≥5 tests), exit code 0
- Failure Indicators: Any assertion failure, import error
- Evidence: .sisyphus/evidence/task-7-preprocess-tests.txt

Scenario: Jaxtyping annotation enforcement works
- Tool: Bash
- Steps:
  1. Run via `uv run python`:

```python
from opengait.demo.preprocess import mask_to_silhouette

# Intentionally wrong type to verify beartype catches it
try:
    mask_to_silhouette('not_an_array', (0, 0, 10, 10))
    print('BEARTYPE_MISSED')  # should not reach here
except Exception as e:
    if 'beartype' in type(e).__module__ or 'BeartypeCallHint' in type(e).__name__:
        print('BEARTYPE_OK')
    else:
        print(f'WRONG_ERROR: {type(e).__name__}: {e}')
```

- Expected Result: Prints 'BEARTYPE_OK'
- Failure Indicators: Prints 'BEARTYPE_MISSED' or 'WRONG_ERROR'
- Evidence: .sisyphus/evidence/task-7-beartype-check.txt

Commit: YES (groups with Task 8)
- Message: `test(demo): add preprocessing and model unit tests`
- Files: `tests/demo/test_preprocess.py`
- Pre-commit: `uv run pytest tests/demo/test_preprocess.py -q`
8. Unit Tests — ScoNetDemo Forward Pass
What to do:
- Create `tests/demo/test_sconet_demo.py`
- Test `ScoNetDemo` construction:
  - Loads config from YAML
  - Loads checkpoint weights
  - Model is in eval mode
- Test `forward()` with dummy tensor:
  - Input: `torch.rand(1, 1, 30, 64, 44)` on available device
  - Output logits shape: `(1, 3, 16)`
  - Output dtype: float32
- Test `predict()` convenience method:
  - Returns `(label_str, confidence_float)`
  - `label_str` is one of `{'negative', 'neutral', 'positive'}`
  - `confidence` is in `[0.0, 1.0]`
- Test with various batch sizes: N=1, N=2
- Test with various sequence lengths if model supports it (should work with 30)
- Verify no `torch.distributed` calls are made (mock `torch.distributed` to raise if called)
- Verify jaxtyping shape annotations on forward/predict signatures
Must NOT do:
- Don't test with real video data — dummy tensors only for unit tests
- Don't modify the checkpoint
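The `predict()` contract being tested (argmax of logits averaged over the part dimension, mirroring `evaluate_scoliosis`) reduces to something like the following numpy sketch; the softmax-based confidence is an assumption, the actual wrapper may derive confidence differently:

```python
import numpy as np

LABELS = ('negative', 'neutral', 'positive')


def predict_from_logits(logits: np.ndarray) -> tuple[str, float]:
    """logits: [1, 3, 16] (batch, classes, parts) → (label, confidence)."""
    mean_logits = logits.mean(axis=-1).squeeze(0)  # average over parts → [3]
    shifted = mean_logits - mean_logits.max()      # numerically stable softmax
    probs = np.exp(shifted) / np.exp(shifted).sum()
    idx = int(probs.argmax())
    return LABELS[idx], float(probs[idx])
```

The unit tests only need to see that the label lands in the three-class set and the confidence lands in [0, 1], which this reduction guarantees by construction.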
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Requires CUDA, checkpoint loading, shape validation — moderate complexity
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 2 (with Tasks 5, 6, 7)
- Blocks: None (verification task)
- Blocked By: Task 2 (ScoNetDemo must exist)
References:
Pattern References:
- `opengait/demo/sconet_demo.py` (Task 1) — The module under test
- `opengait/evaluation/evaluator.py:evaluate_scoliosis()` (line ~418) — Canonical prediction logic to validate against
Config/Checkpoint References:
- `configs/sconet/sconet_scoliosis1k.yaml` — Config file to pass to ScoNetDemo
- `./ckpt/ScoNet-20000.pt` — Trained checkpoint
WHY Each Reference Matters:
- `sconet_demo.py`: Direct test target
- `evaluator.py`: Defines expected prediction behavior (argmax of mean logits)
Acceptance Criteria:
- `tests/demo/test_sconet_demo.py` exists with ≥4 test cases
- `uv run pytest tests/demo/test_sconet_demo.py -q` passes
- Tests cover: construction, forward shape, predict output, no-DDP enforcement
QA Scenarios:
Scenario: All ScoNetDemo tests pass
- Tool: Bash
- Preconditions: Task 1 (sconet_demo.py) is complete, checkpoint exists, CUDA available
- Steps:
  1. Run `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_sconet_demo.py -v`
- Expected Result: All tests pass (≥4 tests), exit code 0
- Failure Indicators: state_dict key mismatch, shape error, CUDA OOM
- Evidence: .sisyphus/evidence/task-8-sconet-tests.txt

Scenario: No DDP leakage in ScoNetDemo
- Tool: Bash
- Steps:
  1. Run `grep -rn 'torch.distributed' opengait/demo/sconet_demo.py`
  2. Run `grep -rn 'ddp_all_gather\|get_ddp_module\|get_rank' opengait/demo/sconet_demo.py`
- Expected Result: Both commands produce no output (exit code 1 = no matches)
- Failure Indicators: Any match found
- Evidence: .sisyphus/evidence/task-8-no-ddp.txt

Commit: YES (groups with Task 7)
- Message: `test(demo): add preprocessing and model unit tests`
- Files: `tests/demo/test_sconet_demo.py`
- Pre-commit: `uv run pytest tests/demo/test_sconet_demo.py -q`
9. Main Pipeline Application + CLI
What to do:
- Create `opengait/demo/pipeline.py` — the main orchestrator
- Create `opengait/demo/__main__.py` — CLI entry point (replace stub from Task 4)
- Pipeline class `ScoliosisPipeline`:
  - Constructor: `__init__(self, source: FrameStream, model: ScoNetDemo, publisher: ResultPublisher, window: SilhouetteWindow, yolo_model_path: str = 'yolo11n-seg.pt', device: str = 'cuda:0')`
  - Uses jaxtyping annotations for all tensor-bearing methods: `from jaxtyping import Float, UInt8, jaxtyped`, `from beartype import beartype`, `from torch import Tensor`, `import numpy as np`, `from numpy import ndarray`
  - `run() -> None` — main loop:
    - Load YOLO model: `ultralytics.YOLO(yolo_model_path)`
    - For each `(frame, meta)` from source:
      a. Run `yolo_model.track(frame, persist=True, verbose=False)` → results
      b. `select_person(results)` → `(mask, bbox, track_id)` or None → skip if None
      c. `mask_to_silhouette(mask, bbox)` → `sil` or None → skip if None
      d. `window.push(sil, meta['frame_count'], track_id)`
      e. If `window.should_classify()`:
         - `tensor = window.get_tensor(device=self.device)`
         - `label, confidence = self.model.predict(tensor)`
         - `publisher.publish({...})` with JSON schema fields
         - `window.mark_classified()`
    - Log FPS every 100 frames
    - Cleanup on exit (close publisher, release resources)
  - Graceful shutdown on KeyboardInterrupt / SIGTERM
- CLI via `__main__.py` using `click`:
  - `--source` (required): video path, camera index, or `cvmmap://name`
  - `--checkpoint` (required): path to ScoNet checkpoint
  - `--config` (default: `./configs/sconet/sconet_scoliosis1k.yaml`): ScoNet config YAML
  - `--device` (default: `cuda:0`): torch device
  - `--yolo-model` (default: `yolo11n-seg.pt`): YOLO model path (auto-downloads)
  - `--window` (default: 30): sliding window size
  - `--stride` (default: 30): classify every N frames after window is full
  - `--nats-url` (default: None): NATS server URL, None = console output
  - `--nats-subject` (default: `scoliosis.result`): NATS subject
  - `--max-frames` (default: None): stop after N frames
  - `--help`: print usage
- Entrypoint: `uv run python -m opengait.demo ...`
Must NOT do:
- No async in the main loop — synchronous pull-process-publish
- No multi-threading for inference — single-threaded pipeline
- No GUI / frame display / cv2.imshow
- No unbounded accumulation — ring buffer handles memory
- No auto-download of ScoNet checkpoint — user must provide path
Recommended Agent Profile:
- Category: `deep`
- Reason: Integration of all components, error handling, CLI design, FPS logging — largest single task
- Skills: []
Parallelization:
- Can Run In Parallel: NO
- Parallel Group: Wave 3 (sequential — depends on most Wave 1+2 tasks)
- Blocks: Tasks 12, 13
- Blocked By: Tasks 2, 3, 4, 5, 6 (all components must exist)
References:
Pattern References:
- `opengait/demo/sconet_demo.py` (Task 1) — `ScoNetDemo` class, `predict()` method
- `opengait/demo/preprocess.py` (Task 3) — `mask_to_silhouette()`, `frame_to_person_mask()`
- `opengait/demo/window.py` (Task 5) — `SilhouetteWindow`, `select_person()`
- `opengait/demo/input.py` (Task 2) — `create_source()`, `FrameStream` type alias
- `opengait/demo/output.py` (Task 6) — `create_publisher()`, `ResultPublisher`
External References:
- Ultralytics tracking API: `model.track(frame, persist=True)` — returns `Results` list
- Ultralytics result object: `results[0].masks.data`, `results[0].boxes.xyxy`, `results[0].boxes.id`
WHY Each Reference Matters:
- All Task refs: This task composes every component — must know each API surface
- Ultralytics: The YOLO `.track()` call is the only external API used directly in this file
Acceptance Criteria:
- `opengait/demo/pipeline.py` exists with `ScoliosisPipeline` class
- `opengait/demo/__main__.py` exists with click CLI
- `uv run python -m opengait.demo --help` prints usage without errors
- All public methods have jaxtyping annotations where tensor/array args are involved
QA Scenarios:
Scenario: CLI --help works
- Tool: Bash
- Steps:
  1. Run `uv run python -m opengait.demo --help`
- Expected Result: Prints usage showing --source, --checkpoint, --device, --window, --stride, --nats-url, --max-frames
- Failure Indicators: ImportError, missing arguments, crash
- Evidence: .sisyphus/evidence/task-9-help.txt

Scenario: Pipeline runs with sample video (no NATS)
- Tool: Bash
- Preconditions: sample.mp4 exists (from Task 11), checkpoint exists, CUDA available
- Steps:
  1. Run `CUDA_VISIBLE_DEVICES=0 uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --device cuda:0 --max-frames 120 2>&1 | tee /tmp/pipeline_output.txt`
  2. Count prediction lines: `grep -c '"label"' /tmp/pipeline_output.txt`
- Expected Result: Exit code 0, at least 1 prediction line with valid JSON containing "label" field
- Failure Indicators: Crash, no predictions, invalid JSON, CUDA error
- Evidence: .sisyphus/evidence/task-9-pipeline-run.txt

Scenario: Pipeline handles missing video gracefully
- Tool: Bash
- Steps:
  1. Run `uv run python -m opengait.demo --source /nonexistent/video.mp4 --checkpoint ./ckpt/ScoNet-20000.pt 2>&1; echo "EXIT_CODE=$?"`
- Expected Result: Prints error message about missing file, exits with non-zero code (no traceback dump)
- Failure Indicators: Unhandled exception with full traceback, exit code 0
- Evidence: .sisyphus/evidence/task-9-missing-video.txt

Commit: YES
- Message: `feat(demo): add main pipeline application with CLI entry point`
- Files: `opengait/demo/pipeline.py`, `opengait/demo/__main__.py`
- Pre-commit: `uv run python -m opengait.demo --help`
10. Unit Tests — Single-Person Policy + Window Reset
What to do:
- Create `tests/demo/test_window.py`
- Test `SilhouetteWindow`:
  - Fill to capacity → `is_ready()` returns True
  - Underfilled → `is_ready()` returns False
  - Track ID change resets buffer
  - Frame gap exceeding threshold resets buffer
  - `get_tensor()` returns correct shape `[1, 1, window_size, 64, 44]`
  - `should_classify()` respects stride
- Test `select_person()`:
  - Single detection → returns it
  - Multiple detections → returns largest bbox area
  - No detections → returns None
  - Detections without track IDs (tracker not initialized) → returns None
- Use mock YOLO results (don't require actual YOLO model)
Must NOT do:
- Don't require GPU — window tests are CPU-only (get_tensor can use cpu device)
- Don't require YOLO model file — mock the results
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Multiple edge cases to cover, mock YOLO results require understanding ultralytics API
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 3 (with Tasks 9, 11)
- Blocks: None (verification task)
- Blocked By: Task 5 (window module must exist)
References:
Pattern References:
- `opengait/demo/window.py` (Task 5) — Module under test
WHY Each Reference Matters:
- Direct test target
Acceptance Criteria:
- `tests/demo/test_window.py` exists with ≥6 test cases
- `uv run pytest tests/demo/test_window.py -q` passes
QA Scenarios:
Scenario: All window and single-person tests pass
- Tool: Bash
- Steps:
  1. Run `uv run pytest tests/demo/test_window.py -v`
- Expected Result: All tests pass (≥6 tests), exit code 0
- Failure Indicators: Assertion failures, import errors
- Evidence: .sisyphus/evidence/task-10-window-tests.txt

Commit: YES
- Message: `test(demo): add window manager and single-person policy tests`
- Files: `tests/demo/test_window.py`
- Pre-commit: `uv run pytest tests/demo/test_window.py -q`
11. Sample Video for Smoke Testing
What to do:
- Acquire or create a short sample video for pipeline smoke testing
- Options (in order of preference):
- Extract a short clip (5-10 seconds, ~120 frames at 15fps) from an existing Scoliosis1K raw video if accessible
  - Record a short clip using webcam via `cv2.VideoCapture(0)`
  - Generate a synthetic video with a person-shaped blob moving across frames
- Save to `./assets/sample.mp4` (or `./assets/sample.avi`)
- Requirements: contains at least one person walking, 720p or lower, ≥60 frames
- If no real video is available, create a synthetic one:
- 120 frames, 640×480, 15fps
- White rectangle (simulating person silhouette) moving across dark background
- This won't test YOLO detection quality but will verify pipeline doesn't crash
- Add `assets/sample.mp4` to `.gitignore` if it's large (>10MB)
Must NOT do:
- Don't use any Scoliosis1K dataset files that are symlinked (user constraint)
- Don't commit large video files to git
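The synthetic fallback above amounts to generating frames like this (numpy-only sketch; `synthetic_frames` is a hypothetical helper name):

```python
import numpy as np


def synthetic_frames(n: int = 120, w: int = 640, h: int = 480):
    """Yield BGR frames with a white 'person' blob drifting left to right."""
    for i in range(n):
        frame = np.zeros((h, w, 3), np.uint8)
        x = int((w - 120) * i / max(n - 1, 1))        # horizontal position this frame
        frame[h // 4: 3 * h // 4, x:x + 80] = 255     # 80px-wide white rectangle
        yield frame
```

Writing the frames out would use something like `cv2.VideoWriter('./assets/sample.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 15, (640, 480))`, giving the 120-frame, 15 fps file described above. As the task notes, this exercises the pipeline plumbing but not YOLO's detection quality.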
Recommended Agent Profile:
- Category: `quick`
- Reason: Simple file creation/acquisition task
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 3 (with Tasks 9, 10)
- Blocks: Task 12
- Blocked By: Task 1 (needs OpenCV dependency from scaffolding)
References: None needed — standalone task
Acceptance Criteria:
- `./assets/sample.mp4` (or `.avi`) exists
- Video has ≥60 frames
- Playable with `uv run python -c "import cv2; cap=cv2.VideoCapture('./assets/sample.mp4'); print(f'frames={int(cap.get(7))}'); cap.release()"`
QA Scenarios:
Scenario: Sample video is valid
- Tool: Bash
- Steps:
  1. Run via `uv run python`:

```python
import cv2

cap = cv2.VideoCapture('./assets/sample.mp4')
assert cap.isOpened(), 'Cannot open video'
n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
assert n >= 60, f'Too few frames: {n}'
ret, frame = cap.read()
assert ret and frame is not None, 'Cannot read first frame'
h, w = frame.shape[:2]
assert h >= 240 and w >= 320, f'Too small: {w}x{h}'
cap.release()
print(f'SAMPLE_VIDEO_OK frames={n} size={w}x{h}')
```

- Expected Result: Prints 'SAMPLE_VIDEO_OK' with frame count ≥60
- Failure Indicators: Cannot open, too few frames, too small
- Evidence: .sisyphus/evidence/task-11-sample-video.txt

Commit: YES
- Message: `chore(demo): add sample video for smoke testing`
- Files: `assets/sample.mp4` (or add to .gitignore and document)
- Pre-commit: none
12. Integration Tests — End-to-End Smoke Test
What to do:
- Create `tests/demo/test_pipeline.py`
- Integration test: run the full pipeline with sample video, no NATS
  - Uses `subprocess.run()` to invoke `python -m opengait.demo`
  - Captures stdout, parses JSON predictions
  - Asserts: exit code 0, ≥1 prediction, valid JSON schema
- Test graceful exit on end-of-video
- Test `--max-frames` flag: run with max_frames=60, verify it stops
- Test error handling: invalid source path → non-zero exit, error message
- Test error handling: invalid checkpoint path → non-zero exit, error message
- FPS benchmark (informational, not a hard assertion):
- Run pipeline on sample video, measure wall time, compute FPS
- Log FPS to evidence file (target: ≥15 FPS on desktop GPU)
Must NOT do:
- Don't require NATS server for this test — use console publisher
- Don't hardcode CUDA device — use `--device cuda:0` only if CUDA available, else skip
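Since the pipeline may interleave log lines (FPS reports, YOLO messages) with JSONL predictions on stdout, the integration test needs a tolerant parser; a sketch with a hypothetical helper name:

```python
import json


def parse_predictions(stdout: str) -> list[dict]:
    """Extract prediction JSON objects from mixed stdout (log lines + JSONL)."""
    preds = []
    for line in stdout.splitlines():
        line = line.strip()
        if not line.startswith('{'):
            continue  # skip non-JSON log lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # partial or malformed line
        if 'label' in obj:
            preds.append(obj)
    return preds
```

The test can then assert `len(parse_predictions(result.stdout)) >= 1` and validate each object against the schema from Task 6.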
Recommended Agent Profile:
- Category: `deep`
- Reason: Full integration test requiring all components working together, subprocess management, JSON parsing
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 4 (with Task 13)
- Blocks: F1-F4 (Final verification)
- Blocked By: Tasks 9 (pipeline), 11 (sample video)
References:
Pattern References:
- `opengait/demo/__main__.py` (Task 9) — CLI flags to invoke
- `opengait/demo/output.py` (Task 6) — JSON schema to validate
WHY Each Reference Matters:
- `__main__.py`: Need exact CLI flag names for subprocess invocation
- `output.py`: Need JSON schema to assert against
Acceptance Criteria:
- `tests/demo/test_pipeline.py` exists with ≥4 test cases
- `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_pipeline.py -q` passes
- Tests cover: happy path, max-frames, invalid source, invalid checkpoint
QA Scenarios:
Scenario: Full pipeline integration test passes
- Tool: Bash
- Preconditions: All components built, sample video exists, CUDA available
- Steps:
  1. Run `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_pipeline.py -v --timeout=120`
- Expected Result: All tests pass (≥4), exit code 0
- Failure Indicators: Subprocess crash, JSON parse error, timeout
- Evidence: .sisyphus/evidence/task-12-integration.txt

Scenario: FPS benchmark
- Tool: Bash
- Steps:
  1. Run via `CUDA_VISIBLE_DEVICES=0 uv run python`:

```python
import subprocess
import time

start = time.monotonic()
result = subprocess.run(
    ['uv', 'run', 'python', '-m', 'opengait.demo',
     '--source', './assets/sample.mp4',
     '--checkpoint', './ckpt/ScoNet-20000.pt',
     '--device', 'cuda:0'],  # no --nats-url → console publisher
    capture_output=True, text=True, timeout=120)
elapsed = time.monotonic() - start

import cv2
cap = cv2.VideoCapture('./assets/sample.mp4')
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.release()

fps = n_frames / elapsed if elapsed > 0 else 0
print(f'FPS_BENCHMARK frames={n_frames} elapsed={elapsed:.1f}s fps={fps:.1f}')
assert fps >= 5, f'FPS too low: {fps}'  # conservative threshold
```

- Expected Result: Prints FPS benchmark, ≥5 FPS (conservative)
- Failure Indicators: Timeout, crash, FPS < 5
- Evidence: .sisyphus/evidence/task-12-fps-benchmark.txt

Commit: YES
- Message: `test(demo): add integration and end-to-end smoke tests`
- Files: `tests/demo/test_pipeline.py`
- Pre-commit: `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_pipeline.py -q`
13. NATS Integration Test
What to do:
- Create `tests/demo/test_nats.py`
- Test requires NATS server (use Docker: `docker run -d --rm --name nats-test -p 4222:4222 nats:2`)
- Mark tests with `@pytest.mark.skipif` if Docker/NATS not available
- Test flow:
- Start NATS container
- Start a
nats-pysubscriber onscoliosis.result - Run pipeline with
--nats-url nats://127.0.0.1:4222 --max-frames 60 - Collect received messages
- Assert: ≥1 message received, valid JSON, correct schema
- Stop NATS container
- Test NatsPublisher reconnection: start pipeline, stop NATS, restart NATS — pipeline should recover
- JSON schema validation:
  - `frame`: int
  - `track_id`: int
  - `label`: str in {"negative", "neutral", "positive"}
  - `confidence`: float in [0, 1]
  - `window`: int (should equal window_size)
  - `timestamp_ns`: int
Must NOT do:
- Don't leave Docker containers running after test
- Don't hardcode NATS port — use a fixture that finds an open port
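One way to satisfy the "find an open port" requirement: bind to port 0 and let the OS pick an unused one, then hand the result to the Docker `-p {port}:4222` mapping (sketch; `find_free_port` is a hypothetical fixture helper):

```python
import socket


def find_free_port() -> int:
    """Ask the OS for an unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('127.0.0.1', 0))  # port 0 → OS assigns a free port
        return s.getsockname()[1]
```

There is a small race (the port could be claimed between probing and starting the container), but for a test fixture this is the standard trade-off and avoids collisions with a developer's local NATS on 4222.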
Recommended Agent Profile:
- Category: `unspecified-high`
- Reason: Docker orchestration + async NATS subscriber + subprocess pipeline — moderate complexity
- Skills: []
Parallelization:
- Can Run In Parallel: YES
- Parallel Group: Wave 4 (with Task 12)
- Blocks: F1-F4 (Final verification)
- Blocked By: Tasks 9 (pipeline), 6 (NATS publisher)
References:
Pattern References:
- `opengait/demo/output.py` (Task 6) — `NatsPublisher` class, JSON schema
External References:
- nats-py subscriber: `sub = await nc.subscribe('scoliosis.result'); msg = await sub.next_msg(timeout=10)`
- Docker NATS: `docker run -d --rm --name nats-test -p 4222:4222 nats:2`
WHY Each Reference Matters:
- `output.py`: Need to match the exact subject and JSON schema the publisher produces
- nats-py: Need subscriber API to consume and validate messages
Acceptance Criteria:
- `tests/demo/test_nats.py` exists with ≥2 test cases
- Tests are skippable when Docker/NATS not available
- `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_nats.py -q` passes (when Docker available)
QA Scenarios:
Scenario: NATS receives valid prediction JSON
- Tool: Bash
- Preconditions: Docker available, CUDA available, sample video exists
- Steps:
  1. Run `docker run -d --rm --name nats-test -p 4222:4222 nats:2`
  2. Run `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_nats.py -v --timeout=60`
  3. Run `docker stop nats-test`
- Expected Result: Tests pass, at least one valid JSON message received on scoliosis.result
- Failure Indicators: No messages, invalid JSON, schema mismatch, timeout
- Evidence: .sisyphus/evidence/task-13-nats-integration.txt

Scenario: NATS test is skipped when Docker unavailable
- Tool: Bash
- Preconditions: Docker NOT running or not installed
- Steps:
  1. Run `uv run pytest tests/demo/test_nats.py -v -k 'nats' 2>&1 | head -20`
- Expected Result: Tests show as SKIPPED (not FAILED)
- Failure Indicators: Test fails instead of skipping
- Evidence: .sisyphus/evidence/task-13-nats-skip.txt

Commit: YES
- Message: `test(demo): add NATS integration tests`
- Files: `tests/demo/test_nats.py`
- Pre-commit: `uv run pytest tests/demo/test_nats.py -q` (skips if no Docker)
Final Verification Wave (MANDATORY — after ALL implementation tasks)
4 review agents run in PARALLEL. ALL must APPROVE. Rejection → fix → re-run.
- F1. Plan Compliance Audit — `oracle`
  Read the plan end-to-end. For each "Must Have": verify implementation exists (read file, run command). For each "Must NOT Have": search the codebase for forbidden patterns (torch.distributed imports in demo/, BaseModel subclassing). Check evidence files exist in .sisyphus/evidence/. Compare deliverables against the plan.
  Output: `Must Have [N/N] | Must NOT Have [N/N] | Tasks [N/N] | VERDICT: APPROVE/REJECT`
- F2. Code Quality Review — `unspecified-high`
  Run the linter plus `uv run pytest tests/demo/ -q`. Review all new files in `opengait/demo/` for: `as any`/`type: ignore`, empty catches, print statements used instead of logging, commented-out code, unused imports. Check for AI slop: excessive comments, over-abstraction, generic variable names.
  Output: `Tests [N pass/N fail] | Files [N clean/N issues] | VERDICT`
- F3. Real Manual QA — `unspecified-high`
  Start from a clean state. Run the pipeline with the sample video: `uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --max-frames 120`. Verify predictions are printed to the console (no `--nats-url` = console output). Run with NATS: start the container, run the pipeline with `--nats-url nats://127.0.0.1:4222`, subscribe and validate the JSON schema. Test edge cases: missing video file (graceful error), no checkpoint (graceful error), `--help` flag.
  Output: `Scenarios [N/N pass] | Edge Cases [N tested] | VERDICT`
- F4. Scope Fidelity Check — `deep`
  For each task: read "What to do", then read the actual files created. Verify 1:1 — everything in the spec was built (nothing missing), nothing beyond the spec was built (no creep). Check "Must NOT do" compliance: no torch.distributed in demo/, no BaseModel subclass, no TensorRT code, no multi-person logic. Flag unaccounted changes.
  Output: `Tasks [N/N compliant] | Scope [CLEAN/N issues] | VERDICT`
Commit Strategy
- Wave 1: `feat(demo): add ScoNetDemo inference wrapper` — sconet_demo.py
- Wave 1: `feat(demo): add input adapters and silhouette preprocessing` — input.py, preprocess.py
- Wave 1: `chore(demo): scaffold demo package and test infrastructure` — __init__.py, conftest, pyproject.toml
- Wave 2: `feat(demo): add sliding window manager and NATS publisher` — window.py, output.py
- Wave 2: `test(demo): add preprocessing and model unit tests` — test_preprocess.py, test_sconet_demo.py
- Wave 3: `feat(demo): add main pipeline application with CLI` — pipeline.py, __main__.py
- Wave 3: `test(demo): add window manager and single-person policy tests` — test_window.py
- Wave 4: `test(demo): add integration and NATS tests` — test_pipeline.py, test_nats.py
Success Criteria
Verification Commands
```bash
# Smoke test (no NATS)
uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --max-frames 120
# Expected: exits 0, prints ≥3 prediction lines with label in {negative, neutral, positive}

# Unit tests
uv run pytest tests/demo/ -q
# Expected: all tests pass

# Help flag
uv run python -m opengait.demo --help
# Expected: prints usage with --source, --checkpoint, --device, --window, --stride, --nats-url, --max-frames
```
Final Checklist
- All "Must Have" present
- All "Must NOT Have" absent
- All tests pass
- Pipeline runs at ≥15 FPS on desktop GPU
- JSON schema matches spec
- No torch.distributed imports in opengait/demo/