OpenGait/.sisyphus/plans/sconet-pipeline.md
2026-02-27 17:47:55 +08:00

Real-Time Scoliosis Screening Pipeline (ScoNet)

TL;DR

Quick Summary: Build a production-oriented, real-time scoliosis screening pipeline inside the OpenGait repo. The pipeline reads video from cv-mmap shared memory or OpenCV, runs YOLO11n-seg for person detection/segmentation/tracking, extracts 64×44 binary silhouettes, feeds a sliding window of 30 frames into a DDP-free ScoNet wrapper, and publishes classification results (positive/neutral/negative) as JSON over NATS.

Deliverables:

  • ScoNetDemo — standalone nn.Module wrapper for ScoNet inference (no DDP)
  • Silhouette preprocessing module — deterministic mask→64×44 float tensor pipeline
  • Ring buffer / sliding window manager — per-track frame accumulation with reset logic
  • Input adapters — cv-mmap async client + OpenCV VideoCapture fallback
  • NATS publisher — JSON result output
  • Main pipeline application — orchestrates all components
  • pytest test suite — preprocessing, windowing, single-person policy, recovery
  • Sample video for smoke testing

Estimated Effort: Large
Parallel Execution: YES — 4 waves
Critical Path: Task 1 (Scaffolding) → Task 2 (ScoNetDemo) → Task 3 (Silhouette Preprocess) → Task 5 (Ring Buffer) → Task 9 (Pipeline App) → Task 12 (Integration Tests)


Context

Original Request

Build a camera-to-classification pipeline for scoliosis screening using OpenGait's ScoNet model, targeting Jetson AGX Orin deployment, with cv-mmap shared-memory video framework integration.

Interview Summary

Key Discussions:

  • Input: Both camera (via cv-mmap) and video files (OpenCV fallback). Single person only.
  • CV Stack: YOLO11n-seg with built-in ByteTrack (replaces paper's BYTETracker + PP-HumanSeg)
  • Inference: Sliding window of 30 frames, continuous classification
  • Output: JSON over NATS (decided over binary protocol — simpler, cross-language)
  • DDP Bypass: Create ScoNetDemo(nn.Module) following All-in-One-Gait's BaselineDemo pattern
  • Build Location: Inside repo (opengait lacks __init__.py, config system hardcodes paths)
  • Test Strategy: pytest, tests after implementation
  • Hardware: Dev on i7-14700KF + RTX 5070 Ti/3090; deploy on Jetson AGX Orin

Research Findings:

  • ScoNet input: [N, 1, S, 64, 44] float32 in [0, 1]. Output: logits [N, 3, 16]; argmax(mean(-1)) → class index
  • .pkl preprocessing: grayscale → crop vertical → resize h=64 → center-crop w=64 → cut 10px sides → /255.0
  • BaseSilCuttingTransform: cuts int(W // 64) * 10 px each side + divides by 255
  • All-in-One-Gait BaselineDemo: extends nn.Module, uses torch.load() + load_state_dict(), training=False
  • YOLO11n-seg: 6MB, ~50-60 FPS, model.track(frame, persist=True) → bbox + mask + track_id
  • cv-mmap Python client: async for im, meta in CvMmapClient("name") — zero-copy numpy
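
The BaseSilCuttingTransform behaviour noted above is compact enough to sketch directly. A minimal NumPy version, assuming a 64-pixel-wide uint8 silhouette (function name is illustrative):

```python
import numpy as np

def cut_sides(sil: np.ndarray) -> np.ndarray:
    # BaseSilCuttingTransform, as described above: cut int(W // 64) * 10
    # pixels from each side, then scale uint8 [0, 255] to float [0, 1].
    cutting = int(sil.shape[-1] // 64) * 10
    return sil[..., cutting:-cutting].astype(np.float32) / 255.0

# A 64x64 silhouette becomes 64x44 after the cut.
out = cut_sides(np.zeros((64, 64), dtype=np.uint8))
assert out.shape == (64, 44)
```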

Metis Review

Identified Gaps (addressed):

  • Single-person policy undefined → Defined: largest-bbox selection, ignore others, reset window on ID change
  • Sliding window stride undefined → Defined: stride=1 (every frame updates buffer), classify every N frames (configurable)
  • No-detection / empty mask handling → Defined: skip frame, don't reset window unless gap exceeds threshold
  • Mask quality / partial body → Defined: minimum mask area threshold to accept frame
  • Track ID reset / re-identification → Defined: reset ring buffer on track ID change
  • YOLO letterboxing → Defined: use result.masks.data in original frame coords, not letterboxed
  • Async/sync impedance → Defined: synchronous pull-process-publish loop (no async queues in MVP)
  • Scope creep lockdown → Explicitly excluded: TensorRT, DeepStream, multi-person, GUI, recording, auto-tuning

Work Objectives

Core Objective

Create a self-contained scoliosis screening pipeline that runs standalone (no DDP, no OpenGait training infrastructure) while reusing ScoNet's trained weights and matching its exact preprocessing contract.

Prerequisites (already present in repo)

  • Checkpoint: ./ckpt/ScoNet-20000.pt — trained ScoNet weights (verified working, 80.88% accuracy on Scoliosis1K eval). Already exists in the repo, no download needed.
  • Config: ./configs/sconet/sconet_scoliosis1k.yaml — ScoNet architecture config. Already exists.

Concrete Deliverables

  • opengait/demo/sconet_demo.py — ScoNetDemo nn.Module wrapper
  • opengait/demo/preprocess.py — Silhouette extraction and normalization
  • opengait/demo/window.py — Sliding window / ring buffer manager
  • opengait/demo/input.py — Input adapters (cv-mmap + OpenCV)
  • opengait/demo/output.py — NATS JSON publisher
  • opengait/demo/pipeline.py — Main pipeline orchestrator
  • opengait/demo/__main__.py — CLI entry point
  • tests/demo/test_preprocess.py — Preprocessing unit tests
  • tests/demo/test_window.py — Ring buffer + single-person policy tests
  • tests/demo/test_pipeline.py — Integration / smoke tests

Definition of Done

  • uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --max-frames 120 exits 0 and prints predictions (no NATS by default when --nats-url not provided)
  • uv run pytest tests/demo/ -q passes all tests
  • Pipeline processes ≥15 FPS on desktop GPU with 720p input
  • JSON schema validated: {"frame": int, "track_id": int, "label": str, "confidence": float, "window": int, "timestamp_ns": int}
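
For concreteness, a message satisfying this schema (values are illustrative, not from a real run):

```python
import json

# Illustrative payload matching the schema above.
msg = {
    "frame": 123,
    "track_id": 1,
    "label": "neutral",
    "confidence": 0.82,
    "window": 30,
    "timestamp_ns": 1735689600000000000,
}
payload = json.dumps(msg).encode()  # bytes ready to publish over NATS
```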

Must Have

  • Deterministic preprocessing matching ScoNet training data exactly (64×44, float32, [0,1])
  • Single-person selection (largest bbox) with consistent tracking
  • Sliding window of 30 frames with reset on track loss/ID change
  • Graceful handling of: no detection, end of video, cv-mmap disconnect
  • CLI with --source, --checkpoint, --device, --window, --stride, --nats-url, --max-frames flags (using click)
  • Works without NATS server when --nats-url is omitted (console output fallback)
  • All tensor/array function signatures annotated with jaxtyping types (e.g., Float[Tensor, 'batch 1 seq 64 44']) and checked at runtime with beartype via @jaxtyped(typechecker=beartype) decorators
  • Generator-based input adapters — any Iterable[tuple[np.ndarray, dict]] works as a source

Must NOT Have (Guardrails)

  • No DDP: Demo must never import or call anything from torch.distributed
  • No BaseModel subclassing: ScoNetDemo extends nn.Module directly
  • No repo restructuring: Don't touch existing opengait training/eval/data code
  • No TensorRT/DeepStream: Jetson acceleration is out of MVP scope
  • No multi-person: Single tracked person only
  • No GUI/visualization: Output is JSON, not rendered frames
  • No dataset recording/auto-labeling: This is inference only
  • No OpenCV GStreamer builds: Use pip-installed OpenCV
  • No magic preprocessing: Every transform step must be explicit and testable
  • No unbounded buffers: Every queue/buffer has a max size and drop policy

Verification Strategy

ZERO HUMAN INTERVENTION — ALL verification is agent-executed. No exceptions.

Test Decision

  • Infrastructure exists: NO (creating with this plan)
  • Automated tests: Tests after implementation (pytest)
  • Framework: pytest (via uv run pytest)
  • Setup: Add pytest to dev dependencies in pyproject.toml

QA Policy

Every task MUST include agent-executed QA scenarios. Evidence saved to .sisyphus/evidence/task-{N}-{scenario-slug}.{ext}.

  • CLI/Pipeline: Use Bash — run pipeline with sample video, validate output
  • Unit Tests: Use Bash — uv run pytest specific test files
  • NATS Integration: Use Bash — start NATS container, run pipeline, subscribe and validate JSON

Execution Strategy

Parallel Execution Waves

Wave 1 (Foundation — all independent, start immediately):
├── Task 1: Project scaffolding (opengait/demo/, tests/demo/, deps) [quick]
├── Task 2: ScoNetDemo nn.Module wrapper [deep]
├── Task 3: Silhouette preprocessing module [deep]
└── Task 4: Input adapters (cv-mmap + OpenCV) [unspecified-high]

Wave 2 (Core logic — depends on Wave 1 foundations):
├── Task 5: Ring buffer / sliding window manager (depends: 3) [unspecified-high]
├── Task 6: NATS JSON publisher (depends: 1) [quick]
├── Task 7: Unit tests — preprocessing (depends: 3) [unspecified-high]
└── Task 8: Unit tests — ScoNetDemo forward pass (depends: 2) [unspecified-high]

Wave 3 (Integration — combines all components):
├── Task 9: Main pipeline application + CLI (depends: 2,3,4,5,6) [deep]
├── Task 10: Single-person policy tests (depends: 5) [unspecified-high]
└── Task 11: Sample video acquisition (depends: 1) [quick]

Wave 4 (Verification — end-to-end):
├── Task 12: Integration tests + smoke test (depends: 9,11) [deep]
└── Task 13: NATS integration test (depends: 9,6) [unspecified-high]

Wave FINAL (Independent review — 4 parallel):
├── Task F1: Plan compliance audit (oracle)
├── Task F2: Code quality review (unspecified-high)
├── Task F3: Real manual QA (unspecified-high)
└── Task F4: Scope fidelity check (deep)

Critical Path: Task 1 → Task 2 → Task 3 → Task 5 → Task 9 → Task 12 → F1-F4
Parallel Speedup: ~60% faster than sequential
Max Concurrent: 4 (Waves 1 & 2)

Dependency Matrix

Task    Depends On       Blocks     Wave
1       —                6, 11      1
2       —                8, 9       1
3       —                5, 7, 9    1
4       —                9          1
5       3                9, 10      2
6       1                9, 13      2
7       3                —          2
8       2                —          2
9       2, 3, 4, 5, 6    12, 13     3
10      5                —          3
11      1                12         3
12      9, 11            F1-F4      4
13      9, 6             F1-F4      4
F1-F4   12, 13           —          FINAL

Agent Dispatch Summary

  • Wave 1: 4 — T1 → quick, T2 → deep, T3 → deep, T4 → unspecified-high
  • Wave 2: 4 — T5 → unspecified-high, T6 → quick, T7 → unspecified-high, T8 → unspecified-high
  • Wave 3: 3 — T9 → deep, T10 → unspecified-high, T11 → quick
  • Wave 4: 2 — T12 → deep, T13 → unspecified-high
  • FINAL: 4 — F1 → oracle, F2 → unspecified-high, F3 → unspecified-high, F4 → deep

TODOs

Implementation + Test = ONE Task. Never separate. EVERY task MUST have: Recommended Agent Profile + Parallelization info + QA Scenarios.


  • 1. Project Scaffolding + Dependencies

    What to do:

    • Create opengait/demo/__init__.py (empty, makes it a package)
    • Create opengait/demo/__main__.py (stub: from .pipeline import main; main())
    • Create tests/demo/__init__.py and tests/__init__.py if missing
    • Create tests/demo/conftest.py with shared fixtures (sample tensor, mock frame)
    • Add dev dependencies to pyproject.toml: pytest, nats-py, ultralytics, jaxtyping, beartype, click
    • Verify: uv sync --extra torch succeeds with new deps
    • Verify: uv run python -c "import ultralytics; import nats; import jaxtyping; import beartype; import click" works

    Must NOT do:

    • Don't modify existing opengait code or imports
    • Don't add runtime deps that aren't needed (no flask, no fastapi, etc.)

    Recommended Agent Profile:

    • Category: quick
      • Reason: Boilerplate file creation and dependency management, no complex logic
    • Skills: []
    • Skills Evaluated but Omitted:
      • explore: Not needed — we know exactly what files to create

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Tasks 2, 3, 4)
    • Blocks: Tasks 6, 11
    • Blocked By: None (can start immediately)

    References:

    Pattern References:

    • opengait/modeling/models/__init__.py — Example of package init in this repo
    • pyproject.toml — Current dependency structure; add to [project.optional-dependencies] or [dependency-groups]

    External References:

    • ultralytics pip package: pip install ultralytics (includes YOLO + ByteTrack)
    • nats-py: pip install nats-py (async NATS client)

    WHY Each Reference Matters:

    • pyproject.toml: Must match existing dep management style (uv + groups) to avoid breaking uv sync
    • opengait/modeling/models/__init__.py: Shows the repo's package init convention (dynamic imports vs empty)

    Acceptance Criteria:

    • opengait/demo/__init__.py exists
    • opengait/demo/__main__.py exists with stub entry point
    • tests/demo/conftest.py exists with at least one fixture
    • uv sync succeeds without errors
    • uv run python -c "import ultralytics; import nats; import jaxtyping; import beartype; import click; print('OK')" prints OK

    QA Scenarios:

    Scenario: Dependencies install correctly
      Tool: Bash
      Preconditions: Clean uv environment
      Steps:
        1. Run `uv sync --extra torch`
        2. Run `uv run python -c "import ultralytics; import nats; import pytest; import jaxtyping; import beartype; import click; print('ALL_DEPS_OK')"`
      Expected Result: Both commands exit 0, second prints 'ALL_DEPS_OK'
      Failure Indicators: ImportError, uv sync failure, missing package
      Evidence: .sisyphus/evidence/task-1-deps-install.txt
    
    Scenario: Package structure is importable
      Tool: Bash
      Preconditions: uv sync completed
      Steps:
        1. Run `uv run python -c "from opengait.demo import __main__; print('IMPORT_OK')"`
      Expected Result: Prints 'IMPORT_OK' without errors
      Failure Indicators: ModuleNotFoundError, ImportError
      Evidence: .sisyphus/evidence/task-1-import-check.txt
    

    Commit: YES

    • Message: chore(demo): scaffold demo package and test infrastructure
    • Files: opengait/demo/__init__.py, opengait/demo/__main__.py, tests/demo/conftest.py, tests/demo/__init__.py, tests/__init__.py, pyproject.toml
    • Pre-commit: uv sync && uv run python -c "import ultralytics; import nats; import jaxtyping; import beartype; import click"
  • 2. ScoNetDemo — DDP-Free Inference Wrapper

    What to do:

    • Create opengait/demo/sconet_demo.py
    • Class ScoNetDemo(nn.Module) — NOT a BaseModel subclass
    • Constructor takes cfg_path: str and checkpoint_path: str
    • Use config_loader from opengait/utils/common.py to parse YAML config
    • Build the ScoNet architecture layers directly:
      • Backbone (ResNet9 from opengait/modeling/backbones/resnet.py)
      • TemporalPool (from opengait/modeling/modules.py)
      • HorizontalPoolingPyramid (from opengait/modeling/modules.py)
      • SeparateFCs (from opengait/modeling/modules.py)
      • SeparateBNNecks (from opengait/modeling/modules.py)
    • Load checkpoint: torch.load(checkpoint_path, map_location=device) → extract state_dict → load_state_dict()
    • Handle checkpoint format: may be {'model': state_dict, ...} or plain state_dict
    • Strip module. prefix from DDP-wrapped keys if present
    • All public methods decorated with @jaxtyped(typechecker=beartype) for runtime shape checking
    • forward(sils: Float[Tensor, 'batch 1 seq 64 44']) -> dict where seq=30 (window size)
      • Use jaxtyping: from jaxtyping import Float, Int, jaxtyped
      • Use beartype: from beartype import beartype
    • Returns {'logits': Float[Tensor, 'batch 3 16'], 'label': int, 'confidence': float}
    • predict(sils: Float[Tensor, 'batch 1 seq 64 44']) -> tuple[str, float] convenience method: returns ('positive'|'neutral'|'negative', confidence)
    • Prediction logic: argmax(logits.mean(dim=-1), dim=-1) → index → label string
    • Confidence: softmax(logits.mean(dim=-1)).max() — probability of chosen class
    • Class mapping: {0: 'negative', 1: 'neutral', 2: 'positive'}
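
    Two of these steps are easy to get subtly wrong. A minimal sketch of the checkpoint unwrapping and the prediction head (NumPy stands in for torch here; the real code operates on tensors):

    ```python
    import numpy as np

    LABELS = {0: 'negative', 1: 'neutral', 2: 'positive'}

    def extract_state_dict(ckpt: dict) -> dict:
        # Accept either {'model': state_dict, ...} or a plain state_dict,
        # and strip the DDP 'module.' prefix if present.
        sd = ckpt.get('model', ckpt)
        return {k.removeprefix('module.'): v for k, v in sd.items()}

    def predict_from_logits(logits: np.ndarray) -> tuple[str, float]:
        # logits: [N, 3, 16] -> mean over the 16 parts, softmax, argmax.
        part_mean = logits.mean(axis=-1)                        # [N, 3]
        e = np.exp(part_mean - part_mean.max(axis=-1, keepdims=True))
        probs = e / e.sum(axis=-1, keepdims=True)               # softmax
        idx = int(probs[0].argmax())
        return LABELS[idx], float(probs[0, idx])
    ```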

    Must NOT do:

    • Do NOT import anything from torch.distributed
    • Do NOT subclass BaseModel
    • Do NOT use ddp_all_gather or get_ddp_module
    • Do NOT modify sconet.py or any existing model file

    Recommended Agent Profile:

    • Category: deep
      • Reason: Requires understanding ScoNet's architecture, checkpoint format, and careful weight loading — high complexity, correctness-critical
    • Skills: []
    • Skills Evaluated but Omitted:
      • explore: Agent should read referenced files directly, not search broadly

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Tasks 2, 3, 4)
    • Blocks: Tasks 8, 9
    • Blocked By: None (can start immediately)

    References:

    Pattern References:

    • opengait/modeling/models/sconet.py — ScoNet model definition. Study __init__ to see which submodules are built and how forward() assembles the pipeline. Lines ~10-54.
    • opengait/modeling/base_model.py — BaseModel class. Study __init__ (lines ~30-80) to see how it builds backbone/heads from config. Replicate the build logic WITHOUT DDP calls.
    • All-in-One-Gait BaselineDemo pattern: extends nn.Module directly, uses torch.load() + load_state_dict() with training=False

    API/Type References:

    • opengait/modeling/backbones/resnet.py — ResNet9 backbone class. Constructor signature and forward signature.
    • opengait/modeling/modules.py — TemporalPool, HorizontalPoolingPyramid, SeparateFCs, SeparateBNNecks classes. Constructor args come from config YAML.
    • opengait/utils/common.py::config_loader — Loads YAML config, merges with default.yaml. Returns dict.

    Config References:

    • configs/sconet/sconet_scoliosis1k.yaml — ScoNet config specifying backbone, head, loss params. The model_cfg section defines architecture hyperparams.
    • configs/default.yaml — Default config merged by config_loader

    Checkpoint Reference:

    • ./ckpt/ScoNet-20000.pt — Trained ScoNet checkpoint. Verify format: torch.load() and inspect keys.

    Inference Logic Reference:

    • opengait/evaluation/evaluator.py:evaluate_scoliosis() (line ~418) — Shows argmax(logits.mean(-1)) prediction logic and label mapping

    WHY Each Reference Matters:

    • sconet.py: Defines the exact forward pass we must replicate — TP → backbone → HPP → FCs → BNNecks
    • base_model.py: Shows how config is parsed into submodule instantiation — we copy this logic minus DDP
    • modules.py: Constructor signatures tell us what config keys to extract
    • evaluator.py: The prediction aggregation (mean over parts, argmax) is the canonical inference logic
    • sconet_scoliosis1k.yaml: Contains the exact hyperparams (channels, num_parts, etc.) for building layers

    Acceptance Criteria:

    • opengait/demo/sconet_demo.py exists with ScoNetDemo(nn.Module) class
    • No torch.distributed imports in the file
    • ScoNetDemo does not inherit from BaseModel
    • uv run python -c "from opengait.demo.sconet_demo import ScoNetDemo; print('OK')" works

    QA Scenarios:

    Scenario: ScoNetDemo loads checkpoint and produces correct output shape
      Tool: Bash
      Preconditions: ./ckpt/ScoNet-20000.pt exists, CUDA available
      Steps:
        1. Run the following snippet via `uv run python`:
           ```python
           import torch
           from opengait.demo.sconet_demo import ScoNetDemo
           model = ScoNetDemo('./configs/sconet/sconet_scoliosis1k.yaml', './ckpt/ScoNet-20000.pt', device='cuda:0')
           model.eval()
           dummy = torch.rand(1, 1, 30, 64, 44, device='cuda:0')
           with torch.no_grad():
               result = model(dummy)
           assert result['logits'].shape == (1, 3, 16), f'Bad shape: {result["logits"].shape}'
           label, conf = model.predict(dummy)
           assert label in ('negative', 'neutral', 'positive'), f'Bad label: {label}'
           assert 0.0 <= conf <= 1.0, f'Bad confidence: {conf}'
           print(f'SCONET_OK label={label} conf={conf:.3f}')
           ```
      Expected Result: Prints 'SCONET_OK label=... conf=...' with valid label and confidence
      Failure Indicators: ImportError, state_dict key mismatch, shape error, CUDA error
      Evidence: .sisyphus/evidence/task-2-sconet-forward.txt
    
    Scenario: ScoNetDemo rejects DDP-wrapped usage
      Tool: Bash
      Preconditions: File exists
      Steps:
        1. Run `grep -c 'torch.distributed' opengait/demo/sconet_demo.py`
        2. Run `grep -c 'BaseModel' opengait/demo/sconet_demo.py`
      Expected Result: Both commands output '0'
      Failure Indicators: Any count > 0
      Evidence: .sisyphus/evidence/task-2-no-ddp.txt
    

    Commit: YES

    • Message: feat(demo): add ScoNetDemo DDP-free inference wrapper
    • Files: opengait/demo/sconet_demo.py
    • Pre-commit: uv run python -c "from opengait.demo.sconet_demo import ScoNetDemo"
  • 3. Silhouette Preprocessing Module

    What to do:

    • Create opengait/demo/preprocess.py
    • All public functions decorated with @jaxtyped(typechecker=beartype) for runtime shape checking
    • Function mask_to_silhouette(mask: UInt8[ndarray, 'h w'], bbox: tuple[int,int,int,int]) -> Float[ndarray, '64 44'] | None:
      • Uses jaxtyping: from jaxtyping import Float, UInt8, jaxtyped and from numpy import ndarray
      • Input: binary mask (H, W) uint8 from YOLO, bounding box (x1, y1, x2, y2)
      • Crop mask to bbox region
      • Find vertical extent of foreground pixels (top/bottom rows with nonzero)
      • Crop to tight vertical bounding box (remove empty rows above/below)
      • Resize height to 64, maintaining aspect ratio
      • Center-crop or center-pad width to 64
      • Cut 10px from each side → final 64×44
      • Return float32 array [0.0, 1.0] (divide by 255)
      • Return None if mask area below MIN_MASK_AREA threshold (default: 500 pixels)
    • Function frame_to_person_mask(result, min_area: int = 500) -> tuple[UInt8[ndarray, 'h w'], tuple[int,int,int,int]] | None:
      • Extract single-person mask + bbox from YOLO result object
      • Uses result.masks.data and result.boxes.xyxy
      • Returns None if no valid detection
    • Constants: SIL_HEIGHT = 64, SIL_WIDTH = 44, SIL_FULL_WIDTH = 64, SIDE_CUT = 10, MIN_MASK_AREA = 500
    • Each step must match the preprocessing in datasets/pretreatment.py (grayscale → crop → resize → center) and BaseSilCuttingTransform (cut sides → /255)
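
    The center-crop/center-pad width step is where off-by-one padding bugs tend to hide. A NumPy sketch of the rule described above (helper name `center_width` is illustrative):

    ```python
    import numpy as np

    def center_width(sil: np.ndarray, target: int = 64) -> np.ndarray:
        # Center-crop if wider than `target`, center-pad with zeros if
        # narrower; height is assumed to already be 64.
        h, w = sil.shape
        if w >= target:
            left = (w - target) // 2
            return sil[:, left:left + target]
        pad = target - w
        left = pad // 2
        return np.pad(sil, ((0, 0), (left, pad - left)))
    ```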

    Must NOT do:

    • Don't import or modify datasets/pretreatment.py
    • Don't add color/texture features — binary silhouettes only
    • Don't resize to arbitrary sizes — must be exactly 64×44 output

    Recommended Agent Profile:

    • Category: deep
      • Reason: Correctness-critical — must exactly match training data preprocessing. Subtle pixel-level bugs will silently degrade accuracy.
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Tasks 1, 2, 4)
    • Blocks: Tasks 5, 7, 9
    • Blocked By: None

    References:

    Pattern References:

    • datasets/pretreatment.py:18-96 (function imgs2pickle) — The canonical preprocessing pipeline. Study lines 45-80 carefully: cv2.imread(GRAYSCALE) → find contours → crop to person bbox → cv2.resize(img, (int(64 * ratio), 64)) → center-crop width. This is the EXACT sequence to replicate for live masks.
    • opengait/data/transform.py:46-58 (BaseSilCuttingTransform) — The runtime transform applied during training/eval. cutting = int(w // 64) * 10 then slices [:, :, cutting:-cutting] then divides by 255.0. For w=64 input, cutting=10, output width=44.

    API/Type References:

    • Ultralytics Results object: result.masks.data → Tensor[N, H, W] binary masks; result.boxes.xyxy → Tensor[N, 4] bounding boxes; result.boxes.id → track IDs (may be None)

    WHY Each Reference Matters:

    • pretreatment.py: Defines the ground truth preprocessing. If our live pipeline differs by even 1 pixel of padding, ScoNet accuracy degrades.
    • BaseSilCuttingTransform: The 10px side cut + /255 is applied at runtime during training. We must apply the SAME transform.
    • Ultralytics masks: Need to know exact API to extract binary masks from YOLO output

    Acceptance Criteria:

    • opengait/demo/preprocess.py exists
    • mask_to_silhouette() returns np.ndarray of shape (64, 44) dtype float32 with values in [0, 1]
    • Returns None for masks below MIN_MASK_AREA

    QA Scenarios:

    Scenario: Preprocessing produces correct output shape and range
      Tool: Bash
      Preconditions: Module importable
      Steps:
        1. Run the following snippet via `uv run python`:
           ```python
           import numpy as np
           from opengait.demo.preprocess import mask_to_silhouette
           # Create a synthetic mask: 200x100 person-shaped blob
           mask = np.zeros((480, 640), dtype=np.uint8)
           mask[100:400, 250:400] = 255  # person region
           bbox = (250, 100, 400, 400)
           sil = mask_to_silhouette(mask, bbox)
           assert sil is not None, 'Should not be None for valid mask'
           assert sil.shape == (64, 44), f'Bad shape: {sil.shape}'
           assert sil.dtype == np.float32, f'Bad dtype: {sil.dtype}'
           assert 0.0 <= sil.min() and sil.max() <= 1.0, f'Bad range: [{sil.min()}, {sil.max()}]'
           assert sil.max() > 0, 'Should have nonzero pixels'
           print('PREPROCESS_OK')
           ```
      Expected Result: Prints 'PREPROCESS_OK'
      Failure Indicators: Shape mismatch, dtype error, range error
      Evidence: .sisyphus/evidence/task-3-preprocess-shape.txt
    
    Scenario: Small masks are rejected
      Tool: Bash
      Preconditions: Module importable
      Steps:
        1. Run the following snippet via `uv run python`:
           ```python
           import numpy as np
           from opengait.demo.preprocess import mask_to_silhouette
           # Tiny mask: only 10x10 = 100 pixels (below MIN_MASK_AREA=500)
           mask = np.zeros((480, 640), dtype=np.uint8)
           mask[100:110, 100:110] = 255
           bbox = (100, 100, 110, 110)
           sil = mask_to_silhouette(mask, bbox)
           assert sil is None, f'Should be None for tiny mask, got {type(sil)}'
           print('SMALL_MASK_REJECTED_OK')
           ```
      Expected Result: Prints 'SMALL_MASK_REJECTED_OK'
      Failure Indicators: Returns non-None for tiny mask
      Evidence: .sisyphus/evidence/task-3-small-mask-reject.txt
    

    Commit: YES

    • Message: feat(demo): add silhouette preprocessing module
    • Files: opengait/demo/preprocess.py
    • Pre-commit: uv run python -c "from opengait.demo.preprocess import mask_to_silhouette"
  • 4. Input Adapters (cv-mmap + OpenCV)

    What to do:

    • Create opengait/demo/input.py
    • The pipeline contract is simple: it consumes any Iterable[tuple[np.ndarray, dict]] — any generator or iterator that yields (frame_bgr_uint8, metadata_dict) works
    • Type alias: FrameStream = Iterable[tuple[np.ndarray, dict]]
    • Generator function opencv_source(path: str | int, max_frames: int | None = None) -> Generator[tuple[np.ndarray, dict], None, None]:
      • path can be video file path or camera index (int)
      • Opens cv2.VideoCapture(path)
      • Yields (frame, {'frame_count': int, 'timestamp_ns': int}) tuples
      • Handles end-of-video gracefully (just returns)
      • Handles camera disconnect (log warning, return)
      • Respects max_frames limit
    • Generator function cvmmap_source(name: str, max_frames: int | None = None) -> Generator[tuple[np.ndarray, dict], None, None]:
      • Wraps CvMmapClient from /home/crosstyan/Code/cv-mmap/client/cvmmap/
      • Since cv-mmap is async (anyio), this adapter must bridge async→sync:
        • Run anyio event loop in a background thread, drain frames via queue.Queue
        • Or use anyio.from_thread / asyncio.run() with async for internally
        • Choose the simplest correct approach
      • Yields same (frame, metadata_dict) tuple format as opencv_source
      • Handles cv-mmap disconnect/offline events gracefully
      • Import should be conditional (try/except ImportError) so pipeline works without cv-mmap installed
    • Factory function create_source(source: str, max_frames: int | None = None) -> FrameStream:
      • If source starts with cvmmap:// → cvmmap_source(name)
      • If source is a digit string → opencv_source(int(source)) (camera index)
      • Otherwise → opencv_source(source) (file path)
    • The key design point: any user-written generator that yields (np.ndarray, dict) plugs in directly — no class inheritance needed
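
    One workable shape for the async→sync bridge (a sketch using stdlib asyncio and a bounded queue; the real adapter would wrap CvMmapClient's async iterator, and `bridge_async_frames` is an illustrative name):

    ```python
    import asyncio
    import queue
    import threading

    def bridge_async_frames(async_gen_factory, maxsize: int = 4):
        """Drain an async iterator from a background thread into a plain
        generator. The bounded queue enforces the no-unbounded-buffer rule."""
        q = queue.Queue(maxsize=maxsize)
        done = object()  # sentinel marking end of stream

        def runner():
            async def drain():
                async for item in async_gen_factory():
                    q.put(item)  # blocks when full, back-pressuring the source
            try:
                asyncio.run(drain())
            finally:
                q.put(done)

        threading.Thread(target=runner, daemon=True).start()
        while (item := q.get()) is not done:
            yield item
    ```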

    Must NOT do:

    • Don't build GStreamer pipelines
    • Don't add async to the main pipeline loop — keep synchronous pull model
    • Don't use abstract base classes or heavy OOP — plain generator functions are the interface
    • Don't buffer frames internally (no unbounded queue between source and consumer)

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Integration with external library (cv-mmap) requires careful async→sync bridging
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Tasks 1, 2, 3)
    • Blocks: Task 9
    • Blocked By: None

    References:

    Pattern References:

    • /home/crosstyan/Code/cv-mmap/client/cvmmap/__init__.py — CvMmapClient class. Async iterator: async for im, meta in client. Understand the __aiter__/__anext__ protocol.
    • /home/crosstyan/Code/cv-mmap/client/test_cvmmap.py — Example consumer pattern using anyio.run()
    • /home/crosstyan/Code/cv-mmap/client/cvmmap/msg.py — FrameMetadata and FrameInfo dataclasses. Fields: frame_count, timestamp_ns, info.width, info.height, info.pixel_format

    API/Type References:

    • cv2.VideoCapture — OpenCV video capture. cap.read() returns (bool, np.ndarray). cap.get(cv2.CAP_PROP_FRAME_COUNT) for total frames.

    WHY Each Reference Matters:

    • CvMmapClient: The async iterator yields (numpy_array, FrameMetadata) — need to know exact types for sync bridging
    • msg.py: Metadata fields must be mapped to our generic dict metadata format
    • test_cvmmap.py: Shows the canonical consumer pattern we must wrap

    Acceptance Criteria:

    • opengait/demo/input.py exists with opencv_source, cvmmap_source, create_source as functions (not classes)
    • create_source('./some/video.mp4') returns a generator/iterable
    • create_source('cvmmap://default') returns a generator (or raises if cv-mmap not installed)
    • create_source('0') returns a generator for camera index 0
    • Any custom generator def my_source(): yield (frame, meta) can be used directly by the pipeline

    QA Scenarios:

    Scenario: opencv_source reads frames from a video file
      Tool: Bash
      Preconditions: Any .mp4 or .avi file available (use sample video or create synthetic one)
      Steps:
        1. Create a short test video if none exists:
           `uv run python -c "import cv2; import numpy as np; w=cv2.VideoWriter('/tmp/test.avi', cv2.VideoWriter_fourcc(*'MJPG'), 15, (640,480)); [w.write(np.random.randint(0,255,(480,640,3),dtype=np.uint8)) for _ in range(30)]; w.release(); print('VIDEO_CREATED')"`
        2. Run the following snippet via `uv run python`:
           ```python
           from opengait.demo.input import create_source
           src = create_source('/tmp/test.avi', max_frames=10)
           count = 0
           for frame, meta in src:
               assert frame.shape[2] == 3, f'Not BGR: {frame.shape}'
               assert 'frame_count' in meta
               count += 1
           assert count == 10, f'Expected 10 frames, got {count}'
           print('OPENCV_SOURCE_OK')
           ```
      Expected Result: Prints 'OPENCV_SOURCE_OK'
      Failure Indicators: Shape error, missing metadata, wrong frame count
      Evidence: .sisyphus/evidence/task-4-opencv-source.txt
    
    Scenario: Custom generator works as pipeline input
      Tool: Bash
      Steps:
        1. Run the following snippet via `uv run python`:
           ```python
           import numpy as np
           from opengait.demo.input import FrameStream
           import typing
           # Any generator works — no class needed
           def my_source():
               for i in range(5):
                   yield np.zeros((480, 640, 3), dtype=np.uint8), {'frame_count': i}
           src = my_source()
           frames = list(src)
           assert len(frames) == 5
           print('CUSTOM_GENERATOR_OK')
           ```
      Expected Result: Prints 'CUSTOM_GENERATOR_OK'
      Failure Indicators: Type error, protocol mismatch
      Evidence: .sisyphus/evidence/task-4-custom-gen.txt
    

    Commit: YES

    • Message: feat(demo): add generator-based input adapters for cv-mmap and OpenCV
    • Files: opengait/demo/input.py
    • Pre-commit: uv run python -c "from opengait.demo.input import create_source"
  • 5. Sliding Window / Ring Buffer Manager

    What to do:

    • Create opengait/demo/window.py
    • Class SilhouetteWindow:
      • Constructor: __init__(self, window_size: int = 30, stride: int = 1, gap_threshold: int = 15)
      • Internal storage: collections.deque(maxlen=window_size) of np.ndarray (64×44 float32)
      • push(sil: np.ndarray, frame_idx: int, track_id: int) -> None:
        • If track_id differs from current tracked ID → reset buffer, update tracked ID
        • If frame_idx - last_frame_idx > gap_threshold → reset buffer (too many missed frames)
        • Append silhouette to deque
        • Increment internal frame counter
      • is_ready() -> bool: returns len(buffer) == window_size
      • should_classify() -> bool: returns is_ready() and (frames_since_last_classify >= stride)
      • get_tensor(device: str = 'cpu') -> torch.Tensor:
        • Stack buffer into np.array shape [window_size, 64, 44]
        • Convert to torch.Tensor shape [1, 1, window_size, 64, 44] on device
        • This is the exact input shape for ScoNetDemo
      • reset() -> None: clear buffer and counters
      • mark_classified() -> None: reset frames_since_last_classify counter
      • Properties: current_track_id, frame_count, fill_level (len/window_size as float)
    • Single-person selection policy (function or small helper):
      • select_person(results) -> tuple[np.ndarray, tuple, int] | None
      • From YOLO results, select the detection with the largest bounding box area
      • Return (mask, bbox, track_id) or None if no valid detection
      • If result.boxes.id is None (tracker not yet initialized), skip frame
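    The reset policy and largest-box selection above can be sketched as follows. This is a minimal, numpy-only sketch of the buffer logic (the real `get_tensor()` additionally stacks the deque and wraps it in a torch tensor of shape `[1, 1, T, 64, 44]`; `select_largest` is a hypothetical helper name standing in for the bbox-area part of `select_person`):

    ```python
    from collections import deque

    import numpy as np


    class SilhouetteWindow:
        """Sketch of the push/reset policy; fill_level and the torch
        conversion in get_tensor() are omitted for brevity."""

        def __init__(self, window_size: int = 30, stride: int = 1, gap_threshold: int = 15):
            self.window_size = window_size
            self.stride = stride
            self.gap_threshold = gap_threshold
            self._buf = deque(maxlen=window_size)  # bounded by construction
            self.current_track_id = None
            self._last_frame_idx = None
            self._since_classify = 0

        def push(self, sil: np.ndarray, frame_idx: int, track_id: int) -> None:
            # New person, or too many missed frames -> start the window over.
            if track_id != self.current_track_id:
                self.reset()
            elif self._last_frame_idx is not None and frame_idx - self._last_frame_idx > self.gap_threshold:
                self.reset()
            self.current_track_id = track_id
            self._buf.append(sil)
            self._last_frame_idx = frame_idx
            self._since_classify += 1

        def is_ready(self) -> bool:
            return len(self._buf) == self.window_size

        def should_classify(self) -> bool:
            return self.is_ready() and self._since_classify >= self.stride

        def mark_classified(self) -> None:
            self._since_classify = 0

        def reset(self) -> None:
            self._buf.clear()
            self.current_track_id = None
            self._last_frame_idx = None
            self._since_classify = 0

        @property
        def frame_count(self) -> int:
            return len(self._buf)


    def select_largest(boxes_xyxy: np.ndarray) -> int:
        """Index of the largest-area box; the real select_person also
        returns the matching mask and track id, and skips frames where
        result.boxes.id is None."""
        areas = (boxes_xyxy[:, 2] - boxes_xyxy[:, 0]) * (boxes_xyxy[:, 3] - boxes_xyxy[:, 1])
        return int(np.argmax(areas))
    ```

    Note how both reset triggers (ID change, frame gap) funnel through a single `reset()`, which keeps the counters consistent with the buffer contents.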

    Must NOT do:

    • No unbounded buffers — deque with maxlen enforces this
    • No multi-person tracking — single person only, select largest bbox
    • No time-based windowing — frame-count based only

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Data structure + policy logic with edge cases (ID changes, gaps, resets)
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Tasks 6, 7, 8)
    • Blocks: Tasks 9, 10
    • Blocked By: Task 3 (needs silhouette shape constants from preprocess.py)

    References:

    Pattern References:

    • opengait/demo/preprocess.py (Task 3) — SIL_HEIGHT, SIL_WIDTH constants. The window stores arrays of this shape.
    • opengait/data/dataset.py — Shows how OpenGait's DataSet samples fixed-length sequences. The seqL parameter controls sequence length (our window_size=30).

    API/Type References:

    • Ultralytics Results.boxes.id — Track IDs tensor, may be None if tracker hasn't assigned IDs yet
    • Ultralytics Results.boxes.xyxy — Bounding boxes [N, 4] for area calculation
    • Ultralytics Results.masks.data — Binary masks [N, H, W]

    WHY Each Reference Matters:

    • preprocess.py: Window must store silhouettes of the exact shape produced by preprocessing
    • dataset.py: Understanding how training samples sequences helps ensure our window matches
    • Ultralytics API: Need to handle None track IDs and extract correct tensors

    Acceptance Criteria:

    • opengait/demo/window.py exists with SilhouetteWindow class and select_person function
    • Buffer is bounded (deque with maxlen)
    • get_tensor() returns shape [1, 1, 30, 64, 44] when full
    • Track ID change triggers reset
    • Gap exceeding threshold triggers reset

    QA Scenarios:

    Scenario: Window fills and produces correct tensor shape
      Tool: Bash
      Preconditions: Module importable
      Steps:
        1. Run `uv run python -c "`
           ```python
           import numpy as np
           from opengait.demo.window import SilhouetteWindow
           win = SilhouetteWindow(window_size=30, stride=1)
           for i in range(30):
               sil = np.random.rand(64, 44).astype(np.float32)
               win.push(sil, frame_idx=i, track_id=1)
           assert win.is_ready(), 'Window should be ready after 30 frames'
           t = win.get_tensor()
           assert t.shape == (1, 1, 30, 64, 44), f'Bad shape: {t.shape}'
           assert t.dtype.is_floating_point, f'Bad dtype: {t.dtype}'
           print('WINDOW_FILL_OK')
           ```
      Expected Result: Prints 'WINDOW_FILL_OK'
      Failure Indicators: Shape mismatch, not ready after 30 pushes
      Evidence: .sisyphus/evidence/task-5-window-fill.txt
    
    Scenario: Track ID change resets buffer
      Tool: Bash
      Steps:
        1. Run `uv run python -c "`
           ```python
           import numpy as np
           from opengait.demo.window import SilhouetteWindow
           win = SilhouetteWindow(window_size=30)
           for i in range(20):
               win.push(np.random.rand(64, 44).astype(np.float32), frame_idx=i, track_id=1)
           assert win.frame_count == 20
           # Switch track ID — should reset
           win.push(np.random.rand(64, 44).astype(np.float32), frame_idx=20, track_id=2)
           assert win.frame_count == 1, f'Should reset to 1, got {win.frame_count}'
           assert win.current_track_id == 2
           print('TRACK_RESET_OK')
           ```
      Expected Result: Prints 'TRACK_RESET_OK'
      Failure Indicators: Buffer not reset, wrong track ID
      Evidence: .sisyphus/evidence/task-5-track-reset.txt
    

    Commit: YES

    • Message: feat(demo): add sliding window manager with single-person selection
    • Files: opengait/demo/window.py
    • Pre-commit: uv run python -c "from opengait.demo.window import SilhouetteWindow"
  • 6. NATS JSON Publisher

    What to do:

    • Create opengait/demo/output.py
    • Class ResultPublisher(Protocol) — any object with publish(result: dict) -> None
    • Function console_publisher() -> Generator or simple class ConsolePublisher:
      • Prints JSON to stdout (default when --nats-url is not provided)
      • Format: one JSON object per line (JSONL)
    • Class NatsPublisher:
      • Constructor: __init__(self, url: str = 'nats://127.0.0.1:4222', subject: str = 'scoliosis.result')
      • Uses nats-py async client, bridged to sync publish() method
      • Handles connection failures gracefully (log warning, retry with backoff, don't crash pipeline)
      • Handles reconnection automatically (nats-py does this by default)
      • publish(result: dict) -> None: serializes to JSON, publishes to subject
      • close() -> None: drain and close NATS connection
      • Context manager support (__enter__/__exit__)
    • JSON schema for results:
      {
        "frame": 1234,
        "track_id": 1,
        "label": "positive",
        "confidence": 0.82,
        "window": 30,
        "timestamp_ns": 1234567890000
      }
      
    • Factory: create_publisher(nats_url: str | None, subject: str = 'scoliosis.result') -> ResultPublisher
      • If nats_url is None → ConsolePublisher
      • Otherwise → NatsPublisher(url, subject)
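    The async-to-sync bridge is the only non-obvious part of `NatsPublisher`. One way to do it is a background event loop on a daemon thread, shown here with a generic helper (the `nats.connect` / `nc.publish` calls in the comment follow nats-py's documented async API; `AsyncBridge` itself is a hypothetical name for this sketch):

    ```python
    import asyncio
    import threading


    class AsyncBridge:
        """Run an asyncio loop on a daemon thread so the synchronous
        pipeline can drive async coroutines (e.g. nats-py's publish)."""

        def __init__(self):
            self._loop = asyncio.new_event_loop()
            self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
            self._thread.start()

        def call(self, coro, timeout: float = 5.0):
            # Blocks the caller until the coroutine finishes on the loop thread.
            return asyncio.run_coroutine_threadsafe(coro, self._loop).result(timeout)

        def close(self):
            self._loop.call_soon_threadsafe(self._loop.stop)
            self._thread.join(timeout=1.0)


    # The real NatsPublisher would hold a bridge and do roughly:
    #   self._nc = bridge.call(nats.connect(url))                     # async connect
    #   bridge.call(self._nc.publish(subject, json.dumps(r).encode()))  # sync publish()
    ```

    Connection failures should be caught around `bridge.call(...)` and logged rather than propagated, per the graceful-degradation requirement above.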

    Must NOT do:

    • Don't use JetStream (plain NATS PUB/SUB is sufficient)
    • Don't build custom binary protocol
    • Don't buffer/batch results — publish immediately

    Recommended Agent Profile:

    • Category: quick
      • Reason: Straightforward JSON serialization + NATS client wrapper, well-documented library
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Tasks 5, 7, 8)
    • Blocks: Tasks 9, 13
    • Blocked By: Task 1 (needs project scaffolding for nats-py dependency)

    References:

    External References:

    • nats-py docs: import nats; nc = await nats.connect(); await nc.publish(subject, data) — async API
    • /home/crosstyan/Code/cv-mmap-gui/ — Uses NATS.c for messaging; our Python publisher sends to the same broker

    WHY Each Reference Matters:

    • nats-py: Need to bridge async NATS client to sync publish() call
    • cv-mmap-gui: Confirms NATS is the right transport for this ecosystem

    Acceptance Criteria:

    • opengait/demo/output.py exists with ConsolePublisher, NatsPublisher, create_publisher
    • ConsolePublisher prints valid JSON to stdout
    • NatsPublisher connects and publishes without crashing (when NATS available)
    • NatsPublisher logs warning and doesn't crash when NATS unavailable

    QA Scenarios:

    Scenario: ConsolePublisher outputs valid JSONL
      Tool: Bash
      Steps:
        1. Run `uv run python -c "`
           ```python
           import json, io, sys
           from opengait.demo.output import create_publisher
           pub = create_publisher(nats_url=None)
           result = {'frame': 100, 'track_id': 1, 'label': 'positive', 'confidence': 0.85, 'window': 30, 'timestamp_ns': 0}
           pub.publish(result)  # should print to stdout
           print('CONSOLE_PUB_OK')
           ```
      Expected Result: Prints a JSON line followed by 'CONSOLE_PUB_OK'
      Failure Indicators: Invalid JSON, missing fields, crash
      Evidence: .sisyphus/evidence/task-6-console-pub.txt
    
    Scenario: NatsPublisher handles missing server gracefully
      Tool: Bash
      Steps:
        1. Run `uv run python -c "`
           ```python
           from opengait.demo.output import create_publisher
           try:
               pub = create_publisher(nats_url='nats://127.0.0.1:14222')  # wrong port, no server
               pub.publish({'frame': 0, 'label': 'test'})
           except SystemExit:
               print('SHOULD_NOT_EXIT')
               raise
           print('NATS_GRACEFUL_OK')
           ```
      Expected Result: Prints 'NATS_GRACEFUL_OK' (warning logged, no crash)
      Failure Indicators: Unhandled exception, SystemExit, hang
      Evidence: .sisyphus/evidence/task-6-nats-graceful.txt
    

    Commit: YES

    • Message: feat(demo): add NATS JSON publisher and console fallback
    • Files: opengait/demo/output.py
    • Pre-commit: uv run python -c "from opengait.demo.output import create_publisher"
  • 7. Unit Tests — Silhouette Preprocessing

    What to do:

    • Create tests/demo/test_preprocess.py
    • Test mask_to_silhouette() with:
      • Valid mask (person-shaped blob) → output shape (64, 44), dtype float32, range [0, 1]
      • Tiny mask below MIN_MASK_AREA → returns None
      • Empty mask (all zeros) → returns None
      • Full-frame mask (all 255) → produces valid output (edge case: very wide person)
      • Tall narrow mask → verify aspect ratio handling (resize h=64, center-crop width)
      • Wide short mask → verify handling (should still produce 64×44)
    • Test determinism: same input always produces same output
    • Test against a reference .pkl sample if available:
      • Load a known .pkl file from Scoliosis1K
      • Extract one frame
      • Compare our preprocessing output to the stored frame (should be close/identical)
    • Verify jaxtyping annotations are present and beartype checks fire on wrong shapes
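    For reference while writing the tests, the "center on x-mean, cut to 44 columns, scale to [0, 1]" behavior can be sketched like this. It assumes the mask has already been resized to height 64 (the real module does that resize with cv2) and mirrors, not replaces, the pretreatment.py + BaseSilCuttingTransform contract; `center_cut` is a hypothetical name:

    ```python
    import numpy as np

    SIL_HEIGHT, SIL_WIDTH = 64, 44


    def center_cut(mask64: np.ndarray) -> np.ndarray | None:
        """Center a height-64 binary mask on its x centroid and cut to 44
        columns, returning float32 in [0, 1]; None for an empty mask."""
        ys, xs = np.nonzero(mask64)
        if xs.size == 0:
            return None  # all-zero mask -> no silhouette
        cx = int(round(xs.mean()))
        half = SIL_WIDTH // 2
        # Pad both sides so the crop window never leaves the array,
        # even when the centroid sits near an edge.
        padded = np.pad(mask64, ((0, 0), (half, half)))
        cut = padded[:, cx:cx + 2 * half]  # 44 columns centred on cx
        return (cut > 0).astype(np.float32)
    ```

    The binarize-then-cast step gives the same [0, 1] range as dividing a uint8 mask by 255, which is the property the determinism and range tests should assert.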

    Must NOT do:

    • Don't test YOLO integration here — only test the mask_to_silhouette function in isolation
    • Don't require GPU — all preprocessing is CPU numpy ops

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Must verify pixel-level correctness against training data contract, multiple edge cases
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Tasks 5, 6, 8)
    • Blocks: None (verification task)
    • Blocked By: Task 3 (preprocess module must exist)

    References:

    Pattern References:

    • opengait/demo/preprocess.py (Task 3) — The module under test
    • datasets/pretreatment.py:18-96 — Reference preprocessing to validate against
    • opengait/data/transform.py:46-58 — BaseSilCuttingTransform for expected output contract

    WHY Each Reference Matters:

    • preprocess.py: Direct test target
    • pretreatment.py: Ground truth for what a correct silhouette looks like
    • BaseSilCuttingTransform: Defines the 64→44 cut + /255 contract we must match

    Acceptance Criteria:

    • tests/demo/test_preprocess.py exists with ≥5 test cases
    • uv run pytest tests/demo/test_preprocess.py -q passes
    • Tests cover: valid mask, tiny mask, empty mask, determinism

    QA Scenarios:

    Scenario: All preprocessing tests pass
      Tool: Bash
      Preconditions: Task 3 (preprocess.py) is complete
      Steps:
        1. Run `uv run pytest tests/demo/test_preprocess.py -v`
      Expected Result: All tests pass (≥5 tests), exit code 0
      Failure Indicators: Any assertion failure, import error
      Evidence: .sisyphus/evidence/task-7-preprocess-tests.txt
    
    Scenario: Jaxtyping annotation enforcement works
      Tool: Bash
      Steps:
        1. Run `uv run python -c "`
           ```python
           import numpy as np
           from opengait.demo.preprocess import mask_to_silhouette
           # Intentionally wrong type to verify beartype catches it
           try:
               mask_to_silhouette('not_an_array', (0, 0, 10, 10))
               print('BEARTYPE_MISSED')  # should not reach here
           except Exception as e:
               if 'beartype' in type(e).__module__ or 'BeartypeCallHint' in type(e).__name__:
                   print('BEARTYPE_OK')
               else:
                   print(f'WRONG_ERROR: {type(e).__name__}: {e}')
           ```
      Expected Result: Prints 'BEARTYPE_OK'
      Failure Indicators: Prints 'BEARTYPE_MISSED' or 'WRONG_ERROR'
      Evidence: .sisyphus/evidence/task-7-beartype-check.txt
    

    Commit: YES (groups with Task 8)

    • Message: test(demo): add preprocessing and model unit tests
    • Files: tests/demo/test_preprocess.py
    • Pre-commit: uv run pytest tests/demo/test_preprocess.py -q
  • 8. Unit Tests — ScoNetDemo Forward Pass

    What to do:

    • Create tests/demo/test_sconet_demo.py
    • Test ScoNetDemo construction:
      • Loads config from YAML
      • Loads checkpoint weights
      • Model is in eval mode
    • Test forward() with dummy tensor:
      • Input: torch.rand(1, 1, 30, 64, 44) on available device
      • Output logits shape: (1, 3, 16)
      • Output dtype: float32
    • Test predict() convenience method:
      • Returns (label_str, confidence_float)
      • label_str is one of {'negative', 'neutral', 'positive'}
      • confidence is in [0.0, 1.0]
    • Test with various batch sizes: N=1, N=2
    • Test with various sequence lengths if model supports it (should work with 30)
    • Verify no torch.distributed calls are made (mock torch.distributed to raise if called)
    • Verify jaxtyping shape annotations on forward/predict signatures

    Must NOT do:

    • Don't test with real video data — dummy tensors only for unit tests
    • Don't modify the checkpoint

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Requires CUDA, checkpoint loading, shape validation — moderate complexity
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Tasks 5, 6, 7)
    • Blocks: None (verification task)
    • Blocked By: Task 2 (ScoNetDemo must exist)

    References:

    Pattern References:

    • opengait/demo/sconet_demo.py (Task 1) — The module under test
    • opengait/evaluation/evaluator.py:evaluate_scoliosis() (line ~418) — Canonical prediction logic to validate against

    Config/Checkpoint References:

    • configs/sconet/sconet_scoliosis1k.yaml — Config file to pass to ScoNetDemo
    • ./ckpt/ScoNet-20000.pt — Trained checkpoint

    WHY Each Reference Matters:

    • sconet_demo.py: Direct test target
    • evaluator.py: Defines expected prediction behavior (argmax of mean logits)
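    The "argmax of mean logits" behavior the tests validate can be sketched as below. The label ordering is this plan's assumption and must be checked against evaluator.py before asserting on it:

    ```python
    import numpy as np

    LABELS = ('negative', 'neutral', 'positive')  # index order assumed — verify in evaluator.py


    def predict_from_logits(logits: np.ndarray) -> tuple[str, float]:
        """logits: [N, 3, 16] — class scores per horizontal body part.
        Average over parts, softmax, then argmax for sample 0."""
        mean_logits = logits.mean(axis=-1)  # [N, 3]
        z = mean_logits - mean_logits.max(axis=-1, keepdims=True)  # numerical stability
        probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
        idx = int(np.argmax(probs[0]))
        return LABELS[idx], float(probs[0, idx])
    ```

    ScoNetDemo.predict() is expected to do the same with torch ops; the unit test can compare against this numpy reference on a fixed random tensor.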

    Acceptance Criteria:

    • tests/demo/test_sconet_demo.py exists with ≥4 test cases
    • uv run pytest tests/demo/test_sconet_demo.py -q passes
    • Tests cover: construction, forward shape, predict output, no-DDP enforcement

    QA Scenarios:

    Scenario: All ScoNetDemo tests pass
      Tool: Bash
      Preconditions: Task 1 (sconet_demo.py) is complete, checkpoint exists, CUDA available
      Steps:
        1. Run `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_sconet_demo.py -v`
      Expected Result: All tests pass (≥4 tests), exit code 0
      Failure Indicators: state_dict key mismatch, shape error, CUDA OOM
      Evidence: .sisyphus/evidence/task-8-sconet-tests.txt
    
    Scenario: No DDP leakage in ScoNetDemo
      Tool: Bash
      Steps:
        1. Run `grep -rn 'torch.distributed' opengait/demo/sconet_demo.py`
        2. Run `grep -rn 'ddp_all_gather\|get_ddp_module\|get_rank' opengait/demo/sconet_demo.py`
      Expected Result: Both commands produce no output (exit code 1 = no matches)
      Failure Indicators: Any match found
      Evidence: .sisyphus/evidence/task-8-no-ddp.txt
    

    Commit: YES (groups with Task 7)

    • Message: test(demo): add preprocessing and model unit tests
    • Files: tests/demo/test_sconet_demo.py
    • Pre-commit: uv run pytest tests/demo/test_sconet_demo.py -q
  • 9. Main Pipeline Application + CLI

    What to do:

    • Create opengait/demo/pipeline.py — the main orchestrator
    • Create opengait/demo/__main__.py — CLI entry point (replace stub from Task 4)
    • Pipeline class ScoliosisPipeline:
      • Constructor: __init__(self, source: FrameStream, model: ScoNetDemo, publisher: ResultPublisher, window: SilhouetteWindow, yolo_model_path: str = 'yolo11n-seg.pt', device: str = 'cuda:0')
      • Uses jaxtyping annotations for all tensor-bearing methods:
        from jaxtyping import Float, UInt8, jaxtyped
        from beartype import beartype
        from torch import Tensor
        import numpy as np
        from numpy import ndarray
        
      • run() -> None — main loop:
        1. Load YOLO model: ultralytics.YOLO(yolo_model_path)
        2. For each (frame, meta) from source:
           a. Run yolo_model.track(frame, persist=True, verbose=False) → results
           b. select_person(results) → (mask, bbox, track_id) or None → skip if None
           c. mask_to_silhouette(mask, bbox) → sil or None → skip if None
           d. window.push(sil, meta['frame_count'], track_id)
           e. If window.should_classify():
          • tensor = window.get_tensor(device=self.device)
          • label, confidence = self.model.predict(tensor)
          • publisher.publish({...}) with JSON schema fields
          • window.mark_classified()
        3. Log FPS every 100 frames
        4. Cleanup on exit (close publisher, release resources)
      • Graceful shutdown on KeyboardInterrupt / SIGTERM
    • CLI via __main__.py using click:
      • --source (required): video path, camera index, or cvmmap://name
      • --checkpoint (required): path to ScoNet checkpoint
      • --config (default: ./configs/sconet/sconet_scoliosis1k.yaml): ScoNet config YAML
      • --device (default: cuda:0): torch device
      • --yolo-model (default: yolo11n-seg.pt): YOLO model path (auto-downloads)
      • --window (default: 30): sliding window size
      • --stride (default: 30): classify every N frames after window is full
      • --nats-url (default: None): NATS server URL, None = console output
      • --nats-subject (default: scoliosis.result): NATS subject
      • --max-frames (default: None): stop after N frames
      • --help: print usage
    • Entrypoint: uv run python -m opengait.demo ...
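    Stripped of YOLO and model specifics, the run() loop above reduces to the skeleton below. All component names mirror the plan's interfaces; `detect(frame)` is a hypothetical stand-in for the track → select_person → mask_to_silhouette chain:

    ```python
    import time


    def run_loop(source, detect, window, model, publisher, device='cpu'):
        """Synchronous pull-process-publish loop (sketch).
        detect(frame) -> (sil, track_id) or None."""
        n = 0
        t0 = time.monotonic()
        for frame, meta in source:
            n += 1
            hit = detect(frame)
            if hit is None:
                continue  # no person, no track id, or mask too small
            sil, track_id = hit
            window.push(sil, meta['frame_count'], track_id)
            if window.should_classify():
                label, conf = model.predict(window.get_tensor(device=device))
                publisher.publish({
                    'frame': meta['frame_count'],
                    'track_id': track_id,
                    'label': label,
                    'confidence': conf,
                    'window': window.window_size,
                    'timestamp_ns': time.time_ns(),
                })
                window.mark_classified()
            if n % 100 == 0:
                print(f'fps={n / (time.monotonic() - t0):.1f}')
    ```

    Keeping the loop this flat makes the KeyboardInterrupt/SIGTERM handling a simple try/finally around the call site, with publisher.close() in the finally block.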

    Must NOT do:

    • No async in the main loop — synchronous pull-process-publish
    • No multi-threading for inference — single-threaded pipeline
    • No GUI / frame display / cv2.imshow
    • No unbounded accumulation — ring buffer handles memory
    • No auto-download of ScoNet checkpoint — user must provide path

    Recommended Agent Profile:

    • Category: deep
      • Reason: Integration of all components, error handling, CLI design, FPS logging — largest single task
    • Skills: []

    Parallelization:

    • Can Run In Parallel: NO
    • Parallel Group: Wave 3 (sequential — depends on most Wave 1+2 tasks)
    • Blocks: Tasks 12, 13
    • Blocked By: Tasks 2, 3, 4, 5, 6 (all components must exist)

    References:

    Pattern References:

    • opengait/demo/sconet_demo.py (Task 1) — ScoNetDemo class, predict() method
    • opengait/demo/preprocess.py (Task 3) — mask_to_silhouette(), frame_to_person_mask()
    • opengait/demo/window.py (Task 5) — SilhouetteWindow, select_person()
    • opengait/demo/input.py (Task 2) — create_source(), FrameStream type alias
    • opengait/demo/output.py (Task 6) — create_publisher(), ResultPublisher

    External References:

    • Ultralytics tracking API: model.track(frame, persist=True) — returns Results list
    • Ultralytics result object: results[0].masks.data, results[0].boxes.xyxy, results[0].boxes.id

    WHY Each Reference Matters:

    • All Task refs: This task composes every component — must know each API surface
    • Ultralytics: The YOLO .track() call is the only external API used directly in this file

    Acceptance Criteria:

    • opengait/demo/pipeline.py exists with ScoliosisPipeline class
    • opengait/demo/__main__.py exists with click CLI
    • uv run python -m opengait.demo --help prints usage without errors
    • All public methods have jaxtyping annotations where tensor/array args are involved

    QA Scenarios:

    Scenario: CLI --help works
      Tool: Bash
      Steps:
        1. Run `uv run python -m opengait.demo --help`
      Expected Result: Prints usage showing --source, --checkpoint, --device, --window, --stride, --nats-url, --max-frames
      Failure Indicators: ImportError, missing arguments, crash
      Evidence: .sisyphus/evidence/task-9-help.txt
    
    Scenario: Pipeline runs with sample video (no NATS)
      Tool: Bash
      Preconditions: sample.mp4 exists (from Task 11), checkpoint exists, CUDA available
      Steps:
        1. Run `CUDA_VISIBLE_DEVICES=0 uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --device cuda:0 --max-frames 120 2>&1 | tee /tmp/pipeline_output.txt`
        2. Count prediction lines: `grep -c '"label"' /tmp/pipeline_output.txt`
      Expected Result: Exit code 0, at least 1 prediction line with valid JSON containing "label" field
      Failure Indicators: Crash, no predictions, invalid JSON, CUDA error
      Evidence: .sisyphus/evidence/task-9-pipeline-run.txt
    
    Scenario: Pipeline handles missing video gracefully
      Tool: Bash
      Steps:
        1. Run `uv run python -m opengait.demo --source /nonexistent/video.mp4 --checkpoint ./ckpt/ScoNet-20000.pt 2>&1; echo "EXIT_CODE=$?"`
      Expected Result: Prints error message about missing file, exits with non-zero code (no traceback dump)
      Failure Indicators: Unhandled exception with full traceback, exit code 0
      Evidence: .sisyphus/evidence/task-9-missing-video.txt
    

    Commit: YES

    • Message: feat(demo): add main pipeline application with CLI entry point
    • Files: opengait/demo/pipeline.py, opengait/demo/__main__.py
    • Pre-commit: uv run python -m opengait.demo --help
  • 10. Unit Tests — Single-Person Policy + Window Reset

    What to do:

    • Create tests/demo/test_window.py
    • Test SilhouetteWindow:
      • Fill to capacity → is_ready() returns True
      • Underfilled → is_ready() returns False
      • Track ID change resets buffer
      • Frame gap exceeding threshold resets buffer
      • get_tensor() returns correct shape [1, 1, window_size, 64, 44]
      • should_classify() respects stride
    • Test select_person():
      • Single detection → returns it
      • Multiple detections → returns largest bbox area
      • No detections → returns None
      • Detections without track IDs (tracker not initialized) → returns None
    • Use mock YOLO results (don't require actual YOLO model)

    Must NOT do:

    • Don't require GPU — window tests are CPU-only (get_tensor can use cpu device)
    • Don't require YOLO model file — mock the results

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Multiple edge cases to cover, mock YOLO results require understanding ultralytics API
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 3 (with Tasks 9, 11)
    • Blocks: None (verification task)
    • Blocked By: Task 5 (window module must exist)

    References:

    Pattern References:

    • opengait/demo/window.py (Task 5) — Module under test

    WHY Each Reference Matters:

    • Direct test target

    Acceptance Criteria:

    • tests/demo/test_window.py exists with ≥6 test cases
    • uv run pytest tests/demo/test_window.py -q passes

    QA Scenarios:

    Scenario: All window and single-person tests pass
      Tool: Bash
      Steps:
        1. Run `uv run pytest tests/demo/test_window.py -v`
      Expected Result: All tests pass (≥6 tests), exit code 0
      Failure Indicators: Assertion failures, import errors
      Evidence: .sisyphus/evidence/task-10-window-tests.txt
    

    Commit: YES

    • Message: test(demo): add window manager and single-person policy tests
    • Files: tests/demo/test_window.py
    • Pre-commit: uv run pytest tests/demo/test_window.py -q
  • 11. Sample Video for Smoke Testing

    What to do:

    • Acquire or create a short sample video for pipeline smoke testing
    • Options (in order of preference):
      1. Extract a short clip (5-10 seconds, ~120 frames at 15fps) from an existing Scoliosis1K raw video if accessible
      2. Record a short clip using webcam via cv2.VideoCapture(0)
      3. Generate a synthetic video with a person-shaped blob moving across frames
    • Save to ./assets/sample.mp4 (or ./assets/sample.avi)
    • Requirements: contains at least one person walking, 720p or lower, ≥60 frames
    • If no real video is available, create a synthetic one:
      • 120 frames, 640×480, 15fps
      • White rectangle (simulating person silhouette) moving across dark background
      • This won't test YOLO detection quality but will verify pipeline doesn't crash
    • Add assets/sample.mp4 to .gitignore if it's large (>10MB)
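    The synthetic fallback can be generated as below; this sketch only produces the frames (writing them to disk uses cv2.VideoWriter exactly as in the plan's one-liner above, and `synthetic_frames` is a hypothetical name):

    ```python
    import numpy as np


    def synthetic_frames(n: int = 120, w: int = 640, h: int = 480):
        """Yield BGR frames with a white rectangle (stand-in person)
        sweeping left to right across a dark background."""
        box_w, box_h = 60, 200
        for i in range(n):
            frame = np.zeros((h, w, 3), dtype=np.uint8)
            x = int((w - box_w) * i / max(n - 1, 1))  # linear sweep
            y = (h - box_h) // 2
            frame[y:y + box_h, x:x + box_w] = 255
            yield frame
    ```

    As noted, YOLO will likely not detect this rectangle as a person, so the synthetic video only verifies that the pipeline runs without crashing, not detection quality.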

    Must NOT do:

    • Don't use any Scoliosis1K dataset files that are symlinked (user constraint)
    • Don't commit large video files to git

    Recommended Agent Profile:

    • Category: quick
      • Reason: Simple file creation/acquisition task
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 3 (with Tasks 9, 10)
    • Blocks: Task 12
    • Blocked By: Task 1 (needs OpenCV dependency from scaffolding)

    References: None needed — standalone task

    Acceptance Criteria:

    • ./assets/sample.mp4 (or .avi) exists
    • Video has ≥60 frames
    • Playable with uv run python -c "import cv2; cap=cv2.VideoCapture('./assets/sample.mp4'); print(f'frames={int(cap.get(cv2.CAP_PROP_FRAME_COUNT))}'); cap.release()"

    QA Scenarios:

    Scenario: Sample video is valid
      Tool: Bash
      Steps:
        1. Run `uv run python -c "`
           ```python
           import cv2
           cap = cv2.VideoCapture('./assets/sample.mp4')
           assert cap.isOpened(), 'Cannot open video'
           n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
           assert n >= 60, f'Too few frames: {n}'
           ret, frame = cap.read()
           assert ret and frame is not None, 'Cannot read first frame'
           h, w = frame.shape[:2]
           assert h >= 240 and w >= 320, f'Too small: {w}x{h}'
           cap.release()
           print(f'SAMPLE_VIDEO_OK frames={n} size={w}x{h}')
           ```
      Expected Result: Prints 'SAMPLE_VIDEO_OK' with frame count ≥60
      Failure Indicators: Cannot open, too few frames, too small
      Evidence: .sisyphus/evidence/task-11-sample-video.txt
    

    Commit: YES

    • Message: chore(demo): add sample video for smoke testing
    • Files: assets/sample.mp4 (or add to .gitignore and document)
    • Pre-commit: none

  • 12. Integration Tests — End-to-End Smoke Test

    What to do:

    • Create tests/demo/test_pipeline.py
    • Integration test: run the full pipeline with sample video, no NATS
      • Uses subprocess.run() to invoke python -m opengait.demo
      • Captures stdout, parses JSON predictions
      • Asserts: exit code 0, ≥1 prediction, valid JSON schema
    • Test graceful exit on end-of-video
    • Test --max-frames flag: run with max_frames=60, verify it stops
    • Test error handling: invalid source path → non-zero exit, error message
    • Test error handling: invalid checkpoint path → non-zero exit, error message
    • FPS benchmark (informational, not a hard assertion):
      • Run pipeline on sample video, measure wall time, compute FPS
      • Log FPS to evidence file (target: ≥15 FPS on desktop GPU)
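    Since the pipeline's stdout interleaves FPS logs and JSONL predictions, the test needs a small extraction helper along these lines (`extract_predictions` is a hypothetical name for this sketch):

    ```python
    import json


    def extract_predictions(stdout: str) -> list[dict]:
        """Pull prediction JSON objects out of mixed pipeline stdout,
        skipping FPS logs, warnings, and malformed lines."""
        preds = []
        for line in stdout.splitlines():
            line = line.strip()
            if not line.startswith('{'):
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue
            if 'label' in obj:
                preds.append(obj)
        return preds
    ```

    The happy-path test then asserts `len(extract_predictions(result.stdout)) >= 1` and validates each object against the schema from Task 6.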

    Must NOT do:

    • Don't require NATS server for this test — use console publisher
    • Don't hardcode CUDA device — use --device cuda:0 only if CUDA available, else skip

    Recommended Agent Profile:

    • Category: deep
      • Reason: Full integration test requiring all components working together, subprocess management, JSON parsing
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 4 (with Task 13)
    • Blocks: F1-F4 (Final verification)
    • Blocked By: Tasks 9 (pipeline), 11 (sample video)

    References:

    Pattern References:

    • opengait/demo/__main__.py (Task 9) — CLI flags to invoke
    • opengait/demo/output.py (Task 6) — JSON schema to validate

    WHY Each Reference Matters:

    • __main__.py: Need exact CLI flag names for subprocess invocation
    • output.py: Need JSON schema to assert against

    Acceptance Criteria:

    • tests/demo/test_pipeline.py exists with ≥4 test cases
    • CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_pipeline.py -q passes
    • Tests cover: happy path, max-frames, invalid source, invalid checkpoint

    QA Scenarios:

    Scenario: Full pipeline integration test passes
      Tool: Bash
      Preconditions: All components built, sample video exists, CUDA available
      Steps:
        1. Run `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_pipeline.py -v --timeout=120`
      Expected Result: All tests pass (≥4), exit code 0
      Failure Indicators: Subprocess crash, JSON parse error, timeout
      Evidence: .sisyphus/evidence/task-12-integration.txt
    
    Scenario: FPS benchmark
      Tool: Bash
      Steps:
        1. Run `CUDA_VISIBLE_DEVICES=0 uv run python -c "`
           ```python
           import subprocess, time
           start = time.monotonic()
            result = subprocess.run(
                ['uv', 'run', 'python', '-m', 'opengait.demo',
                 '--source', './assets/sample.mp4',
                 '--checkpoint', './ckpt/ScoNet-20000.pt',
                 '--device', 'cuda:0'],  # omit --nats-url -> console publisher
                capture_output=True, text=True, timeout=120)
            assert result.returncode == 0, f'Pipeline failed: {result.stderr[-500:]}'
            elapsed = time.monotonic() - start
            import cv2
            cap = cv2.VideoCapture('./assets/sample.mp4')
            n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)); cap.release()
            fps = n_frames / elapsed if elapsed > 0 else 0
            print(f'FPS_BENCHMARK frames={n_frames} elapsed={elapsed:.1f}s fps={fps:.1f}')
            assert fps >= 5, f'FPS too low: {fps}'  # conservative threshold
           ```
      Expected Result: Prints FPS benchmark, ≥5 FPS (conservative)
      Failure Indicators: Timeout, crash, FPS < 5
      Evidence: .sisyphus/evidence/task-12-fps-benchmark.txt
    

    Commit: YES

    • Message: test(demo): add integration and end-to-end smoke tests
    • Files: tests/demo/test_pipeline.py
    • Pre-commit: CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_pipeline.py -q
  • 13. NATS Integration Test

    What to do:

    • Create tests/demo/test_nats.py
    • Test requires NATS server (use Docker: docker run -d --rm --name nats-test -p 4222:4222 nats:2)
    • Mark tests with @pytest.mark.skipif if Docker/NATS not available
    • Test flow:
      1. Start NATS container
      2. Start a nats-py subscriber on scoliosis.result
      3. Run pipeline with --nats-url nats://127.0.0.1:4222 --max-frames 60
      4. Collect received messages
      5. Assert: ≥1 message received, valid JSON, correct schema
      6. Stop NATS container
    • Test NatsPublisher reconnection: start pipeline, stop NATS, restart NATS — pipeline should recover
    • JSON schema validation:
      • frame: int
      • track_id: int
      • label: str in {"negative", "neutral", "positive"}
      • confidence: float in [0, 1]
      • window: int (should equal window_size)
      • timestamp_ns: int
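
The schema checks above can be sketched as a small helper the subscriber test calls on each received payload (`assert_valid_result` is an illustrative name, not part of the plan's API; field names follow the schema listed above):

```python
import json

ALLOWED_LABELS = {"negative", "neutral", "positive"}

def assert_valid_result(payload: bytes) -> dict:
    """Parse a scoliosis.result payload and assert it matches the schema."""
    msg = json.loads(payload)
    assert isinstance(msg["frame"], int)
    assert isinstance(msg["track_id"], int)
    assert msg["label"] in ALLOWED_LABELS
    assert isinstance(msg["confidence"], (int, float)) and 0.0 <= msg["confidence"] <= 1.0
    assert isinstance(msg["window"], int)
    assert isinstance(msg["timestamp_ns"], int)
    return msg

# Round-trip a payload that matches the schema:
sample = json.dumps({"frame": 42, "track_id": 1, "label": "neutral",
                     "confidence": 0.87, "window": 30,
                     "timestamp_ns": 1700000000000000000}).encode()
result = assert_valid_result(sample)
```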

    Must NOT do:

    • Don't leave Docker containers running after test
    • Don't hardcode NATS port — use a fixture that finds an open port
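
Both constraints can be satisfied with two small helpers (a sketch; `find_free_port` and `docker_available` are illustrative names):

```python
import shutil
import socket
import subprocess

def find_free_port() -> int:
    """Bind to port 0 so the OS assigns an unused TCP port, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def docker_available() -> bool:
    """True when the docker CLI exists and the daemon answers `docker info`."""
    if shutil.which("docker") is None:
        return False
    try:
        subprocess.run(["docker", "info"], capture_output=True, timeout=5, check=True)
        return True
    except (subprocess.SubprocessError, OSError):
        return False

port = find_free_port()
nats_url = f"nats://127.0.0.1:{port}"
docker_cmd = ["docker", "run", "-d", "--rm", "--name", "nats-test",
              "-p", f"{port}:4222", "nats:2"]
```

In tests/demo/test_nats.py these would back pytest fixtures, with the module guarded by `pytest.mark.skipif(not docker_available(), reason="Docker not available")` so the suite skips rather than fails.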

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Docker orchestration + async NATS subscriber + subprocess pipeline — moderate complexity
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 4 (with Task 12)
    • Blocks: F1-F4 (Final verification)
    • Blocked By: Tasks 9 (pipeline), 6 (NATS publisher)

    References:

    Pattern References:

    • opengait/demo/output.py (Task 6) — NatsPublisher class, JSON schema

    External References:

    • nats-py subscriber: sub = await nc.subscribe('scoliosis.result'); msg = await sub.next_msg(timeout=10)
    • Docker NATS: docker run -d --rm --name nats-test -p 4222:4222 nats:2
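
The nats-py calls above can be folded into one awaitable helper for the test (a sketch; `subscribe_once` is an illustrative name, and nats-py is assumed installed):

```python
import asyncio
import json

async def subscribe_once(url: str, subject: str = "scoliosis.result",
                         timeout: float = 10.0) -> dict:
    """Connect, wait for one message on `subject`, and return it as a dict."""
    import nats  # nats-py; imported lazily so the module loads without it
    nc = await nats.connect(url)
    try:
        sub = await nc.subscribe(subject)
        msg = await sub.next_msg(timeout=timeout)
        return json.loads(msg.data)
    finally:
        await nc.close()

# In a test (requires a running server):
# result = asyncio.run(subscribe_once("nats://127.0.0.1:4222"))
```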

    WHY Each Reference Matters:

    • output.py: Need to match the exact subject and JSON schema the publisher produces
    • nats-py: Need subscriber API to consume and validate messages

    Acceptance Criteria:

    • tests/demo/test_nats.py exists with ≥2 test cases
    • Tests are skippable when Docker/NATS not available
    • CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_nats.py -q passes (when Docker available)

    QA Scenarios:

    Scenario: NATS receives valid prediction JSON
      Tool: Bash
      Preconditions: Docker available, CUDA available, sample video exists
      Steps:
        1. Run `docker run -d --rm --name nats-test -p 4222:4222 nats:2`
        2. Run `CUDA_VISIBLE_DEVICES=0 uv run pytest tests/demo/test_nats.py -v --timeout=60`
        3. Run `docker stop nats-test`
      Expected Result: Tests pass, at least one valid JSON message received on scoliosis.result
      Failure Indicators: No messages, invalid JSON, schema mismatch, timeout
      Evidence: .sisyphus/evidence/task-13-nats-integration.txt
    
    Scenario: NATS test is skipped when Docker unavailable
      Tool: Bash
      Preconditions: Docker NOT running or not installed
      Steps:
        1. Run `uv run pytest tests/demo/test_nats.py -v -k 'nats' 2>&1 | head -20`
      Expected Result: Tests show as SKIPPED (not FAILED)
      Failure Indicators: Test fails instead of skipping
      Evidence: .sisyphus/evidence/task-13-nats-skip.txt
    

    Commit: YES

    • Message: test(demo): add NATS integration tests
    • Files: tests/demo/test_nats.py
    • Pre-commit: uv run pytest tests/demo/test_nats.py -q (skips if no Docker)

Final Verification Wave (MANDATORY — after ALL implementation tasks)

4 review agents run in PARALLEL. ALL must APPROVE. Rejection → fix → re-run.

  • F1. Plan Compliance Audit (oracle): Read the plan end-to-end. For each "Must Have": verify implementation exists (read file, run command). For each "Must NOT Have": search codebase for forbidden patterns (torch.distributed imports in demo/, BaseModel subclassing). Check evidence files exist in .sisyphus/evidence/. Compare deliverables against plan. Output: Must Have [N/N] | Must NOT Have [N/N] | Tasks [N/N] | VERDICT: APPROVE/REJECT

  • F2. Code Quality Review (unspecified-high): Run linter + uv run pytest tests/demo/ -q. Review all new files in opengait/demo/ for: `Any` casts / `type: ignore` comments, empty catches, print statements used instead of logging, commented-out code, unused imports. Check AI slop: excessive comments, over-abstraction, generic variable names. Output: Tests [N pass/N fail] | Files [N clean/N issues] | VERDICT

  • F3. Real Manual QA (unspecified-high): Start from clean state. Run pipeline with sample video: uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --max-frames 120. Verify predictions are printed to console (no --nats-url = console output). Run with NATS: start container, run pipeline with --nats-url nats://127.0.0.1:4222, subscribe and validate JSON schema. Test edge cases: missing video file (graceful error), no checkpoint (graceful error), --help flag. Output: Scenarios [N/N pass] | Edge Cases [N tested] | VERDICT

  • F4. Scope Fidelity Check (deep): For each task: read "What to do", read actual files created. Verify 1:1 — everything in spec was built (no missing), nothing beyond spec was built (no creep). Check "Must NOT do" compliance: no torch.distributed in demo/, no BaseModel subclass, no TensorRT code, no multi-person logic. Flag unaccounted changes. Output: Tasks [N/N compliant] | Scope [CLEAN/N issues] | VERDICT


Commit Strategy

  • Wave 1: feat(demo): add ScoNetDemo inference wrapper — sconet_demo.py
  • Wave 1: feat(demo): add input adapters and silhouette preprocessing — input.py, preprocess.py
  • Wave 1: chore(demo): scaffold demo package and test infrastructure — __init__.py, conftest, pyproject.toml
  • Wave 2: feat(demo): add sliding window manager and NATS publisher — window.py, output.py
  • Wave 2: test(demo): add preprocessing and model unit tests — test_preprocess.py, test_sconet_demo.py
  • Wave 3: feat(demo): add main pipeline application with CLI — pipeline.py, __main__.py
  • Wave 3: test(demo): add window manager and single-person policy tests — test_window.py
  • Wave 4: test(demo): add integration and NATS tests — test_pipeline.py, test_nats.py

Success Criteria

Verification Commands

# Smoke test (no NATS)
uv run python -m opengait.demo --source ./assets/sample.mp4 --checkpoint ./ckpt/ScoNet-20000.pt --max-frames 120
# Expected: exits 0, prints ≥3 prediction lines with label in {negative, neutral, positive}

# Unit tests
uv run pytest tests/demo/ -q
# Expected: all tests pass

# Help flag
uv run python -m opengait.demo --help
# Expected: prints usage with --source, --checkpoint, --device, --window, --stride, --nats-url, --max-frames

Final Checklist

  • All "Must Have" present
  • All "Must NOT Have" absent
  • All tests pass
  • Pipeline runs at ≥15 FPS on desktop GPU
  • JSON schema matches spec
  • No torch.distributed imports in opengait/demo/