Demo Pipeline Schema and Contracts

Overview

This document describes the input/output schema, flags/arguments, and positive detection indicators for the OpenGait demo pipeline (ScoliosisPipeline).

Source Files

  • Pipeline: /home/crosstyan/Code/OpenGait/opengait/demo/pipeline.py
  • Input adapters: /home/crosstyan/Code/OpenGait/opengait/demo/input.py
  • Output publishers: /home/crosstyan/Code/OpenGait/opengait/demo/output.py
  • Window management: /home/crosstyan/Code/OpenGait/opengait/demo/window.py
  • Classifier: /home/crosstyan/Code/OpenGait/opengait/demo/sconet_demo.py

Input Schema

Video Source (--source)

The source parameter accepts three formats (validated in validate_runtime_inputs()):

  1. Camera index: Single digit string (e.g., "0", "1") - uses OpenCV VideoCapture
  2. cv-mmap shared memory: cvmmap://<name> - uses shared memory stream (e.g., cvmmap://default)
  3. Video file path: Any other string treated as file path (e.g., /path/to/video.mp4)

Source validation (lines 251-264 in pipeline.py):

  • Camera indices and cv-mmap URLs pass through without a file-existence check
  • Any other source is treated as a file path and must exist (checked via Path.is_file())
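The validation rules above can be sketched as a small helper; the function name `validate_source` is illustrative (the real checks live in `validate_runtime_inputs()` in pipeline.py and may be structured differently):

```python
from pathlib import Path

def validate_source(source: str) -> None:
    """Sketch of the --source checks described above."""
    if source.isdigit():
        return  # camera index, e.g. "0" or "1": handed to OpenCV VideoCapture
    if source.startswith("cvmmap://"):
        return  # shared-memory stream, e.g. cvmmap://default
    if not Path(source).is_file():
        # anything else is treated as a video file path and must exist
        raise SystemExit(f"Error: Video source not found: {source}")
```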

FrameStream Contract (input.py)

FrameStream = Iterable[tuple[np.ndarray, dict[str, object]]]

Each iteration yields:

  • frame: np.ndarray - Raw frame array (H, W, C) in uint8
  • metadata: dict[str, object] containing:
    • frame_count: int - Frame index (0-based)
    • timestamp_ns: int - Monotonic timestamp in nanoseconds
    • source: str - The source path/identifier
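A minimal generator satisfying this contract might look as follows (the frame resolution and helper name `synthetic_stream` are made up for illustration; real adapters in input.py wrap OpenCV or cv-mmap):

```python
import time
from collections.abc import Iterable

import numpy as np

FrameStream = Iterable[tuple[np.ndarray, dict[str, object]]]

def synthetic_stream(source: str, n_frames: int = 3) -> FrameStream:
    """Illustrative FrameStream: yields (frame, metadata) tuples."""
    for i in range(n_frames):
        frame = np.zeros((480, 640, 3), dtype=np.uint8)  # (H, W, C) uint8
        metadata: dict[str, object] = {
            "frame_count": i,                     # 0-based frame index
            "timestamp_ns": time.monotonic_ns(),  # monotonic nanoseconds
            "source": source,                     # source identifier
        }
        yield frame, metadata
```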

Windowing Parameters

SilhouetteWindow Class (window.py)

Manages a sliding window of silhouettes for classification:

Constructor parameters:

  • window_size: int (default: 30) - Maximum buffer size (number of frames)
  • stride: int (default: 1) - Frames between classifications
  • gap_threshold: int (default: 15) - Max frame gap before reset

CLI flags:

  • --window: int, min=1, default=30 - Sets window_size
  • --stride: int, min=1, default=30 - Sets classification stride

Behavior:

  • Window is "ready" when buffer has window_size frames
  • Classification triggers when should_classify() returns True (respects stride)
  • Track ID change or frame gap > gap_threshold resets the buffer
  • Silhouette shape must be (64, 44) float32

Output tensor shape: [1, 1, window_size, 64, 44] (batch, channel, seq, height, width)
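The tensor construction can be illustrated with NumPy (a sketch, not the actual SilhouetteWindow internals — the real class also tracks track IDs, gaps, and stride):

```python
import numpy as np

WINDOW_SIZE = 30

# A full buffer of (64, 44) float32 silhouettes...
buffer = [np.zeros((64, 44), dtype=np.float32) for _ in range(WINDOW_SIZE)]

# ...stacked along a new sequence axis, then given batch and channel dims:
tensor = np.stack(buffer)[None, None]  # [1, 1, window_size, 64, 44]
```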

Required Flags/Arguments

CLI Arguments (pipeline.py lines 267-287)

Flag            Type  Required  Default                                  Description
--source        str   Yes       -                                        Video source (file, camera index, or cvmmap://)
--checkpoint    str   Yes       -                                        Model checkpoint path (.pt file)
--config        str   No        configs/sconet/sconet_scoliosis1k.yaml   Model config YAML
--device        str   No        cuda:0                                   Device for inference
--yolo-model    str   No        yolo11n-seg.pt                           YOLO segmentation model
--window        int   No        30                                       Window size (frames)
--stride        int   No        30                                       Classification stride
--nats-url      str   No        None                                     NATS server URL (e.g., nats://localhost:4222)
--nats-subject  str   No        scoliosis.result                         NATS subject for publishing
--max-frames    int   No        None                                     Maximum frames to process
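The flags above could be declared with argparse roughly as follows (an assumption about how pipeline.py builds its parser; the actual code may differ in details such as help text and type validation):

```python
import argparse

# Illustrative parser mirroring the flag table above.
parser = argparse.ArgumentParser(prog="demo-pipeline")
parser.add_argument("--source", required=True)
parser.add_argument("--checkpoint", required=True)
parser.add_argument("--config", default="configs/sconet/sconet_scoliosis1k.yaml")
parser.add_argument("--device", default="cuda:0")
parser.add_argument("--yolo-model", default="yolo11n-seg.pt")
parser.add_argument("--window", type=int, default=30)
parser.add_argument("--stride", type=int, default=30)
parser.add_argument("--nats-url", default=None)
parser.add_argument("--nats-subject", default="scoliosis.result")
parser.add_argument("--max-frames", type=int, default=None)

args = parser.parse_args(["--source", "0", "--checkpoint", "model.pt"])
```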

Validation

  • Source must exist (file) or be valid camera index/cv-mmap URL
  • Checkpoint file must exist
  • Config file must exist

Output Schema

Result Format (output.py create_result())

{
    "frame": int,           # Frame number where classification occurred
    "track_id": int,        # Person/track identifier
    "label": str,           # Classification label
    "confidence": float,    # Confidence score [0.0, 1.0]
    "window": int,          # End frame of window (or window size)
    "timestamp_ns": int     # Timestamp in nanoseconds
}
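Assembling that dict can be sketched as below; the parameter order of the real create_result() in output.py is an assumption:

```python
def create_result(frame: int, track_id: int, label: str,
                  confidence: float, window: int,
                  timestamp_ns: int) -> dict[str, object]:
    """Sketch of building the result dict described above."""
    return {
        "frame": frame,
        "track_id": track_id,
        "label": label,
        "confidence": float(confidence),  # normalized to Python float
        "window": window,
        "timestamp_ns": timestamp_ns,
    }
```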

Publishers (output.py)

  1. ConsolePublisher: Outputs JSON Lines to stdout
  2. NatsPublisher: Publishes to NATS message broker (async, background thread)
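The console path is essentially JSON Lines over stdout; a minimal sketch (the real ConsolePublisher may buffer or format differently):

```python
import json
import sys

class ConsolePublisher:
    """Sketch: emit one JSON object per line (JSON Lines) to stdout."""

    def publish(self, result: dict) -> None:
        sys.stdout.write(json.dumps(result) + "\n")
        sys.stdout.flush()  # keep downstream consumers unblocked
```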

Label Values (sconet_demo.py line 60)

LABEL_MAP = {0: "negative", 1: "neutral", 2: "positive"}

Positive Detection Indicator

Positive detection is indicated when:

result["label"] == "positive"

The confidence field indicates the model's confidence in the prediction (0.0 to 1.0).
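A consumer-side check might combine the label with a confidence gate; the 0.5 threshold here is an assumption for illustration, not something the pipeline enforces:

```python
def is_positive(result: dict, min_confidence: float = 0.5) -> bool:
    """Positive detection: label is "positive" and confidence clears
    a caller-chosen threshold (threshold value is illustrative)."""
    return (result["label"] == "positive"
            and float(result["confidence"]) >= min_confidence)
```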

Test Validation (test_pipeline.py lines 89-106)

def _assert_prediction_schema(prediction: dict[str, object]) -> None:
    assert isinstance(prediction["frame"], int)
    assert isinstance(prediction["track_id"], int)
    
    label = prediction["label"]
    assert isinstance(label, str)
    assert label in {"negative", "neutral", "positive"}  # Valid labels
    
    confidence = prediction["confidence"]
    assert isinstance(confidence, (int, float))
    assert 0.0 <= float(confidence) <= 1.0
    
    window_obj = prediction["window"]
    assert isinstance(window_obj, int)
    assert window_obj >= 0
    
    assert isinstance(prediction["timestamp_ns"], int)

Test References

Pipeline Tests (/home/crosstyan/Code/OpenGait/tests/demo/test_pipeline.py)

  • test_pipeline_cli_happy_path_outputs_json_predictions: Validates full pipeline outputs JSON predictions
  • test_pipeline_cli_fps_benchmark_smoke: FPS benchmark with predictions
  • test_pipeline_cli_max_frames_caps_output_frames: Validates max-frames behavior
  • test_pipeline_cli_invalid_source_path_returns_user_error: Source validation
  • test_pipeline_cli_invalid_checkpoint_path_returns_user_error: Checkpoint validation

Window Tests (/home/crosstyan/Code/OpenGait/tests/demo/test_window.py)

  • test_window_fill_and_ready_behavior: Window readiness logic
  • test_track_id_change_resets_buffer: Track change handling
  • test_frame_gap_reset_behavior: Gap threshold behavior
  • test_get_tensor_shape: Output tensor shape validation
  • test_should_classify_stride_behavior: Stride logic
  • test_push_invalid_shape_raises: Silhouette shape validation

ScoNetDemo Tests (/home/crosstyan/Code/OpenGait/tests/demo/test_sconet_demo.py)

  • test_predict_returns_tuple_with_valid_types: Predict output validation
  • test_predict_confidence_range: Confidence range [0, 1]
  • test_label_map_has_three_classes: Label map validation
  • test_forward_label_range: Label indices {0, 1, 2}

Processing Flow

  1. Input: Video source → FrameStream (frame, metadata)
  2. Detection: YOLO track() → Detection results with boxes, masks, track IDs
  3. Selection: select_person() → Largest bbox person or fallback
  4. Preprocessing: Mask → Silhouette (64, 44) float32
  5. Windowing: SilhouetteWindow.push() → Buffer management
  6. Classification: When should_classify() True → ScoNetDemo.predict()
  7. Output: create_result() → Publisher (Console or NATS)

Error Handling

  • Invalid source: Exit code 2, "Error: Video source not found"
  • Invalid checkpoint: Exit code 2, "Error: Checkpoint not found"
  • Runtime errors: Exit code 1, "Runtime error: ..."
  • Frame processing errors: Logged as warning, frame skipped
  • NATS unavailable: Graceful degradation (logs debug, continues)