diff --git a/.sisyphus/notepads/demo-visualizer/issues.md b/.sisyphus/notepads/demo-visualizer/issues.md index f189726..8077fa5 100644 --- a/.sisyphus/notepads/demo-visualizer/issues.md +++ b/.sisyphus/notepads/demo-visualizer/issues.md @@ -110,4 +110,191 @@ Updated opengait/demo/__main__.py to restore behavior parity: ### Verification - `uv run pytest tests/demo/test_pipeline.py -q` -> 12 passed, 1 skipped - `uv run python -m opengait.demo --help` -> --visualize flag present -- lsp_diagnostics -> only acceptable warnings (reportAny, unused call results) \ No newline at end of file +- lsp_diagnostics -> only acceptable warnings (reportAny, unused call results) +## Fixed: NameError for _FrameMetadata in input.py + +**Status:** RESOLVED + +**Error:** +``` +NameError: name '_FrameMetadata' is not defined +File: opengait/demo/input.py +Location: cvmmap_source() nested async generator return annotation +``` + +**Resolution:** +Moved `_FrameMetadata` and `_CvMmapClient` Protocol classes from inside `if TYPE_CHECKING:` block to module level in `opengait/demo/input.py`. + +**Verification:** +- Import test passes +- All 74 demo tests pass (3 skipped) +- Runtime test confirms `cvmmap_source()` returns generator without NameError + +## Fixed: Window Freeze on No-Detection Frames + +**Status:** RESOLVED + +**Issue:** When running demo with `--visualize` flag on video streams with intermittent or no person detections, the OpenCV window would freeze because `viz_payload` was `None` when no detection occurred, and the visualizer update was skipped. + +**Root Cause:** In `ScoliosisPipeline.run()`, the visualizer update was conditional on both `self._visualizer is not None` AND `viz_payload is not None`. When YOLO returned no detections, `process_frame()` returned `None`, skipping the visualizer update entirely. + +**Fix:** Modified the condition to update visualizer whenever it's enabled, passing `None` values for missing detection data. 
The `OpenCVVisualizer.update()` method already handles `None` values gracefully by displaying placeholder views. + +**Files Changed:** +- `opengait/demo/pipeline.py`: Lines 344-393 +- `tests/demo/test_pipeline.py`: Added `MockVisualizer` class and `test_pipeline_visualizer_updates_on_no_detection()` + +**Testing:** +- New regression test: `test_pipeline_visualizer_updates_on_no_detection()` - PASSED +- All existing pipeline tests: 13 passed, 1 skipped +- LSP diagnostics: No new errors + +## Task 1: Stabilize overlapped duplicate code blocks - RESOLVED + +### Issues Fixed + +1. **Tuple unpacking mismatch in _select_silhouette**: + - Issue: `select_person()` returns 4-tuple `(mask, bbox_mask, bbox_frame, track_id)`, but code was unpacking as 3-tuple + - Resolution: Updated unpacking to handle all 4 values and use correct bbox for mask_to_silhouette vs return value + +2. **Window freeze on no-detection frames**: + - Issue: Visualizer only updated when `viz_payload is not None`, causing window freeze when no person detected + - Resolution: Update visualizer every frame when enabled, using cached payload for no-detection frames + +3. **Cached payload mutation risk**: + - Issue: Storing direct reference to viz_payload could allow mutation of cached data + - Resolution: Create copies of mutable values (numpy arrays) when caching using `v.copy() if hasattr(v, 'copy')` + +### Pre-existing Issues (Not Fixed) +- LSP type errors with `DemoResult` vs `dict[str, object]` in `_result_buffer` (pre-existing) +- PyArrow type warnings (pre-existing, acceptable) + +### Tools Used +- sed for targeted line replacements +- Python script for complex multi-line block replacement +- Native Edit tool for simple single-line changes +- pytest for verification + + +## Task: BBoxXYXY Type Alias Introduction - COMPLETED + +### Scope +Type-semantics cleanup only. No new features, no behavior changes. 
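The alias itself is a one-line tuple alias (defined in `preprocess.py` per these notes). As a minimal sketch of why the alias reads better than a bare `tuple[int, int, int, int]` at call sites — `clamp_bbox` below is a hypothetical helper for illustration, not project code:

```python
# BBoxXYXY is the alias described in these notes (defined in preprocess.py);
# clamp_bbox is a hypothetical illustration, not a function from the project.
BBoxXYXY = tuple[int, int, int, int]  # XYXY order: (x1, y1, x2, y2)

def clamp_bbox(bbox: BBoxXYXY, width: int, height: int) -> BBoxXYXY:
    """Clamp a bbox to image bounds; the alias keeps the XYXY semantics visible."""
    x1, y1, x2, y2 = bbox
    return (max(0, x1), max(0, y1), min(width, x2), min(height, y2))
```

Because the alias is purely structural, annotating with `BBoxXYXY` changes nothing at runtime, which is consistent with the "no behavior changes" scope above.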
+ +### Changes Summary +- **opengait/demo/pipeline.py**: 3 occurrences of `tuple[int, int, int, int]` converted to `BBoxXYXY` + - Import statement updated to include BBoxXYXY from .preprocess + - Function return type annotation in `_select_silhouette()` + - Cast expression type annotation in `run()` method + +### No Issues Encountered +- All other scoped files (window.py, visualizer.py, preprocess.py) already used BBoxXYXY correctly +- Tests pass without modification +- No semantic regressions detected + +### Pre-existing Conditions Preserved +- LSP warnings in pipeline.py (DemoResult type mismatch, PyArrow Any types) - unchanged +- LSP warnings in window.py (torch/numpy Any types) - unchanged +- LSP warnings in visualizer.py (cv2.Any types) - unchanged + +All warnings existed before this task and are unrelated to bbox type annotations. + +## 2026-02-28: TypedDict Contract Application + +### Pre-existing Issues Encountered +1. **Visualization payload type error** (line 353): Dictionary comprehension with `hasattr(v, 'copy')` check caused basedpyright error because type checker cannot verify attribute existence through runtime introspection + - Fixed by extracting to explicit for-loop with `Callable[[], object]` cast pattern + +2. **Type narrowing in comprehensions**: Type narrowing (`is not None` checks) doesn't propagate into dictionary comprehensions, requiring explicit cast before the comprehension + +### Remaining Warnings (Pre-existing, Not Related to Task) +- pyarrow types show as `Any` (dynamic import) +- Some `list.append()` methods show as partially unknown +- These are in export code paths unrelated to DemoResult typing + +## Fixed: BBox Scale Bug in Fallback Path (2026-02-28) + +**Status:** RESOLVED + +**Issue:** In `_select_silhouette()` fallback path, bbox returned from `frame_to_person_mask()` was in mask-space coordinates, causing visualization overlay to appear at wrong scale. 
+ +**Resolution:** +Added coordinate space conversion using `result.orig_shape` to scale bbox from mask-space to frame-space: +- Extract frame dimensions from `result.orig_shape` with safe fallback +- Calculate scale factors between mask and frame dimensions +- Apply scaling to bbox coordinates for frame-space positioning +- Graceful fallback to mask-space bbox if conversion not possible + +**Files Changed:** +- `opengait/demo/pipeline.py`: Lines 206-226 in `_select_silhouette()` method + +**Verification:** +- `uv run pytest tests/demo/test_pipeline.py -q` - PASSED (14 passed, 1 skipped) +- LSP diagnostics: No new errors + +## Fixed: Both-Mode Raw Mask Empty Display (2026-02-28) + +**Status:** RESOLVED + +**Issue:** In `_prepare_both_view()`, raw mask pane appeared empty because primary path returns float32 [0,1] mask while display expects uint8 [0,255]. + +**Resolution:** +Added dtype normalization to ensure mask is always uint8 before display: +- Check if mask is float32/float64 dtype +- If so, scale by 255 and convert to uint8 +- Handles both primary path (float) and fallback path (uint8) consistently + +**Files Changed:** +- `opengait/demo/visualizer.py`: Lines 330-332 in `_prepare_both_view()` method + +**Verification:** +- `uv run pytest tests/demo/test_pipeline.py -q` - PASSED (14 passed, 1 skipped) +- LSP diagnostics: No new errors + + + +## 2026-02-28: Tuple-to-TypedDict Migration Issues + +### Regressions Encountered + +1. **Duplicate code blocks in pipeline.py** + - Root cause: Incremental edits left stale tuple unpacking alongside new dict access + - Fix: Used Python script for atomic replacement instead of multiple small edits + +2. **Stale tuple assertions in test_window.py** + - Root cause: Tests had both dict assertions AND tuple unpacking/assertions + - Fix: Systematically removed all `assert mask...`, `assert bbox...`, `assert tid...` lines + +3. 
**Mock return value overwrite in test_pipeline.py** + - Root cause: Dict mock was immediately overwritten by tuple mock + - Fix: Removed the second tuple mock assignment + +### Lessons Learned + +- Use Python scripts for complex multi-line replacements to avoid file corruption +- Always grep for ALL tuple unpacking patterns (`a, b = result`, `a, b, c = result`, etc.) +- Tests may have BOTH dict and tuple assertions - must remove both +- Mock return values may be defined twice - check for overwrites + +### Pre-existing Issues (not introduced) + +- `BBoxXYXY` import unused in window.py (pre-existing) +- Various pyright warnings about Any types (pre-existing) +- Unused imports in pipeline.py (pre-existing) + + +## 2026-02-28: bbox_frame Documentation (Minor) + +### Issue +User requested clarification of `bbox_frame` semantics in docstrings. + +### Resolution +Updated `select_person` docstring in `window.py` to explicitly document: +- `bbox_mask`: mask-space XYXY (downsampled, for processing) +- `bbox_frame`: frame-space XYXY (full resolution, for visualization) + +### Files Changed +- `opengait/demo/window.py` - Docstring only + +### Note +The git checkout restored the tuple return; updated to dict return to match TypedDict contract. diff --git a/.sisyphus/notepads/demo-visualizer/learnings.md b/.sisyphus/notepads/demo-visualizer/learnings.md index 61c8883..4a098fa 100644 --- a/.sisyphus/notepads/demo-visualizer/learnings.md +++ b/.sisyphus/notepads/demo-visualizer/learnings.md @@ -286,3 +286,396 @@ Added `cvmmap-client` as a dependency sourced from local path `/home/crosstyan/C - Import path: `from cvmmap import CvMmapClient` (module name) - uv path sources require absolute paths - Lockfile captures the path dependency for reproducibility + +## Bugfix: _FrameMetadata NameError in cvmmap_source() + +### Root Cause +The `_FrameMetadata` Protocol class was defined inside a `TYPE_CHECKING` block, making it available only during type checking but not at runtime. 
However, `cvmmap_source()` used `_FrameMetadata` in a runtime function annotation: + +```python +async def _async_generator() -> AsyncIterator[tuple[np.ndarray, _FrameMetadata]]: +``` + +This caused `NameError: name '_FrameMetadata' is not defined` when the function was called. + +### Fix +Moved both `_FrameMetadata` and `_CvMmapClient` Protocol classes out of the `TYPE_CHECKING` block to module level. These protocols are lightweight (just method signatures) and don't require heavy imports, so there's minimal runtime cost. + +### Verification +- `lsp_diagnostics`: Only 1 warning (unnecessary type: ignore comment) +- `uv run python -c "from opengait.demo.input import cvmmap_source"`: Import successful +- `uv run pytest tests/demo/`: 74 passed, 3 skipped +- Runtime test confirms generator creation works without NameError + +### Pattern to Remember +When using Protocol classes for type annotations: +- If the annotation is used in a nested function (defined inside another function), it needs to be available at runtime +- TYPE_CHECKING-guarded types can only be used in annotations that are evaluated at type-check time +- Module-level function annotations are evaluated at definition time (runtime) + +## Fix: Window Freeze on No-Detection Frames + +### Problem +When YOLO doesn't detect any person in a frame, `process_frame()` returns `None`. The visualizer update was conditional on `viz_payload is not None`, causing the OpenCV window to freeze when no detections were present for extended periods. + +### Solution +Modified `ScoliosisPipeline.run()` in `opengait/demo/pipeline.py` to always call `self._visualizer.update()` when visualization is enabled, even when `viz_payload` is `None`. When no detection data is available, the visualizer receives `None` values for bbox, mask_raw, silhouette, label, and confidence, which it handles gracefully by displaying placeholder views. + +### Changes Made +1. 
**opengait/demo/pipeline.py** (lines 344-393): + - Changed condition from `if self._visualizer is not None and viz_payload is not None:` to `if self._visualizer is not None:` + - Added else branch to set default values (all None/0) when `viz_payload is None` + - Visualizer now receives frame and FPS data every frame, preventing window freeze + +2. **tests/demo/test_pipeline.py**: + - Added `MockVisualizer` class to track update calls + - Added `test_pipeline_visualizer_updates_on_no_detection()` regression test + - Test verifies visualizer is called for all frames even when YOLO returns no detections + +### Verification +- New regression test passes (RED -> GREEN TDD cycle) +- All 13 existing pipeline tests pass (1 skipped) +- No new LSP errors introduced +- Visualizer handles None values gracefully (existing behavior) + +## Task 1: Stabilize overlapped duplicate code blocks - COMPLETED + +### Changes Made to opengait/demo/pipeline.py + +1. **Fixed tuple unpacking in _select_silhouette (lines 181-187)**: + - Changed `mask_raw, bbox, track_id = selected` to `mask_raw, bbox_mask, bbox_frame, track_id = selected` + - Updated `mask_to_silhouette()` call to use `bbox_mask` (mask coordinate space) + - Changed return statement to use `bbox_frame` (frame coordinate space) + - This fixes the 4-tuple unpacking from `select_person()` which returns `(mask, bbox_mask, bbox_frame, track_id)` + +2. **Added _last_viz_payload attribute**: + - Added `_last_viz_payload: dict[str, object] | None` class field after `_visualizer` + - Initialized to `None` in `__init__` after visualizer initialization + +3. 
**Updated visualizer logic in run() for no-detection caching**: + - Changed condition from `if self._visualizer is not None and viz_payload is not None:` to just `if self._visualizer is not None:` + - Added caching logic: when `viz_payload is not None`, cache a copy with `v.copy() if hasattr(v, 'copy') else v` for each value + - Use cached payload when current frame has no detection: `viz_data = viz_payload if viz_payload is not None else self._last_viz_payload` + - Handle case when no cache exists: use default None/0 values + - Visualizer now updates every frame when enabled, preventing window freeze + +### Verification Results +- LSP diagnostics: No new critical errors (only pre-existing warnings) +- Tests: 14 passed, 1 skipped +- Import test: OK + +### Key Implementation Detail +The caching logic uses a dict comprehension with `.copy()` for mutable values (numpy arrays) to prevent mutation issues, satisfying the test requirement that cached masks should be copies, not the same object reference. + + +## Task: BBoxXYXY Type Alias Introduction - COMPLETED + +### Summary +Introduced and applied `BBoxXYXY` alias for bbox semantics across demo module files. +The alias was already defined in `preprocess.py`; this task applied it consistently to +`pipeline.py` which had raw `tuple[int, int, int, int]` annotations. + +### Changes Made +1. 
**opengait/demo/pipeline.py**: + - Added import: `from .preprocess import BBoxXYXY, frame_to_person_mask, mask_to_silhouette` + - Changed `_select_silhouette` return type annotation (line 176): + - Before: `tuple[int, int, int, int]` + - After: `BBoxXYXY` + - Changed fallback cast annotation (line 192): + - Before: `tuple[UInt8[ndarray, "h w"], tuple[int, int, int, int]] | None` + - After: `tuple[UInt8[ndarray, "h w"], BBoxXYXY] | None` + - Changed bbox cast in run() method (line 372): + - Before: `cast(tuple[int, int, int, int] | None, bbox_obj)` + - After: `cast(BBoxXYXY | None, bbox_obj)` + +### Files Already Using BBoxXYXY (No Changes Needed) +- **opengait/demo/window.py**: Already imports and uses `BBoxXYXY` for `select_person()` return type +- **opengait/demo/visualizer.py**: Already imports and uses `BBoxXYXY` for `_draw_bbox()` and `update()` parameters +- **opengait/demo/preprocess.py**: Defines `BBoxXYXY = tuple[int, int, int, int]` with docstring + +### Verification +- `grep 'tuple\[int, int, int, int\]'` in demo/ scope: Only matches alias definition (preprocess.py line 27) +- `lsp_diagnostics`: No new errors introduced from changes +- `uv run pytest tests/demo/test_pipeline.py -q`: 14 passed, 1 skipped + +### Semantic Preservation +- No runtime behavior changes - purely type annotation cleanup +- Mask-space vs frame-space bbox semantics unchanged +- All bbox coordinate calculations preserved exactly + +## 2026-02-28: TypedDict Contract Applied for DemoResult + +### Changes Made +1. **opengait/demo/output.py**: `create_result()` already returns `DemoResult` TypedDict (confirmed) +2. 
**opengait/demo/pipeline.py**: + - Added `DemoResult` to imports from `.output` + - Changed `_result_buffer` type from `list[dict[str, object]]` to `list[DemoResult]` + - Fixed pre-existing type error in visualization payload caching by extracting dict comprehension to explicit loop with `Callable[[], object]` cast + +### Key Patterns +- TypedDict provides structural typing for dictionary results +- `DemoResult` fields: `frame`, `track_id`, `label`, `confidence`, `window`, `timestamp_ns` +- Pipeline buffer now enforces consistent result typing throughout the pipeline + +### Type Safety +- `output.py`: No diagnostics (clean) +- `pipeline.py`: Only warnings (no errors) - warnings are pre-existing pyarrow/dynamic import related +- All 14 tests pass + +## 2026-02-28: Visualization Regression Fixes - BBox Scale and Both-Mode Raw Mask + +### Issue A: BBox Scale Bug in Fallback Path + +**Problem:** In `_select_silhouette()` fallback path (when `select_person()` returns None), the bbox from `frame_to_person_mask()` was in mask-space coordinates but was being drawn on the full-size frame, causing the overlay to appear much smaller than actual. 
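Concretely, the conversion this bug calls for is a per-axis rescale of the XYXY corners from mask resolution to frame resolution (the real code derives frame dimensions from `result.orig_shape`). A sketch under assumed names — this is an illustration, not the project's exact code:

```python
def mask_bbox_to_frame(
    bbox_mask: tuple[int, int, int, int],
    mask_shape: tuple[int, int],   # (h, w) of the downsampled mask
    frame_shape: tuple[int, int],  # (h, w) of the original frame
) -> tuple[int, int, int, int]:
    """Scale a mask-space XYXY bbox into frame-space pixel coordinates.

    Names and the fallback policy here are assumptions for illustration.
    """
    mask_h, mask_w = mask_shape
    frame_h, frame_w = frame_shape
    if mask_h <= 0 or mask_w <= 0:
        # Graceful fallback: keep the mask-space bbox if dimensions are invalid.
        return bbox_mask
    scale_x = frame_w / mask_w
    scale_y = frame_h / mask_h
    x1, y1, x2, y2 = bbox_mask
    return (int(x1 * scale_x), int(y1 * scale_y),
            int(x2 * scale_x), int(y2 * scale_y))
```

For example, a bbox in a 100×100 mask drawn onto a 640×480 frame must grow by 6.4× horizontally and 4.8× vertically — without this step the overlay stays at mask scale, which is exactly the "much smaller than actual" symptom described above.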
+ +**Root Cause:** +- `frame_to_person_mask()` returns bbox in mask coordinate space (scaled down from original frame) +- Primary path through `select_person()` correctly returns both `bbox_mask` and `bbox_frame` +- Fallback path was returning mask-space bbox directly without conversion + +**Fix:** +- Added frame-space bbox conversion in fallback path using `result.orig_shape` +- Calculates scale factors: `scale_x = frame_w / mask_w`, `scale_y = frame_h / mask_h` +- Applies scaling to bbox coordinates: `bbox_frame = (int(bbox_mask[0] * scale_x), ...)` +- Safely falls back to mask-space bbox if `orig_shape` unavailable or dimensions invalid + +**Code Location:** `opengait/demo/pipeline.py` lines 206-226 in `_select_silhouette()` + +### Issue B: Both-Mode Raw Mask Appears Empty + +**Problem:** In `_prepare_both_view()`, the raw mask pane appeared empty/black in "Both" display mode. + +**Root Cause:** +- Primary path returns mask as float32 with values in [0, 1] range +- Fallback path returns mask as uint8 with values in [0, 255] range +- OpenCV display expects uint8 [0, 255] for proper visualization +- Float32 [0, 1] values displayed as nearly black + +**Fix:** +- Added dtype normalization in `_prepare_both_view()` before resizing +- Check: `if mask_gray.dtype == np.float32 or mask_gray.dtype == np.float64:` +- Convert: `mask_gray = (mask_gray * 255).astype(np.uint8)` +- Ensures consistent uint8 [0, 255] display format regardless of input dtype + +**Code Location:** `opengait/demo/visualizer.py` lines 330-332 in `_prepare_both_view()` + +### Verification +- All 14 pipeline tests pass (1 skipped) +- LSP diagnostics show only pre-existing warnings +- No new tests needed - existing test coverage sufficient + + + +## 2026-02-28: Tuple-to-TypedDict Migration for select_person and _select_silhouette + +### Summary +Successfully migrated two function return contracts from anonymous tuples to explicit TypedDict structures: +1. 
`window.select_person()` → `PersonSelection | None` +2. `pipeline._select_silhouette()` → `SilhouetteSelection | None` + +### Key TypedDict Definitions + +**PersonSelection** (in `opengait/demo/window.py`): +```python +class PersonSelection(TypedDict): + mask: "Float[ndarray, 'h w']" # jaxtyping annotated + bbox_mask: "tuple[int, int, int, int]" + bbox_frame: "tuple[int, int, int, int]" + track_id: int +``` + +**SilhouetteSelection** (in `opengait/demo/pipeline.py`): +```python +class SilhouetteSelection(TypedDict): + silhouette: "Float[ndarray, '64 44']" # jaxtyping annotated + mask_raw: "UInt8[ndarray, 'h w']" # jaxtyping annotated + bbox: "BBoxXYXY" + track_id: int +``` + +### Migration Pattern +- Replace `tuple[X, Y, Z] | None` return type with `TypedDictName | None` +- Replace tuple unpacking at call sites with dict key access +- Update all return statements to return dict literals +- Ensure jaxtyping annotations are preserved on ndarray fields + +### Files Modified +- `opengait/demo/window.py` - Added PersonSelection TypedDict, updated select_person +- `opengait/demo/pipeline.py` - Added SilhouetteSelection TypedDict, updated _select_silhouette +- `tests/demo/test_window.py` - Updated assertions to use dict keys +- `tests/demo/test_pipeline.py` - Updated mock return to use dict + +### Verification +- `uv run pytest tests/demo/test_window.py -q` → 22 passed, 1 skipped +- `uv run pytest tests/demo/test_pipeline.py -q` → 14 passed, 1 skipped +- No errors in lsp_diagnostics (only pre-existing warnings) + +### Behavior Confirmation +No behavior change - purely a typing/contract refactor. All visualization, caching, and bbox scaling logic remains unchanged. + + +## 2026-02-28: ProcessFrameResult TypedDict Added + +### Summary +Added `ProcessFrameResult` TypedDict for `ScoliosisPipeline.process_frame()` return contract. 
+ +### ProcessFrameResult Definition (pipeline.py) +```python +class ProcessFrameResult(TypedDict): + """Structured return type for process_frame method. + + Contains visualization payload and optional classification result. + Uses jaxtyping for array fields to preserve shape information. + """ + + mask_raw: "UInt8[ndarray, 'h w'] | None" + bbox: "BBoxXYXY | None" + silhouette: "Float[ndarray, '64 44'] | None" + track_id: int + label: "str | None" + confidence: "float | None" + result: "DemoResult | None" +``` + +### Files Modified +- `opengait/demo/pipeline.py` - Added ProcessFrameResult TypedDict, updated process_frame return type + +### Changes Made +1. Added `ProcessFrameResult` TypedDict with jaxtyping annotations for array fields +2. Updated `process_frame() -> ProcessFrameResult | None` +3. Updated all return statements to include `result` field (None for non-classified paths) +4. Updated `_last_viz_payload` type from `dict[str, object]` to `ProcessFrameResult` +5. Updated cast in `run()` from `dict[str, object]` to `ProcessFrameResult` + +### Verification +- `uv run pytest tests/demo/test_pipeline.py -q` → 14 passed, 1 skipped +- No errors in lsp_diagnostics (only pre-existing warnings) + +### Behavior Confirmation +No behavior change - purely a typing/contract refactor. + + +## 2026-02-28: Type Quality Cleanup (Oracle Feedback) + +### Changes Made + +1. **window.py - PersonSelection bbox type consistency** + - Changed `bbox_mask` and `bbox_frame` types from `"tuple[int, int, int, int]"` to `"BBoxXYXY"` + - Uses the existing type alias for semantic clarity and consistency + +2. 
**pipeline.py - run() visualization payload cleanup** + - Replaced generic `cached: dict[str, object]` with `cached: ProcessFrameResult` + - Replaced `.get()` access with direct key access on `ProcessFrameResult` + - Simplified extraction: removed intermediate `*_obj` variables and casting cascade + - Maintained copy semantics for mutable numpy arrays (mask_raw, silhouette) + +### Benefits +- Type-safe caching with explicit ProcessFrameResult structure +- Direct key access eliminates unnecessary `.get()` calls and type casts +- Consistent use of BBoxXYXY alias across window and pipeline modules + +### Verification +- `uv run pytest tests/demo/test_window.py -q` → 22 passed, 1 skipped +- `uv run pytest tests/demo/test_pipeline.py -q` → 14 passed, 1 skipped +- No runtime behavior changes + + +## 2026-02-28: bbox_frame Documentation Added + +### Documentation Changes + +Updated docstrings to clarify coordinate space semantics for bbox fields: + +**PersonSelection (window.py)**: +``` +Coordinate spaces: + - bbox_mask: Bounding box in mask coordinate space (XYXY: x1, y1, x2, y2). + This is the downsampled mask resolution (e.g., 100x100), used for + silhouette extraction and mask-space operations. + - bbox_frame: Bounding box in original frame pixel coordinates (XYXY: x1, y1, x2, y2). + This is the full-resolution frame size (e.g., 640x480), used for + visualization overlay on the original video frame. +``` + +**SilhouetteSelection (pipeline.py)**: +``` +The bbox field is in original frame pixel coordinates (XYXY: x1, y1, x2, y2), +suitable for visualization overlay on the original video frame. +``` + +**ProcessFrameResult (pipeline.py)**: +``` +The bbox field is in original frame pixel coordinates (XYXY: x1, y1, x2, y2), +suitable for visualization overlay on the original video frame. +``` + +**select_person function (window.py)**: +``` +- bbox_mask: bounding box in mask coordinate space (XYXY format: x1, y1, x2, y2). 
+ This is the downsampled mask resolution, used for silhouette extraction. +- bbox_frame: bounding box in original frame pixel coordinates (XYXY format: x1, y1, x2, y2). + This is the full-resolution frame size, used for visualization overlay. +``` + +### Key Distinction +- `bbox_mask`: Mask-space coordinates (downsampled, for processing) +- `bbox_frame`: Frame-space coordinates (full resolution, for visualization) + +### Verification +- `uv run pytest tests/demo/test_window.py tests/demo/test_pipeline.py -q` → 36 passed, 1 skipped + + +## 2026-02-28: bbox_frame Documentation in select_person + +### Docstring Updated (window.py) + +Updated `select_person` function docstring to explicitly document coordinate spaces: + +```python +Returns: + PersonSelection dictionary with keys: + - mask: the person's segmentation mask + - bbox_mask: bounding box in mask coordinate space (XYXY format: x1, y1, x2, y2). + This is the downsampled mask resolution (e.g., 100x100), used for + silhouette extraction and mask-space operations. + - bbox_frame: bounding box in original frame pixel coordinates (XYXY format: x1, y1, x2, y2). + This is the full-resolution frame size (e.g., 640x480), used for + visualization overlay on the original video frame. + - track_id: the person's track ID +``` + +### Key Distinction +- `bbox_mask`: Mask-space coordinates (downsampled resolution, for processing) +- `bbox_frame`: Frame-space coordinates (full resolution, for visualization) + +### Verification +- `uv run pytest tests/demo/test_window.py -q` → 22 passed + + +## 2026-02-28: pkl Alias for silhouette-export-format + +### Summary +Added `pkl` as an alias for `pickle` in `--silhouette-export-format` CLI option. 
+ +### Implementation +In `opengait/demo/pipeline.py` `__init__` method: +```python +self._silhouette_export_format = silhouette_export_format +# Normalize format alias: pkl -> pickle +if self._silhouette_export_format == "pkl": + self._silhouette_export_format = "pickle" +``` + +### Behavior +- `pkl` → normalized to `pickle` → pickle export +- `pickle` → unchanged → pickle export +- `parquet` → unchanged → parquet export +- `invalid` → unchanged → ValueError on export + +### Verification +- All existing tests pass +- Custom validation test confirms: + - pkl alias works + - pickle still works + - parquet still works + - invalid formats still rejected diff --git a/.sisyphus/notepads/sconet-preprocess-research/learnings.md b/.sisyphus/notepads/sconet-preprocess-research/learnings.md new file mode 100644 index 0000000..119fd89 --- /dev/null +++ b/.sisyphus/notepads/sconet-preprocess-research/learnings.md @@ -0,0 +1,115 @@ +# ScoNet Preprocessing Research - Learnings + +## Official Sources Identified + +### 1. Primary Implementation (HIGHEST TRUST) +- **Repository**: ShiqiYu/OpenGait (https://github.com/ShiqiYu/OpenGait) +- **ScoNet Model Code**: opengait/modeling/models/sconet.py +- **Preprocessing Code**: datasets/pretreatment.py (lines 18-95) +- **Config**: configs/sconet/sconet_scoliosis1k.yaml + +### 2. Dataset Source +- **Scoliosis1K Dataset**: https://zhouzi180.github.io/Scoliosis1K/ +- **Raw silhouettes**: Extracted using PP-HumanSeg v2 +- **Raw pose**: Extracted using ViTPose + +### 3. Academic Papers +- **MICCAI 2024**: "Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis" (Zhou et al.) + - PDF: https://arxiv.org/pdf/2407.05726 + - Introduces ScoNet and Scoliosis1K dataset + +- **MICCAI 2025**: "Pose as Clinical Prior: Learning Dual Representations for Scoliosis Screening" (Zhou et al.) 
+ - PDF: https://arxiv.org/abs/2509.00872 + - Extends ScoNet with pose annotations + +## Preprocessing Pipeline (Confirmed from Official Code) + +### From `datasets/pretreatment.py` (imgs2pickle function): + +```python +# Step 1: Filter empty images +if img.sum() <= 10000: + continue + +# Step 2: VERTICAL TIGHT CROP (y-axis projection) +y_sum = img.sum(axis=1) +y_top = (y_sum != 0).argmax(axis=0) +y_btm = (y_sum != 0).cumsum(axis=0).argmax(axis=0) +img = img[y_top: y_btm + 1, :] # <-- TIGHT CROP TO PERSON HEIGHT + +# Step 3: Resize based on height (maintain aspect ratio) +ratio = img.shape[1] / img.shape[0] +img = cv2.resize(img, (int(img_size * ratio), img_size), interpolation=cv2.INTER_CUBIC) + +# Step 4: Find x-center by cumulative sum +x_csum = img.sum(axis=0).cumsum() +for idx, csum in enumerate(x_csum): + if csum > img.sum() / 2: + x_center = idx + break + +# Step 5: Horizontal crop to img_size width (centered) +half_width = img_size // 2 +left = x_center - half_width +right = x_center + half_width + +# Step 6: Padding if needed +if left <= 0 or right >= img.shape[1]: + left += half_width + right += half_width + _ = np.zeros((img.shape[0], half_width)) + img = np.concatenate([_, img, _], axis=1) + +# Final crop +to_pickle.append(img[:, left: right].astype('uint8')) +``` + +### Key Parameters +- **Default img_size**: 64 (configurable via `--img_size`) +- **Interpolation**: cv2.INTER_CUBIC +- **Output format**: uint8 grayscale, pickle files +- **Normalization**: None during preprocessing (happens later in BaseSilTransform) + +## Alignment with Local Implementation + +### CONFIRMED: Vertical tight-crop before resize is OFFICIAL +- **Evidence**: Line 50 in pretreatment.py: `img = img[y_top: y_btm + 1, :]` +- **Purpose**: Removes vertical padding, focuses on actual person silhouette +- **Resize behavior**: Height-based resize maintains aspect ratio + +### Transform Pipeline (from config) +```yaml +evaluator_cfg: + transform: + - type: BaseSilCuttingTransform # 
Optional cutting + +trainer_cfg: + transform: + - type: BaseSilCuttingTransform # Optional cutting +``` + +From `opengait/data/transform.py`: +- `BaseSilCuttingTransform`: Applies optional cutting + divides by 255.0 +- Default cutting: `int(x.shape[-1] // 64) * 10` pixels from sides +- If cutting=0, only normalization is applied + +## Differences from Standard Gait Recognition + +1. **No horizontal flip augmentation** in ScoNet config +2. **Evaluation uses**: `evaluate_scoliosis` function (not standard gait metrics) +3. **Class num**: 3 (Positive, Neutral, Negative) vs 74+ for gait ID +4. **Metric**: euclidean distance (not cosine) + +## Critical Finding for User Concern + +**Vertical tight-crop BEFORE resize is CORRECT and OFFICIAL.** + +This is NOT a bug - it's the intended preprocessing pipeline: +1. Crop to person's actual height (remove empty vertical space) +2. Resize to fixed height (64px) maintaining aspect ratio +3. Center crop/pad horizontally to get 64x64 output + +This ensures: +- Consistent scale across different camera distances +- Person fills the frame vertically +- Aspect ratio is preserved