# ScoNet and DRF: Status, Architecture, and Training Guide
This document provides a technical overview of the scoliosis screening models in OpenGait, mapping paper concepts to the repository's implementation status.
## DRF implementation status in OpenGait
As of the current version, the **Dual Representation Framework (DRF)** described in the MICCAI 2025 paper *"Pose as Clinical Prior: Learning Dual Representations for Scoliosis Screening"* is **not yet explicitly implemented** as a standalone model in this repository.
### Current State
- **ScoNet-MT (Functional Implementation)**: While the class in `opengait/modeling/models/sconet.py` is named `ScoNet`, it is functionally the **ScoNet-MT** (Multi-Task) variant described in the MICCAI 2024 paper. It utilizes both classification and triplet losses.
- **Dual Representation (DRF)**: While `opengait/modeling/models/skeletongait++.py` implements a dual-representation (silhouette + pose heatmap) architecture for gait recognition, the specific DRF screening model (MICCAI 2025) is not yet explicitly implemented as a standalone class.
- **Naming Note**: The repository uses the base name `ScoNet` for the multi-task implementation, as it is the high-performance variant recommended for use.
### Implementation Blueprint for DRF
To implement DRF within the OpenGait framework, follow this structure:
1. **Model Location**: Create `opengait/modeling/models/drf.py` inheriting from `BaseModel`.
2. **Input Handling**: Extend `inputs_pretreament` to handle both silhouettes and pose heatmaps (refer to `SkeletonGaitPP.inputs_pretreament` in `skeletongait++.py`).
3. **Dual-Branch Backbone**: Use separate early layers for silhouette and skeleton map streams, then fuse via `AttentionFusion` (from `skeletongait++.py:135`) or a PAV-Guided Attention module as described in the DRF paper.
4. **Forward Contract**:
- `training_feat`: Must include `triplet` (for identity/feature consistency) and `softmax` (for screening classification).
- `visual_summary`: Include `image/sils` and `image/heatmaps` for TensorBoard visualization.
- `inference_feat`: Return `logits` for classification.
5. **Config**: Create `configs/drf/drf_scoliosis1k.yaml` specifying `model: DRF` and configuring the dual-stream backbone.
6. **Evaluator**: Use `eval_func: evaluate_scoliosis` in the config to leverage the existing screening metrics (Accuracy, Precision, Recall, F1).
7. **Dataset**: Requires the **Scoliosis1K-Pose** dataset, which provides 17 anatomical keypoints in MS-COCO format alongside the existing silhouettes.
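
The forward contract in step 4 can be sketched as a plain dictionary builder. This is a minimal sketch, not code from the repository: `build_drf_retval` is a hypothetical helper, and the inner key names (`embeddings`, `logits`, `labels`) are assumptions that should be checked against `sconet.py` and `evaluate_scoliosis` before use.

```python
def build_drf_retval(embed, logits, labels, sils, heatmaps):
    """Assemble the dictionary a hypothetical DRF.forward() would return."""
    return {
        "training_feat": {
            # Consumed by the triplet loss (inner keys are assumed names).
            "triplet": {"embeddings": embed, "labels": labels},
            # Consumed by the cross-entropy loss (inner keys are assumed names).
            "softmax": {"logits": logits, "labels": labels},
        },
        "visual_summary": {
            "image/sils": sils,
            "image/heatmaps": heatmaps,
        },
        # Assumed key name; the evaluator decides which key it actually reads.
        "inference_feat": {"embeddings": logits},
    }
```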
---
## ScoNet/ScoNet-MT architecture mapping
> [!IMPORTANT]
> **Naming Clarification**: The implementation in this repository is **ScoNet-MT**, not the single-task ScoNet.
> - **ScoNet (Single-Task)**: Defined in the paper as using only CrossEntropyLoss.
> - **ScoNet-MT (Multi-Task)**: Defined as using $L_{total} = L_{ce} + L_{triplet}$.
>
> **Evidence for ScoNet-MT in this repo:**
> 1. **Dual Loss Configuration**: `configs/sconet/sconet_scoliosis1k.yaml` (lines 24-33) defines both `TripletLoss` (margin: 0.2) and `CrossEntropyLoss`.
> 2. **Dual-Key Forward Pass**: `sconet.py` (lines 42-46) returns both `'triplet'` and `'softmax'` keys in the `training_feat` dictionary.
> 3. **Triplet Sampling**: The trainer uses `TripletSampler` with `batch_size: [8, 8]` (P=8, K=8) to support triplet mining (config lines 92-99).
>
> A "pure" ScoNet implementation would require removing the `TripletLoss`, switching to a standard `InferenceSampler`, and removing the `triplet` key from the model's `forward` return.
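
The difference between the two objectives can be made concrete with a plain-Python sketch for a single sample and a single triplet. It is a simplification under stated assumptions: the repository's `TripletLoss` operates on batches, while the functions below (all hypothetical names) only illustrate $L_{total} = L_{ce} + L_{triplet}$ with the margin of 0.2 quoted above.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)

def total_loss(logits, target, anchor, positive, negative, margin=0.2):
    """ScoNet-MT style objective: L_ce + L_triplet (equal weights assumed)."""
    return cross_entropy(logits, target) + triplet_loss(
        anchor, positive, negative, margin
    )
```

Dropping the `triplet_loss` term from `total_loss` gives the single-task objective.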

The `ScoNet` (functionally ScoNet-MT) implementation in `opengait/modeling/models/sconet.py` maps to the paper as follows:

| Paper Component | Code Reference | Description |
| :--- | :--- | :--- |
| **Backbone** | `ResNet9` in `backbones/resnet.py` | A customized ResNet with 4 layers and configurable channels. |
| **Temporal Aggregation** | `self.TP` (Temporal Pooling) | Uses `PackSequenceWrapper(torch.max)` to aggregate frame features. |
| **Spatial Features** | `self.HPP` (Horizontal Pooling) | `HorizontalPoolingPyramid` with `bin_num: 16`. |
| **Feature Mapping** | `self.FCs` (`SeparateFCs`) | Maps pooled features to a latent embedding space. |
| **Classification Head** | `self.BNNecks` (`SeparateBNNecks`) | Produces logits for the 3-class screening task. |
| **Label Mapping** | `sconet.py` lines 21-23 | `negative: 0`, `neutral: 1`, `positive: 2`. |
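
The temporal and spatial pooling rows can be illustrated with a small NumPy sketch. It assumes features shaped `(n, c, s, h, w)` and a horizontal pooling step that sums the mean and max of each strip. Both the tensor layout and the mean-plus-max rule are assumptions to verify against the OpenGait modules, and the function names are hypothetical.

```python
import numpy as np

def temporal_max_pool(feats):
    """Collapse the frame axis with a max: (n, c, s, h, w) -> (n, c, h, w)."""
    return feats.max(axis=2)

def horizontal_pool(feats, bin_num=16):
    """Pool bin_num horizontal strips: (n, c, h, w) -> (n, c, bin_num)."""
    n, c = feats.shape[:2]
    # Row-major reshape groups consecutive rows, so each group is a horizontal strip.
    strips = feats.reshape(n, c, bin_num, -1)
    return strips.mean(axis=-1) + strips.max(axis=-1)

# Toy input: 2 sequences, 8 channels, 30 frames, 16x11 feature maps.
feats = np.ones((2, 8, 30, 16, 11), dtype=np.float32)
pooled = horizontal_pool(temporal_max_pool(feats))
print(pooled.shape)  # (2, 8, 16)
```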
---
## Training guide (dataloader, optimizer, logging)
### Dataloader Setup
The training configuration is defined in `configs/sconet/sconet_scoliosis1k.yaml`:
- **Sampler**: `TripletSampler` (standard for OpenGait).
- **Batch Size**: `[8, 8]` (8 identities, 8 sequences per identity).
- **Sequence Sampling**: `fixed_unordered` with `frames_num_fixed: 30`.
- **Transform**: `BaseSilCuttingTransform` for silhouette preprocessing.
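
The `[8, 8]` batch layout can be sketched in plain Python. This is an illustration of P x K sampling in general, not the repository's `TripletSampler`; `sample_pk_batch` and the toy index are hypothetical, and sampling sequences with replacement is an assumption.

```python
import random

def sample_pk_batch(indices_by_identity, p=8, k=8, rng=random):
    """Return p * k sequence indices: p identities, k sequences each."""
    identities = rng.sample(sorted(indices_by_identity), p)
    batch = []
    for identity in identities:
        # Sampling with replacement is an assumption; it lets identities
        # with fewer than k sequences still fill their k slots.
        batch.extend(rng.choices(indices_by_identity[identity], k=k))
    return batch

# Toy index: 10 identities with 5 sequences each (indices 0..49).
toy_index = {f"id{i:02d}": list(range(i * 5, i * 5 + 5)) for i in range(10)}
batch = sample_pk_batch(toy_index, rng=random.Random(0))
print(len(batch))  # 64
```

Each batch therefore holds 64 sequences, which gives the triplet loss several positives and negatives per anchor.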
### Optimizer and Scheduler
- **Optimizer**: SGD
  - `lr: 0.1`
  - `momentum: 0.9`
  - `weight_decay: 0.0005`
- **Scheduler**: `MultiStepLR`
  - `milestones: [10000, 14000, 18000]`
  - `gamma: 0.1`
- **Total Iterations**: 20,000.
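
The resulting learning-rate curve can be worked out with a few lines of plain Python. The sketch assumes the scheduler steps once per training iteration and that the rate is multiplied by `gamma` at each milestone; `lr_at` is a hypothetical helper, not part of OpenGait.

```python
from bisect import bisect_right

BASE_LR = 0.1
MILESTONES = [10000, 14000, 18000]
GAMMA = 0.1

def lr_at(iteration, base_lr=BASE_LR, milestones=MILESTONES, gamma=GAMMA):
    """Learning rate after `iteration` scheduler steps under multi-step decay."""
    # bisect_right counts how many milestones have been reached so far.
    return base_lr * gamma ** bisect_right(milestones, iteration)

for it in (0, 9999, 10000, 14000, 18000):
    print(it, f"{lr_at(it):.0e}")
```

Under these assumptions the rate stays at 0.1 for the first 10,000 iterations and ends at 0.0001 for the final 2,000.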
### Logging
- **TensorBoard**: OpenGait natively supports TensorBoard logging. Training losses (`triplet`, `softmax`) and accuracies are logged every 100 iterations (`log_iter: 100`).
- **WandB**: There is **no native Weights & Biases (WandB) integration** in the current codebase. Users wishing to use WandB must manually integrate it into `opengait/utils/msg_manager.py` or `opengait/main.py`.
- **Evaluation**: Metrics (Accuracy, Precision, Recall, F1) are computed by `evaluate_scoliosis` in `opengait/evaluation/evaluator.py` and logged to the console/file.
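
For reference, the four screening metrics can be computed for the 3-class task as follows. This is a standalone sketch, not a copy of `evaluate_scoliosis`: the function name is hypothetical, and macro averaging over classes is an assumption that may differ from the repository's evaluator.

```python
def screening_metrics(y_true, y_pred, num_classes=3):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }

# Labels follow the mapping above: negative=0, neutral=1, positive=2.
metrics = screening_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```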
---
## Evidence References
- **Model Implementation**: `opengait/modeling/models/sconet.py`
- **Training Config**: `configs/sconet/sconet_scoliosis1k.yaml`
- **Evaluation Logic**: `opengait/evaluation/evaluator.py::evaluate_scoliosis`
- **Backbone Definition**: `opengait/modeling/backbones/resnet.py::ResNet9`