ScoNet and DRF: Status, Architecture, and Training Guide

This document gives a technical overview of the scoliosis screening models in OpenGait and maps the concepts from the papers to what the repository currently implements.

DRF implementation status in OpenGait

As of the current version, the Dual Representation Framework (DRF) described in the MICCAI 2025 paper "Pose as Clinical Prior: Learning Dual Representations for Scoliosis Screening" is not yet explicitly implemented as a standalone model in this repository.

Current State

  • ScoNet-MT (Functional Implementation): The class in opengait/modeling/models/sconet.py is named ScoNet, but it is functionally the ScoNet-MT (Multi-Task) variant described in the MICCAI 2024 paper: it is trained with both a classification loss and a triplet loss.
  • Dual Representation (DRF): opengait/modeling/models/skeletongait++.py implements a dual-representation (silhouette + pose heatmap) architecture for gait recognition, but the DRF screening model itself (MICCAI 2025) does not yet have a standalone class.
  • Naming Note: The repository uses the base name ScoNet for the multi-task implementation because it is the higher-performing variant and the one recommended for use.

Implementation Blueprint for DRF

To implement DRF within the OpenGait framework, follow this structure:

  1. Model Location: Create opengait/modeling/models/drf.py inheriting from BaseModel.
  2. Input Handling: Extend inputs_pretreament to handle both silhouettes and pose heatmaps (refer to SkeletonGaitPP.inputs_pretreament in skeletongait++.py).
  3. Dual-Branch Backbone: Use separate early layers for silhouette and skeleton map streams, then fuse via AttentionFusion (from skeletongait++.py:135) or a PAV-Guided Attention module as described in the DRF paper.
  4. Forward Contract:
    • training_feat: Must include triplet (for identity/feature consistency) and softmax (for screening classification).
    • visual_summary: Include image/sils and image/heatmaps for TensorBoard visualization.
    • inference_feat: Return logits for classification.
  5. Config: Create configs/drf/drf_scoliosis1k.yaml specifying model: DRF and configuring the dual-stream backbone.
  6. Evaluator: Use eval_func: evaluate_scoliosis in the config to leverage the existing screening metrics (Accuracy, Precision, Recall, F1).
  7. Dataset: DRF requires the Scoliosis1K-Pose dataset, which provides 17 anatomical keypoints in MS-COCO format alongside the existing silhouettes.
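
The forward contract in step 4 can be sketched as a plain dictionary. This is a minimal illustration, not repository code: build_drf_retval and check_forward_contract are hypothetical helpers, and placing the logits under an 'embeddings' key in inference_feat is an assumption based on the convention other OpenGait models appear to follow.

```python
REQUIRED_TRAINING_KEYS = ("triplet", "softmax")


def build_drf_retval(embeddings, logits, labels, sils, heatmaps):
    """Assemble the forward() return value for a hypothetical DRF model."""
    return {
        "training_feat": {
            # Triplet branch consumes embeddings; softmax branch consumes logits.
            "triplet": {"embeddings": embeddings, "labels": labels},
            "softmax": {"logits": logits, "labels": labels},
        },
        "visual_summary": {
            # Both input streams are exposed for TensorBoard image logging.
            "image/sils": sils,
            "image/heatmaps": heatmaps,
        },
        # Assumption: the evaluator reads class logits from 'embeddings'.
        "inference_feat": {"embeddings": logits},
    }


def check_forward_contract(retval):
    """Return the names of any required entries missing from retval."""
    missing = [
        "training_feat/" + key
        for key in REQUIRED_TRAINING_KEYS
        if key not in retval.get("training_feat", {})
    ]
    for section in ("visual_summary", "inference_feat"):
        if section not in retval:
            missing.append(section)
    return missing
```

Calling check_forward_contract on a model's output during a smoke test is one way to catch a missing loss key before a full training run starts.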

ScoNet/ScoNet-MT architecture mapping

Important

Naming Clarification: The implementation in this repository is ScoNet-MT, not the single-task ScoNet.

  • ScoNet (Single-Task): Defined in the paper as using only CrossEntropyLoss.
  • ScoNet-MT (Multi-Task): Defined as using $L_{total} = L_{ce} + L_{triplet}$.
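
To make the difference between the two objectives concrete, here is a worked sketch of the multi-task sum for a single sample and a single triplet. It uses the textbook forms of cross-entropy and the margin-based triplet loss with the 0.2 margin from the config; the repository's batched, part-wise loss classes are not reproduced here, and the function names are illustrative.

```python
import math


def cross_entropy(logits, label):
    """Return -log softmax(logits)[label] using the log-sum-exp trick."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def triplet_loss(d_ap, d_an, margin=0.2):
    """Hinge on anchor-positive vs. anchor-negative distance."""
    return max(d_ap - d_an + margin, 0.0)


def total_loss(logits, label, d_ap, d_an, margin=0.2):
    """Multi-task objective: classification term plus triplet term."""
    return cross_entropy(logits, label) + triplet_loss(d_ap, d_an, margin)
```

Dropping the second term of total_loss gives the single-task objective.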

Evidence for ScoNet-MT in this repo:

  1. Dual Loss Configuration: configs/sconet/sconet_scoliosis1k.yaml (lines 24-33) defines both TripletLoss (margin: 0.2) and CrossEntropyLoss.
  2. Dual-Key Forward Pass: sconet.py (lines 42-46) returns both 'triplet' and 'softmax' keys in the training_feat dictionary.
  3. Triplet Sampling: The trainer uses TripletSampler with batch_size: [8, 8] (P=8, K=8) to support triplet mining (config lines 92-99).

A "pure" ScoNet implementation would require removing the TripletLoss, switching to a standard InferenceSampler, and removing the triplet key from the model's forward return.
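
A minimal sketch of the loss and forward-return changes, assuming loss terms are listed as dictionaries with a type field and a log_prefix that matches the training_feat key. Those field names and the to_single_task helper are assumptions for illustration; the sampler change is not shown.

```python
def to_single_task(loss_cfg, training_feat):
    """Drop the triplet term from both the loss list and the model output."""
    kept_cfg = [term for term in loss_cfg if term["type"] != "TripletLoss"]
    kept_feat = {key: value for key, value in training_feat.items() if key != "triplet"}
    return kept_cfg, kept_feat


# Illustrative multi-task setup mirroring the two losses named above.
loss_cfg = [
    {"type": "TripletLoss", "margin": 0.2, "log_prefix": "triplet"},
    {"type": "CrossEntropyLoss", "log_prefix": "softmax"},
]
training_feat = {
    "triplet": {"embeddings": "embed", "labels": "labs"},
    "softmax": {"logits": "logits", "labels": "labs"},
}

single_cfg, single_feat = to_single_task(loss_cfg, training_feat)
```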

The ScoNet (functionally ScoNet-MT) implementation in opengait/modeling/models/sconet.py maps to the paper as follows:

| Paper Component | Code Reference | Description |
| --- | --- | --- |
| Backbone | ResNet9 in backbones/resnet.py | A customized ResNet with 4 layers and configurable channels. |
| Temporal Aggregation | self.TP (Temporal Pooling) | Uses PackSequenceWrapper(torch.max) to aggregate frame features. |
| Spatial Features | self.HPP (Horizontal Pooling) | HorizontalPoolingPyramid with bin_num: 16. |
| Feature Mapping | self.FCs (SeparateFCs) | Maps pooled features to a latent embedding space. |
| Classification Head | self.BNNecks (SeparateBNNecks) | Produces logits for the 3-class screening task. |
| Label Mapping | sconet.py lines 21-23 | negative: 0, neutral: 1, positive: 2. |
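
The temporal and spatial pooling steps can be illustrated on a single feature channel in pure Python. This is a sketch under assumptions: it treats the temporal step as an element-wise max over frames, and it assumes horizontal pooling splits the map into bin_num horizontal strips and sums each strip's mean and max. The real modules operate on batched tensors.

```python
def temporal_max(frames):
    """Element-wise max over a list of S frames, each H rows by W columns."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [max(frame[i][j] for frame in frames) for j in range(width)]
        for i in range(height)
    ]


def horizontal_pooling(feature_map, bin_num=16):
    """Pool an H x W map into bin_num part features (mean + max per strip)."""
    flat = [value for row in feature_map for value in row]
    if len(flat) % bin_num:
        raise ValueError("H * W must be divisible by bin_num")
    size = len(flat) // bin_num
    # Row-major flattening means each chunk is a horizontal strip of rows.
    strips = [flat[k * size:(k + 1) * size] for k in range(bin_num)]
    return [sum(strip) / len(strip) + max(strip) for strip in strips]
```

With bin_num: 16, each sequence ends up with 16 part-level features per channel, which the separate fully connected layers and BNNecks then process part by part.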

Training guide (dataloader, optimizer, logging)

Dataloader Setup

The training configuration is defined in configs/sconet/sconet_scoliosis1k.yaml:

  • Sampler: TripletSampler (standard for OpenGait).
  • Batch Size: [8, 8] (8 identities, 8 sequences per identity).
  • Sequence Sampling: fixed_unordered with frames_num_fixed: 30.
  • Transform: BaseSilCuttingTransform for silhouette preprocessing.
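
The batch composition and frame sampling described above can be sketched with the standard library. This is an illustration of the P×K idea and of unordered fixed-length frame sampling, not the repository's sampler: the function names are hypothetical, and falling back to sampling with replacement when too few items are available is an assumption.

```python
import random


def _draw(items, n, rng):
    """Draw n items without replacement if possible, else with replacement."""
    if len(items) >= n:
        return rng.sample(items, n)
    return rng.choices(items, k=n)


def sample_pk_batch(seqs_by_id, p=8, k=8, rng=random):
    """Pick p identities, then k sequences for each: p * k items per batch."""
    ids = rng.sample(sorted(seqs_by_id), p)
    batch = []
    for pid in ids:
        batch.extend((pid, seq) for seq in _draw(seqs_by_id[pid], k, rng))
    return batch


def sample_frames(seq_len, frames_num=30, rng=random):
    """Pick frames_num frame indices in no particular order."""
    return _draw(list(range(seq_len)), frames_num, rng)
```

With batch_size: [8, 8] this yields 64 sequences per iteration, each reduced to 30 frames.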

Optimizer and Scheduler

  • Optimizer: SGD
    • lr: 0.1
    • momentum: 0.9
    • weight_decay: 0.0005
  • Scheduler: MultiStepLR
    • milestones: [10000, 14000, 18000]
    • gamma: 0.1
  • Total Iterations: 20,000.
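
The resulting learning-rate schedule can be computed in closed form. The sketch below reproduces the MultiStepLR rule (multiply by gamma each time a milestone is reached) with the values listed above, assuming the scheduler is stepped once per training iteration; lr_at is an illustrative helper.

```python
import bisect


def lr_at(iteration, base_lr=0.1, milestones=(10000, 14000, 18000), gamma=0.1):
    """Learning rate in effect at a given iteration under MultiStepLR."""
    # Count how many milestones have been reached so far.
    passed = bisect.bisect_right(milestones, iteration)
    return base_lr * gamma ** passed
```

Under these settings the rate is 0.1 for the first 10,000 iterations and drops by a factor of ten at each milestone, ending near 1e-4 for the final 2,000 iterations.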

Logging

  • TensorBoard: OpenGait natively supports TensorBoard logging. Training losses (triplet, softmax) and accuracies are logged every 100 iterations (log_iter: 100).
  • WandB: There is no native Weights & Biases (WandB) integration in the current codebase. Users wishing to use WandB must manually integrate it into opengait/utils/msg_manager.py or opengait/main.py.
  • Evaluation: Metrics (Accuracy, Precision, Recall, F1) are computed by evaluate_scoliosis in opengait/evaluation/evaluator.py and logged to the console/file.
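
For reference, the four screening metrics can be computed from predicted and true class ids as follows. This is a standalone sketch, not the code in evaluate_scoliosis: macro averaging over the three classes is an assumption, and macro_metrics is a hypothetical name. The label ids follow the mapping used in sconet.py.

```python
LABEL_MAP = {"negative": 0, "neutral": 1, "positive": 2}


def macro_metrics(y_true, y_pred, labels=(0, 1, 2)):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Classes with no predictions or no samples score 0 for that metric.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```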

Evidence References

  • Model Implementation: opengait/modeling/models/sconet.py
  • Training Config: configs/sconet/sconet_scoliosis1k.yaml
  • Evaluation Logic: opengait/evaluation/evaluator.py::evaluate_scoliosis
  • Backbone Definition: opengait/modeling/backbones/resnet.py::ResNet9