ScoNet and DRF: Status, Architecture, and Reproduction Notes

This note records the current Scoliosis1K implementation status in this repo and the main conclusions from the recent reproduction/debugging work.

For a stricter paper-vs-local reproducibility breakdown, see scoliosis_reproducibility_audit.md. For the recommended long-running local launch workflow, see systemd-run-training.md.

Current status

  • opengait/modeling/models/sconet.py is still the standard Scoliosis1K baseline in this repo.
  • The class is named ScoNet, but functionally it is the paper's multi-task variant because training uses both CrossEntropyLoss and TripletLoss.
  • opengait/modeling/models/drf.py is now implemented as a standalone DRF model in this repo.
  • Logging supports TensorBoard and optional Weights & Biases through opengait/utils/msg_manager.py.

Naming clarification

The name ScoNet is overloaded across the paper, config files, and checkpoints. Use the mapping below when reading this repo:

| Local name | What it means here | Closest paper name |
| --- | --- | --- |
| ScoNet model class | opengait/modeling/models/sconet.py with both CE and triplet losses | ScoNet-MT |
| configs/sconet/sconet_scoliosis1k.yaml | standard Scoliosis1K silhouette training recipe in this repo | ScoNet-MT training recipe |
| ScoNet-*.pt checkpoint filenames | local checkpoint naming inherited from the repo/config | usually ScoNet-MT if trained with the default config |
| ScoNet-MT-ske in these docs | same ScoNet code path, but fed 2-channel skeleton maps | paper notation ScoNet-MT^{ske} |
| DRF | ScoNet-MT-ske plus PGA/PAV guidance | DRF |

So:

  • paper ScoNet means the single-task CE-only model
  • repo ScoNet usually means the multi-task variant unless someone explicitly removes triplet loss
  • a checkpoint named ScoNet-...pt is not enough to tell the modality by itself; check input channels and dataset root
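The naming rules above can be sketched as a small lookup. This is only an illustration: `describe_run` is a hypothetical helper, not part of the repo, and it assumes the loss types are available as a list of class names such as `TripletLoss` and `CrossEntropyLoss`.

```python
def describe_run(loss_types, in_channel):
    """Map a local ScoNet run to (paper variant, input modality).

    Hypothetical helper for illustration only; it is not part of this repo.
    """
    if "CrossEntropyLoss" not in loss_types:
        raise ValueError("every ScoNet variant discussed here uses CrossEntropyLoss")
    # CE only -> paper ScoNet; CE + triplet -> paper ScoNet-MT.
    variant = "ScoNet-MT" if "TripletLoss" in loss_types else "ScoNet"
    # 1 channel -> silhouettes; 2 channels -> skeleton maps.
    modalities = {1: "silhouette", 2: "skeleton map"}
    if in_channel not in modalities:
        raise ValueError(f"unexpected in_channel: {in_channel}")
    return variant, modalities[in_channel]


print(describe_run(["TripletLoss", "CrossEntropyLoss"], in_channel=2))
```

A run that reports `("ScoNet-MT", "skeleton map")` corresponds to what these docs call ScoNet-MT-ske.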

Important modality note

The strongest local ScoNet checkpoint we checked, ckpt/ScoNet-20000-better.pt, is a silhouette checkpoint, not a skeleton-map checkpoint.

Evidence:

  • its first convolution weight has shape (64, 1, 3, 3), so it expects 1-channel input
  • the matching eval config points to Scoliosis1K-sil-pkl
  • the skeleton-map configs in this repo use in_channel: 2

This matters because a good result from ScoNet-20000-better.pt only validates the silhouette path. It does not validate the heatmap/skeleton-map preprocessing used by DRF or by a ScoNet-MT-ske-style control.
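The channel check described above can be sketched as follows. The helper works on any mapping from parameter names to objects with a `.shape`, so it is shown on a stand-in dictionary. The parameter names in the demo are hypothetical, and the helper assumes the first 4-D weight in the state dict belongs to the first convolution, which holds when parameters are stored in registration order.

```python
from types import SimpleNamespace


def first_conv_in_channels(state_dict):
    """Return (parameter name, in_channels) for the first 4-D weight found.

    Conv2d weights have shape (out_channels, in_channels, kH, kW), so index 1
    of the shape is the number of input channels the layer expects.
    """
    for name, tensor in state_dict.items():
        shape = tuple(tensor.shape)
        if name.endswith("weight") and len(shape) == 4:
            return name, shape[1]
    raise ValueError("no 4-D convolution weight found")


# Stand-in for a loaded checkpoint; the key names are hypothetical.
demo_state = {
    "Backbone.conv1.weight": SimpleNamespace(shape=(64, 1, 3, 3)),
    "Backbone.bn1.weight": SimpleNamespace(shape=(64,)),
}
print(first_conv_in_channels(demo_state))
```

With a real checkpoint, the mapping would come from `torch.load(path, map_location="cpu")`. Whether the weights sit at the top level or under a key such as `"model"` depends on how the checkpoint was saved, so inspect the loaded object's keys first.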

What was checked against f754f6f3831e9f83bb28f4e2f63dd43d8bcf9dc4

The upstream ScoNet training recipe itself is effectively unchanged:

  • configs/sconet/sconet_scoliosis1k.yaml is unchanged
  • opengait/modeling/models/sconet.py is unchanged
  • opengait/main.py, opengait/modeling/base_model.py, opengait/data/dataset.py, opengait/data/collate_fn.py, and opengait/evaluation/evaluator.py only differ in import cleanup and logging hooks

So the poor skeleton-map evaluation result (see Local reproduction findings) is not explained by a changed optimizer, scheduler, sampler, train loop, or evaluator.

For the skeleton-map control, the only functional changes required relative to the upstream ScoNet config were:

  • use a heatmap dataset root instead of Scoliosis1K-sil-pkl
  • switch the partition to Scoliosis1K_118.json
  • set model_cfg.backbone_cfg.in_channel: 2
  • reduce test batch_size to match the local 2-GPU DDP evaluator constraint
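The four changes above can be written as a set of overrides on top of the upstream config. This is a sketch under assumptions: only `model_cfg.backbone_cfg.in_channel` is named in this note, while the `data_cfg` and `evaluator_cfg` key paths follow the usual OpenGait YAML layout and should be checked against the actual config. The paths and batch size are passed in as arguments because this note does not record them.

```python
import copy


def deep_merge(base, overrides):
    """Return a copy of `base` with `overrides` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def skeleton_control_overrides(heatmap_root, partition_path, test_batch_size):
    """Modality-specific changes only; everything else stays as upstream."""
    return {
        # Assumed key paths (typical OpenGait layout).
        "data_cfg": {
            "dataset_root": heatmap_root,
            "dataset_partition": partition_path,
        },
        # Key path taken from this note.
        "model_cfg": {"backbone_cfg": {"in_channel": 2}},
        # Assumed key path for the test-time batch size.
        "evaluator_cfg": {"sampler": {"batch_size": test_batch_size}},
    }
```

Keeping the overrides separate from the base config makes it easy to confirm that nothing else (optimizer, scheduler, sampler, losses) differs from the silhouette recipe.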

Local reproduction findings

The main findings so far are:

  • ScoNet-20000-better.pt on the 1:1:2 silhouette split reproduced cleanly at 95.05% accuracy and 85.12% macro-F1.
  • The 1:1:8 skeleton-map control trained with healthy optimization metrics but evaluated very poorly.
  • A recent ScoNet-MT-ske-style control on Scoliosis1K_sigma_8.0/pkl finished with 36.45% accuracy and 32.78% macro-F1.
  • That result is far below the paper's 1:1:8 ScoNet-MT range and far below the silhouette baseline behavior.
  • On the easier 1:1:2 split, the skeleton branch is clearly learnable:
    • body-only + weighted CE reached 81.82% accuracy and 65.96% macro-F1 on the full test set at iteration 7000
    • body-only + plain CE improved that to 83.16% accuracy and 68.47% macro-F1 at iteration 7000
    • a later full-test rerun reproduced the body-only + plain CE iteration-7000 result exactly
    • an AdamW cosine finetune from that same plain-CE checkpoint improved the practical best further; the retained checkpoint at iteration 27000 reproduced at 92.38% accuracy and 88.70% macro-F1 on the full test set
    • a head-lite + plain CE variant looked promising on the fixed proxy subset but underperformed on the full test set at iteration 7000 (78.07% accuracy, 62.08% macro-F1)

The current working conclusion is:

  • the core ScoNet trainer is not the problem
  • the strong silhouette checkpoint is not evidence that the skeleton-map path works
  • the main remaining suspect is the skeleton-map representation and preprocessing path
  • for practical model development, 1:1:2 is currently the better working split than 1:1:8
  • for practical model development, the current best skeleton recipe is body-only + plain CE, and the current best retained checkpoint comes from a later AdamW cosine finetune on 1:1:2
  • the first practical DRF bridge on that same winning 1:1:2 recipe did not improve on the plain skeleton baseline:
    • best retained DRF checkpoint (2000) on the full test set: 80.21 Acc / 58.92 Prec / 59.23 Rec / 57.84 F1
    • current best plain skeleton checkpoint (7000) on the full test set: 83.16 Acc / 68.24 Prec / 80.02 Rec / 68.47 F1
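The Acc / Prec / Rec / F1 figures above can be read with the following sketch of macro-averaged metrics. It assumes the evaluator averages per-class precision, recall, and F1 with equal class weights. That reading is consistent with the reported numbers: the harmonic mean of 68.24 precision and 80.02 recall would be about 73.7, not 68.47, so the F1 column appears to be a mean of per-class F1 scores. The function below is an illustration, not the repo's evaluator.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        # Mean of per-class F1, not the harmonic mean of macro P and macro R.
        "f1": sum(f1s) / n,
    }
```

Because every class counts equally, a minority class with weak precision can pull macro-F1 well below accuracy, which matches the gap between the accuracy and F1 columns in the results above.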

For readability in this repo's docs, ScoNet-MT-ske refers to the skeleton-map variant that the DRF paper writes as ScoNet-MT^{ske}.

Architecture mapping

ScoNet in this repo maps to the paper as follows:

| Paper Component | Code Reference | Description |
| --- | --- | --- |
| Backbone | ResNet9 in opengait/modeling/backbones/resnet.py | Four residual stages with channels [64, 128, 256, 512]. |
| Temporal aggregation | PackSequenceWrapper(torch.max) | Temporal max pooling over frames. |
| Spatial pooling | HorizontalPoolingPyramid | 16-bin horizontal partition. |
| Feature mapping | SeparateFCs | Maps pooled features into the embedding space. |
| Classification head | SeparateBNNecks | Produces screening logits. |
| Losses | TripletLoss + CrossEntropyLoss | This is why the repo implementation is functionally ScoNet-MT. |
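The two pooling stages in the mapping above can be illustrated on a single feature channel using plain nested lists. This is a toy sketch, not the repo's code: the real modules operate on batched tensors, and combining max and mean inside each horizontal strip is an assumption about how HorizontalPoolingPyramid reduces a strip.

```python
def temporal_max(frames):
    """Element-wise max over the frame axis of a [frames][height][width] map."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [max(frame[r][c] for frame in frames) for c in range(width)]
        for r in range(height)
    ]


def horizontal_pool(feature_map, bins=16):
    """Split the height axis into `bins` strips and reduce each strip to one value."""
    height = len(feature_map)
    if height % bins:
        raise ValueError("height must be divisible by the number of bins")
    rows_per_bin = height // bins
    pooled = []
    for b in range(bins):
        rows = feature_map[b * rows_per_bin:(b + 1) * rows_per_bin]
        strip = [value for row in rows for value in row]
        # Assumed reduction: max plus mean over the strip.
        pooled.append(max(strip) + sum(strip) / len(strip))
    return pooled


# Two frames of a 16 x 2 map: one increasing with row index, one decreasing.
frames = [
    [[float(r)] * 2 for r in range(16)],
    [[float(15 - r)] * 2 for r in range(16)],
]
fused = temporal_max(frames)           # one 16 x 2 map for the whole sequence
parts = horizontal_pool(fused, bins=16)  # one value per horizontal strip
print(len(parts))
```

After these two stages each sequence is represented by one vector per horizontal strip, which is what the per-part SeparateFCs and SeparateBNNecks then consume.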

Training path summary

The standard Scoliosis1K ScoNet recipe is:

  • sampler: TripletSampler
  • train batch layout: 8 x 8
  • train sample type: fixed_unordered
  • train frames: 30
  • transform: BaseSilCuttingTransform
  • optimizer: SGD(lr=0.1, momentum=0.9, weight_decay=5e-4)
  • scheduler: MultiStepLR with milestones [10000, 14000, 18000]
  • total iterations: 20000
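The learning-rate schedule implied by this recipe can be sketched in closed form. The decay factor `gamma=0.1` is an assumption, since this note lists only the milestones, and the `8 x 8` batch layout is read here as 8 sampled groups times 8 sequences each.

```python
from bisect import bisect_right

MILESTONES = (10000, 14000, 18000)


def multistep_lr(iteration, base_lr=0.1, milestones=MILESTONES, gamma=0.1):
    """Learning rate under a MultiStepLR-style schedule.

    The rate is multiplied by `gamma` once for every milestone that has been
    reached. gamma=0.1 is an assumed value, not taken from this note.
    """
    return base_lr * gamma ** bisect_right(milestones, iteration)


def frames_per_batch(groups=8, sequences_per_group=8, frames_per_sequence=30):
    """Frames in one training batch under the 8 x 8 layout with 30-frame clips."""
    return groups * sequences_per_group * frames_per_sequence


for step in (0, 10000, 14000, 18000):
    print(step, multistep_lr(step))
print(frames_per_batch())
```

Under these assumptions the last 2000 iterations run at one thousandth of the initial rate, so late checkpoints differ mostly by small refinements.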

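The skeleton-map control used the same recipe, except for the modality-specific changes listed under "What was checked against f754f6f3831e9f83bb28f4e2f63dd43d8bcf9dc4".

Suggested order for resolving the 1:1:8 skeleton-map gap: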

  1. Train a pure silhouette 1:1:8 baseline from the upstream ScoNet config as a clean sanity control.
  2. Treat skeleton-map preprocessing as the primary debugging target until a ScoNet-MT-ske-style run gets close to the paper.
  3. Only after the skeleton baseline is credible should DRF/PAV-specific conclusions be treated as decisive.

Practical conclusion

For practical use in this repo, the current winning path is:

  • split: 1:1:2
  • representation: body-only skeleton map
  • losses: plain CE + triplet
  • baseline training: SGD
  • best retained finetune: AdamW + cosine decay
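For reference, the shape of a cosine decay can be sketched as a multiplier on the base learning rate. The finetune's actual base rate, length, and floor are not recorded in this note, so `total_iterations` and `min_factor` below are placeholders rather than the values used.

```python
import math


def cosine_factor(iteration, total_iterations, min_factor=0.0):
    """Multiplier applied to the base learning rate under cosine decay.

    Starts at 1.0, ends at `min_factor`, and follows half a cosine period
    in between. All arguments are placeholders for illustration.
    """
    progress = min(max(iteration / total_iterations, 0.0), 1.0)
    return min_factor + (1.0 - min_factor) * 0.5 * (1.0 + math.cos(math.pi * progress))


# Example with a hypothetical run length of 1000 iterations.
for step in (0, 250, 500, 750, 1000):
    print(step, round(cosine_factor(step, 1000), 4))
```

Unlike the step schedule used for baseline training, the rate here falls smoothly, so a checkpoint retained early in a long cosine run was trained while the rate was still relatively high.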

The strongest verified checkpoint so far is:

  • ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_cosine_finetune_1gpu_80k
  • retained best checkpoint at 27000
  • verified full-test result: 92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1

So the current local recommendation is:

  • keep body-only as the default practical skeleton representation
  • keep 1:1:2 as the main practical split
  • treat DRF as an optional research branch, not the mainline model

Remaining useful experiments

At this point, there are only a few experiments that still look high-value:

  1. one clean full-body finetune under the same successful 1:1:2 recipe, just to confirm that body-only is really the best practical representation
  2. one DRF rerun on top of the now-stronger practical baseline recipe, only if the goal is to test whether DRF can add value once the skeleton branch is already strong
  3. a final packaging/evaluation pass around the retained best checkpoints, rather than more broad preprocessing churn

Everything else looks lower value than simply using the retained best 27000 checkpoint.