crosstyan fbc0696dc4 feat: archive best scoliosis checkpoints

2026-03-11 10:23:38 +08:00


ScoNet and DRF: Status, Architecture, and Reproduction Notes

This note records the current Scoliosis1K implementation status in this repo and the main conclusions from the recent reproduction/debugging work.

For a stricter paper-vs-local reproducibility breakdown, see scoliosis_reproducibility_audit.md. For the recommended long-running local launch workflow, see systemd-run-training.md.

Current status

opengait/modeling/models/sconet.py is still the standard Scoliosis1K baseline in this repo.
The class is named ScoNet, but functionally it is the paper's multi-task variant because training uses both CrossEntropyLoss and TripletLoss.
opengait/modeling/models/drf.py is now implemented as a standalone DRF model in this repo.
Logging supports TensorBoard and optional Weights & Biases through opengait/utils/msg_manager.py.

Naming clarification

The name ScoNet is overloaded across the paper, config files, and checkpoints. Use the mapping below when reading this repo:

| Local name | What it means here | Closest paper name |
| --- | --- | --- |
| `ScoNet` model class | `opengait/modeling/models/sconet.py` with both CE and triplet losses | `ScoNet-MT` |
| `configs/sconet/sconet_scoliosis1k.yaml` | standard Scoliosis1K silhouette training recipe in this repo | `ScoNet-MT` training recipe |
| `ScoNet-*.pt` checkpoint filenames | local checkpoint naming inherited from the repo/config | usually `ScoNet-MT` if trained with the default config |
| `ScoNet-MT-ske` in these docs | same ScoNet code path, but fed 2-channel skeleton maps | paper notation `ScoNet-MT^{ske}` |
| `DRF` | `ScoNet-MT-ske` plus PGA/PAV guidance | `DRF` |

So:

paper ScoNet means the single-task CE-only model
repo ScoNet usually means the multi-task variant unless someone explicitly removes triplet loss
a checkpoint named ScoNet-...pt is not enough to tell the modality by itself; check input channels and dataset root

Important modality note

The strongest local ScoNet checkpoint we checked, ckpt/ScoNet-20000-better.pt, is a silhouette checkpoint, not a skeleton-map checkpoint.

Evidence:

its first convolution weight has shape (64, 1, 3, 3), so it expects 1-channel input
the matching eval config points to Scoliosis1K-sil-pkl
the skeleton-map configs in this repo use in_channel: 2

This matters because a good result from ScoNet-20000-better.pt only validates the silhouette path. It does not validate the heatmap/skeleton-map preprocessing used by DRF or by a ScoNet-MT-ske-style control.
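The channel check from the evidence list can be scripted. The following is a minimal sketch, not a repo utility: `first_conv_in_channels`, `guess_modality`, and `inspect_checkpoint` are hypothetical helper names, and it assumes that the checkpoint stores its weights under a `model` key and that the first 4-D weight in the state dict belongs to the first convolution.

```python
from typing import Any, Mapping, Tuple


def first_conv_in_channels(state_dict: Mapping[str, Any]) -> Tuple[str, int]:
    """Return (parameter name, in_channels) of the first 4-D weight found.

    Assumes state-dict order follows module registration order, so the first
    4-D tensor is the first convolution kernel: (out_ch, in_ch, kH, kW).
    """
    for name, tensor in state_dict.items():
        shape = tuple(tensor.shape)
        if len(shape) == 4:
            return name, int(shape[1])
    raise ValueError("no 4-D (convolution) weight found in state dict")


def guess_modality(in_channels: int) -> str:
    """Map input channels to the modality convention used in these notes."""
    return {1: "silhouette", 2: "skeleton-map"}.get(in_channels, "unknown")


def inspect_checkpoint(path: str) -> str:
    """Load a checkpoint on CPU and describe its expected input modality."""
    import torch  # imported lazily so the helpers above work without PyTorch

    ckpt = torch.load(path, map_location="cpu")
    # Assumption: weights live under "model"; otherwise treat ckpt as the state dict.
    state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
    name, channels = first_conv_in_channels(state)
    return f"{name}: in_channels={channels} -> {guess_modality(channels)}"
```

Calling `inspect_checkpoint("ckpt/ScoNet-20000-better.pt")` should report a 1-channel first convolution if the evidence above holds. The dataset root in the matching eval config remains the second check, since channel count alone cannot distinguish two different 2-channel preprocessing variants.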

What was checked against `f754f6f3831e9f83bb28f4e2f63dd43d8bcf9dc4`

The upstream ScoNet training recipe itself is effectively unchanged:

configs/sconet/sconet_scoliosis1k.yaml is unchanged
opengait/modeling/models/sconet.py is unchanged
opengait/main.py, opengait/modeling/base_model.py, opengait/data/dataset.py, opengait/data/collate_fn.py, and opengait/evaluation/evaluator.py only differ in import cleanup and logging hooks

So the poor skeleton-map evaluation on the 1:1:8 split (see the local reproduction findings) is not explained by a changed optimizer, scheduler, sampler, train loop, or evaluator.

For the skeleton-map control, the only functional changes required relative to the upstream ScoNet config were:

use a heatmap dataset root instead of Scoliosis1K-sil-pkl
switch the partition to Scoliosis1K_118.json
set model_cfg.backbone_cfg.in_channel: 2
reduce test batch_size to match the local 2-GPU DDP evaluator constraint
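The four changes above can be expressed as dotted-key overrides on top of the upstream config. This is an illustrative sketch only: `apply_overrides` is a hypothetical helper, the key names other than `model_cfg.backbone_cfg.in_channel` are assumed to follow the usual OpenGait config layout, and the paths and the test batch size are placeholders rather than the values used locally.

```python
import copy
from typing import Any, Dict


def apply_overrides(cfg: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of cfg with dotted-key overrides applied."""
    out = copy.deepcopy(cfg)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = out
        for key in parents:
            # Create intermediate sections that the base config does not have.
            node = node.setdefault(key, {})
        node[leaf] = value
    return out


# Stand-in for the upstream silhouette config; only the touched keys are shown.
upstream_cfg = {
    "data_cfg": {"dataset_root": "<path-to>/Scoliosis1K-sil-pkl"},
    "model_cfg": {"backbone_cfg": {"in_channel": 1}},
}

skeleton_overrides = {
    "data_cfg.dataset_root": "<path-to-heatmap-dataset-root>",       # placeholder
    "data_cfg.dataset_partition": "<path-to>/Scoliosis1K_118.json",  # placeholder directory
    "model_cfg.backbone_cfg.in_channel": 2,
    "evaluator_cfg.sampler.batch_size": 2,  # placeholder for the 2-GPU constraint
}

skeleton_cfg = apply_overrides(upstream_cfg, skeleton_overrides)
```

Keeping the overrides in one small mapping makes it easy to diff a control run against the upstream recipe and to confirm that nothing outside the modality-specific keys has drifted.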

Local reproduction findings

The main findings so far are:

ScoNet-20000-better.pt on the 1:1:2 silhouette split reproduced cleanly at 95.05% accuracy and 85.12% macro-F1.
The 1:1:8 skeleton-map control trained with healthy optimization metrics but evaluated very poorly.
A recent ScoNet-MT-ske-style control on Scoliosis1K_sigma_8.0/pkl finished with 36.45% accuracy and 32.78% macro-F1.
That result is far below the paper's 1:1:8 ScoNet-MT range and far below the silhouette baseline behavior.
On the easier 1:1:2 split, the skeleton branch is clearly learnable:
- body-only + weighted CE reached 81.82% accuracy and 65.96% macro-F1 on the full test set at 7000
- body-only + plain CE improved that to 83.16% accuracy and 68.47% macro-F1 at 7000
- a later full-test rerun confirmed the body-only + plain CE 7000 result exactly
- an AdamW + cosine finetune from that same plain-CE checkpoint raised the best result further; the retained checkpoint at iteration 27000 reproduced at 92.38% accuracy and 88.70% macro-F1 on the full test set
- a head-lite + plain CE variant looked promising on the fixed proxy subset but underperformed on the full test set at 7000 (78.07% accuracy, 62.08% macro-F1)

The current working conclusion is:

the core ScoNet trainer is not the problem
the strong silhouette checkpoint is not evidence that the skeleton-map path works
the main remaining suspect is the skeleton-map representation and preprocessing path
for practical model development, 1:1:2 is currently the better working split than 1:1:8
the current best skeleton recipe is body-only + plain CE, and the current best retained checkpoint comes from a later AdamW cosine finetune of that recipe on 1:1:2
the first practical DRF bridge on that same winning 1:1:2 recipe did not improve on the plain skeleton baseline:
- best retained DRF checkpoint (2000) on the full test set: 80.21 Acc / 58.92 Prec / 59.23 Rec / 57.84 F1
- plain skeleton baseline checkpoint (7000, before the AdamW finetune) on the full test set: 83.16 Acc / 68.24 Prec / 80.02 Rec / 68.47 F1
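To make the Acc / Prec / Rec / F1 figures above easier to interpret, here is a minimal sketch of macro-averaged metrics. It assumes macro averaging means the unweighted mean of per-class scores; `macro_metrics` is a hypothetical helper, not the repo's evaluator.

```python
from typing import Dict, List, Sequence


def macro_metrics(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for c in range(num_classes):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    return {
        "accuracy": sum(tp) / len(y_true),
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


# Toy imbalanced case: predicting the majority class everywhere gives decent
# accuracy but a poor macro-F1, because two of three classes score zero.
toy = macro_metrics([0, 0, 0, 0, 1, 2], [0, 0, 0, 0, 0, 0], num_classes=3)
```

Under this definition, macro-F1 is the mean of per-class F1 scores, not the harmonic mean of macro precision and macro recall. The reported 68.24 Prec / 80.02 Rec / 68.47 F1 is consistent with that reading, since the harmonic mean of those two figures would be about 73.7.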

For readability in this repo's docs, ScoNet-MT-ske refers to the skeleton-map variant that the DRF paper writes as ScoNet-MT^{ske}.

Architecture mapping

ScoNet in this repo maps to the paper as follows:

| Paper Component | Code Reference | Description |
| --- | --- | --- |
| Backbone | `ResNet9` in `opengait/modeling/backbones/resnet.py` | Four residual stages with channels `[64, 128, 256, 512]`. |
| Temporal aggregation | `PackSequenceWrapper(torch.max)` | Temporal max pooling over frames. |
| Spatial pooling | `HorizontalPoolingPyramid` | 16-bin horizontal partition. |
| Feature mapping | `SeparateFCs` | Maps pooled features into the embedding space. |
| Classification head | `SeparateBNNecks` | Produces screening logits. |
| Losses | `TripletLoss` + `CrossEntropyLoss` | This is why the repo implementation is functionally ScoNet-MT. |
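The two pooling steps in the table can be illustrated on a single feature channel without any framework. This is a simplified pure-Python sketch, not the repo code: `temporal_max` and `horizontal_pool` are hypothetical names, and summing the mean and max of each strip is an assumption about how the horizontal pooling combines values.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel: rows x columns


def temporal_max(frames: List[FeatureMap]) -> FeatureMap:
    """Element-wise max over frames, collapsing the time axis."""
    return [
        [max(values) for values in zip(*rows)]  # same column across frames
        for rows in zip(*frames)                # same row index across frames
    ]


def horizontal_pool(fmap: FeatureMap, bins: int) -> List[float]:
    """Split the map into `bins` horizontal strips and pool each to one value.

    Each strip is reduced as mean + max (assumed combination rule).
    """
    height = len(fmap)
    if bins <= 0 or height % bins:
        raise ValueError("feature-map height must be divisible by bins")
    step = height // bins
    pooled = []
    for b in range(bins):
        strip = [v for row in fmap[b * step:(b + 1) * step] for v in row]
        pooled.append(sum(strip) / len(strip) + max(strip))
    return pooled


# Two frames of a 2x2 map: pool over time first, then over horizontal strips.
frames = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 0.0], [1.0, 1.0]],
]
set_level = temporal_max(frames)
parts = horizontal_pool(set_level, bins=2)
```

In the real model the same idea runs per channel with 16 strips, so each sequence ends up with one feature vector per horizontal body part before `SeparateFCs` maps it into the embedding space.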

Training path summary

The standard Scoliosis1K ScoNet recipe is:

sampler: TripletSampler
train batch layout: 8 x 8
train sample type: fixed_unordered
train frames: 30
transform: BaseSilCuttingTransform
optimizer: SGD(lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler: MultiStepLR with milestones [10000, 14000, 18000]
total iterations: 20000

The skeleton-map control used the same recipe, except for the modality-specific changes listed above.
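For reference, the step schedule in the recipe has a simple closed form. The sketch below assumes a decay factor of `gamma=0.1`, which is the PyTorch `MultiStepLR` default but is not stated in these notes; `multistep_lr` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Sequence


def multistep_lr(
    iteration: int,
    base_lr: float = 0.1,
    milestones: Sequence[int] = (10000, 14000, 18000),
    gamma: float = 0.1,  # assumed decay factor
) -> float:
    """Learning rate at a given iteration under a MultiStepLR-style schedule."""
    # Each milestone that has been reached multiplies the rate by gamma once.
    passed = bisect_right(milestones, iteration)
    return base_lr * gamma ** passed


# Learning rate at the start of each phase of the 20000-iteration run.
phase_starts = [0, 10000, 14000, 18000]
phase_lrs = [multistep_lr(it) for it in phase_starts]
```

Under that assumption the last 2000 iterations run at one thousandth of the initial rate, which is worth keeping in mind when comparing intermediate checkpoints against the final one.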

Recommended next checks

Train a pure silhouette 1:1:8 baseline from the upstream ScoNet config as a clean sanity control.
Treat skeleton-map preprocessing as the primary debugging target until a ScoNet-MT-ske-style run gets close to the paper.
Only after the skeleton baseline is credible should DRF/PAV-specific conclusions be treated as decisive.

Practical conclusion

For practical use in this repo, the current winning path is:

split: 1:1:2
representation: body-only skeleton map
losses: plain CE + triplet
baseline training: SGD
best retained finetune: AdamW + cosine decay

The strongest verified checkpoint so far is:

ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_cosine_finetune_1gpu_80k
retained best checkpoint at 27000
verified full-test result: 92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1
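For intuition about where the retained 27000 checkpoint sits on a cosine decay, the sketch below evaluates the standard cosine annealing curve as a fraction of the initial learning rate. It assumes a single cosine cycle with no warmup and a zero floor, and it reads the `80k` suffix in the run name as 80000 total iterations; neither assumption is confirmed by these notes, and `cosine_lr_fraction` is a hypothetical helper.

```python
import math


def cosine_lr_fraction(iteration: int, total_iters: int, min_fraction: float = 0.0) -> float:
    """Fraction of the initial learning rate under single-cycle cosine annealing."""
    if total_iters <= 0:
        raise ValueError("total_iters must be positive")
    # Clamp progress to [0, 1] so out-of-range iterations stay on the curve ends.
    progress = min(max(iteration / total_iters, 0.0), 1.0)
    return min_fraction + 0.5 * (1.0 - min_fraction) * (1.0 + math.cos(math.pi * progress))


# Assumed run length taken from the "80k" suffix in the run name.
fraction_at_best = cosine_lr_fraction(27000, 80000)
```

If those assumptions hold, the best checkpoint was reached while the learning rate was still well above its floor, so later checkpoints in the same run are worth evaluating before concluding that 27000 is the peak.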

So the current local recommendation is:

keep body-only as the default practical skeleton representation
keep 1:1:2 as the main practical split
treat DRF as an optional research branch, not the mainline model

Remaining useful experiments

At this point, there are only a few experiments that still look high-value:

one clean full-body finetune under the same successful 1:1:2 recipe, just to confirm that body-only is really the best practical representation
one DRF rerun on top of the now-stronger practical baseline recipe, only if the goal is to test whether DRF can add value once the skeleton branch is already strong
a final packaging/evaluation pass around the retained best checkpoints, rather than more broad preprocessing churn

Everything else looks lower value than simply using the retained best 27000 checkpoint.
