feat: archive best scoliosis checkpoints

2026-03-11 10:23:38 +08:00
parent a0150c791f
commit fbc0696dc4
10 changed files with 489 additions and 4 deletions
@@ -71,6 +71,7 @@ The main findings so far are:
- `body-only + weighted CE` reached `81.82%` accuracy and `65.96%` macro-F1 on the full test set at `7000`
- `body-only + plain CE` improved that to `83.16%` accuracy and `68.47%` macro-F1 at `7000`
- a later full-test rerun confirmed the `body-only + plain CE` `7000` result exactly
- an `AdamW` cosine finetune from that same plain-CE checkpoint improved the practical best further; on re-verification the retained `27000` checkpoint reproduced `92.38%` accuracy and `88.70%` macro-F1 on the full test set
- a `head-lite + plain CE` variant looked promising on the fixed proxy subset but underperformed on the full test set at `7000` (`78.07%` accuracy, `62.08%` macro-F1)

The current working conclusion is:
@@ -79,7 +80,7 @@ The current working conclusion is:
- the strong silhouette checkpoint is not evidence that the skeleton-map path works
- the main remaining suspect is the skeleton-map representation and preprocessing path
- for practical model development, `1:1:2` is currently the better working split than `1:1:8`
-- for practical model development, the current best skeleton recipe is still `body-only + plain CE + SGD` on `1:1:2`
+- for practical model development, the current best skeleton recipe is `body-only + plain CE`, and the current best retained checkpoint comes from a later `AdamW` cosine finetune on `1:1:2`
- the first practical DRF bridge on that same winning `1:1:2` recipe did not improve on the plain skeleton baseline:
- best retained DRF checkpoint (`2000`) on the full test set: `80.21 Acc / 58.92 Prec / 59.23 Rec / 57.84 F1`
- current best plain skeleton checkpoint (`7000`) on the full test set: `83.16 Acc / 68.24 Prec / 80.02 Rec / 68.47 F1`
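The recurring gap between accuracy and macro-F1 in these numbers is worth keeping in mind when comparing checkpoints: macro-F1 averages per-class F1 with equal weight, so a few weak minority classes can drag it far below accuracy. A self-contained toy sketch (made-up labels, not the repo's data):

```python
def macro_f1(y_true, y_pred, classes):
    """Macro-F1: unweighted mean of per-class F1, so minority classes
    count as much as the majority class (unlike accuracy)."""
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Imbalanced toy case: a majority-class predictor keeps accuracy high
# while macro-F1 collapses.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
mf1 = macro_f1(y_true, y_pred, classes=[0, 1])
```

In this toy case accuracy is `0.80` while macro-F1 is only `0.44`, because class `1` contributes an F1 of `0.0` with full weight.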
@@ -119,3 +120,35 @@ The skeleton-map control used the same recipe, except for the modality-specific
1. Train a pure silhouette `1:1:8` baseline from the upstream ScoNet config as a clean sanity control.
2. Treat skeleton-map preprocessing as the primary debugging target until a `ScoNet-MT-ske`-style run gets close to the paper.
3. Only after the skeleton baseline is credible should DRF/PAV-specific conclusions be treated as decisive.

## Practical conclusion
For practical use in this repo, the current winning path is:
- split: `1:1:2`
- representation: `body-only` skeleton map
- losses: plain CE + triplet
- baseline training: `SGD`
- best retained finetune: `AdamW` + cosine decay

The strongest verified checkpoint so far is:
- `ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_cosine_finetune_1gpu_80k`
- retained best checkpoint at `27000`
- verified full-test result: `92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1`

So the current local recommendation is:
- keep `body-only` as the default practical skeleton representation
- keep `1:1:2` as the main practical split
- treat DRF as an optional research branch, not the mainline model

## Remaining useful experiments
At this point, there are only a few experiments that still look high-value:
1. one clean `full-body` finetune under the same successful `1:1:2` recipe, just to confirm that `body-only` is really the best practical representation
2. one DRF rerun on top of the now-stronger practical baseline recipe, only if the goal is to test whether DRF can add value once the skeleton branch is already strong
3. a final packaging/evaluation pass around the retained best checkpoints, rather than broader preprocessing churn

Everything else looks lower value than simply using the retained best `27000` checkpoint.
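For reference, the decay shape behind the retained `AdamW` cosine finetune can be sketched with the standard cosine schedule; the peak LR below is hypothetical, and treating the `80k` in the run name as the total step budget is an assumption:

```python
import math

def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

total = 80_000  # assumed from the `_80k` suffix in the run name
lr_start = cosine_lr(0, total, lr_max=1e-4)         # peak LR (hypothetical)
lr_at_best = cosine_lr(27_000, total, lr_max=1e-4)  # LR near the retained 27000 step
lr_end = cosine_lr(total, total, lr_max=1e-4)       # decays to lr_min (0 here)
```

The point of the sketch is only the shape: the retained `27000` checkpoint sits about a third of the way through the budget, where the LR is still well above its floor.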