feat: archive best scoliosis checkpoints

2026-03-11 10:23:38 +08:00
parent a0150c791f
commit fbc0696dc4
10 changed files with 489 additions and 4 deletions
+18
@@ -163,6 +163,9 @@ Conclusion:
- on `1:1:2`, `body-only + weighted CE` reached `81.82 Acc / 66.21 Prec / 88.50 Rec / 65.96 F1` on the full test set
- on the same split, `body-only + plain CE` improved accuracy, precision, and F1 further, at some cost in recall, reaching `83.16 Acc / 68.24 Prec / 80.02 Rec / 68.47 F1` at `7000`
- a later explicit rerun of the `body-only + plain CE` `7000` full-test eval reproduced that same `83.16 / 68.24 / 80.02 / 68.47` result
- a later `AdamW` cosine finetune from that same `10k` plain-CE checkpoint improved the practical result further:
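  - verified retained best checkpoint at `27000`: `92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1`
  - the final `80000` checkpoint traded precision for recall relative to `27000` but still beat the earlier `7000` plain-CE result on all four metrics: `90.64 Acc / 72.87 Prec / 93.19 Rec / 75.74 F1`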
- adding back limited head context via `head-lite` did not improve the full-test score; its `7000` checkpoint reached only `78.07 Acc / 65.42 Prec / 80.50 Rec / 62.08 F1`
- the first practical DRF bridge on the same `1:1:2` body-only recipe peaked early and still underperformed the plain skeleton baseline; its best retained `2000` checkpoint reached only `80.21 Acc / 58.92 Prec / 59.23 Rec / 57.84 F1` on the full test set
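For reference, the weighted-CE versus plain-CE comparison above can be made concrete with a small plain-Python sketch of class-weighted cross-entropy. This is an illustration, not the repo's loss: the notes do not say how the weighted-CE runs chose their weights, so the inverse-frequency rule `w_c = N / (K * n_c)` and the helper names `inverse_frequency_weights` and `cross_entropy` are assumptions. The weighted mean is divided by the summed weights of the targets, which is how PyTorch's `nn.CrossEntropyLoss(weight=..., reduction="mean")` normalizes.

```python
import math
from typing import List, Optional, Sequence


def inverse_frequency_weights(class_counts: Sequence[int]) -> List[float]:
    """w_c = N / (K * n_c): each class's weight is inversely proportional to its count."""
    total = sum(class_counts)
    k = len(class_counts)
    return [total / (k * n) for n in class_counts]


def cross_entropy(logits_batch: Sequence[Sequence[float]], targets: Sequence[int],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Mean softmax cross-entropy; with class weights, divide by the summed target weights."""
    num, den = 0.0, 0.0
    for logits, y in zip(logits_batch, targets):
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        w = 1.0 if weights is None else weights[y]
        num += w * (log_z - logits[y])  # weighted negative log-likelihood of the target
        den += w
    if den == 0.0:
        raise ValueError("empty batch or all target weights are zero")
    return num / den


# Hypothetical per-class counts mirroring the two class ratios in the notes.
if __name__ == "__main__":
    for counts in ([1, 1, 2], [1, 1, 8]):
        print(counts, [round(w, 3) for w in inverse_frequency_weights(counts)])
```

Under this rule the minority-to-majority weight ratio equals the class ratio, so a `1:1:8` split upweights the rare classes 8x, versus 2x on `1:1:2`.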
@@ -180,6 +183,7 @@ Conclusion:
- the `1:1:8` class ratio is not just a nuisance; it appears to be a major driver of the current skeleton/DRF failure mode
- on the easier `1:1:2` split, weighted CE is not currently the winning recipe; the best local full-test result so far came from plain CE
- `head-lite` may help the small fixed proxy subset, but that gain did not transfer to the full `TEST_SET`, so `body-only + plain CE` remains the best practical skeleton recipe
- once the practical `1:1:2` body-only plain-CE recipe was established, the branch still appeared underfit enough that the optimizer and schedule mattered again. A later `AdamW` cosine finetune beat the earlier SGD bridge by a large margin at its retained best checkpoint (`88.70` vs `68.47` F1), which means the earlier `83.16 / 68.47` result was a stable baseline but not the ceiling of this skeleton recipe
- DRF currently looks worse than the plain skeleton baseline not because the skeleton path is dead, but because the additional prior branch is not yet providing a selective or stable complement. The current local evidence points to three likely causes:
  - the body-only skeleton baseline already captures most of the useful torso signal on `1:1:2`, so PAV may be largely redundant in this setting
  - the current PGA/PAV path appears weakly selective in local diagnostics, so the prior is not clearly emphasizing a few clinically relevant parts
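The optimizer change credited above is an `AdamW` finetune with a cosine schedule. The sketch below shows both pieces on a single scalar parameter in plain Python, so it runs without a framework: a cosine-annealed learning rate and one AdamW update with decoupled weight decay. The base LR, minimum LR, weight decay, betas, and total step count are all assumptions, since the notes do not record them, and `cosine_lr`, `AdamWState`, and `adamw_step` are hypothetical names. In PyTorch these roughly correspond to `torch.optim.AdamW` combined with `torch.optim.lr_scheduler.CosineAnnealingLR`.

```python
import math
from dataclasses import dataclass


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate: base_lr at step 0, decaying to min_lr at total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    t = min(max(step, 0), total_steps)  # clamp so steps past the end stay at min_lr
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t / total_steps))


@dataclass
class AdamWState:
    m: float = 0.0  # running first-moment (mean) estimate of the gradient
    v: float = 0.0  # running second-moment (uncentered variance) estimate
    t: int = 0      # number of updates taken so far


def adamw_step(param: float, grad: float, state: AdamWState, lr: float,
               weight_decay: float = 0.01, betas=(0.9, 0.999), eps: float = 1e-8) -> float:
    """Apply one AdamW update to a scalar parameter and return the new value."""
    beta1, beta2 = betas
    state.t += 1
    state.m = beta1 * state.m + (1 - beta1) * grad
    state.v = beta2 * state.v + (1 - beta2) * grad * grad
    m_hat = state.m / (1 - beta1 ** state.t)  # bias-corrected moments
    v_hat = state.v / (1 - beta2 ** state.t)
    # Decoupled weight decay: shrink the weight directly instead of adding
    # weight_decay * param to the gradient (which is what Adam + L2 would do).
    param = param * (1 - lr * weight_decay)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Hypothetical usage: minimize (p - 3)^2 with a cosine-decayed AdamW.
if __name__ == "__main__":
    state, p, total = AdamWState(), 0.0, 500
    for step in range(total):
        p = adamw_step(p, 2 * (p - 3), state, lr=cosine_lr(step, total, base_lr=0.1))
    print(f"p after {total} steps: {p:.3f}")
```

Applying the decay to the weight directly, rather than folding `weight_decay * param` into the gradient, keeps the decay from being rescaled by Adam's per-parameter step size; that decoupling is what distinguishes AdamW from Adam with L2 regularization.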
@@ -210,3 +214,17 @@ At the moment, this repo does not yet support:
- claiming a successful independent reproduction of the DRF paper's quantitative results
- claiming that the paper's skeleton-map preprocessing is fully specified
- treating the paper's qualitative feature-response visualizations as reproduced
For practical model selection, the current conclusion is simpler:
- stop treating DRF as the default winner
- keep the practical mainline on `1:1:2`
- use the retained `body-only + plain CE` skeleton checkpoint family as the working solution
- the strongest verified practical checkpoint is the later `AdamW` cosine finetune checkpoint at `27000`, with `92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1`
That means the remaining work is no longer broad reproduction debugging. It is mostly optional refinement:
- confirm whether `body-only` really beats `full-body` under the same successful training recipe
- optionally retry DRF only after the strong practical skeleton baseline is locked in
- package and use the retained best checkpoint rather than continuing to churn the whole search space
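Packaging the retained best checkpoint implies computing the four reported numbers (`Acc / Prec / Rec / F1`) per checkpoint and picking one. The sketch below does both, under two assumptions the notes do not state: that Prec, Rec, and F1 are macro-averaged over the three classes, and that the retained checkpoint is the one with the highest F1. Macro averaging is at least consistent with the reported numbers, because the reported F1 is not the harmonic mean of the reported Prec and Rec (for the `80000` checkpoint, `72.87` and `93.19` would give about `81.8`, not `75.74`). `macro_metrics`, `EvalResult`, and `select_best` are hypothetical names, not code from this repo.

```python
from typing import Dict, List, NamedTuple, Sequence


def macro_metrics(cm: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision/recall/F1, in percent, from a KxK confusion matrix.

    cm[i][j] counts samples whose true class is i and whose predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    precs: List[float] = []
    recs: List[float] = []
    f1s: List[float] = []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[i][c] for i in range(k))  # column sum: samples predicted as c
        actual = sum(cm[c])                          # row sum: samples truly in c
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precs.append(p)
        recs.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "acc": 100.0 * correct / total,
        "prec": 100.0 * sum(precs) / k,
        "rec": 100.0 * sum(recs) / k,
        "f1": 100.0 * sum(f1s) / k,  # mean of per-class F1, not F1 of the macro P and R
    }


class EvalResult(NamedTuple):
    step: int
    acc: float
    prec: float
    rec: float
    f1: float


def select_best(results: Sequence[EvalResult], key: str = "f1") -> EvalResult:
    """Return the eval result with the highest `key`, breaking ties by accuracy."""
    if not results:
        raise ValueError("no eval results to select from")
    return max(results, key=lambda r: (getattr(r, key), r.acc))
```

The selection key matters: ranked by recall instead of F1, the `80000` checkpoint would win over `27000`, so it is worth recording the criterion alongside the archived checkpoint.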