feat: archive best scoliosis checkpoints
@@ -163,6 +163,9 @@ Conclusion:
- on `1:1:2`, `body-only + weighted CE` reached `81.82 Acc / 66.21 Prec / 88.50 Rec / 65.96 F1` on the full test set
- on the same split, `body-only + plain CE` improved on that at the `7000` checkpoint, reaching `83.16 Acc / 68.24 Prec / 80.02 Rec / 68.47 F1` (higher accuracy, precision, and F1, at lower recall)
- a later explicit rerun of the `body-only + plain CE` `7000` full-test eval reproduced that same `83.16 / 68.24 / 80.02 / 68.47` result
- a later `AdamW` cosine finetune from that same `10k` plain-CE checkpoint improved the practical result further:
  - verified retained best checkpoint at `27000`: `92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1`
  - final `80000` checkpoint remained strong on accuracy and recall but gave up precision: `90.64 Acc / 72.87 Prec / 93.19 Rec / 75.74 F1`
- adding back limited head context via `head-lite` did not improve the full-test score; its `7000` checkpoint reached only `78.07 Acc / 65.42 Prec / 80.50 Rec / 62.08 F1`
- the first practical DRF bridge on the same `1:1:2` body-only recipe peaked early and still underperformed the plain skeleton baseline; its best retained `2000` checkpoint reached only `80.21 Acc / 58.92 Prec / 59.23 Rec / 57.84 F1` on the full test set

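The gap between the retained `27000` checkpoint and the final `80000` checkpoint is why the best checkpoint is archived rather than the last one. A minimal sketch of that selection, using only the figures recorded above; the `EvalResult` type and helper names are hypothetical and not part of this repo:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    step: int
    acc: float
    prec: float
    rec: float
    f1: float


# Full-test figures copied from the notes for the AdamW cosine finetune.
FINETUNE_RESULTS = [
    EvalResult(27000, 92.38, 90.30, 87.39, 88.70),
    EvalResult(80000, 90.64, 72.87, 93.19, 75.74),
]


def best_by_f1(results):
    """Return the result with the highest F1; ties go to the earlier step."""
    return max(results, key=lambda r: (r.f1, -r.step))


def harmonic_f1(prec, rec):
    """F1 implied by a single precision/recall pair."""
    return 2 * prec * rec / (prec + rec)


best = best_by_f1(FINETUNE_RESULTS)
print(f"retain step {best.step} (F1 {best.f1:.2f})")
for r in FINETUNE_RESULTS:
    implied = harmonic_f1(r.prec, r.rec)
    print(f"{r.step}: reported F1 {r.f1:.2f}, harmonic mean of Prec/Rec {implied:.2f}")
```

At both checkpoints the reported F1 sits below the harmonic mean of the reported Prec and Rec. That would be consistent with per-class (macro) averaging, but the notes do not state which averaging is used, so treat that reading as an assumption.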
@@ -180,6 +183,7 @@ Conclusion:
- the `1:1:8` class ratio is not just a nuisance; it appears to be a major driver of the current skeleton/DRF failure mode
- on the easier `1:1:2` split, weighted CE is not currently the winning recipe; the best local full-test result so far came from plain CE
- `head-lite` may help the small fixed proxy subset, but that gain did not transfer to the full `TEST_SET`, so `body-only + plain CE` remains the best practical skeleton recipe
- once the practical `1:1:2` body-only plain-CE recipe was established, the branch still appeared underfit enough that optimizer and schedule choices mattered again. A later `AdamW` cosine finetune beat the earlier SGD bridge by a large margin at its retained best checkpoint, so the earlier `83.16 Acc / 68.47 F1` result was a stable baseline, not the ceiling of this skeleton recipe
- DRF currently looks worse than the plain skeleton baseline not because the skeleton path is dead, but because the additional prior branch is not yet providing a selective or stable complement. The current local evidence points to three likely causes:
  - the body-only skeleton baseline already captures most of the useful torso signal on `1:1:2`, so PAV may be largely redundant in this setting
  - the current PGA/PAV path appears weakly selective in local diagnostics, so the prior is not clearly emphasizing a few clinically relevant parts
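To make the `1:1:8` versus `1:1:2` contrast concrete: the notes do not say how the weighted-CE class weights were chosen, so the sketch below assumes plain inverse-frequency weighting, one common choice. The function name is hypothetical. It shows how much harder the loss has to lean on the minority classes as the split becomes more skewed.

```python
from fractions import Fraction


def inverse_frequency_weights(ratio):
    """Per-class weights w_c = N / (K * n_c) for class counts given as a ratio.

    Assumption: inverse-frequency weighting; the repo's actual scheme may differ.
    """
    total = sum(ratio)
    num_classes = len(ratio)
    return [Fraction(total, num_classes * n) for n in ratio]


for ratio in ((1, 1, 2), (1, 1, 8)):
    weights = inverse_frequency_weights(ratio)
    spread = weights[0] / weights[-1]  # minority weight relative to majority weight
    print(ratio, [str(w) for w in weights], f"minority/majority = {spread}")
```

Under this assumption the minority-to-majority weight ratio is 2 on `1:1:2` and 8 on `1:1:8`, which is one way to see why reweighting has far less to correct on the easier split.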
@@ -210,3 +214,17 @@ At the moment, this repo does not yet support:
- claiming a successful independent reproduction of the DRF paper’s quantitative results
- claiming that the paper’s skeleton-map preprocessing is fully specified
- treating the paper’s qualitative feature-response visualizations as reproduced

For practical model selection, the current conclusion is simpler:

- stop treating DRF as the default winner
- keep the practical mainline on `1:1:2`
- use the retained `body-only + plain CE` skeleton checkpoint family as the working solution
- the strongest verified practical checkpoint is the later `AdamW` cosine finetune checkpoint at `27000`, with:
  - `92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1`

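These notes do not record the hyperparameters of the `AdamW` cosine finetune. Purely as an illustration of what a cosine schedule does over such a run, the sketch below assumes an 80000-step horizon (matching the final checkpoint step) and hypothetical base and floor learning rates; none of these values come from this repo.

```python
import math

TOTAL_STEPS = 80_000  # assumption: schedule length equals the final checkpoint step
BASE_LR = 1e-4        # hypothetical starting learning rate
MIN_LR = 1e-6         # hypothetical learning-rate floor


def cosine_lr(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR, min_lr=MIN_LR):
    """Cosine-annealed learning rate, clamped to the [0, total_steps] range."""
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 27_000, 80_000):
    print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions the `27000` checkpoint falls roughly a third of the way through the decay, while the learning rate is still well above the floor.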
That means the remaining work is no longer broad reproduction debugging. It is mostly optional refinement:

- confirm whether `body-only` really beats `full-body` under the same successful training recipe
- optionally retry DRF only after the strong practical skeleton baseline is fixed
- package and use the retained best checkpoint rather than continuing to churn the whole search space