docs: add scoliosis next-experiment plan
# Scoliosis: Next Experiments

This note is the short operational plan for the next scoliosis experiments.
It is written to be runnable by someone who did not follow the full debugging history.

Related notes:
- [ScoNet and DRF: Status, Architecture, and Reproduction Notes](sconet-drf-status-and-training.md)
- [Scoliosis Training Change Log](scoliosis_training_change_log.md)
- [Scoliosis Reproducibility Audit](scoliosis_reproducibility_audit.md)
- [systemd-run Training](systemd-run-training.md)

## Current Best Known Result

Current practical winner:
- model family: `ScoNet-MT-ske`
- split: `1:1:2` (`Scoliosis1K_112.json`)
- representation: `body-only`
- loss: plain CE + triplet
- optimizer path: later `AdamW` cosine finetune

Best verified checkpoints:
- best macro-F1:
  - checkpoint: `artifact/scoliosis_sconet_112_bodyonly_plaince_adamw_cosine/ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_cosine_finetune_1gpu_80k-iter-27000-score-0.8870-scalar_test_f1.pt`
  - full test: `92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1`
- best accuracy:
  - checkpoint: `artifact/scoliosis_sconet_112_bodyonly_plaince_adamw_cosine/ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_cosine_finetune_1gpu_80k-iter-64000-score-0.9425-scalar_test_accuracy.pt`
  - full test: `94.25 Acc / 83.24 Prec / 95.76 Rec / 87.63 F1`
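
The two retained checkpoint filenames encode the iteration, the score, and the metric tag. When comparing candidates it can help to read these back programmatically. The following is a minimal sketch, assuming the `-iter-<N>-score-<S>-<metric>.pt` suffix seen in the two paths above also holds for other runs; `parse_best_ckpt_name` and `CkptInfo` are hypothetical helper names, not part of the training code.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple


class CkptInfo(NamedTuple):
    run_name: str
    iteration: int
    score: float
    metric: str


# Assumed suffix pattern, inferred only from the two filenames listed above.
_CKPT_RE = re.compile(
    r"^(?P<run>.+)-iter-(?P<iter>\d+)-score-(?P<score>\d+\.\d+)-(?P<metric>\w+)\.pt$"
)


def parse_best_ckpt_name(path: str) -> CkptInfo:
    """Split a best-checkpoint filename into run name, iteration, score, metric."""
    name = PurePosixPath(path).name  # drop any directory prefix
    match = _CKPT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {name}")
    return CkptInfo(
        run_name=match["run"],
        iteration=int(match["iter"]),
        score=float(match["score"]),
        metric=match["metric"],
    )
```

The score in the filename is the training-time scalar, so it should still be confirmed by a standalone eval before it is quoted as a result.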

## Branches Already Tried

These are the main branches already explored, so the next person does not rerun the same ideas blindly.

### Representation branches

- `full-body`
  - original all-joints skeleton-map path
  - useful as a reference, but the practical winner did not come from this branch
- `body-only`
  - removed face/head keypoints and head limbs
  - this became the strongest practical representation
  - current best checkpoint family comes from this branch
- `head-lite`
  - added back limited head information, mainly `nose` and shoulder-linked head context
  - looked promising on the small fixed proxy subset
  - lost on the full test set, so it is not the current winner
- `geom-fix`
  - aspect-ratio-preserving geometry/padding experiment instead of the older square-warp behavior
  - useful for debugging, but not the branch that produced the best practical result
- `no-cut`
  - removed the silhouette-style width cut at runtime
  - not the branch that ultimately won

### Loss / optimization branches

- `weighted CE`
  - helped on harder imbalance settings and some short proxies
  - did not beat `plain CE` on the practical `1:1:2` path
- `plain CE`
  - better than weighted CE on the working `1:1:2` setup
  - remains the default practical choice
- `SGD bridge`
  - gave the first strong practical skeleton baseline
- `AdamW` multistep finetune
  - unstable and often worse
- `AdamW` cosine finetune
  - this is the branch that finally produced the retained best checkpoints
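
For readers who did not follow the optimizer history, the difference between the two `AdamW` schedules is in the shape of the learning-rate curve: multistep drops the rate abruptly at fixed milestones, while cosine decays it smoothly. The sketch below uses the standard closed forms of both schedules. The base learning rate, milestones, and the 80k step count (taken from the `80k` in the run name) are illustrative assumptions, not the values used in the actual configs.

```python
import math
from bisect import bisect_right
from typing import Sequence


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine annealing: smooth decay from base_lr at step 0 to min_lr at total_steps."""
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def multistep_lr(step: int, milestones: Sequence[int], base_lr: float, gamma: float = 0.1) -> float:
    """Multistep decay: multiply the rate by gamma at every milestone already reached."""
    passed = bisect_right(sorted(milestones), step)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    # Hypothetical values, for comparing the two curve shapes only.
    for step in (0, 20_000, 40_000, 60_000, 80_000):
        print(
            step,
            f"cosine={cosine_lr(step, 80_000, 1e-4):.2e}",
            f"multistep={multistep_lr(step, [30_000, 60_000], 1e-4):.2e}",
        )
```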

### Model branches

- `ScoNet-MT-ske`
  - practical winner so far
- `DRF`
  - implemented and tested
  - underperformed the plain skeleton baseline in the first serious practical run
  - still worth one final warm-start retry, but not the default winner

## What Not To Do First

Do not start here:
- do not treat `DRF` as the mainline model yet
- do not go back to `1:1:8` first unless the goal is specifically hard-imbalance study
- do not spend time on more visualization work first
- do not keep changing preprocessing and optimizer at the same time

Reason:
- the current plain skeleton baseline is already strong
- the first practical DRF run underperformed that baseline badly
- broad exploratory tuning is lower-value than a few controlled confirmation experiments

## Priority Order

Run experiments in this order.

### 1. Full-body control under the winning recipe

Goal:
- verify that `body-only` is truly the best practical representation

Keep fixed:
- split: `1:1:2`
- optimizer path: same successful `AdamW` cosine finetune style
- loss: plain CE + triplet
- scheduler style: cosine, not the old hard multistep decay

Change only:
- representation: `full-body` instead of `body-only`

Success criterion:
- only keep `full-body` if it beats the current best `body-only` checkpoint on full-test macro-F1

If `full-body` loses:
- stop
- keep `body-only` as the default practical representation
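
The control in this section is only meaningful if the representation is the single thing that differs from the winning run. One way to guard against accidental drift is to diff the two run descriptions before launching. This is a minimal sketch over flat dictionaries; the key names (`split`, `representation`, `loss`, `optimizer`, `scheduler`) are hypothetical labels for the factors listed above, not keys from the real config files.

```python
from typing import Any, Dict, Set

_MISSING = object()  # sentinel so a key present in only one run counts as a change


def changed_keys(baseline: Dict[str, Any], candidate: Dict[str, Any]) -> Set[str]:
    """Return every key whose value differs between the two run descriptions."""
    keys = baseline.keys() | candidate.keys()
    return {
        key
        for key in keys
        if baseline.get(key, _MISSING) != candidate.get(key, _MISSING)
    }


def assert_single_change(
    baseline: Dict[str, Any], candidate: Dict[str, Any], allowed: Set[str]
) -> None:
    """Raise if the candidate differs from the baseline outside the allowed keys."""
    extra = changed_keys(baseline, candidate) - allowed
    if extra:
        raise ValueError(f"unexpected differences: {sorted(extra)}")


# Hypothetical summaries of the winning run and the full-body control.
WINNER = {
    "split": "1:1:2",
    "representation": "body-only",
    "loss": "plain CE + triplet",
    "optimizer": "AdamW",
    "scheduler": "cosine",
}
FULL_BODY_CONTROL = {**WINNER, "representation": "full-body"}

assert_single_change(WINNER, FULL_BODY_CONTROL, allowed={"representation"})
```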

### 2. One serious DRF retry, but only on top of the strong baseline recipe

Goal:
- test whether DRF can add value once the skeleton baseline is already strong

Recommended setup:
- split: `1:1:2`
- representation: start from the same `body-only` skeleton maps
- initialization: warm-start from the strong skeleton model, not from random init
- optimizer: small-LR `AdamW`
- scheduler: cosine

Recommended strategy:
1. restore the strong skeleton checkpoint weights
2. initialize DRF from that visual branch
3. use a smaller LR than the plain baseline finetune
4. if needed, freeze or partially freeze the backbone for a short warmup so the PAV/PGA branch learns without immediately disturbing the already-good visual branch

Why:
- the earlier DRF bridge peaked early and then degraded
- local evidence suggests the prior branch is currently weak or noisy
- DRF deserves one fair test from a strong starting point, not another scratch run

Success criterion:
- DRF must beat the current best plain skeleton checkpoint on full-test macro-F1
- if it does not, stop treating DRF as the practical winner
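
The bookkeeping behind steps 1, 2, and 4 of the recommended strategy can be sketched without any framework: decide which DRF parameters are restored from the skeleton checkpoint, which start from fresh initialization, and which are trainable during the frozen warmup. This is a sketch under stated assumptions, not the DRF implementation. The parameter names (`backbone.conv1.weight`, `pga.attn.weight`, and so on), the shapes, and the 1000-step warmup are all hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def plan_warm_start(
    ckpt_shapes: Dict[str, Shape], model_shapes: Dict[str, Shape]
) -> Dict[str, List[str]]:
    """Partition parameter names for a warm start.

    loaded: present in both with identical shape, restored from the checkpoint
    fresh:  in the new model only (or shape-mismatched), keeps its fresh init
    unused: in the checkpoint only, ignored
    """
    loaded = {k for k, shape in model_shapes.items() if ckpt_shapes.get(k) == shape}
    fresh = set(model_shapes) - loaded
    unused = set(ckpt_shapes) - set(model_shapes)
    return {"loaded": sorted(loaded), "fresh": sorted(fresh), "unused": sorted(unused)}


def trainable_names(plan: Dict[str, List[str]], step: int, freeze_steps: int) -> List[str]:
    """During the warmup only the fresh branch trains; afterwards everything does."""
    if step < freeze_steps:
        return list(plan["fresh"])
    return sorted(plan["loaded"] + plan["fresh"])


# Hypothetical parameter names and shapes, for illustration only.
SKELETON_CKPT = {"backbone.conv1.weight": (64, 2, 3, 3), "head.fc.weight": (3, 256)}
DRF_MODEL = {
    "backbone.conv1.weight": (64, 2, 3, 3),
    "head.fc.weight": (3, 256),
    "pga.attn.weight": (256, 8),
}

PLAN = plan_warm_start(SKELETON_CKPT, DRF_MODEL)
```

In PyTorch the same split is reported by `load_state_dict(..., strict=False)` as `missing_keys` and `unexpected_keys`, with the caveat that a shape mismatch on a shared key still raises an error rather than being skipped.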

### 3. Only after that, optional optimizer confirmation

Goal:
- confirm that the `AdamW` cosine win was not just a lucky branch

Options:
- rerun the winning `body-only` recipe once more with the same finetune schedule
- or compare one cleaner `SGD` continuation versus `AdamW` cosine finetune from the same checkpoint

This is lower priority because:
- we already have a verified strong artifact checkpoint
- the practical problem is solved well enough for use

## Recommended Commands Pattern

For long detached runs, use [systemd-run Training](systemd-run-training.md).

Use:
- `output_root: /mnt/hddl/data/OpenGait-output`
- `save_iter: 500`
- `eval_iter: 1000`
- `best_ckpt_cfg.keep_n: 3`
- metrics:
  - `scalar/test_f1/`
  - `scalar/test_accuracy/`

This avoids losing a strong checkpoint between evals and keeps large checkpoints off the SSD.
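
To make the effect of `best_ckpt_cfg.keep_n: 3` concrete, the sketch below shows one plausible top-N retention policy: after each eval, keep the three best-scoring checkpoints for a metric and report which file can be deleted. It illustrates the idea only and is not the training code's actual implementation; `BestCheckpointKeeper` and its methods are hypothetical names. One keeper per tracked metric would be needed, for example one for `scalar/test_f1/` and one for `scalar/test_accuracy/`.

```python
from typing import List, Tuple


class BestCheckpointKeeper:
    """Track the top-N checkpoints for one metric (higher score is better)."""

    def __init__(self, keep_n: int = 3) -> None:
        if keep_n < 1:
            raise ValueError("keep_n must be at least 1")
        self.keep_n = keep_n
        self._entries: List[Tuple[float, str]] = []

    def offer(self, score: float, path: str) -> List[str]:
        """Register a newly evaluated checkpoint; return paths that fell out of the top N."""
        self._entries.append((score, path))
        # Stable sort, best first; on ties the earlier checkpoint stays ahead.
        self._entries.sort(key=lambda entry: entry[0], reverse=True)
        evicted = [p for _, p in self._entries[self.keep_n:]]
        del self._entries[self.keep_n:]
        return evicted

    def kept(self) -> List[str]:
        """Paths currently retained, best first."""
        return [p for _, p in self._entries]
```

The caller is expected to delete the returned paths, so disk usage under `output_root` stays bounded no matter how long the run is.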

## Decision Rules

Use these rules so the experiment search does not drift again.

Promote a new run only if:
- it improves full-test macro-F1 over the current best retained checkpoint
- and the gain survives a standalone eval rerun from the saved checkpoint

Stop a branch if:
- it is clearly below the current best baseline
- and its later checkpoints are not trending upward

Do not declare a winner from:
- a proxy subset alone
- TensorBoard scalars alone
- an unsaved checkpoint implied only by logs

Always verify the claimed winner by:
1. standalone eval from the checkpoint file
2. writing the result into the changelog and audit
3. copying the final selected checkpoint into `artifact/` if it becomes the new best
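
The promote and stop rules above can be written down as two small predicates, which makes it harder to bend them after seeing a tempting curve. This is a sketch, not project code. The note does not define "clearly below" or "trending upward", so the `margin` and `window` defaults below are hypothetical thresholds that should be agreed on before a run starts.

```python
from typing import Optional, Sequence


def should_promote(
    candidate_f1: float, best_f1: float, rerun_f1: Optional[float]
) -> bool:
    """Promote only if the run beats the retained best AND a standalone rerun confirms it."""
    if rerun_f1 is None:  # no standalone eval from the saved checkpoint yet
        return False
    return candidate_f1 > best_f1 and rerun_f1 > best_f1


def should_stop(
    f1_history: Sequence[float],
    best_f1: float,
    margin: float = 0.02,  # hypothetical meaning of "clearly below"
    window: int = 3,       # hypothetical number of evals per trend window
) -> bool:
    """Stop if recent evals are clearly below the baseline and not trending upward."""
    if len(f1_history) < 2 * window:
        return False  # not enough evals to judge a trend
    recent = f1_history[-window:]
    earlier = f1_history[-2 * window:-window]
    clearly_below = max(recent) < best_f1 - margin
    trending_up = sum(recent) / window > sum(earlier) / window
    return clearly_below and not trending_up
```

With the current retained best of `0.8870` macro-F1, a candidate logged at `0.8900` that reruns at `0.8860` is not promoted, because the gain did not survive the standalone eval.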

## Short Recommendation

If only one more experiment is going to be run, run this:

- `full-body` under the same successful `1:1:2 + plain CE + AdamW cosine` recipe

If two more experiments are going to be run, run:
1. `full-body` control
2. DRF warm-start finetune from the strong `body-only` skeleton checkpoint

That is the highest-value next work from the current state.