# Scoliosis: Next Experiments

This note is the short operational plan for the next scoliosis experiments.
It is written to be runnable by someone who did not follow the full debugging history.

Related notes:
- [ScoNet and DRF: Status, Architecture, and Reproduction Notes](sconet-drf-status-and-training.md)
- [Scoliosis Training Change Log](scoliosis_training_change_log.md)
- [Scoliosis Reproducibility Audit](scoliosis_reproducibility_audit.md)
- [systemd-run Training](systemd-run-training.md)

## Current Best Known Result

Current practical winner:
- model family: `ScoNet-MT-ske`
- split: `1:1:2` (`Scoliosis1K_112.json`)
- representation: `body-only`
- loss: plain CE + triplet
- optimizer path: later `AdamW` cosine finetune

Best verified checkpoints:
- best macro-F1:
  - checkpoint: `artifact/scoliosis_sconet_112_bodyonly_plaince_adamw_cosine/ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_cosine_finetune_1gpu_80k-iter-27000-score-0.8870-scalar_test_f1.pt`
  - full test: `92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1`
- best accuracy:
  - checkpoint: `artifact/scoliosis_sconet_112_bodyonly_plaince_adamw_cosine/ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_cosine_finetune_1gpu_80k-iter-64000-score-0.9425-scalar_test_accuracy.pt`
  - full test: `94.25 Acc / 83.24 Prec / 95.76 Rec / 87.63 F1`

## Branches Already Tried

These are the main branches already explored, so the next person does not rerun the same ideas blindly.
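The retained checkpoints above encode their selection metric in the filename (`…-iter-27000-score-0.8870-scalar_test_f1.pt`). As a sketch of how "best by metric" can be recovered from a directory listing — the helper names are hypothetical, and the regex assumes exactly that naming convention:

```python
import re

# Assumed retained-checkpoint naming: ...-iter-<N>-score-<S>-<metric>.pt
_CKPT_RE = re.compile(r"iter-(\d+)-score-([0-9.]+)-([A-Za-z0-9_]+)\.pt$")

def parse_ckpt_name(path):
    """Extract iteration, score, and metric from a retained checkpoint name.

    Returns None when the name does not match the assumed pattern.
    """
    m = _CKPT_RE.search(path)
    if m is None:
        return None
    return {"iter": int(m.group(1)),
            "score": float(m.group(2)),
            "metric": m.group(3)}

def best_by_metric(paths, metric):
    """Pick the highest-scoring retained checkpoint for one metric."""
    parsed = (parse_ckpt_name(p) for p in paths)
    candidates = [p for p in parsed if p is not None and p["metric"] == metric]
    return max(candidates, key=lambda p: p["score"], default=None)
```

Applied to the two artifact names listed above, `best_by_metric(names, "scalar_test_f1")` would select the iter-27000 checkpoint and `best_by_metric(names, "scalar_test_accuracy")` the iter-64000 one.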
### Representation branches

- `full-body`
  - original all-joints skeleton-map path
  - useful as a reference, but the practical winner did not come from this branch
- `body-only`
  - removed face/head keypoints and head limbs
  - this became the strongest practical representation
  - the current best checkpoint family comes from this branch
- `head-lite`
  - added back limited head information, mainly `nose` and shoulder-linked head context
  - looked promising on the small fixed proxy subset
  - lost on the full test set, so it is not the current winner
- `geom-fix`
  - aspect-ratio-preserving geometry/padding experiment instead of the older square-warp behavior
  - useful for debugging, but not the branch that produced the best practical result
- `no-cut`
  - removed the silhouette-style width cut at runtime
  - not the branch that ultimately won

### Loss / optimization branches

- `weighted CE`
  - helped on harder imbalance settings and some short proxies
  - did not beat `plain CE` on the practical `1:1:2` path
- `plain CE`
  - better than weighted CE on the working `1:1:2` setup
  - remains the default practical choice
- `SGD bridge`
  - gave the first strong practical skeleton baseline
- `AdamW` multistep finetune
  - unstable and often worse
- `AdamW` cosine finetune
  - the branch that finally produced the retained best checkpoints

### Model branches

- `ScoNet-MT-ske`
  - practical winner so far
- `DRF`
  - implemented and tested
  - underperformed the plain skeleton baseline in the first serious practical run
  - still worth one final warm-start retry, but not the default winner

## What Not To Do First

Do not start here:
- do not treat `DRF` as the mainline model yet
- do not go back to `1:1:8` first unless the goal is specifically a hard-imbalance study
- do not spend time on more visualization work first
- do not keep changing preprocessing and optimizer at the same time

Reason:
- the current plain skeleton baseline is already strong
- the first practical DRF run underperformed that baseline badly
- broad exploratory tuning is lower-value than a few controlled confirmation experiments

## Priority Order

Run experiments in this order.

### 1. Full-body control under the winning recipe

Goal:
- verify that `body-only` is truly the best practical representation

Keep fixed:
- split: `1:1:2`
- optimizer path: same successful `AdamW` cosine finetune style
- loss: plain CE + triplet
- scheduler style: cosine, not the old hard multistep decay

Change only:
- representation: `full-body` instead of `body-only`

Success criterion:
- only keep `full-body` if it beats the current best `body-only` checkpoint on full-test macro-F1

If `full-body` loses:
- stop
- keep `body-only` as the default practical representation

### 2. One serious DRF retry, but only on top of the strong baseline recipe

Goal:
- test whether DRF can add value once the skeleton baseline is already strong

Recommended setup:
- split: `1:1:2`
- representation: start from the same `body-only` skeleton maps
- initialization: warm-start from the strong skeleton model, not from random init
- optimizer: small-LR `AdamW`
- scheduler: cosine

Recommended strategy:
1. restore the strong skeleton checkpoint weights
2. initialize DRF from that visual branch
3. use a smaller LR than the plain baseline finetune
4. if needed, freeze or partially freeze the backbone for a short warmup so the PAV/PGA branch learns without immediately disturbing the already-good visual branch

Why:
- the earlier DRF bridge peaked early and then degraded
- local evidence suggests the prior branch is currently weak or noisy
- DRF deserves one fair test from a strong starting point, not another scratch run

Success criterion:
- DRF must beat the current best plain skeleton checkpoint on full-test macro-F1
- if it does not, stop treating DRF as the practical winner

### 3. Only after that, optional optimizer confirmation

Goal:
- confirm that the `AdamW` cosine win was not just a lucky branch

Options:
- rerun the winning `body-only` recipe once more with the same finetune schedule
- or compare one cleaner `SGD` continuation versus `AdamW` cosine finetune from the same checkpoint

This is lower priority because:
- we already have a verified strong artifact checkpoint
- the practical problem is solved well enough for use

## Recommended Commands Pattern

For long detached runs, use [systemd-run Training](systemd-run-training.md).

Use:
- `output_root: /mnt/hddl/data/OpenGait-output`
- `save_iter: 500`
- `eval_iter: 1000`
- `best_ckpt_cfg.keep_n: 3`
- metrics:
  - `scalar/test_f1/`
  - `scalar/test_accuracy/`

This avoids losing a strong checkpoint between evals and keeps large checkpoints off the SSD.

## Decision Rules

Use these rules so the experiment search does not drift again.

Promote a new run only if:
- it improves full-test macro-F1 over the current best retained checkpoint
- and the gain survives a standalone eval rerun from the saved checkpoint

Stop a branch if:
- it is clearly below the current best baseline
- and its later checkpoints are not trending upward

Do not declare a winner from:
- a proxy subset alone
- TensorBoard scalars alone
- an unsaved checkpoint implied only by logs

Always verify the claimed winner by:
1. standalone eval from the checkpoint file
2. writing the result into the changelog and audit
3. copying the final selected checkpoint into `artifact/` if it becomes the new best

## Short Recommendation

If only one more experiment is going to be run, run this:

- `full-body` under the same successful `1:1:2 + plain CE + AdamW cosine` recipe

If two more experiments are going to be run, run:
1. `full-body` control
2. DRF warm-start finetune from the strong `body-only` skeleton checkpoint

That is the highest-value next work from the current state.