docs: add scoliosis next-experiment plan

2026-03-11 10:44:33 +08:00
parent 1c3aa1f1a3
commit c62bdee1f9
1 changed file with 202 additions and 0 deletions
@@ -0,0 +1,202 @@
+# Scoliosis: Next Experiments
+
+This note is the short operational plan for the next scoliosis experiments.
+It is written to be runnable by someone who did not follow the full debugging history.
+
+Related notes:
+- [ScoNet and DRF: Status, Architecture, and Reproduction Notes](sconet-drf-status-and-training.md)
+- [Scoliosis Training Change Log](scoliosis_training_change_log.md)
+- [Scoliosis Reproducibility Audit](scoliosis_reproducibility_audit.md)
+- [systemd-run Training](systemd-run-training.md)
+
+## Current Best Known Result
+
+Current practical winner:
+- model family: `ScoNet-MT-ske`
+- split: `1:1:2` (`Scoliosis1K_112.json`)
+- representation: `body-only`
+- loss: plain CE + triplet
+- optimizer path: late-stage `AdamW` cosine finetune
+
+Best verified checkpoints:
+- best macro-F1:
+  - checkpoint: `artifact/scoliosis_sconet_112_bodyonly_plaince_adamw_cosine/ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_cosine_finetune_1gpu_80k-iter-27000-score-0.8870-scalar_test_f1.pt`
+  - full test: `92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1`
+- best accuracy:
+  - checkpoint: `artifact/scoliosis_sconet_112_bodyonly_plaince_adamw_cosine/ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_cosine_finetune_1gpu_80k-iter-64000-score-0.9425-scalar_test_accuracy.pt`
+  - full test: `94.25 Acc / 83.24 Prec / 95.76 Rec / 87.63 F1`
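The two retained checkpoint names embed the iteration, the score, and the metric. A minimal sketch for reading them back, assuming every best-checkpoint file ends in the same `-iter-<N>-score-<S>-<metric>.pt` suffix as the two names listed above. The regex and the `CkptInfo` helper are illustrative and not part of the training code.

```python
import re
from typing import NamedTuple, Optional

# Suffix pattern inferred from the two retained checkpoint names only (assumption).
_CKPT_RE = re.compile(
    r"-iter-(?P<iter>\d+)-score-(?P<score>\d+\.\d+)-(?P<metric>\w+)\.pt$"
)


class CkptInfo(NamedTuple):
    iteration: int
    score: float
    metric: str


def parse_ckpt_name(path: str) -> Optional[CkptInfo]:
    """Return iteration/score/metric parsed from a checkpoint path, or None."""
    match = _CKPT_RE.search(path)
    if match is None:
        return None
    return CkptInfo(int(match["iter"]), float(match["score"]), match["metric"])
```

Applied to the best macro-F1 checkpoint above, this should yield iteration `27000`, score `0.887`, and metric `scalar_test_f1`. The score in the filename is only a pointer; the full-test numbers still come from a standalone eval.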
+
+## Branches Already Tried
+
+These are the main branches already explored, so the next person does not rerun the same ideas blindly.
+
+### Representation branches
+
+- `full-body`
+  - original all-joints skeleton-map path
+  - useful as a reference, but the practical winner did not come from this branch
+- `body-only`
+  - removed face/head keypoints and head limbs
+  - this became the strongest practical representation
+  - current best checkpoint family comes from this branch
+- `head-lite`
+  - added back limited head information, mainly `nose` and shoulder-linked head context
+  - looked promising on the small fixed proxy subset
+  - lost on the full test set, so it is not the current winner
+- `geom-fix`
+  - aspect-ratio-preserving geometry/padding experiment instead of the older square-warp behavior
+  - useful for debugging, but not the branch that produced the best practical result
+- `no-cut`
+  - removed the silhouette-style width cut at runtime
+  - not the branch that ultimately won
+
+### Loss / optimization branches
+
+- `weighted CE`
+  - helped on harder imbalance settings and some short proxies
+  - did not beat `plain CE` on the practical `1:1:2` path
+- `plain CE`
+  - better than weighted CE on the working `1:1:2` setup
+  - remains the default practical choice
+- `SGD bridge`
+  - gave the first strong practical skeleton baseline
+- `AdamW` multistep finetune
+  - unstable and often worse
+- `AdamW` cosine finetune
+  - this is the branch that finally produced the retained best checkpoints
+
+### Model branches
+
+- `ScoNet-MT-ske`
+  - practical winner so far
+- `DRF`
+  - implemented and tested
+  - underperformed the plain skeleton baseline in the first serious practical run
+  - still worth one final warm-start retry, but not the default winner
+
+## What Not To Do First
+
+Do not start here:
+- do not treat `DRF` as the mainline model yet
+- do not go back to `1:1:8` first unless the goal is specifically a hard-imbalance study
+- do not spend time on more visualization work first
+- do not keep changing preprocessing and optimizer at the same time
+
+Reasons:
+- the current plain skeleton baseline is already strong
+- the first practical DRF run underperformed that baseline badly
+- broad exploratory tuning is lower-value than a few controlled confirmation experiments
+
+## Priority Order
+
+Run experiments in this order.
+
+### 1. Full-body control under the winning recipe
+
+Goal:
+- verify that `body-only` is truly the best practical representation
+
+Keep fixed:
+- split: `1:1:2`
+- optimizer path: same successful `AdamW` cosine finetune style
+- loss: plain CE + triplet
+- scheduler style: cosine, not the old hard multistep decay
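To make the "cosine, not hard multistep" distinction concrete, here is a small sketch of the two learning-rate shapes. The formulas are the standard cosine-annealing and step-decay rules; the base LR, step count, and milestones in the usage note are placeholders, not the values used in the winning run.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed LR: base_lr at step 0, min_lr at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def multistep_lr(step, base_lr, milestones, gamma=0.1):
    """Step decay: multiply the LR by gamma at every milestone already reached."""
    drops = sum(1 for milestone in milestones if step >= milestone)
    return base_lr * gamma ** drops
```

For example, `cosine_lr(40_000, 80_000, 1e-4)` sits halfway down a smooth curve, while `multistep_lr(40_000, 1e-4, [30_000, 60_000])` has already dropped by a full factor of `gamma` in a single step. Keeping the scheduler fixed means the full-body control must use the smooth curve.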
+
+Change only:
+- representation: `full-body` instead of `body-only`
+
+Success criterion:
+- only keep `full-body` if it beats the current best `body-only` checkpoint on full-test macro-F1
+
+If `full-body` loses:
+- stop
+- keep `body-only` as the default practical representation
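Because the success criterion is macro-F1 rather than accuracy, it helps to be explicit about how the four reported numbers relate. The sketch below assumes the evaluator averages per-class precision, recall, and F1 with equal class weight (the usual "macro" convention); that is an assumption about the eval code, although the reported F1 values above are not the harmonic mean of the reported precision and recall, which is consistent with per-class averaging.

```python
def macro_metrics(confusion):
    """Accuracy plus macro-averaged precision/recall/F1.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Under this convention a run can gain accuracy while losing macro-F1 if it trades precision on a small class for recall, which is the pattern visible between the two retained checkpoints.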
+
+### 2. One serious DRF retry, but only on top of the strong baseline recipe
+
+Goal:
+- test whether DRF can add value once the skeleton baseline is already strong
+
+Recommended setup:
+- split: `1:1:2`
+- representation: start from the same `body-only` skeleton maps
+- initialization: warm-start from the strong skeleton model, not from random init
+- optimizer: small-LR `AdamW`
+- scheduler: cosine
+
+Recommended strategy:
+1. restore the strong skeleton checkpoint weights
+2. initialize DRF from that visual branch
+3. use a smaller LR than the plain baseline finetune
+4. if needed, freeze or partially freeze the backbone for a short warmup so the PAV/PGA branch learns without immediately disturbing the already-good visual branch
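The four steps above can be sketched as a key-selection helper. It works on plain name-to-value mappings so it stays framework-agnostic; the parameter prefixes (`Backbone.`, `PGA.`) and the function itself are hypothetical and would need to match the real module names in the skeleton and DRF models.

```python
def split_warm_start(ckpt_state, model_keys, freeze_prefixes=("Backbone.",)):
    """Plan a warm start from a skeleton checkpoint into a DRF model.

    ckpt_state:      mapping of parameter name -> value from the checkpoint.
    model_keys:      parameter names expected by the new DRF model.
    freeze_prefixes: name prefixes to freeze during the short warmup (assumed).

    Returns (loadable, new_keys, frozen):
      loadable - checkpoint entries whose names exist in the DRF model
      new_keys - DRF parameters with no checkpoint entry (left at their init)
      frozen   - loaded parameter names matching a freeze prefix
    """
    wanted = set(model_keys)
    loadable = {name: value for name, value in ckpt_state.items() if name in wanted}
    new_keys = sorted(wanted - set(loadable))
    prefixes = tuple(freeze_prefixes)
    frozen = sorted(name for name in loadable if name.startswith(prefixes))
    return loadable, new_keys, frozen
```

In PyTorch, `loadable` would typically be passed to `model.load_state_dict(loadable, strict=False)`, and the parameters named in `frozen` would have `requires_grad` set to `False` until the warmup ends. Logging `new_keys` is a cheap check that only the PAV/PGA-side parameters start from scratch.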
+
+Why:
+- the earlier DRF bridge peaked early and then degraded
+- local evidence suggests DRF's prior (PAV/PGA) branch is currently weak or noisy
+- DRF deserves one fair test from a strong starting point, not another scratch run
+
+Success criterion:
+- DRF must beat the current best plain skeleton checkpoint on full-test macro-F1
+- if it does not, stop treating DRF as the practical winner
+
+### 3. Only after that, optional optimizer confirmation
+
+Goal:
+- confirm that the `AdamW` cosine win is reproducible and not a one-off lucky run
+
+Options:
+- rerun the winning `body-only` recipe once more with the same finetune schedule
+- or compare one cleaner `SGD` continuation versus `AdamW` cosine finetune from the same checkpoint
+
+This is lower priority because:
+- we already have a verified strong artifact checkpoint
+- the practical problem is solved well enough for use
+
+## Recommended Command Pattern
+
+For long detached runs, use [systemd-run Training](systemd-run-training.md).
+
+Use:
+- `output_root: /mnt/hddl/data/OpenGait-output`
+- `save_iter: 500`
+- `eval_iter: 1000`
+- `best_ckpt_cfg.keep_n: 3`
+- metrics:
+  - `scalar/test_f1/`
+  - `scalar/test_accuracy/`
+
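+Because `eval_iter` is a multiple of `save_iter`, every evaluated iteration has a checkpoint on disk, so a strong result is never lost between evals. Writing under `output_root` keeps large checkpoints off the SSD.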
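A small sketch of the two properties these settings are meant to guarantee. How `best_ckpt_cfg` actually ranks and prunes checkpoints is not described in this note, so the top-`keep_n` rule below is an assumption, and both helper names are made up for illustration.

```python
import heapq


def evals_are_saved(save_iter, eval_iter):
    """True if every eval iteration coincides with a save iteration."""
    return eval_iter % save_iter == 0


def retained_checkpoints(eval_history, keep_n=3):
    """Pick the keep_n best (iteration, score) pairs for one metric.

    Ties on score are broken in favour of the earlier iteration.
    """
    return heapq.nlargest(
        keep_n, eval_history, key=lambda pair: (pair[1], -pair[0])
    )
```

With `save_iter: 500` and `eval_iter: 1000`, `evals_are_saved(500, 1000)` holds. Running `retained_checkpoints` once per metric mirrors tracking `scalar/test_f1/` and `scalar/test_accuracy/` separately, which is why the best-F1 and best-accuracy checkpoints can come from different iterations.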
+
+## Decision Rules
+
+Use these rules so the experiment search does not drift again.
+
+Promote a new run only if:
+- it improves full-test macro-F1 over the current best retained checkpoint
+- and the gain survives a standalone eval rerun from the saved checkpoint
+
+Stop a branch if:
+- it is clearly below the current best baseline
+- and its later checkpoints are not trending upward
+
+Do not declare a winner from:
+- a proxy subset alone
+- TensorBoard scalars alone
+- an unsaved checkpoint implied only by logs
+
+Always verify the claimed winner by:
+1. standalone eval from the checkpoint file
+2. writing the result into the changelog and audit
+3. copying the final selected checkpoint into `artifact/` if it becomes the new best
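The promote and stop rules can be written down as two small predicates. The baseline value is the best full-test macro-F1 reported in "Current Best Known Result"; the reproduction tolerance and the "not trending upward" test are assumptions, since the note does not quantify either.

```python
BEST_MACRO_F1 = 88.70  # full-test macro-F1 of the current best retained checkpoint


def should_promote(train_time_f1, standalone_f1, best_f1=BEST_MACRO_F1, tol=0.05):
    """Promote only if the standalone rerun reproduces the logged score and beats the best.

    tol is an assumed allowance for eval-to-eval numeric drift.
    """
    reproduces = abs(train_time_f1 - standalone_f1) <= tol
    return reproduces and standalone_f1 > best_f1


def should_stop(recent_f1, best_f1=BEST_MACRO_F1):
    """Stop if all recent evals are below the best and the last is no higher than the first."""
    if len(recent_f1) < 2:
        return False  # not enough evidence to call a trend
    below = all(score < best_f1 for score in recent_f1)
    not_rising = recent_f1[-1] <= recent_f1[0]
    return below and not_rising
```

Both functions take full-test macro-F1 from a standalone eval of a saved checkpoint, not proxy-subset or TensorBoard values, in line with the rules above.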
+
+## Short Recommendation
+
+If only one more experiment is going to be run, run this:
+
+- `full-body` under the same successful `1:1:2 + plain CE + AdamW cosine` recipe
+
+If two more experiments are going to be run, run:
+1. `full-body` control
+2. DRF warm-start finetune from the strong `body-only` skeleton checkpoint
+
+That is the highest-value next work from the current state.