# Scoliosis Training Change Log
This file is the single run-by-run changelog for Scoliosis1K training and evaluation in this repo.
Use it for:
- what changed between runs
- which dataset/config/checkpoint was used
- what the resulting metrics were
- whether a run is still in progress
## Conventions
- Add one entry before launching a new training run.
- Update the same entry after training/eval completes.
- Record only the delta from the previous relevant run, not a full config dump.
- For skeleton-map control runs, use plain-text `ScoNet-MT-ske` naming in the notes even though the code class is `ScoNet`.
## Runs
| Date | Run | Model | Dataset | Main change vs previous relevant run | Status | Eval result |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 2026-03-07 | `DRF` | DRF | `Scoliosis1K-drf-pkl-118` | First OpenGait DRF integration on paper `1:1:8` split using shared OpenGait skeleton/PAV path | complete | `58.08 Acc / 78.80 Prec / 60.22 Rec / 56.99 F1` |
| 2026-03-08 | `DRF_paper` | DRF | `Scoliosis1K-drf-pkl-118-paper` | More paper-literal preprocessing: summed/sparser heatmap path, dataset-level PAV normalization, body-prior path refinement | complete | `51.67 Acc / 72.37 Prec / 56.22 Rec / 50.92 F1` |
| 2026-03-08 | `ScoNet_skeleton_118` | ScoNet-MT-ske control | `Scoliosis1K-drf-pkl-118-paper` | Plain skeleton-map baseline on the paper-literal export to isolate DRF vs skeleton-path failure | complete | `38.85 Acc / 61.23 Prec / 46.75 Rec / 35.96 F1` |
| 2026-03-08 | `ScoNet_skeleton_118_sigma8` | ScoNet-MT-ske control | `Scoliosis1K_sigma_8.0/pkl` | Reused upstream/default sigma-8 heatmap export instead of the DRF paper-literal export | complete | `36.45 Acc / 69.17 Prec / 43.82 Rec / 32.78 F1` |
| 2026-03-08 | `ScoNet_skeleton_118_sigma15_bs12x8` | ScoNet-MT-ske control | `Scoliosis1K-drf-pkl-118-sigma15` | Lowered skeleton-map sigma from `8.0` to `1.5` to tighten the pose rasterization | complete | `46.33 Acc / 68.09 Prec / 51.92 Rec / 44.69 F1` |
| 2026-03-09 | `ScoNet_skeleton_118_sigma15_joint8_sharedalign_2gpu_bs12x8` | ScoNet-MT-ske control | `Scoliosis1K-drf-pkl-118-sigma15-joint8-sharedalign` | Fixed limb/joint channel misalignment, used mixed sigma `limb=1.5 / joint=8.0`, kept SGD | complete | `50.47 Acc / 69.31 Prec / 54.58 Rec / 48.63 F1` |
| 2026-03-09 | `ScoNet_skeleton_118_sigma15_joint8_limb4_adamw_2gpu_bs12x8` | ScoNet-MT-ske control | `Scoliosis1K-drf-pkl-118-sigma15-joint8-sharedalign-limb4` | Rebalanced channel intensity with `limb_gain=4.0`; switched optimizer from `SGD` to `AdamW` | complete | `48.60 Acc / 65.97 Prec / 53.19 Rec / 46.41 F1` |
| 2026-03-09 | `ScoNet_skeleton_118_sigma15_joint8_sharedalign_nocut_adamw_1gpu_bs8x8` | ScoNet-MT-ske control | `Scoliosis1K-drf-pkl-118-sigma15-joint8-sharedalign` | Switched runtime transform from `BaseSilCuttingTransform` to `BaseSilTransform` (`no-cut`), kept `AdamW`, reduced batch size to `8x8` due to 5070 Ti OOM at `12x8` | interrupted | superseded by proxy route before eval |
| 2026-03-09 | `ScoNet_skeleton_118_sigma15_joint8_sharedalign_nocut_adamw_proxy_1gpu` | ScoNet-MT-ske proxy | `Scoliosis1K-drf-pkl-118-sigma15-joint8-sharedalign` | Fast proxy route: `no-cut`, `AdamW`, `8x8`, `total_iter=2000`, `eval_iter=500`, `test_seq_subset_size=128` | interrupted | superseded by geometry-fixed proxy before completion |
| 2026-03-10 | `ScoNet_skeleton_118_sigma15_joint8_geomfix_proxy_1gpu` | ScoNet-MT-ske proxy | `Scoliosis1K-drf-pkl-118-sigma15-joint8-geomfix` | Geometry ablation: aspect-ratio-preserving crop+pad instead of square-warp resize; `AdamW`, `no-cut`, `8x8`, `total_iter=2000`, `eval_iter=500`, fixed test subset seed `118` | complete | proxy subset unstable: `500 24.22/8.07/33.33/13.00`, `1000 60.16/68.05/58.13/55.25`, `1500 26.56/58.33/35.64/17.68`, `2000 27.34/63.96/37.02/20.14` (Acc/Prec/Rec/F1) |
| 2026-03-10 | `ScoNet_skeleton_118_sigma15_joint8_sharedalign_weightedce_proxy_1gpu` | ScoNet-MT-ske proxy | `Scoliosis1K-drf-pkl-118-sigma15-joint8-sharedalign` | Training-side imbalance ablation: kept the current best shared-align geometry, restored `SGD` baseline settings, and applied weighted CE with class weights `[1.0, 4.0, 4.0]`; `total_iter=2000`, `eval_iter=500`, fixed test subset seed `118` | complete | `500 24.22/8.07/33.33/13.00`, `1000 71.09/48.12/53.93/50.19`, `1500 46.09/52.26/52.34/43.72`, `2000 37.50/47.03/45.45/34.28` (Acc/Prec/Rec/F1) |
| 2026-03-10 | `ScoNet_skeleton_118_sigma15_joint8_bodyonly_proxy_1gpu` | ScoNet-MT-ske proxy | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` | Representation ablation: dropped face keypoints and head limbs from the skeleton-map export while keeping the current shared-align `sigma_limb=1.5 / sigma_joint=8.0` setup; `SGD`, `total_iter=2000`, `eval_iter=500`, fixed test subset seed `118` | complete | `500 53.91/40.60/50.68/40.26`, `1000 65.62/42.86/56.30/47.36`, `1500 28.12/50.28/36.55/19.53`, `2000 26.56/74.93/35.64/17.69` (Acc/Prec/Rec/F1) |
| 2026-03-10 | `ScoNet_skeleton_118_sigma15_joint8_bodyonly_weightedce_proxy_1gpu` | ScoNet-MT-ske proxy | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` | Combined the two strongest partial clues: body-only skeleton map plus weighted CE with class weights `[1.0, 4.0, 4.0]`; `SGD`, `total_iter=2000`, `eval_iter=500`, fixed test subset seed `118` | interrupted | superseded at `100` iterations by the 2-GPU proxy variant |
| 2026-03-10 | `ScoNet_skeleton_118_sigma15_joint8_bodyonly_weightedce_proxy_2gpu` | ScoNet-MT-ske proxy | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` | Same body-only + weighted-CE proxy, relaunched on the `5070 Ti + 3090` pair for faster iteration while keeping the fixed subset seed `118` | complete | `500 68.75/44.46/58.12/49.59`, `1000 57.81/52.49/34.41/26.42`, `1500 41.41/53.84/51.46/40.77`, `2000 39.84/53.76/52.10/39.24` (Acc/Prec/Rec/F1) |
| 2026-03-10 | `ScoNet_skeleton_112_sigma15_joint8_bodyonly_weightedce_bridge_2gpu_10k` | ScoNet-MT-ske bridge | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` + `Scoliosis1K_112.json` | Diagnostic bridge on the easier `1:1:2` split using the current best skeleton recipe (`body-only + weighted CE`), extended to `10000` iterations with proportional LR milestones and eval/save every `1000` | complete | best proxy subset at `7000`: `82.03/66.53/88.84/64.91`; full test at `7000`: `81.82/66.21/88.50/65.96` (Acc/Prec/Rec/F1) |
| 2026-03-10 | `ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_bridge_2gpu_10k` | ScoNet-MT-ske bridge | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` + `Scoliosis1K_112.json` | Same `1:1:2` body-only bridge as above, but removed weighted CE to test whether class weighting was suppressing precision on the easier split | interrupted | superseded before meaningful progress by the user-requested 1-GPU rerun on the `5070 Ti` |
| 2026-03-10 | `ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_bridge_1gpu_10k` | ScoNet-MT-ske bridge | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` + `Scoliosis1K_112.json` | Same plain-CE `1:1:2` bridge, relaunched on the `5070 Ti` only per user request | complete | best proxy subset at `7000`: `88.28/69.12/74.15/68.80`; full test at `7000`: `83.16/68.24/80.02/68.47`; final proxy at `10000`: `75.00/65.00/63.41/54.55` (Acc/Prec/Rec/F1) |
| 2026-03-10 | `ScoNet_skeleton_112_sigma15_joint8_headlite_plaince_bridge_1gpu_10k` | ScoNet-MT-ske bridge | `Scoliosis1K-drf-pkl-112-sigma15-joint8-headlite` + `Scoliosis1K_112.json` | Added `head-lite` structure (nose plus shoulder links, no eyes/ears) on top of the plain-CE `1:1:2` bridge; first `3090` launch OOMed due to unrelated occupancy, then relaunched on the UUID-pinned `5070 Ti` | complete | best proxy subset at `7000`: `86.72/70.15/89.00/70.44`; full test at `7000`: `78.07/65.42/80.50/62.08` (Acc/Prec/Rec/F1) |
| 2026-03-10 | `DRF_skeleton_112_sigma15_joint8_bodyonly_plaince_bridge_1gpu_10k` | DRF bridge | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` + `Scoliosis1K_112.json` | First practical DRF run on the winning `1:1:2` skeleton recipe: `body-only`, plain CE, SGD, `10k` bridge schedule, fixed proxy subset seed `112` | complete | best proxy subset at `2000`: `88.28/61.79/60.31/60.93`; full test at `2000`: `80.21/58.92/59.23/57.84` (Acc/Prec/Rec/F1) |
| 2026-03-14 | `DRF_author_eval_112_1gpu` | DRF author checkpoint compat | `Scoliosis1K-drf-pkl` + `Scoliosis1K_112.json` | Re-evaluated the author-provided checkpoint after adding legacy module-name remap and correcting the author class order; kept the stale `112` path to test whether the provided YAML was trustworthy | complete | `85.19/57.98/56.65/57.30` (Acc/Prec/Rec/F1) |
| 2026-03-14 | `DRF_author_eval_118_splitroot_1gpu` | DRF author checkpoint compat | `Scoliosis1K-drf-pkl-118` + `Scoliosis1K_118.json` | Same author-checkpoint compat path, but switched to the `118` split-specific local DRF dataset root while keeping `BaseSilCuttingTransform` and author class order | complete | `77.17/73.61/72.59/72.98` (Acc/Prec/Rec/F1) |
| 2026-03-14 | `DRF_author_eval_118_aligned_1gpu` | DRF author checkpoint compat | `Scoliosis1K-drf-pkl-118-aligned` + `Scoliosis1K_118.json` | Same author-checkpoint compat path, but evaluated on the aligned `118` DRF export; this is currently the best recovered author-checkpoint runtime contract | complete | `80.24/76.73/76.40/76.56` (Acc/Prec/Rec/F1) |
| 2026-03-14 | `DRF_author_eval_118_paper_1gpu` | DRF author checkpoint compat | `Scoliosis1K-drf-pkl-118-paper` + `Scoliosis1K_118.json` | Tested the author checkpoint against the local paper-literal summed-heatmap export with `BaseSilTransform` to see whether it matched the later paper-style preprocessing branch | complete | `27.24/9.08/33.33/14.27` (Acc/Prec/Rec/F1) |
| 2026-03-10 | `ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_main_1gpu_20k` | ScoNet-MT-ske mainline | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` + `Scoliosis1K_112.json` | Promoted the winning practical skeleton recipe to a longer `20k` run with full `TEST_SET` eval and checkpoint save every `1000`; no proxy subset, same plain CE + SGD setup | interrupted | superseded by the true-resume continuation below |
| 2026-03-10 | `ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_resume_1gpu_20k` | ScoNet-MT-ske mainline | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` + `Scoliosis1K_112.json` | True continuation of the earlier plain-CE `1:1:2` `10k` bridge from its `latest.pt`, extended to `20k` with full `TEST_SET` eval and checkpoint save every `1000` | interrupted | superseded by the AdamW finetune branch below |
| 2026-03-10 | `ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_finetune_1gpu_20k` | ScoNet-MT-ske finetune | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` + `Scoliosis1K_112.json` | AdamW finetune from the earlier plain-CE `1:1:2` `10k` checkpoint; restores model weights only, resets optimizer/scheduler state, keeps full `TEST_SET` eval and checkpoint save every `1000` | interrupted | superseded by the longer overnight 40k finetune below |
| 2026-03-10 | `ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_finetune_1gpu_40k` | ScoNet-MT-ske finetune | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` + `Scoliosis1K_112.json` | Longer overnight AdamW finetune from the same `10k` plain-CE checkpoint; restores model weights only, resets optimizer/scheduler state, extends to `40000` total iterations with full `TEST_SET` eval every `1000` | interrupted | superseded by the cosine `80k` finetune below |
| 2026-03-10 to 2026-03-11 | `ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_cosine_finetune_1gpu_80k` | ScoNet-MT-ske finetune | `Scoliosis1K-drf-pkl-118-sigma15-joint8-bodyonly` + `Scoliosis1K_112.json` | Practical long-run finetune from the same `10k` plain-CE checkpoint, but switched to `AdamW` with cosine decay, HDD-backed `output_root`, `save_iter=500`, `eval_iter=1000`, and best-N checkpoint retention | complete | final `80000` eval: `90.64/72.87/93.19/75.74`; verified best retained full-test checkpoint at `27000`: `92.38/90.30/87.39/88.70` (Acc/Prec/Rec/F1) |
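
The weighted-CE rows above quote class weights `[1.0, 4.0, 4.0]`. As a minimal sketch of what that weighting does to the loss, the pure-Python function below computes a weighted mean cross-entropy. It assumes the normalization used by PyTorch's `CrossEntropyLoss(weight=..., reduction="mean")`, where the sum of weighted per-sample losses is divided by the sum of the target weights. Whether the training code behind these runs uses exactly that loss is an assumption, and the function name and toy logits are illustrative only.

```python
import math

# Class weights quoted in the weighted-CE rows of the run table.
CLASS_WEIGHTS = [1.0, 4.0, 4.0]


def weighted_cross_entropy(logits, targets, weights=CLASS_WEIGHTS):
    """Weighted mean cross-entropy over a batch (illustrative helper).

    logits:  list of per-sample score lists, one score per class.
    targets: list of integer class indices.
    The weighted losses are divided by the sum of the target weights,
    not by the batch size (assumed PyTorch-style normalization).
    """
    total_loss = 0.0
    total_weight = 0.0
    for scores, target in zip(logits, targets):
        # Numerically stable log-softmax for the target class.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        log_prob = scores[target] - log_norm
        total_loss += -weights[target] * log_prob
        total_weight += weights[target]
    return total_loss / total_weight


# Toy batch: one confident, correct class-0 sample and one undecided class-1 sample.
toy_logits = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_targets = [0, 1]
weighted = weighted_cross_entropy(toy_logits, toy_targets)
unweighted = weighted_cross_entropy(toy_logits, toy_targets, weights=[1.0, 1.0, 1.0])
```

On this toy batch the weighted loss is larger than the unweighted one, because the undecided class-1 sample counts four times as much as the easy class-0 sample. With uniform logits on every sample the loss is `log(3)` regardless of the weights; the weights only change how much each class contributes relative to the others.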
## Current best skeleton baseline
Current best `ScoNet-MT-ske`-style result:
- practical best on the easier `1:1:2` split:
  - `ScoNet_skeleton_112_sigma15_joint8_bodyonly_plaince_adamw_cosine_finetune_1gpu_80k` at retained best checkpoint `27000`
  - verified standalone full-test eval: `92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1`
- best result retained on the harder `1:1:8` split:
  - `ScoNet_skeleton_118_sigma15_joint8_sharedalign_2gpu_bs12x8`
  - `50.47 Acc / 69.31 Prec / 54.58 Rec / 48.63 F1`
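
The `Acc/Prec/Rec/F1` quadruples in this log are consistent with macro-averaged precision, recall, and F1 over the three classes, with undefined per-class values counted as zero. That is an inference from the numbers, not something the log states. The sketch below computes the four metrics from a confusion matrix and applies them to a hypothetical collapse case: all 128 proxy sequences predicted as a single class that contains 31 of them. The 48/49 split of the remaining sequences and the choice of predicted class are made up and do not affect the result.

```python
def macro_metrics(confusion):
    """Return (Acc, macro Prec, macro Rec, macro F1) in percent.

    confusion[t][p] counts samples with true class t predicted as class p.
    Per-class precision, recall, or F1 with a zero denominator counts as 0.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n_classes))

    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        tp = confusion[c][c]
        predicted = sum(confusion[t][c] for t in range(n_classes))
        actual = sum(confusion[c])
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    values = (
        correct / total,
        sum(precisions) / n_classes,
        sum(recalls) / n_classes,
        sum(f1s) / n_classes,
    )
    return tuple(round(100 * v, 2) for v in values)


# Hypothetical collapse: every sequence is predicted as class 0.
collapsed = [
    [31, 0, 0],
    [48, 0, 0],
    [49, 0, 0],
]
print(macro_metrics(collapsed))  # (24.22, 8.07, 33.33, 13.0)
```

This reproduces the `24.22/8.07/33.33/13.00` proxy result reported at `500` iterations for the geometry-fixed and weighted-CE proxy runs, which suggests those checkpoints were predicting one class for every sequence. The same signature (recall of `33.33`, precision equal to one third of accuracy) appears in the `DRF_author_eval_118_paper_1gpu` row, `27.24/9.08/33.33/14.27`.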
## Notes
- `ckpt/ScoNet-20000-better.pt` is intentionally not listed here because it is a silhouette checkpoint, not a skeleton-map run.
- `DRF` runs are included because they are part of the same reproduction/debugging loop, but this log should stay focused on train/eval changes, not broader code refactors.
- The long `ScoNet_skeleton_118_sigma15_joint8_sharedalign_nocut_adamw_1gpu_bs8x8` run was intentionally interrupted and superseded by the shorter proxy run once fast-iteration support was added.
- The geometry-fixed proxy run fit the train split quickly but did not produce a stable proxy validation curve, so it should not be promoted to a full 20k run.
- The weighted-CE proxy briefly improved the proxy peak at `1000` iterations, but it also collapsed afterward, so class weighting alone is not a sufficient fix for the skeleton branch.
- The body-only proxy improved the early `500`-iteration proxy result relative to some earlier runs, but it still collapsed badly after `1000`, so removing face/head structure alone is also not a sufficient fix.
- Combining `body-only` with `weighted CE` gave the best `500`-iteration proxy seen so far (`68.75 Acc / 49.59 F1` on the fixed 128-sequence subset), but it still degraded substantially by `1000+`, which points more toward schedule/imbalance dynamics than a single missing representation tweak.
- Full-test check on retained checkpoints from the combined `body-only + weighted CE` run did not confirm a simple early-stop win: the retained `1000` checkpoint scored `56.61 Acc / 52.11 Prec / 34.15 Rec / 25.61 F1` on the full test split, while `2000` scored `41.52 Acc / 56.75 Prec / 54.75 Rec / 40.09 F1`. That means the small proxy subset is useful for screening but not reliable enough to choose the final stopping point by itself.
- The `1:1:2` bridge run is the strongest evidence so far that the skeleton branch is learnable: with the current best skeleton recipe (`body-only + weighted CE`), the full test score at `7000` reached `81.82 Acc / 66.21 Prec / 88.50 Rec / 65.96 F1`. That still trails the ScoNet paper's stronger `1:1:2` result, but it is dramatically better than the `1:1:8` skeleton runs and makes class distribution a first-order factor in the reproduction gap.
- Removing weighted CE on the `1:1:2` bridge improved the current best full-test result further: `body-only + plain CE` reached `83.16 Acc / 68.24 Prec / 80.02 Rec / 68.47 F1` at `7000`, so weighted CE does not currently look beneficial on the easier split.
- A later full-test rerun of the retained `body-only + plain CE` `7000` checkpoint reproduced the same `83.16 / 68.24 / 80.02 / 68.47` result exactly, so that number is now explicitly reconfirmed rather than just carried forward from the original run log.
- `Head-lite` looked stronger than `body-only` on the fixed 128-sequence proxy subset at `7000`, but it did not transfer to the full test set: `78.07 Acc / 65.42 Prec / 80.50 Rec / 62.08 F1`, which is clearly below the `body-only + plain CE` full-test result.
- The first practical DRF bridge on the winning `1:1:2` recipe did not beat the plain skeleton baseline. Its best retained checkpoint (`2000`) reached only `80.21 Acc / 58.92 Prec / 59.23 Rec / 57.84 F1` on the full test set, versus `83.16 / 68.24 / 80.02 / 68.47` for `body-only + plain CE` at `7000`. The working local interpretation is that the added PAV/PGA path is currently injecting a weak or noisy prior rather than a useful complementary signal.
- The author-provided DRF checkpoint turned out to be partially recoverable once the local integration contract was corrected. The main hidden bug was class-order mismatch: the author stub uses `negative=0, positive=1, neutral=2`, while the repo evaluator assumes `negative=0, neutral=1, positive=2`. After adding label-order compatibility and the legacy `attention_layer -> PGA` remap, the checkpoint became meaningfully usable. The best recovered path so far is the aligned `118` export, which reached `80.24 Acc / 76.73 Prec / 76.40 Rec / 76.56 F1`. The stale author YAML is therefore not a trustworthy source of the original runtime contract; the checkpoint aligns much better with an `118` aligned/OpenGait-like path than with the later local `118-paper` export.
- Extending the practical `1:1:2` body-only plain-CE baseline with an `AdamW` cosine finetune changed the picture substantially. The final `80000` checkpoint still evaluated well (`90.64 / 72.87 / 93.19 / 75.74`), but the real win was an earlier retained checkpoint: `27000` reproduced exactly at `92.38 Acc / 90.30 Prec / 87.39 Rec / 88.70 F1` on a standalone full-test rerun. So for this practical path, the best result is no longer the original SGD bridge checkpoint but the retained `AdamW` cosine finetune checkpoint.
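
The two compatibility fixes described in the author-checkpoint note (label-order remap and the legacy `attention_layer -> PGA` rename) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's implementation: the helper names are hypothetical, and the example checkpoint keys such as `attention_layer.fc.weight` are invented, since the log does not show the real key layout.

```python
# Class orders quoted in the notes: author stub vs. repo evaluator.
AUTHOR_ORDER = ["negative", "positive", "neutral"]
REPO_ORDER = ["negative", "neutral", "positive"]

# Lookup table: author class index -> repo class index.
AUTHOR_TO_REPO = [REPO_ORDER.index(name) for name in AUTHOR_ORDER]


def remap_predictions(author_preds):
    """Translate predicted class indices from author order to repo order."""
    return [AUTHOR_TO_REPO[p] for p in author_preds]


def remap_legacy_keys(state_dict, old="attention_layer", new="PGA"):
    """Rename checkpoint keys whose dotted path contains the legacy module name.

    Only whole path components are replaced, so a key that merely contains
    the old name as a substring of another component is left untouched.
    """
    renamed = {}
    for key, value in state_dict.items():
        parts = [new if part == old else part for part in key.split(".")]
        renamed[".".join(parts)] = value
    return renamed


# Hypothetical checkpoint keys, for illustration only.
legacy = {"attention_layer.fc.weight": "w0", "backbone.conv1.weight": "w1"}
print(remap_predictions([0, 1, 2]))  # [0, 2, 1]
print(remap_legacy_keys(legacy))
```

Remapping predictions after inference leaves the checkpoint weights untouched; an alternative would be to permute the classifier's output rows once at load time, which gives the same labels but modifies the loaded tensors.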