From 37483fa628f97017de0a8e007cfddf3ad502169d Mon Sep 17 00:00:00 2001
From: crosstyan
Date: Tue, 10 Mar 2026 01:00:05 +0800
Subject: [PATCH] Document scoliosis reproducibility findings

---
 docs/scoliosis_reproducibility_audit.md | 199 ++++++++++++++++++++++++
 docs/scoliosis_training_change_log.md   |   3 +-
 docs/sconet-drf-status-and-training.md  |   2 +
 3 files changed, 203 insertions(+), 1 deletion(-)
 create mode 100644 docs/scoliosis_reproducibility_audit.md

diff --git a/docs/scoliosis_reproducibility_audit.md b/docs/scoliosis_reproducibility_audit.md
new file mode 100644
index 0000000..5319df6
--- /dev/null
+++ b/docs/scoliosis_reproducibility_audit.md
@@ -0,0 +1,199 @@
+# Scoliosis1K Reproducibility Audit
+
+This note records which parts of the ScoNet and DRF papers are currently reproducible in this repo, which parts are only partially reproducible, and which parts remain under-specified or unsupported by local evidence.
+
+Ground truth policy for this audit:
+
+- the papers are the methodological source of truth
+- local code and local runs are the implementation evidence
+- when the two diverge, this document states that explicitly
+
+## Papers and local references
+
+- ScoNet paper: [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2407.05726v3/main.tex)
+- DRF paper: [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex)
+- local run history: [scoliosis_training_change_log.md](/home/crosstyan/Code/OpenGait/docs/scoliosis_training_change_log.md)
+- current status note: [sconet-drf-status-and-training.md](/home/crosstyan/Code/OpenGait/docs/sconet-drf-status-and-training.md)
+
+## What is reproducible
+
+### 1. The silhouette ScoNet pipeline is reproducible
+
+Evidence:
+
+- The ScoNet paper states the standard `1:1:8` evaluation protocol and the SGD schedule clearly in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2407.05726v3/main.tex#L201) and [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2407.05726v3/main.tex#L205).
+- The same paper reports that multi-task ScoNet-MT is much stronger than single-task ScoNet, including the class-ratio study in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2407.05726v3/main.tex#L277).
+- In this repo, the standard silhouette ScoNet path is stable:
+  - the model/trainer/evaluator path is intact
+  - a strong silhouette checkpoint reproduces cleanly on the correct split family
+
+Conclusion:
+
+- the Scoliosis1K silhouette modality is usable
+- the core OpenGait training and evaluation stack is usable for this task
+- the repo is not globally broken
+
+### 2. The high-level DRF architecture is reproducible
+
+Evidence:
+
+- The DRF paper defines the method as `skeleton map + PAV + PGA` in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex#L130).
+- It defines:
+  - pelvis-centering and height normalization in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex#L135)
+  - two-channel skeleton maps in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex#L139)
+  - PAV metrics in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex#L164)
+  - PGA channel/spatial attention in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex#L196)
+- This repo now has a functioning DRF model and DRF-specific preprocessing path implementing those ideas.
+
+Conclusion:
+
+- the paper is specific enough to implement a plausible DRF model family
+- the architecture-level claim is reproducible
+- the exact paper-level quantitative result is not yet reproducible
+
+### 3. The PAV concept is reproducible
+
+Evidence:
+
+- The DRF paper defines 8 symmetric joint pairs and 3 asymmetry metrics in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex#L160).
+- The local preprocessing implements those metrics and produces stable sequence-level PAVs.
+- Local dataset analysis showed the PAV still carries useful signal, even with a simple probe.
+
+Conclusion:
+
+- PAV is not the main missing piece
+- the main reproduction gap is not “we cannot build the clinical prior”
+
+## What is only partially reproducible
+
+### 4. The skeleton-map branch is reproducible only at the concept level
+
+Evidence:
+
+- The DRF paper describes the skeleton map as a dense, silhouette-like two-channel representation in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex#L139).
+- It does not specify crucial rasterization details such as:
+  - numeric `sigma`
+  - joint-vs-limb relative weighting
+  - quantization and dtype
+  - crop policy
+  - resize policy
+  - whether alignment is per-frame or per-sequence
+- Local runs show these details matter a lot:
+  - `sigma=8` skeleton runs were very poor
+  - smaller sigma and fixed limb/joint alignment improved results materially
+  - the best local skeleton baseline is still only `50.47 Acc / 48.63 Macro-F1`, far below the paper's `82.5 / 76.6` for ScoNet-MT-ske in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex#L252)
+
+Conclusion:
+
+- the paper specifies the representation idea
+- it does not specify enough to make the skeleton-map branch quantitatively reproducible from text alone
+
+### 5. The visualization story is only partially reproducible
+
+Evidence:
+
+- The ScoNet paper cites the attention-transfer visualization family in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2407.05726v3/main.tex#L246).
+- The DRF paper cites Zhou et al. CAM in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex#L295).
+- Neither paper states:
+  - which layer is visualized
+  - whether visualization is before or after temporal pooling
+  - the exact normalization/rendering procedure
+- Local attempts suggest that only certain intermediate layers produce qualitatively plausible maps, and even then the results are approximations rather than faithful reproductions.
+
+Conclusion:
+
+- qualitative visualization claims are only partially reproducible
+- they should not be treated as strong evidence until the extraction procedure is specified better
+
+## What is not reproducible from the paper and local materials alone
+
+### 6. The paper-level ScoNet-MT-ske and DRF numbers are not reproducible yet
+
+Evidence:
+
+- The DRF paper reports:
+  - ScoNet-MT-ske: `82.5 Acc / 81.4 Prec / 74.3 Rec / 76.6 F1`
+  - DRF: `86.0 Acc / 84.1 Prec / 79.2 Rec / 80.8 F1`
+  in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex#L252)
+- The best local skeleton-map baseline so far is recorded in [scoliosis_training_change_log.md](/home/crosstyan/Code/OpenGait/docs/scoliosis_training_change_log.md):
+  - `50.47 Acc / 69.31 Prec / 54.58 Rec / 48.63 F1`
+- Local DRF runs are also well below the paper:
+  - `58.08 / 78.80 / 60.22 / 56.99`
+  - `51.67 / 72.37 / 56.22 / 50.92`
+
+Conclusion:
+
+- the current repo can reproduce the idea of DRF
+- it cannot reproduce the paper’s reported skeleton/DRF metrics yet
+
+### 7. The missing author-side training details are still unresolved
+
+Evidence:
+
+- The author-side DRF stub referenced a `BaseModel_body` path that was not released in the original materials.
+- A local reconstruction suggests that `BaseModel_body` was probably thin, not the whole explanation for the metric gap.
+- Even after matching the likely missing base-class contract more closely, the metric gap remained large.
+
+Conclusion:
+
+- the missing private code is probably not the only reason reproduction fails
+- but the lack of released code still weakens the paper’s reproducibility
+
+### 8. The exact split accounting is slightly inconsistent
+
+Evidence:
+
+- The ScoNet and DRF papers describe the standard split as `745 train / 748 test` in [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2407.05726v3/main.tex#L201) and [main.tex](/home/crosstyan/Code/OpenGait/research/arXiv-2509.00872v1/main.tex#L280).
+- The released partition file matching the `1:1:8` class ratio in this repo is effectively `744 / 749`.
+
+Conclusion:
+
+- this is probably a release/text bookkeeping mismatch, not the main source of failure
+- but it is another example that the paper protocol is not perfectly audit-friendly
+
+## Current strongest local conclusions
+
+### Reproducible with high confidence
+
+- silhouette ScoNet runs are meaningful
+- the Scoliosis1K raw pose data does not appear obviously broken
+- the OpenGait training/evaluation infrastructure is not the main problem
+- PAV computation is not the main blocker
+
+### Not reproducible with current evidence
+
+- the paper’s claimed skeleton-map baseline quality
+- the paper’s claimed DRF improvement magnitude
+- the paper’s qualitative response-map story as shown in the figures
+
+### Most likely interpretation
+
+- the papers are probably directionally correct
+- but the skeleton-map and DRF pipelines are under-specified
+- the missing implementation details are important enough that a faithful independent reproduction is not currently achievable from the paper text and released materials alone
+
+## Recommended standard for future work in this repo
+
+When a future change claims to improve DRF reproduction, it should satisfy all of the following:
+
+1. beat the current best skeleton baseline on the fixed proxy protocol
+2. remain stable across at least one full run, not just a short spike
+3. state whether the change is:
+   - paper-faithful
+   - implementation-motivated
+   - or purely empirical
+4. avoid using silhouette success as evidence that the skeleton path is correct
+
+## Practical bottom line
+
+At the moment, this repo supports:
+
+- faithful silhouette ScoNet experimentation
+- plausible DRF implementation work
+- structured debugging of the skeleton-map branch
+
+At the moment, this repo does not yet support:
+
+- claiming a successful independent reproduction of the DRF paper’s quantitative results
+- claiming that the paper’s skeleton-map preprocessing is fully specified
+- treating the paper’s qualitative feature-response visualizations as reproduced
diff --git a/docs/scoliosis_training_change_log.md b/docs/scoliosis_training_change_log.md
index c069161..a665df2 100644
--- a/docs/scoliosis_training_change_log.md
+++ b/docs/scoliosis_training_change_log.md
@@ -29,7 +29,7 @@ Use it for:
 | 2026-03-09 | `ScoNet_skeleton_118_sigma15_joint8_sharedalign_nocut_adamw_1gpu_bs8x8` | ScoNet-MT-ske control | `Scoliosis1K-drf-pkl-118-sigma15-joint8-sharedalign` | Switched runtime transform from `BaseSilCuttingTransform` to `BaseSilTransform` (`no-cut`), kept `AdamW`, reduced `8x8` due to 5070 Ti OOM at `12x8` | interrupted | superseded by proxy route before eval |
 | 2026-03-09 | `ScoNet_skeleton_118_sigma15_joint8_sharedalign_nocut_adamw_proxy_1gpu` | ScoNet-MT-ske proxy | `Scoliosis1K-drf-pkl-118-sigma15-joint8-sharedalign` | Fast proxy route: `no-cut`, `AdamW`, `8x8`, `total_iter=2000`, `eval_iter=500`, `test_seq_subset_size=128` | interrupted | superseded by geometry-fixed proxy before completion |
 | 2026-03-10 | `ScoNet_skeleton_118_sigma15_joint8_geomfix_proxy_1gpu` | ScoNet-MT-ske proxy | `Scoliosis1K-drf-pkl-118-sigma15-joint8-geomfix` | Geometry ablation: aspect-ratio-preserving crop+pad instead of square-warp resize; `AdamW`, `no-cut`, `8x8`, `total_iter=2000`, `eval_iter=500`, fixed test subset seed `118` | complete | proxy subset unstable: `500 24.22/8.07/33.33/13.00`, `1000 60.16/68.05/58.13/55.25`, `1500 26.56/58.33/35.64/17.68`, `2000 27.34/63.96/37.02/20.14` (Acc/Prec/Rec/F1) |
-| 2026-03-10 | `ScoNet_skeleton_118_sigma15_joint8_sharedalign_weightedce_proxy_1gpu` | ScoNet-MT-ske proxy | `Scoliosis1K-drf-pkl-118-sigma15-joint8-sharedalign` | Training-side imbalance ablation: kept the current best shared-align geometry, restored `SGD` baseline settings, and applied weighted CE with class weights `[1.0, 4.0, 4.0]`; `total_iter=2000`, `eval_iter=500`, fixed test subset seed `118` | training | no eval yet |
+| 2026-03-10 | `ScoNet_skeleton_118_sigma15_joint8_sharedalign_weightedce_proxy_1gpu` | ScoNet-MT-ske proxy | `Scoliosis1K-drf-pkl-118-sigma15-joint8-sharedalign` | Training-side imbalance ablation: kept the current best shared-align geometry, restored `SGD` baseline settings, and applied weighted CE with class weights `[1.0, 4.0, 4.0]`; `total_iter=2000`, `eval_iter=500`, fixed test subset seed `118` | complete | `500 24.22/8.07/33.33/13.00`, `1000 71.09/48.12/53.93/50.19`, `1500 46.09/52.26/52.34/43.72`, `2000 37.50/47.03/45.45/34.28` (Acc/Prec/Rec/F1) |
 
 ## Current best skeleton baseline
 
@@ -43,3 +43,4 @@ Current best `ScoNet-MT-ske`-style result:
 - `DRF` runs are included because they are part of the same reproduction/debugging loop, but this log should stay focused on train/eval changes, not broader code refactors.
 - The long `ScoNet_skeleton_118_sigma15_joint8_sharedalign_nocut_adamw_1gpu_bs8x8` run was intentionally interrupted and superseded by the shorter proxy run once fast-iteration support was added.
 - The geometry-fixed proxy run fit the train split quickly but did not produce a stable proxy validation curve, so it should not be promoted to a full 20k run.
+- The weighted-CE proxy briefly improved the proxy peak at `1000` iterations, but it also collapsed afterward, so class weighting alone is not a sufficient fix for the skeleton branch.
diff --git a/docs/sconet-drf-status-and-training.md b/docs/sconet-drf-status-and-training.md
index 081a020..da7a98e 100644
--- a/docs/sconet-drf-status-and-training.md
+++ b/docs/sconet-drf-status-and-training.md
@@ -2,6 +2,8 @@
 
 This note records the current Scoliosis1K implementation status in this repo and the main conclusions from the recent reproduction/debugging work.
 
+For a stricter paper-vs-local reproducibility breakdown, see [scoliosis_reproducibility_audit.md](/home/crosstyan/Code/OpenGait/docs/scoliosis_reproducibility_audit.md).
+
 ## Current status
 
 - `opengait/modeling/models/sconet.py` is still the standard Scoliosis1K baseline in this repo.