DRF Author Checkpoint Compatibility Note
This note records what happened when evaluating the author-provided DRF bundle in this repo:
- checkpoint: artifact/scoliosis_drf_author_118_compat/DRF_118_unordered_iter2w_lr0.001_8830-08000.pt
- config: ckpt/drf_author/drf_scoliosis1k_20000.yaml
The short version:
- the weight file is real and structurally usable
- the provided YAML is not a reliable source of truth
- the main problem was integration-contract mismatch, not a broken checkpoint
What Was Wrong
The author bundle was internally inconsistent in several ways.
1. Split mismatch
The DRF paper says the main experiment uses 1:1:8, i.e. the 118 split.
But the provided YAML pointed to:
./datasets/Scoliosis1K/Scoliosis1K_112.json
while the checkpoint filename itself says:
DRF_118_...
So the bundle already disagreed with itself.
2. Class-order mismatch
The biggest hidden bug was class ordering.
The current repo evaluator assumes:
negative = 0
neutral = 1
positive = 2
But the author stub in research/drf.py uses:
negative = 0
positive = 1
neutral = 2
That means an otherwise good checkpoint can look very bad if logits are interpreted in the wrong class order.
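The effect of the mismatch can be sketched in a few lines. This is an illustration only, not the repo's code: `reorder_logits`, `argmax`, and the two order tuples are hypothetical names, and plain lists stand in for tensors.

```python
# Hypothetical illustration; none of these names are taken from the repo.
EVALUATOR_ORDER = ("negative", "neutral", "positive")
AUTHOR_ORDER = ("negative", "positive", "neutral")


def reorder_logits(logits, source_order, target_order=EVALUATOR_ORDER):
    """Permute one row of logits from source_order into target_order."""
    position = {name: i for i, name in enumerate(source_order)}
    return [logits[position[name]] for name in target_order]


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


# A sample the author model scores as "positive" (index 1 in author order).
author_logits = [0.1, 2.0, 0.3]

# Read naively in evaluator order, index 1 means "neutral": the wrong label.
naive_label = EVALUATOR_ORDER[argmax(author_logits)]

# After reordering, the same logits decode to "positive".
reordered = reorder_logits(author_logits, AUTHOR_ORDER)
fixed_label = EVALUATOR_ORDER[argmax(reordered)]
```

Only the positive and neutral slots are swapped between the two orders, so negative predictions are unaffected while every positive or neutral prediction is mislabeled until the logits are reordered.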
3. Legacy module-name mismatch
The author checkpoint stores PGA weights under:
attention_layer.*
The current repo uses:
PGA.*
This is a small compatibility issue, but it must be remapped before loading.
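A minimal sketch of such a remap, assuming the legacy keys start with `attention_layer.` at the top level of the state dict. The function name `remap_legacy_keys` and the example parameter names are hypothetical; the in-tree compatibility layer may handle this differently, and a checkpoint that nests weights under a wrapper prefix would need that prefix handled first.

```python
def remap_legacy_keys(state_dict, old_prefix="attention_layer.", new_prefix="PGA."):
    """Return a copy of state_dict with old_prefix renamed to new_prefix.

    Keys that do not start with old_prefix are copied unchanged.
    """
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        remapped[key] = value
    return remapped


# Toy state dict; the parameter names below are made up for illustration.
legacy = {"attention_layer.fc.weight": 1, "backbone.conv.weight": 2}
current = remap_legacy_keys(legacy)
```

Returning a new dict rather than renaming in place keeps the loaded checkpoint untouched, which makes it easier to compare the key sets before and after the remap.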
4. Preprocessing/runtime-contract mismatch
The author checkpoint does not line up with the stale YAML’s full runtime contract.
Most importantly, it did not work well with the more paper-literal local export:
Scoliosis1K-drf-pkl-118-paper
It worked much better with the more OpenGait-like aligned export:
Scoliosis1K-drf-pkl-118-aligned
That strongly suggests the checkpoint was trained against a preprocessing/runtime path closer to the aligned OpenGait integration than to the later local “paper-literal” summed-heatmap ablation.
What Was Added In-Tree
The current repo now has a small compatibility layer in:
opengait/modeling/models/drf.py
It does two things:
- remaps legacy keys attention_layer.* -> PGA.*
- supports configurable model_cfg.label_order
The model also canonicalizes inference logits back into the repo’s evaluator order, so author checkpoints can be evaluated without modifying the evaluator itself.
Tested Compatibility Results
Best usable author-checkpoint path
Config:
configs/drf/drf_author_eval_118_aligned_1gpu.yaml
Dataset/runtime:
- dataset root: Scoliosis1K-drf-pkl-118-aligned
- partition: Scoliosis1K_118.json
- transform: BaseSilCuttingTransform
- label order: negative, positive, neutral
Result:
80.24 Acc / 76.73 Prec / 76.40 Rec / 76.56 F1
This is the strongest recovered path so far.
Verified provenance of Scoliosis1K-drf-pkl-118-aligned
The 118-aligned root is no longer just an informed guess. It was verified
directly against the raw pose source:
/mnt/public/data/Scoliosis1K/Scoliosis1K-pose-pkl
The matching preprocessing path is:
- datasets/pretreatment_scoliosis_drf.py
- default heatmap config: configs/drf/pretreatment_heatmap_drf.yaml
- archived equivalent config: configs/drf/pretreatment_heatmap_drf_118_aligned.yaml
That means the aligned root was produced with:
- shared sigma: 8.0
- align: True
- final_img_size: 64
- default heatmap_reduction=upstream
- no --stats_partition, i.e. dataset-level PAV min-max stats
Equivalent command:
uv run python datasets/pretreatment_scoliosis_drf.py \
--pose_data_path /mnt/public/data/Scoliosis1K/Scoliosis1K-pose-pkl \
--output_path /mnt/public/data/Scoliosis1K/Scoliosis1K-drf-pkl-118-aligned
Verification evidence:
- a regenerated 0_heatmap.pkl sample from the raw pose input matched the stored Scoliosis1K-drf-pkl-118-aligned sample exactly (array_equal == True)
- a full recomputation of pav_stats.pkl from the raw pose input matched the stored pav_min, pav_max, and stats_partition=None exactly
So 118-aligned is the old default OpenGait-style DRF export, not the later:
- 118-paper paper-literal summed-heatmap export
- 118 train-only-stats splitroot export
- sigma15 / sigma15_joint8 exports
Targeted preprocessing ablations around the recovered path
After verifying the aligned root provenance, a few focused runtime/data ablations were tested against the author checkpoint to see which part of the contract still mattered most.
Baseline:
- 118-aligned
- BaseSilCuttingTransform
- result: 80.24 Acc / 76.73 Prec / 76.40 Rec / 76.56 F1
Hybrid 1:
- aligned heatmap + splitroot PAV
- result: 77.30 Acc / 73.70 Prec / 73.04 Rec / 73.28 F1
Hybrid 2:
- splitroot heatmap + aligned PAV
- result: 80.37 Acc / 77.16 Prec / 76.48 Rec / 76.80 F1
Runtime ablation:
- 118-aligned + BaseSilTransform (no-cut)
- result: 49.93 Acc / 50.49 Prec / 51.58 Rec / 47.75 F1
What these ablations suggest:
- BaseSilCuttingTransform is necessary; no-cut breaks the checkpoint badly
- dataset-level PAV stats (stats_partition=None) matter more than the exact aligned-vs-splitroot heatmap writer
- the heatmap export is still part of the contract, but it is no longer the dominant remaining mismatch
Other tested paths
configs/drf/drf_author_eval_118_splitroot_1gpu.yaml
- dataset root: Scoliosis1K-drf-pkl-118
- result: 77.17 Acc / 73.61 Prec / 72.59 Rec / 72.98 F1
configs/drf/drf_author_eval_112_1gpu.yaml
- dataset root: Scoliosis1K-drf-pkl
- partition: Scoliosis1K_112.json
- result: 85.19 Acc / 57.98 Prec / 56.65 Rec / 57.30 F1
configs/drf/drf_author_eval_118_paper_1gpu.yaml
- dataset root: Scoliosis1K-drf-pkl-118-paper
- transform: BaseSilTransform
- result: 27.24 Acc / 9.08 Prec / 33.33 Rec / 14.27 F1
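One way to read the 118-paper row: a recall of 33.33 together with a precision equal to one third of the accuracy is the pattern a three-class model produces when it predicts a single class for every sample, assuming the four numbers are macro-averaged (this note does not state the averaging, so that is an assumption). The sketch below reproduces the pattern from a hypothetical confusion matrix. The class counts are invented so that accuracy comes out at 27.24% and are not the real test-set sizes.

```python
def macro_metrics(confusion):
    """Accuracy and macro precision/recall/F1 (in percent) from a confusion matrix.

    confusion[t][p] counts samples with true class t predicted as class p.
    Ratios with a zero denominator are counted as 0.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[t][c] for t in range(n))
        actual = sum(confusion[c])
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "acc": 100 * correct / total,
        "prec": 100 * sum(precisions) / n,
        "rec": 100 * sum(recalls) / n,
        "f1": 100 * sum(f1s) / n,
    }


# Hypothetical counts: every sample is predicted as class 0.
collapsed = [
    [2724, 0, 0],
    [3000, 0, 0],
    [4276, 0, 0],
]
metrics = macro_metrics(collapsed)
# Rounded to two decimals: 27.24 Acc / 9.08 Prec / 33.33 Rec / 14.27 F1
```

If the repo's evaluator does use macro averaging, this suggests the checkpoint collapses to one class on the 118-paper root rather than merely degrading.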
Interpretation
What these results mean:
- the checkpoint is not garbage
- the original “very bad” local eval was mostly a compatibility failure
- the largest single hidden bug was the class-order mismatch
- the author checkpoint is also sensitive to which local DRF dataset root is used
- the recovered runtime is now good enough to make the checkpoint believable, but preprocessing alone did not recover the paper DRF headline row
What they do not mean:
- we have perfectly reconstructed the author’s original training path
- the provided YAML is trustworthy as-is
- the paper’s full DRF claim is fully reproduced here
One practical caveat on 1:1:2 vs 1:1:8 comparisons in this repo:
- local Scoliosis1K_112.json and Scoliosis1K_118.json are not the same train/test split with only a different class ratio
- they differ substantially in membership
- so local 112 vs 118 results should not be overinterpreted as a pure class-balance ablation unless the train/test pool is explicitly held fixed
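The membership difference between two partition files can be checked with plain set operations. A minimal sketch, assuming each partition is a dict with TRAIN_SET and TEST_SET lists of ids; `split_overlap` is a hypothetical helper and the ids below are toy values, not real Scoliosis1K ids.

```python
def split_overlap(partition_a, partition_b):
    """Count shared and exclusive ids per split, plus train/test crossovers."""
    report = {}
    for key in ("TRAIN_SET", "TEST_SET"):
        a, b = set(partition_a[key]), set(partition_b[key])
        report[key] = {
            "shared": len(a & b),
            "only_a": len(a - b),
            "only_b": len(b - a),
        }
    # Ids trained on under one partition but tested on under the other.
    report["train_a_in_test_b"] = len(
        set(partition_a["TRAIN_SET"]) & set(partition_b["TEST_SET"])
    )
    report["train_b_in_test_a"] = len(
        set(partition_b["TRAIN_SET"]) & set(partition_a["TEST_SET"])
    )
    return report


# Toy partitions for illustration only.
part_a = {"TRAIN_SET": ["001", "002", "003"], "TEST_SET": ["004", "005"]}
part_b = {"TRAIN_SET": ["001", "004"], "TEST_SET": ["002", "005", "006"]}
report = split_overlap(part_a, part_b)
```

In practice the two dicts would come from `json.load` on the partition files. Non-zero crossover counts mean the two runs differ in more than class ratio.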
To support a clean same-pool comparison, the repo now also includes:
datasets/Scoliosis1K/Scoliosis1K_118_fixedpool_train112.json
That partition keeps the full 118 TEST_SET unchanged and keeps the same
positive/neutral TRAIN_SET ids as 118, but downsamples TRAIN_SET negatives
to 148, so the train counts become 74 positive / 74 neutral / 148 negative (1:1:2).
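The construction can be sketched as follows. This is not the script that produced the file: `downsample_train_negatives`, the `label_of` lookup, and the seeded random choice of negatives are all assumptions, since this note does not say how the 148 negatives were selected. The toy pool is shaped like a 1:1:8 train set purely for illustration.

```python
import random


def downsample_train_negatives(partition, label_of, n_negatives=148, seed=0):
    """Keep TEST_SET and non-negative TRAIN_SET ids; sample n_negatives negatives.

    label_of maps each id to "negative", "neutral", or "positive" (hypothetical).
    """
    rng = random.Random(seed)
    train = list(partition["TRAIN_SET"])
    negatives = [sid for sid in train if label_of[sid] == "negative"]
    kept = set(rng.sample(negatives, n_negatives))
    new_train = [
        sid for sid in train if label_of[sid] != "negative" or sid in kept
    ]
    return {"TRAIN_SET": new_train, "TEST_SET": list(partition["TEST_SET"])}


# Toy pool: 74 positive, 74 neutral, 592 negative train ids (made-up ids).
label_of = {}
for i in range(74):
    label_of[f"pos_{i:03d}"] = "positive"
    label_of[f"neu_{i:03d}"] = "neutral"
for i in range(592):
    label_of[f"neg_{i:03d}"] = "negative"

full = {"TRAIN_SET": sorted(label_of), "TEST_SET": ["test_000", "test_001"]}
fixed_pool = downsample_train_negatives(full, label_of)
```

Because positives, neutrals, and the test set are untouched, any metric difference against the full 118 run can be attributed to the number of training negatives.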
The strongest recovered result:
80.24 / 76.73 / 76.40 / 76.56
This is close to the paper’s reported ScoNet-MT^ske F1 and much better than our earlier broken compat evals, but it is still below the paper’s DRF headline result:
- paper DRF: 86.0 Acc / 84.1 Prec / 79.2 Rec / 80.8 F1
Practical Recommendation
If someone wants to use the author checkpoint in this repo today, the recommended path is:
- use configs/drf/drf_author_eval_118_aligned_1gpu.yaml
- keep the author label order: negative, positive, neutral
- keep the legacy attention_layer -> PGA remap in the model
- do not assume the stale 112 YAML is the correct training/eval contract
If someone wants to push this further, the highest-value next step is:
- finetune from the author checkpoint on the aligned 118 path instead of starting DRF from scratch
How To Run
Recommended eval:
CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29693 \
opengait/main.py \
--cfgs ./configs/drf/drf_author_eval_118_aligned_1gpu.yaml \
--phase test
Other compatibility checks:
CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29695 \
opengait/main.py \
--cfgs ./configs/drf/drf_author_eval_112_1gpu.yaml \
--phase test
CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29696 \
opengait/main.py \
--cfgs ./configs/drf/drf_author_eval_118_splitroot_1gpu.yaml \
--phase test
CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29697 \
opengait/main.py \
--cfgs ./configs/drf/drf_author_eval_118_paper_1gpu.yaml \
--phase test
If someone wants to reproduce this on another machine, the usual paths to change are:
- data_cfg.dataset_root
- data_cfg.dataset_partition
- evaluator_cfg.restore_hint
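These settings can also be overridden programmatically instead of editing the YAML by hand. A minimal sketch, assuming the config has already been loaded into a nested dict; `with_overrides` is a hypothetical helper, and every path and value in the example is a placeholder. Only the three dotted keys come from this note.

```python
import copy


def with_overrides(cfg, overrides):
    """Return a deep copy of cfg with dotted-key overrides applied.

    Raises KeyError if a parent section named in a dotted key is missing.
    """
    new_cfg = copy.deepcopy(cfg)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = new_cfg
        for name in parents:
            node = node[name]
        node[leaf] = value
    return new_cfg


# Placeholder config and values, for illustration only.
base_cfg = {
    "data_cfg": {"dataset_root": "/old/root", "dataset_partition": "old.json"},
    "evaluator_cfg": {"restore_hint": "old_checkpoint.pt"},
}
local_cfg = with_overrides(
    base_cfg,
    {
        "data_cfg.dataset_root": "/my/data/root",
        "data_cfg.dataset_partition": "my_partition.json",
        "evaluator_cfg.restore_hint": "my_checkpoint.pt",
    },
)
```

Working on a deep copy leaves the loaded config intact, so the same base can be reused for several machines or dataset roots.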
The archived artifact bundle is:
artifact/scoliosis_drf_author_118_compat