crosstyan/OpenGait


crosstyan 7b98e066e4 feat: add fixed-pool scoliosis partition helper

2026-03-14 17:45:31 +08:00

9.2 KiB


DRF Author Checkpoint Compatibility Note

This note records what happened when evaluating the author-provided DRF bundle in this repo:

checkpoint: artifact/scoliosis_drf_author_118_compat/DRF_118_unordered_iter2w_lr0.001_8830-08000.pt
config: ckpt/drf_author/drf_scoliosis1k_20000.yaml

The short version:

the weight file is real and structurally usable
the provided YAML is not a reliable source of truth
the main problem was integration-contract mismatch, not a broken checkpoint

What Was Wrong

The author bundle was internally inconsistent in several ways.

1. Split mismatch

The DRF paper says the main experiment uses a 1:1:8 (positive : neutral : negative) ratio, i.e. the 118 split.

But the provided YAML pointed to:

./datasets/Scoliosis1K/Scoliosis1K_112.json

while the checkpoint filename itself says:

DRF_118_...

So the bundle already disagreed with itself.

2. Class-order mismatch

The biggest hidden bug was class ordering.

The current repo evaluator assumes:

negative = 0
neutral = 1
positive = 2

But the author stub in research/drf.py uses:

negative = 0
positive = 1
neutral = 2

That means an otherwise good checkpoint can look very bad if logits are interpreted in the wrong class order.
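To make the failure concrete, here is a minimal sketch in plain Python (not the repo's code) that translates predicted class indices from the author order into the evaluator order by going through the class names. The two order lists come from this note; `remap_indices` and `accuracy` are hypothetical helpers for illustration.

```python
# Class orders as recorded in this note.
EVALUATOR_ORDER = ["negative", "neutral", "positive"]  # current repo evaluator
AUTHOR_ORDER = ["negative", "positive", "neutral"]     # author stub in research/drf.py


def remap_indices(pred_indices, src_order, dst_order):
    """Translate class indices from src_order to dst_order via class names.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    name_to_dst = {name: i for i, name in enumerate(dst_order)}
    return [name_to_dst[src_order[i]] for i in pred_indices]


def accuracy(preds, labels):
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


# Toy example: one sample per class, labels in evaluator order.
labels = [0, 1, 2]        # negative, neutral, positive
author_preds = [0, 2, 1]  # a perfect model, but emitting author-order indices

print(accuracy(author_preds, labels))  # 1/3: neutral and positive are swapped
fixed = remap_indices(author_preds, AUTHOR_ORDER, EVALUATOR_ORDER)
print(fixed, accuracy(fixed, labels))  # [0, 1, 2] 1.0
```

Negatives are unaffected because index 0 means the same class in both orders; only positive and neutral get swapped.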

3. Legacy module-name mismatch

The author checkpoint stores PGA weights under:

attention_layer.*

The current repo uses:

PGA.*

This is a small compatibility issue, but the keys must be remapped before the weights can be loaded.
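The remap is a prefix rewrite on the state-dict keys. A minimal sketch, assuming a flat `name -> tensor` mapping; OpenGait-style checkpoints usually keep the weights under a `model` key, so you may need to unwrap that first. `remap_legacy_keys` is a hypothetical name, not the function in opengait/modeling/models/drf.py.

```python
from collections import OrderedDict

LEGACY_PREFIX = "attention_layer."
CURRENT_PREFIX = "PGA."


def remap_legacy_keys(state_dict, old=LEGACY_PREFIX, new=CURRENT_PREFIX):
    """Return a copy of state_dict with keys starting with `old` renamed to start with `new`.

    Only top-level prefixes are rewritten; all other entries and the key order
    are preserved. Hypothetical helper for illustration.
    """
    remapped = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(old):
            key = new + key[len(old):]
        remapped[key] = value
    return remapped


# Usage sketch with PyTorch (not executed here):
#   import torch
#   ckpt = torch.load(path, map_location="cpu")
#   weights = remap_legacy_keys(ckpt["model"])  # assumes an OpenGait-style {"model": ...} layout
#   model.load_state_dict(weights, strict=True)
```

Loading with `strict=True` after the remap is a cheap way to confirm that no legacy keys were left behind.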

4. Preprocessing/runtime-contract mismatch

The author checkpoint does not line up with the stale YAML’s full runtime contract.

Most importantly, it did not work well with the more paper-literal local export:

Scoliosis1K-drf-pkl-118-paper

It worked much better with the more OpenGait-like aligned export:

Scoliosis1K-drf-pkl-118-aligned

That strongly suggests the checkpoint was trained against a preprocessing/runtime path closer to the aligned OpenGait integration than to the later local “paper-literal” summed-heatmap ablation.

What Was Added In-Tree

The current repo now has a small compatibility layer in:

opengait/modeling/models/drf.py

It does two things:

remaps legacy keys attention_layer.* -> PGA.*
supports configurable model_cfg.label_order

The model also canonicalizes inference logits back into the repo’s evaluator order, so author checkpoints can be evaluated without modifying the evaluator itself.
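A minimal sketch of that canonicalization step, assuming logits arrive as an `(N, 3)` array whose columns follow the configured `model_cfg.label_order`. `canonicalize_logits` is an illustrative name, not the one used in drf.py. NumPy is used for brevity; the same list-based column indexing also works on a PyTorch tensor.

```python
import numpy as np

# Order the repo evaluator expects (from this note).
EVALUATOR_ORDER = ("negative", "neutral", "positive")


def canonicalize_logits(logits, label_order):
    """Reorder logit columns from `label_order` into EVALUATOR_ORDER.

    logits: array of shape (N, C) whose column j scores class label_order[j].
    """
    if sorted(label_order) != sorted(EVALUATOR_ORDER):
        raise ValueError(f"label_order must be a permutation of {EVALUATOR_ORDER}")
    # For each evaluator class, find the column that holds its score.
    perm = [list(label_order).index(name) for name in EVALUATOR_ORDER]
    return logits[:, perm]


author_order = ["negative", "positive", "neutral"]  # as set in the author-eval configs
logits = np.array([[2.0, 0.5, 0.1],   # author column 0 wins: negative
                   [0.1, 3.0, 0.2]])  # author column 1 wins: positive
canon = canonicalize_logits(logits, author_order)
print(canon.argmax(axis=1))  # [0 2]: negative, positive in evaluator indices
```

Reordering the logits (rather than the labels) keeps the evaluator untouched, which matches the intent described above.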

Tested Compatibility Results

Best usable author-checkpoint path

Config:

configs/drf/drf_author_eval_118_aligned_1gpu.yaml

Dataset/runtime:

dataset root: Scoliosis1K-drf-pkl-118-aligned
partition: Scoliosis1K_118.json
transform: BaseSilCuttingTransform
label order:
- negative
- positive
- neutral

Result:

80.24 Acc / 76.73 Prec / 76.40 Rec / 76.56 F1

This is the strongest recovered path so far.

Verified provenance of `Scoliosis1K-drf-pkl-118-aligned`

The 118-aligned root is no longer just an informed guess. It was verified directly against the raw pose source:

/mnt/public/data/Scoliosis1K/Scoliosis1K-pose-pkl

The matching preprocessing path is:

datasets/pretreatment_scoliosis_drf.py
default heatmap config:
- configs/drf/pretreatment_heatmap_drf.yaml
archived equivalent config:
- configs/drf/pretreatment_heatmap_drf_118_aligned.yaml

That means the aligned root was produced with:

shared sigma: 8.0
align: True
final_img_size: 64
default heatmap_reduction=upstream
no --stats_partition, i.e. dataset-level PAV min-max stats
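The difference between dataset-level and train-only PAV stats can be illustrated with a toy min-max normalization. This is a sketch under assumptions: it treats each sequence's PAV as a 1-D feature vector and uses plain `(x - min) / (max - min)` scaling, which may not match the exact formula in datasets/pretreatment_scoliosis_drf.py. The function names are hypothetical.

```python
import numpy as np


def pav_minmax_stats(pav_by_id, stats_ids=None):
    """Per-dimension min/max over PAV vectors.

    pav_by_id: dict mapping sequence id -> 1-D array of PAV features (assumed layout).
    stats_ids: ids to compute stats over. None means every id (dataset-level,
    like stats_partition=None); passing TRAIN_SET ids mimics a train-only export.
    """
    ids = list(pav_by_id) if stats_ids is None else list(stats_ids)
    stacked = np.stack([pav_by_id[i] for i in ids])
    return stacked.min(axis=0), stacked.max(axis=0)


def normalize(pav, pav_min, pav_max, eps=1e-8):
    # Assumed min-max scaling; eps guards against constant dimensions.
    return (pav - pav_min) / (pav_max - pav_min + eps)


pavs = {"train_a": np.array([0.0, 1.0]),
        "train_b": np.array([2.0, 3.0]),
        "test_c": np.array([4.0, 1.0])}
lo_all, hi_all = pav_minmax_stats(pavs)                        # dataset-level
lo_tr, hi_tr = pav_minmax_stats(pavs, ["train_a", "train_b"])  # train-only
print(normalize(pavs["test_c"], lo_all, hi_all))  # stays inside [0, 1]
print(normalize(pavs["test_c"], lo_tr, hi_tr))    # first dimension exceeds 1
```

The same test sequence produces different model inputs under the two choices, so a checkpoint trained on one set of stats sees shifted inputs under the other.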

Equivalent command:

uv run python datasets/pretreatment_scoliosis_drf.py \
  --pose_data_path /mnt/public/data/Scoliosis1K/Scoliosis1K-pose-pkl \
  --output_path /mnt/public/data/Scoliosis1K/Scoliosis1K-drf-pkl-118-aligned

Verification evidence:

a regenerated 0_heatmap.pkl sample from the raw pose input matched the stored Scoliosis1K-drf-pkl-118-aligned sample exactly (array_equal == True)
a full recomputation of pav_stats.pkl from the raw pose input matched the stored pav_min, pav_max, and stats_partition=None exactly
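The exact-match checks above can be repeated with a small helper. A minimal sketch, assuming the `.pkl` files hold NumPy arrays, dicts (as `pav_stats.pkl` with `pav_min`, `pav_max`, `stats_partition` appears to), lists/tuples, or plain scalars. `same_content` and `pickles_match` are hypothetical names; the array branch also compares dtypes, which is slightly stricter than `np.array_equal` alone.

```python
import pickle

import numpy as np


def same_content(a, b):
    """Exact structural comparison for arrays, dicts, lists/tuples and scalars."""
    if isinstance(a, np.ndarray) or isinstance(b, np.ndarray):
        return (isinstance(a, np.ndarray) and isinstance(b, np.ndarray)
                and a.dtype == b.dtype and np.array_equal(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(same_content(a[k], b[k]) for k in a)
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return (type(a) is type(b) and len(a) == len(b)
                and all(same_content(x, y) for x, y in zip(a, b)))
    return a == b


def pickles_match(path_a, path_b):
    """Load two pickle files and compare their contents exactly."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return same_content(pickle.load(fa), pickle.load(fb))
```

Pass the regenerated file and the stored file from the aligned root; only unpickle files you trust.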

So 118-aligned is the old default OpenGait-style DRF export, not any of the later exports:

118-paper paper-literal summed-heatmap export
118 train-only-stats splitroot export
sigma15 / sigma15_joint8 exports

Targeted preprocessing ablations around the recovered path

After verifying the aligned root provenance, a few focused runtime/data ablations were tested against the author checkpoint to see which part of the contract still mattered most.

Baseline:

118-aligned
BaseSilCuttingTransform
result:
- 80.24 Acc / 76.73 Prec / 76.40 Rec / 76.56 F1

Hybrid 1:

aligned heatmap + splitroot PAV
result:
- 77.30 Acc / 73.70 Prec / 73.04 Rec / 73.28 F1

Hybrid 2:

splitroot heatmap + aligned PAV
result:
- 80.37 Acc / 77.16 Prec / 76.48 Rec / 76.80 F1

Runtime ablation:

118-aligned + BaseSilTransform (no-cut)
result:
- 49.93 Acc / 50.49 Prec / 51.58 Rec / 47.75 F1
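To put the ablation numbers side by side, the per-metric deltas against the baseline can be computed directly from the values reported above. This is only arithmetic on the reported metrics; nothing is re-run.

```python
# Metrics reported above: (Acc, Prec, Rec, F1), in percent.
METRICS = ("Acc", "Prec", "Rec", "F1")
BASELINE = (80.24, 76.73, 76.40, 76.56)  # 118-aligned + BaseSilCuttingTransform
VARIANTS = {
    "hybrid1: aligned heatmap + splitroot PAV": (77.30, 73.70, 73.04, 73.28),
    "hybrid2: splitroot heatmap + aligned PAV": (80.37, 77.16, 76.48, 76.80),
    "118-aligned + BaseSilTransform (no-cut)": (49.93, 50.49, 51.58, 47.75),
}


def deltas(variant, baseline=BASELINE):
    """Per-metric difference (variant - baseline), rounded to 2 decimals."""
    return {m: round(v - b, 2) for m, v, b in zip(METRICS, variant, baseline)}


for name, scores in VARIANTS.items():
    print(f"{name:45s} {deltas(scores)}")
```

Swapping only the PAV stats (hybrid 1) costs 3.28 F1, swapping only the heatmap writer (hybrid 2) moves F1 by +0.24, and dropping the cut costs 28.81 F1.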

What these ablations suggest:

BaseSilCuttingTransform is necessary; no-cut breaks the checkpoint badly
dataset-level PAV stats (stats_partition=None) matter more than the exact aligned-vs-splitroot heatmap writer
the heatmap export is still part of the contract, but it is no longer the dominant remaining mismatch
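One plausible reason the no-cut run collapses is input geometry. As far as we can tell from OpenGait's opengait/data/transform.py, `BaseSilCuttingTransform` crops `int(W // 64) * 10` columns from each side of the last axis before dividing by 255, while `BaseSilTransform` only divides. The sketch below re-implements that behavior in NumPy for illustration; it is not an import of the real classes, so check the transform file in your checkout before relying on the exact numbers.

```python
import numpy as np


def sil_transform(x, divisor=255.0):
    """Sketch of BaseSilTransform as we understand it: scale only, no cropping."""
    return x / divisor


def sil_cutting_transform(x, divisor=255.0, cutting=None):
    """Sketch of BaseSilCuttingTransform: crop both sides of the last axis, then scale.

    With cutting=None the crop is int(W // 64) * 10 columns per side,
    so a 64-wide input becomes 44 wide.
    """
    if cutting is None:
        cutting = int(x.shape[-1] // 64) * 10
    if cutting == 0:  # guard added in this sketch: x[..., 0:-0] would be empty
        return x / divisor
    return x[..., cutting:-cutting] / divisor


frames = np.full((8, 64, 64), 255, dtype=np.uint8)  # illustrative (T, H, W) stack
print(sil_cutting_transform(frames).shape)  # (8, 64, 44)
print(sil_transform(frames).shape)          # (8, 64, 64)
```

With final_img_size 64, a checkpoint trained behind the cutting transform expects 44-wide maps; feeding it the full 64-wide maps shifts every spatial position the backbone learned.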

Other tested paths

configs/drf/drf_author_eval_118_splitroot_1gpu.yaml

dataset root: Scoliosis1K-drf-pkl-118
result:
- 77.17 Acc / 73.61 Prec / 72.59 Rec / 72.98 F1

configs/drf/drf_author_eval_112_1gpu.yaml

dataset root: Scoliosis1K-drf-pkl
partition: Scoliosis1K_112.json
result:
- 85.19 Acc / 57.98 Prec / 56.65 Rec / 57.30 F1

configs/drf/drf_author_eval_118_paper_1gpu.yaml

dataset root: Scoliosis1K-drf-pkl-118-paper
transform: BaseSilTransform
result:
- 27.24 Acc / 9.08 Prec / 33.33 Rec / 14.27 F1
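The 118-paper numbers are consistent with the model predicting one class for every test sample. If that class makes up a fraction p of the test set, it gets precision p and recall 1 while the other two classes get 0, so macro precision is p/3, macro recall is 1/3, and accuracy is p. The check below assumes macro averaging over 3 classes with undefined precision treated as 0; it does not tell us which class was predicted.

```python
def single_class_collapse_metrics(class_fraction, n_classes=3):
    """Macro P/R/F1 (percent) when every sample is predicted as one class.

    That class has precision = its share of the test set and recall = 1;
    the other classes are assumed to score 0 precision, recall and F1.
    """
    p, r = class_fraction, 1.0
    f1 = 2 * p * r / (p + r)
    return {
        "Acc": 100 * class_fraction,
        "Prec": 100 * p / n_classes,
        "Rec": 100 * r / n_classes,
        "F1": 100 * f1 / n_classes,
    }


m = single_class_collapse_metrics(0.2724)
print({k: round(v, 2) for k, v in m.items()})
# {'Acc': 27.24, 'Prec': 9.08, 'Rec': 33.33, 'F1': 14.27}
```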

Interpretation

What these results mean:

the checkpoint is not garbage
the original “very bad” local eval was mostly a compatibility failure
the largest single hidden bug was the class-order mismatch
the author checkpoint is also sensitive to which local DRF dataset root is used
the recovered runtime is now good enough to make the checkpoint believable, but preprocessing alone did not recover the paper DRF headline row

What they do not mean:

we have perfectly reconstructed the author’s original training path
the provided YAML is trustworthy as-is
the paper’s full DRF claim is fully reproduced here

One practical caveat on 1:1:2 vs 1:1:8 comparisons in this repo:

local Scoliosis1K_112.json and Scoliosis1K_118.json are not the same train/test split with only a different class ratio
they differ substantially in membership
so local 112 vs 118 results should not be overinterpreted as a pure class-balance ablation unless the train/test pool is explicitly held fixed
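A quick way to see how different the two local splits are is to measure id overlap directly. A minimal sketch, assuming the partition files are JSON objects with `TRAIN_SET` and `TEST_SET` id lists (the keys this note refers to); `split_overlap` and the other helpers are hypothetical.

```python
import json


def load_partition(path):
    with open(path, "r", encoding="utf-8") as f:
        part = json.load(f)
    return set(part["TRAIN_SET"]), set(part["TEST_SET"])


def jaccard(a, b):
    """|A & B| / |A | B|; 1.0 means identical membership."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def split_overlap(path_a, path_b):
    train_a, test_a = load_partition(path_a)
    train_b, test_b = load_partition(path_b)
    return {
        "train_jaccard": jaccard(train_a, train_b),
        "test_jaccard": jaccard(test_a, test_b),
        # ids used for training in one split but tested in the other
        "train_a_in_test_b": len(train_a & test_b),
        "train_b_in_test_a": len(train_b & test_a),
    }


# Usage (adjust the paths to wherever the partitions live in your checkout):
# print(split_overlap("datasets/Scoliosis1K/Scoliosis1K_112.json",
#                     "datasets/Scoliosis1K/Scoliosis1K_118.json"))
```

A test Jaccard well below 1.0 means the two result rows are measured on different test pools, not just different training ratios.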

To support a clean same-pool comparison, the repo now also includes:

datasets/Scoliosis1K/Scoliosis1K_118_fixedpool_train112.json

That partition keeps the full 118 TEST_SET unchanged and keeps the same positive/neutral TRAIN_SET ids as 118, but downsamples TRAIN_SET negatives to 148 so the train ratio becomes 74 / 74 / 148 (1:1:2).
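The construction can be sketched as follows. This is not the repo's helper, whose code is not reproduced in this note. It assumes an `id -> label` mapping is available separately (the partition JSON itself only lists ids) and uses a seeded `random.Random` so the negative subsample is reproducible; `fixed_pool_partition` is a hypothetical name.

```python
import random


def fixed_pool_partition(partition, labels, n_negative=148, seed=0):
    """Keep TEST_SET and non-negative TRAIN_SET ids, subsample TRAIN_SET negatives.

    partition: {"TRAIN_SET": [...], "TEST_SET": [...]} for the 118 split (assumed layout).
    labels: dict id -> "negative" | "neutral" | "positive" (assumed to be available).
    """
    train = partition["TRAIN_SET"]
    negatives = sorted(i for i in train if labels[i] == "negative")
    if n_negative > len(negatives):
        raise ValueError("not enough TRAIN_SET negatives to subsample")
    keep_neg = set(random.Random(seed).sample(negatives, n_negative))
    new_train = [i for i in train if labels[i] != "negative" or i in keep_neg]
    # TEST_SET is copied unchanged so results stay comparable with the 118 split.
    return {"TRAIN_SET": new_train, "TEST_SET": list(partition["TEST_SET"])}
```

Applied to the 118 partition, counting labels over the new TRAIN_SET should give 74 / 74 / 148; the result can then be written out with `json.dump(..., indent=2)`.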

The strongest recovered result:

80.24 Acc / 76.73 Prec / 76.40 Rec / 76.56 F1

This is close to the paper’s reported ScoNet-MT^ske F1 and much better than our earlier broken compat evals, but it is still below the paper’s DRF headline result:

paper DRF: 86.0 Acc / 84.1 Prec / 79.2 Rec / 80.8 F1

Practical Recommendation

If someone wants to use the author checkpoint in this repo today, the recommended path is:

use configs/drf/drf_author_eval_118_aligned_1gpu.yaml
keep the author label order:
- negative, positive, neutral
keep the legacy attention_layer -> PGA remap in the model
do not assume the stale 112 YAML is the correct training/eval contract

If someone wants to push this further, the highest-value next step is:

finetune from the author checkpoint on the aligned 118 path instead of starting DRF from scratch

How To Run

Recommended eval:

CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29693 \
  opengait/main.py \
  --cfgs ./configs/drf/drf_author_eval_118_aligned_1gpu.yaml \
  --phase test

Other compatibility checks:

CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29695 \
  opengait/main.py \
  --cfgs ./configs/drf/drf_author_eval_112_1gpu.yaml \
  --phase test

CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29696 \
  opengait/main.py \
  --cfgs ./configs/drf/drf_author_eval_118_splitroot_1gpu.yaml \
  --phase test

CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29697 \
  opengait/main.py \
  --cfgs ./configs/drf/drf_author_eval_118_paper_1gpu.yaml \
  --phase test

If someone wants to reproduce this on another machine, the usual paths to change are:

data_cfg.dataset_root
data_cfg.dataset_partition
evaluator_cfg.restore_hint
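These three keys can also be overridden programmatically instead of hand-editing the YAML. A minimal sketch: `set_dotted` works on any nested dict, while `rewrite_config` assumes PyYAML is installed and that the configs are plain YAML; the round-trip drops comments from the original file. Both function names, the output filename, and the example paths are hypothetical placeholders.

```python
def set_dotted(cfg, dotted_key, value):
    """Set cfg["a"]["b"] = value for dotted_key "a.b", creating dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return cfg


def rewrite_config(src, dst, overrides):
    """Load a YAML config, apply dotted overrides, and write it to dst (needs PyYAML)."""
    import yaml  # third-party; imported lazily so set_dotted works without it

    with open(src, "r", encoding="utf-8") as f:
        cfg = yaml.safe_load(f)
    for key, value in overrides.items():
        set_dotted(cfg, key, value)
    with open(dst, "w", encoding="utf-8") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)


# Example with placeholder paths (the "_local" output file is hypothetical):
# rewrite_config(
#     "configs/drf/drf_author_eval_118_aligned_1gpu.yaml",
#     "configs/drf/drf_author_eval_118_aligned_local.yaml",
#     {
#         "data_cfg.dataset_root": "/path/to/Scoliosis1K-drf-pkl-118-aligned",
#         "data_cfg.dataset_partition": "/path/to/Scoliosis1K_118.json",
#         "evaluator_cfg.restore_hint": "/path/to/DRF_118_unordered_iter2w_lr0.001_8830-08000.pt",
#     },
# )
```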

The archived artifact bundle is:

artifact/scoliosis_drf_author_118_compat
