# DRF Author Checkpoint Compatibility Note

This note records what happened when evaluating the author-provided DRF bundle in this repo:

- checkpoint: `artifact/scoliosis_drf_author_118_compat/DRF_118_unordered_iter2w_lr0.001_8830-08000.pt`
- config: `ckpt/drf_author/drf_scoliosis1k_20000.yaml`

The short version:

- the weight file is real and structurally usable
- the provided YAML is not a reliable source of truth
- the main problem was an integration-contract mismatch, not a broken checkpoint
## What Was Wrong

The author bundle was internally inconsistent in several ways.

### 1. Split mismatch

The DRF paper says the main experiment uses `1:1:8`, i.e. the `118` split.

But the provided YAML pointed to:

- `./datasets/Scoliosis1K/Scoliosis1K_112.json`

while the checkpoint filename itself says:

- `DRF_118_...`

So the bundle already disagreed with itself.
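A mismatch like this can be caught mechanically before any evaluation runs. The sketch below is a heuristic and not something the repo provides. The helper `split_tag` is hypothetical. It assumes the split tag is a three-digit number that sits between an underscore and a following `_` or `.`, as in the two filenames above.

```python
import re


def split_tag(name):
    """Return the first three-digit split tag (e.g. '118', '112') in a filename, or None.

    Heuristic assumption: the tag follows an underscore and is followed by '_' or '.'.
    """
    match = re.search(r"_(\d{3})(?=[_.])", name)
    return match.group(1) if match else None


ckpt = "DRF_118_unordered_iter2w_lr0.001_8830-08000.pt"
partition = "./datasets/Scoliosis1K/Scoliosis1K_112.json"

ckpt_split, partition_split = split_tag(ckpt), split_tag(partition)
if ckpt_split != partition_split:
    print(f"split mismatch: checkpoint={ckpt_split}, partition={partition_split}")
```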
### 2. Class-order mismatch

The biggest hidden bug was class ordering.

The current repo evaluator assumes:

- `negative = 0`
- `neutral = 1`
- `positive = 2`

But the author stub in `research/drf.py` uses:

- `negative = 0`
- `positive = 1`
- `neutral = 2`

That means an otherwise good checkpoint can look very bad if its logits are interpreted in the wrong class order.
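Both orders put `negative` at index 0, so only the last two classes are affected. Every `positive` prediction from the author model is scored as `neutral`, and every `neutral` prediction is scored as `positive`. `negative` is scored correctly. The minimal sketch below makes that explicit. The constant and function names are illustrative and are not taken from the repo.

```python
AUTHOR_ORDER = ["negative", "positive", "neutral"]     # author stub in research/drf.py
EVALUATOR_ORDER = ["negative", "neutral", "positive"]  # current repo evaluator


def misread_classes(model_order, evaluator_order):
    """Map each class the model meant to the class the evaluator reads at the same index."""
    return {
        meant: read
        for meant, read in zip(model_order, evaluator_order)
        if meant != read
    }


print(misread_classes(AUTHOR_ORDER, EVALUATOR_ORDER))
# {'positive': 'neutral', 'neutral': 'positive'}
```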
### 3. Legacy module-name mismatch

The author checkpoint stores PGA weights under:

- `attention_layer.*`

The current repo uses:

- `PGA.*`

This is a small compatibility issue, but the keys must be remapped before the weights are loaded.
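A minimal sketch of the remap, assuming the legacy name only ever appears as a top-level key prefix. `remap_legacy_keys` is a hypothetical helper for illustration, and the in-tree version in `opengait/modeling/models/drf.py` may differ.

```python
LEGACY_PREFIX = "attention_layer."
CURRENT_PREFIX = "PGA."


def remap_legacy_keys(state_dict):
    """Return a new dict with top-level 'attention_layer.*' keys renamed to 'PGA.*'.

    Assumption: the legacy name appears only as a key prefix. Keys where it
    appears deeper in the path pass through unchanged, and the input dict
    is not modified.
    """
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith(LEGACY_PREFIX):
            key = CURRENT_PREFIX + key[len(LEGACY_PREFIX):]
        remapped[key] = value
    return remapped


# Toy example: string placeholders stand in for tensors.
legacy = {"attention_layer.conv.weight": "w0", "Backbone.layer1.weight": "w1"}
print(remap_legacy_keys(legacy))
# {'PGA.conv.weight': 'w0', 'Backbone.layer1.weight': 'w1'}
```

In practice the input dictionary would come from `torch.load(path, map_location="cpu")`. Depending on how the checkpoint was saved, the weights may be nested under a key such as `'model'`. The remapped result would then be passed to `model.load_state_dict(...)`.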
### 4. Preprocessing/runtime-contract mismatch

The author checkpoint does not line up with the stale YAML's full runtime contract.

Most importantly, it did **not** work well with the more paper-literal local export:

- `Scoliosis1K-drf-pkl-118-paper`

It worked much better with the more OpenGait-like aligned export:

- `Scoliosis1K-drf-pkl-118-aligned`

That strongly suggests the checkpoint was trained against a preprocessing/runtime path closer to the aligned OpenGait integration than to the later local “paper-literal” summed-heatmap ablation.
## What Was Added In-Tree

The current repo now has a small compatibility layer in:

- `opengait/modeling/models/drf.py`

It does two things:

- remaps legacy keys `attention_layer.* -> PGA.*`
- supports a configurable `model_cfg.label_order`

The model also canonicalizes inference logits back into the repo's evaluator order, so author checkpoints can be evaluated without modifying the evaluator itself.
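One way to canonicalize logits from a configurable `label_order` works in two steps. First, compute once where each evaluator class sits in the model's order. Then gather along the class dimension. This is a sketch under two assumptions: plain Python lists, and `label_order` given as class names. It is not necessarily how `drf.py` does it.

```python
EVALUATOR_ORDER = ("negative", "neutral", "positive")  # order the repo evaluator expects


def canonical_permutation(label_order, evaluator_order=EVALUATOR_ORDER):
    """For each evaluator class, return its index in the model's label_order."""
    if sorted(label_order) != sorted(evaluator_order):
        raise ValueError(f"label_order {list(label_order)} does not match {list(evaluator_order)}")
    return [list(label_order).index(name) for name in evaluator_order]


def canonicalize(logits, perm):
    """Reorder one sample's logits so position i holds the score for evaluator class i."""
    return [logits[j] for j in perm]


perm = canonical_permutation(["negative", "positive", "neutral"])  # author label order
print(perm)                                  # [0, 2, 1]
print(canonicalize([0.1, 2.3, -0.4], perm))  # [0.1, -0.4, 2.3]
```

For a PyTorch tensor of shape `[batch, 3]`, the same permutation would be applied as `logits[:, perm]`. Reordering the logits, rather than the evaluator's labels, leaves the evaluator untouched, which is the design goal stated above.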
## Tested Compatibility Results

### Best usable author-checkpoint path

Config:

- `configs/drf/drf_author_eval_118_aligned_1gpu.yaml`

Dataset/runtime:

- dataset root: `Scoliosis1K-drf-pkl-118-aligned`
- partition: `Scoliosis1K_118.json`
- transform: `BaseSilCuttingTransform`
- label order:
  - `negative`
  - `positive`
  - `neutral`

Result:

- `80.24 Acc / 76.73 Prec / 76.40 Rec / 76.56 F1`

This is the strongest recovered path so far.
### Other tested paths

`configs/drf/drf_author_eval_118_splitroot_1gpu.yaml`

- dataset root: `Scoliosis1K-drf-pkl-118`
- result:
  - `77.17 Acc / 73.61 Prec / 72.59 Rec / 72.98 F1`

`configs/drf/drf_author_eval_112_1gpu.yaml`

- dataset root: `Scoliosis1K-drf-pkl`
- partition: `Scoliosis1K_112.json`
- result:
  - `85.19 Acc / 57.98 Prec / 56.65 Rec / 57.30 F1`

`configs/drf/drf_author_eval_118_paper_1gpu.yaml`

- dataset root: `Scoliosis1K-drf-pkl-118-paper`
- transform: `BaseSilTransform`
- result:
  - `27.24 Acc / 9.08 Prec / 33.33 Rec / 14.27 F1`
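The `paper` result has a recognizable signature. Assume the evaluator reports precision, recall, and F1 macro-averaged over the three classes. A model that predicts the same class for every sample then has:

- a macro recall of exactly 1/3 (33.33)
- an accuracy equal to that class's share of the test set
- a macro precision of one third of that share

Since 27.24 / 3 = 9.08, the numbers are consistent with a fully collapsed predictor. The sketch below reproduces all four values from a hypothetical label distribution in which the predicted class makes up 27.24% of samples. How the remaining samples split between the other two classes does not affect the result. Per-class precision for a class that is never predicted is treated as 0 here, which is an assumption about the evaluator.

```python
def macro_metrics(y_true, y_pred, num_classes=3):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Undefined per-class values (no predictions or no samples) count as 0.
    """
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precs, recs, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        pred_c = sum(p == c for p in y_pred)
        true_c = sum(t == c for t in y_true)
        prec = tp / pred_c if pred_c else 0.0
        rec = tp / true_c if true_c else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(f1)
    return acc, sum(precs) / num_classes, sum(recs) / num_classes, sum(f1s) / num_classes


# Hypothetical test set: 27.24% of samples are class 0, the rest split between 1 and 2.
y_true = [0] * 2724 + [1] * 3638 + [2] * 3638
y_pred = [0] * len(y_true)  # collapsed model: predicts class 0 for every sample
print([round(100 * m, 2) for m in macro_metrics(y_true, y_pred)])
# [27.24, 9.08, 33.33, 14.27]
```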
## Interpretation

What these results mean:

- the checkpoint is not garbage
- the original “very bad” local eval was mostly a compatibility failure
- the largest single hidden bug was the class-order mismatch
- the author checkpoint is also sensitive to which local DRF dataset root is used

What they do **not** mean:

- we have perfectly reconstructed the authors' original training path
- the provided YAML is trustworthy as-is
- the paper's full DRF claim is fully reproduced here

The strongest recovered result (Acc / Prec / Rec / F1):

- `80.24 / 76.73 / 76.40 / 76.56`

This is close to the paper's reported `ScoNet-MT^ske` F1 and much better than our earlier broken compat evals. It is still below the paper's DRF headline result:

- paper DRF: `86.0 Acc / 84.1 Prec / 79.2 Rec / 80.8 F1`
## Practical Recommendation

If someone wants to use the author checkpoint in this repo today, the recommended path is:

1. use `configs/drf/drf_author_eval_118_aligned_1gpu.yaml`
2. keep the author label order:
   - `negative, positive, neutral`
3. keep the legacy `attention_layer -> PGA` remap in the model
4. do **not** assume the stale `112` YAML is the correct training/eval contract

If someone wants to push this further, the highest-value next step is:

- finetune from the author checkpoint on the aligned `118` path instead of starting DRF from scratch
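The checklist above can be turned into a quick preflight check on a loaded config. This sketch makes two assumptions. The YAML has been parsed into nested dicts, for example with PyYAML's `yaml.safe_load`. `model_cfg.label_order` is a list of class names. `preflight` and the sample paths are hypothetical. The check cannot cover item 3, because the key remap lives in the model code rather than in the config.

```python
AUTHOR_LABEL_ORDER = ["negative", "positive", "neutral"]


def preflight(cfg):
    """Return a list of problems with a loaded eval config; an empty list means none were found."""
    problems = []
    data_cfg = cfg.get("data_cfg", {})
    model_cfg = cfg.get("model_cfg", {})
    partition = str(data_cfg.get("dataset_partition", ""))
    if not partition.endswith("Scoliosis1K_118.json"):
        problems.append(f"expected the 118 partition, got {partition!r}")
    if "118-aligned" not in str(data_cfg.get("dataset_root", "")):
        problems.append("dataset_root does not look like the aligned 118 export")
    if list(model_cfg.get("label_order", [])) != AUTHOR_LABEL_ORDER:
        problems.append(f"label_order should be {AUTHOR_LABEL_ORDER}")
    return problems


cfg = {  # hypothetical paths
    "data_cfg": {
        "dataset_root": "/data/Scoliosis1K-drf-pkl-118-aligned",
        "dataset_partition": "./datasets/Scoliosis1K/Scoliosis1K_118.json",
    },
    "model_cfg": {"label_order": ["negative", "positive", "neutral"]},
}
print(preflight(cfg))  # []
```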
## How To Run
Recommended eval:
```bash
# CUDA_VISIBLE_DEVICES accepts a GPU UUID or index; this UUID is specific to the original machine.
CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29693 \
opengait/main.py \
--cfgs ./configs/drf/drf_author_eval_118_aligned_1gpu.yaml \
--phase test
```
Other compatibility checks:
```bash
# Stale 112 partition on the original DRF dataset root
CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29695 \
opengait/main.py \
--cfgs ./configs/drf/drf_author_eval_112_1gpu.yaml \
--phase test

# 118 split root (Scoliosis1K-drf-pkl-118)
CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29696 \
opengait/main.py \
--cfgs ./configs/drf/drf_author_eval_118_splitroot_1gpu.yaml \
--phase test

# Paper-literal 118 export (Scoliosis1K-drf-pkl-118-paper)
CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
uv run torchrun --nproc_per_node=1 --master_port=29697 \
opengait/main.py \
--cfgs ./configs/drf/drf_author_eval_118_paper_1gpu.yaml \
--phase test
```
If someone wants to reproduce this on another machine, the config keys that usually need changing are:

- `data_cfg.dataset_root`
- `data_cfg.dataset_partition`
- `evaluator_cfg.restore_hint`

The archived artifact bundle is:

- `artifact/scoliosis_drf_author_118_compat`
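When several configs need the same machine-specific changes, the keys listed above can be overridden programmatically instead of by hand. This is a minimal sketch, assuming the config is loaded and saved with PyYAML (`yaml.safe_load` / `yaml.safe_dump`). `set_dotted` and all example paths are hypothetical.

```python
def set_dotted(cfg, dotted_key, value):
    """Set a nested config value addressed by a dotted key such as 'data_cfg.dataset_root'.

    Missing intermediate sections are created as empty dicts.
    """
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


overrides = {  # hypothetical paths on the new machine
    "data_cfg.dataset_root": "/data/Scoliosis1K-drf-pkl-118-aligned",
    "data_cfg.dataset_partition": "./datasets/Scoliosis1K/Scoliosis1K_118.json",
    "evaluator_cfg.restore_hint": "/checkpoints/DRF_118_unordered_iter2w_lr0.001_8830-08000.pt",
}

cfg = {"data_cfg": {"dataset_root": "old/root"}, "evaluator_cfg": {}}  # stand-in for a loaded YAML
for key, value in overrides.items():
    set_dotted(cfg, key, value)
```

Writing the result back with `yaml.safe_dump` drops any comments the original YAML file contained. For a shared config, it may be better to write a new file than to overwrite the original.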