# DATASET PREP KNOWLEDGE BASE

## OVERVIEW

`datasets/` is a script-heavy preprocessing workspace. It transforms raw benchmarks into OpenGait’s required pickle layout and partition metadata.

## STRUCTURE

```text
datasets/
├── pretreatment.py          # generic image->pkl pipeline (and pose mode)
├── pretreatment_heatmap.py  # heatmap generation for skeleton workflows
├── /README.md               # dataset-specific acquisition + conversion steps
├── /*.json                  # train/test partition files
└── /*.py                    # extract/rearrange/convert scripts
```

## WHERE TO LOOK

| Task | Location | Notes |
|------|----------|-------|
| Generic preprocessing | `pretreatment.py` | handles multiple datasets, pose switch |
| OUMVLP pose index flow | `OUMVLP/README.md`, `OUMVLP/pose_index_extractor.py` | required for temporal consistency |
| Heatmap + skeleton prep | `pretreatment_heatmap.py`, `ln_sil_heatmap.py`, `configs/skeletongait/README.md` | multi-step pipeline |
| Dataset splits | `/*.json` | consumed by runtime `data_cfg.dataset_partition` |

## CONVENTIONS

- The final runtime-ready format is `id/type/view/*.pkl`.
- Many dataset folders provide both rearrange and extraction scripts; follow the README ordering strictly.
- Some pipelines require auxiliary artifacts (e.g., OUMVLP pose match indices) before pretreatment.

## ANTI-PATTERNS

- Don’t point the runtime at raw image trees; training expects the pkl-converted structure.
- Don’t skip dataset-specific rearrange steps; many raw layouts are incompatible with the runtime parser.
- Don’t ignore documented optional/required flags in per-dataset README commands.
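The `id/type/view/*.pkl` convention can be sketched with a minimal, stdlib-only writer. The helper name, the one-pickle-per-sequence naming, and the dummy frame shape are illustrative assumptions for this sketch, not a specification of OpenGait’s exact array format:

```python
import pickle
from pathlib import Path

def save_sequence(root, subject_id, seq_type, view, frames):
    """Write one sequence as root/id/type/view/<view>.pkl.

    `frames` is any picklable container of per-frame data; real
    pipelines typically stack cropped silhouettes into one array.
    The <view>.pkl filename is an assumption for illustration.
    """
    seq_dir = Path(root) / subject_id / seq_type / view
    seq_dir.mkdir(parents=True, exist_ok=True)
    out = seq_dir / f"{view}.pkl"
    with out.open("wb") as f:
        pickle.dump(frames, f)
    return out

# Two dummy 4x4 "silhouette" frames stand in for preprocessed images.
frames = [[[0] * 4 for _ in range(4)] for _ in range(2)]
pkl_path = save_sequence("converted", "001", "nm-01", "090", frames)

# Round-trip check: the runtime would load the same pickle per sequence.
with pkl_path.open("rb") as f:
    restored = pickle.load(f)
assert restored == frames
```

The point of the sketch is the directory shape: pointing the runtime at `converted/` hands it the `id/type/view/*.pkl` tree it expects, rather than a raw image layout.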
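A partition file of the kind `data_cfg.dataset_partition` consumes can be sketched as a small JSON split. The key names, filename, and subject IDs below are assumptions for illustration, not copied from any shipped partition file:

```python
import json
from pathlib import Path

# Hypothetical train/test split over subject IDs.
partition = {
    "TRAIN_SET": ["001", "002", "003"],
    "TEST_SET": ["004", "005"],
}

# Hypothetical filename; real partition files live next to the
# dataset's conversion scripts.
path = Path("example_partition.json")
path.write_text(json.dumps(partition, indent=2))

# The runtime would read this file and route each subject ID to the
# train or test split; the two sets must not overlap.
loaded = json.loads(path.read_text())
train_ids = set(loaded["TRAIN_SET"])
test_ids = set(loaded["TEST_SET"])
assert train_ids.isdisjoint(test_ids)  # sanity check: no subject leakage
```

The disjointness check mirrors the intent of the split: a subject appearing in both sets would leak identity information from training into evaluation.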