DATASET PREP KNOWLEDGE BASE
OVERVIEW
datasets/ is a script-heavy preprocessing workspace. It transforms raw benchmarks into OpenGait’s required pickle layout and partition metadata.
STRUCTURE
datasets/
├── pretreatment.py # generic image->pkl pipeline (and pose mode)
├── pretreatment_heatmap.py # heatmap generation for skeleton workflows
├── <DatasetName>/README.md # dataset-specific acquisition + conversion steps
├── <DatasetName>/*.json # train/test partition files
└── <DatasetName>/*.py # extract/rearrange/convert scripts
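The scripts above all converge on the id/type/view/*.pkl layout. A minimal roundtrip sketch, assuming one pickle per sequence named `<view>.pkl` (verify the filename convention against your dataset's README); a plain nested list stands in here for the stacked silhouette array the real scripts produce:

```python
import os
import pickle
import tempfile

def save_sequence(root, subject_id, seq_type, view, frames):
    """Write one sequence into the id/type/view/<view>.pkl layout
    that the runtime expects. `frames` stands in for the real
    silhouette array."""
    seq_dir = os.path.join(root, subject_id, seq_type, view)
    os.makedirs(seq_dir, exist_ok=True)
    pkl_path = os.path.join(seq_dir, f"{view}.pkl")
    with open(pkl_path, "wb") as f:
        pickle.dump(frames, f)
    return pkl_path

root = tempfile.mkdtemp()
frames = [[[0] * 4] * 4 for _ in range(3)]  # three toy 4x4 "silhouettes"
path = save_sequence(root, "001", "nm-01", "090", frames)

with open(path, "rb") as f:
    loaded = pickle.load(f)
print(path.replace(root, "<root>"))  # <root>/001/nm-01/090/090.pkl
```

The subject/type/view names here are illustrative, not from any specific benchmark.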
WHERE TO LOOK
| Task | Location | Notes |
|---|---|---|
| Generic preprocessing | pretreatment.py | handles multiple datasets, pose switch |
| OUMVLP pose index flow | OUMVLP/README.md, OUMVLP/pose_index_extractor.py | required for temporal consistency |
| Heatmap + skeleton prep | pretreatment_heatmap.py, ln_sil_heatmap.py, configs/skeletongait/README.md | multi-step pipeline |
| Dataset splits | <Dataset>/<Dataset>.json | consumed by runtime data_cfg.dataset_partition |
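The partition files in the last row are small JSONs mapping subject IDs to splits. A hedged sketch of writing and consuming one, assuming the TRAIN_SET/TEST_SET key names read via data_cfg.dataset_partition (confirm against an actual <Dataset>.json); the IDs and filename are made up:

```python
import json
import os
import tempfile

# Hypothetical partition content; real files list the benchmark's subject IDs.
partition = {
    "TRAIN_SET": ["001", "002", "003"],
    "TEST_SET": ["004", "005"],
}

path = os.path.join(tempfile.mkdtemp(), "ToyDataset.json")
with open(path, "w") as f:
    json.dump(partition, f, indent=2)

# What the runtime loader effectively does with the partition file:
with open(path) as f:
    split = json.load(f)
print(sorted(split))  # ['TEST_SET', 'TRAIN_SET']
```

Keeping the two lists disjoint is worth asserting in any conversion script, since an overlap silently leaks test identities into training.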
CONVENTIONS
- Final runtime-ready format is id/type/view/*.pkl.
- Many dataset folders provide both rearrange and extraction scripts; follow README ordering strictly.
- Some pipelines require auxiliary artifacts (e.g., OUMVLP pose match indices) before pretreatment.
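The first convention can be checked mechanically after a conversion run. A minimal validator sketch (the three-level depth comes from the id/type/view convention above; the helper name is ours, not an OpenGait API):

```python
import os
import pickle
import tempfile

def find_layout_violations(root):
    """Return .pkl files that do not sit exactly three directory
    levels below root, i.e. outside the id/type/view/*.pkl layout."""
    bad = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".pkl"):
                continue
            rel = os.path.relpath(dirpath, root)
            if rel == "." or len(rel.split(os.sep)) != 3:
                bad.append(os.path.join(dirpath, name))
    return bad

root = tempfile.mkdtemp()
good_dir = os.path.join(root, "001", "nm-01", "090")
os.makedirs(good_dir)
with open(os.path.join(good_dir, "090.pkl"), "wb") as f:
    pickle.dump([], f)
with open(os.path.join(root, "stray.pkl"), "wb") as f:  # wrong depth
    pickle.dump([], f)

violations = find_layout_violations(root)
print(len(violations))  # 1
```

Running such a check right after the rearrange step catches ordering mistakes before training ever starts.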
ANTI-PATTERNS
- Don’t point runtime to raw image trees; training expects pkl-converted structure.
- Don’t skip dataset-specific rearrange steps; many raw layouts are incompatible with the runtime parser.
- Don’t ignore documented optional/required flags in per-dataset README commands.
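The first anti-pattern (pointing runtime at a raw image tree) can be caught with a fail-fast guard. A hedged sketch; the extension list and helper are assumptions of ours, not part of OpenGait:

```python
import os
import tempfile

RAW_IMAGE_EXTS = (".png", ".jpg", ".jpeg")  # assumed typical raw frame formats

def assert_pkl_converted(root):
    """Raise if the dataset root still contains raw image frames
    instead of the pkl-converted structure training expects."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(RAW_IMAGE_EXTS):
                raise ValueError(
                    f"raw image found under {dirpath}; run pretreatment first"
                )

root = tempfile.mkdtemp()
seq_dir = os.path.join(root, "001", "nm-01", "090")
os.makedirs(seq_dir)
open(os.path.join(seq_dir, "0001.png"), "wb").close()  # leftover raw frame

try:
    assert_pkl_converted(root)
    caught = False
except ValueError:
    caught = True
print(caught)  # True
```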