# DATASET PREP KNOWLEDGE BASE

## OVERVIEW

`datasets/` is a script-heavy preprocessing workspace. It transforms raw benchmarks into OpenGait’s required pickle layout and partition metadata.

## STRUCTURE

```text
datasets/
├── pretreatment.py          # generic image->pkl pipeline (and pose mode)
├── pretreatment_heatmap.py  # heatmap generation for skeleton workflows
├── /README.md               # dataset-specific acquisition + conversion steps
├── /*.json                  # train/test partition files
└── /*.py                    # extract/rearrange/convert scripts
```

## WHERE TO LOOK

| Task | Location | Notes |
|------|----------|-------|
| Generic preprocessing | `pretreatment.py` | handles multiple datasets, pose switch |
| OUMVLP pose index flow | `OUMVLP/README.md`, `OUMVLP/pose_index_extractor.py` | required for temporal consistency |
| Heatmap + skeleton prep | `pretreatment_heatmap.py`, `ln_sil_heatmap.py`, `configs/skeletongait/README.md` | multi-step pipeline |
| Dataset splits | `/*.json` | consumed by runtime `data_cfg.dataset_partition` |

## CONVENTIONS

- The final runtime-ready format is `id/type/view/*.pkl`.
- Many dataset folders provide both rearrange and extraction scripts; follow the README ordering strictly.
- Some pipelines require auxiliary artifacts (e.g., OUMVLP pose match indices) before pretreatment.

## ANTI-PATTERNS

- Don’t point the runtime at raw image trees; training expects the pkl-converted structure.
- Don’t skip dataset-specific rearrange steps; many raw layouts are incompatible with the runtime parser.
- Don’t ignore documented optional/required flags in per-dataset README commands.
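The `id/type/view/*.pkl` convention can be sketched with a minimal, stdlib-only writer. The helper name, the one-pickle-per-sequence naming, and the dummy frame shape are illustrative assumptions for this sketch, not a specification of OpenGait’s exact array format:

```python
import pickle
from pathlib import Path

def save_sequence(root, subject_id, seq_type, view, frames):
    """Write one sequence as root/id/type/view/<view>.pkl.

    `frames` is any picklable container of per-frame data; real
    pipelines typically stack cropped silhouettes into one array.
    The <view>.pkl filename is an assumption for illustration.
    """
    seq_dir = Path(root) / subject_id / seq_type / view
    seq_dir.mkdir(parents=True, exist_ok=True)
    out = seq_dir / f"{view}.pkl"
    with out.open("wb") as f:
        pickle.dump(frames, f)
    return out

# Two dummy 4x4 "silhouette" frames stand in for preprocessed images.
frames = [[[0] * 4 for _ in range(4)] for _ in range(2)]
pkl_path = save_sequence("converted", "001", "nm-01", "090", frames)

# Round-trip check: the runtime would load the same pickle per sequence.
with pkl_path.open("rb") as f:
    restored = pickle.load(f)
assert restored == frames
```

The point of the sketch is the directory shape: pointing the runtime at `converted/` hands it the `id/type/view/*.pkl` tree it expects, rather than a raw image layout.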
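A partition file of the kind `data_cfg.dataset_partition` consumes can be sketched as a small JSON split. The key names, filename, and subject IDs below are assumptions for illustration, not copied from any shipped partition file:

```python
import json
from pathlib import Path

# Hypothetical train/test split over subject IDs.
partition = {
    "TRAIN_SET": ["001", "002", "003"],
    "TEST_SET": ["004", "005"],
}

# Hypothetical filename; real partition files live next to the
# dataset's conversion scripts.
path = Path("example_partition.json")
path.write_text(json.dumps(partition, indent=2))

# The runtime would read this file and route each subject ID to the
# train or test split; the two sets must not overlap.
loaded = json.loads(path.read_text())
train_ids = set(loaded["TRAIN_SET"])
test_ids = set(loaded["TEST_SET"])
assert train_ids.isdisjoint(test_ids)  # sanity check: no subject leakage
```

The disjointness check mirrors the intent of the split: a subject appearing in both sets would leak identity information from training into evaluation.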