# DATASET PREP KNOWLEDGE BASE

## OVERVIEW

`datasets/` is a script-heavy preprocessing workspace. It transforms raw benchmarks into OpenGait’s required pickle layout and partition metadata.
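A minimal sketch of what the pickle layout means for a single sequence, assuming the frames have already been cropped and resized: all frames of one subject/type/view are pickled together as one object under `id/type/view/`. `save_sequence` is a hypothetical helper, and the `<view>.pkl` file name is an assumption; check `pretreatment.py` for the exact file naming and array format it writes. The real script works on image arrays; plain nested lists keep this sketch dependency-free.

```python
import os
import pickle
import tempfile
from typing import Any, Sequence


def save_sequence(frames: Sequence[Any], out_root: str,
                  subject_id: str, seq_type: str, view: str) -> str:
    """Pickle one sequence of frames into out_root/id/type/view/.

    Hypothetical helper: the '<view>.pkl' file name is an assumption,
    not necessarily what pretreatment.py writes.
    """
    seq_dir = os.path.join(out_root, subject_id, seq_type, view)
    os.makedirs(seq_dir, exist_ok=True)
    pkl_path = os.path.join(seq_dir, f"{view}.pkl")
    with open(pkl_path, "wb") as f:
        # Store the whole sequence as one object so it can be read back in one call.
        pickle.dump(list(frames), f)
    return pkl_path


if __name__ == "__main__":
    # Two tiny 2x2 "silhouette" frames stand in for real resized images.
    frames = [[[0, 255], [255, 0]], [[255, 0], [0, 255]]]
    with tempfile.TemporaryDirectory() as root:
        path = save_sequence(frames, root, "001", "nm-01", "000")
        print(os.path.relpath(path, root))  # 001/nm-01/000/000.pkl on POSIX
        with open(path, "rb") as f:
            assert pickle.load(f) == frames
```

Keeping one pickle per sequence means a loader can fetch an entire sequence with a single `pickle.load` call instead of walking individual frame files.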
## STRUCTURE

```text
datasets/
├── pretreatment.py            # generic image->pkl pipeline (and pose mode)
├── pretreatment_heatmap.py    # heatmap generation for skeleton workflows
├── <DatasetName>/README.md    # dataset-specific acquisition + conversion steps
├── <DatasetName>/*.json       # train/test partition files
└── <DatasetName>/*.py         # extract/rearrange/convert scripts
```
## WHERE TO LOOK

| Task | Location | Notes |
|------|----------|-------|
| Generic preprocessing | `pretreatment.py` | handles multiple datasets; has a pose-mode switch |
| OUMVLP pose index flow | `OUMVLP/README.md`, `OUMVLP/pose_index_extractor.py` | extracts the pose match indices needed before pretreatment; required for temporal consistency |
| Heatmap + skeleton prep | `pretreatment_heatmap.py`, `ln_sil_heatmap.py`, `configs/skeletongait/README.md` | multi-step pipeline |
| Dataset splits | `<DatasetName>/<DatasetName>.json` | referenced at runtime via `data_cfg.dataset_partition` |
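The runtime reads the split file named in `data_cfg.dataset_partition`, so a split that does not match the converted tree only surfaces once training starts. A minimal pre-flight sketch, assuming the partition JSON holds subject-ID lists under `TRAIN_SET` and `TEST_SET` keys (verify against your dataset’s file) and that the protocol keeps train and test subjects disjoint. `load_partition` and `missing_ids` are hypothetical helpers, not OpenGait functions.

```python
import json
import os
import tempfile
from typing import Dict, List


def load_partition(json_path: str) -> Dict[str, List[str]]:
    """Read a partition file, assuming TRAIN_SET / TEST_SET keys holding subject IDs."""
    with open(json_path, "r", encoding="utf-8") as f:
        partition = json.load(f)
    train, test = partition["TRAIN_SET"], partition["TEST_SET"]
    # Assumption: train and test subjects must not overlap.
    overlap = set(train) & set(test)
    if overlap:
        raise ValueError(f"IDs in both splits: {sorted(overlap)}")
    return {"train": list(train), "test": list(test)}


def missing_ids(partition: Dict[str, List[str]], dataset_root: str) -> List[str]:
    """Return partition IDs that have no top-level id/ entry under dataset_root."""
    present = set(os.listdir(dataset_root))
    wanted = partition["train"] + partition["test"]
    return sorted(i for i in wanted if i not in present)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "pkl", "001"))
        json_path = os.path.join(root, "Demo.json")
        with open(json_path, "w", encoding="utf-8") as f:
            json.dump({"TRAIN_SET": ["001"], "TEST_SET": ["002"]}, f)
        split = load_partition(json_path)
        print(missing_ids(split, os.path.join(root, "pkl")))  # ['002']
```

An ID reported as missing usually means a conversion or rearrange step was skipped for that subject.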
## CONVENTIONS

- Final runtime-ready format is `id/type/view/*.pkl`.
- Many dataset folders provide both rearrange and extraction scripts; run them in exactly the order given in that dataset’s README.
- Some pipelines require auxiliary artifacts (e.g., OUMVLP pose match indices) to be generated before pretreatment.
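Before training, it can help to confirm that a converted root really follows `id/type/view/*.pkl`. A minimal sketch: `index_sequences` is a hypothetical helper that groups pickles by `(id, type, view)` and reports any `.pkl` sitting at the wrong depth. It checks directory depth only, not pickle contents, and is not how OpenGait’s own loader parses the tree.

```python
import os
import tempfile
from collections import defaultdict
from typing import Dict, List, Tuple


def index_sequences(root: str) -> Tuple[Dict[Tuple[str, str, str], List[str]], List[str]]:
    """Group .pkl files by (id, type, view); list any .pkl found at the wrong depth."""
    sequences: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    misplaced: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        for name in sorted(filenames):
            if not name.endswith(".pkl"):
                continue  # non-pickle files are ignored by this check
            if len(parts) == 3:
                sequences[(parts[0], parts[1], parts[2])].append(name)
            else:
                misplaced.append(os.path.join(*parts, name))
    return dict(sequences), sorted(misplaced)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        good = os.path.join(root, "001", "nm-01", "000")
        os.makedirs(good)
        open(os.path.join(good, "000.pkl"), "wb").close()
        open(os.path.join(root, "001", "stray.pkl"), "wb").close()
        seqs, bad = index_sequences(root)
        print(seqs)  # {('001', 'nm-01', '000'): ['000.pkl']}
        print(bad)   # ['001/stray.pkl'] on POSIX
```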
## ANTI-PATTERNS

- Don’t point the runtime at raw image trees; training expects the pkl-converted structure.
- Don’t skip dataset-specific rearrange steps; many raw layouts are incompatible with the runtime parser.
- Don’t ignore the optional or required flags documented in each dataset’s README commands.
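The first anti-pattern can be caught with a cheap check before launching training. A minimal sketch, assuming raw frames use common image extensions (the `IMAGE_EXTS` set is an assumption; extend it for your benchmark) and that a converted root should hold at least one `.pkl` and no image files. `assert_pkl_root` is a hypothetical helper, not part of OpenGait.

```python
import os
import tempfile

# Assumed raw-frame extensions; adjust for the benchmark being converted.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}


def assert_pkl_root(dataset_root: str) -> int:
    """Raise if dataset_root looks like a raw image tree; otherwise return the .pkl count."""
    n_pkl = n_img = 0
    for _dirpath, _dirnames, filenames in os.walk(dataset_root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext == ".pkl":
                n_pkl += 1
            elif ext in IMAGE_EXTS:
                n_img += 1
    # Any image file, or no pickles at all, suggests conversion has not been run.
    if n_img or not n_pkl:
        raise RuntimeError(
            f"{dataset_root}: {n_pkl} .pkl and {n_img} image files found; "
            "run the dataset's conversion steps before pointing the runtime here"
        )
    return n_pkl


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "0001.png"), "wb").close()
        try:
            assert_pkl_root(root)
        except RuntimeError as err:
            print("refused:", err)
```

Failing fast here is cheaper than discovering an unconverted root after the data loader has already started.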