Add comprehensive knowledge base documentation across multiple domains

2026-02-12 14:36:37 +08:00
parent f754f6f383
commit 0fdd35bd78
8 changed files with 336 additions and 0 deletions
# DATASET PREP KNOWLEDGE BASE
## OVERVIEW
`datasets/` is a script-heavy preprocessing workspace. Its scripts transform raw benchmarks into the pickle layout and partition metadata that OpenGait requires.
## STRUCTURE
```text
datasets/
├── pretreatment.py # generic image->pkl pipeline (and pose mode)
├── pretreatment_heatmap.py # heatmap generation for skeleton workflows
├── <DatasetName>/README.md # dataset-specific acquisition + conversion steps
├── <DatasetName>/*.json # train/test partition files
└── <DatasetName>/*.py # extract/rearrange/convert scripts
```
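
The sketch below roughly illustrates the image->pkl step that `pretreatment.py` performs: it pickles one sequence of frames into the `id/type/view/` layout. It is a minimal sketch, not the script's actual code. Several details are assumptions: the helper name `write_sequence_pkl`, the `<view>.pkl` file name, and the use of nested Python lists for frames. A real pipeline would store cropped silhouette arrays.

```python
import os
import pickle
import tempfile
from typing import Sequence


def write_sequence_pkl(frames: Sequence, out_root: str, sid: str,
                       seq_type: str, view: str) -> str:
    """Pickle one sequence's frames into <out_root>/<sid>/<seq_type>/<view>/.

    Hypothetical helper: the file name `<view>.pkl` is an assumption made
    for this sketch, not necessarily what pretreatment.py writes.
    """
    if len(frames) == 0:
        # Refuse empty sequences instead of writing an unusable pickle.
        raise ValueError(f"empty sequence: {sid}/{seq_type}/{view}")
    seq_dir = os.path.join(out_root, sid, seq_type, view)
    os.makedirs(seq_dir, exist_ok=True)
    pkl_path = os.path.join(seq_dir, f"{view}.pkl")
    with open(pkl_path, "wb") as f:
        # One pickle per sequence; frames are stored together, in order.
        pickle.dump(list(frames), f)
    return pkl_path


if __name__ == "__main__":
    # Two toy 2x2 frames stand in for real silhouette images.
    frames = [[[0, 1], [1, 0]], [[1, 1], [0, 0]]]
    with tempfile.TemporaryDirectory() as root:
        path = write_sequence_pkl(frames, root, "001", "nm-01", "090")
        print(os.path.relpath(path, root))
```

Writing exactly one file per `id/type/view` directory keeps each sequence self-contained. The runtime can then treat every view directory as one sample.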
## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| Generic preprocessing | `pretreatment.py` | handles multiple datasets, pose switch |
| OUMVLP pose index flow | `OUMVLP/README.md`, `OUMVLP/pose_index_extractor.py` | required for temporal consistency |
| Heatmap + skeleton prep | `pretreatment_heatmap.py`, `ln_sil_heatmap.py`, `configs/skeletongait/README.md` | multi-step pipeline |
| Dataset splits | `<Dataset>/<Dataset>.json` | consumed by runtime `data_cfg.dataset_partition` |
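
The partition files listed in the table could be read and applied as sketched below. This sketch assumes each file is a JSON object holding `TRAIN_SET` and `TEST_SET` lists of subject IDs; check the actual `<Dataset>/<Dataset>.json` before relying on that. `parse_partition` and `select_ids` are hypothetical helpers, not OpenGait functions.

```python
import json
from typing import Dict, Iterable, List


def parse_partition(json_text: str) -> Dict[str, List[str]]:
    """Parse a partition file, assuming TRAIN_SET / TEST_SET ID lists."""
    part = json.loads(json_text)
    # Normalize IDs to strings, since subject folders are named by string IDs.
    train = [str(sid) for sid in part["TRAIN_SET"]]
    test = [str(sid) for sid in part["TEST_SET"]]
    # A subject appearing in both splits would leak test identities into training.
    overlap = set(train) & set(test)
    if overlap:
        raise ValueError(f"IDs present in both splits: {sorted(overlap)}")
    return {"train": train, "test": test}


def select_ids(available: Iterable[str], partition: Dict[str, List[str]],
               training: bool) -> List[str]:
    """Keep only the subject IDs on disk that belong to the requested split."""
    wanted = set(partition["train" if training else "test"])
    return sorted(sid for sid in available if sid in wanted)


if __name__ == "__main__":
    demo = '{"TRAIN_SET": ["001", "002"], "TEST_SET": ["075"]}'
    part = parse_partition(demo)
    print(select_ids(["075", "001", "999"], part, training=True))  # ['001']
```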
## CONVENTIONS
- Final runtime-ready format is `id/type/view/*.pkl`.
- Many dataset folders provide both rearrange and extraction scripts; follow README ordering strictly.
- Some pipelines require auxiliary artifacts (e.g., OUMVLP pose match indices) before pretreatment.
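
Before pointing the runtime at a converted directory, you can check the `id/type/view/*.pkl` convention. The sketch below indexes sequences and reports entries that break the three-level layout. `index_sequences` is a hypothetical helper, and treating every `.pkl` file in a view directory as part of that sequence is an assumption.

```python
import os
import tempfile
from typing import Dict, List, Tuple

SeqKey = Tuple[str, str, str]  # (id, type, view)


def index_sequences(root: str) -> Tuple[Dict[SeqKey, List[str]], List[str]]:
    """Map each id/type/view directory to its .pkl files; collect layout problems."""
    seqs: Dict[SeqKey, List[str]] = {}
    problems: List[str] = []
    for sid in sorted(os.listdir(root)):
        sid_dir = os.path.join(root, sid)
        if not os.path.isdir(sid_dir):
            problems.append(f"unexpected file at id level: {sid}")
            continue
        for seq_type in sorted(os.listdir(sid_dir)):
            type_dir = os.path.join(sid_dir, seq_type)
            if not os.path.isdir(type_dir):
                problems.append(f"unexpected file at type level: {sid}/{seq_type}")
                continue
            for view in sorted(os.listdir(type_dir)):
                view_dir = os.path.join(type_dir, view)
                if not os.path.isdir(view_dir):
                    problems.append(f"unexpected file at view level: {sid}/{seq_type}/{view}")
                    continue
                pkls = sorted(f for f in os.listdir(view_dir) if f.endswith(".pkl"))
                if pkls:
                    seqs[(sid, seq_type, view)] = pkls
                else:
                    # Usually means conversion did not run for this sequence.
                    problems.append(f"no .pkl files in {sid}/{seq_type}/{view}")
    return seqs, problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "001", "nm-01", "090"))
        open(os.path.join(root, "001", "nm-01", "090", "090.pkl"), "wb").close()
        print(index_sequences(root))
```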
## ANTI-PATTERNS
- Don't point the runtime at raw image trees; training expects the pkl-converted structure.
- Don't skip dataset-specific rearrange steps; many raw layouts are incompatible with the runtime parser.
- Don't ignore the optional/required flags documented in each dataset's README commands.
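
A pre-flight guard against the first anti-pattern could scan the configured root and refuse raw image trees. This is only a sketch. `check_pkl_dataset_root` is hypothetical, and the set of extensions treated as raw images is an assumption to adjust per dataset.

```python
import os
import tempfile

# Assumption: extensions that indicate an unconverted (raw) image tree.
RAW_IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}


def check_pkl_dataset_root(root: str) -> int:
    """Raise if `root` looks like a raw image tree; return the .pkl count otherwise."""
    n_pkl = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in RAW_IMAGE_EXTS:
                raise ValueError(
                    f"raw image found: {os.path.join(dirpath, name)}; "
                    "run the dataset's rearrange/pretreatment steps first")
            if ext == ".pkl":
                n_pkl += 1
    if n_pkl == 0:
        raise ValueError(f"no .pkl files under {root}")
    return n_pkl


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        seq_dir = os.path.join(root, "001", "nm-01", "090")
        os.makedirs(seq_dir)
        open(os.path.join(seq_dir, "090.pkl"), "wb").close()
        print(check_pkl_dataset_root(root))
```

Failing fast on the first image file is a deliberate choice: mixed trees (partly converted) are as unusable as fully raw ones, so there is little value in scanning further.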