Add comprehensive knowledge base documentation across multiple domains

2026-02-12 14:36:37 +08:00
parent f754f6f383
commit 0fdd35bd78
8 changed files with 336 additions and 0 deletions
@@ -0,0 +1,32 @@
+# DATASET PREP KNOWLEDGE BASE
+
+## OVERVIEW
+`datasets/` is a script-heavy preprocessing workspace. Its scripts convert raw benchmark data into the pickle layout and partition metadata that OpenGait requires.
+
+## STRUCTURE
+```text
+datasets/
+├── pretreatment.py              # generic image->pkl pipeline (and pose mode)
+├── pretreatment_heatmap.py      # heatmap generation for skeleton workflows
+├── <DatasetName>/README.md      # dataset-specific acquisition + conversion steps
+├── <DatasetName>/*.json         # train/test partition files
+└── <DatasetName>/*.py           # extract/rearrange/convert scripts
+```
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Generic preprocessing | `pretreatment.py` | handles multiple datasets; has a switch for pose mode |
+| OUMVLP pose index flow | `OUMVLP/README.md`, `OUMVLP/pose_index_extractor.py` | required for temporal consistency |
+| Heatmap + skeleton prep | `pretreatment_heatmap.py`, `ln_sil_heatmap.py`, `configs/skeletongait/README.md` | multi-step pipeline |
+| Dataset splits | `<Dataset>/<Dataset>.json` | consumed by runtime `data_cfg.dataset_partition` |
+
+## CONVENTIONS
+- The final runtime-ready layout is `id/type/view/*.pkl`, i.e. one directory level each for subject id, sequence type, and view.
+- Many dataset folders provide both rearrange and extraction scripts; follow README ordering strictly.
+- Some pipelines require auxiliary artifacts (e.g., OUMVLP pose match indices) before pretreatment.
+
+## ANTI-PATTERNS
+- Don’t point runtime to raw image trees; training expects pkl-converted structure.
+- Don’t skip dataset-specific rearrange steps; many raw layouts are incompatible with the runtime parser.
+- Don’t ignore documented optional/required flags in per-dataset README commands.
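
The `id/type/view/*.pkl` convention and the first anti-pattern can be checked mechanically before training. The following is a minimal sketch, not part of the committed file or of OpenGait itself: the function name `check_layout`, the `RAW_IMAGE_SUFFIXES` set, and the wording of the problem messages are all hypothetical, and it assumes that every third-level directory under the dataset root is meant to be a `view` directory.

```python
from pathlib import Path

# Assumption: these suffixes indicate a raw (unconverted) image tree.
RAW_IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def check_layout(root):
    """Return a list of problems found under an id/type/view/*.pkl tree.

    Hypothetical helper: it only inspects file names and never opens
    or unpickles anything.
    """
    root = Path(root)
    problems = []

    # Every directory exactly three levels below root is treated as id/type/view.
    seq_dirs = sorted(p for p in root.glob("*/*/*") if p.is_dir())
    if not seq_dirs:
        problems.append(f"{root}: no id/type/view directories found")

    for seq in seq_dirs:
        files = [f for f in seq.iterdir() if f.is_file()]
        rel = seq.relative_to(root).as_posix()
        if not any(f.suffix == ".pkl" for f in files):
            problems.append(f"{rel}: no .pkl file")
        if any(f.suffix.lower() in RAW_IMAGE_SUFFIXES for f in files):
            problems.append(f"{rel}: raw images present")

    return problems
```

An empty return value means every `view` directory holds at least one `.pkl` file and no raw images. A non-empty list usually suggests that the runtime is pointed at a raw tree or that a rearrange or pretreatment step was skipped.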