# DATA PIPELINE KNOWLEDGE BASE
## OVERVIEW
`opengait/data/` converts preprocessed dataset trees into training/evaluation batches for all models.
## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| Dataset parsing + file loading | `dataset.py` | expects partition json and `.pkl` sequence files |
| Sequence sampling strategy | `collate_fn.py` | fixed/unfixed/all + ordered/unordered behavior |
| Augmentations/transforms | `transform.py` | transform factories resolved from config |
| Batch identity sampling | `sampler.py` | sampler types referenced from config |
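
The `fixed/unfixed/all` and `ordered/unordered` combinations in the table can be read as two independent switches packed into one string. The sketch below is illustrative only: `parse_sample_type` and the two mode sets are hypothetical names, not functions from `collate_fn.py`, and it assumes every valid value has the form `<frame_mode>_<order_mode>`.

```python
# Hypothetical helper, not part of opengait: splits a sample_type string
# such as "fixed_unordered" into its two components.
FRAME_MODES = {"fixed", "unfixed", "all"}
ORDER_MODES = {"ordered", "unordered"}


def parse_sample_type(sample_type):
    """Return (frame_mode, is_ordered) or raise ValueError on unknown input."""
    frame_mode, sep, order_mode = sample_type.partition("_")
    if not sep or frame_mode not in FRAME_MODES or order_mode not in ORDER_MODES:
        raise ValueError(f"unrecognized sample_type: {sample_type!r}")
    return frame_mode, order_mode == "ordered"


frame_mode, is_ordered = parse_sample_type("fixed_unordered")
```

Validating the string once, up front, keeps a typo in the config from surfacing later as a shape mismatch.
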
## CONVENTIONS
- Dataset root layout is `id/type/view/*.pkl` after preprocessing.
- `dataset_partition` JSON with `TRAIN_SET` / `TEST_SET` is required.
- `sample_type` drives control flow (`fixed_unordered`, `all_ordered`, etc.) and shape semantics downstream.
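
To make the layout and partition conventions concrete, here is a minimal sketch that walks an `id/type/view/*.pkl` tree and groups sequences by split. It is not the code in `dataset.py`: `index_sequences` is a hypothetical helper, and it assumes `TRAIN_SET` and `TEST_SET` are lists of identity directory names.

```python
import json
from pathlib import Path


def index_sequences(dataset_root, partition_path):
    """Hypothetical helper: group sequence directories by train/test split."""
    with open(partition_path) as f:
        partition = json.load(f)
    # Assumption: both keys hold lists of identity directory names.
    train_ids = set(partition["TRAIN_SET"])
    test_ids = set(partition["TEST_SET"])

    splits = {"train": [], "test": []}
    # Expected layout: <root>/<id>/<type>/<view>/*.pkl
    for seq_dir in sorted(Path(dataset_root).glob("*/*/*")):
        if not seq_dir.is_dir():
            continue
        subject_id, seq_type, view = seq_dir.relative_to(dataset_root).parts
        files = sorted(seq_dir.glob("*.pkl"))
        if not files:
            continue  # skip empty sequence directories
        entry = (subject_id, seq_type, view, files)
        if subject_id in train_ids:
            splits["train"].append(entry)
        elif subject_id in test_ids:
            splits["test"].append(entry)
    return splits
```

Identities that appear in neither list are silently skipped here; a stricter variant could raise instead.
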
## ANTI-PATTERNS
- Never pass non-`.pkl` sequence files (`dataset.py` raises a `ValueError`).
- Don't violate the expected `batch_size` semantics for triplet samplers (a `[P, K]` list, not a single integer).
- Don't assume all models use identical feature counts; collate is feature-index sensitive.
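
The `[P, K]` rule can be checked before a sampler is ever built. The sketch below is a hypothetical guard, not code from `sampler.py`; it assumes the common triplet-sampling convention that `P` is the number of identities per batch and `K` the number of sequences per identity.

```python
def check_triplet_batch_size(batch_size):
    """Hypothetical guard: validate a [P, K] batch_size and return (P, K, P*K)."""
    valid = (
        isinstance(batch_size, (list, tuple))
        and len(batch_size) == 2
        and all(
            isinstance(v, int) and not isinstance(v, bool) and v > 0
            for v in batch_size
        )
    )
    if not valid:
        raise ValueError(
            f"triplet samplers expect batch_size as [P, K], got {batch_size!r}"
        )
    p, k = batch_size
    # Assumption: P identities times K sequences each gives the total batch.
    return p, k, p * k


p, k, total = check_triplet_batch_size([8, 4])
```

Passing a bare integer such as `32` fails fast with a clear message instead of a confusing error inside the sampler.
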