DATA PIPELINE KNOWLEDGE BASE
OVERVIEW
opengait/data/ converts preprocessed dataset trees into training/evaluation batches for all models.
WHERE TO LOOK
| Task | Location | Notes |
|---|---|---|
| Dataset parsing + file loading | `dataset.py` | expects partition JSON and `.pkl` sequence files |
| Sequence sampling strategy | `collate_fn.py` | fixed/unfixed/all + ordered/unordered behavior |
| Augmentations/transforms | `transform.py` | transform factories resolved from config |
| Batch identity sampling | `sampler.py` | sampler types referenced from config |
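
As a rough illustration of the fixed/unfixed/all and ordered/unordered split handled in `collate_fn.py`, the sketch below parses a `sample_type` string and picks frame indices for one sequence. It is not the repository's code: `parse_sample_type`, `sample_frame_indices`, and the `frames_num`, `frames_min`, and `frames_max` parameters are hypothetical names, and the exact sampling rules (wrap-around window, sampling with replacement for short sequences) are assumptions.

```python
import random

SAMPLERS = {"fixed", "unfixed", "all"}
ORDERS = {"ordered", "unordered"}


def parse_sample_type(sample_type):
    """Split e.g. 'fixed_unordered' into ('fixed', False)."""
    sampler, _, order = sample_type.partition("_")
    if sampler not in SAMPLERS or order not in ORDERS:
        raise ValueError(f"unsupported sample_type: {sample_type!r}")
    return sampler, order == "ordered"


def sample_frame_indices(seq_len, sample_type, frames_num=30,
                         frames_min=20, frames_max=40, rng=random):
    """Hypothetical helper: choose frame indices for one sequence."""
    sampler, ordered = parse_sample_type(sample_type)
    if sampler == "all":
        # Keep every frame; the batch then has variable-length sequences.
        return list(range(seq_len))
    if sampler == "unfixed":
        # Assumption: the frame count itself is drawn from a range.
        frames_num = rng.randint(frames_min, frames_max)
    if ordered:
        # Assumption: a contiguous window that wraps around short sequences.
        start = rng.randrange(seq_len)
        return [(start + i) % seq_len for i in range(frames_num)]
    if seq_len >= frames_num:
        return sorted(rng.sample(range(seq_len), frames_num))
    # Too few frames: fall back to sampling with replacement.
    return sorted(rng.choices(range(seq_len), k=frames_num))
```

The point of the sketch is that the same config string decides both how many frames come out and whether their temporal order is preserved, which is why downstream tensor shapes depend on it.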
CONVENTIONS
- Dataset root layout is `id/type/view/*.pkl` after preprocessing.
- `dataset_partition` JSON with `TRAIN_SET`/`TEST_SET` is required.
- `sample_type` drives control flow (`fixed_unordered`, `all_ordered`, etc.) and shape semantics downstream.
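
The layout and partition conventions above can be sketched as follows. This is a minimal standalone illustration, not the logic in `dataset.py`: `index_sequences`, `split_by_partition`, and `build_demo` are hypothetical helpers, the directory names in the demo are made up, and it assumes `TRAIN_SET` and `TEST_SET` each hold a list of identity directory names.

```python
import json
import tempfile
from pathlib import Path


def index_sequences(dataset_root):
    """Map (id, type, view) -> list of .pkl files found under dataset_root."""
    index = {}
    for pkl in sorted(Path(dataset_root).glob("*/*/*/*.pkl")):
        view_dir = pkl.parent
        type_dir = view_dir.parent
        key = (type_dir.parent.name, type_dir.name, view_dir.name)
        index.setdefault(key, []).append(pkl)
    return index


def split_by_partition(index, partition_path):
    """Split the index into train/test dicts using the partition JSON."""
    with open(partition_path) as f:
        partition = json.load(f)
    missing = {"TRAIN_SET", "TEST_SET"} - set(partition)
    if missing:
        raise KeyError(f"partition file lacks {sorted(missing)}")
    # Assumption: both entries are lists of identity directory names.
    train_ids = set(partition["TRAIN_SET"])
    test_ids = set(partition["TEST_SET"])
    train = {k: v for k, v in index.items() if k[0] in train_ids}
    test = {k: v for k, v in index.items() if k[0] in test_ids}
    return train, test


def build_demo():
    """Create a tiny throwaway tree plus partition file and split it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "dataset"
        for pid in ("001", "002"):  # made-up identities
            seq_dir = root / pid / "nm-01" / "000"
            seq_dir.mkdir(parents=True)
            (seq_dir / "000.pkl").write_bytes(b"")
        part = Path(tmp) / "partition.json"
        part.write_text(json.dumps({"TRAIN_SET": ["001"], "TEST_SET": ["002"]}))
        train, test = split_by_partition(index_sequences(root), part)
        return sorted(train), sorted(test)
```

Calling `build_demo()` returns the train and test keys as `(id, type, view)` tuples, which makes it easy to check that an identity never appears in both splits.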
ANTI-PATTERNS
- Never pass non-`.pkl` sequence files (`dataset.py` raises hard ValueError).
- Don’t violate expected `batch_size` semantics for triplet samplers (`[P, K]` list).
- Don’t assume all models use identical feature counts; collate is feature-index sensitive.
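
To make the first two anti-patterns concrete, here is a small sketch of a `.pkl` guard and of `[P, K]` batch sampling (P identities, K sequences per identity). Both `check_sequence_files` and `sample_pk_batch` are hypothetical helpers written for illustration; the real checks live in `dataset.py` and `sampler.py` and may differ, and drawing the K sequences with replacement is an assumption.

```python
import random


def check_sequence_files(paths):
    """Reject anything that is not a .pkl sequence file."""
    for p in paths:
        if not str(p).endswith(".pkl"):
            raise ValueError(f"only .pkl sequence files are supported, got {p!r}")
    return list(paths)


def sample_pk_batch(ids_to_indices, batch_size, rng=random):
    """Draw one batch of dataset indices; batch_size must be [P, K]."""
    if not (isinstance(batch_size, (list, tuple)) and len(batch_size) == 2):
        raise ValueError("triplet-style sampling expects batch_size as [P, K]")
    p, k = batch_size
    # Pick P distinct identities, then K sequence indices for each of them.
    chosen_ids = rng.sample(sorted(ids_to_indices), p)
    batch = []
    for pid in chosen_ids:
        # Assumption: sample with replacement so small identities still yield K.
        batch.extend(rng.choices(ids_to_indices[pid], k=k))
    return batch
```

For example, `sample_pk_batch({"a": [0, 1, 2], "b": [3, 4], "c": [5]}, [2, 2])` yields four indices covering two identities, whereas passing a plain integer such as `8` raises a `ValueError` instead of silently producing a batch without the per-identity structure a triplet loss relies on.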