DATA PIPELINE KNOWLEDGE BASE

OVERVIEW

opengait/data/ converts preprocessed dataset trees into training/evaluation batches for all models.

WHERE TO LOOK

Task                            Location       Notes
Dataset parsing + file loading  dataset.py     expects a partition JSON and .pkl sequence files
Sequence sampling strategy      collate_fn.py  fixed/unfixed/all + ordered/unordered behavior
Augmentations/transforms        transform.py   transform factories resolved from config
Batch identity sampling         sampler.py     sampler types referenced from config
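As a rough illustration of what the dataset.py row implies, the sketch below indexes a preprocessed root laid out as id/type/view/*.pkl and keeps only the subject IDs listed under one partition split. It is a minimal sketch assuming the partition JSON maps TRAIN_SET and TEST_SET to lists of subject-ID strings. index_dataset and SeqInfo are hypothetical names, not OpenGait's API, and the sketch only lists file paths without unpickling them.

```python
import json
import os
from typing import List, Tuple

# Hypothetical record type: (subject id, type, view, sorted .pkl paths).
SeqInfo = Tuple[str, str, str, List[str]]


def index_dataset(dataset_root: str, partition_path: str,
                  split: str = "TRAIN_SET") -> List[SeqInfo]:
    """Sketch: list id/type/view sequences whose subject id is in `split`."""
    with open(partition_path, "r") as f:
        partition = json.load(f)
    # The convention requires both splits, so fail loudly if either is missing.
    for key in ("TRAIN_SET", "TEST_SET"):
        if key not in partition:
            raise KeyError(f"partition JSON is missing {key}")
    wanted = {str(sid) for sid in partition[split]}

    seqs: List[SeqInfo] = []
    for sid in sorted(os.listdir(dataset_root)):
        if sid not in wanted:
            continue
        for typ in sorted(os.listdir(os.path.join(dataset_root, sid))):
            for view in sorted(os.listdir(os.path.join(dataset_root, sid, typ))):
                seq_dir = os.path.join(dataset_root, sid, typ, view)
                files = sorted(os.listdir(seq_dir))
                for name in files:
                    if not name.endswith(".pkl"):
                        # Mirror the documented hard failure on non-.pkl files.
                        raise ValueError(f"only .pkl sequence files are supported: {name}")
                if files:  # this sketch skips empty view folders
                    seqs.append((sid, typ, view,
                                 [os.path.join(seq_dir, n) for n in files]))
    return seqs
```

Each returned entry corresponds to one sequence; a real loader would then unpickle every path, typically one file per input feature.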

CONVENTIONS

  • After preprocessing, the dataset root is laid out as id/type/view/*.pkl.
  • A dataset_partition JSON with TRAIN_SET / TEST_SET is required.
  • sample_type drives control flow (fixed_unordered, all_ordered, etc.) and shape semantics downstream.
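The sample_type values pair a frame-count mode (fixed, unfixed, all) with an order mode (ordered, unordered). The following is a minimal sketch of how such a string could select frame indices for a single sequence. The function name, parameter names, default counts, and the exact rules (random window placement, tiling short sequences, sampling with replacement) are assumptions for illustration, not collate_fn.py's code.

```python
import random
from typing import List, Optional


def sample_frame_indices(seq_len: int, sample_type: str,
                         frames_num_fixed: int = 30,
                         frames_num_min: int = 10,
                         frames_num_max: int = 40,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Sketch: choose frame indices for one sequence from a sample_type string.

    sample_type is assumed to be '<count>_<order>', with count in
    {fixed, unfixed, all} and order in {ordered, unordered}.
    """
    if seq_len <= 0:
        raise ValueError("sequence has no frames")
    if rng is None:
        rng = random.Random()
    parts = sample_type.split("_")
    valid = (len(parts) == 2 and parts[0] in ("fixed", "unfixed", "all")
             and parts[1] in ("ordered", "unordered"))
    if not valid:
        raise ValueError(f"unknown sample_type: {sample_type!r}")
    count_mode, order_mode = parts

    if count_mode == "all":
        # Every frame; the output length varies from sequence to sequence.
        idx = list(range(seq_len))
        if order_mode == "unordered":
            rng.shuffle(idx)
        return idx
    if count_mode == "fixed":
        n = frames_num_fixed
    else:  # unfixed: draw a clip length on every call
        n = rng.randint(frames_num_min, frames_num_max)

    if order_mode == "ordered":
        if seq_len >= n:
            # A contiguous window starting at a random frame.
            start = rng.randint(0, seq_len - n)
            return list(range(start, start + n))
        # Too short: repeat the sequence in order until n frames are reached.
        return [i % seq_len for i in range(n)]
    # unordered: random frames, with replacement only when the sequence is too short.
    if seq_len >= n:
        return rng.sample(range(seq_len), n)
    return [rng.randrange(seq_len) for _ in range(n)]
```

With the all_* modes the returned length equals the sequence length, so such sequences cannot be stacked into one fixed-size batch tensor without extra handling; this is one way sample_type changes shape semantics downstream.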

ANTI-PATTERNS

  • Never pass non-.pkl sequence files; dataset.py raises a hard ValueError.
  • Don't violate the expected batch_size semantics for triplet samplers (a [P, K] list).
  • Don't assume all models use the same number of features; collate is feature-index sensitive.
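For triplet samplers, batch_size is a [P, K] pair: P identities per batch and K sequences per identity, for P*K samples in total. The class below is a minimal, hypothetical sketch of that contract (PKBatchSampler is not a name from sampler.py). It rejects a flat integer batch_size. Drawing with replacement for identities that have fewer than K sequences, and yielding batches forever, are assumptions of this sketch.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Sequence


class PKBatchSampler:
    """Sketch of a triplet-style batch sampler: P identities x K sequences each."""

    def __init__(self, labels: Sequence[str], batch_size: Sequence[int],
                 seed: Optional[int] = None):
        # batch_size must be a two-element [P, K] list, not a flat integer.
        if isinstance(batch_size, int) or len(batch_size) != 2:
            raise ValueError(f"batch_size should be [P, K], got {batch_size!r}")
        self.p, self.k = int(batch_size[0]), int(batch_size[1])
        if self.p <= 0 or self.k <= 0:
            raise ValueError("P and K must be positive")
        self.rng = random.Random(seed)
        # Group dataset indices by identity label.
        self.index_by_label: Dict[str, List[int]] = defaultdict(list)
        for idx, label in enumerate(labels):
            self.index_by_label[label].append(idx)
        if len(self.index_by_label) < self.p:
            raise ValueError("fewer identities than P")

    def sample_batch(self) -> List[int]:
        """Return P*K dataset indices: K per identity for P random identities."""
        batch: List[int] = []
        for label in self.rng.sample(sorted(self.index_by_label), self.p):
            pool = self.index_by_label[label]
            if len(pool) >= self.k:
                batch.extend(self.rng.sample(pool, self.k))
            else:
                # Too few sequences for this identity: draw with replacement.
                batch.extend(self.rng.choices(pool, k=self.k))
        return batch

    def __iter__(self) -> Iterator[List[int]]:
        # Infinite stream of batches (an assumption of this sketch).
        while True:
            yield self.sample_batch()
```

Grouping K samples per identity is what guarantees every anchor in the batch has at least one positive and many negatives for the triplet loss; a flat batch_size cannot express that structure.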