Add comprehensive knowledge base documentation across multiple domains

2026-02-12 14:36:37 +08:00
parent f754f6f383
commit 0fdd35bd78
8 changed files with 336 additions and 0 deletions
@@ -0,0 +1,33 @@
+# OPENGAIT RUNTIME KNOWLEDGE BASE
+
+## OVERVIEW
+`opengait/` is the runtime package: distributed launch entry, model lifecycle orchestration, data/evaluation integration.
+
+## STRUCTURE
+```text
+opengait/
+├── main.py         # DDP entrypoint + config load + model dispatch
+├── modeling/       # BaseModel + model/backbone/loss registries
+├── data/           # dataset parser + sampler/collate/transform
+├── evaluation/     # benchmark-specific evaluation functions
+└── utils/          # config merge, DDP passthrough, logging helpers
+```
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Start train/test flow | `main.py` | parses `--cfgs`/`--phase`, initializes DDP |
+| Resolve model name from YAML | `modeling/models/__init__.py` | classes auto-registered via `pkgutil.iter_modules` |
+| Build full train loop | `modeling/base_model.py` | loaders, optimizer/scheduler, ckpt, inference |
+| Merge config with defaults | `utils/common.py::config_loader` | overlays onto `configs/default.yaml` |
+| Shared logging | `utils/msg_manager.py` | global message manager |
+
+## CONVENTIONS
+- Imports are written relative to the `opengait/` directory (`from modeling...`, `from data...`, `from utils...`) because `opengait/main.py` is launched as a script, which puts that directory on `sys.path`.
+- Runtime is DDP-first; code that assumes a single, non-distributed process is usually invalid.
+- Losses and models are configured by name, not imported directly in `main.py`.
+
+## ANTI-PATTERNS
+- Don’t bypass `config_loader`; default config merge is expected by all modules.
+- Don’t instantiate models outside the registry path (`modeling/models`), or the YAML `model_cfg.model` lookup breaks.
+- Don’t bypass `get_ddp_module`; downstream code relies on its attribute-passthrough wrapper to reach methods of the wrapped module.
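The "overlay onto defaults" behavior that `config_loader` is described as providing can be pictured as a recursive dict merge. The following is a minimal sketch under that assumption, not the actual implementation in `utils/common.py`; the function name `merge_config` and all keys and values are hypothetical.

```python
import copy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a new dict with `overrides` laid over `defaults`.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the default outright. Neither input is mutated.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stand-ins for a default config and an experiment YAML.
defaults = {
    "model_cfg": {"model": "Baseline"},
    "trainer_cfg": {"log_iter": 100, "sync_BN": True},
}
experiment = {"trainer_cfg": {"log_iter": 10}}

cfg = merge_config(defaults, experiment)
```

Under this reading, an experiment file only needs to state what differs from the defaults, which is why modules that read keys the experiment file omits would fail if the merge step were skipped.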
@@ -0,0 +1,22 @@
+# DATA PIPELINE KNOWLEDGE BASE
+
+## OVERVIEW
+`opengait/data/` converts preprocessed dataset trees into training/evaluation batches for all models.
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Dataset parsing + file loading | `dataset.py` | expects partition json and `.pkl` sequence files |
+| Sequence sampling strategy | `collate_fn.py` | fixed/unfixed/all + ordered/unordered behavior |
+| Augmentations/transforms | `transform.py` | transform factories resolved from config |
+| Batch identity sampling | `sampler.py` | sampler types referenced from config |
+
+## CONVENTIONS
+- Dataset root layout is `id/type/view/*.pkl` after preprocessing.
+- `dataset_partition` JSON with `TRAIN_SET` / `TEST_SET` is required.
+- `sample_type` drives control flow (`fixed_unordered`, `all_ordered`, etc.) and shape semantics downstream.
+
+## ANTI-PATTERNS
+- Never pass non-`.pkl` sequence files (`dataset.py` raises a `ValueError`).
+- For triplet samplers, `batch_size` must be a `[P, K]` list (P identities, K sequences per identity), not a scalar.
+- Don’t assume all models use identical feature counts; collate is feature-index sensitive.
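The `[P, K]` batch convention for triplet samplers can be illustrated with a small identity-balanced sampler. This is a sketch only, not the code in `sampler.py`: the function `sample_pk_batch`, the label values, and the choice to sample with replacement inside an identity are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_pk_batch(labels, batch_size, rng):
    """Pick P identities, then K sequence indices for each one.

    Returns a flat list of P * K indices into `labels`.
    """
    p, k = batch_size  # a scalar batch_size fails here, by design
    by_id = defaultdict(list)
    for index, label in enumerate(labels):
        by_id[label].append(index)
    chosen_ids = rng.sample(sorted(by_id), p)
    batch = []
    for identity in chosen_ids:
        # Assumption: sample with replacement so identities with fewer
        # than K sequences can still fill their K slots.
        batch.extend(rng.choices(by_id[identity], k=k))
    return batch


# Hypothetical identity label for each sequence in a tiny dataset.
labels = ["001", "001", "001", "002", "002", "003", "003", "003", "004"]
batch = sample_pk_batch(labels, batch_size=[2, 3], rng=random.Random(0))
```

Every batch built this way has exactly `P * K` entries with `K` per identity, which is the structure a triplet loss needs in order to find both positives and negatives inside one batch.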
@@ -0,0 +1,33 @@
+# MODELING DOMAIN KNOWLEDGE BASE
+
+## OVERVIEW
+`opengait/modeling/` defines model contracts and algorithm implementations: `BaseModel`, loss aggregation, backbones, concrete model classes.
+
+## STRUCTURE
+```text
+opengait/modeling/
+├── base_model.py        # canonical train/test lifecycle
+├── loss_aggregator.py   # training_feat -> weighted summed loss
+├── modules.py           # shared NN building blocks
+├── backbones/           # backbone registry + implementations
+├── losses/              # loss registry + implementations
+└── models/              # concrete methods (Baseline, ScoNet, DeepGaitV2, ...)
+```
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Add new model | `models/*.py` + `docs/4.how_to_create_your_model.md` | must inherit `BaseModel` |
+| Add new loss | `losses/*.py` | expose via dynamic registry |
+| Change training lifecycle | `base_model.py` | affects every model |
+| Debug feature/loss key mismatches | `loss_aggregator.py` | checks `training_feat` keys vs `loss_cfg.log_prefix` |
+
+## CONVENTIONS
+- `forward()` must return a dict with a fixed set of keys: `training_feat`, `visual_summary`, `inference_feat`.
+- `training_feat` subkeys must align with configured `loss_cfg[*].log_prefix`.
+- Backbones/losses/models are discovered dynamically via each package's `__init__.py`, so a module's file name and location determine whether its classes are registered.
+
+## ANTI-PATTERNS
+- Do not return arbitrary forward outputs; `LossAggregator` and the evaluator assume the fixed contract.
+- Do not put model classes outside `models/`; config lookup by `getattr(models, name)` depends on the registry.
+- Do not skip the DDP wrapping of losses (`get_ddp_module`) during loss construction.
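The forward-output contract and the `training_feat` / `log_prefix` alignment lend themselves to a quick standalone check. The sketch below is not part of `loss_aggregator.py`; the helper `check_forward_contract` is hypothetical, and it deliberately flags mismatches in both directions, which may be stricter than the real aggregator.

```python
REQUIRED_KEYS = {"training_feat", "visual_summary", "inference_feat"}


def check_forward_contract(output: dict, loss_cfg: list) -> list:
    """Return a list of problems; an empty list means the output is consistent."""
    problems = []
    missing = REQUIRED_KEYS - set(output)
    if missing:
        problems.append(f"missing top-level keys: {sorted(missing)}")
    feat_keys = set(output.get("training_feat", {}))
    prefixes = {cfg["log_prefix"] for cfg in loss_cfg}
    for name in sorted(feat_keys - prefixes):
        problems.append(f"training_feat['{name}'] has no loss with that log_prefix")
    for name in sorted(prefixes - feat_keys):
        problems.append(f"loss log_prefix '{name}' has no training_feat entry")
    return problems


# Hypothetical loss config and forward outputs (tensors replaced by None).
loss_cfg = [{"log_prefix": "triplet"}, {"log_prefix": "softmax"}]
good = {
    "training_feat": {"triplet": None, "softmax": None},
    "visual_summary": {},
    "inference_feat": {},
}
bad = {"training_feat": {"triplet": None, "ce": None}, "inference_feat": {}}

good_problems = check_forward_contract(good, loss_cfg)
bad_problems = check_forward_contract(bad, loss_cfg)
```

Here `good_problems` is empty, while `bad_problems` reports the missing `visual_summary` key, the unmatched `ce` feature, and the `softmax` loss that receives no input.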
@@ -0,0 +1,23 @@
+# MODEL ZOO IMPLEMENTATION KNOWLEDGE BASE
+
+## OVERVIEW
+This directory is the algorithm zoo. Each file usually contributes one `BaseModel` subclass selected by `model_cfg.model`.
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Baseline pattern | `baseline.py` | minimal template for silhouette models |
+| Scoliosis pipeline | `sconet.py` | label remapping + screening-specific head |
+| Large-model fusion | `BiggerGait_DINOv2.py`, `BigGait.py` | external pretrained dependencies |
+| Diffusion/noise handling | `denoisinggait.py`, `diffgait_utils/` | high-complexity flow/feature fusion |
+| Skeleton variants | `skeletongait++.py`, `gaitgraph1.py`, `gaitgraph2.py` | pose-map/graph assumptions |
+
+## CONVENTIONS
+- Most models follow: preprocess input -> backbone -> temporal pooling -> horizontal pooling -> neck/head -> contract dict.
+- Input modality assumptions differ by model (silhouette / RGB / pose / multimodal); config and preprocess script must match.
+- Many models rely on utilities from `modeling/modules.py`; changes there have a high blast radius.
+
+## ANTI-PATTERNS
+- Don’t mix modality assumptions silently (e.g., pose tensor layout vs silhouette layout).
+- Don’t rename classes without updating `model_cfg.model` references in configs.
+- Don’t treat `BigGait_utils`/`diffgait_utils` as generic utilities; they are model-family specific.
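Why renaming a class without updating `model_cfg.model` breaks things is easiest to see with a name-based lookup. The sketch below assumes the lookup behaves like `getattr` on a namespace of registered classes; `resolve_model`, the empty `BaseModel` stand-in, and the `models` namespace object are hypothetical and stand in for the real package.

```python
import types


class BaseModel:
    """Empty stand-in for the real base class (hypothetical)."""


class Baseline(BaseModel):
    """Stand-in for one concrete model class."""


# Stand-in for the `models` package namespace that the registry populates.
models = types.SimpleNamespace(Baseline=Baseline)


def resolve_model(namespace, model_cfg: dict):
    """Look up the class named by model_cfg['model'] and validate it."""
    name = model_cfg["model"]
    cls = getattr(namespace, name, None)
    if not (isinstance(cls, type) and issubclass(cls, BaseModel)):
        raise KeyError(
            f"model_cfg.model={name!r} does not name a registered BaseModel subclass"
        )
    return cls


resolved = resolve_model(models, {"model": "Baseline"})
```

A config that still says `Baseline` after the class has been renamed would hit the `KeyError` branch, so class names and config values have to change together.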