Add comprehensive knowledge base documentation across multiple domains

2026-02-12 14:36:37 +08:00
parent f754f6f383
commit 0fdd35bd78
8 changed files with 336 additions and 0 deletions
+33
@@ -0,0 +1,33 @@
# OPENGAIT RUNTIME KNOWLEDGE BASE
## OVERVIEW
`opengait/` is the runtime package: distributed launch entry, model lifecycle orchestration, data/evaluation integration.
## STRUCTURE
```text
opengait/
├── main.py # DDP entrypoint + config load + model dispatch
├── modeling/ # BaseModel + model/backbone/loss registries
├── data/ # dataset parser + sampler/collate/transform
├── evaluation/ # benchmark-specific evaluation functions
└── utils/ # config merge, DDP passthrough, logging helpers
```
## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| Start train/test flow | `main.py` | parses `--cfgs`/`--phase`, initializes DDP |
| Resolve model name from YAML | `modeling/models/__init__.py` | class auto-registration via iter_modules |
| Build full train loop | `modeling/base_model.py` | loaders, optimizer/scheduler, ckpt, inference |
| Merge config with defaults | `utils/common.py::config_loader` | overlays onto `configs/default.yaml` |
| Shared logging | `utils/msg_manager.py` | global message manager |
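The overlay behavior noted for `config_loader` can be pictured as a recursive dictionary merge. The following is a minimal sketch under that assumption, not the actual implementation in `utils/common.py`: `merge_config` is a hypothetical helper, and every key in the toy dictionaries other than `model_cfg.model` is made up for illustration.

```python
import copy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a new dict with `overrides` overlaid onto `defaults`, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stand-ins for configs/default.yaml and an experiment YAML.
defaults = {
    "example_cfg": {"log_every": 100, "save_every": 10000},
    "model_cfg": {"model": "Baseline"},
}
experiment = {"example_cfg": {"save_every": 2000}}

merged = merge_config(defaults, experiment)
```

Keys absent from the experiment file keep their default values, which is why modules can rely on every default key being present after the merge.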
## CONVENTIONS
- Imports resolve relative to the `opengait/` directory at runtime (`from modeling...`, `from data...`, `from utils...`) because `opengait/main.py` is launched as the script target.
- Runtime is DDP-first; non-DDP assumptions are usually invalid.
- Losses and models are configured by names, not direct imports in `main.py`.
## ANTI-PATTERNS
- Don't bypass `config_loader`; all modules expect the merge with the default config.
- Don't instantiate models outside the registry path (`modeling/models`), or the YAML `model_cfg.model` lookup breaks.
- Don't bypass `get_ddp_module`; its attribute-passthrough wrapper is what downstream code relies on for method access.
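To illustrate what "attribute passthrough" means in the last point, here is a minimal sketch in plain Python. `PassthroughWrapper` and `ToyLoss` are hypothetical names, and the sketch does not reproduce `get_ddp_module` or any PyTorch class; it only shows how a wrapper that hides the real object under `.module` can still forward attribute and method access.

```python
class PassthroughWrapper:
    """Hypothetical wrapper that stores the real object under `.module` (DDP-style)."""

    def __init__(self, module):
        self.module = module

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so the wrapper's own
        # attributes take priority. Reading `module` through __dict__ avoids
        # infinite recursion if `module` has not been set yet.
        module = self.__dict__.get("module")
        if module is None:
            raise AttributeError(name)
        return getattr(module, name)


class ToyLoss:
    """Hypothetical wrapped object with one attribute and one method."""

    log_prefix = "triplet"

    def describe(self):
        return f"loss:{self.log_prefix}"


wrapped = PassthroughWrapper(ToyLoss())
description = wrapped.describe()  # forwarded to ToyLoss.describe
```

Without the forwarding step, callers would have to write `wrapped.module.describe()` everywhere, which is the kind of breakage the anti-pattern warns about.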
+22
@@ -0,0 +1,22 @@
# DATA PIPELINE KNOWLEDGE BASE
## OVERVIEW
`opengait/data/` converts preprocessed dataset trees into training/evaluation batches for all models.
## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| Dataset parsing + file loading | `dataset.py` | expects partition json and `.pkl` sequence files |
| Sequence sampling strategy | `collate_fn.py` | fixed/unfixed/all + ordered/unordered behavior |
| Augmentations/transforms | `transform.py` | transform factories resolved from config |
| Batch identity sampling | `sampler.py` | sampler types referenced from config |
## CONVENTIONS
- Dataset root layout is `id/type/view/*.pkl` after preprocessing.
- `dataset_partition` JSON with `TRAIN_SET` / `TEST_SET` is required.
- `sample_type` drives control flow (`fixed_unordered`, `all_ordered`, etc.) and shape semantics downstream.
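The layout and partition conventions above can be sketched as follows. This is an illustrative walker, not the code in `dataset.py`: `index_sequences` and `split_by_partition` are hypothetical helpers, the partition values are assumed to be lists of identity names, and the directory names in the demo (`nm-01`, `000`) are made up.

```python
import json
import tempfile
from pathlib import Path


def index_sequences(root):
    """Map (id, type, view) to the sorted .pkl paths under an id/type/view/*.pkl tree."""
    root = Path(root)
    index = {}
    for view_dir in sorted(root.glob("*/*/*")):
        if not view_dir.is_dir():
            continue
        files = sorted(p for p in view_dir.iterdir() if p.is_file())
        bad = [p.name for p in files if p.suffix != ".pkl"]
        if bad:
            raise ValueError(f"non-.pkl sequence files in {view_dir}: {bad}")
        index[view_dir.relative_to(root).parts] = files
    return index


def split_by_partition(index, partition_path):
    """Split an index by identity using a JSON file with TRAIN_SET / TEST_SET lists."""
    with open(partition_path) as f:
        partition = json.load(f)
    train_ids, test_ids = set(partition["TRAIN_SET"]), set(partition["TEST_SET"])
    train = {k: v for k, v in index.items() if k[0] in train_ids}
    test = {k: v for k, v in index.items() if k[0] in test_ids}
    return train, test


# Demo on a throwaway tree with two identities and one sequence each.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset"
    for subject in ("001", "002"):
        seq_dir = root / subject / "nm-01" / "000"
        seq_dir.mkdir(parents=True)
        (seq_dir / "000.pkl").write_bytes(b"")
    partition_path = Path(tmp) / "partition.json"
    partition_path.write_text(json.dumps({"TRAIN_SET": ["001"], "TEST_SET": ["002"]}))
    train, test = split_by_partition(index_sequences(root), partition_path)
    train_keys, test_keys = sorted(train), sorted(test)
```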
## ANTI-PATTERNS
- Never pass non-`.pkl` sequence files (`dataset.py` raises a `ValueError`).
- Don't violate the expected `batch_size` semantics for triplet samplers (a `[P, K]` list).
- Don't assume all models use identical feature counts; collate is feature-index sensitive.
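As a rough picture of the `[P, K]` semantics, the sketch below reads `batch_size` as "P identities, K sequences per identity". It is not the sampler in `sampler.py`: `sample_pk_batch` is a hypothetical function, and falling back to sampling with replacement when an identity has fewer than K sequences is an assumption made for the example.

```python
import random


def sample_pk_batch(ids_to_items, batch_size, rng=random):
    """Pick P identities, then K items per identity; `batch_size` is the [P, K] pair."""
    if not (isinstance(batch_size, (list, tuple)) and len(batch_size) == 2):
        raise ValueError("batch_size must be a [P, K] pair for this sampler")
    p, k = batch_size
    if p > len(ids_to_items):
        raise ValueError(f"requested P={p} identities, only {len(ids_to_items)} available")
    batch = []
    for identity in rng.sample(sorted(ids_to_items), p):
        items = ids_to_items[identity]
        # Assumption: sample with replacement only when an identity has fewer than K items.
        picks = rng.sample(items, k) if len(items) >= k else rng.choices(items, k=k)
        batch.extend((identity, item) for item in picks)
    return batch


# Hypothetical identities and sequence names.
toy = {"001": ["a", "b", "c", "d"], "002": ["e", "f", "g"], "003": ["h"]}
batch = sample_pk_batch(toy, [2, 3], rng=random.Random(0))
```

The resulting batch always has P × K entries, which is the shape a triplet-style loss needs to find positives and negatives within one batch.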
+33
@@ -0,0 +1,33 @@
# MODELING DOMAIN KNOWLEDGE BASE
## OVERVIEW
`opengait/modeling/` defines model contracts and algorithm implementations: `BaseModel`, loss aggregation, backbones, concrete model classes.
## STRUCTURE
```text
opengait/modeling/
├── base_model.py # canonical train/test lifecycle
├── loss_aggregator.py # training_feat -> weighted summed loss
├── modules.py # shared NN building blocks
├── backbones/ # backbone registry + implementations
├── losses/ # loss registry + implementations
└── models/ # concrete methods (Baseline, ScoNet, DeepGaitV2, ...)
```
## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| Add new model | `models/*.py` + `docs/4.how_to_create_your_model.md` | must inherit `BaseModel` |
| Add new loss | `losses/*.py` | expose via dynamic registry |
| Change training lifecycle | `base_model.py` | affects every model |
| Debug feature/loss key mismatches | `loss_aggregator.py` | checks `training_feat` keys vs `loss_cfg.log_prefix` |
## CONVENTIONS
- The `forward()` output contract is a fixed dict with keys `training_feat`, `visual_summary`, and `inference_feat`.
- `training_feat` subkeys must align with configured `loss_cfg[*].log_prefix`.
- Backbones/losses/models are discovered dynamically via package `__init__.py`; filenames matter operationally.
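The first two conventions can be sketched as a contract check followed by a weighted sum. This is a simplified illustration, not the code in `loss_aggregator.py`: `check_forward_output` and `aggregate_losses` are hypothetical helpers, the `weight` key is made up, `loss_cfg` is assumed to be a list of dicts, and each `training_feat` entry is assumed to hold keyword arguments for its loss.

```python
REQUIRED_KEYS = {"training_feat", "visual_summary", "inference_feat"}


def check_forward_output(output, loss_cfg):
    """Raise KeyError if the forward() dict or its training_feat subkeys break the contract."""
    if set(output) != REQUIRED_KEYS:
        raise KeyError(f"forward() keys {sorted(output)} != {sorted(REQUIRED_KEYS)}")
    prefixes = {cfg["log_prefix"] for cfg in loss_cfg}
    if set(output["training_feat"]) != prefixes:
        raise KeyError(
            f"training_feat keys {sorted(output['training_feat'])} != log_prefix values {sorted(prefixes)}"
        )


def aggregate_losses(training_feat, loss_fns, loss_cfg):
    """Sum each configured loss, scaled by its (hypothetical) `weight` entry."""
    total = 0.0
    for cfg in loss_cfg:
        prefix = cfg["log_prefix"]
        # Assumption: the entry under each prefix holds that loss's keyword arguments.
        total += cfg.get("weight", 1.0) * loss_fns[prefix](**training_feat[prefix])
    return total


# Toy losses that simply return the value they are given.
loss_cfg = [{"log_prefix": "triplet", "weight": 1.0}, {"log_prefix": "softmax", "weight": 0.5}]
loss_fns = {"triplet": lambda value: value, "softmax": lambda value: value}
output = {
    "training_feat": {"triplet": {"value": 2.0}, "softmax": {"value": 4.0}},
    "visual_summary": {},
    "inference_feat": {},
}

check_forward_output(output, loss_cfg)
total = aggregate_losses(output["training_feat"], loss_fns, loss_cfg)
```

A mismatch between a `training_feat` subkey and a `log_prefix` fails in the check rather than surfacing later as a missing-key error inside the loss loop.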
## ANTI-PATTERNS
- Do not return arbitrary forward outputs; `LossAggregator` and evaluator assume fixed contract.
- Do not put model classes outside `models/`; config lookup by `getattr(models, name)` depends on registry.
- Do not ignore DDP loss wrapping (`get_ddp_module`) in loss construction.
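The reason file placement matters can be shown with a small discovery sketch built on `pkgutil.iter_modules`. It is an approximation of the idea, not the contents of `models/__init__.py`: `discover_classes` is a hypothetical helper and `toy_models` is a throwaway package created only for the demo.

```python
import importlib
import inspect
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_classes(package_name):
    """Import every submodule of a package and collect the classes each one defines."""
    package = importlib.import_module(package_name)
    found = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{info.name}")
        for name, obj in inspect.getmembers(module, inspect.isclass):
            if obj.__module__ == module.__name__:  # skip classes merely imported there
                found[name] = obj
    return found


# Demo: build a throwaway package with one module, then look a class up by name.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "toy_models"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "baseline.py").write_text("class Baseline:\n    pass\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        registry = discover_classes("toy_models")
    finally:
        sys.path.remove(tmp)

model_cls = registry["Baseline"]  # name-based lookup, as a config string would do
```

A class defined in a file outside the scanned package never enters the registry, so a config that names it cannot be resolved.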
+23
@@ -0,0 +1,23 @@
# MODEL ZOO IMPLEMENTATION KNOWLEDGE BASE
## OVERVIEW
This directory is the algorithm zoo. Each file usually contributes one `BaseModel` subclass selected by `model_cfg.model`.
## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| Baseline pattern | `baseline.py` | minimal template for silhouette models |
| Scoliosis pipeline | `sconet.py` | label remapping + screening-specific head |
| Large-model fusion | `BiggerGait_DINOv2.py`, `BigGait.py` | external pretrained dependencies |
| Diffusion/noise handling | `denoisinggait.py`, `diffgait_utils/` | high-complexity flow/feature fusion |
| Skeleton variants | `skeletongait++.py`, `gaitgraph1.py`, `gaitgraph2.py` | pose-map/graph assumptions |
## CONVENTIONS
- Most models follow: preprocess input -> backbone -> temporal pooling -> horizontal pooling -> neck/head -> contract dict.
- Input modality assumptions differ by model (silhouette / RGB / pose / multimodal); config and preprocess script must match.
- Many models rely on utilities from `modeling/modules.py`; shared changes there are high blast-radius.
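The two pooling stages in the first convention can be sketched on nested lists. This is a toy illustration with assumed operators: taking the max over time and combining max plus mean per horizontal strip are choices made for the example, and individual models may pool differently. Both function names are hypothetical.

```python
def temporal_max_pool(frames):
    """Collapse a [T][H][W] sequence into [H][W] by taking the max over the time axis."""
    return [[max(values) for values in zip(*rows)] for rows in zip(*frames)]


def horizontal_pool(feature_map, parts):
    """Split an [H][W] map into `parts` horizontal strips and return max + mean per strip."""
    height = len(feature_map)
    if height % parts:
        raise ValueError(f"height {height} is not divisible by parts={parts}")
    strip = height // parts
    pooled = []
    for i in range(parts):
        values = [v for row in feature_map[i * strip:(i + 1) * strip] for v in row]
        pooled.append(max(values) + sum(values) / len(values))
    return pooled


# Two 4x2 frames standing in for backbone features of one sequence.
frames = [
    [[1, 0], [0, 0], [0, 2], [0, 0]],
    [[0, 0], [0, 3], [0, 0], [4, 0]],
]
set_feature = temporal_max_pool(frames)
part_features = horizontal_pool(set_feature, parts=2)
```

Temporal pooling removes the dependence on sequence length, and horizontal pooling turns the remaining map into one descriptor per body strip for the neck and head to consume.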
## ANTI-PATTERNS
- Don't mix modality assumptions silently (e.g., pose tensor layout vs. silhouette layout).
- Don't rename classes without updating the `model_cfg.model` references in configs.
- Don't treat `BigGait_utils`/`diffgait_utils` as generic utilities; they are specific to their model families.