# ScoNet Checkpoint Evaluation Reproduction Notes

This document records the findings and successful procedure for reproducing ScoNet checkpoint evaluation using `uv` and the OpenGait framework.

## Observed Failure Sequence and Root Causes

### 1. Missing Dependencies (Eager Auto-Import)

OpenGait uses a dynamic registration pattern in `opengait/modeling/models/__init__.py`. When `main.py` imports `models`, it attempts to iterate through all modules in the `models/` directory. If any model file (e.g., `BiggerGait_DINOv2.py`) has dependencies not installed in the current environment (like `timm`), the entire program fails even if you are not using that specific model.

**Root Cause:** `iter_modules` in `opengait/modeling/models/__init__.py` triggers imports of all sibling files.

### 2. GPU/World Size Mismatch

The runtime enforces a strict equality between the number of visible GPUs and the DDP world size in `opengait/main.py`:

```python
# opengait/main.py
if torch.distributed.get_world_size() != torch.cuda.device_count():
    raise ValueError("Expect number of available GPUs({}) equals to the world size({}).".format(
        torch.cuda.device_count(), torch.distributed.get_world_size()))
```

**Error Message:** `ValueError: Expect number of available GPUs(2) equals to the world size(1)`

### 3. Evaluator Sampler Batch Size Rule

The evaluator enforces that the total batch size must equal the number of GPUs in testing mode, as checked in `opengait/modeling/base_model.py`.

**Error Message:** `ValueError: The batch size (8) must be equal to the number of GPUs (1) in testing mode!`

## Successful Reproduction Environment

- **Runtime:** `uv` with PEP 621 (`pyproject.toml`)
- **Hardware:** 1 Visible GPU
- **Dataset Path:** Symlinked at `datasets/Scoliosis1K` (user-created link pointing to the actual data root).
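
The two numeric rules from failures 2 and 3 (world size must match the visible GPU count, and the evaluator batch size must match the GPU count in testing mode) can be checked before launching. The following is a minimal sketch, not OpenGait code: the function `check_eval_launch` and its parameters are hypothetical, the counts are passed in as plain integers rather than read from `torch`, and it assumes a single-node launch where the world size equals `--nproc_per_node`.

```python
def check_eval_launch(visible_gpus: int, nproc_per_node: int, batch_size: int) -> list[str]:
    """Return a list of problems; an empty list means the settings are consistent.

    Hypothetical helper that mirrors the two rules described in these notes.
    It does not import or call OpenGait or torch.
    """
    problems = []

    # Rule checked in opengait/main.py: world size must equal the visible GPU count.
    # Assumption: single node, so world size == nproc_per_node.
    if nproc_per_node != visible_gpus:
        problems.append(
            f"world size ({nproc_per_node}) != visible GPUs ({visible_gpus})"
        )

    # Rule checked in opengait/modeling/base_model.py: in testing mode the
    # evaluator batch size must equal the number of GPUs.
    if batch_size != visible_gpus:
        problems.append(
            f"evaluator batch size ({batch_size}) != visible GPUs ({visible_gpus})"
        )

    return problems


# The settings from the successful run: 1 visible GPU, 1 process, batch_size 1.
print(check_eval_launch(visible_gpus=1, nproc_per_node=1, batch_size=1))  # prints []
```

Passing the values from the failing attempts (for example 2 visible GPUs, 1 process, batch size 8) returns one message per violated rule, which makes it easier to fix both settings in a single pass instead of hitting the errors one at a time.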
## Successful Command and Config

### Command

```bash
CUDA_VISIBLE_DEVICES=0 uv run python -m torch.distributed.launch \
    --nproc_per_node=1 \
    opengait/main.py \
    --cfgs ./configs/sconet/sconet_scoliosis1k_local_eval_1gpu.yaml \
    --phase test
```

### Config Highlights (`configs/sconet/sconet_scoliosis1k_local_eval_1gpu.yaml`)

```yaml
data_cfg:
  dataset_root: ./datasets/Scoliosis1K/Scoliosis1K-sil-pkl
  dataset_partition: ./datasets/Scoliosis1K/Scoliosis1K_1116.json

evaluator_cfg:
  restore_hint: ./ckpt/ScoNet-20000.pt
  sampler:
    batch_size: 1 # Must be integer for evaluation
    sample_type: all_ordered
```

## Final Metrics

The successful evaluation of the `ScoNet-20000.pt` checkpoint yielded:

| Metric | Value |
| :--- | :--- |
| **Accuracy** | 80.88% |
| **Macro Precision** | 81.50% |
| **Macro Recall** | 78.82% |
| **Macro F1** | 75.14% |

## Troubleshooting Checklist

1. **Environment:** Ensure all dependencies for *all* registered models are installed (e.g., `timm` for `BiggerGait_DINOv2.py`) to avoid eager import failures in `opengait/modeling/models/__init__.py`.
2. **GPU Visibility:** Match `CUDA_VISIBLE_DEVICES` count exactly with `--nproc_per_node` (checked in `opengait/main.py`).
3. **Config Check:** Verify `evaluator_cfg.sampler.batch_size` equals the number of GPUs (checked in `opengait/modeling/base_model.py`).
4. **Data Paths:** Ensure `dataset_root` and `dataset_partition` in the YAML point to valid paths (use symlinks under `datasets/` for convenience).
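
Checklist items 1 and 4 can be partly automated. The sketch below is an illustration under stated assumptions rather than part of OpenGait: `missing_modules` and `missing_paths` are hypothetical helper names, the module list is something you maintain by hand, and the paths are copied from the config highlights above instead of being parsed from the YAML.

```python
import importlib.util
from pathlib import Path


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment.

    Uses importlib.util.find_spec, which locates a module without importing it.
    Only pass top-level names such as "timm"; dotted names can raise if the
    parent package is absent.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_paths(paths):
    """Return the paths that do not exist on disk.

    Path.exists() follows symlinks, so a dangling symlink is reported as missing.
    """
    return [p for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    # Hand-maintained list (assumption): extend it with whatever the model
    # files under opengait/modeling/models/ import in your checkout.
    print("missing modules:", missing_modules(["torch", "timm"]))

    # Paths taken from the config highlights in these notes.
    print("missing paths:", missing_paths([
        "./datasets/Scoliosis1K/Scoliosis1K-sil-pkl",
        "./datasets/Scoliosis1K/Scoliosis1K_1116.json",
        "./ckpt/ScoNet-20000.pt",
    ]))
```

Run it from the repository root with `uv run python` so that both the environment and the relative paths match the evaluation command. Two empty lists mean items 1 and 4 pass for the names and paths you supplied.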