Add resumable ScoNet skeleton training diagnostics
@@ -89,6 +89,8 @@ CUDA_VISIBLE_DEVICES=0,1 uv run python -m torch.distributed.launch --nproc_per_n
```
> **Note:** The `--nproc_per_node` argument must exactly match the number of GPUs specified in `CUDA_VISIBLE_DEVICES`. For single-GPU evaluation, use `CUDA_VISIBLE_DEVICES=0` and `--nproc_per_node=1` with the DDP launcher.
>
> **Resume Tip:** To survive interrupted training runs, set `trainer_cfg.resume_every_iter` to a non-zero value and, optionally, set `trainer_cfg.auto_resume_latest: true`. OpenGait then keeps `output/.../checkpoints/latest.pt` up to date for crash recovery.
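
As a minimal sketch of the resume settings described in the tip, the fragment below shows where the two keys might sit in a training config. Only the key names `trainer_cfg.resume_every_iter` and `trainer_cfg.auto_resume_latest` come from the tip itself. The value `1000` is an illustrative assumption, and the reading of `resume_every_iter` as an iteration interval is inferred from its name rather than confirmed here.

```yaml
# Hypothetical excerpt of a training config; other trainer_cfg keys are omitted.
trainer_cfg:
  # Any non-zero value enables the crash-recovery checkpoint.
  # 1000 is an illustrative choice (assumption), presumably an interval in iterations.
  resume_every_iter: 1000
  # Optional: pick up from the latest checkpoint automatically on restart.
  auto_resume_latest: true
```

A smaller interval should lose less progress after a crash, at the cost of more frequent checkpoint writes, so pick a value that suits how long one iteration takes on your hardware.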