feat: retain best checkpoints and support alternate output roots

2026-03-11 01:14:05 +08:00
parent 63e2ed1097
commit a0150c791f
14 changed files with 852 additions and 9 deletions
@@ -124,6 +124,33 @@ The launcher configures both:
This makes it easier to recover logs even if the original shell or tool session disappears.
## Moving outputs off the SSD
OpenGait writes checkpoints, TensorBoard summaries, best-checkpoint snapshots, and file logs under a run output root.
By default that root is `output/`, but you can override it per run with `output_root` in the engine config:
```yaml
trainer_cfg:
  output_root: /mnt/hddl/data/OpenGait-output
evaluator_cfg:
  output_root: /mnt/hddl/data/OpenGait-output
```
The final path layout stays the same under that root:
```text
<output_root>/<dataset>/<model>/<save_name>/
```
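As a minimal sketch of how that layout composes, the snippet below joins the four path components. The helper name `resolve_run_dir` and the dataset, model, and save names are hypothetical placeholders for illustration; they are not taken from the OpenGait code.

```python
from pathlib import Path


def resolve_run_dir(output_root, dataset, model, save_name):
    """Hypothetical helper: build <output_root>/<dataset>/<model>/<save_name>."""
    return Path(output_root) / dataset / model / save_name


# Default root: the run directory stays inside the repository's output/ folder.
default_dir = resolve_run_dir("output", "MyDataset", "MyModel", "my_run")

# Overridden root: same layout, relocated under the HDD-backed path.
hdd_dir = resolve_run_dir(
    "/mnt/hddl/data/OpenGait-output", "MyDataset", "MyModel", "my_run"
)

# Only the root differs; the trailing three components are identical.
assert default_dir.parts[-3:] == hdd_dir.parts[-3:]
```

Because only the root changes, scripts that locate a run by `<dataset>/<model>/<save_name>` should keep working once they are given the new root.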
For long scoliosis runs, point `output_root` at an HDD-backed path so that local SSD space is not consumed by:
- numbered checkpoints
- rolling resume checkpoints
- best-N retained checkpoints
- TensorBoard summary files
## GPU selection
Prefer GPU UUIDs, not ordinal indices.