Add resumable ScoNet skeleton training diagnostics

2026-03-09 15:57:13 +08:00
parent 4e0b0a18dc
commit 36aef46a0d
15 changed files with 1226 additions and 44 deletions
+7 -1
@@ -59,9 +59,12 @@
 ### trainer_cfg
 * Trainer configuration
 > * Args
-> * restore_hint: `int` value indicates the iteration number of restored checkpoint; `str` value indicates the path to restored checkpoint. The option is often used to finetune on new dataset or restore the interrupted training process.
+> * restore_hint: `int` value indicates the iteration number of restored checkpoint; `str` value indicates the path to restored checkpoint. Use `latest` to restore the latest rolling resume checkpoint. The option is often used to finetune on new dataset or restore the interrupted training process.
+> * auto_resume_latest: If `True` and `restore_hint==0`, automatically resume from `output/.../checkpoints/latest.pt` when it exists.
 > * fix_BN: If `True`, we fix the weight of all `BatchNorm` layers.
 > * log_iter: Log the information per `log_iter` iterations.
+> * resume_every_iter: Save a rolling resume checkpoint every `resume_every_iter` iterations. These checkpoints update `checkpoints/latest.pt` and are intended for crash recovery.
+> * resume_keep: Number of rolling resume checkpoints retained under `checkpoints/resume/`. Set `0` to keep all of them.
 > * save_iter: Save the checkpoint per `save_iter` iterations.
 > * with_test: If `True`, we test the model every `save_iter` iterations. A bit of performance impact.(*Disable in Default*)
 > * optimizer_reset: If `True` and `restore_hint!=0`, reset the optimizer while restoring the model.
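The interaction between `restore_hint` and `auto_resume_latest` described in the hunk above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function name `resolve_restore_target`, its parameters, and the `<save_name>-<iteration>.pt` file naming are assumptions made for the example, and the checkpoint directory is passed in rather than derived from the elided `output/.../checkpoints/` path.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_restore_target(
    restore_hint: Union[int, str],
    auto_resume_latest: bool,
    checkpoint_dir: Path,
    save_name: str,
) -> Optional[Path]:
    """Return the checkpoint to load, or None to start from scratch.

    Hypothetical helper: the name, signature, and file naming are
    assumptions for illustration only.
    """
    latest = checkpoint_dir / "latest.pt"

    if isinstance(restore_hint, str):
        # "latest" selects the most recent rolling resume checkpoint;
        # any other string is treated as an explicit checkpoint path.
        if restore_hint == "latest":
            return latest
        return Path(restore_hint)

    if restore_hint == 0:
        # No explicit hint: resume only if opted in and latest.pt exists.
        if auto_resume_latest and latest.exists():
            return latest
        return None

    # Positive integer: an iteration number of a regular checkpoint.
    # The "<save_name>-<iteration>.pt" pattern is an assumed naming scheme.
    return checkpoint_dir / f"{save_name}-{restore_hint:05d}.pt"
```

Under these assumptions, `restore_hint: 0` with `auto_resume_latest: false` always starts a fresh run, while `restore_hint: latest` requests the rolling checkpoint explicitly regardless of the `auto_resume_latest` setting.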
@@ -168,6 +171,9 @@ trainer_cfg:
 log_iter: 100
 restore_ckpt_strict: true
 restore_hint: 0
+auto_resume_latest: false
+resume_every_iter: 500
+resume_keep: 3
 save_iter: 10000
 save_name: Baseline
 sync_BN: true
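With the example values above (`resume_every_iter: 500`, `resume_keep: 3`), the rolling resume behaviour might look like the sketch below. It is an illustration under stated assumptions, not the trainer code from this commit: the helper names, the `resume-<iteration>.pt` naming, and updating `latest.pt` by copy-and-replace are all hypothetical, and a `bytes` payload stands in for the serialized training state.

```python
import shutil
from pathlib import Path


def should_save_resume(iteration: int, resume_every_iter: int) -> bool:
    """True on every `resume_every_iter`-th iteration (hypothetical helper)."""
    return resume_every_iter > 0 and iteration % resume_every_iter == 0


def save_rolling_resume(
    checkpoint_dir: Path, iteration: int, payload: bytes, resume_keep: int
) -> Path:
    """Write a rolling snapshot, refresh latest.pt, and prune old snapshots.

    Assumed layout: snapshots under checkpoint_dir/resume/, with
    checkpoint_dir/latest.pt holding a copy of the newest one.
    """
    resume_dir = checkpoint_dir / "resume"
    resume_dir.mkdir(parents=True, exist_ok=True)

    # Zero-padded names make lexicographic order match iteration order.
    target = resume_dir / f"resume-{iteration:08d}.pt"
    target.write_bytes(payload)

    # Write to a temporary file first, then swap it in, so a crash during
    # the copy does not leave a truncated latest.pt behind.
    tmp = checkpoint_dir / "latest.pt.tmp"
    shutil.copyfile(target, tmp)
    tmp.replace(checkpoint_dir / "latest.pt")

    # resume_keep == 0 means "keep everything"; otherwise drop the oldest.
    if resume_keep > 0:
        snapshots = sorted(resume_dir.glob("resume-*.pt"))
        for stale in snapshots[:-resume_keep]:
            stale.unlink()
    return target
```

In this sketch, a run that has reached iteration 2500 would hold the snapshots for iterations 1500, 2000, and 2500 under `checkpoints/resume/`, while the regular `save_iter` checkpoints are left untouched.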