feat: add systemd-run training launcher and docs
@@ -49,6 +49,8 @@ CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 o
Run the commands in [train.sh](train.sh) to train the different models.

For long-running local jobs, prefer the supervised `systemd-run --user` workflow documented in [systemd-run-training.md](systemd-run-training.md). It launches training with `torchrun`, selects GPUs by UUID, writes output to real log files, and survives shell or session teardown more reliably than `nohup ... &`.
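
As a rough illustration of that workflow (not the launcher added in this commit), the Python sketch below assembles a `systemd-run --user` command line that wraps `torchrun`. The unit name, GPU UUIDs, log path, and the `train.py` script name are placeholders chosen for the example, and the specific systemd options used here are assumptions; see [systemd-run-training.md](systemd-run-training.md) for the actual invocation.

```python
import os
import shlex


def build_systemd_run_cmd(unit, gpu_uuids, log_file, train_args):
    """Assemble an argv list: systemd-run --user ... torchrun ... <train_args>.

    Hypothetical helper for illustration only; it builds the command but
    does not execute it.
    """
    if not gpu_uuids:
        raise ValueError("at least one GPU UUID is required")
    if not os.path.isabs(log_file):
        # systemd expects absolute paths in StandardOutput=append:...
        raise ValueError("log_file must be an absolute path")

    return [
        "systemd-run", "--user",
        f"--unit={unit}",   # transient unit name, visible to systemctl --user
        "--collect",        # garbage-collect the unit even if it fails
        "--same-dir",       # run in the caller's current working directory
        # CUDA accepts GPU UUIDs (as printed by `nvidia-smi -L`) here,
        # which stay stable even if the index order changes.
        f"--setenv=CUDA_VISIBLE_DEVICES={','.join(gpu_uuids)}",
        # Append stdout and stderr to a regular file instead of a terminal.
        f"--property=StandardOutput=append:{log_file}",
        f"--property=StandardError=append:{log_file}",
        "torchrun", "--standalone", f"--nproc_per_node={len(gpu_uuids)}",
        *train_args,
    ]


if __name__ == "__main__":
    cmd = build_systemd_run_cmd(
        unit="train-demo",                       # placeholder unit name
        gpu_uuids=["GPU-aaaa", "GPU-bbbb"],      # placeholder UUIDs
        log_file="/tmp/train-demo.log",          # placeholder log path
        train_args=["train.py", "--epochs", "1"],  # placeholder script and args
    )
    # Print a copy-pasteable shell command rather than running it.
    print(shlex.join(cmd))
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without GPUs or a user systemd instance. Once started, a unit launched this way could be inspected with `systemctl --user status train-demo` and stopped with `systemctl --user stop train-demo`.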
## Test
Evaluate the trained model by running:
```