# Get Started
## Installation
1. Clone this repo.
```
git clone https://github.com/ShiqiYu/OpenGait.git
```
2. Install dependencies:
- pytorch >= 1.10
- torchvision
- pyyaml
- tensorboard
- opencv-python
- tqdm
- py7zr
- kornia
- einops

Install dependencies with [Anaconda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html):
```
conda install tqdm pyyaml tensorboard opencv kornia einops -c conda-forge
conda install pytorch==1.10 torchvision -c pytorch
```
Or install dependencies with pip:
```
pip install tqdm pyyaml tensorboard opencv-python kornia einops
pip install torch==1.10 torchvision==0.11
```
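After installing, it can help to confirm that every listed dependency is importable. The following is a minimal sketch, not part of OpenGait: the helper `find_missing` is hypothetical, and the import names are assumptions based on how these packages are usually imported (`pyyaml` as `yaml`, `opencv-python` as `cv2`, `pytorch` as `torch`).

```python
from importlib.util import find_spec

# Import names assumed for the dependencies listed above.
REQUIRED_MODULES = [
    "torch", "torchvision", "yaml", "tensorboard",
    "cv2", "tqdm", "py7zr", "kornia", "einops",
]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
print("missing modules:", ", ".join(missing) if missing else "none")
```

The check only looks modules up without importing them, so it works the same way for conda and pip installs. It does not check versions.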
## Prepare dataset
See [prepare dataset](2.prepare_dataset.md).

## Get trained model
- Option 1:
```
python misc/download_pretrained_model.py
```
- Option 2: Go to the [release page](https://github.com/ShiqiYu/OpenGait/releases/), then download the model file and uncompress it to [output](output).
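For Option 2, the extraction step can be scripted. This is a sketch under assumptions: the helper `extract_release` and the archive name in the usage comment are hypothetical, and the archive is assumed to be in a format that `shutil.unpack_archive` recognizes from its extension (for example `.zip` or `.tar.gz`).

```python
import shutil
from pathlib import Path


def extract_release(archive_path, output_dir="output"):
    """Unpack a downloaded release archive into output_dir and list the extracted files."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # The archive format is inferred from the file extension.
    shutil.unpack_archive(str(archive_path), str(output_dir))
    return sorted(
        p.relative_to(output_dir).as_posix()
        for p in output_dir.rglob("*")
        if p.is_file()
    )


# Usage (hypothetical file name, replace with the file you downloaded):
# extract_release("pretrained_model.zip", "output")
```

The returned list makes it easy to verify that the checkpoint files landed under `output`.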
## Train
Train a model by running:
```
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 opengait/main.py --cfgs ./configs/baseline/baseline.yaml --phase train
```
- `python -m torch.distributed.launch` The [DDP](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) launch command.
- `--nproc_per_node` The number of GPUs to use; it must equal the number of devices listed in `CUDA_VISIBLE_DEVICES`.
- `--cfgs` The path to the config file.
- `--phase` Set to `train`.
<!-- - `--iter` You can specify a number of iterations or use `restore_hint` in the config file and resume training from there. -->
- `--log_to_file` If specified, the terminal log is also written to disk.

You can run the commands in [train.sh](train.sh) to train different models.

For long-running local jobs, prefer the supervised `systemd-run --user` workflow documented in [systemd-run-training.md](systemd-run-training.md). It uses `torchrun`, UUID-based GPU selection, and real log files, and it survives shell or session teardown more reliably than `nohup ... &`.
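Because `--nproc_per_node` must match the number of devices in `CUDA_VISIBLE_DEVICES`, one way to avoid a mismatch is to derive both from a single GPU list. The sketch below is illustrative only: `build_launch_command` is a hypothetical helper, not part of OpenGait, and it only assembles the command shown above rather than running it.

```python
import shlex


def build_launch_command(gpu_ids, cfg_path, phase="train", log_to_file=False):
    """Return (env, argv) for a DDP launch with one process per listed GPU."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids)}
    argv = [
        "python", "-m", "torch.distributed.launch",
        # Derived from the same list, so the two values cannot disagree.
        f"--nproc_per_node={len(gpu_ids)}",
        "opengait/main.py",
        "--cfgs", cfg_path,
        "--phase", phase,
    ]
    if log_to_file:
        argv.append("--log_to_file")
    return env, argv


env, argv = build_launch_command([0, 1], "./configs/baseline/baseline.yaml")
prefix = " ".join(f"{key}={value}" for key, value in env.items())
print(prefix, shlex.join(argv))
```

To actually launch the job, the pair could be passed to `subprocess.run(argv, env={**os.environ, **env})`; that step is left out here so the example has no side effects.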
## Test
Evaluate the trained model by running:
```
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 opengait/main.py --cfgs ./configs/baseline/baseline.yaml --phase test
```
- `--phase` Set to `test`.
- `--iter` The iteration of the checkpoint to evaluate.

**Tip**: Other arguments are the same as in the train phase.

You can run the commands in [test.sh](test.sh) to test different models.

## Customize
1. Read the [detailed config](docs/1.detailed_config.md) to learn how to use the settings you need.
2. See [how to create your model](docs/2.how_to_create_your_model.md).
3. For advanced usage, refer to [advanced usages](docs/3.advanced_usages.md).

## Warning
- In `DDP` mode, zombie processes may be left behind when the program terminates abnormally. You can clear them with [sh misc/clean_process.sh](./misc/clean_process.sh).