Files
OpenGait/docs/0.get_started.md

73 lines
2.9 KiB
Markdown

# Get Started
## Installation
1. clone this repo.
```
git clone https://github.com/ShiqiYu/OpenGait.git
```
2. Install dependenices:
- pytorch >= 1.10
- torchvision
- pyyaml
- tensorboard
- opencv-python
- tqdm
- py7zr
- kornia
- einops
Install dependenices by [Anaconda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html):
```
conda install tqdm pyyaml tensorboard opencv kornia einops -c conda-forge
conda install pytorch==1.10 torchvision -c pytorch
```
Or, Install dependenices by pip:
```
pip install tqdm pyyaml tensorboard opencv-python kornia einops
pip install torch==1.10 torchvision==0.11
```
## Prepare dataset
See [prepare dataset](2.prepare_dataset.md).
## Get trained model
- Option 1:
```
python misc/download_pretrained_model.py
```
- Option 2: Go to the [release page](https://github.com/ShiqiYu/OpenGait/releases/), then download the model file and uncompress it to [output](output).
## Train
Train a model by
```
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 opengait/main.py --cfgs ./configs/baseline/baseline.yaml --phase train
```
- `python -m torch.distributed.launch` [DDP](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) launch instruction.
- `--nproc_per_node` The number of gpus to use, and it must equal the length of `CUDA_VISIBLE_DEVICES`.
- `--cfgs` The path to config file.
- `--phase` Specified as `train`.
<!-- - `--iter` You can specify a number of iterations or use `restore_hint` in the config file and resume training from there. -->
- `--log_to_file` If specified, the terminal log will be written on disk simultaneously.
You can run commands in [train.sh](train.sh) for training different models.
For long-running local jobs, prefer the supervised `systemd-run --user` workflow documented in [systemd-run-training.md](systemd-run-training.md). It uses `torchrun`, UUID-based GPU selection, real log files, and survives shell/session teardown more reliably than `nohup ... &`.
## Test
Evaluate the trained model by
```
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 opengait/main.py --cfgs ./configs/baseline/baseline.yaml --phase test
```
- `--phase` Specified as `test`.
- `--iter` Specify a iteration checkpoint.
**Tip**: Other arguments are the same as train phase.
You can run commands in [test.sh](test.sh) for testing different models.
## Customize
1. Read the [detailed config](docs/1.detailed_config.md) to figure out the usage of needed setting items;
2. See [how to create your model](docs/2.how_to_create_your_model.md);
3. There are some advanced usages, refer to [advanced usages](docs/3.advanced_usages.md), please.
## Warning
- In `DDP` mode, zombie processes may be generated when the program terminates abnormally. You can use this command [sh misc/clean_process.sh](./misc/clean_process.sh) to clear them.