add config doc

2021-11-03 15:35:03 +08:00
parent 01935aac2f
commit 6e71c7ac34
3 changed files with 194 additions and 10 deletions
@@ -14,13 +14,13 @@ OpenGait is a flexible and extensible gait recognition project provided by the [

 # Model Zoo

-|                                                                                          Model                                                                                          |     NM     |     BG     |     CL     | Configuration                                                                                | Input Size | Inference Time |    Model Size    |
-| :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------: | :--------: | :--------: | :------------------------------------------------------------------------------------------- | :--------: | :------------: | :--------------: |
-|                                                                                        Baseline                                                                                         |    96.3    |    92.2    |    77.6    | [baseline.yaml](config/baseline.yaml)                                                        |   64x44    |      12s       |      3.78M       |
-|                                                                [GaitSet(AAAI2019)](https://arxiv.org/pdf/1811.06186.pdf)                                                                | 95.8(95.0) | 90.0(87.2) | 75.4(70.4) | [gaitset.yaml](config/gaitset.yaml)                                                          |   64x44    |      11s       |      2.59M       |
-|                                                   [GaitPart(CVPR2020)](http://home.ustc.edu.cn/~saihui/papers/cvpr2020_gaitpart.pdf)                                                    | 96.1(96.2) | 90.7(91.5) | 78.7(78.7) | [gaitpart.yaml](config/gaitpart.yaml)                                                        |   64x44    |      22s       |      1.20M       |
+|                                                                                          Model                                                                                          |     NM     |     BG     |     CL     | Configuration                                                                                | Input Size | Inference Time |   Model Size   |
+| :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------: | :--------: | :--------: | :------------------------------------------------------------------------------------------- | :--------: | :------------: | :------------: |
+|                                                                                        Baseline                                                                                         |    96.3    |    92.2    |    77.6    | [baseline.yaml](config/baseline.yaml)                                                        |   64x44    |      12s       |     3.78M      |
+|                                                                [GaitSet(AAAI2019)](https://arxiv.org/pdf/1811.06186.pdf)                                                                | 95.8(95.0) | 90.0(87.2) | 75.4(70.4) | [gaitset.yaml](config/gaitset.yaml)                                                          |   64x44    |      11s       |     2.59M      |
+|                                                   [GaitPart(CVPR2020)](http://home.ustc.edu.cn/~saihui/papers/cvpr2020_gaitpart.pdf)                                                    | 96.1(96.2) | 90.7(91.5) | 78.7(78.7) | [gaitpart.yaml](config/gaitpart.yaml)                                                        |   64x44    |      22s       |     1.20M      |
 |                                                        [GLN*(ECCV2020)](http://home.ustc.edu.cn/~saihui/papers/eccv2020_gln.pdf)                                                        | 96.4(95.6) | 93.1(92.0) | 81.0(77.2) | [gln_phase1.yaml](config/gln/gln_phase1.yaml), [gln_phase2.yaml](config/gln/gln_phase2.yaml) |   128x88   |      14s       | 8.54M / 14.70M |
-| [GaitGL(ICCV2021)](https://openaccess.thecvf.com/content/ICCV2021/papers/Lin_Gait_Recognition_via_Effective_Global-Local_Feature_Representation_and_Local_Temporal_ICCV_2021_paper.pdf) | 97.4(97.4) | 94.5(94.5) | 83.8(83.6) | [gaitgl.yaml](config/gaitgl.yaml)                                                            |   64x44    |      31s       |      3.10M       |
+| [GaitGL(ICCV2021)](https://openaccess.thecvf.com/content/ICCV2021/papers/Lin_Gait_Recognition_via_Effective_Global-Local_Feature_Representation_and_Local_Temporal_ICCV_2021_paper.pdf) | 97.4(97.4) | 94.5(94.5) | 83.8(83.6) | [gaitgl.yaml](config/gaitgl.yaml)                                                            |   64x44    |      31s       |     3.10M      |

 The results in the parentheses are mentioned in the papers

@@ -60,7 +60,7 @@ It's inference process just cost about 90 secs(Baseline & 8 RTX6000).
 ## Prepare dataset
 See [prepare dataset](doc/prepare_dataset.md).

-## Get pretrained model
+## Get trained model
 - Option 1:
    ```
    python misc/download_pretrained_model.py
@@ -93,12 +93,13 @@ CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 l

 You can run commands in [test.sh](test.sh) for testing different models.
 ## Customize
-If you want customize your own model, see [here](doc/how_to_create_your_model.md).
+1. Read the [config documentation](doc/detailed_config.md) to learn what each item does.
+2. If you want to create your own model, see [here](doc/how_to_create_your_model.md).

 # Warning
 - Some models may not be compatible with `AMP`, you can disable it by setting `enable_float16` **False**.
- In `DDP` mode, zombie processes may occur when the program terminates abnormally. You can use this command `kill $(ps aux | grep main.py | grep -v grep | awk '{print $2}')` to clear them. 
- We implemented the functionality of testing while training, but it slightly affected the results. None of our published models use this functionality. You can disable it by setting `with_test` **False**.
+- In `DDP` mode, zombie processes may be generated when the program terminates abnormally. You can use this command `kill $(ps aux | grep main.py | grep -v grep | awk '{print $2}')` to clear them. 
+- We implemented testing while training, but it slightly affects the results. None of our published models use this functionality. You can disable it by setting `with_test` **False**.

 # Authors:
 **Open Gait Team (OGT)**
@@ -0,0 +1,181 @@
+# Configuration item
+
+### data_cfg
+* Data configuration
+>
+>  * Args
+>     * dataset_name: Dataset name. Only `CASIA-B` is supported.
+>     * dataset_root: The path where your dataset is stored.
+>     * num_workers: The number of workers to collect data.
+>     * dataset_partition: The path to your dataset partition file, which splits the dataset into a train set and a test set.
+>     * cache: If `True`, load all data into memory while building the dataset.
+>     * test_dataset_name: The name of the test dataset.
+----
+
+### loss_cfg
+* Loss function
+>  * Args
+>     * type: Loss function type. Supports `TripletLoss` and `CrossEntropyLoss`.
+>     * loss_term_weights: The weight of this loss term.
+>     * log_prefix: The prefix of the loss log.
+
+----
+### optimizer_cfg
+* Optimizer
+>  * Args
+>     * solver: Optimizer type, example: `SGD`, `Adam`
+>     * **others**: Please refer to `torch.optim`
+
+
+### scheduler_cfg
+* Learning rate scheduler
+>  * Args
+>     * scheduler : Learning rate scheduler, example: `MultiStepLR`
+>     * **others** : Please refer to `torch.optim.lr_scheduler`
+----
+### model_cfg
+* Model to be trained
+>  * Args
+>     * model : Model type, please refer to [Model Library](../lib/modeling/models) for the supported values
+>     * **others** : Please refer to [Training Configuration File of Corresponding Model](../config)
+----
+### evaluator_cfg
+* Evaluator configuration
+>  * Args
+>     * enable_float16: If `True`, enable auto mixed precision.
+>     * restore_ckpt_strict: If `True`, check whether the checkpoint matches the model exactly.
+>     * restore_hint: `int` value indicates the iteration number of restored checkpoint; `str` value indicates the path of restored checkpoint.
+>     * save_name: The name of the experiment.
+>     * eval_func: The function name of evaluation. For `CASIA-B`, choose `identification`.
+>     * sampler:
+>       - type: The name of sampler. Choose `InferenceSampler`
+>       - sample_type: In general, we use `all_ordered` to input all frames in their natural order, which keeps the tests consistent.
+>       - batch_size: In general, it should equal the number of GPUs used.
+>       - **others**: Please refer to [data.sampler](../lib/data/sampler.py) and [data.collate_fn](../lib/data/collate_fn.py)
+>     * transform: Supports `BaseSilCuttingTransform` and `BaseSilTransform`. The difference is that `BaseSilCuttingTransform` cuts the pixels on both sides horizontally.
+>     * metric: `euc` or `cos`. Generally, `euc` performs better.
+
+----
+### trainer_cfg
+* Trainer configuration
+>  * Args
+>     * fix_BN: If `True`, we fix the weight of all `BatchNorm` layers.
+>     * log_iter: Every `log_iter` iterations, log the information.
+>     * save_iter: Every `save_iter` iterations, save the model.
+>     * with_test: If `True`, we test the model every `save_iter` iterations. This slightly affects the results. (*To Be Fixed*)
+>     * optimizer_reset: If `True` and `restore_hint!=0`, reset the optimizer while restoring the model.
+>     * scheduler_reset: If `True` and `restore_hint!=0`, reset the scheduler while restoring the model.
+>     * sync_BN: If `True`, apply synchronized Batch Normalization across processes. Batch Normalization itself is described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167).
+>     * total_iter: The total number of training iterations.
+>     * sampler:
+>       - type: The name of sampler. Choose `TripletSampler`
+>       - sample_type: `[all, fixed, unfixed]` indicates the number of frames used, while `[unordered, ordered]` indicates whether the sequence is input in its natural order. Example: `fixed_unordered` selects a fixed number of frames at random.
+>       - batch_size: *[P,K]*\
+>         **example**:
+>         - 8
+>         - 16
+>       - **others**: Please refer to [data.sampler](../lib/data/sampler.py) and [data.collate_fn](../lib/data/collate_fn.py)
+>     * **others**: Please refer to `evaluator_cfg`
+---
+**Note**: All configuration items are merged into [default.yaml](../config/default.yaml), and values in the current configuration take precedence.
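
The merge described in the note above can be pictured as a recursive dictionary overlay. This is only a minimal sketch under stated assumptions, not the project's actual merge code: `merge_cfg` is a hypothetical helper, and the values in `default_cfg` are made up for illustration rather than copied from `default.yaml`.

```python
import copy


def merge_cfg(default, current):
    """Return a new dict: `default` overlaid with `current` (current wins).

    Hypothetical helper for illustration; nested dicts are merged key by key,
    any other value in `current` replaces the default outright.
    """
    merged = copy.deepcopy(default)
    for key, value in current.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_cfg(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative defaults (assumed values, not the real default.yaml).
default_cfg = {
    "trainer_cfg": {
        "log_iter": 100,
        "with_test": False,
        "sampler": {"type": "TripletSampler"},
    }
}

# Illustrative user configuration that overrides and extends the defaults.
current_cfg = {
    "trainer_cfg": {
        "with_test": True,
        "sampler": {"batch_size": [8, 16]},
    }
}

merged_cfg = merge_cfg(default_cfg, current_cfg)
```

In this sketch, keys that appear only in the defaults survive, keys that appear in both take the current value, and the defaults themselves are left unmodified.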
+
+# Example
+
+```yaml
+data_cfg:
+  dataset_name: CASIA-B
+  dataset_root:  your_path
+  dataset_partition: ./misc/partitions/CASIA-B_include_005.json
+  num_workers: 1
+  remove_no_gallery: false # Remove probe if no gallery for it
+  test_dataset_name: CASIA-B
+
+evaluator_cfg:
+  enable_float16: true
+  restore_ckpt_strict: true
+  restore_hint: 60000
+  save_name: Baseline
+  eval_func: identification
+  sampler:
+    batch_shuffle: false
+    batch_size: 16
+    sample_type: all_ordered # all indicates whole sequence used to test, while ordered means input sequence by its natural order; Other options:   fixed_unordered
+    frames_all_limit: 720 # limit the number of sampled frames to prevent out of memory
+  metric: euc # cos
+
+
+loss_cfg:
+  - loss_term_weights: 1.0
+    margin: 0.2
+    type: TripletLoss
+    log_prefix: triplet
+  - loss_term_weights: 0.1
+    scale: 16
+    type: CrossEntropyLoss
+    log_prefix: softmax
+    log_accuracy: true
+
+model_cfg:
+  model: Baseline
+  backbone_cfg:
+    in_channels: 1
+    layers_cfg: # Layers configuration for automatic model construction
+      - BC-64
+      - BC-64
+      - M
+      - BC-128
+      - BC-128
+      - M
+      - BC-256
+      - BC-256
+    type: Plain
+  SeparateFCs:
+    in_channels: 256
+    out_channels: 256
+    parts_num: 31
+  SeparateBNNecks:
+    class_num: 74
+    in_channels: 256
+    parts_num: 31
+  bin_num:
+    - 16
+    - 8
+    - 4
+    - 2
+    - 1
+
+optimizer_cfg:
+  lr: 0.1
+  momentum: 0.9
+  solver: SGD
+  weight_decay: 0.0005
+
+scheduler_cfg:
+  gamma: 0.1
+  milestones: # The learning rate is reduced at each milestone
+    - 20000
+    - 40000
+  scheduler: MultiStepLR
+trainer_cfg:
+  enable_float16: true # half-precision float for memory reduction and speedup
+  fix_BN: false
+  log_iter: 100
+  restore_ckpt_strict: true
+  restore_hint: 0
+  save_iter: 10000
+  save_name: Baseline
+  sync_BN: true
+  total_iter: 60000
+  sampler:
+    batch_shuffle: true
+    batch_size:
+      - 8 # TripletSampler, batch_size[0] indicates the number of identities
+      - 16 #                 batch_size[1] indicates the number of sample sequences for each identity
+    frames_num_fixed: 30 # fixed frames number for training
+    frames_num_max: 50 # max frames number for unfixed training
+    frames_num_min: 25 # min frames number for unfixed training
+    sample_type: fixed_unordered # fixed control input frames number, unordered for controlling order of input tensor; Other options: unfixed_ordered or all_ordered
+    type: TripletSampler
+
+
+```
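
To make the numbers in the example configuration concrete, the sketch below works out the batch composition and the learning rate that a `MultiStepLR`-style schedule would give at a few iterations. It is a plain-Python approximation, not the training code: `multistep_lr` is a hypothetical helper, and it assumes the scheduler is stepped once per training iteration, which the configuration alone does not confirm.

```python
from bisect import bisect_right

# Values copied from the example configuration above.
optimizer_cfg = {"solver": "SGD", "lr": 0.1, "momentum": 0.9, "weight_decay": 0.0005}
scheduler_cfg = {"scheduler": "MultiStepLR", "gamma": 0.1, "milestones": [20000, 40000]}
batch_size = [8, 16]  # [P, K]: identities per batch, sequences per identity
total_iter = 60000


def multistep_lr(base_lr, milestones, gamma, iteration):
    """Learning rate after `iteration` scheduler steps (hypothetical helper).

    The rate is multiplied by `gamma` once for every milestone reached.
    """
    passed = bisect_right(sorted(milestones), iteration)
    return base_lr * gamma ** passed


# A [P, K] batch holds P identities with K sequences each.
P, K = batch_size
sequences_per_batch = P * K

# Assumption: one scheduler step per training iteration.
checkpoints = (0, 19999, 20000, 40000, total_iter - 1)
lr_schedule = {
    it: multistep_lr(
        optimizer_cfg["lr"], scheduler_cfg["milestones"], scheduler_cfg["gamma"], it
    )
    for it in checkpoints
}

for it, lr in lr_schedule.items():
    print(f"iteration {it:>5}: lr = {lr:.4f}")
```

Under these assumptions each batch contains 8 x 16 = 128 sequences, and the learning rate drops by a factor of ten at iteration 20000 and again at 40000.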
@@ -101,6 +101,8 @@ def download_file_and_uncompress(url,
        if not os.path.exists(savepath):
            _download_file(url, savepath, print_progress)

+        if print_progress:
+            print("Uncompress %s" % os.path.basename(savepath))
        for total_num, index, rootpath in _uncompress_file_zip(savepath, extrapath):
            if print_progress:
                done = int(50 * float(index) / total_num)