OpenGait/.sisyphus/notepads/sconet-preprocess-research/learnings.md
crosstyan 15523bb84c docs(sisyphus): record demo fixes and preprocess research
Capture validated debugging outcomes and ScoNet preprocessing findings in persistent notes so future sessions can resume with verified context instead of redoing the same investigation.
2026-02-28 21:52:07 +08:00


# ScoNet Preprocessing Research - Learnings
## Official Sources Identified
### 1. Primary Implementation (HIGHEST TRUST)
- **Repository**: ShiqiYu/OpenGait (https://github.com/ShiqiYu/OpenGait)
- **ScoNet Model Code**: opengait/modeling/models/sconet.py
- **Preprocessing Code**: datasets/pretreatment.py (lines 18-95)
- **Config**: configs/sconet/sconet_scoliosis1k.yaml
### 2. Dataset Source
- **Scoliosis1K Dataset**: https://zhouzi180.github.io/Scoliosis1K/
- **Raw silhouettes**: Extracted using PP-HumanSeg v2
- **Raw pose**: Extracted using ViTPose
### 3. Academic Papers
- **MICCAI 2024**: "Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis" (Zhou et al.)
  - PDF: https://arxiv.org/pdf/2407.05726
  - Introduces ScoNet and the Scoliosis1K dataset
- **MICCAI 2025**: "Pose as Clinical Prior: Learning Dual Representations for Scoliosis Screening" (Zhou et al.)
  - PDF: https://arxiv.org/abs/2509.00872
  - Extends ScoNet with pose annotations
## Preprocessing Pipeline (Confirmed from Official Code)
### From `datasets/pretreatment.py` (imgs2pickle function):
```python
import cv2
import numpy as np

# Excerpt from imgs2pickle; `sils`, `img_size`, and `to_pickle` come from
# the surrounding function.
for img in sils:
    # Step 1: Filter near-empty images
    if img.sum() <= 10000:
        continue
    # Step 2: VERTICAL TIGHT CROP (y-axis projection)
    y_sum = img.sum(axis=1)
    y_top = (y_sum != 0).argmax(axis=0)
    y_btm = (y_sum != 0).cumsum(axis=0).argmax(axis=0)
    img = img[y_top: y_btm + 1, :]  # <-- TIGHT CROP TO PERSON HEIGHT
    # Step 3: Resize based on height (maintain aspect ratio)
    ratio = img.shape[1] / img.shape[0]
    img = cv2.resize(img, (int(img_size * ratio), img_size),
                     interpolation=cv2.INTER_CUBIC)
    # Step 4: Find x-center by cumulative sum of column mass
    x_csum = img.sum(axis=0).cumsum()
    for idx, csum in enumerate(x_csum):
        if csum > img.sum() / 2:
            x_center = idx
            break
    # Step 5: Horizontal crop to img_size width (centered)
    half_width = img_size // 2
    left = x_center - half_width
    right = x_center + half_width
    # Step 6: Pad with zeros if the crop window falls outside the frame
    if left <= 0 or right >= img.shape[1]:
        left += half_width
        right += half_width
        _ = np.zeros((img.shape[0], half_width))
        img = np.concatenate([_, img, _], axis=1)
    # Final crop
    to_pickle.append(img[:, left: right].astype('uint8'))
```
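A self-contained sketch of the same pipeline, runnable without OpenCV: `nn_resize` is a nearest-neighbour stand-in for `cv2.resize` with `INTER_CUBIC`, and `preprocess` condenses the steps above into one function (both names are mine, not OpenGait's):

```python
import numpy as np

def nn_resize(img, out_w, out_h):
    """Nearest-neighbour resize; stands in for cv2.resize + INTER_CUBIC."""
    ys = (np.arange(out_h) * img.shape[0] / out_h).astype(int)
    xs = (np.arange(out_w) * img.shape[1] / out_w).astype(int)
    return img[ys][:, xs]

def preprocess(img, img_size=64):
    """Crop/resize/center steps from pretreatment.py applied to one frame."""
    if img.sum() <= 10000:                  # Step 1: skip near-empty frames
        return None
    y_sum = img.sum(axis=1)
    y_top = (y_sum != 0).argmax()
    y_btm = (y_sum != 0).cumsum().argmax()
    img = img[y_top: y_btm + 1, :]          # Step 2: vertical tight crop
    ratio = img.shape[1] / img.shape[0]
    img = nn_resize(img, int(img_size * ratio), img_size)  # Step 3
    x_csum = img.sum(axis=0).cumsum()
    x_center = int(np.argmax(x_csum > img.sum() / 2))      # Step 4
    half_width = img_size // 2
    left, right = x_center - half_width, x_center + half_width  # Step 5
    if left <= 0 or right >= img.shape[1]:  # Step 6: pad if window overruns
        left += half_width
        right += half_width
        pad = np.zeros((img.shape[0], half_width), dtype=img.dtype)
        img = np.concatenate([pad, img, pad], axis=1)
    return img[:, left: right].astype('uint8')

# Synthetic frame: a bright 300x120 "person" inside a 480x640 frame
frame = np.zeros((480, 640), dtype=np.uint8)
frame[100:400, 260:380] = 255
out = preprocess(frame)  # a 64x64 uint8 crop centered on the silhouette
```

Running this on the synthetic frame yields a 64x64 uint8 image with the silhouette centered, matching the official output geometry.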
### Key Parameters
- **Default img_size**: 64 (configurable via `--img_size`)
- **Interpolation**: cv2.INTER_CUBIC
- **Output format**: uint8 grayscale, pickle files
- **Normalization**: none during preprocessing; division by 255.0 happens later in `BaseSilCuttingTransform`
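The pickle layout can be sketched as follows: each sequence's surviving frames are stacked into a single `(T, H, W)` uint8 array and pickled (the on-disk path layout in pretreatment.py is not shown here; `BytesIO` stands in for the output file):

```python
import io
import pickle
import numpy as np

# Hypothetical frames standing in for the per-frame 64x64 crops
frames = [np.zeros((64, 64), dtype=np.uint8) for _ in range(30)]

# Stack into one (T, H, W) uint8 array, then pickle the whole sequence
seq = np.stack(frames)          # shape: (30, 64, 64)
buf = io.BytesIO()              # pretreatment.py writes to a .pkl file instead
pickle.dump(seq, buf)

buf.seek(0)
loaded = pickle.load(buf)       # round-trips with shape and dtype intact
```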
## Alignment with Local Implementation
### CONFIRMED: Vertical tight-crop before resize is OFFICIAL
- **Evidence**: Line 50 in pretreatment.py: `img = img[y_top: y_btm + 1, :]`
- **Purpose**: Removes vertical padding, focuses on actual person silhouette
- **Resize behavior**: Height-based resize maintains aspect ratio
### Transform Pipeline (from config)
```yaml
evaluator_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting
trainer_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting
```
From `opengait/data/transform.py`:
- `BaseSilCuttingTransform`: Applies optional cutting + divides by 255.0
- Default cutting: `int(x.shape[-1] // 64) * 10` pixels from sides
- If cutting=0, only normalization is applied
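A minimal sketch of that behavior, assuming the cutting rule quoted above (the function name is mine; the real class lives in `opengait/data/transform.py`):

```python
import numpy as np

def sil_cutting_transform(x, cutting=None):
    """Sketch of BaseSilCuttingTransform: optional side cutting, then /255.0."""
    if cutting is None:
        cutting = int(x.shape[-1] // 64) * 10  # default cutting rule
    if cutting > 0:
        x = x[..., cutting:-cutting]           # trim `cutting` px from each side
    return x / 255.0                           # normalize to [0, 1]

# A 30-frame sequence of all-white 64x64 silhouettes
seq = np.full((30, 64, 64), 255, dtype=np.uint8)
out = sil_cutting_transform(seq)  # width 64 -> cutting 10 -> width 44
```

With a 64-pixel-wide input, the default cut is 10 pixels per side, so the output is 44 wide with values scaled to [0, 1].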
## Differences from Standard Gait Recognition
1. **No horizontal flip augmentation** in ScoNet config
2. **Evaluation uses**: `evaluate_scoliosis` function (not standard gait metrics)
3. **Class num**: 3 (Positive, Neutral, Negative) vs 74+ for gait ID
4. **Metric**: euclidean distance (not cosine)
## Critical Finding for User Concern
**Vertical tight-crop BEFORE resize is CORRECT and OFFICIAL.**
This is NOT a bug - it's the intended preprocessing pipeline:
1. Crop to person's actual height (remove empty vertical space)
2. Resize to fixed height (64px) maintaining aspect ratio
3. Center crop/pad horizontally to get 64x64 output
This ensures:
- Consistent scale across different camera distances
- Person fills the frame vertically
- Aspect ratio is preserved
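A worked arithmetic check of the scale normalization, with hypothetical numbers (a person 300 px tall and 240 px wide after the tight crop):

```python
img_size = 64
crop_h, crop_w = 300, 240            # person bounding box after tight crop
ratio = crop_w / crop_h              # aspect ratio preserved: 0.8
resized_w = int(img_size * ratio)    # width after height-based resize: 51
```

Since the resized width (51) is narrower than `img_size`, the horizontal step pads with zeros to reach the final 64x64 frame.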