OpenGait/.sisyphus/notepads/sconet-preprocess-research/learnings.md
crosstyan 15523bb84c docs(sisyphus): record demo fixes and preprocess research
Capture validated debugging outcomes and ScoNet preprocessing findings in persistent notes so future sessions can resume with verified context instead of redoing the same investigation.
2026-02-28 21:52:07 +08:00


# ScoNet Preprocessing Research - Learnings
## Official Sources Identified
### 1. Primary Implementation (HIGHEST TRUST)
- **Repository**: ShiqiYu/OpenGait (https://github.com/ShiqiYu/OpenGait)
- **ScoNet Model Code**: opengait/modeling/models/sconet.py
- **Preprocessing Code**: datasets/pretreatment.py (lines 18-95)
- **Config**: configs/sconet/sconet_scoliosis1k.yaml
### 2. Dataset Source
- **Scoliosis1K Dataset**: https://zhouzi180.github.io/Scoliosis1K/
- **Raw silhouettes**: Extracted using PP-HumanSeg v2
- **Raw pose**: Extracted using ViTPose
### 3. Academic Papers
- **MICCAI 2024**: "Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis" (Zhou et al.)
  - PDF: https://arxiv.org/pdf/2407.05726
  - Introduces ScoNet and the Scoliosis1K dataset
- **MICCAI 2025**: "Pose as Clinical Prior: Learning Dual Representations for Scoliosis Screening" (Zhou et al.)
  - PDF: https://arxiv.org/abs/2509.00872
  - Extends ScoNet with pose annotations
## Preprocessing Pipeline (Confirmed from Official Code)
### From `datasets/pretreatment.py` (imgs2pickle function):
```python
import cv2
import numpy as np

# Excerpt from imgs2pickle; `sils`, `img_size`, and `to_pickle` come from
# the surrounding function.
for img in sils:
    # Step 1: Filter near-empty images
    if img.sum() <= 10000:
        continue
    # Step 2: VERTICAL TIGHT CROP (y-axis projection)
    y_sum = img.sum(axis=1)
    y_top = (y_sum != 0).argmax(axis=0)
    y_btm = (y_sum != 0).cumsum(axis=0).argmax(axis=0)
    img = img[y_top: y_btm + 1, :]  # <-- TIGHT CROP TO PERSON HEIGHT
    # Step 3: Resize based on height (maintain aspect ratio)
    ratio = img.shape[1] / img.shape[0]
    img = cv2.resize(img, (int(img_size * ratio), img_size),
                     interpolation=cv2.INTER_CUBIC)
    # Step 4: Find x-center by cumulative sum of column mass
    x_csum = img.sum(axis=0).cumsum()
    for idx, csum in enumerate(x_csum):
        if csum > img.sum() / 2:
            x_center = idx
            break
    # Step 5: Horizontal crop to img_size width (centered)
    half_width = img_size // 2
    left = x_center - half_width
    right = x_center + half_width
    # Step 6: Pad with zeros if the crop window falls outside the frame
    if left <= 0 or right >= img.shape[1]:
        left += half_width
        right += half_width
        _ = np.zeros((img.shape[0], half_width))
        img = np.concatenate([_, img, _], axis=1)
    # Final crop
    to_pickle.append(img[:, left: right].astype('uint8'))
```
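A self-contained sketch of the same pipeline, runnable without OpenCV: `nn_resize` is a nearest-neighbour stand-in for `cv2.resize` with `INTER_CUBIC`, and `preprocess` condenses the steps above into one function (both names are mine, not OpenGait's):

```python
import numpy as np

def nn_resize(img, out_w, out_h):
    """Nearest-neighbour resize; stands in for cv2.resize + INTER_CUBIC."""
    ys = (np.arange(out_h) * img.shape[0] / out_h).astype(int)
    xs = (np.arange(out_w) * img.shape[1] / out_w).astype(int)
    return img[ys][:, xs]

def preprocess(img, img_size=64):
    """Crop/resize/center steps from pretreatment.py applied to one frame."""
    if img.sum() <= 10000:                  # Step 1: skip near-empty frames
        return None
    y_sum = img.sum(axis=1)
    y_top = (y_sum != 0).argmax()
    y_btm = (y_sum != 0).cumsum().argmax()
    img = img[y_top: y_btm + 1, :]          # Step 2: vertical tight crop
    ratio = img.shape[1] / img.shape[0]
    img = nn_resize(img, int(img_size * ratio), img_size)  # Step 3
    x_csum = img.sum(axis=0).cumsum()
    x_center = int(np.argmax(x_csum > img.sum() / 2))      # Step 4
    half_width = img_size // 2
    left, right = x_center - half_width, x_center + half_width  # Step 5
    if left <= 0 or right >= img.shape[1]:  # Step 6: pad if window overruns
        left += half_width
        right += half_width
        pad = np.zeros((img.shape[0], half_width), dtype=img.dtype)
        img = np.concatenate([pad, img, pad], axis=1)
    return img[:, left: right].astype('uint8')

# Synthetic frame: a bright 300x120 "person" inside a 480x640 frame
frame = np.zeros((480, 640), dtype=np.uint8)
frame[100:400, 260:380] = 255
out = preprocess(frame)  # a 64x64 uint8 crop centered on the silhouette
```

Running this on the synthetic frame yields a 64x64 uint8 image with the silhouette centered, matching the official output geometry.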
### Key Parameters
- **Default img_size**: 64 (configurable via `--img_size`)
- **Interpolation**: cv2.INTER_CUBIC
- **Output format**: uint8 grayscale, pickle files
- **Normalization**: none during preprocessing; division by 255.0 happens later in `BaseSilCuttingTransform`
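The pickle layout can be sketched as follows: each sequence's surviving frames are stacked into a single `(T, H, W)` uint8 array and pickled (the on-disk path layout in pretreatment.py is not shown here; `BytesIO` stands in for the output file):

```python
import io
import pickle
import numpy as np

# Hypothetical frames standing in for the per-frame 64x64 crops
frames = [np.zeros((64, 64), dtype=np.uint8) for _ in range(30)]

# Stack into one (T, H, W) uint8 array, then pickle the whole sequence
seq = np.stack(frames)          # shape: (30, 64, 64)
buf = io.BytesIO()              # pretreatment.py writes to a .pkl file instead
pickle.dump(seq, buf)

buf.seek(0)
loaded = pickle.load(buf)       # round-trips with shape and dtype intact
```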
## Alignment with Local Implementation
### CONFIRMED: Vertical tight-crop before resize is OFFICIAL
- **Evidence**: Line 50 in pretreatment.py: `img = img[y_top: y_btm + 1, :]`
- **Purpose**: Removes vertical padding, focuses on actual person silhouette
- **Resize behavior**: Height-based resize maintains aspect ratio
### Transform Pipeline (from config)
```yaml
evaluator_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting
trainer_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting
```
From `opengait/data/transform.py`:
- `BaseSilCuttingTransform`: Applies optional cutting + divides by 255.0
- Default cutting: `int(x.shape[-1] // 64) * 10` pixels from sides
- If cutting=0, only normalization is applied
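A minimal sketch of that behavior, assuming the cutting rule quoted above (the function name is mine; the real class lives in `opengait/data/transform.py`):

```python
import numpy as np

def sil_cutting_transform(x, cutting=None):
    """Sketch of BaseSilCuttingTransform: optional side cutting, then /255.0."""
    if cutting is None:
        cutting = int(x.shape[-1] // 64) * 10  # default cutting rule
    if cutting > 0:
        x = x[..., cutting:-cutting]           # trim `cutting` px from each side
    return x / 255.0                           # normalize to [0, 1]

# A 30-frame sequence of all-white 64x64 silhouettes
seq = np.full((30, 64, 64), 255, dtype=np.uint8)
out = sil_cutting_transform(seq)  # width 64 -> cutting 10 -> width 44
```

With a 64-pixel-wide input, the default cut is 10 pixels per side, so the output is 44 wide with values scaled to [0, 1].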
## Differences from Standard Gait Recognition
1. **No horizontal flip augmentation** in ScoNet config
2. **Evaluation uses**: `evaluate_scoliosis` function (not standard gait metrics)
3. **Class num**: 3 (Positive, Neutral, Negative) vs 74+ for gait ID
4. **Metric**: euclidean distance (not cosine)
## Critical Finding for User Concern
**Vertical tight-crop BEFORE resize is CORRECT and OFFICIAL.**
This is NOT a bug - it's the intended preprocessing pipeline:
1. Crop to person's actual height (remove empty vertical space)
2. Resize to fixed height (64px) maintaining aspect ratio
3. Center crop/pad horizontally to get 64x64 output
This ensures:
- Consistent scale across different camera distances
- Person fills the frame vertically
- Aspect ratio is preserved
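A worked arithmetic check of the scale normalization, with hypothetical numbers (a person 300 px tall and 240 px wide after the tight crop):

```python
img_size = 64
crop_h, crop_w = 300, 240            # person bounding box after tight crop
ratio = crop_w / crop_h              # aspect ratio preserved: 0.8
resized_w = int(img_size * ratio)    # width after height-based resize: 51
```

Since the resized width (51) is narrower than `img_size`, the horizontal step pads with zeros to reach the final 64x64 frame.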