# ScoNet Preprocessing Research - Learnings

Purpose: capture validated debugging outcomes and ScoNet preprocessing findings in persistent notes so future sessions can resume with verified context instead of redoing the same investigation.
## Official Sources Identified

### 1. Primary Implementation (HIGHEST TRUST)

- **Repository**: ShiqiYu/OpenGait (https://github.com/ShiqiYu/OpenGait)
- **ScoNet Model Code**: opengait/modeling/models/sconet.py
- **Preprocessing Code**: datasets/pretreatment.py (lines 18-95)
- **Config**: configs/sconet/sconet_scoliosis1k.yaml

### 2. Dataset Source

- **Scoliosis1K Dataset**: https://zhouzi180.github.io/Scoliosis1K/
- **Raw silhouettes**: extracted using PP-HumanSeg v2
- **Raw pose**: extracted using ViTPose

### 3. Academic Papers

- **MICCAI 2024**: "Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis" (Zhou et al.)
  - PDF: https://arxiv.org/pdf/2407.05726
  - Introduces ScoNet and the Scoliosis1K dataset
- **MICCAI 2025**: "Pose as Clinical Prior: Learning Dual Representations for Scoliosis Screening" (Zhou et al.)
  - PDF: https://arxiv.org/abs/2509.00872
  - Extends ScoNet with pose annotations
## Preprocessing Pipeline (Confirmed from Official Code)

### From `datasets/pretreatment.py` (imgs2pickle function):

```python
# Step 1: Filter out near-empty frames
if img.sum() <= 10000:
    continue

# Step 2: VERTICAL TIGHT CROP (y-axis projection)
y_sum = img.sum(axis=1)
y_top = (y_sum != 0).argmax(axis=0)
y_btm = (y_sum != 0).cumsum(axis=0).argmax(axis=0)
img = img[y_top: y_btm + 1, :]  # <-- TIGHT CROP TO PERSON HEIGHT

# Step 3: Resize based on height (maintain aspect ratio)
ratio = img.shape[1] / img.shape[0]
img = cv2.resize(img, (int(img_size * ratio), img_size), interpolation=cv2.INTER_CUBIC)

# Step 4: Find x-center by cumulative sum (first column past half the mass)
x_csum = img.sum(axis=0).cumsum()
for idx, csum in enumerate(x_csum):
    if csum > img.sum() / 2:
        x_center = idx
        break

# Step 5: Horizontal crop to img_size width (centered)
half_width = img_size // 2
left = x_center - half_width
right = x_center + half_width

# Step 6: Zero-pad both sides if the window overruns the frame (indices shift by half_width)
if left <= 0 or right >= img.shape[1]:
    left += half_width
    right += half_width
    _ = np.zeros((img.shape[0], half_width))
    img = np.concatenate([_, img, _], axis=1)

# Final crop
to_pickle.append(img[:, left: right].astype('uint8'))
```
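To sanity-check the steps above outside OpenGait, they can be replayed as one standalone function. This is an illustrative sketch, not official code: `crop_and_center` is a hypothetical name, and plain NumPy index-mapping stands in for `cv2.INTER_CUBIC` (pixel values differ slightly, but the geometry and output shape match the official pipeline):

```python
import numpy as np

def crop_and_center(img, img_size=64):
    """NumPy-only sketch of the official crop/center steps (hypothetical helper)."""
    # Step 1: skip near-empty frames
    if img.sum() <= 10000:
        return None

    # Step 2: vertical tight crop to the person's height
    y_sum = img.sum(axis=1)
    y_top = (y_sum != 0).argmax()
    y_btm = (y_sum != 0).cumsum().argmax()
    img = img[y_top: y_btm + 1, :]

    # Step 3: resize to img_size height, preserving aspect ratio
    # (nearest-neighbor index mapping instead of cv2.INTER_CUBIC)
    ratio = img.shape[1] / img.shape[0]
    new_w = int(img_size * ratio)
    rows = (np.arange(img_size) * img.shape[0] / img_size).astype(int)
    cols = (np.arange(new_w) * img.shape[1] / new_w).astype(int)
    img = img[rows][:, cols]

    # Step 4: x-center = first column where cumulative mass exceeds half
    # (vectorized equivalent of the official for-loop)
    x_csum = img.sum(axis=0).cumsum()
    x_center = int(np.searchsorted(x_csum, img.sum() / 2, side='right'))

    # Steps 5-6: centered horizontal window, zero-padded if it overruns
    half_width = img_size // 2
    left, right = x_center - half_width, x_center + half_width
    if left <= 0 or right >= img.shape[1]:
        left += half_width
        right += half_width
        pad = np.zeros((img.shape[0], half_width), dtype=img.dtype)
        img = np.concatenate([pad, img, pad], axis=1)
    return img[:, left:right].astype('uint8')

# Synthetic 200x150 frame with a bright 120x40 "person" blob
frame = np.zeros((200, 150), dtype=np.uint8)
frame[40:160, 60:100] = 255
out = crop_and_center(frame)
print(out.shape)  # (64, 64)
```

The `np.searchsorted(..., side='right')` call reproduces the official loop's strict `csum > total / 2` condition without iterating per column.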

### Key Parameters

- **Default img_size**: 64 (configurable via `--img_size`)
- **Interpolation**: cv2.INTER_CUBIC
- **Output format**: uint8 grayscale, pickle files
- **Normalization**: none during preprocessing (happens later in BaseSilCuttingTransform)

## Alignment with Local Implementation

### CONFIRMED: Vertical tight-crop before resize is OFFICIAL

- **Evidence**: line 50 in pretreatment.py: `img = img[y_top: y_btm + 1, :]`
- **Purpose**: removes vertical padding, focuses on the actual person silhouette
- **Resize behavior**: height-based resize maintains aspect ratio

### Transform Pipeline (from config)

```yaml
evaluator_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting

trainer_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting
```

From `opengait/data/transform.py`:

- `BaseSilCuttingTransform`: applies optional cutting, then divides by 255.0
- Default cutting: `int(x.shape[-1] // 64) * 10` pixels from each side
- If cutting=0, only normalization is applied
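The described behavior can be sketched as follows. This is an illustrative reimplementation, not the OpenGait class itself (which lives in `opengait/data/transform.py`); the function name is made up:

```python
import numpy as np

def sil_cutting_transform(x, cutting=None):
    """Sketch of the behavior noted above: optionally trim `cutting`
    columns from each side, then scale pixel values to [0, 1]."""
    if cutting is None:
        cutting = int(x.shape[-1] // 64) * 10  # default side cut
    if cutting > 0:
        x = x[..., cutting:-cutting]
    return x / 255.0

seq = np.full((30, 64, 64), 255, dtype=np.uint8)  # 30 frames of 64x64 silhouettes
out = sil_cutting_transform(seq)
print(out.shape, out.max())  # (30, 64, 44) 1.0
```

With 64-pixel-wide input the default cut is 10 pixels per side, leaving the familiar 64x44 silhouettes.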
## Differences from Standard Gait Recognition

1. **No horizontal flip augmentation** in the ScoNet config
2. **Evaluation uses**: the `evaluate_scoliosis` function (not standard gait metrics)
3. **Class num**: 3 (positive, neutral, negative) vs 74+ identities for gait ID
4. **Metric**: euclidean distance (not cosine)
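As a toy illustration of point 4, nearest-neighbor matching under the euclidean metric looks like this (the actual evaluation is OpenGait's `evaluate_scoliosis`; the embeddings and labels below are made-up values):

```python
import numpy as np

# Toy 2-D "embeddings", one per class (made-up values for illustration)
gallery = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = ["positive", "neutral", "negative"]
probe = np.array([0.9, 0.2])

dists = np.linalg.norm(gallery - probe, axis=1)  # euclidean, not cosine
print(labels[int(dists.argmin())])  # neutral
```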
## Critical Finding for User Concern

**Vertical tight-crop BEFORE resize is CORRECT and OFFICIAL.**

This is NOT a bug; it is the intended preprocessing pipeline:

1. Crop to the person's actual height (remove empty vertical space)
2. Resize to a fixed height (64 px), maintaining aspect ratio
3. Center-crop/pad horizontally to get a 64x64 output

This ensures:

- Consistent scale across different camera distances
- The person fills the frame vertically
- Aspect ratio is preserved
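A quick numeric check of the scale-consistency claim: two assumed tight-crop bounding-box sizes for the same person at different camera distances map to the same resized shape, so distance no longer affects scale.

```python
img_size = 64

# Assumed (height, width) of the tight-cropped silhouette for the same
# person seen near and then far from the camera:
for h, w in [(300, 100), (150, 50)]:
    resized = (img_size, int(img_size * w / h))  # height-based resize
    print(resized)  # (64, 21) both times
```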