Capture validated debugging outcomes and ScoNet preprocessing findings in persistent notes so future sessions can resume with verified context instead of redoing the same investigation.
ScoNet Preprocessing Research - Learnings
Official Sources Identified
1. Primary Implementation (HIGHEST TRUST)
- Repository: ShiqiYu/OpenGait (https://github.com/ShiqiYu/OpenGait)
- ScoNet Model Code: opengait/modeling/models/sconet.py
- Preprocessing Code: datasets/pretreatment.py (lines 18-95)
- Config: configs/sconet/sconet_scoliosis1k.yaml
2. Dataset Source
- Scoliosis1K Dataset: https://zhouzi180.github.io/Scoliosis1K/
- Raw silhouettes: Extracted using PP-HumanSeg v2
- Raw pose: Extracted using ViTPose
3. Academic Papers
- MICCAI 2024: "Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis" (Zhou et al.)
  - PDF: https://arxiv.org/pdf/2407.05726
  - Introduces ScoNet and the Scoliosis1K dataset
- MICCAI 2025: "Pose as Clinical Prior: Learning Dual Representations for Scoliosis Screening" (Zhou et al.)
  - PDF: https://arxiv.org/abs/2509.00872
  - Extends ScoNet with pose annotations
Preprocessing Pipeline (Confirmed from Official Code)
From datasets/pretreatment.py (imgs2pickle function):
```python
# Step 1: Filter empty images
if img.sum() <= 10000:
    continue

# Step 2: VERTICAL TIGHT CROP (y-axis projection)
y_sum = img.sum(axis=1)
y_top = (y_sum != 0).argmax(axis=0)
y_btm = (y_sum != 0).cumsum(axis=0).argmax(axis=0)
img = img[y_top: y_btm + 1, :]  # <-- TIGHT CROP TO PERSON HEIGHT

# Step 3: Resize based on height (maintain aspect ratio)
ratio = img.shape[1] / img.shape[0]
img = cv2.resize(img, (int(img_size * ratio), img_size), interpolation=cv2.INTER_CUBIC)

# Step 4: Find x-center by cumulative sum
x_csum = img.sum(axis=0).cumsum()
for idx, csum in enumerate(x_csum):
    if csum > img.sum() / 2:
        x_center = idx
        break

# Step 5: Horizontal crop to img_size width (centered)
half_width = img_size // 2
left = x_center - half_width
right = x_center + half_width

# Step 6: Padding if needed
if left <= 0 or right >= img.shape[1]:
    left += half_width
    right += half_width
    _ = np.zeros((img.shape[0], half_width))
    img = np.concatenate([_, img, _], axis=1)

# Final crop
to_pickle.append(img[:, left: right].astype('uint8'))
```
Key Parameters
- Default img_size: 64 (configurable via --img_size)
- Interpolation: cv2.INTER_CUBIC
- Output format: uint8 grayscale, pickle files
- Normalization: None during preprocessing (happens later in BaseSilTransform)
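The output format above (uint8, no normalization, pickled per sequence) can be sketched as follows; the file name is hypothetical and the exact directory layout follows OpenGait's per-sequence pickle convention:

```python
import pickle
import numpy as np

# Frames come out of the pipeline as 64x64 uint8 arrays; a sequence is
# stacked into one (T, 64, 64) array and pickled with no normalization.
frames = [np.zeros((64, 64), dtype=np.uint8) for _ in range(4)]
seq = np.stack(frames)

with open('seq.pkl', 'wb') as f:  # hypothetical output path
    pickle.dump(seq, f)

with open('seq.pkl', 'rb') as f:
    loaded = pickle.load(f)
# round-trips with shape (4, 64, 64) and dtype uint8
```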
Alignment with Local Implementation
CONFIRMED: Vertical tight-crop before resize is OFFICIAL
- Evidence: Line 50 in pretreatment.py: img = img[y_top: y_btm + 1, :]
- Purpose: Removes vertical padding, focuses on the actual person silhouette
- Resize behavior: Height-based resize maintains aspect ratio
Transform Pipeline (from config)
```yaml
evaluator_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting
trainer_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting
```
From opengait/data/transform.py:
- BaseSilCuttingTransform: applies optional cutting + divides by 255.0
- Default cutting: int(x.shape[-1] // 64) * 10 pixels from each side
- If cutting=0, only normalization is applied
Differences from Standard Gait Recognition
- No horizontal flip augmentation in ScoNet config
- Evaluation uses the evaluate_scoliosis function (not standard gait metrics)
- Class num: 3 (Positive, Neutral, Negative) vs 74+ identities for gait ID
- Metric: euclidean distance (not cosine)
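Since evaluation ranks by euclidean rather than cosine distance, the pairwise distance matrix between probe and gallery embeddings looks like this (a generic illustration, not OpenGait's evaluator code):

```python
import numpy as np

def euclidean_dist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise euclidean distances between rows of (n, d) and (m, d)."""
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped to avoid tiny negatives
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2 * a @ b.T
    return np.sqrt(np.clip(sq, 0, None))

probe = np.array([[1.0, 0.0], [0.0, 2.0]])
gallery = np.array([[1.0, 0.0], [3.0, 4.0]])
d = euclidean_dist(probe, gallery)
# d[0, 0] == 0.0 (identical vectors); d[0, 1] == sqrt(20)
```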
Critical Finding for User Concern
Vertical tight-crop BEFORE resize is CORRECT and OFFICIAL.
This is NOT a bug - it's the intended preprocessing pipeline:
1. Crop to the person's actual height (remove empty vertical space)
2. Resize to fixed height (64 px), maintaining aspect ratio
3. Center-crop/pad horizontally to get a 64x64 output
This ensures:
- Consistent scale across different camera distances
- Person fills the frame vertically
- Aspect ratio is preserved