# ScoNet Preprocessing Research - Learnings

## Official Sources Identified

### 1. Primary Implementation (HIGHEST TRUST)
- **Repository**: ShiqiYu/OpenGait (https://github.com/ShiqiYu/OpenGait)
- **ScoNet Model Code**: opengait/modeling/models/sconet.py
- **Preprocessing Code**: datasets/pretreatment.py (lines 18-95)
- **Config**: configs/sconet/sconet_scoliosis1k.yaml

### 2. Dataset Source
- **Scoliosis1K Dataset**: https://zhouzi180.github.io/Scoliosis1K/
- **Raw silhouettes**: extracted using PP-HumanSeg v2
- **Raw pose**: extracted using ViTPose

### 3. Academic Papers
- **MICCAI 2024**: "Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis" (Zhou et al.)
  - PDF: https://arxiv.org/pdf/2407.05726
  - Introduces ScoNet and the Scoliosis1K dataset
- **MICCAI 2025**: "Pose as Clinical Prior: Learning Dual Representations for Scoliosis Screening" (Zhou et al.)
  - PDF: https://arxiv.org/abs/2509.00872
  - Extends ScoNet with pose annotations

## Preprocessing Pipeline (Confirmed from Official Code)

### From `datasets/pretreatment.py` (imgs2pickle function):

```python
# Step 1: Skip near-empty silhouettes
if img.sum() <= 10000:
    continue

# Step 2: VERTICAL TIGHT CROP (y-axis projection)
y_sum = img.sum(axis=1)
y_top = (y_sum != 0).argmax(axis=0)
y_btm = (y_sum != 0).cumsum(axis=0).argmax(axis=0)
img = img[y_top: y_btm + 1, :]  # <-- TIGHT CROP TO PERSON HEIGHT

# Step 3: Resize to fixed height (maintain aspect ratio)
ratio = img.shape[1] / img.shape[0]
img = cv2.resize(img, (int(img_size * ratio), img_size),
                 interpolation=cv2.INTER_CUBIC)

# Step 4: Find x-center by cumulative sum of column mass
x_csum = img.sum(axis=0).cumsum()
for idx, csum in enumerate(x_csum):
    if csum > img.sum() / 2:
        x_center = idx
        break

# Step 5: Horizontal crop to img_size width (centered)
half_width = img_size // 2
left = x_center - half_width
right = x_center + half_width

# Step 6: Zero-pad if the crop window falls outside the image
if left <= 0 or right >= img.shape[1]:
    left += half_width
    right += half_width
    _ = np.zeros((img.shape[0], half_width))
    img = np.concatenate([_, img, _], axis=1)

# Final crop
to_pickle.append(img[:, left: right].astype('uint8'))
```

### Key Parameters
- **Default img_size**: 64 (configurable via `--img_size`)
- **Interpolation**: cv2.INTER_CUBIC
- **Output format**: uint8 grayscale, pickle files
- **Normalization**: none during preprocessing (happens later in BaseSilTransform)

## Alignment with Local Implementation

### CONFIRMED: Vertical tight-crop before resize is OFFICIAL
- **Evidence**: line 50 in pretreatment.py: `img = img[y_top: y_btm + 1, :]`
- **Purpose**: removes vertical padding, focusing on the actual person silhouette
- **Resize behavior**: height-based resize maintains the aspect ratio

### Transform Pipeline (from config)

```yaml
evaluator_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting
trainer_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting
```

From `opengait/data/transform.py`:
- `BaseSilCuttingTransform`: applies optional cutting, then divides by 255.0
- Default cutting: `int(x.shape[-1] // 64) * 10` pixels from each side
- If cutting = 0, only normalization is applied

## Differences from Standard Gait Recognition

1. **No horizontal flip augmentation** in the ScoNet config
2. **Evaluation uses** the `evaluate_scoliosis` function (not standard gait metrics)
3. **Class num**: 3 (Positive, Neutral, Negative) vs. 74+ identities for gait ID
4. **Metric**: Euclidean distance (not cosine)

## Critical Finding for User Concern

**Vertical tight-crop BEFORE resize is CORRECT and OFFICIAL.** This is not a bug; it is the intended preprocessing pipeline:

1. Crop to the person's actual height (remove empty vertical space)
2. Resize to a fixed height (64 px), maintaining the aspect ratio
3. Center-crop or pad horizontally to obtain a 64x64 output

This ensures:
- Consistent scale across different camera distances
- The person fills the frame vertically
- The aspect ratio is preserved
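The six preprocessing steps above can be sketched end-to-end in pure NumPy. This is our own illustrative re-implementation, not the OpenGait code: `preprocess_silhouette` and `nn_resize` are hypothetical names, and a nearest-neighbor resize stands in for `cv2.resize(..., interpolation=cv2.INTER_CUBIC)` so the sketch has no OpenCV dependency.

```python
import numpy as np

def nn_resize(img: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Nearest-neighbor resize; a stand-in for cv2.resize with INTER_CUBIC."""
    rows = (np.arange(new_h) * img.shape[0] / new_h).astype(int)
    cols = (np.arange(new_w) * img.shape[1] / new_w).astype(int)
    return img[rows][:, cols]

def preprocess_silhouette(img: np.ndarray, img_size: int = 64):
    """Hypothetical sketch of steps 1-6; returns None for near-empty frames."""
    # Step 1: skip near-empty silhouettes
    if img.sum() <= 10000:
        return None
    # Step 2: vertical tight crop to the person's height
    y_sum = img.sum(axis=1)
    y_top = (y_sum != 0).argmax()
    y_btm = (y_sum != 0).cumsum().argmax()
    img = img[y_top:y_btm + 1, :]
    # Step 3: resize to fixed height, preserving aspect ratio
    ratio = img.shape[1] / img.shape[0]
    img = nn_resize(img, img_size, max(int(img_size * ratio), 1))
    # Step 4: x-center = first column where cumulative mass exceeds half
    x_csum = img.sum(axis=0).cumsum()
    x_center = int(np.argmax(x_csum > img.sum() / 2))
    # Step 5: centered horizontal crop window of img_size width
    half_width = img_size // 2
    left, right = x_center - half_width, x_center + half_width
    # Step 6: zero-pad both sides when the window leaves the image
    if left <= 0 or right >= img.shape[1]:
        left += half_width
        right += half_width
        pad = np.zeros((img.shape[0], half_width), dtype=img.dtype)
        img = np.concatenate([pad, img, pad], axis=1)
    return img[:, left:right].astype('uint8')

# Synthetic silhouette: a bright vertical bar on a black canvas
canvas = np.zeros((120, 90), dtype=np.uint8)
canvas[20:100, 40:55] = 255
out = preprocess_silhouette(canvas)
print(out.shape)  # (64, 64)
```

Whatever the input resolution, the output is always an `img_size` x `img_size` uint8 array with the person scaled to full frame height and horizontally centered, mirroring the invariants the official pipeline provides.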
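The cutting-plus-normalization behavior attributed to `BaseSilCuttingTransform` above can likewise be sketched in a few lines. This is an assumption-laden approximation (the real class lives in `opengait/data/transform.py` and may differ in interface details); the function name and signature here are ours.

```python
import numpy as np

def sil_cutting_transform(x: np.ndarray, cutting=None) -> np.ndarray:
    """Sketch of BaseSilCuttingTransform: optional side cut, then scale to [0, 1].

    x is a stack of silhouettes shaped (..., H, W). `cutting` pixels are
    trimmed from the left and right edges before dividing by 255.0; when
    cutting is None, the default from the notes above is used.
    """
    if cutting is None:
        cutting = int(x.shape[-1] // 64) * 10  # default from the config notes
    if cutting > 0:
        x = x[..., cutting:-cutting]
    return x.astype(np.float32) / 255.0

# A stack of four 64x88 frames: default cutting = (88 // 64) * 10 = 10,
# trimming 10 px per side and leaving width 68
frames = np.full((4, 64, 88), 255, dtype=np.uint8)
out = sil_cutting_transform(frames)
print(out.shape, out.max())  # (4, 64, 68) 1.0
```

With `cutting=0` the spatial shape is untouched and only the divide-by-255 normalization runs, matching the "if cutting = 0, only normalization is applied" note above.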