OpenGait/.sisyphus/notepads/sconet-preprocess-research/learnings.md
crosstyan 15523bb84c docs(sisyphus): record demo fixes and preprocess research
Capture validated debugging outcomes and ScoNet preprocessing findings in persistent notes so future sessions can resume with verified context instead of redoing the same investigation.
2026-02-28 21:52:07 +08:00


ScoNet Preprocessing Research - Learnings

Official Sources Identified

1. Primary Implementation (HIGHEST TRUST)

  • Repository: ShiqiYu/OpenGait (https://github.com/ShiqiYu/OpenGait)
  • ScoNet Model Code: opengait/modeling/models/sconet.py
  • Preprocessing Code: datasets/pretreatment.py (lines 18-95)
  • Config: configs/sconet/sconet_scoliosis1k.yaml

2. Dataset Source

3. Academic Papers

  • MICCAI 2024: "Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis" (Zhou et al.)

  • MICCAI 2025: "Pose as Clinical Prior: Learning Dual Representations for Scoliosis Screening" (Zhou et al.)

Preprocessing Pipeline (Confirmed from Official Code)

From datasets/pretreatment.py (imgs2pickle function):

# Step 1: Filter empty images
if img.sum() <= 10000:
    continue

# Step 2: VERTICAL TIGHT CROP (y-axis projection)
y_sum = img.sum(axis=1)
y_top = (y_sum != 0).argmax(axis=0)                 # first non-empty row
y_btm = (y_sum != 0).cumsum(axis=0).argmax(axis=0)  # last non-empty row
img = img[y_top: y_btm + 1, :]  # <-- TIGHT CROP TO PERSON HEIGHT

# Step 3: Resize based on height (maintain aspect ratio)
ratio = img.shape[1] / img.shape[0]
img = cv2.resize(img, (int(img_size * ratio), img_size), interpolation=cv2.INTER_CUBIC)

# Step 4: Find x-center by cumulative sum
x_csum = img.sum(axis=0).cumsum()
for idx, csum in enumerate(x_csum):
    if csum > img.sum() / 2:
        x_center = idx
        break

# Step 5: Horizontal crop to img_size width (centered)
half_width = img_size // 2
left = x_center - half_width
right = x_center + half_width

# Step 6: Padding if needed
if left <= 0 or right >= img.shape[1]:
    left += half_width
    right += half_width
    _ = np.zeros((img.shape[0], half_width))  # zero padding, prepended and appended
    img = np.concatenate([_, img, _], axis=1)

# Final crop
to_pickle.append(img[:, left: right].astype('uint8'))

Key Parameters

  • Default img_size: 64 (configurable via --img_size)
  • Interpolation: cv2.INTER_CUBIC
  • Output format: uint8 grayscale, pickle files
  • Normalization: None during preprocessing (happens later in BaseSilTransform)
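Given the output format above (uint8 frames stacked into one pickled array per sequence), a saved sequence can be round-tripped like this. The array below is a stand-in, and the file name is illustrative, not the dataset's real layout.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for one preprocessed sequence: (num_frames, 64, 64), uint8.
seq = np.zeros((5, 64, 64), dtype=np.uint8)

# pretreatment.py pickles one such stacked array per sequence.
path = os.path.join(tempfile.mkdtemp(), '000.pkl')
with open(path, 'wb') as fh:
    pickle.dump(seq, fh)
with open(path, 'rb') as fh:
    loaded = pickle.load(fh)

assert loaded.dtype == np.uint8
assert loaded.shape == (5, 64, 64)
```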

Alignment with Local Implementation

CONFIRMED: Vertical tight-crop before resize is OFFICIAL

  • Evidence: Line 50 in pretreatment.py: img = img[y_top: y_btm + 1, :]
  • Purpose: Removes vertical padding, focuses on actual person silhouette
  • Resize behavior: Height-based resize maintains aspect ratio

Transform Pipeline (from config)

evaluator_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting
    
trainer_cfg:
  transform:
    - type: BaseSilCuttingTransform  # Optional cutting

From opengait/data/transform.py:

  • BaseSilCuttingTransform: Applies optional cutting + divides by 255.0
  • Default cutting: int(x.shape[-1] // 64) * 10 pixels from sides
  • If cutting=0, only normalization is applied
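The described behavior can be sketched as a small function (hedged: this is our re-implementation of the behavior summarized above, not the actual OpenGait class):

```python
import numpy as np

def sil_cutting_transform(seq, cutting=None):
    """Sketch of the described transform: optionally trim `cutting`
    pixels from each side, then scale uint8 silhouettes to [0, 1]."""
    if cutting is None:
        cutting = int(seq.shape[-1] // 64) * 10  # default noted above
    if cutting > 0:
        seq = seq[..., cutting:-cutting]
    return seq.astype(np.float32) / 255.0
```

For a 64-px-wide sequence the default cutting is 10, so a (T, 64, 64) input becomes (T, 64, 44); with cutting=0 only the division by 255.0 applies.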

Differences from Standard Gait Recognition

  1. No horizontal flip augmentation in ScoNet config
  2. Evaluation uses: evaluate_scoliosis function (not standard gait metrics)
  3. Class num: 3 (Positive, Neutral, Negative) vs 74+ for gait ID
  4. Metric: euclidean distance (not cosine)
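Since evaluation uses euclidean rather than cosine distance, a minimal pairwise-distance sketch (function name is ours, not from the repo):

```python
import numpy as np

def pairwise_euclidean(probe, gallery):
    """(P, D) and (G, D) embeddings -> (P, G) euclidean distance matrix."""
    diff = probe[:, None, :] - gallery[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```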

Critical Finding for User Concern

Vertical tight-crop BEFORE resize is CORRECT and OFFICIAL.

This is NOT a bug - it's the intended preprocessing pipeline:

  1. Crop to person's actual height (remove empty vertical space)
  2. Resize to fixed height (64px) maintaining aspect ratio
  3. Center crop/pad horizontally to get 64x64 output

This ensures:

  • Consistent scale across different camera distances
  • Person fills the frame vertically
  • Aspect ratio is preserved
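A quick numeric walk-through of the three steps (numbers are illustrative only):

```python
# Illustrative numbers: a tight-cropped frame 180 px tall, 90 px wide.
h, w, img_size = 180, 90, 64

ratio = w / h                  # 0.5 -> aspect ratio preserved
new_w = int(img_size * ratio)  # 32: resized frame is 64 tall x 32 wide
half_width = img_size // 2     # 32

# The centered 64-px window runs off both edges of a 32-px-wide frame,
# so zero-padding widens it to 32 + 2 * 32 = 96 before the final crop.
padded_w = new_w + 2 * half_width

assert (new_w, padded_w) == (32, 96)
```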