feat: add batch MCAP export tooling for ZED segments

Add a Python batch wrapper around zed_svo_to_mcap for multi-camera segment exports. The new script supports dataset discovery, repeated segment-dir inputs, CSV-driven ordering, skip/probe/report flows, dry-run, and CUDA environment passthrough so kindergarten-style datasets can be converted into one bundled MCAP per segment.

Extend zed_svo_to_mcap so bundled multi-camera mode accepts --end-frame with synced-group semantics. In this mode the value is interpreted as the last emitted synced frame-group index from the common synced start, while --start-frame remains unsupported.

Vendor a minimal pose-config TOML and a sample segments CSV into this repo so the MCAP workflow is self-contained.

Update the README to document the batch MCAP flow, use portable placeholders instead of machine-specific absolute paths, and describe the expected dataset layout explicitly.
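
For reviewers, the CSV-driven ordering can be illustrated with a minimal Python sketch of the behavior the README describes: a required `segment_dir` column, extra columns ignored, relative paths resolved against the CSV's parent directory (or a `--csv-root` style override), and each unique segment kept once in first-seen order. The function name `load_segment_dirs`, its parameters, and the sample rows are hypothetical and are not taken from `scripts/zed_batch_svo_to_mcap.py`; skipping blank values is also an assumption.

```python
import csv
import io
from pathlib import Path


def load_segment_dirs(csv_text, csv_parent, csv_root=None):
    """Return unique segment directories in first-seen CSV order.

    Hypothetical helper: csv_parent stands in for the CSV file's parent
    directory, csv_root for an optional override of that base directory.
    """
    base = Path(csv_root) if csv_root is not None else Path(csv_parent)
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "segment_dir" not in reader.fieldnames:
        raise ValueError("CSV needs a header row with a 'segment_dir' column")

    seen = set()
    ordered = []
    for row in reader:
        raw = (row["segment_dir"] or "").strip()
        if not raw:
            continue  # assumption: rows with a blank segment_dir are skipped
        path = Path(raw)
        if not path.is_absolute():
            path = base / path  # relative values resolve against the base
        if path not in seen:  # one row per camera collapses to one segment
            seen.add(path)
            ordered.append(path)
    return ordered


# Illustrative rows only: two cameras share the first segment.
SAMPLE = """timestamp,activity,segment_dir,camera
t1,jump,jump/a,zed1
t1,jump,jump/a,zed2
t2,bar,bar/b,zed1
"""

segments = load_segment_dirs(SAMPLE, "/data")
overridden = load_segment_dirs(SAMPLE, "/data", csv_root="/mnt/other")
```

Here `segments` holds two entries, `jump/a` followed by `bar/b` under `/data`, even though the sample has three rows. A set tracks what has been seen while a list preserves order, which keeps the deduplication linear in the number of rows.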
2026-03-20 09:19:56 +00:00
parent 8d9bd1b815
commit 1691274e85
5 changed files with 920 additions and 13 deletions
@@ -65,7 +65,7 @@ The repo also includes an offline conversion tool for the left ZED color stream:
 ```bash
 CUDA_VISIBLE_DEVICES=GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
 ./build/bin/zed_svo_to_mp4 \
-    --input /workspaces/data/kindergarten/bar/2026-03-18T11-59-41/2026-03-18T11-59-41_zed1.svo2 \
+    --input <SVO_INPUT> \
    --encoder-device auto \
    --preset balanced \
    --quality 20 \
@@ -83,11 +83,40 @@ Python dependencies for the batch wrapper are managed with `uv`:
 uv sync
 ```

+Expected multi-camera dataset layout:
+
+```text
+<DATASET_ROOT>/
+├── svo2_segments_sorted.csv
+├── bar/
+│   └── 2026-03-18T11-59-41/
+│       ├── 2026-03-18T11-59-41_zed1.svo2
+│       ├── 2026-03-18T11-59-41_zed2.svo2
+│       ├── 2026-03-18T11-59-41_zed3.svo2
+│       └── 2026-03-18T11-59-41_zed4.svo2
+└── jump/
+    └── experiment/
+        └── 1/
+            └── 2026-03-18T11-26-23/
+                ├── 2026-03-18T11-26-23_zed1.svo2
+                ├── 2026-03-18T11-26-23_zed2.svo2
+                ├── 2026-03-18T11-26-23_zed3.svo2
+                └── 2026-03-18T11-26-23_zed4.svo2
+```
+
+Placeholders used below:
+- `<DATASET_ROOT>`: dataset root containing multi-camera segment directories
+- `<SEGMENT_DIR>`: one multi-camera segment directory containing `*_zedN.svo` or `*_zedN.svo2`
+- `<SEGMENT_DIR_A>`, `<SEGMENT_DIR_B>`: explicit segment directories
+- `<SEGMENTS_CSV>`: CSV file with a `segment_dir` column, for example `config/svo2_segments_sorted.sample.csv`
+- `<SVO_INPUT>`: one single-camera `.svo` or `.svo2` file
+- `<POSE_CONFIG>`: TOML file such as `config/zed_pose_config.toml`
+
 Use the wrapper to recurse through a folder, run `zed_svo_to_mp4` on every matched `.svo2`, and show one aggregate tqdm progress bar:

 ```bash
 uv run python scripts/zed_batch_svo_to_mp4.py \
-    /workspaces/data/kindergarten/bar \
+    <DATASET_ROOT>/bar \
    --pattern '*.svo2' \
    --recursive \
    --jobs 2 \
@@ -105,7 +134,7 @@ Use the grid converter to merge four synced ZED recordings into a 2x2 CCTV-style

 ```bash
 ./build/bin/zed_svo_grid_to_mp4 \
-    --segment-dir /workspaces/data/kindergarten/bar/2026-03-18T11-59-41 \
+    --segment-dir <SEGMENT_DIR> \
    --encoder-device auto \
    --codec h265 \
    --duration-seconds 2
@@ -117,7 +146,7 @@ Use the batch wrapper to run `zed_svo_grid_to_mp4` over many segment directories

 ```bash
 uv run python scripts/zed_batch_svo_grid_to_mp4.py \
-    /workspaces/data/kindergarten \
+    <DATASET_ROOT> \
    --recursive \
    --jobs 2 \
    --encoder-device auto \
@@ -128,8 +157,8 @@ You can also provide the exact segments to convert:

 ```bash
 uv run python scripts/zed_batch_svo_grid_to_mp4.py \
-    --segment-dir /workspaces/data/kindergarten/jump/external/recording/2026-03-18T11-23-22 \
-    --segment-dir /workspaces/data/kindergarten/jump/experiment/1/2026-03-18T11-26-23 \
+    --segment-dir <SEGMENT_DIR_A> \
+    --segment-dir <SEGMENT_DIR_B> \
    --jobs 2
 ```

@@ -137,7 +166,7 @@ Or preserve a precomputed CSV ordering:

 ```bash
 uv run python scripts/zed_batch_svo_grid_to_mp4.py \
-    --segments-csv /workspaces/data/kindergarten/svo2_segments_sorted.csv \
+    --segments-csv <SEGMENTS_CSV> \
    --jobs 2 \
    --duration-seconds 2
 ```
@@ -148,7 +177,7 @@ When you suspect a previous run left behind partial MP4 files, opt into `ffprobe

 ```bash
 uv run python scripts/zed_batch_svo_grid_to_mp4.py \
-    /workspaces/data/kindergarten \
+    <DATASET_ROOT> \
    --probe-existing \
    --jobs 2
 ```
@@ -157,7 +186,7 @@ Use `--report-existing` to audit existing outputs without launching conversions.

 ```bash
 uv run python scripts/zed_batch_svo_grid_to_mp4.py \
-    /workspaces/data/kindergarten \
+    <DATASET_ROOT> \
    --report-existing
 ```

@@ -165,7 +194,7 @@ Use `--dry-run` to preview what the batch wrapper would convert after applying s

 ```bash
 uv run python scripts/zed_batch_svo_grid_to_mp4.py \
-    /workspaces/data/kindergarten \
+    <DATASET_ROOT> \
    --probe-existing \
    --dry-run
 ```
@@ -174,7 +203,7 @@ uv run python scripts/zed_batch_svo_grid_to_mp4.py \

 The `--segments-csv` input expects a header row with at least a `segment_dir` column. Extra columns are allowed and ignored by the batch wrapper. `segment_dir` values may be absolute paths or paths relative to the CSV file's parent directory. Use `--csv-root` to override that base directory.

-Repeated rows for the same `segment_dir` are allowed; the wrapper converts each unique segment once, preserving the first-seen CSV order. This makes `/workspaces/data/kindergarten/svo2_segments_sorted.csv` a valid input even though it stores one row per camera file:
+Repeated rows for the same `segment_dir` are allowed; the wrapper converts each unique segment once, preserving the first-seen CSV order. The repo includes a small example at `config/svo2_segments_sorted.sample.csv`:

 ```csv
 timestamp,activity,group_path,segment_dir,camera,relative_path
@@ -182,6 +211,67 @@ timestamp,activity,group_path,segment_dir,camera,relative_path
 2026-03-18T11-23-22,jump,jump/external/recording,jump/external/recording/2026-03-18T11-23-22,zed2,jump/external/recording/2026-03-18T11-23-22/2026-03-18T11-23-22_zed2.svo2
 ```

+### Batch ZED Segments To MCAP
+
+Use the wrapper to recurse through a dataset root, run `zed_svo_to_mcap --segment-dir` on every matched multi-camera segment, and show one aggregate tqdm progress bar:
+
+```bash
+uv run python scripts/zed_batch_svo_to_mcap.py \
+    <DATASET_ROOT> \
+    --recursive \
+    --jobs 2 \
+    --cuda-visible-devices GPU-9cc7b26e-90d4-0c49-4d4c-060e528ffba6 \
+    --end-frame 29
+```
+
+You can also preserve a precomputed CSV ordering:
+
+```bash
+uv run python scripts/zed_batch_svo_to_mcap.py \
+    --segments-csv <SEGMENTS_CSV> \
+    --jobs 2 \
+    --end-frame 29
+```
+
+Enable per-camera pose export when the segment has valid tracking:
+
+```bash
+uv run python scripts/zed_batch_svo_to_mcap.py \
+    --segment-dir <SEGMENT_DIR> \
+    --with-pose \
+    --pose-config <POSE_CONFIG>
+```
+
+The batch MCAP wrapper writes `<segment>/<segment>.mcap` by default, skips existing outputs unless told otherwise, and returns a nonzero exit code if any segment fails.
+The repo includes a minimal pose config at `config/zed_pose_config.toml` so MCAP conversion does not depend on a separate `cv-mmap` checkout.
+In bundled multi-camera mode, `--end-frame` is the index of the last synced frame group to emit, counted from the common synced start; `--start-frame` is not supported in this mode.
+
+Use `--probe-existing` to validate existing MCAPs before skipping them. Invalid outputs are treated as missing and requeued:
+
+```bash
+uv run python scripts/zed_batch_svo_to_mcap.py \
+    <DATASET_ROOT> \
+    --probe-existing \
+    --jobs 2
+```
+
+Use `--report-existing` to audit existing MCAPs without launching conversions:
+
+```bash
+uv run python scripts/zed_batch_svo_to_mcap.py \
+    <DATASET_ROOT> \
+    --report-existing
+```
+
+Use `--dry-run` to preview what would be converted after applying skip or probe logic:
+
+```bash
+uv run python scripts/zed_batch_svo_to_mcap.py \
+    --segments-csv <SEGMENTS_CSV> \
+    --probe-existing \
+    --dry-run
+```
+
 ### Mandatory Acceptance (Standalone)

 Run the full mandatory acceptance suite. This executes the complete protocol/codec matrix without requiring external servers.