feat(zed): improve MCAP export batching and defaults

Default ZED MCAP export to neural_plus depth across the CLI and Python wrappers, and add tail-frame handling plus better corrupted-frame diagnostics in zed_svo_to_mcap. Add mixed hardware/software worker pools to the batch MCAP wrapper, replace tqdm with progress-table on TTYs, keep text event logging and heartbeats for non-TTY runs, and document the NVENC session-limit rationale for mixed mode in the README. Also refresh Python dependencies for the batch tooling and move the OpenSSL lookup in CMake so the local workspace build remains compatible with the vendored cnats setup.
2026-03-23 09:07:38 +00:00
parent 2f74a9561d
commit a0b9c95d5b
7 changed files with 909 additions and 69 deletions
@@ -232,7 +232,7 @@ timestamp,activity,group_path,segment_dir,camera,relative_path
 This workflow depends on the `zed_svo_to_mcap` binary, which is only built when
 the ZED SDK is detected during CMake configure.

-Use the wrapper to recurse through a dataset root, run `zed_svo_to_mcap --segment-dir` on every matched multi-camera segment, and show one aggregate tqdm progress bar:
+Use the wrapper to recurse through a dataset root, run `zed_svo_to_mcap --segment-dir` on every matched multi-camera segment, and report progress as an interactive table on TTYs or as line-oriented text logs otherwise:

 ```bash
 uv run python scripts/zed_batch_svo_to_mcap.py \
@@ -266,7 +266,31 @@ uv run python scripts/zed_batch_svo_to_mcap.py \
 The batch MCAP wrapper writes `<segment>/<segment>.mcap` by default, skips existing outputs unless told otherwise, and returns a nonzero exit code if any segment fails.
 The repo includes a minimal pose config at `config/zed_pose_config.toml` so MCAP conversion does not depend on a separate `cv-mmap` checkout.
 In bundled multi-camera mode, `--start-frame` and `--end-frame` mean the first and last emitted synced frame-group indices from the common start timestamp, inclusive.
-When stderr is attached to a TTY, `zed_svo_to_mcap` shows a tqdm-like progress bar. In bundled mode without `--end-frame`, it uses approximate progress over the common synchronized time window so export starts immediately without a full pre-count pass.
+When stderr is attached to a TTY, `zed_batch_svo_to_mcap.py` uses a `progress-table` view by default; otherwise it emits line-oriented start/completion/failure logs plus periodic heartbeat summaries. Use `--progress-ui table` or `--progress-ui text` to override the automatic mode selection.
+
+### Why Mixed Hardware/Software Mode Exists
+
+Bundled MCAP export opens one video encoder per camera stream. A four-camera segment exported with hardware encoding therefore holds four concurrent NVENC (H.264/H.265) sessions.
+
+This matters because the NVENC session limit is independent of CUDA utilization. NVIDIA's Video Codec SDK documentation caps non-qualified systems at 8 concurrent encode sessions across all non-qualified GPUs in the system, so adding a second GeForce card does not raise the budget; the SDK readme also still calls out a 5-session GeForce limit in some contexts. In practice, consumer/GeForce hosts often hit NVENC session-init failures before the GPUs look "full" in `nvidia-smi`.
+
+That is why the batch wrapper supports mixed pools such as two NVENC workers plus two software-encoded workers:
+
+```bash
+uv run python scripts/zed_batch_svo_to_mcap.py \
+    <DATASET_ROOT> \
+    --recursive \
+    --overwrite \
+    --hardware-jobs 2 \
+    --hardware-cuda-visible-devices 0,1 \
+    --software-jobs 2 \
+    --software-cuda-visible-devices 0,1 \
+    --depth-mode neural_plus
+```
+
+With bundled four-camera segments, 4 all-hardware jobs would try to open 16 concurrent NVENC sessions (4 jobs × 4 cameras), twice the 8-session cap, whereas 2 hardware jobs need 8. That is why mixed mode is the safe default for high-throughput rebuilds on GeForce-class machines. The software workers still use the GPUs for ZED neural depth; only video encoding moves to the CPU.
+
+If you intentionally want to bypass NVIDIA's consumer NVENC session cap, there is an unofficial driver patch at [`keylase/nvidia-patch`](https://github.com/keylase/nvidia-patch). That can make larger all-hardware batches viable, but it is not NVIDIA-supported and should be treated as an explicit ops decision rather than a project requirement.

 Use `--probe-existing` to validate existing MCAPs before skipping them. Invalid outputs are treated as missing and requeued: