cvmmap-streamer/docs/zed_segment_time_index.md
crosstyan e3a423433e feat(zed): add DuckDB segment timestamp indexer
Add a new mcap_video_bounds helper binary plus a zed_segment_time_index.py CLI that builds and queries an embedded DuckDB index for bundled ZED segment recordings.

The index stores segment folders, MCAP paths, video time bounds, durations, camera labels, and dataset metadata, and reuses the existing recursive multi-camera segment discovery logic so nested kindergarten layouts are indexed correctly.

Infer a dataset default timezone from folder names versus MCAP timestamps, and make point queries precision-aware so second-level folder timestamps like 2026-03-18T12-00-23 resolve to the matching segment instead of missing due to subsecond start offsets.

Verification:
- uv add 'duckdb>=1.0'
- cmake --build build --target mcap_video_bounds
- uv run python -m unittest tests.test_zed_segment_time_index
- uv run python scripts/zed_segment_time_index.py build /workspaces/data/kindergarten --jobs 8
- uv run python scripts/zed_segment_time_index.py query /workspaces/data/kindergarten --at 2026-03-18T12-00-23
2026-03-24 16:02:50 +08:00

ZED Segment Time Index

scripts/zed_segment_time_index.py builds and queries an embedded DuckDB index for bundled ZED segment folders.

Default artifact name:

<DATASET_ROOT>/segment_time_index.duckdb

Primary commands:

uv run python scripts/zed_segment_time_index.py build <DATASET_ROOT>
uv run python scripts/zed_segment_time_index.py query <DATASET_ROOT> --at 2026-03-18T12-00-23
uv run python scripts/zed_segment_time_index.py query <DATASET_ROOT> --start 2026-03-18T12-00-23 --end 2026-03-18T12-00-30

Data Source Rules

  • Segment discovery is recursive and follows the same multi-camera layout assumptions as the batch ZED tooling.
  • A directory is considered a valid segment when it contains at least two unique *_zedN.svo or *_zedN.svo2 files and no duplicate camera labels.
  • Timing is sourced from the segment MCAP, not from the SVO/SVO2 files.
  • A valid segment is skipped when it has no .mcap file or more than one .mcap file in the segment directory.
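The rules above can be sketched as a small classifier over one directory listing. This is a minimal illustration, not the script's actual discovery code: the function name `classify_segment`, the status strings, and the exact filename regex are assumptions made for the example.

```python
import re

# Assumed filename pattern: any name ending in _zed<digits>.svo or .svo2.
CAMERA_FILE = re.compile(r"_(zed\d+)\.svo2?$", re.IGNORECASE)


def classify_segment(file_names):
    """Return (status, camera_labels, mcap_name) for one directory listing.

    status is one of "not_a_segment", "skipped", or "indexable"
    (hypothetical labels used only in this sketch).
    """
    labels = []
    for name in file_names:
        match = CAMERA_FILE.search(name)
        if match:
            labels.append(match.group(1).lower())

    # Valid segment: at least two camera files and no repeated camera label.
    if len(labels) < 2 or len(set(labels)) != len(labels):
        return "not_a_segment", [], None

    # Sort numerically so zed10 comes after zed2.
    labels.sort(key=lambda label: int(label[3:]))

    # Timing comes from the MCAP, so exactly one .mcap file is required.
    mcaps = [name for name in file_names if name.lower().endswith(".mcap")]
    if len(mcaps) != 1:
        return "skipped", labels, None
    return "indexable", labels, mcaps[0]
```

A directory with `a_zed1.svo2`, `a_zed2.svo2`, and `a.mcap` would be indexable under this sketch, while the same directory without the MCAP (or with two of them) would be a valid segment that is skipped.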

MCAP Bounds Extraction

build/bin/mcap_video_bounds scans foxglove.CompressedVideo messages in one MCAP and emits:

  • start_ns
  • end_ns
  • duration_ns
  • video_message_count
  • start_iso_utc
  • end_iso_utc

The helper prefers the protobuf CompressedVideo.timestamp field and falls back to MCAP logTime when that field is zero.
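The fallback rule and the emitted fields can be illustrated in Python, even though the helper itself is a compiled binary. This sketch assumes each video message has already been reduced to a `(proto_timestamp_ns, log_time_ns)` pair; the function names and the exact ISO string layout are assumptions, not the helper's real output format.

```python
from datetime import datetime, timezone


def effective_timestamp_ns(proto_timestamp_ns, log_time_ns):
    """Prefer the protobuf timestamp; fall back to MCAP logTime when it is zero."""
    return proto_timestamp_ns if proto_timestamp_ns != 0 else log_time_ns


def iso_utc(ns):
    """Render nanoseconds since the Unix epoch as a UTC string (assumed layout)."""
    seconds, nanos = divmod(ns, 1_000_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{base:%Y-%m-%dT%H:%M:%S}.{nanos:09d}Z"


def video_bounds(messages):
    """Compute bounds from an iterable of (proto_timestamp_ns, log_time_ns) pairs."""
    stamps = [effective_timestamp_ns(proto, log) for proto, log in messages]
    if not stamps:
        return None  # no video messages, so there are no bounds to report
    start_ns, end_ns = min(stamps), max(stamps)
    return {
        "start_ns": start_ns,
        "end_ns": end_ns,
        "duration_ns": end_ns - start_ns,
        "video_message_count": len(stamps),
        "start_iso_utc": iso_utc(start_ns),
        "end_iso_utc": iso_utc(end_ns),
    }
```

Taking the minimum and maximum rather than the first and last message keeps the bounds correct even if messages are not stored in timestamp order.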

DuckDB Layout

The database contains two tables: meta and segments.

meta

Key-value metadata for the index:

  • schema_version: index schema version, currently 1
  • dataset_root: absolute dataset root used when the index was built
  • built_at_utc: build timestamp in UTC
  • default_timezone: inferred dataset wall-clock timezone used when querying with --timezone dataset

segments

One row per indexed segment.

| Column | Type | Meaning |
| --- | --- | --- |
| segment_dir | VARCHAR | Absolute path to the segment directory |
| relative_segment_dir | VARCHAR | Path relative to the dataset root |
| group_path | VARCHAR | Parent path of the segment within the dataset |
| activity | VARCHAR | First path component under the dataset root |
| segment_name | VARCHAR | Segment directory basename |
| mcap_path | VARCHAR | Absolute MCAP path used for timing |
| start_ns | BIGINT | Earliest video timestamp in nanoseconds since Unix epoch |
| end_ns | BIGINT | Latest video timestamp in nanoseconds since Unix epoch |
| duration_ns | BIGINT | end_ns - start_ns |
| start_iso_utc | VARCHAR | UTC rendering of start_ns |
| end_iso_utc | VARCHAR | UTC rendering of end_ns |
| camera_count | INTEGER | Number of discovered camera inputs in the segment directory |
| camera_labels | VARCHAR | Comma-separated camera labels, for example zed1,zed2,zed3,zed4 |
| video_message_count | BIGINT | Number of foxglove.CompressedVideo messages observed in the MCAP |
| index_source | VARCHAR | Current extractor label, currently mcap_video_bounds |

Indexes are created on start_ns and end_ns.
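To make the layout concrete, the sketch below recreates both tables and runs an interval lookup against them. It deliberately uses Python's standard-library `sqlite3` as a stand-in so that it runs without extra dependencies; the real index is a DuckDB file. The `meta` column names (`key`, `value`), the index names, and the helper functions are assumptions made for this example.

```python
import sqlite3

# Stand-in schema: column names and types for segments follow the table above.
# The meta column names and the index names are assumptions.
SCHEMA = """
CREATE TABLE meta (key VARCHAR PRIMARY KEY, value VARCHAR);
CREATE TABLE segments (
    segment_dir VARCHAR,
    relative_segment_dir VARCHAR,
    group_path VARCHAR,
    activity VARCHAR,
    segment_name VARCHAR,
    mcap_path VARCHAR,
    start_ns BIGINT,
    end_ns BIGINT,
    duration_ns BIGINT,
    start_iso_utc VARCHAR,
    end_iso_utc VARCHAR,
    camera_count INTEGER,
    camera_labels VARCHAR,
    video_message_count BIGINT,
    index_source VARCHAR
);
CREATE INDEX idx_segments_start_ns ON segments(start_ns);
CREATE INDEX idx_segments_end_ns ON segments(end_ns);
"""


def open_demo_index():
    """Create an in-memory database with the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO meta VALUES ('schema_version', '1')")
    return conn


def add_segment(conn, segment_name, start_ns, end_ns):
    """Insert a row with only the timing columns filled in (demo data)."""
    conn.execute(
        "INSERT INTO segments (segment_name, start_ns, end_ns, duration_ns) "
        "VALUES (?, ?, ?, ?)",
        (segment_name, start_ns, end_ns, end_ns - start_ns),
    )


def segments_overlapping(conn, query_start_ns, query_end_ns):
    """Return names of segments whose [start_ns, end_ns] overlaps the query."""
    rows = conn.execute(
        "SELECT segment_name FROM segments "
        "WHERE start_ns <= ? AND end_ns >= ? ORDER BY start_ns",
        (query_end_ns, query_start_ns),
    ).fetchall()
    return [row[0] for row in rows]
```

The overlap predicate filters on both `start_ns` and `end_ns`, which is consistent with indexing both columns.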

Query Semantics

  • --at performs an overlap lookup, not just an exact nanosecond equality check.
  • Query precision follows the precision supplied by the user.
  • A second-precision value like 2026-03-18T12-00-23 is treated as the whole second [12:00:23.000000000, 12:00:23.999999999].
  • Integer epochs are widened similarly by their apparent unit:
    • 10 digits or fewer: seconds
    • 11-13 digits: milliseconds
    • 14-16 digits: microseconds
    • 17+ digits: nanoseconds
  • --start/--end returns every segment whose [start_ns, end_ns] overlaps the requested interval.
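The widening rules above can be sketched as follows. The function names are hypothetical, the timestamp parser only handles the second-precision folder-style form shown in this document, and the timezone is passed in explicitly rather than resolved from the index.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000


def widen_epoch(value):
    """Return the inclusive (start_ns, end_ns) window for an integer epoch.

    The unit is guessed from the digit count, as in the list above.
    """
    digits = len(str(abs(value)))
    if digits <= 10:
        unit_ns = NS_PER_SECOND  # seconds
    elif digits <= 13:
        unit_ns = 1_000_000  # milliseconds
    elif digits <= 16:
        unit_ns = 1_000  # microseconds
    else:
        unit_ns = 1  # nanoseconds
    start_ns = value * unit_ns
    return start_ns, start_ns + unit_ns - 1


def widen_second_timestamp(text, tz=timezone.utc):
    """Widen a YYYY-MM-DDTHH-MM-SS value to the whole second it names."""
    wall = datetime.strptime(text, "%Y-%m-%dT%H-%M-%S").replace(tzinfo=tz)
    start_ns = int(wall.timestamp()) * NS_PER_SECOND
    return start_ns, start_ns + NS_PER_SECOND - 1


def overlaps(segment_start_ns, segment_end_ns, query_start_ns, query_end_ns):
    """True when two inclusive intervals share at least one nanosecond."""
    return segment_start_ns <= query_end_ns and segment_end_ns >= query_start_ns
```

Under this sketch, a segment that starts 0.4 s into a given second still matches a second-precision query for that second, because the query window covers the whole second rather than only its first nanosecond.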

Timezone Behavior

  • Query default is --timezone dataset.
  • dataset resolves to the default_timezone stored in meta.
  • If no default_timezone could be inferred for the dataset, the script falls back to local.
  • Explicit values are also accepted:
    • local
    • UTC
    • fixed offsets such as UTC+08:00
    • IANA zone names such as Asia/Shanghai
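One way to resolve these values to a Python `tzinfo` is sketched below. The function name `resolve_timezone`, the `dataset_default` parameter, and the assumption that `default_timezone` is stored in one of the accepted textual forms are all illustrative; the script's actual parsing may differ.

```python
import re
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Fixed offsets of the form UTC+HH:MM or UTC-HH:MM.
FIXED_OFFSET = re.compile(r"^UTC([+-])(\d{2}):(\d{2})$")


def resolve_timezone(spec, dataset_default=None):
    """Map a --timezone value to a tzinfo object.

    dataset_default stands in for the default_timezone value read from meta;
    None means no timezone was inferred.
    """
    if spec == "dataset":
        # Fall back to the machine's local zone when nothing was inferred.
        spec = dataset_default or "local"
    if spec == "local":
        return datetime.now().astimezone().tzinfo
    if spec.upper() == "UTC":
        return timezone.utc
    match = FIXED_OFFSET.match(spec)
    if match:
        sign = 1 if match.group(1) == "+" else -1
        offset = timedelta(hours=int(match.group(2)), minutes=int(match.group(3)))
        return timezone(sign * offset)
    # Anything else is treated as an IANA zone name such as Asia/Shanghai.
    return ZoneInfo(spec)
```

`ZoneInfo` raises an error for unknown zone names, so a mistyped value fails loudly instead of silently resolving to the wrong offset.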