Add a new mcap_video_bounds helper binary plus a zed_segment_time_index.py CLI that builds and queries an embedded DuckDB index for bundled ZED segment recordings.

The index stores segment folders, MCAP paths, video time bounds, durations, camera labels, and dataset metadata, and reuses the existing recursive multi-camera segment discovery logic so nested kindergarten layouts are indexed correctly.

Infer a dataset default timezone from folder names versus MCAP timestamps, and make point queries precision-aware so second-level folder timestamps like 2026-03-18T12-00-23 resolve to the matching segment instead of missing due to subsecond start offsets.

Verification:
- uv add 'duckdb>=1.0'
- cmake --build build --target mcap_video_bounds
- uv run python -m unittest tests.test_zed_segment_time_index
- uv run python scripts/zed_segment_time_index.py build /workspaces/data/kindergarten --jobs 8
- uv run python scripts/zed_segment_time_index.py query /workspaces/data/kindergarten --at 2026-03-18T12-00-23
ZED Segment Time Index
scripts/zed_segment_time_index.py builds and queries an embedded DuckDB index for bundled ZED segment folders.
Default artifact name:
<DATASET_ROOT>/segment_time_index.duckdb
Primary commands:
uv run python scripts/zed_segment_time_index.py build <DATASET_ROOT>
uv run python scripts/zed_segment_time_index.py query <DATASET_ROOT> --at 2026-03-18T12-00-23
uv run python scripts/zed_segment_time_index.py query <DATASET_ROOT> --start 2026-03-18T12-00-23 --end 2026-03-18T12-00-30
Data Source Rules
- Segment discovery is recursive and follows the same multi-camera layout assumptions as the batch ZED tooling.
- A directory is considered a valid segment when it contains at least two unique `*_zedN.svo` or `*_zedN.svo2` files and no duplicate camera labels.
- Timing is sourced from the segment MCAP, not from the SVO/SVO2 files.
- A valid segment is skipped when it has no `.mcap` file or more than one `.mcap` file in the segment directory.
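
The rules above can be sketched in Python roughly as follows. This is an illustration only, not the script's actual discovery code: the names `classify_segment` and `discover_segments` are hypothetical, and the filename regex is an assumption derived from the `*_zedN.svo` / `*_zedN.svo2` patterns.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed filename pattern: anything ending in _zedN.svo or _zedN.svo2.
_CAMERA_RE = re.compile(r"_(zed\d+)\.svo2?$", re.IGNORECASE)


def classify_segment(file_names: List[str]) -> Tuple[bool, Optional[str], List[str]]:
    """Return (is_valid_segment, mcap_name_or_None, camera_labels)."""
    labels = []
    for name in file_names:
        match = _CAMERA_RE.search(name)
        if match:
            labels.append(match.group(1).lower())
    # Valid: at least two camera files and no label appears twice.
    is_valid = len(labels) >= 2 and len(labels) == len(set(labels))
    mcaps = [n for n in file_names if n.lower().endswith(".mcap")]
    # Indexable only when exactly one MCAP sits next to the SVO files.
    mcap = mcaps[0] if is_valid and len(mcaps) == 1 else None
    # Labels are sorted lexicographically (so zed10 would precede zed2).
    return is_valid, mcap, sorted(labels)


def discover_segments(root: Path):
    """Yield (directory, mcap_path, labels) for each indexable segment under root."""
    candidates = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    for directory in candidates:
        names = [c.name for c in directory.iterdir() if c.is_file()]
        is_valid, mcap, labels = classify_segment(names)
        if is_valid and mcap is not None:
            yield directory, directory / mcap, labels
```

Keeping the classification separate from the directory walk makes the validity rules easy to unit-test without touching the filesystem.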
MCAP Bounds Extraction
build/bin/mcap_video_bounds scans foxglove.CompressedVideo messages in one MCAP and emits:
- `start_ns`
- `end_ns`
- `duration_ns`
- `video_message_count`
- `start_iso_utc`
- `end_iso_utc`
The helper prefers the protobuf CompressedVideo.timestamp field and falls back to MCAP logTime when that field is zero.
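
The fallback rule can be illustrated with a small Python sketch. The real helper is a compiled binary that reads the MCAP directly; here the messages are assumed to be already decoded into `(payload_timestamp_ns, log_time_ns)` pairs, the function names are hypothetical, and the exact ISO string format is an assumption.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple


def ns_to_iso_utc(ns: int) -> str:
    """Render nanoseconds since the Unix epoch as a UTC string (format assumed)."""
    seconds, remainder = divmod(ns, 1_000_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{base:%Y-%m-%dT%H:%M:%S}.{remainder:09d}Z"


def video_bounds(messages: Iterable[Tuple[int, int]]) -> Optional[dict]:
    """Compute time bounds from (payload_timestamp_ns, log_time_ns) pairs."""
    start_ns = end_ns = None
    count = 0
    for payload_ns, log_time_ns in messages:
        # Prefer the payload timestamp; fall back to logTime when it is zero.
        ts = payload_ns if payload_ns != 0 else log_time_ns
        start_ns = ts if start_ns is None else min(start_ns, ts)
        end_ns = ts if end_ns is None else max(end_ns, ts)
        count += 1
    if count == 0:
        return None  # no video messages, so no bounds to report
    return {
        "start_ns": start_ns,
        "end_ns": end_ns,
        "duration_ns": end_ns - start_ns,
        "video_message_count": count,
        "start_iso_utc": ns_to_iso_utc(start_ns),
        "end_iso_utc": ns_to_iso_utc(end_ns),
    }
```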
DuckDB Layout
The database contains two tables: meta and segments.
meta
Key-value metadata for the index:
- `schema_version`: current schema version, currently `1`
- `dataset_root`: absolute dataset root used when the index was built
- `built_at_utc`: build timestamp in UTC
- `default_timezone`: inferred dataset wall-clock timezone used when querying with `--timezone dataset`
segments
One row per indexed segment.
| Column | Type | Meaning |
|---|---|---|
| `segment_dir` | `VARCHAR` | Absolute path to the segment directory |
| `relative_segment_dir` | `VARCHAR` | Path relative to the dataset root |
| `group_path` | `VARCHAR` | Parent path of the segment within the dataset |
| `activity` | `VARCHAR` | First path component under the dataset root |
| `segment_name` | `VARCHAR` | Segment directory basename |
| `mcap_path` | `VARCHAR` | Absolute MCAP path used for timing |
| `start_ns` | `BIGINT` | Earliest video timestamp in nanoseconds since Unix epoch |
| `end_ns` | `BIGINT` | Latest video timestamp in nanoseconds since Unix epoch |
| `duration_ns` | `BIGINT` | `end_ns - start_ns` |
| `start_iso_utc` | `VARCHAR` | UTC rendering of `start_ns` |
| `end_iso_utc` | `VARCHAR` | UTC rendering of `end_ns` |
| `camera_count` | `INTEGER` | Number of discovered camera inputs in the segment directory |
| `camera_labels` | `VARCHAR` | Comma-separated camera labels, for example `zed1,zed2,zed3,zed4` |
| `video_message_count` | `BIGINT` | Number of `foxglove.CompressedVideo` messages observed in the MCAP |
| `index_source` | `VARCHAR` | Current extractor label, currently `mcap_video_bounds` |
Indexes are created on start_ns and end_ns.
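
For orientation, the layout could be created with DDL along these lines. This is a sketch under assumptions, not the script's actual schema code: the `meta` column names (`key`, `value`), the index names, and the `create_schema` helper are hypothetical. The function only needs a connection object with an `execute` method, such as the one returned by `duckdb.connect(...)`.

```python
# Column names and types for `segments` follow the table above;
# everything else (meta columns, index names) is assumed for illustration.
SCHEMA_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS meta (
        key VARCHAR PRIMARY KEY,
        value VARCHAR
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS segments (
        segment_dir VARCHAR,
        relative_segment_dir VARCHAR,
        group_path VARCHAR,
        activity VARCHAR,
        segment_name VARCHAR,
        mcap_path VARCHAR,
        start_ns BIGINT,
        end_ns BIGINT,
        duration_ns BIGINT,
        start_iso_utc VARCHAR,
        end_iso_utc VARCHAR,
        camera_count INTEGER,
        camera_labels VARCHAR,
        video_message_count BIGINT,
        index_source VARCHAR
    )
    """,
    "CREATE INDEX IF NOT EXISTS idx_segments_start_ns ON segments (start_ns)",
    "CREATE INDEX IF NOT EXISTS idx_segments_end_ns ON segments (end_ns)",
]


def create_schema(connection) -> None:
    """Run each DDL statement on a connection that exposes execute()."""
    for statement in SCHEMA_STATEMENTS:
        connection.execute(statement)
```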
Query Semantics
- `--at` performs an overlap lookup, not just an exact nanosecond equality check.
- Query precision follows the precision supplied by the user.
- A second-precision value like `2026-03-18T12-00-23` is treated as the whole second `[12:00:23.000, 12:00:23.999999999]`.
- Integer epochs are widened similarly by their apparent unit:
  - 10 digits or fewer: seconds
  - 11-13 digits: milliseconds
  - 14-16 digits: microseconds
  - 17+ digits: nanoseconds
- `--start`/`--end` returns every segment whose `[start_ns, end_ns]` overlaps the requested interval.
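
A minimal Python sketch of the widening and overlap rules, assuming closed intervals in nanoseconds. The function names are hypothetical and the script's real implementation may differ in detail.

```python
from typing import Tuple

NS_PER_UNIT = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


def epoch_unit(value: int) -> str:
    """Guess the unit of an integer epoch from its digit count."""
    digits = len(str(abs(value)))
    if digits <= 10:
        return "s"
    if digits <= 13:
        return "ms"
    if digits <= 16:
        return "us"
    return "ns"


def widen_epoch(value: int) -> Tuple[int, int]:
    """Expand an integer epoch to the closed nanosecond interval it covers."""
    scale = NS_PER_UNIT[epoch_unit(value)]
    start = value * scale
    return start, start + scale - 1


def overlaps(seg_start: int, seg_end: int, q_start: int, q_end: int) -> bool:
    """True when closed intervals [seg_start, seg_end] and [q_start, q_end] intersect."""
    return seg_start <= q_end and seg_end >= q_start


# A segment that starts 0.4 s into the queried second and runs for 60 s.
seg_start = 1773806423 * 10**9 + 400_000_000
seg_end = seg_start + 60 * 10**9

# The second-precision query covers the whole second, so it finds the segment.
second_hit = overlaps(seg_start, seg_end, *widen_epoch(1773806423))
# A nanosecond-precision query at the exact start of that second does not.
exact_hit = overlaps(seg_start, seg_end, *widen_epoch(1773806423 * 10**9))
```

The second case shows why widening matters: an exact point at the start of the second falls before the segment's subsecond start offset and would otherwise return nothing.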
Timezone Behavior
- Query default is `--timezone dataset`.
- `dataset` resolves to the `default_timezone` stored in `meta`.
- If inference is unavailable, the script falls back to `local`.
- Explicit values are also accepted:
  - `local`
  - `UTC`
  - fixed offsets such as `UTC+08:00`
  - IANA zone names such as `Asia/Shanghai`