# ZED SDK Architecture: Streaming vs Fusion API

## Overview

The ZED SDK provides two distinct APIs for transmitting camera data over a network:
- **Streaming API** (`enableStreaming`) - video streaming
- **Fusion API** (`startPublishing`) - metadata publishing

These serve fundamentally different use cases and have different compute/bandwidth tradeoffs.
## API Comparison

| Feature | Streaming API | Fusion API |
|---|---|---|
| Primary Use Case | Remote camera access | Multi-camera data fusion |
| Data Transmitted | Compressed video (H264/H265) | Metadata only (bodies, objects, poses) |
| Bandwidth per Camera | 10-40 Mbps | <100 Kbps |
| Edge Compute | Video encoding only (NVENC) | Full depth NN + tracking + detection |
| Host Compute | Full depth NN + tracking + detection | Lightweight fusion only |
| Synchronization | None | Time-synced + geometric calibration |
| 360° Coverage | No | Yes (fuses overlapping views) |
| Receiver API | `zed.open()` with `INPUT_TYPE::STREAM` | `fusion.subscribe()` |
## Architecture Diagrams

### Streaming API (Single Camera Remote Access)
```
┌─────────────────────┐                      ┌─────────────────────┐
│    Edge (Jetson)    │                      │    Host (Server)    │
│                     │                      │                     │
│  ┌───────────────┐  │                      │  ┌───────────────┐  │
│  │  ZED Camera   │  │   H264/H265 RTP      │  │    Decode     │  │
│  └───────┬───────┘  │   (10-40 Mbps)       │  │    (NVDEC)    │  │
│          │          │                      │  └───────┬───────┘  │
│  ┌───────▼───────┐  │ ───────────────────► │          │          │
│  │     NVENC     │  │                      │  ┌───────▼───────┐  │
│  │    Encode     │  │                      │  │ Neural Depth  │  │
│  │  (hardware)   │  │                      │  │  (NN on GPU)  │  │
│  └───────────────┘  │                      │  └───────┬───────┘  │
│                     │                      │          │          │
│                     │                      │  ┌───────▼───────┐  │
│                     │                      │  │  Tracking /   │  │
│                     │                      │  │  Detection    │  │
│                     │                      │  └───────┬───────┘  │
│                     │                      │          │          │
│                     │                      │  ┌───────▼───────┐  │
│                     │                      │  │  Point Cloud  │  │
│                     │                      │  └───────────────┘  │
└─────────────────────┘                      └─────────────────────┘

Edge: Lightweight (encode only)
Host: Heavy (NN depth + all processing)
```
### Fusion API (Multi-Camera 360° Coverage)
```
┌─────────────────────┐
│  Edge #1 (Jetson)   │
│  ┌───────────────┐  │
│  │  ZED Camera   │  │
│  └───────┬───────┘  │   Metadata Only
│  ┌───────▼───────┐  │   (bodies, poses)
│  │ Neural Depth  │  │   (<100 Kbps)          ┌─────────────────────┐
│  │  (NN on GPU)  │  │ ─────────────────────► │                     │
│  └───────┬───────┘  │                        │    Fusion Server    │
│  ┌───────▼───────┐  │                        │                     │
│  │  Body Track   │  │                        │  ┌───────────────┐  │
│  └───────────────┘  │                        │  │  Subscribe    │  │
└─────────────────────┘                        │  │  to all       │  │
                                               │  │  cameras      │  │
┌─────────────────────┐                        │  └───────┬───────┘  │
│  Edge #2 (Jetson)   │                        │          │          │
│  ┌───────────────┐  │   Metadata Only        │  ┌───────▼───────┐  │
│  │  ZED Camera   │  │ ─────────────────────► │  │  Time Sync    │  │
│  └───────┬───────┘  │                        │  │ + Geometric   │  │
│  ┌───────▼───────┐  │                        │  │  Calibration  │  │
│  │ Neural Depth  │  │                        │  └───────┬───────┘  │
│  └───────┬───────┘  │                        │          │          │
│  ┌───────▼───────┐  │                        │  ┌───────▼───────┐  │
│  │  Body Track   │  │                        │  │  360° Fusion  │  │
│  └───────────────┘  │                        │  │ (merge views) │  │
└─────────────────────┘                        │  └───────────────┘  │
                                               │                     │
┌─────────────────────┐                        │   Lightweight GPU   │
│  Edge #3 (Jetson)   │   Metadata Only        │    requirements     │
│        ...          │ ─────────────────────► │                     │
└─────────────────────┘                        └─────────────────────┘

Each Edge: Heavy (NN depth + tracking)
Fusion Server: Lightweight (data fusion only)
```
## Communication Modes

### Streaming API
| Mode | Description |
|---|---|
| H264 | AVC encoding, wider GPU support |
| H265 | HEVC encoding, better compression, requires Pascal+ GPU |
Port: Even number (default 30000), uses RTP protocol.
### Fusion API
| Mode | Description |
|---|---|
| INTRA_PROCESS | Same machine, shared memory (zero-copy) |
| LOCAL_NETWORK | Different machines, RTP over network |
Port: Default 30000, configurable per camera.
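A minimal sketch of choosing between the two modes on the publisher side, assuming the SDK 4.x `sl::CommunicationParameters` API (`setForLocalNetwork` appears in the examples below; `setForSharedMemory` is the assumed setter for INTRA_PROCESS and is worth verifying against your SDK version):

```cpp
#include <sl/Fusion.hpp>

// Configure how a publisher talks to the Fusion module.
sl::CommunicationParameters make_comm_params(bool same_machine) {
    sl::CommunicationParameters comm_params;
    if (same_machine) {
        // INTRA_PROCESS: shared memory, zero-copy (assumed setter name).
        comm_params.setForSharedMemory();
    } else {
        // LOCAL_NETWORK: RTP over the network on the given port.
        comm_params.setForLocalNetwork(30000);
    }
    return comm_params;
}
```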
## Bandwidth Requirements

### Streaming (H265 Compressed Video)
| Resolution | FPS | Bitrate per Camera | 4 Cameras |
|---|---|---|---|
| 2K | 15 | 7 Mbps | 28 Mbps |
| HD1080 | 30 | 11 Mbps | 44 Mbps |
| HD720 | 60 | 6 Mbps | 24 Mbps |
| HD1200 | 30 | ~12 Mbps | ~48 Mbps |
### Fusion (Metadata Only)
| Data Type | Size per Frame | @ 30 FPS | 4 Cameras |
|---|---|---|---|
| Body (18 keypoints) | ~2 KB | ~60 KB/s | ~240 KB/s |
| Object detection | ~1 KB | ~30 KB/s | ~120 KB/s |
| Pose/Transform | ~100 B | ~3 KB/s | ~12 KB/s |
Fusion typically uses two to three orders of magnitude less bandwidth than Streaming.
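The table values are just frame size × frame rate; a quick sanity check (per-frame sizes are the estimates above, not measurements):

```cpp
#include <cstdio>

int main() {
    const double fps = 30.0;
    // Per-frame metadata sizes from the table above (estimates, in KB).
    const double body_kb = 2.0, object_kb = 1.0, pose_kb = 0.1;

    // Per-camera rate = size per frame × frames per second.
    std::printf("body:   ~%.0f KB/s per camera\n", body_kb * fps);    // ~60
    std::printf("object: ~%.0f KB/s per camera\n", object_kb * fps);  // ~30
    std::printf("pose:   ~%.0f KB/s per camera\n", pose_kb * fps);    // ~3
    return 0;
}
```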
## The Architectural Gap

### What You CAN Do
| Scenario | API | Edge Computes | Host Receives |
|---|---|---|---|
| Remote camera access | Streaming | Video encoding | Video → computes depth/tracking |
| Multi-camera fusion | Fusion | Depth + tracking | Metadata only (bodies, poses) |
| Local processing | Direct | Everything | N/A (same machine) |
### What You CANNOT Do

There is no ZED SDK mode for:
```
┌─────────────────────┐                      ┌─────────────────────┐
│    Edge (Jetson)    │                      │    Host (Server)    │
│                     │                      │                     │
│  ┌───────────────┐  │   Depth Map /        │  ┌───────────────┐  │
│  │  ZED Camera   │  │   Point Cloud        │  │   Receive     │  │
│  └───────┬───────┘  │                      │  │   Depth/PC    │  │
│          │          │        ???           │  └───────┬───────┘  │
│  ┌───────▼───────┐  │ ─────────────X─────► │          │          │
│  │ Neural Depth  │  │   NOT SUPPORTED      │  ┌───────▼───────┐  │
│  │  (NN on GPU)  │  │                      │  │   Further     │  │
│  └───────┬───────┘  │                      │  │  Processing   │  │
│          │          │                      │  └───────────────┘  │
│  ┌───────▼───────┐  │                      │                     │
│  │  Point Cloud  │  │                      │                     │
│  └───────────────┘  │                      │                     │
└─────────────────────┘                      └─────────────────────┘
```
❌ Edge computes depth → streams depth map → Host receives depth
❌ Edge computes point cloud → streams point cloud → Host receives point cloud
## Why This Architecture?

### 1. Bandwidth Economics
Point cloud streaming would require significantly more bandwidth than video:
| Data Type | Size per Frame (HD1080) | @ 30 FPS |
|---|---|---|
| Raw stereo video | ~12 MB | 360 MB/s |
| H265 compressed | ~46 KB | 11 Mbps |
| Depth map (16-bit) | ~4 MB | 120 MB/s |
| Point cloud (XYZ float) | ~25 MB | ~750 MB/s |
Even compressed, depth and point cloud streams are either lossy or still large (~50-100 Mbps).
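The uncompressed per-frame sizes follow from pixel count × bytes per pixel; a quick back-of-envelope check (HD1080 = 1920×1080, decimal MB):

```cpp
#include <cstdio>

int main() {
    const double px = 1920.0 * 1080.0;  // HD1080 pixel count
    const double MB = 1e6;              // decimal megabytes

    // Raw stereo video: two views × 3 bytes per pixel (BGR8).
    std::printf("raw stereo: ~%.1f MB/frame\n", 2 * px * 3 / MB);  // ~12.4
    // Depth map: one 16-bit value per pixel.
    std::printf("depth u16:  ~%.1f MB/frame\n", px * 2 / MB);      // ~4.1
    // Point cloud: X, Y, Z as float32 per pixel.
    std::printf("xyz cloud:  ~%.1f MB/frame\n", px * 12 / MB);     // ~24.9
    return 0;
}
```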
### 2. Compute Distribution Philosophy

The ZED SDK follows a simple rule: compute entirely at the edge or entirely at the host, never split mid-pipeline.
| Scenario | Solution |
|---|---|
| Low bandwidth, multi-camera | Fusion (edge computes all, sends metadata) |
| High bandwidth, single camera | Streaming (host computes all) |
| Same machine | INTRA_PROCESS (shared memory) |
### 3. Fusion API Design Goals

From the Stereolabs documentation:

> "The Fusion module is lightweight (in computation resources requirements) compared to the requirements for camera publishers."
The Fusion receiver is intentionally lightweight because:
- It only needs to fuse pre-computed metadata
- It handles time synchronization and geometric calibration
- It can run on modest hardware while edges do heavy compute
### 4. Product Strategy
Stereolabs sells:
- ZED cameras (hardware)
- ZED Box (edge compute appliances)
- ZED Hub (cloud management)
The Fusion API encourages purchasing ZED Boxes for edge compute rather than building custom streaming solutions.
## Workarounds for Custom Point Cloud Streaming

If you need to stream point clouds from edge to host (outside the ZED SDK):

### Option 1: Custom Compression + Streaming
```cpp
// On the edge: grab a frame, retrieve the point cloud, compress, send.
// draco_compress() and socket are placeholders for your own compression
// (e.g. Draco or a PCL octree) and transport (ZeroMQ/gRPC/raw UDP).
sl::Mat point_cloud;
if (zed.grab() == sl::ERROR_CODE::SUCCESS) {
    zed.retrieveMeasure(point_cloud, sl::MEASURE::XYZRGBA);

    // Compress the raw XYZRGBA buffer (placeholder helper).
    std::vector<uint8_t> compressed = draco_compress(point_cloud);

    // Send via your own network layer (placeholder).
    socket.send(compressed);
}
```
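For the transport placeholder, one concrete option is a cppzmq PUB socket; a minimal sketch (the endpoint and the stubbed-out compression step are arbitrary examples, not part of the ZED SDK):

```cpp
#include <zmq.hpp>
#include <cstdint>
#include <vector>

int main() {
    // One-time setup: PUB socket that host(s) can subscribe to.
    zmq::context_t ctx(1);
    zmq::socket_t sock(ctx, zmq::socket_type::pub);
    sock.bind("tcp://*:5555");  // example endpoint

    while (true) {
        // Fill from your compression step (e.g. draco_compress above).
        std::vector<uint8_t> compressed = /* ... */ {};

        // One message per frame; zmq::buffer views the bytes without copying.
        sock.send(zmq::buffer(compressed), zmq::send_flags::none);
    }
}
```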
### Option 2: Depth Map Streaming
```cpp
// On the edge: retrieve depth as 16-bit millimeters and PNG-encode it.
// Note: MEASURE::DEPTH is 32-bit float, which PNG cannot store;
// MEASURE::DEPTH_U16_MM yields a PNG-friendly 16-bit depth map in mm.
sl::Mat depth;
zed.retrieveMeasure(depth, sl::MEASURE::DEPTH_U16_MM);

// Wrap in a cv::Mat (slMat2cvMat is the helper from the ZED samples)
// and compress as lossless 16-bit PNG.
cv::Mat depth_cv = slMat2cvMat(depth);  // CV_16UC1 view
std::vector<uint8_t> png;
cv::imencode(".png", depth_cv, png);

// Send via your own network layer (placeholder).
socket.send(png);
```
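On the host, the matching receive side is plain OpenCV; a minimal sketch (how the PNG bytes arrive is up to your transport, and the mm-to-meters scale assumes the `DEPTH_U16_MM` encoding above):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Decode a received PNG buffer back into a metric depth map.
cv::Mat decode_depth(const std::vector<uint8_t>& png) {
    // IMREAD_UNCHANGED preserves the single 16-bit channel.
    cv::Mat depth_mm = cv::imdecode(png, cv::IMREAD_UNCHANGED);  // CV_16UC1

    // Convert millimeters (uint16) to meters (float32).
    cv::Mat depth_m;
    depth_mm.convertTo(depth_m, CV_32FC1, 1.0 / 1000.0);
    return depth_m;
}
```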
### Bandwidth Estimate for Custom Streaming
| Method | Compression | Bandwidth (HD1080@30fps) |
|---|---|---|
| Depth PNG (lossless) | ~4:1 | ~240 Mbps |
| Depth JPEG (lossy) | ~20:1 | ~48 Mbps |
| Point cloud Draco | ~10:1 | ~600 Mbps |
Four cameras of lossless depth PNG need roughly 1 Gbps (4 × 240 Mbps), so 10 Gbps Ethernet handles custom depth streaming comfortably.
## Recommendations
| Use Case | Recommended API |
|---|---|
| Single camera, remote development | Streaming |
| Multi-camera body tracking | Fusion |
| Multi-camera 360° coverage | Fusion |
| Custom point cloud pipeline | Manual (ZeroMQ + Draco) |
| Low latency, same machine | INTRA_PROCESS |
## Code Examples

### Streaming Sender (Edge)
```cpp
// Camera is assumed to be opened already (zed.open()).
sl::StreamingParameters stream_params;
stream_params.codec = sl::STREAMING_CODEC::H265;
stream_params.bitrate = 12000;  // Kbps
stream_params.port = 30000;     // must be even (RTP)
zed.enableStreaming(stream_params);

while (running) {
    zed.grab();  // encodes (NVENC) and sends the frame
}
```
### Streaming Receiver (Host)
```cpp
sl::InitParameters init_params;
init_params.input.setFromStream("192.168.1.100", 30000);  // sender IP and port
zed.open(init_params);

sl::Mat depth, point_cloud;
while (running) {
    if (zed.grab() == sl::ERROR_CODE::SUCCESS) {
        // Full ZED SDK available on the host: depth, tracking, etc.
        zed.retrieveMeasure(depth, sl::MEASURE::DEPTH);
        zed.retrieveMeasure(point_cloud, sl::MEASURE::XYZRGBA);
    }
}
```
### Fusion Sender (Edge)
```cpp
// Enable body tracking on the edge device (camera already opened;
// positional tracking should be enabled first via enablePositionalTracking()).
zed.enableBodyTracking(body_params);

// Start publishing metadata for the Fusion module.
sl::CommunicationParameters comm_params;
comm_params.setForLocalNetwork(30000);
zed.startPublishing(comm_params);

sl::Bodies bodies;
while (running) {
    if (zed.grab() == sl::ERROR_CODE::SUCCESS) {
        zed.retrieveBodies(bodies);  // computes locally and publishes
    }
}
```
### Fusion Receiver (Host)
```cpp
sl::Fusion fusion;
fusion.init(init_fusion_params);

// Enable the body tracking fusion module before retrieving fused bodies.
fusion.enableBodyTracking(body_fusion_params);

// Subscribe to each publishing camera by serial number and rig pose.
sl::CameraIdentifier cam1(serial_number);
fusion.subscribe(cam1, comm_params, pose);

sl::Bodies fused_bodies;
while (running) {
    if (fusion.process() == sl::FUSION_ERROR_CODE::SUCCESS) {
        fusion.retrieveBodies(fused_bodies);  // already computed by edges
    }
}
```
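To subscribe to a whole rig rather than one camera, the rig calibration file produced by ZED360 can be read back; a sketch, assuming the SDK 4.x `sl::readFusionConfigurationFile` helper and an example `config.json` path:

```cpp
// Subscribe to every camera listed in a ZED360 calibration file.
void subscribe_all(sl::Fusion& fusion) {
    auto configs = sl::readFusionConfigurationFile(
        "config.json",  // example path to the ZED360 output
        sl::COORDINATE_SYSTEM::RIGHT_HANDED_Y_UP, sl::UNIT::METER);

    for (const auto& conf : configs) {
        sl::CameraIdentifier uuid(conf.serial_number);
        fusion.subscribe(uuid, conf.communication_parameters, conf.pose);
    }
}
```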
## Summary
The ZED SDK architecture forces a choice:
- **Streaming**: Edge sends video → host computes depth (NN inference on host)
- **Fusion**: Edge computes depth → sends metadata only (no point cloud)
There is no built-in support for streaming computed depth maps or point clouds from edge to host. This is by design for bandwidth efficiency and to encourage use of ZED Box edge compute products.
For custom depth/point cloud streaming, you must implement your own compression and network layer outside the ZED SDK.