# ZED SDK Architecture: Streaming vs Fusion API

## Overview

The ZED SDK provides two distinct APIs for transmitting camera data over a network:

1. **Streaming API** (`enableStreaming`) - Video streaming
2. **Fusion API** (`startPublishing`) - Metadata publishing

These serve fundamentally different use cases and have different compute/bandwidth tradeoffs.

## API Comparison

| Feature | Streaming API | Fusion API |
|---------|---------------|------------|
| **Primary Use Case** | Remote camera access | Multi-camera data fusion |
| **Data Transmitted** | Compressed video (H264/H265) | Metadata only (bodies, objects, poses) |
| **Bandwidth per Camera** | 10-40 Mbps | <100 Kbps |
| **Edge Compute** | Video encoding only (NVENC) | Full depth NN + tracking + detection |
| **Host Compute** | Full depth NN + tracking + detection | Lightweight fusion only |
| **Synchronization** | None | Time-synced + geometric calibration |
| **360° Coverage** | No | Yes (fuses overlapping views) |
| **Receiver API** | `zed.open()` with `INPUT_TYPE::STREAM` | `fusion.subscribe()` |

## Architecture Diagrams

### Streaming API (Single Camera Remote Access)

```
┌─────────────────────┐                      ┌─────────────────────┐
│    Edge (Jetson)    │                      │    Host (Server)    │
│                     │                      │                     │
│  ┌───────────────┐  │   H264/H265 RTP      │  ┌───────────────┐  │
│  │  ZED Camera   │  │   (10-40 Mbps)       │  │    Decode     │  │
│  └───────┬───────┘  │ ───────────────────► │  │   (NVDEC)     │  │
│          │          │                      │  └───────┬───────┘  │
│  ┌───────▼───────┐  │                      │          │          │
│  │     NVENC     │  │                      │  ┌───────▼───────┐  │
│  │    Encode     │──┘                      │  │ Neural Depth  │  │
│  │  (hardware)   │  │                      │  │  (NN on GPU)  │  │
│  └───────────────┘  │                      │  └───────┬───────┘  │
│                     │                      │          │          │
│                     │                      │  ┌───────▼───────┐  │
│                     │                      │  │  Tracking /   │  │
│                     │                      │  │   Detection   │  │
│                     │                      │  └───────┬───────┘  │
│                     │                      │          │          │
│                     │                      │  ┌───────▼───────┐  │
│                     │                      │  │  Point Cloud  │  │
│                     │                      │  └───────────────┘  │
└─────────────────────┘                      └─────────────────────┘

Edge: Lightweight (encode only)
Host: Heavy (NN depth + all processing)
```

### Fusion API (Multi-Camera 360° Coverage)

```
┌─────────────────────┐
│  Edge #1 (Jetson)   │
│  ┌───────────────┐  │
│  │  ZED Camera   │  │
│  └───────┬───────┘  │   Metadata Only
│  ┌───────▼───────┐  │   (bodies, poses)
│  │ Neural Depth  │  │   (<100 Kbps)          ┌─────────────────────┐
│  │  (NN on GPU)  │  │ ──────────────────────►│                     │
│  └───────┬───────┘  │                        │    Fusion Server    │
│  ┌───────▼───────┐  │                        │                     │
│  │  Body Track   │──┘                        │  ┌───────────────┐  │
│  └───────────────┘  │                        │  │  Subscribe    │  │
└─────────────────────┘                        │  │  to all       │  │
                                               │  │  cameras      │  │
┌─────────────────────┐                        │  └───────┬───────┘  │
│  Edge #2 (Jetson)   │                        │          │          │
│  ┌───────────────┐  │   Metadata Only        │  ┌───────▼───────┐  │
│  │  ZED Camera   │  │ ──────────────────────►│  │  Time Sync    │  │
│  └───────┬───────┘  │                        │  │  + Geometric  │  │
│  ┌───────▼───────┐  │                        │  │  Calibration  │  │
│  │ Neural Depth  │  │                        │  └───────┬───────┘  │
│  └───────┬───────┘  │                        │          │          │
│  ┌───────▼───────┐  │                        │  ┌───────▼───────┐  │
│  │  Body Track   │──┘                        │  │  360° Fusion  │  │
│  └───────────────┘  │                        │  │ (merge views) │  │
└─────────────────────┘                        │  └───────────────┘  │
                                               │                     │
┌─────────────────────┐                        │  Lightweight GPU    │
│  Edge #3 (Jetson)   │   Metadata Only        │  requirements       │
│         ...         │ ──────────────────────►│                     │
└─────────────────────┘                        └─────────────────────┘

Each Edge: Heavy (NN depth + tracking)
Fusion Server: Lightweight (data fusion only)
```

## Communication Modes

### Streaming API

| Mode | Description |
|------|-------------|
| **H264** | AVC encoding, wider GPU support |
| **H265** | HEVC encoding, better compression, requires Pascal+ GPU |

Port: Even number (default 30000), uses the RTP protocol.

### Fusion API

| Mode | Description |
|------|-------------|
| **INTRA_PROCESS** | Same machine, shared memory (zero-copy) |
| **LOCAL_NETWORK** | Different machines, RTP over network |

Port: Default 30000, configurable per camera.

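As a sketch, the two Fusion communication modes are selected on `sl::CommunicationParameters` before publishing or subscribing. The method names below follow the ZED SDK 4 Fusion samples; treat them as assumptions if you are on a different SDK version:

```cpp
sl::CommunicationParameters comm_params;

// Same machine: shared-memory transport (zero-copy, INTRA_PROCESS)
comm_params.setForSharedMemory();

// Different machines: RTP over the local network on port 30000 (LOCAL_NETWORK)
comm_params.setForLocalNetwork(30000);

// The same object is used on both sides: the edge passes it to
// zed.startPublishing(), the host passes it to fusion.subscribe().
```
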
## Bandwidth Requirements

### Streaming (H265 Compressed Video)

| Resolution | FPS | Bitrate per Camera | 4 Cameras |
|------------|-----|--------------------|-----------|
| 2K | 15 | 7 Mbps | 28 Mbps |
| HD1080 | 30 | 11 Mbps | 44 Mbps |
| HD720 | 60 | 6 Mbps | 24 Mbps |
| HD1200 | 30 | ~12 Mbps | ~48 Mbps |

### Fusion (Metadata Only)

| Data Type | Size per Frame | @ 30 FPS | 4 Cameras |
|-----------|----------------|----------|-----------|
| Body (18 keypoints) | ~2 KB | ~60 KB/s | ~240 KB/s |
| Object detection | ~1 KB | ~30 KB/s | ~120 KB/s |
| Pose/Transform | ~100 B | ~3 KB/s | ~12 KB/s |

**Fusion uses 100-1000x less bandwidth than Streaming.**

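The metadata rates above follow directly from per-frame size × frame rate. This minimal sketch recomputes them from the table's own estimates (the ~2 KB body payload and ~100 B pose payload are the table's figures, not measured values):

```cpp
#include <cassert>
#include <cstdio>

// Metadata rate in KB/s = per-frame payload (bytes) * frame rate / 1000
constexpr int rate_kb_per_s(int bytes_per_frame, int fps) {
    return bytes_per_frame * fps / 1000;
}

int main() {
    int body = rate_kb_per_s(2000, 30);  // body skeleton, ~2 KB/frame
    int pose = rate_kb_per_s(100, 30);   // pose/transform, ~100 B/frame
    std::printf("body: %d KB/s (x4 cams: %d KB/s), pose: %d KB/s\n",
                body, 4 * body, pose);
    assert(body == 60 && 4 * body == 240);
    assert(pose == 3);
    return 0;
}
```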
## The Architectural Gap

### What You CAN Do

| Scenario | API | Edge Computes | Host Receives |
|----------|-----|---------------|---------------|
| Remote camera access | Streaming | Video encoding | Video → computes depth/tracking |
| Multi-camera fusion | Fusion | Depth + tracking | Metadata only (bodies, poses) |
| Local processing | Direct | Everything | N/A (same machine) |

### What You CANNOT Do

**There is no ZED SDK mode for:**

```
┌─────────────────────┐                      ┌─────────────────────┐
│    Edge (Jetson)    │                      │    Host (Server)    │
│                     │                      │                     │
│  ┌───────────────┐  │   Depth Map /        │  ┌───────────────┐  │
│  │  ZED Camera   │  │   Point Cloud        │  │   Receive     │  │
│  └───────┬───────┘  │                      │  │   Depth/PC    │  │
│          │          │        ???           │  └───────┬───────┘  │
│  ┌───────▼───────┐  │ ─────────────────X─► │          │          │
│  │ Neural Depth  │  │   NOT SUPPORTED      │  ┌───────▼───────┐  │
│  │  (NN on GPU)  │  │                      │  │   Further     │  │
│  └───────┬───────┘  │                      │  │  Processing   │  │
│          │          │                      │  └───────────────┘  │
│  ┌───────▼───────┐  │                      │                     │
│  │  Point Cloud  │──┘                      │                     │
│  └───────────────┘  │                      │                     │
└─────────────────────┘                      └─────────────────────┘

❌ Edge computes depth → streams depth map → Host receives depth
❌ Edge computes point cloud → streams point cloud → Host receives point cloud
```

## Why This Architecture?

### 1. Bandwidth Economics

Point cloud streaming would require significantly more bandwidth than video:

| Data Type | Size per Frame (HD1080) | @ 30 FPS |
|-----------|-------------------------|----------|
| Raw stereo video | ~12 MB | 360 MB/s |
| H265 compressed | ~46 KB | 11 Mbps |
| Depth map (16-bit) | ~4 MB | 120 MB/s |
| Point cloud (XYZ float) | ~12 MB | 360 MB/s |

Compressed depth/point cloud is lossy and still large (~50-100 Mbps).

### 2. Compute Distribution Philosophy

The ZED SDK follows one principle: **compute entirely at the edge OR entirely at the host, never split.**

| Scenario | Solution |
|----------|----------|
| Low bandwidth, multi-camera | Fusion (edge computes all, sends metadata) |
| High bandwidth, single camera | Streaming (host computes all) |
| Same machine | INTRA_PROCESS (shared memory) |

### 3. Fusion API Design Goals

From the Stereolabs documentation:

> "The Fusion module is **lightweight** (in computation resources requirements) compared to the requirements for camera publishers."

The Fusion receiver is intentionally lightweight because:

- It only needs to fuse pre-computed metadata
- It handles time synchronization and geometric calibration
- It can run on modest hardware while the edges do the heavy compute

### 4. Product Strategy

Stereolabs sells:

- **ZED cameras** (hardware)
- **ZED Box** (edge compute appliances)
- **ZED Hub** (cloud management)

The Fusion API encourages purchasing ZED Boxes for edge compute rather than building custom streaming solutions.

## Workarounds for Custom Point Cloud Streaming

If you need to stream point clouds from edge to host (outside the ZED SDK):

### Option 1: Custom Compression + Streaming

```cpp
// On edge: compute point cloud, compress, send
sl::Mat point_cloud;
zed.retrieveMeasure(point_cloud, MEASURE::XYZRGBA);

// Compress with Draco or PCL octree compression
// (draco_compress is a placeholder for your own wrapper around the codec)
std::vector<uint8_t> compressed = draco_compress(point_cloud);

// Send via ZeroMQ/gRPC/raw UDP (socket is your own transport object)
socket.send(compressed);
```

### Option 2: Depth Map Streaming

```cpp
// On edge: get depth, compress as 16-bit PNG, send
sl::Mat depth;
zed.retrieveMeasure(depth, MEASURE::DEPTH);  // 32-bit float, in the unit set at init

// PNG cannot store 32-bit float: convert to 16-bit (e.g., millimeters) first
cv::Mat depth_cv = slMat2cvMat(depth);       // helper from the Stereolabs samples
cv::Mat depth_mm;
depth_cv.convertTo(depth_mm, CV_16U, 1000.0);

// Compress as lossless PNG
std::vector<uint8_t> png;
cv::imencode(".png", depth_mm, png);

// Send via network (socket is your own transport object)
socket.send(png);
```

### Bandwidth Estimate for Custom Streaming

| Method | Compression | Bandwidth (HD1080@30fps) |
|--------|-------------|--------------------------|
| Depth PNG (lossless) | ~4:1 | ~240 Mbps |
| Depth JPEG (lossy) | ~20:1 | ~48 Mbps |
| Point cloud Draco | ~10:1 | ~100 Mbps |

**10 Gbps Ethernet could handle 4 cameras with custom depth streaming.**

## Recommendations

| Use Case | Recommended API |
|----------|-----------------|
| Single camera, remote development | Streaming |
| Multi-camera body tracking | Fusion |
| Multi-camera 360° coverage | Fusion |
| Custom point cloud pipeline | Manual (ZeroMQ + Draco) |
| Low latency, same machine | INTRA_PROCESS |

## Code Examples

### Streaming Sender (Edge)

```cpp
sl::StreamingParameters stream_params;
stream_params.codec = sl::STREAMING_CODEC::H265;
stream_params.bitrate = 12000;  // Kbps
stream_params.port = 30000;     // must be even (RTP)

zed.enableStreaming(stream_params);

while (running) {
    zed.grab();  // Encodes and sends the frame
}
```

### Streaming Receiver (Host)

```cpp
sl::InitParameters init_params;
init_params.input.setFromStream("192.168.1.100", 30000);

zed.open(init_params);

while (running) {
    if (zed.grab() == ERROR_CODE::SUCCESS) {
        // Full ZED SDK available - depth, tracking, etc.
        zed.retrieveMeasure(depth, MEASURE::DEPTH);
        zed.retrieveMeasure(point_cloud, MEASURE::XYZRGBA);
    }
}
```

### Fusion Sender (Edge)

```cpp
// Enable body tracking
zed.enableBodyTracking(body_params);

// Start publishing metadata
sl::CommunicationParameters comm_params;
comm_params.setForLocalNetwork(30000);
zed.startPublishing(comm_params);

while (running) {
    if (zed.grab() == ERROR_CODE::SUCCESS) {
        zed.retrieveBodies(bodies);  // Computes and publishes
    }
}
```

### Fusion Receiver (Host)

```cpp
sl::Fusion fusion;
fusion.init(init_fusion_params);

sl::CameraIdentifier cam1(serial_number);
fusion.subscribe(cam1, comm_params, pose);

while (running) {
    if (fusion.process() == FUSION_ERROR_CODE::SUCCESS) {
        fusion.retrieveBodies(fused_bodies);  // Already computed by the edges
    }
}
```

## Summary

The ZED SDK architecture forces a choice:

1. **Streaming**: Edge sends video → Host computes depth (NN inference on host)
2. **Fusion**: Edge computes depth → Sends metadata only (no point cloud)

There is **no built-in support** for streaming computed depth maps or point clouds from edge to host. This is by design, both for bandwidth efficiency and to encourage use of ZED Box edge compute products.

For custom depth/point cloud streaming, you must implement your own compression and network layer outside the ZED SDK.