# ZED SDK Architecture: Streaming vs Fusion API

## Overview

The ZED SDK provides two distinct APIs for transmitting camera data over a network:

1. **Streaming API** (`enableStreaming`) - Video streaming
2. **Fusion API** (`startPublishing`) - Metadata publishing

These serve fundamentally different use cases and have different compute/bandwidth tradeoffs.

## API Comparison

| Feature | Streaming API | Fusion API |
|---------|---------------|------------|
| **Primary Use Case** | Remote camera access | Multi-camera data fusion |
| **Data Transmitted** | Compressed video (H264/H265) | Metadata only (bodies, objects, poses) |
| **Bandwidth per Camera** | 10-40 Mbps | <100 Kbps |
| **Edge Compute** | Video encoding only (NVENC) | Full depth NN + tracking + detection |
| **Host Compute** | Full depth NN + tracking + detection | Lightweight fusion only |
| **Synchronization** | None | Time-synced + geometric calibration |
| **360° Coverage** | No | Yes (fuses overlapping views) |
| **Receiver API** | `zed.open()` with `INPUT_TYPE::STREAM` | `fusion.subscribe()` |

## Architecture Diagrams

### Streaming API (Single Camera Remote Access)

```
┌─────────────────────┐                      ┌─────────────────────┐
│   Edge (Jetson)     │                      │   Host (Server)     │
│                     │                      │                     │
│  ┌───────────────┐  │    H264/H265 RTP     │  ┌───────────────┐  │
│  │  ZED Camera   │  │    (10-40 Mbps)      │  │    Decode     │  │
│  └───────┬───────┘  │  ─────────────────►  │  │   (NVDEC)     │  │
│          │          │                      │  └───────┬───────┘  │
│  ┌───────▼───────┐  │                      │          │          │
│  │     NVENC     │  │                      │  ┌───────▼───────┐  │
│  │    Encode     │──┘                      │  │ Neural Depth  │  │
│  │  (hardware)   │                         │  │  (NN on GPU)  │  │
│  └───────────────┘                         │  └───────┬───────┘  │
│                                            │          │          │
│                                            │  ┌───────▼───────┐  │
│                                            │  │  Tracking /   │  │
│                                            │  │   Detection   │  │
│                                            │  └───────┬───────┘  │
│                                            │          │          │
│                                            │  ┌───────▼───────┐  │
│                                            │  │  Point Cloud  │  │
│                                            │  └───────────────┘  │
└─────────────────────┘                      └─────────────────────┘

Edge: Lightweight (encode only)
Host: Heavy (NN depth + all processing)
```

### Fusion API (Multi-Camera 360° Coverage)

```
┌─────────────────────┐
│  Edge #1 (Jetson)   │
│  ┌───────────────┐  │
│  │  ZED Camera   │  │
│  └───────┬───────┘  │   Metadata Only
│  ┌───────▼───────┐  │   (bodies, poses)
│  │ Neural Depth  │  │   (<100 Kbps)         ┌─────────────────────┐
│  │  (NN on GPU)  │  │ ────────────────────► │                     │
│  └───────┬───────┘  │                       │    Fusion Server    │
│  ┌───────▼───────┐  │                       │                     │
│  │  Body Track   │──┘                       │  ┌───────────────┐  │
│  └───────────────┘                          │  │  Subscribe    │  │
└─────────────────────┘                       │  │  to all       │  │
                                              │  │  cameras      │  │
┌─────────────────────┐                       │  └───────┬───────┘  │
│  Edge #2 (Jetson)   │                       │          │          │
│  ┌───────────────┐  │   Metadata Only       │  ┌───────▼───────┐  │
│  │  ZED Camera   │  │ ────────────────────► │  │  Time Sync    │  │
│  └───────┬───────┘  │                       │  │  + Geometric  │  │
│  ┌───────▼───────┐  │                       │  │  Calibration  │  │
│  │ Neural Depth  │  │                       │  └───────┬───────┘  │
│  └───────┬───────┘  │                       │          │          │
│  ┌───────▼───────┐  │                       │  ┌───────▼───────┐  │
│  │  Body Track   │──┘                       │  │  360° Fusion  │  │
│  └───────────────┘                          │  │ (merge views) │  │
└─────────────────────┘                       │  └───────────────┘  │
                                              │                     │
┌─────────────────────┐                       │  Lightweight GPU    │
│  Edge #3 (Jetson)   │   Metadata Only       │  requirements       │
│         ...         │ ────────────────────► │                     │
└─────────────────────┘                       └─────────────────────┘

Each Edge:     Heavy (NN depth + tracking)
Fusion Server: Lightweight (data fusion only)
```

## Communication Modes

### Streaming API

| Mode | Description |
|------|-------------|
| **H264** | AVC encoding, wider GPU support |
| **H265** | HEVC encoding, better compression, requires Pascal+ GPU |

Port: Even number (default 30000), uses RTP protocol.

### Fusion API

| Mode | Description |
|------|-------------|
| **INTRA_PROCESS** | Same machine, shared memory (zero-copy) |
| **LOCAL_NETWORK** | Different machines, RTP over network |

Port: Default 30000, configurable per camera.

## Bandwidth Requirements

### Streaming (H265 Compressed Video)

| Resolution | FPS | Bitrate per Camera | 4 Cameras |
|------------|-----|--------------------|-----------|
| 2K | 15 | 7 Mbps | 28 Mbps |
| HD1080 | 30 | 11 Mbps | 44 Mbps |
| HD720 | 60 | 6 Mbps | 24 Mbps |
| HD1200 | 30 | ~12 Mbps | ~48 Mbps |

### Fusion (Metadata Only)

| Data Type | Size per Frame | @ 30 FPS | 4 Cameras |
|-----------|----------------|----------|-----------|
| Body (18 keypoints) | ~2 KB | ~60 KB/s | ~240 KB/s |
| Object detection | ~1 KB | ~30 KB/s | ~120 KB/s |
| Pose/Transform | ~100 B | ~3 KB/s | ~12 KB/s |

**Fusion uses roughly 20-500x less bandwidth than Streaming, per the figures above.**

## The Architectural Gap

### What You CAN Do

| Scenario | API | Edge Computes | Host Receives |
|----------|-----|---------------|---------------|
| Remote camera access | Streaming | Video encoding | Video → computes depth/tracking |
| Multi-camera fusion | Fusion | Depth + tracking | Metadata only (bodies, poses) |
| Local processing | Direct | Everything | N/A (same machine) |

### What You CANNOT Do

**There is no ZED SDK mode for:**

```
┌─────────────────────┐                      ┌─────────────────────┐
│   Edge (Jetson)     │                      │   Host (Server)     │
│                     │                      │                     │
│  ┌───────────────┐  │    Depth Map /       │  ┌───────────────┐  │
│  │  ZED Camera   │  │    Point Cloud       │  │   Receive     │  │
│  └───────┬───────┘  │                      │  │   Depth/PC    │  │
│          │          │         ???          │  └───────┬───────┘  │
│  ┌───────▼───────┐  │  ──────────────X─►   │          │          │
│  │ Neural Depth  │  │   NOT SUPPORTED      │  ┌───────▼───────┐  │
│  │  (NN on GPU)  │  │                      │  │   Further     │  │
│  └───────┬───────┘  │                      │  │  Processing   │  │
│          │          │                      │  └───────────────┘  │
│  ┌───────▼───────┐  │                      │                     │
│  │  Point Cloud  │──┘                      │                     │
│  └───────────────┘                         │                     │
└─────────────────────┘                      └─────────────────────┘

❌ Edge computes depth → streams depth map → Host receives depth
❌ Edge computes point cloud → streams point cloud → Host receives point cloud
```

## Why This Architecture?

### 1. Bandwidth Economics

Point cloud streaming would require significantly more bandwidth than video:

| Data Type | Size per Frame (HD1080) | @ 30 FPS |
|-----------|-------------------------|----------|
| Raw stereo video | ~12 MB | 360 MB/s |
| H265 compressed | ~46 KB | 11 Mbps |
| Depth map (16-bit) | ~4 MB | 120 MB/s |
| Point cloud (XYZ float) | ~12 MB | 360 MB/s |

Compressed depth/point cloud is lossy and still large (~50-100 Mbps).

### 2. Compute Distribution Philosophy

ZED SDK follows: **"Compute entirely at edge OR entirely at host, not split"**

| Scenario | Solution |
|----------|----------|
| Low bandwidth, multi-camera | Fusion (edge computes all, sends metadata) |
| High bandwidth, single camera | Streaming (host computes all) |
| Same machine | INTRA_PROCESS (shared memory) |

### 3. Fusion API Design Goals

From Stereolabs documentation:

> "The Fusion module is **lightweight** (in computation resources requirements) compared to the requirements for camera publishers."

The Fusion receiver is intentionally lightweight because:

- It only needs to fuse pre-computed metadata
- It handles time synchronization and geometric calibration
- It can run on modest hardware while the edges do the heavy compute

### 4. Product Strategy

Stereolabs sells:

- **ZED cameras** (hardware)
- **ZED Box** (edge compute appliances)
- **ZED Hub** (cloud management)

The Fusion API encourages purchasing ZED Boxes for edge compute rather than building custom streaming solutions.

## Workarounds for Custom Point Cloud Streaming

If you need to stream point clouds from edge to host (outside the ZED SDK):

### Option 1: Custom Compression + Streaming

```cpp
// On edge: compute the point cloud, compress it, send it
sl::Mat point_cloud;
zed.retrieveMeasure(point_cloud, MEASURE::XYZRGBA);

// Compress with a point cloud codec such as Draco or a PCL octree
// (draco_compress is a placeholder for your compression wrapper)
std::vector<uint8_t> compressed = draco_compress(point_cloud);

// Send via ZeroMQ/gRPC/raw UDP
socket.send(compressed);
```

### Option 2: Depth Map Streaming

```cpp
// On edge: get depth, compress as 16-bit PNG, send
sl::Mat depth;
zed.retrieveMeasure(depth, MEASURE::DEPTH);
// Note: MEASURE::DEPTH is 32-bit float; convert to 16-bit
// (e.g. millimeters) before PNG encoding

// Compress as lossless PNG
// (slMat2cvMat is the sl::Mat → cv::Mat helper from the ZED samples)
cv::Mat depth_cv = slMat2cvMat(depth);
std::vector<uint8_t> png;
cv::imencode(".png", depth_cv, png);

// Send via network
socket.send(png);
```

### Bandwidth Estimate for Custom Streaming

| Method | Compression | Bandwidth (HD1080@30fps) |
|--------|-------------|--------------------------|
| Depth PNG (lossless) | ~4:1 | ~240 Mbps |
| Depth JPEG (lossy) | ~20:1 | ~48 Mbps |
| Point cloud Draco | ~10:1 | ~100 Mbps |

**10 Gbps Ethernet could handle 4 cameras with custom depth streaming.**

## Recommendations

| Use Case | Recommended API |
|----------|-----------------|
| Single camera, remote development | Streaming |
| Multi-camera body tracking | Fusion |
| Multi-camera 360° coverage | Fusion |
| Custom point cloud pipeline | Manual (ZeroMQ + Draco) |
| Low latency, same machine | INTRA_PROCESS |

## Code Examples

### Streaming Sender (Edge)

```cpp
sl::StreamingParameters stream_params;
stream_params.codec = sl::STREAMING_CODEC::H265;
stream_params.bitrate = 12000;  // Kbps
stream_params.port = 30000;     // must be an even number

zed.enableStreaming(stream_params);

while (running) {
    zed.grab();  // Encodes and sends the frame
}
```

### Streaming Receiver (Host)

```cpp
sl::InitParameters init_params;
init_params.input.setFromStream("192.168.1.100", 30000);

zed.open(init_params);

sl::Mat depth, point_cloud;
while (running) {
    if (zed.grab() == sl::ERROR_CODE::SUCCESS) {
        // Full ZED SDK available - depth, tracking, etc.
        zed.retrieveMeasure(depth, sl::MEASURE::DEPTH);
        zed.retrieveMeasure(point_cloud, sl::MEASURE::XYZRGBA);
    }
}
```

### Fusion Sender (Edge)

```cpp
// Enable body tracking
sl::BodyTrackingParameters body_params;
zed.enableBodyTracking(body_params);

// Start publishing metadata
sl::CommunicationParameters comm_params;
comm_params.setForLocalNetwork(30000);
zed.startPublishing(comm_params);

sl::Bodies bodies;
while (running) {
    if (zed.grab() == sl::ERROR_CODE::SUCCESS) {
        zed.retrieveBodies(bodies);  // Computes and publishes
    }
}
```

### Fusion Receiver (Host)

```cpp
sl::Fusion fusion;
sl::InitFusionParameters init_fusion_params;
fusion.init(init_fusion_params);

// Subscribe to an edge publisher by camera serial number;
// comm_params/pose carry the network settings and mounting pose
sl::CameraIdentifier cam1(serial_number);
fusion.subscribe(cam1, comm_params, pose);

sl::Bodies fused_bodies;
while (running) {
    if (fusion.process() == sl::FUSION_ERROR_CODE::SUCCESS) {
        fusion.retrieveBodies(fused_bodies);  // Already computed by the edges
    }
}
```

## Summary

The ZED SDK architecture forces a choice:

1. **Streaming**: Edge sends video → Host computes depth (NN inference on host)
2. **Fusion**: Edge computes depth → Sends metadata only (no point cloud)

There is **no built-in support** for streaming computed depth maps or point clouds from edge to host. This is by design, for bandwidth efficiency and to encourage use of ZED Box edge compute products.

For custom depth/point cloud streaming, you must implement your own compression and network layer outside the ZED SDK.