# ZED SDK Architecture: Streaming vs Fusion API

## Overview

The ZED SDK provides two distinct APIs for transmitting camera data over a network:

1. **Streaming API** (`enableStreaming`) - Video streaming
2. **Fusion API** (`startPublishing`) - Metadata publishing

These serve fundamentally different use cases and have different compute/bandwidth tradeoffs.

## API Comparison

| Feature | Streaming API | Fusion API |
|---------|---------------|------------|
| **Primary Use Case** | Remote camera access | Multi-camera data fusion |
| **Data Transmitted** | Compressed video (H264/H265) | Metadata only (bodies, objects, poses) |
| **Bandwidth per Camera** | 10-40 Mbps | <100 Kbps |
| **Edge Compute** | Video encoding only (NVENC) | Full depth NN + tracking + detection |
| **Host Compute** | Full depth NN + tracking + detection | Lightweight fusion only |
| **Synchronization** | None | Time-synced + geometric calibration |
| **360° Coverage** | No | Yes (fuses overlapping views) |
| **Receiver API** | `zed.open()` with `INPUT_TYPE::STREAM` | `fusion.subscribe()` |

## Architecture Diagrams

### Streaming API (Single Camera Remote Access)

```
┌─────────────────────┐                      ┌─────────────────────┐
│   Edge (Jetson)     │                      │   Host (Server)     │
│                     │                      │                     │
│  ┌───────────────┐  │    H264/H265 RTP     │  ┌───────────────┐  │
│  │  ZED Camera   │  │    (10-40 Mbps)      │  │    Decode     │  │
│  └───────┬───────┘  │  ─────────────────►  │  │   (NVDEC)     │  │
│          │          │                      │  └───────┬───────┘  │
│  ┌───────▼───────┐  │                      │          │          │
│  │     NVENC     │  │                      │  ┌───────▼───────┐  │
│  │    Encode     │──┘                      │  │ Neural Depth  │  │
│  │  (hardware)   │                         │  │  (NN on GPU)  │  │
│  └───────────────┘                         │  └───────┬───────┘  │
│                                            │          │          │
│                                            │  ┌───────▼───────┐  │
│                                            │  │  Tracking /   │  │
│                                            │  │   Detection   │  │
│                                            │  └───────┬───────┘  │
│                                            │          │          │
│                                            │  ┌───────▼───────┐  │
│                                            │  │  Point Cloud  │  │
│                                            │  └───────────────┘  │
└─────────────────────┘                      └─────────────────────┘

Edge: Lightweight (encode only)
Host: Heavy (NN depth + all processing)
```

### Fusion API (Multi-Camera 360° Coverage)

```
┌─────────────────────┐
│  Edge #1 (Jetson)   │
│  ┌───────────────┐  │
│  │  ZED Camera   │  │
│  └───────┬───────┘  │   Metadata Only
│  ┌───────▼───────┐  │   (bodies, poses)
│  │ Neural Depth  │  │   (<100 Kbps)         ┌─────────────────────┐
│  │  (NN on GPU)  │  │ ────────────────────► │                     │
│  └───────┬───────┘  │                       │    Fusion Server    │
│  ┌───────▼───────┐  │                       │                     │
│  │  Body Track   │──┘                       │  ┌───────────────┐  │
│  └───────────────┘                          │  │  Subscribe    │  │
└─────────────────────┘                       │  │  to all       │  │
                                              │  │  cameras      │  │
┌─────────────────────┐                       │  └───────┬───────┘  │
│  Edge #2 (Jetson)   │                       │          │          │
│  ┌───────────────┐  │   Metadata Only       │  ┌───────▼───────┐  │
│  │  ZED Camera   │  │ ────────────────────► │  │  Time Sync    │  │
│  └───────┬───────┘  │                       │  │  + Geometric  │  │
│  ┌───────▼───────┐  │                       │  │  Calibration  │  │
│  │ Neural Depth  │  │                       │  └───────┬───────┘  │
│  └───────┬───────┘  │                       │          │          │
│  ┌───────▼───────┐  │                       │  ┌───────▼───────┐  │
│  │  Body Track   │──┘                       │  │  360° Fusion  │  │
│  └───────────────┘                          │  │ (merge views) │  │
└─────────────────────┘                       │  └───────────────┘  │
                                              │                     │
┌─────────────────────┐                       │  Lightweight GPU    │
│  Edge #3 (Jetson)   │   Metadata Only       │  requirements       │
│         ...         │ ────────────────────► │                     │
└─────────────────────┘                       └─────────────────────┘

Each Edge:     Heavy (NN depth + tracking)
Fusion Server: Lightweight (data fusion only)
```

## Communication Modes

### Streaming API

| Mode | Description |
|------|-------------|
| **H264** | AVC encoding, wider GPU support |
| **H265** | HEVC encoding, better compression, requires Pascal+ GPU |

Port: Even number (default 30000), uses RTP protocol.

### Fusion API

| Mode | Description |
|------|-------------|
| **INTRA_PROCESS** | Same machine, shared memory (zero-copy) |
| **LOCAL_NETWORK** | Different machines, RTP over network |

Port: Default 30000, configurable per camera.

## Bandwidth Requirements

### Streaming (H265 Compressed Video)

| Resolution | FPS | Bitrate per Camera | 4 Cameras |
|------------|-----|--------------------|-----------|
| 2K | 15 | 7 Mbps | 28 Mbps |
| HD1080 | 30 | 11 Mbps | 44 Mbps |
| HD720 | 60 | 6 Mbps | 24 Mbps |
| HD1200 | 30 | ~12 Mbps | ~48 Mbps |

### Fusion (Metadata Only)

| Data Type | Size per Frame | @ 30 FPS | 4 Cameras |
|-----------|----------------|----------|-----------|
| Body (18 keypoints) | ~2 KB | ~60 KB/s | ~240 KB/s |
| Object detection | ~1 KB | ~30 KB/s | ~120 KB/s |
| Pose/Transform | ~100 B | ~3 KB/s | ~12 KB/s |

**Fusion uses roughly 20-500x less bandwidth than Streaming, per the figures above.**

## The Architectural Gap

### What You CAN Do

| Scenario | API | Edge Computes | Host Receives |
|----------|-----|---------------|---------------|
| Remote camera access | Streaming | Video encoding | Video → computes depth/tracking |
| Multi-camera fusion | Fusion | Depth + tracking | Metadata only (bodies, poses) |
| Local processing | Direct | Everything | N/A (same machine) |

### What You CANNOT Do

**There is no ZED SDK mode for:**

```
┌─────────────────────┐                      ┌─────────────────────┐
│   Edge (Jetson)     │                      │   Host (Server)     │
│                     │                      │                     │
│  ┌───────────────┐  │    Depth Map /       │  ┌───────────────┐  │
│  │  ZED Camera   │  │    Point Cloud       │  │   Receive     │  │
│  └───────┬───────┘  │                      │  │   Depth/PC    │  │
│          │          │         ???          │  └───────┬───────┘  │
│  ┌───────▼───────┐  │  ──────────────X─►   │          │          │
│  │ Neural Depth  │  │   NOT SUPPORTED      │  ┌───────▼───────┐  │
│  │  (NN on GPU)  │  │                      │  │   Further     │  │
│  └───────┬───────┘  │                      │  │  Processing   │  │
│          │          │                      │  └───────────────┘  │
│  ┌───────▼───────┐  │                      │                     │
│  │  Point Cloud  │──┘                      │                     │
│  └───────────────┘                         │                     │
└─────────────────────┘                      └─────────────────────┘

❌ Edge computes depth → streams depth map → Host receives depth
❌ Edge computes point cloud → streams point cloud → Host receives point cloud
```

## Why This Architecture?

### 1. Bandwidth Economics

Point cloud streaming would require significantly more bandwidth than video:

| Data Type | Size per Frame (HD1080) | @ 30 FPS |
|-----------|-------------------------|----------|
| Raw stereo video | ~12 MB | 360 MB/s |
| H265 compressed | ~46 KB | 11 Mbps |
| Depth map (16-bit) | ~4 MB | 120 MB/s |
| Point cloud (XYZ float) | ~12 MB | 360 MB/s |

Compressed depth/point cloud is lossy and still large (~50-100 Mbps).

### 2. Compute Distribution Philosophy

ZED SDK follows: **"Compute entirely at edge OR entirely at host, not split"**

| Scenario | Solution |
|----------|----------|
| Low bandwidth, multi-camera | Fusion (edge computes all, sends metadata) |
| High bandwidth, single camera | Streaming (host computes all) |
| Same machine | INTRA_PROCESS (shared memory) |

### 3. Fusion API Design Goals

From Stereolabs documentation:

> "The Fusion module is **lightweight** (in computation resources requirements) compared to the requirements for camera publishers."

The Fusion receiver is intentionally lightweight because:

- It only needs to fuse pre-computed metadata
- It handles time synchronization and geometric calibration
- It can run on modest hardware while the edges do the heavy compute

### 4. Product Strategy

Stereolabs sells:

- **ZED cameras** (hardware)
- **ZED Box** (edge compute appliances)
- **ZED Hub** (cloud management)

The Fusion API encourages purchasing ZED Boxes for edge compute rather than building custom streaming solutions.

## Workarounds for Custom Point Cloud Streaming

If you need to stream point clouds from edge to host (outside the ZED SDK):

### Option 1: Custom Compression + Streaming

```cpp
// On edge: compute the point cloud, compress it, send it
sl::Mat point_cloud;
zed.retrieveMeasure(point_cloud, MEASURE::XYZRGBA);

// Compress with a point cloud codec such as Draco or a PCL octree
// (draco_compress is a placeholder for your compression wrapper)
std::vector<uint8_t> compressed = draco_compress(point_cloud);

// Send via ZeroMQ/gRPC/raw UDP
socket.send(compressed);
```

### Option 2: Depth Map Streaming

```cpp
// On edge: get depth, compress as 16-bit PNG, send
sl::Mat depth;
zed.retrieveMeasure(depth, MEASURE::DEPTH);
// Note: MEASURE::DEPTH is 32-bit float; convert to 16-bit
// (e.g. millimeters) before PNG encoding

// Compress as lossless PNG
// (slMat2cvMat is the sl::Mat → cv::Mat helper from the ZED samples)
cv::Mat depth_cv = slMat2cvMat(depth);
std::vector<uint8_t> png;
cv::imencode(".png", depth_cv, png);

// Send via network
socket.send(png);
```

### Bandwidth Estimate for Custom Streaming

| Method | Compression | Bandwidth (HD1080@30fps) |
|--------|-------------|--------------------------|
| Depth PNG (lossless) | ~4:1 | ~240 Mbps |
| Depth JPEG (lossy) | ~20:1 | ~48 Mbps |
| Point cloud Draco | ~10:1 | ~100 Mbps |

**10 Gbps Ethernet could handle 4 cameras with custom depth streaming.**

## Recommendations

| Use Case | Recommended API |
|----------|-----------------|
| Single camera, remote development | Streaming |
| Multi-camera body tracking | Fusion |
| Multi-camera 360° coverage | Fusion |
| Custom point cloud pipeline | Manual (ZeroMQ + Draco) |
| Low latency, same machine | INTRA_PROCESS |

## Code Examples

### Streaming Sender (Edge)

```cpp
sl::StreamingParameters stream_params;
stream_params.codec = sl::STREAMING_CODEC::H265;
stream_params.bitrate = 12000;  // Kbps
stream_params.port = 30000;     // must be an even number

zed.enableStreaming(stream_params);

while (running) {
    zed.grab();  // Encodes and sends the frame
}
```

### Streaming Receiver (Host)

```cpp
sl::InitParameters init_params;
init_params.input.setFromStream("192.168.1.100", 30000);

zed.open(init_params);

sl::Mat depth, point_cloud;
while (running) {
    if (zed.grab() == sl::ERROR_CODE::SUCCESS) {
        // Full ZED SDK available - depth, tracking, etc.
        zed.retrieveMeasure(depth, sl::MEASURE::DEPTH);
        zed.retrieveMeasure(point_cloud, sl::MEASURE::XYZRGBA);
    }
}
```

### Fusion Sender (Edge)

```cpp
// Enable body tracking
sl::BodyTrackingParameters body_params;
zed.enableBodyTracking(body_params);

// Start publishing metadata
sl::CommunicationParameters comm_params;
comm_params.setForLocalNetwork(30000);
zed.startPublishing(comm_params);

sl::Bodies bodies;
while (running) {
    if (zed.grab() == sl::ERROR_CODE::SUCCESS) {
        zed.retrieveBodies(bodies);  // Computes and publishes
    }
}
```

### Fusion Receiver (Host)

```cpp
sl::Fusion fusion;
sl::InitFusionParameters init_fusion_params;
fusion.init(init_fusion_params);

// Subscribe to an edge publisher by camera serial number;
// comm_params/pose carry the network settings and mounting pose
sl::CameraIdentifier cam1(serial_number);
fusion.subscribe(cam1, comm_params, pose);

sl::Bodies fused_bodies;
while (running) {
    if (fusion.process() == sl::FUSION_ERROR_CODE::SUCCESS) {
        fusion.retrieveBodies(fused_bodies);  // Already computed by the edges
    }
}
```

## Summary

The ZED SDK architecture forces a choice:

1. **Streaming**: Edge sends video → Host computes depth (NN inference on host)
2. **Fusion**: Edge computes depth → Sends metadata only (no point cloud)

There is **no built-in support** for streaming computed depth maps or point clouds from edge to host. This is by design, for bandwidth efficiency and to encourage use of ZED Box edge compute products.

For custom depth/point cloud streaming, you must implement your own compression and network layer outside the ZED SDK.