ZED SDK Architecture: Streaming vs Fusion API

Overview

The ZED SDK provides two distinct APIs for transmitting camera data over a network:

  1. Streaming API (enableStreaming) - Video streaming
  2. Fusion API (startPublishing) - Metadata publishing

These serve fundamentally different use cases and have different compute/bandwidth tradeoffs.

API Comparison

| Feature | Streaming API | Fusion API |
|---------|---------------|------------|
| Primary Use Case | Remote camera access | Multi-camera data fusion |
| Data Transmitted | Compressed video (H264/H265) | Metadata only (bodies, objects, poses) |
| Bandwidth per Camera | 10-40 Mbps | <100 Kbps |
| Edge Compute | Video encoding only (NVENC) | Full depth NN + tracking + detection |
| Host Compute | Full depth NN + tracking + detection | Lightweight fusion only |
| Synchronization | None | Time-synced + geometric calibration |
| 360° Coverage | No | Yes (fuses overlapping views) |
| Receiver API | `zed.open()` with `INPUT_TYPE::STREAM` | `fusion.subscribe()` |

Architecture Diagrams

Streaming API (Single Camera Remote Access)

```
┌─────────────────────┐                      ┌─────────────────────┐
│   Edge (Jetson)     │                      │   Host (Server)     │
│                     │                      │                     │
│  ┌───────────────┐  │    H264/H265 RTP     │  ┌───────────────┐  │
│  │  ZED Camera   │  │   (10-40 Mbps)       │  │  Decode       │  │
│  └───────┬───────┘  │ ───────────────────► │  │  (NVDEC)      │  │
│          │          │                      │  └───────┬───────┘  │
│  ┌───────▼───────┐  │                      │          │          │
│  │  NVENC        │  │                      │  ┌───────▼───────┐  │
│  │  Encode       │──┘                      │  │  Neural Depth │  │
│  │  (hardware)   │                         │  │  (NN on GPU)  │  │
│  └───────────────┘                         │  └───────┬───────┘  │
│                                            │          │          │
│                                            │  ┌───────▼───────┐  │
│                                            │  │  Tracking /   │  │
│                                            │  │  Detection    │  │
│                                            │  └───────┬───────┘  │
│                                            │          │          │
│                                            │  ┌───────▼───────┐  │
│                                            │  │  Point Cloud  │  │
│                                            │  └───────────────┘  │
└─────────────────────┘                      └─────────────────────┘
```

Edge: Lightweight (encode only)
Host: Heavy (NN depth + all processing)

Fusion API (Multi-Camera 360° Coverage)

```
┌─────────────────────┐
│   Edge #1 (Jetson)  │
│  ┌───────────────┐  │
│  │  ZED Camera   │  │
│  └───────┬───────┘  │     Metadata Only
│  ┌───────▼───────┐  │     (bodies, poses)
│  │  Neural Depth │  │     (<100 Kbps)        ┌─────────────────────┐
│  │  (NN on GPU)  │  │ ──────────────────────►│                     │
│  └───────┬───────┘  │                        │   Fusion Server     │
│  ┌───────▼───────┐  │                        │                     │
│  │  Body Track   │──┘                        │  ┌───────────────┐  │
│  └───────────────┘                           │  │  Subscribe    │  │
└─────────────────────┘                        │  │  to all       │  │
                                               │  │  cameras      │  │
┌─────────────────────┐                        │  └───────┬───────┘  │
│   Edge #2 (Jetson)  │                        │          │          │
│  ┌───────────────┐  │     Metadata Only      │  ┌───────▼───────┐  │
│  │  ZED Camera   │  │ ──────────────────────►│  │  Time Sync    │  │
│  └───────┬───────┘  │                        │  │  + Geometric  │  │
│  ┌───────▼───────┐  │                        │  │  Calibration  │  │
│  │  Neural Depth │  │                        │  └───────┬───────┘  │
│  └───────┬───────┘  │                        │          │          │
│  ┌───────▼───────┐  │                        │  ┌───────▼───────┐  │
│  │  Body Track   │──┘                        │  │  360° Fusion  │  │
│  └───────────────┘                           │  │  (merge views)│  │
└─────────────────────┘                        │  └───────────────┘  │
                                               │                     │
┌─────────────────────┐                        │  Lightweight GPU    │
│   Edge #3 (Jetson)  │     Metadata Only      │  requirements       │
│       ...           │ ──────────────────────►│                     │
└─────────────────────┘                        └─────────────────────┘
```

Each Edge: Heavy (NN depth + tracking)
Fusion Server: Lightweight (data fusion only)

Communication Modes

Streaming API

| Mode | Description |
|------|-------------|
| H264 | AVC encoding, wider GPU support |
| H265 | HEVC encoding, better compression, requires Pascal+ GPU |

Port: Even number (default 30000), uses RTP protocol.
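If H265 is unavailable on the encoding GPU, enabling the stream fails with an error code, so a sender can fall back to H264. A minimal sketch of that policy (the fallback logic is ours, not something the SDK mandates; the types and enums are the SDK's, and the function assumes an already-opened `sl::Camera`):

```cpp
#include <sl/Camera.hpp>

// Prefer H265, fall back to H264 on GPUs without HEVC NVENC support.
sl::ERROR_CODE enableStreamingWithFallback(sl::Camera &zed) {
    sl::StreamingParameters sp;
    sp.port = 30000;                          // must be an even number (RTP)
    sp.codec = sl::STREAMING_CODEC::H265;
    sl::ERROR_CODE err = zed.enableStreaming(sp);
    if (err != sl::ERROR_CODE::SUCCESS) {
        sp.codec = sl::STREAMING_CODEC::H264; // wider GPU support
        err = zed.enableStreaming(sp);
    }
    return err;
}
```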

Fusion API

| Mode | Description |
|------|-------------|
| INTRA_PROCESS | Same machine, shared memory (zero-copy) |
| LOCAL_NETWORK | Different machines, RTP over network |

Port: Default 30000, configurable per camera.
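A sketch of choosing between the two modes on the publishing side (assuming the SDK's `setForSharedMemory()` and `setForLocalNetwork()` helpers; the port value is illustrative):

```cpp
#include <sl/Fusion.hpp>

// Build CommunicationParameters for either Fusion transport mode.
sl::CommunicationParameters makeCommParams(bool same_machine) {
    sl::CommunicationParameters comm;
    if (same_machine)
        comm.setForSharedMemory();      // INTRA_PROCESS: zero-copy shared memory
    else
        comm.setForLocalNetwork(30000); // LOCAL_NETWORK: RTP on this port
    return comm;
}
```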

Bandwidth Requirements

Streaming (H265 Compressed Video)

| Resolution | FPS | Bitrate per Camera | 4 Cameras |
|------------|-----|--------------------|-----------|
| 2K | 15 | 7 Mbps | 28 Mbps |
| HD1080 | 30 | 11 Mbps | 44 Mbps |
| HD720 | 60 | 6 Mbps | 24 Mbps |
| HD1200 | 30 | ~12 Mbps | ~48 Mbps |

Fusion (Metadata Only)

| Data Type | Size per Frame | @ 30 FPS | 4 Cameras |
|-----------|----------------|----------|-----------|
| Body (18 keypoints) | ~2 KB | ~60 KB/s | ~240 KB/s |
| Object detection | ~1 KB | ~30 KB/s | ~120 KB/s |
| Pose/Transform | ~100 B | ~3 KB/s | ~12 KB/s |

Fusion uses 100-1000x less bandwidth than Streaming.

The Architectural Gap

What You CAN Do

| Scenario | API | Edge Computes | Host Receives |
|----------|-----|---------------|---------------|
| Remote camera access | Streaming | Video encoding | Video → computes depth/tracking |
| Multi-camera fusion | Fusion | Depth + tracking | Metadata only (bodies, poses) |
| Local processing | Direct | Everything | N/A (same machine) |

What You CANNOT Do

There is no ZED SDK mode for:

```
┌─────────────────────┐                      ┌─────────────────────┐
│   Edge (Jetson)     │                      │   Host (Server)     │
│                     │                      │                     │
│  ┌───────────────┐  │     Depth Map /      │  ┌───────────────┐  │
│  │  ZED Camera   │  │     Point Cloud      │  │  Receive      │  │
│  └───────┬───────┘  │                      │  │  Depth/PC     │  │
│          │          │         ???          │  └───────┬───────┘  │
│  ┌───────▼───────┐  │ ─────────────────X─► │          │          │
│  │  Neural Depth │  │   NOT SUPPORTED      │  ┌───────▼───────┐  │
│  │  (NN on GPU)  │  │                      │  │  Further      │  │
│  └───────┬───────┘  │                      │  │  Processing   │  │
│          │          │                      │  └───────────────┘  │
│  ┌───────▼───────┐  │                      │                     │
│  │  Point Cloud  │──┘                      │                     │
│  └───────────────┘                         │                     │
└─────────────────────┘                      └─────────────────────┘
```

❌ Edge computes depth → streams depth map → Host receives depth
❌ Edge computes point cloud → streams point cloud → Host receives point cloud

Why This Architecture?

1. Bandwidth Economics

Point cloud streaming would require significantly more bandwidth than video:

| Data Type | Size per Frame (HD1080) | @ 30 FPS |
|-----------|-------------------------|----------|
| Raw stereo video | ~12 MB | 360 MB/s |
| H265 compressed | ~46 KB | 11 Mbps |
| Depth map (16-bit) | ~4 MB | 120 MB/s |
| Point cloud (XYZ float) | ~25 MB | 750 MB/s |

Compressed depth/point cloud is lossy and still large (~50-100 Mbps).
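These per-frame sizes follow from simple arithmetic. A quick back-of-envelope check (our own arithmetic in decimal megabytes, not SDK output):

```cpp
#include <cstdio>

int main() {
    constexpr double px  = 1920.0 * 1080.0; // HD1080 pixel count
    constexpr double fps = 30.0;
    constexpr double MB  = 1e6;             // decimal megabyte

    constexpr double stereo  = px * 2 * 3;  // two RGB8 views   -> ~12.4 MB
    constexpr double depth16 = px * 2;      // 16-bit depth map -> ~4.1 MB
    constexpr double cloud   = px * 3 * 4;  // XYZ float32      -> ~24.9 MB

    std::printf("stereo: %5.1f MB/frame, %4.0f MB/s\n", stereo / MB, stereo * fps / MB);
    std::printf("depth:  %5.1f MB/frame, %4.0f MB/s\n", depth16 / MB, depth16 * fps / MB);
    std::printf("cloud:  %5.1f MB/frame, %4.0f MB/s\n", cloud / MB, cloud * fps / MB);
}
```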

2. Compute Distribution Philosophy

The ZED SDK follows a simple rule: compute entirely at the edge OR entirely at the host, never split between the two.

| Scenario | Solution |
|----------|----------|
| Low bandwidth, multi-camera | Fusion (edge computes all, sends metadata) |
| High bandwidth, single camera | Streaming (host computes all) |
| Same machine | INTRA_PROCESS (shared memory) |

3. Fusion API Design Goals

From Stereolabs documentation:

"The Fusion module is lightweight (in computation resources requirements) compared to the requirements for camera publishers."

The Fusion receiver is intentionally lightweight because:

  • It only needs to fuse pre-computed metadata
  • It handles time synchronization and geometric calibration
  • It can run on modest hardware while edges do heavy compute

4. Product Strategy

Stereolabs sells:

  • ZED cameras (hardware)
  • ZED Box (edge compute appliances)
  • ZED Hub (cloud management)

The Fusion API encourages purchasing ZED Boxes for edge compute rather than building custom streaming solutions.

Workarounds for Custom Point Cloud Streaming

If you need to stream point clouds from edge to host (outside the ZED SDK):

Option 1: Custom Compression + Streaming

```cpp
// On edge: compute point cloud, compress, send
sl::Mat point_cloud;
zed.retrieveMeasure(point_cloud, sl::MEASURE::XYZRGBA);

// Compress with Draco / a PCL octree (draco_compress is a placeholder
// for your own wrapper around the compression library)
std::vector<uint8_t> compressed = draco_compress(point_cloud);

// Send via ZeroMQ/gRPC/raw UDP (socket is your own transport object)
socket.send(compressed);
```
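`draco_compress` and `socket` above are placeholders. One concrete transport is ZeroMQ via the cppzmq bindings; the PUB socket and port below are illustrative choices of ours, not anything the ZED SDK prescribes:

```cpp
#include <zmq.hpp>
#include <cstdint>
#include <vector>

// Publish one compressed frame per call; bind the socket once at startup.
void publishFrame(zmq::socket_t &pub, const std::vector<uint8_t> &compressed) {
    pub.send(zmq::buffer(compressed), zmq::send_flags::dontwait);
}

int main() {
    zmq::context_t ctx(1);
    zmq::socket_t pub(ctx, zmq::socket_type::pub);
    pub.bind("tcp://*:5555"); // illustrative port
    // ... grab / retrieveMeasure / compress as above, then per frame:
    // publishFrame(pub, compressed);
}
```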

Option 2: Depth Map Streaming

```cpp
// On edge: get depth, compress as 16-bit PNG, send
// (DEPTH_U16_MM yields 16-bit millimeter depth; plain DEPTH is 32-bit
// float, which PNG cannot store directly)
sl::Mat depth;
zed.retrieveMeasure(depth, sl::MEASURE::DEPTH_U16_MM);

// Compress as lossless PNG (slMat2cvMat is the sl::Mat -> cv::Mat
// helper from the ZED OpenCV samples)
cv::Mat depth_cv = slMat2cvMat(depth);
std::vector<uint8_t> png;
cv::imencode(".png", depth_cv, png);

// Send via network (socket is your own transport object)
socket.send(png);
```
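A matching host-side sketch for Option 2, again assuming ZeroMQ as the transport; decoding with `cv::IMREAD_UNCHANGED` preserves the 16-bit depth values:

```cpp
#include <opencv2/opencv.hpp>
#include <zmq.hpp>

int main() {
    // On host: receive PNG bytes, decode back to a 16-bit depth image.
    zmq::context_t ctx(1);
    zmq::socket_t sub(ctx, zmq::socket_type::sub);
    sub.connect("tcp://192.168.1.100:5555"); // illustrative edge address
    sub.set(zmq::sockopt::subscribe, "");    // no topic filter

    while (true) {
        zmq::message_t msg;
        if (!sub.recv(msg)) continue;
        cv::Mat buf(1, static_cast<int>(msg.size()), CV_8UC1, msg.data());
        cv::Mat depth_mm = cv::imdecode(buf, cv::IMREAD_UNCHANGED);
        // depth_mm is CV_16UC1, values in millimeters; process further here.
    }
}
```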

Bandwidth Estimate for Custom Streaming

| Method | Compression | Bandwidth (HD1080 @ 30 FPS) |
|--------|-------------|------------------------------|
| Depth PNG (lossless) | ~4:1 | ~240 Mbps |
| Depth JPEG (lossy) | ~20:1 | ~48 Mbps |
| Point cloud Draco | ~10:1 | ~600 Mbps |

10 Gbps Ethernet comfortably handles 4 cameras with custom depth streaming (lossless PNG totals roughly 1 Gbps).

Recommendations

| Use Case | Recommended API |
|----------|-----------------|
| Single camera, remote development | Streaming |
| Multi-camera body tracking | Fusion |
| Multi-camera 360° coverage | Fusion |
| Custom point cloud pipeline | Manual (ZeroMQ + Draco) |
| Low latency, same machine | INTRA_PROCESS |

Code Examples

Streaming Sender (Edge)

```cpp
sl::StreamingParameters stream_params;
stream_params.codec = sl::STREAMING_CODEC::H265;
stream_params.bitrate = 12000; // Kbits/s
stream_params.port = 30000;    // must be an even number

zed.enableStreaming(stream_params);

while (running) {
    zed.grab(); // encodes and sends the frame as a side effect
}
```

Streaming Receiver (Host)

```cpp
sl::InitParameters init_params;
init_params.input.setFromStream("192.168.1.100", 30000);

zed.open(init_params);

sl::Mat depth, point_cloud;
while (running) {
    if (zed.grab() == sl::ERROR_CODE::SUCCESS) {
        // Full ZED SDK available - depth, tracking, etc.
        zed.retrieveMeasure(depth, sl::MEASURE::DEPTH);
        zed.retrieveMeasure(point_cloud, sl::MEASURE::XYZRGBA);
    }
}
```

Fusion Sender (Edge)

```cpp
// Body tracking requires positional tracking to be enabled first
zed.enablePositionalTracking();

sl::BodyTrackingParameters body_params;
zed.enableBodyTracking(body_params);

// Start publishing metadata
sl::CommunicationParameters comm_params;
comm_params.setForLocalNetwork(30000);
zed.startPublishing(comm_params);

sl::Bodies bodies;
while (running) {
    if (zed.grab() == sl::ERROR_CODE::SUCCESS) {
        zed.retrieveBodies(bodies); // computes locally and publishes
    }
}
```

Fusion Receiver (Host)

```cpp
sl::Fusion fusion;
sl::InitFusionParameters init_fusion_params;
fusion.init(init_fusion_params);

// Subscribe to each publisher by serial number, transport, and mounting pose
sl::CameraIdentifier cam1(serial_number);
fusion.subscribe(cam1, comm_params, pose);

// Fusion-side body tracking must be enabled before retrieving fused bodies
sl::BodyTrackingFusionParameters body_fusion_params;
fusion.enableBodyTracking(body_fusion_params);

sl::Bodies fused_bodies;
while (running) {
    if (fusion.process() == sl::FUSION_ERROR_CODE::SUCCESS) {
        fusion.retrieveBodies(fused_bodies); // already computed by the edges
    }
}
```

Summary

The ZED SDK architecture forces a choice:

  1. Streaming: Edge sends video → Host computes depth (NN inference on host)
  2. Fusion: Edge computes depth → Sends metadata only (no point cloud)

There is no built-in support for streaming computed depth maps or point clouds from edge to host. This is by design for bandwidth efficiency and to encourage use of ZED Box edge compute products.

For custom depth/point cloud streaming, you must implement your own compression and network layer outside the ZED SDK.