Video Streaming
As soon as Velovision Rearview powers on, it starts recording video to its onboard microSD card and simultaneously serves a live stream that we can connect to.
The video stream is on port 5000.
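Before wiring up a full video player, you can sanity-check that the camera is reachable by opening a plain TCP connection to that port. A minimal sketch (it assumes the camera's default hostname, velovision-rearview.local, used throughout this guide):

```python
import socket

def stream_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to the stream port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False

if __name__ == "__main__":
    print(stream_reachable("velovision-rearview.local", 5000))
```

If this prints False, check that your computer and the camera are on the same network before debugging anything video-related.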
Video Streaming Demo
In the following Python demonstration, OpenCV’s cv2.VideoCapture contains all the magic to parse H.264 video from a TCP stream.
```python
import cv2
import sys

stream_url = 'tcp://velovision-rearview.local:5000'

try:
    cap = cv2.VideoCapture(stream_url)
    if not cap.isOpened():
        print('Error: Could not open video stream.')
        sys.exit()

    while True:
        ret, frame = cap.read()
        if not ret:
            print('Error: Could not read frame.')
            break
        cv2.imshow('Video Stream', frame)
        # Press 'q' to quit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
except Exception as e:
    print(f'An error occurred: {e}')
    sys.exit()

cap.release()
cv2.destroyAllWindows()
```
Run this Python script with uv:

```bash
uv run --with 'opencv-python' display_h264_over_tcp.py
```
You should see a window with the live video stream from the camera. Note that the default OpenCV & Python video viewer isn’t optimized for latency, so there may be a noticeable delay.
Deep Dive into H.264-over-TCP
Findings from hundreds of hours of experimentation and testing:
- We chose H.264 over H.265 because the Raspberry Pi Zero 2W only has hardware acceleration for H.264. Hardware acceleration reduces CPU usage and improves battery life.
- H.264 is more mature and widely supported by video players and libraries compared to newer standards like H.265, AV1, or VP9.
- TCP proved more reliable than UDP for streaming video over a local Wi-Fi network. Dropped UDP packets resulted in choppy, artifact-ridden video.
- We don’t use any encapsulation (RTSP, RTMP, HLS, etc.), mainly because of latency issues and the complexity of setting up a server and client. HLS, for example, was designed for ‘streaming’ in the sense of a live broadcast (sub-minute delay), not for low-latency (sub-100 ms) video transmission.
- We also avoid muxers like Matroska because they add unnecessary overhead and software dependencies.
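To illustrate how little the "no encapsulation, no muxer" decision leaves on the sending side, a server only has to write the encoder's raw Annex B output straight to a connected socket. This toy sender is a hypothetical sketch (not the camera's actual code) that serves a pre-recorded raw .h264 file the same way:

```python
import socket

def serve_raw_h264(path, port):
    """Serve a raw Annex B H.264 file over TCP: no container, no protocol."""
    with socket.socket() as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn, open(path, "rb") as f:
            # The "framing" is just the H.264 start codes already in the data
            while chunk := f.read(4096):
                conn.sendall(chunk)
```

The client sees exactly the bytes the encoder produced, which is what makes the stream parseable on any platform with a TCP socket and an H.264 decoder.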
For an even deeper dive into the development history of the video encoding pipeline, see VIDEO_ENCODING.md in the repository.
The result is a barebones H.264-over-TCP stream that can be parsed by any platform that supports receiving TCP packets and parsing H.264 video.
This simplicity came in handy when we were developing the iOS app for Velovision Rearview because we had no good choices for low-latency video streaming libraries. Thanks to the simplicity of our stream, we were able to write a custom H.264 parser in Swift and display the video stream with minimal latency.
Parsing TCP Packets into H.264 Video Explained
An H.264 video stream consists of Network Abstraction Layer Units (NALUs), which are reassembled into frames on the client side.
Each NALU, delineated by a specific byte sequence (a start code), can be of different types: I-frames (which are closest to a full image), P-frames (which contain only the difference from a past frame), “SPS” and “PPS” (which carry stream parameters like resolution and frame rate), and more.
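Concretely, the five low bits of the byte right after the start code encode the NALU type. A quick check on a hand-written NALU header (the payload bytes here are illustrative, not captured from the camera):

```python
# 4-byte Annex B start code, then the NALU header byte 0x67
sps_nalu = bytes([0x00, 0x00, 0x00, 0x01, 0x67, 0x42, 0x00, 0x1F])

nalu_type = sps_nalu[4] & 0x1F  # mask the 5 low bits of the header byte
print(nalu_type)  # 0x67 & 0x1F == 7, the type code for a sequence parameter set
```

This same masking is what the debug tool below uses to name each NALU it receives.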
The following debug tool helps us visualize the NALUs that are being sent by the camera to the client.
It was initially developed to assist with debugging the camera’s H.264 stream parameters, but it can also be used as a reference for implementing your own H.264-over-TCP client.
```python
import socket

NALU_TYPE_NAMES = {
    1: "Coded slice of a non-IDR picture",
    5: "Coded slice of an IDR picture",
    6: "Supplemental enhancement information (SEI)",
    7: "Sequence parameter set",
    8: "Picture parameter set",
    9: "Access unit delimiter",
}

def find_next_nalu(buffer):
    """
    Find the next NALU in the buffer.
    Returns the NALU data and the remaining buffer.
    """
    start_code = b'\x00\x00\x00\x01'
    start_pos = buffer.find(start_code, 1)  # Find the start of the next NALU
    if start_pos == -1:
        return None, buffer  # No complete NALU found

    nalu_data = buffer[:start_pos]  # Extract NALU
    remaining_buffer = buffer[start_pos:]  # Remaining data
    return nalu_data, remaining_buffer

def get_nalu_type(nalu):
    """
    Get the type of the NALU.
    """
    if len(nalu) > 4:
        nalu_type_code = nalu[4] & 0x1F
        return NALU_TYPE_NAMES.get(nalu_type_code, f"Unknown ({nalu_type_code})")
    return "Unknown"

def verify_h264_stream(host, port):
    """
    Connect to the TCP server and verify the H.264 stream.
    """
    with socket.create_connection((host, port)) as sock:
        print(f"Connected to {host}:{port}")
        buffer = b''
        try:
            while True:
                # Receive data from the server
                data = sock.recv(4096)
                if not data:
                    break

                buffer += data  # Append new data to buffer

                # Process complete NALUs in the buffer
                nalu, buffer = find_next_nalu(buffer)
                while nalu:
                    nalu_type = get_nalu_type(nalu)
                    print(f"Found NALU, Type: {nalu_type}, Length: {len(nalu)} bytes")
                    nalu, buffer = find_next_nalu(buffer)
        except Exception as e:
            print(f"Error: {e}")

if __name__ == "__main__":
    HOST = 'velovision-rearview.local'  # Replace with the appropriate host
    PORT = 5000  # Replace with the appropriate port
    verify_h264_stream(HOST, PORT)
```
Save and run this script on your computer to visualize the NALUs being sent by the camera.
As before, we’ll use uv:

```bash
uv run debug_h264_over_tcp.py
```
Its output looks something like:
```
Found NALU, Type: Sequence parameter set, Length: 38 bytes
Found NALU, Type: Picture parameter set, Length: 9 bytes
Found NALU, Type: Coded slice of an IDR picture, Length: 24655 bytes
Found NALU, Type: Access unit delimiter, Length: 6 bytes
Found NALU, Type: Coded slice of a non-IDR picture, Length: 7726 bytes
Found NALU, Type: Access unit delimiter, Length: 6 bytes
Found NALU, Type: Coded slice of a non-IDR picture, Length: 3781 bytes
Found NALU, Type: Access unit delimiter, Length: 6 bytes
```
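Tallying the Type field of that sample makes the stream structure easy to see: one SPS/PPS pair and an IDR frame up front, then delta (non-IDR) frames, each followed by an access unit delimiter. A small sketch over the sample lines:

```python
from collections import Counter

sample_lines = [
    "Found NALU, Type: Sequence parameter set, Length: 38 bytes",
    "Found NALU, Type: Picture parameter set, Length: 9 bytes",
    "Found NALU, Type: Coded slice of an IDR picture, Length: 24655 bytes",
    "Found NALU, Type: Access unit delimiter, Length: 6 bytes",
    "Found NALU, Type: Coded slice of a non-IDR picture, Length: 7726 bytes",
    "Found NALU, Type: Access unit delimiter, Length: 6 bytes",
    "Found NALU, Type: Coded slice of a non-IDR picture, Length: 3781 bytes",
    "Found NALU, Type: Access unit delimiter, Length: 6 bytes",
]

# Pull out the text between "Type: " and ", Length" and count occurrences
counts = Counter(
    line.split("Type: ")[1].split(", Length")[0] for line in sample_lines
)
print(counts["Coded slice of a non-IDR picture"])  # 2
```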