2. Video Streaming

Video Streaming from Velovision Rearview

As soon as Velovision Rearview powers on, it starts recording video to its onboard microSD card and, at the same time, provides a live stream that we can connect to.

By convention, the video stream is on port 5000.
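Before running the demos below, it can help to confirm that the stream port is reachable from your computer. The following is a minimal sketch using Python's standard socket module. The function name is_stream_reachable is our own, and the sketch assumes the camera tolerates a short-lived connection that is opened and closed right away.

```python
import socket


def is_stream_reachable(host: str, port: int = 5000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: it only checks the TCP handshake and reads no video.
    """
    try:
        # create_connection resolves the hostname (including .local names,
        # if the OS supports mDNS) and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        # (socket timeouts are a subclass of OSError).
        return False
```

For example, is_stream_reachable('velovision-rearview.local') should return True when the camera is on the same network and its stream server is accepting connections. A False result can mean the hostname did not resolve (mDNS .local lookups are not available on every system), the camera has not finished booting, or the connection timed out.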

Video Streaming Demo

In the following Python demonstration, OpenCV's cv2.VideoCapture does all the heavy lifting: it connects to the TCP stream and parses the H.264 video into frames.

import cv2
import sys

stream_url = 'tcp://velovision-rearview.local:5000'

try:
    cap = cv2.VideoCapture(stream_url)
    if not cap.isOpened():
        print('Error: Could not open video stream.')
        sys.exit(1)

    while True:
        ret, frame = cap.read()
        if not ret:
            print('Error: Could not read frame.')
            break
        cv2.imshow('Video Stream', frame)
        # Press 'q' to quit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release the stream and close the display window
    cap.release()
    cv2.destroyAllWindows()
except Exception as e:
    print(f'An error occurred: {e}')

Save this script as display_h264_over_tcp.py and run it on your computer to display the video stream from the Velovision Rearview camera.

python3 display_h264_over_tcp.py

You should see a window with the live video stream from the camera.

Read on to learn more about video streaming, or continue to Lesson 3. Video Recording, the final lesson in the tutorial.

H.264-over-TCP Explained

H.264 is a popular video encoding and compression method. We chose it over competing standards for a few reasons.

Why H.264?

  • We chose it over H.265 because the Raspberry Pi Zero 2W only has hardware acceleration for H.264. Hardware acceleration reduces CPU usage and improves battery life.
  • H.264 is more mature and widely supported by video players and libraries compared to newer standards like H.265, AV1, and VP9.

Why TCP?

  • TCP proved more reliable than UDP for streaming video over a local Wi-Fi network. With UDP, dropped packets resulted in choppy, artifact-ridden video.

Other Considerations

  • We don't use any encapsulation (RTSP, RTMP, HLS, etc.), mainly because of latency and the complexity of setting up a server and client. HLS, for example, was designed for 'streaming' in the sense of a live broadcast (sub-minute delay), not for low-latency (sub-100 ms) video transmission.
  • We also avoid muxers like Matroska because they add unnecessary overhead and software dependencies.

For an even deeper dive into the development history of the video encoding pipeline, see VIDEO_ENCODING.md in the repository.

The result is a barebones H.264-over-TCP stream that can be parsed by any platform that supports receiving TCP packets and parsing H.264 video.
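Because the stream is a bare H.264 byte stream with no container, one way to capture it is simply to write the received bytes to a file. The sketch below does exactly that and nothing more; save_raw_h264 is a hypothetical helper, and the default byte limit is an arbitrary choice.

```python
import socket


def save_raw_h264(host: str, port: int, path: str, max_bytes: int = 5_000_000) -> int:
    """Copy up to max_bytes of the raw TCP byte stream into `path`.

    Returns the number of bytes written. No parsing happens: since the
    stream has no container, the file holds the bytes exactly as received.
    """
    written = 0
    with socket.create_connection((host, port)) as sock, open(path, "wb") as out:
        while written < max_bytes:
            # Never ask for more than the remaining byte budget
            data = sock.recv(min(4096, max_bytes - written))
            if not data:
                break  # the sender closed the connection
            out.write(data)
            written += len(data)
    return written
```

A file captured this way (for example rearview.h264) can usually be played by tools that accept raw H.264, such as ffplay rearview.h264. If the capture starts partway through a group of pictures, the first frames may not decode until the next IDR picture arrives.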

This simplicity came in handy when we were developing the iOS app for Velovision Rearview, because there were no good low-latency video streaming libraries to choose from. Because the stream is so simple, we were able to write a custom H.264 parser in Swift and display the video with minimal latency.
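A custom parser like this has to cope with two details of the byte stream. First, TCP delivers bytes, not NALUs, so a single recv() can end in the middle of a NALU or even in the middle of a start code. Second, the H.264 Annex B byte-stream format allows both 4-byte (00 00 00 01) and 3-byte (00 00 01) start codes. The following is a Python sketch of the idea, not a port of our Swift parser; the class name AnnexBSplitter is hypothetical.

```python
from typing import List

START_CODE = b"\x00\x00\x01"  # a 4-byte start code is this plus one leading zero


class AnnexBSplitter:
    """Incrementally split an H.264 Annex B byte stream into NAL units.

    Handles 3- and 4-byte start codes and NAL units that straddle chunk
    boundaries. Returned NAL units do not include the start code.
    """

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes; return every NAL unit now known to be complete."""
        self._buf += chunk
        nalus = []
        first = self._buf.find(START_CODE)
        if first == -1:
            return nalus  # no start code seen yet
        while True:
            nxt = self._buf.find(START_CODE, first + 3)
            if nxt == -1:
                break  # the current NAL unit may still be growing
            # A NAL unit never ends in 0x00, so trailing zeros belong to the
            # next start code (e.g. the extra zero of a 4-byte start code).
            nalu = bytes(self._buf[first + 3:nxt]).rstrip(b"\x00")
            if nalu:
                nalus.append(nalu)
            first = nxt
        # Drop consumed bytes (and any junk before the first start code)
        del self._buf[:first]
        return nalus

    def flush(self) -> List[bytes]:
        """Return the final NAL unit once the stream has ended."""
        start = self._buf.find(START_CODE)
        tail = b"" if start == -1 else bytes(self._buf[start + 3:]).rstrip(b"\x00")
        self._buf.clear()
        return [tail] if tail else []
```

Stripping trailing zeros is safe because the H.264 specification does not allow the last byte of a NAL unit to be 0x00. Re-searching the buffer on every feed() keeps the code short at the cost of some repeated work on large NALUs.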

Parsing TCP Packets into H.264 Video Explained

At the simplest level, a video is a series of images. To increase speed and efficiency, video encoders can intelligently encode only the differences between frames instead of sending the full image every time. H.264 packages its encoded data into units called NALUs, which are then reassembled into frames on the client side.
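To make "encode only the differences" concrete, here is a toy sketch. It is deliberately far simpler than H.264, which predicts blocks of pixels using motion compensation and then compresses the remaining error. Here a frame is just a flat list of pixel values, the first frame is sent in full (like an I-frame), and each later frame is sent as a list of changed pixels (like a P-frame). The encode and decode functions are illustrative only.

```python
def encode(frames):
    """Toy inter-frame encoder: first frame in full, later frames as changed pixels."""
    packets = [("key", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        # Record (index, new_value) only where the pixel changed
        diff = [(i, v) for i, (p, v) in enumerate(zip(prev, cur)) if p != v]
        packets.append(("delta", diff))
    return packets


def decode(packets):
    """Rebuild each frame by applying its delta to the previous frame."""
    frames = []
    for kind, data in packets:
        if kind == "key":
            frame = list(data)
        else:
            frame = list(frames[-1])
            for i, v in data:
                frame[i] = v
        frames.append(frame)
    return frames


frames = [[10, 10, 10, 10], [10, 10, 99, 10], [10, 10, 99, 12]]
packets = encode(frames)
# packets == [("key", [10, 10, 10, 10]), ("delta", [(2, 99)]), ("delta", [(3, 12)])]
```

This also shows why lost data hurts inter-frame video: if one delta packet went missing, every later frame would be rebuilt from a wrong starting point until the next full frame arrived, which is consistent with the artifacts described above for dropped UDP packets.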

"NALU" stands for "Network Abstraction Layer Unit," which is a fancy way of saying "a chunk of video data." NALUs are the basic building blocks of H.264 video. They come in different types, such as I-frames (the closest thing to a full image), P-frames (which contain only the differences from a past frame), and "SPS" and "PPS" (which contain information about the video stream, like resolution, frame rate, and more).
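Every NALU begins with a one-byte header. As defined in the H.264 specification, its top bit is forbidden_zero_bit (always 0), the next two bits are nal_ref_idc, and the low five bits are nal_unit_type. The debug tool below extracts only the type, with nalu[4] & 0x1F (index 4 because the 4-byte start code comes first). This sketch decodes all three fields; the NaluHeader and parse_nalu_header names are our own.

```python
from typing import NamedTuple


class NaluHeader(NamedTuple):
    forbidden_zero_bit: int  # must be 0 in a valid stream
    nal_ref_idc: int         # nonzero for SPS, PPS, and slices of reference pictures
    nal_unit_type: int       # e.g. 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS


def parse_nalu_header(first_byte: int) -> NaluHeader:
    """Split the one-byte H.264 NAL unit header into its three bit fields."""
    return NaluHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,  # bit 7
        nal_ref_idc=(first_byte >> 5) & 0x03,         # bits 6-5
        nal_unit_type=first_byte & 0x1F,              # bits 4-0
    )
```

For example, parse_nalu_header(0x67) gives forbidden_zero_bit=0, nal_ref_idc=3, nal_unit_type=7, a sequence parameter set.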

The following debug tool helps us visualize the NALUs that are being sent by the camera to the client.

It was initially developed to assist with debugging the camera's H.264 stream parameters, but it can also be used as a reference for implementing your own H.264-over-TCP client.

import socket

# Human-readable names for the NALU types we expect to see
NALU_TYPE_NAMES = {
    1: "Coded slice of a non-IDR picture",
    5: "Coded slice of an IDR picture",
    6: "Supplemental enhancement information (SEI)",
    7: "Sequence parameter set",
    8: "Picture parameter set",
    9: "Access unit delimiter",
}

def find_next_nalu(buffer):
    """
    Find the next NALU in the buffer.
    Returns the NALU data and the remaining buffer.
    """
    start_code = b'\x00\x00\x00\x01'
    # Search from index 1 to skip the start code at the head of the buffer
    start_pos = buffer.find(start_code, 1)  # Find the start of the next NALU
    if start_pos == -1:
        return None, buffer  # No complete NALU found
    nalu_data = buffer[:start_pos]  # Extract NALU
    remaining_buffer = buffer[start_pos:]  # Remaining data
    return nalu_data, remaining_buffer

def get_nalu_type(nalu):
    """Get the type of the NALU."""
    if len(nalu) > 4:
        # The header byte follows the 4-byte start code; its low 5 bits are the type
        nalu_type_code = nalu[4] & 0x1F
        return NALU_TYPE_NAMES.get(nalu_type_code, f"Unknown ({nalu_type_code})")
    return "Unknown"

def verify_h264_stream(host, port):
    """Connect to the TCP server and verify the H.264 stream."""
    with socket.create_connection((host, port)) as sock:
        print(f"Connected to {host}:{port}")
        buffer = b''
        try:
            while True:
                # Receive data from the server
                data = sock.recv(4096)
                if not data:
                    break  # Server closed the connection
                buffer += data  # Append new data to buffer
                # Process complete NALUs in the buffer
                nalu, buffer = find_next_nalu(buffer)
                while nalu:
                    nalu_type = get_nalu_type(nalu)
                    print(f"Found NALU, Type: {nalu_type}, Length: {len(nalu)} bytes")
                    nalu, buffer = find_next_nalu(buffer)
        except Exception as e:
            print(f"Error: {e}")

if __name__ == "__main__":
    HOST = 'velovision-rearview.local'  # Replace with the appropriate host
    PORT = 5000       # Replace with the appropriate port
    verify_h264_stream(HOST, PORT)

Save this script as debug_h264_over_tcp.py and run it on your computer to list the NALUs being sent by the camera.

python3 debug_h264_over_tcp.py

Its output looks something like:

Found NALU, Type: Sequence parameter set, Length: 38 bytes
Found NALU, Type: Picture parameter set, Length: 9 bytes
Found NALU, Type: Coded slice of an IDR picture, Length: 24655 bytes
Found NALU, Type: Access unit delimiter, Length: 6 bytes
Found NALU, Type: Coded slice of a non-IDR picture, Length: 7726 bytes
Found NALU, Type: Access unit delimiter, Length: 6 bytes
Found NALU, Type: Coded slice of a non-IDR picture, Length: 3781 bytes
Found NALU, Type: Access unit delimiter, Length: 6 bytes
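In the output above, every coded slice after the first is preceded by an access unit delimiter (type 9), which in H.264 marks the start of an access unit (roughly, one frame). A client that wants whole frames rather than individual NALUs can use it as a boundary. The following is a minimal sketch, assuming NALUs arrive with their start codes already removed (so the header byte is at index 0 rather than index 4 as in get_nalu_type); group_access_units is a hypothetical helper.

```python
AUD = 9  # nal_unit_type of an access unit delimiter


def group_access_units(nalus):
    """Group NALUs (without start codes) into access units, splitting at each AUD."""
    units = []
    current = []
    for nalu in nalus:
        # An AUD starts a new access unit, closing the previous one
        if nalu and (nalu[0] & 0x1F) == AUD and current:
            units.append(current)
            current = []
        current.append(nalu)
    if current:
        units.append(current)  # last, possibly still incomplete, access unit
    return units
```

In a live stream the last group is still being received, so a real client would only hand a group to the decoder once the next delimiter arrives.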

Summary of Lesson 2: Video Streaming

In this lesson, we learned that the Velovision Rearview camera provides a live H.264-over-TCP stream on port 5000 while it records to its microSD card, and how to display that stream on a computer with OpenCV. We also covered why we chose H.264 and TCP, and how the raw stream breaks down into NALUs.

In the next and final lesson, we will learn about the software architecture of the Velovision Rearview camera.

