Introduction to the ONVIF protocol, RTP, RTCP and RTSP

ONVIF stands for Open Network Video Interface Forum. In embedded systems, the ONVIF protocol is an open standard for interconnection and communication between network video devices.

The ONVIF protocol is designed to facilitate interoperability and integration between network video equipment from different vendors. It defines a set of standardized network interfaces and protocols that enable video surveillance cameras, network video recorders (NVRs) and other related devices to communicate and work together.

By using the ONVIF protocol, network video equipment from different brands can provide the following functions:

  1. Discovery: Devices can automatically discover and join the network.

  2. PTZ control: The pan-tilt-zoom (PTZ) mechanism and lens of a network camera can be controlled remotely.

  3. Video streaming: Real-time audio and video streams, or stored media, can be transmitted.

  4. Recording and Playback: Recording, storage and playback of media can be performed.

  5. Security: Support authentication and encryption for network video devices.

  6. Configuration and management: Provide a unified interface and protocol for configuring and managing network video devices.

The ONVIF protocol is based on open standards and is jointly developed and supported by various manufacturers. It reduces the difficulty of integration between devices of different brands and enables users to manage and control multiple devices in a unified environment, thereby improving the interoperability and scalability of network video systems.

It should be noted that the ONVIF protocol is not limited to embedded systems; it is also widely used in all kinds of network video surveillance systems and devices, including security systems, smart home systems, remote monitoring and so on.
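To make the discovery function listed above concrete, here is a minimal sketch in C of an ONVIF-style WS-Discovery probe: a SOAP message sent by UDP multicast to 239.255.255.250:3702, to which ONVIF devices answer with ProbeMatches containing their service addresses. The XML is abbreviated, the message UUID is a placeholder, and error handling is omitted; it illustrates the mechanism rather than being a complete client.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Abbreviated WS-Discovery Probe asking for ONVIF NetworkVideoTransmitter
 * devices. A real client must generate a fresh UUID for each MessageID. */
static const char *probe =
    "<e:Envelope xmlns:e=\"http://www.w3.org/2003/05/soap-envelope\""
    " xmlns:w=\"http://schemas.xmlsoap.org/ws/2004/08/addressing\""
    " xmlns:d=\"http://schemas.xmlsoap.org/ws/2005/04/discovery\""
    " xmlns:dn=\"http://www.onvif.org/ver10/network/wsdl\">"
    "<e:Header>"
    "<w:MessageID>uuid:00000000-0000-0000-0000-000000000001</w:MessageID>"
    "<w:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</w:To>"
    "<w:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Probe</w:Action>"
    "</e:Header>"
    "<e:Body><d:Probe><d:Types>dn:NetworkVideoTransmitter</d:Types></d:Probe></e:Body>"
    "</e:Envelope>";

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    /* WS-Discovery multicast group and port used by ONVIF devices. */
    struct sockaddr_in grp = {0};
    grp.sin_family = AF_INET;
    grp.sin_port = htons(3702);
    inet_pton(AF_INET, "239.255.255.250", &grp.sin_addr);

    sendto(fd, probe, strlen(probe), 0, (struct sockaddr *)&grp, sizeof(grp));

    /* Collect ProbeMatches replies for a couple of seconds. */
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    char reply[4096];
    ssize_t n;
    while ((n = recv(fd, reply, sizeof(reply) - 1, 0)) > 0) {
        reply[n] = '\0';
        printf("--- ProbeMatch ---\n%s\n", reply);  /* contains the device's XAddrs */
    }
    close(fd);
    return 0;
}
```

In a full client, the service addresses (XAddrs) returned in each ProbeMatch would then be used for the regular ONVIF SOAP calls: device information, media profiles, stream URIs and so on.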


In actual embedded development, the Real-time Transport Protocol (RTP) is the network protocol usually used to carry the video streams that are acquired, encoded and decoded.

RTP is a protocol for the real-time transmission of audio and video data. It carries timestamps and sequence numbers that let the receiver synchronize playback, restore packet order and detect packet loss, which helps preserve the real-time behaviour and integrity of the audio and video data. RTP is often used together with the Real-time Transport Control Protocol (RTCP), which monitors transmission quality and carries interaction between participants.
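As a reference for the steps below, here is a sketch of the fixed 12-byte RTP header defined in RFC 3550, written as a C struct. Bit-field allocation is compiler-dependent (the layout shown assumes a little-endian compiler), so production code usually serializes the header byte by byte instead.

```c
#include <stdint.h>

/* Fixed RTP header as defined in RFC 3550 (12 bytes before any CSRC entries).
 * Bit-field layout shown for a little-endian compiler; portable code often
 * builds and parses the header byte by byte instead. */
typedef struct {
    uint8_t  csrc_count:4;   /* CC: number of CSRC identifiers that follow    */
    uint8_t  extension:1;    /* X: header extension present                   */
    uint8_t  padding:1;      /* P: padding octets at the end of the payload   */
    uint8_t  version:2;      /* V: always 2                                   */
    uint8_t  payload_type:7; /* PT: e.g. a dynamic type such as 96 for H.264  */
    uint8_t  marker:1;       /* M: often marks the last packet of a frame     */
    uint16_t sequence;       /* increments by one per packet (detects loss)   */
    uint32_t timestamp;      /* sampling instant, 90 kHz clock for video      */
    uint32_t ssrc;           /* identifies the media source                   */
} rtp_header_t;
```

The sequence number and timestamp are the two fields the workflow below relies on: the former detects loss and restores packet order, the latter drives playback timing.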

When using RTP for video stream acquisition, encoding, transmission and decoding, the common process is as follows:

  1. Video stream acquisition: Embedded devices acquire raw video stream data from cameras or other video sources.

  2. Video encoding: Use an appropriate video encoder to compress the raw video stream into a standard format such as H.264 or H.265.

  3. RTP packaging: Package the encoded video data according to the requirements of the RTP protocol, assigning each packet a timestamp, a sequence number and the other necessary header fields (a sketch of this step and of the receiving-end parsing follows this list).

  4. Network transmission: Send the packaged RTP packets to the receiving end over a UDP- or TCP-based transport-layer protocol (UDP is the usual choice for low latency).

  5. Receiving-end parsing: After receiving the RTP packets, the receiving end parses the RTP header fields, including the timestamp and sequence number, so that the data can be correctly reassembled and played back.

  6. Video decoding: The receiving end decodes the received video stream, and may re-encode (transcode) it to suit the needs of the target device or network.

  7. Video playback or storage: Finally, the decoded (or transcoded) video data can be played back in real time through a player, or stored in a local file for subsequent processing.
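The sketch below (plain C, no streaming library) illustrates steps 3 and 5: the sender prepends a hand-built RTP header to each encoded fragment, and the receiver reads the sequence number and timestamp back out. Payload type 96 and the 90 kHz clock are common conventions for H.264 over RTP; fragmenting large frames into MTU-sized packets and handling CSRC/extension fields are left out.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons/htonl/ntohs/ntohl */

#define RTP_HDR_LEN 12

/* Step 3: prepend a fixed RTP header to one encoded video fragment.
 * 'buf' must have room for RTP_HDR_LEN + len bytes; returns the total size. */
static size_t rtp_pack(uint8_t *buf, const uint8_t *payload, size_t len,
                       uint16_t seq, uint32_t ts, uint32_t ssrc, int marker)
{
    buf[0] = 0x80;                         /* V=2, P=0, X=0, CC=0             */
    buf[1] = (marker ? 0x80 : 0x00) | 96;  /* M bit + dynamic payload type 96 */
    uint16_t nseq  = htons(seq);
    uint32_t nts   = htonl(ts);
    uint32_t nssrc = htonl(ssrc);
    memcpy(buf + 2, &nseq, 2);
    memcpy(buf + 4, &nts, 4);
    memcpy(buf + 8, &nssrc, 4);
    memcpy(buf + RTP_HDR_LEN, payload, len);
    return RTP_HDR_LEN + len;
}

/* Step 5: pull the sequence number and timestamp back out at the receiver
 * so packets can be reordered and scheduled for decoding. */
static int rtp_parse(const uint8_t *pkt, size_t len,
                     uint16_t *seq, uint32_t *ts, const uint8_t **payload)
{
    if (len < RTP_HDR_LEN || (pkt[0] >> 6) != 2)  /* version must be 2 */
        return -1;
    memcpy(seq, pkt + 2, 2); *seq = ntohs(*seq);
    memcpy(ts,  pkt + 4, 4); *ts  = ntohl(*ts);
    *payload = pkt + RTP_HDR_LEN;   /* CSRC list and extensions ignored here */
    return 0;
}
```

The packed buffer from rtp_pack() is what gets handed to sendto() in step 4, and rtp_parse() is what the receiving loop calls on every datagram before feeding the payload to the decoder.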

RTP is usually used in conjunction with other protocols, such as the Real-time Transport Control Protocol (RTCP) for quality monitoring and interaction, and the Session Description Protocol (SDP) for describing media session parameters.

It should be noted that in embedded systems, due to limited resources and performance requirements, specific lightweight decoding libraries and encoding libraries may be selected for video stream decoding and encoding to meet the requirements of the embedded platform.


RTCP is the abbreviation of Real-time Transport Control Protocol. It is a control protocol for real-time multimedia streaming, often used in combination with RTP (Real-time Transport Protocol).

The main role of RTCP is to monitor and control the quality of real-time transmission sessions and to provide interaction between participants. It periodically sends control packets that carry statistics, feedback and application-level messages. RTCP is part of an RTP session and is sent to the same IP address but on a different port: by convention RTP uses an even-numbered port and RTCP uses the next higher odd-numbered port.

RTCP features include:

  1. Participant monitoring: RTCP monitors the receiving and sending status of participants by exchanging Receiver Reports and Sender Reports. Receiver reports describe the quality of the media streams being received, and sender reports describe the media streams being sent (a sketch of the receiver-report fields follows this list).

  2. Latency and Jitter Calculation: RTCP can collect and report statistics about network latency and jitter conditions so that participants can make adjustments and optimizations based on this information.

  3. Participant interaction: RTCP supports interaction between participants, allowing senders and receivers to send application-level messages to other participants.

  4. Source and session description: RTCP Source Description (SDES) packets carry identifying information about participants, such as a canonical name (CNAME) used to associate related media streams; the parameters of the media session itself are normally described separately with the Session Description Protocol (SDP).
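As an illustration of the receiver reports mentioned in point 1, the sketch below shows the per-source report block that RFC 3550 defines inside an RTCP Receiver Report (packet type 201). The field names are descriptive, and all values travel in network byte order on the wire.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* One report block inside an RTCP Receiver Report (RFC 3550, packet type 201). */
typedef struct {
    uint32_t ssrc;         /* media source this block reports on                  */
    uint32_t lost;         /* high 8 bits: fraction lost; low 24: cumulative lost */
    uint32_t highest_seq;  /* extended highest sequence number received           */
    uint32_t jitter;       /* interarrival jitter estimate, in timestamp units    */
    uint32_t lsr;          /* middle 32 bits of the NTP time of the last SR       */
    uint32_t dlsr;         /* delay since that last SR, in units of 1/65536 s     */
} rtcp_report_block_t;

/* Example: a sender can read the loss fraction back out of a received block
 * and lower its bitrate when the network is struggling. */
static unsigned fraction_lost(const rtcp_report_block_t *b)
{
    return ntohl(b->lost) >> 24;   /* 0..255, i.e. packet loss * 256 */
}
```

Together with the LSR and DLSR fields, a sender can also estimate round-trip time, which is what makes RTCP useful for the latency and jitter adjustments described in point 2.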

Through these functions, participants can dynamically adjust and optimize according to network conditions and real-time transmission requirements, providing better real-time multimedia transmission quality and a better user experience.

It should be noted that RTP and RTCP are used together: RTP is responsible for the actual media data transmission, and RTCP is responsible for monitoring and controlling transmission quality and interaction. Both are built on top of UDP (or another transport-layer protocol) and use an adjacent port pair, conventionally with RTP on an even-numbered port and RTCP on the next odd-numbered port.
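A minimal sketch of that port convention, assuming plain POSIX UDP sockets and an arbitrarily chosen example port of 5004 for RTP (5005 for RTCP):

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Bind one UDP socket on the given local port; returns the fd or -1. */
static int bind_udp(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0)
        return -1;
    return fd;
}

void open_media_sockets(void)
{
    int rtp_fd  = bind_udp(5004);  /* even port: media packets             */
    int rtcp_fd = bind_udp(5005);  /* next odd port: reports for that flow */
    /* rtp_fd feeds the packetizer/parser, rtcp_fd the report sender/receiver. */
    (void)rtp_fd;
    (void)rtcp_fd;
}
```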


RTSP stands for Real-Time Streaming Protocol. In embedded systems, RTSP is a protocol for controlling and managing real-time streaming sessions.

RTSP is an application-layer protocol designed to control the delivery of audio, video and other multimedia data from streaming media servers. It allows a client to establish a session with a streaming server and to control playback, pause, stop, fast forward and other operations by sending control commands. RTSP also defines a set of message types and status codes used to carry information and operation results related to the streaming session.
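RTSP messages are plain text carried (usually) over TCP port 554, with an HTTP-like request line, a CSeq header and a blank line terminating each message. The sketch below sends an OPTIONS request to a hypothetical camera at 192.168.1.64 and prints the reply; the address and URL are placeholders, and a real client would go on to issue DESCRIBE, SETUP and PLAY in the same way.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Hypothetical camera: RTSP servers usually listen on TCP port 554. */
    struct sockaddr_in srv = {0};
    srv.sin_family = AF_INET;
    srv.sin_port = htons(554);
    inet_pton(AF_INET, "192.168.1.64", &srv.sin_addr);

    if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) != 0)
        return 1;

    /* Each request carries a CSeq that the server echoes back in its reply. */
    const char *req =
        "OPTIONS rtsp://192.168.1.64/stream1 RTSP/1.0\r\n"
        "CSeq: 1\r\n"
        "User-Agent: embedded-demo\r\n"
        "\r\n";
    send(fd, req, strlen(req), 0);

    char resp[2048];
    ssize_t n = recv(fd, resp, sizeof(resp) - 1, 0);
    if (n > 0) {
        resp[n] = '\0';
        printf("%s\n", resp);  /* e.g. "RTSP/1.0 200 OK" plus a Public: method list */
    }
    close(fd);
    return 0;
}
```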

In embedded systems, RTSP is often used to implement real-time streaming, for example in video surveillance systems, IP cameras and network TV boxes. Using the RTSP protocol, a client can request media streams and send control commands to the streaming server; the server provides the corresponding audio and video data according to the client's requests and responds to the client's control instructions.

It should be noted that RTSP itself is only responsible for the control and management of media streams and does not carry the actual media data. The media data is usually transmitted with other protocols such as RTP (Real-time Transport Protocol). In practice, therefore, RTSP is used together with RTP (and RTCP) to build a complete real-time streaming system.
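The point where the two protocols meet is the Transport header of the SETUP request: the client tells the server which local UDP port pair it has opened for RTP and RTCP, and the server answers with its own. A sketch of that exchange, with the address, ports and session id all chosen as examples:

```c
/* Client side of a SETUP exchange: the Transport header carries the UDP port
 * pair the client has opened (5004 for RTP, 5005 for RTCP in this example). */
static const char *setup_request =
    "SETUP rtsp://192.168.1.64/stream1/track1 RTSP/1.0\r\n"
    "CSeq: 3\r\n"
    "Transport: RTP/AVP;unicast;client_port=5004-5005\r\n"
    "\r\n";

/* A server that accepts the request replies along these lines:
 *
 *   RTSP/1.0 200 OK
 *   CSeq: 3
 *   Session: 12345678
 *   Transport: RTP/AVP;unicast;client_port=5004-5005;server_port=6970-6971
 *
 * After a subsequent PLAY, the media itself arrives on UDP 5004 (RTP) and the
 * matching reports flow on 5005 (RTCP); RTSP stays on its TCP connection and
 * is only used again for PAUSE, TEARDOWN and other control commands. */
```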
