Detailed explanation of RTP/RTSP/RTCP protocol

Link: https://www.zhihu.com/question/20278635/answer/14590945

The difference between RTP/RTSP/RTCP can be summed up in a simple sentence: RTSP initiates/terminates streaming media, RTP transmits streaming media data, and RTCP controls and synchronizes RTP.
The reason these were somewhat unclear to me before is that the CTC standard does not require RTCP, so the standard RTSP code has no corresponding part. In the private RTSP code, the related control and synchronization functions are implemented by extending the definition of the RTP header.
In addition, RFC 3550 obsoletes RFC 1889 and can be regarded as its updated version, so reading RFC 3550 alone is enough.
  • RTP: Real-time Transport Protocol
    • RTP/RTCP is the protocol that actually transmits the data
    • RTP transmits audio/video data: with PLAY, the server sends it to the client; with RECORD, the client can send it to the server
    • The entire RTP protocol consists of two closely related parts: the RTP data protocol and the RTP control protocol (i.e. RTCP)
  • RTSP: Real Time Streaming Protocol
    • RTSP requests mainly include DESCRIBE, SETUP, PLAY, PAUSE, TEARDOWN, OPTIONS, etc. As the name suggests, it can be used for dialogue and control.
    • During the RTSP dialog, SETUP can determine the port used by RTP/RTCP, PLAY/PAUSE/TEARDOWN can start or stop RTP sending, etc.
  • RTCP: RTP Control Protocol
    • RTCP includes the Sender Report and the Receiver Report, which are used for audio/video synchronization and other purposes; it is a control protocol
The following is an overview of each protocol:
1. RTP data protocol
The RTP data protocol is responsible for packetizing streaming media data and delivering media streams in real time. Each RTP datagram consists of two parts, a header and a payload. The meaning of the first 12 bytes of the header is fixed, and the payload can be audio or video data. The header format of the RTP datagram is shown in Figure 1.
The more important fields and their meanings are as follows:

  • CSRC count (CC): Indicates the number of CSRC identifiers. The CSRC identifier follows the RTP fixed header and is used to indicate the source of the RTP datagram. The RTP protocol allows multiple data sources to exist in the same session, and they can be combined into one data source through the RTP mixer. For example, a CSRC list can be generated to represent a conference call that combines all speakers' voice data into one RTP data source through an RTP mixer.
  • Payload Type (PT): Indicates the format of the RTP payload, including the coding algorithm used, the sampling frequency, the number of channels, etc. For example, in the original RFC 1890 profile, type 2 indicated voice data encoded with the ITU G.721 algorithm, sampled at 8000 Hz, single channel (the later RFC 3551 profile marks this value as reserved).
  • Sequence number: Gives the receiver a way to detect data loss. How to handle the lost data is up to the application; the RTP protocol itself is not responsible for retransmission.
  • Timestamp: Records the sampling time of the first byte in the payload. The receiver can use the timestamp to determine whether the arrival of the data is affected by delay jitter, but how to compensate for the jitter is up to the application.
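The fields above can be read directly out of the first 12 bytes of a packet. The following is a minimal sketch, assuming the RFC 3550 fixed-header layout; the names `RtpHeader` and `parse_rtp_header` are made up for this illustration, and header extensions and padding are flagged but not processed.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc_list: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header plus any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F  # CSRC count: number of 32-bit CSRC identifiers that follow
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = struct.unpack(f"!{cc}I", packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc_list=csrcs,
    )
```

A receiver would typically compare `sequence_number` between consecutive packets to detect loss, and use `timestamp` for playout scheduling; both policies are left to the application, as noted above.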
From the format of the RTP datagram, it is easy to see that it carries information such as the media type, format, sequence number, timestamp, and whether additional data is present, all of which provide a basis for real-time streaming media transmission. The purpose of the RTP protocol is to provide end-to-end transmission services for real-time data (such as interactive audio and video), so RTP has no concept of a connection; it can be built on either a connection-oriented or a connectionless underlying transport protocol. RTP does not depend on a particular network address format, and only needs the underlying transport protocol to support framing and segmentation. In addition, RTP itself does not provide any reliability mechanism; reliability must be guaranteed by the transport protocol or by the application itself. In typical applications, RTP is implemented as part of the application program on top of the transport protocol, as shown in Figure 2.
2. RTCP control protocol
The RTCP control protocol needs to be used together with the RTP data protocol. When an application starts an RTP session, it occupies two ports at the same time, one for RTP and one for RTCP. RTP itself does not guarantee in-order delivery of data packets, nor does it provide flow control or congestion control; these rely on the feedback that RTCP provides. Usually, RTCP uses the same distribution mechanism as RTP to periodically send control information to all members of the session. By receiving this data, the application obtains information about the session participants, as well as feedback such as network status and packet loss probability, so that it can control the quality of service or diagnose the network status.

The functions of the RTCP protocol are implemented through different RTCP datagrams, which mainly include the following types:

  • SR: Sender report, the so-called sender refers to the application or terminal that sends out the RTP datagram, and the sender can also be the receiver.
  • RR: Receiver report, the so-called receiver refers to an application or terminal that only receives but does not send RTP datagrams.
  • SDES: source description, the main function is to serve as the carrier of the identification information of session members, such as user name, email address, phone number, etc., and also has the function of conveying session control information to session members.
  • BYE: Leave notification, the main function is to indicate that one or several sources are no longer valid, that is, to notify other members of the session that the sender is leaving the session.
  • APP: Defined by the application itself, it solves the scalability problem of RTCP and provides great flexibility for the implementer of the protocol.
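The five datagram types above are distinguished by the packet type field in the RTCP common header. The following is a minimal sketch, assuming the RFC 3550 packet type values (200 to 204) and common-header layout; the function name `split_compound_rtcp` is made up for this illustration, and the packet bodies are returned unparsed.

```python
import struct
from typing import List, Tuple

# Packet type values as assigned in RFC 3550.
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def split_compound_rtcp(data: bytes) -> List[Tuple[str, int, bytes]]:
    """Split one UDP payload into (type_name, count, body) per RTCP packet."""
    packets = []
    offset = 0
    while offset + 4 <= len(data):
        b0, pt, length_words = struct.unpack("!BBH", data[offset:offset + 4])
        if b0 >> 6 != 2:
            raise ValueError("not an RTP version 2 control packet")
        # The length field counts 32-bit words minus one, header included.
        size = 4 * (length_words + 1)
        if offset + size > len(data):
            raise ValueError("RTCP packet truncated")
        body = data[offset + 4:offset + size]
        count = b0 & 0x1F  # report count, source count, or subtype
        packets.append((RTCP_TYPES.get(pt, f"unknown({pt})"), count, body))
        offset += size
    return packets
```

Several RTCP packets are normally sent together in one datagram, which is why the sketch walks the buffer using each packet's own length field.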
RTCP datagrams carry the information needed for quality-of-service monitoring, which allows the application to adjust the quality of service dynamically and to control network congestion effectively. Since RTCP datagrams are sent to all members of the session (by multicast, for example), every member can learn the current state of the other participants from the control information they carry.
In a typical application scenario, the application that sends the media stream periodically generates a sender report (SR), which contains synchronization information between different media streams and the counts of datagrams and bytes sent so far. The receiver can estimate the actual data transmission rate from this information. In the other direction, the receiver sends a receiver report (RR) to all known senders. It contains important information such as the highest sequence number received, the number of lost datagrams, the delay jitter, and a timestamp. Based on this information, the sending application can estimate the round-trip delay, and can dynamically adjust the sending rate according to the datagram loss probability and delay jitter to relieve network congestion, or smoothly adjust the application's QoS according to network conditions.
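The round-trip estimate mentioned above can be sketched as follows, assuming the RFC 3550 receiver report fields LSR (the middle 32 bits of the NTP timestamp of the last SR), DLSR (the delay since that SR, in units of 1/65536 second), and the 8-bit fraction lost. The function names are made up for this illustration.

```python
def ntp_middle32(ntp_seconds: int, ntp_fraction: int) -> int:
    """Compact a 64-bit NTP timestamp to its middle 32 bits (16.16 fixed point)."""
    return ((ntp_seconds & 0xFFFF) << 16) | (ntp_fraction >> 16)


def round_trip_seconds(arrival: int, lsr: int, dlsr: int) -> float:
    """Round-trip time seen by the sender when an RR arrives.

    arrival: time the RR was received, in ntp_middle32 format
    lsr:     'last SR timestamp' field copied from the RR
    dlsr:    'delay since last SR' field copied from the RR
    """
    # Modulo-2**32 subtraction keeps the result valid across wraparound.
    rtt_units = (arrival - lsr - dlsr) & 0xFFFFFFFF
    return rtt_units / 65536.0


def loss_ratio(fraction_lost: int) -> float:
    """Convert the 8-bit 'fraction lost' field to a ratio between 0 and 1."""
    return fraction_lost / 256.0


# Values from the worked example in RFC 3550:
# arrival 46864.500 s, LSR 46853.125 s, DLSR 5.250 s.
rtt = round_trip_seconds(0xB7108000, 0xB7052000, 0x00054000)
```

How the sender reacts to the resulting round-trip time and loss ratio (lowering the bitrate, switching codecs, and so on) is an application decision that RTCP does not prescribe.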

3. RTSP real-time streaming protocol
As an application-layer protocol, RTSP provides an extensible framework, and its significance lies in making controlled, on-demand delivery of real-time streaming media possible. In general, RTSP is a streaming media presentation protocol that is mainly used to control the transmission of data with real-time characteristics. It does not transmit the data itself, but relies on services provided by the underlying transport protocol. RTSP offers operations such as play, pause, and fast forward on streaming media; it defines the specific control messages, methods, and status codes, and also describes the interaction with RTP (RFC 2326).

RTSP drew heavily on the HTTP/1.1 protocol when it was designed, and many of its descriptions are identical to those of HTTP/1.1. RTSP deliberately uses syntax and operations similar to HTTP/1.1 in order to be largely compatible with the existing Web infrastructure. Because of this, most of the extension mechanisms of HTTP/1.1 can be carried over directly into RTSP.
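The HTTP-like syntax means an RTSP message is a plain-text start line followed by headers. The following is a minimal sketch, assuming the RFC 2326 text format (request line, `CSeq` header, CRLF line endings); the helper names and the example URL, ports, and session value are hypothetical.

```python
from typing import Dict, Optional, Tuple


def build_rtsp_request(method: str, url: str, cseq: int,
                       headers: Optional[Dict[str, str]] = None) -> bytes:
    """Serialize an RTSP/1.0 request with no message body."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # As in HTTP/1.1, lines end with CRLF and a blank line ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_rtsp_response(response: bytes) -> Tuple[int, str, Dict[str, str]]:
    """Return (status_code, reason, headers) from an RTSP response."""
    head = response.split(b"\r\n\r\n", 1)[0].decode("ascii")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


# Hypothetical SETUP request that proposes client ports for RTP and RTCP.
setup = build_rtsp_request(
    "SETUP", "rtsp://example.com/media/track1", 2,
    {"Transport": "RTP/AVP;unicast;client_port=5000-5001"},
)
```

A client would send `setup` over its connection to the server and pass the reply to `parse_rtsp_response`, then reuse the returned `Session` header value in later PLAY, PAUSE, and TEARDOWN requests.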
The set of media streams controlled by RTSP can be defined by a presentation description. A presentation is the set of one or more media streams that the streaming media server provides to the client, and the presentation description contains information about each media stream in the presentation, such as the encoding/decoding algorithm, the network address, and the content of the media stream.
Although the RTSP server uses an identifier to distinguish each session, an RTSP session is not bound to a transport-layer connection (such as TCP). This means that during an RTSP session, the client can open or close multiple reliable transport connections to the RTSP server to issue RTSP requests. In addition, RTSP can also run over a connectionless transport protocol (such as UDP).

The RTSP protocol currently supports the following operations:

  • Retrieve media: Allows the client to request a presentation description from the media server via HTTP or another method. If the presentation is multicast, the presentation description includes the multicast address and port number used for the media stream; if the presentation is unicast, the client provides the destination address, for security reasons.
  • Invite to join: A media server can be invited to join an ongoing conference, either to play back media into the presentation or to record all or a subset of the media in the presentation. This mode is well suited to distributed teaching.
  • Add media: The server notifies the client of newly available media streams, which is especially useful for live presentations.
Similar to HTTP/1.1, RTSP requests can also be handled by proxies, tunnels, or caches.

An introduction to the XMPP and SIP protocols will be added later.
