SRT: A Live Streaming Transport Protocol

1. Why choose SRT?

The dominant live streaming protocol today is undoubtedly RTMP, but as new technologies develop and usage scenarios keep expanding, RTMP is increasingly showing its age. Its defects fall mainly into the following four areas:

RTMP protocol defects

First, the RTMP protocol is old; its last official update was in 2012. It also has no official definition for newer video codecs such as HEVC / H.265 / AV1, so support for them has had to be defined ad hoc by Chinese CDN vendors.

Second, the RTMP connection process is long. RTMP runs over TCP (which requires a three-way handshake), and on top of that adds its own handshake from c0/s0 through c2/s2, followed by connect, createStream, and play/publish. In total, RTMP needs roughly nine exchanges to establish a connection. This is barely acceptable on PCs, but it places very high demands on mobile network quality.

Third, RTMP's congestion control depends entirely on the transport layer, that is, on TCP's congestion control algorithm, leaving little room for optimization. Fourth, because it is built on TCP, RTMP cannot provide its own bandwidth-adaptive algorithm.

Against this background, many vendors have begun to propose new live streaming protocols for the industry, such as QUIC and SRT. Here we focus on the characteristics and applications of SRT.

Characteristics of SRT protocol

Haivision and Wowza jointly proposed SRT, a real-time audio/video protocol based on UDT (UDT is itself a UDP-based transport protocol; four versions have been submitted to the IETF). SRT has a very good packet-loss retransmission mechanism with rich control messages: it supports ACK, ACKACK, and NACK.

Audio and video are highly time-sensitive, and SRT's time-based packet transmission gives it good resistance to traffic bursts. SRT also exposes rich congestion statistics to the upper layer, including RTT, packet loss rate, inflight, and send/receive bitrate. With this information we can implement bandwidth prediction and perform adaptive dynamic encoding and congestion control at the encoding layer as bandwidth changes.

2. Principle analysis of SRT protocol

2.1 Basic idea of SRT

The figure above captures the basic idea of SRT. Comparing the encoded audio/video stream (the green curve, "Source", on the left) with the same stream after public-network transmission (the red curve, "Network Transmission"), we can see that the encoded source stream has a fixed frame interval and a variable bitrate with certain characteristics. After traversing the public network, however, the frame interval becomes unstable and the bitrate characteristics are completely changed. Decoding such a signal is a very difficult challenge; sometimes it cannot be decoded at all.

If SRT, with its error correction, is used for transmission over the public Internet, the frame interval of the encoded stream still becomes unstable in transit, but because each SRT packet carries an accurate timestamp, the receiver can use that timestamp to reproduce a fixed frame interval. More importantly, by specifying a latency amount, defining both a send buffer and a receive buffer, and feeding signals back from the receiver (which can be understood as backward error correction), this combination of settings, error correction, and flow control gives the received stream almost the same bitrate characteristics as the original.

Data produced by the encoder enters the send buffer. The send buffer checks the timing of the data and sends it in time order over the Internet to the receive buffer, which processes it according to the timestamps in the packets, ensuring that the reproduced stream is essentially consistent with the source stream.

The entire SRT design is built around the encoded stream, and it is resistant to packet loss, congestion, jitter, and so on.

2.2 SRT message basics

Interactive process

SRT's interaction process consists of five steps, in order: handshake (Handshake), capability exchange (Capability), media (Media), control (Control), and shutdown (Shutdown). Unlike RTMP's nine-exchange setup, establishing an SRT connection requires only two RTTs: the Handshake Request and the Capability Announce. The handshake between Caller and Listener is deliberately simplified to improve efficiency. After the handshake, Caller and Listener exchange important information; the first key step of SRT is that they exchange the latency amount and the buffer sizes. Media data transfer then begins, carrying the precise timestamps used to restore the frame interval. The second key step of SRT is that the Listener sends control information back to the Caller to combat jitter, packet loss, and other sudden conditions on the public network. Finally, the connection is shut down once media transmission completes.

Data message

The SRT message format is relatively simple, divided into data messages and control messages. The figure above shows the structure of a data packet. The top two rows are the UDP part; below them is the UDT-derived part. If the first bit is 0, the packet is treated as a data message. FF indicates the packet's position within a message: 0b10 is the first fragment, 0b00 a middle fragment, 0b01 the last fragment, and 0b11 a single unfragmented message. KK indicates whether the payload is encrypted, R indicates whether this is a retransmitted packet, Timestamp is the timestamp, and Destination Socket ID is the socket ID defined by SRT. From this structure we can see that an SRT data packet carries an accurate 32-bit timestamp, the packet sequence number space is more than sufficient, and the layout is very simple.
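To make the field layout above concrete, here is a minimal header parser. It is a sketch based on the packet structure described in the SRT draft specification (sequence number, then the FF/KK/R flag word, then timestamp and destination socket ID); field offsets are assumptions drawn from that layout, not from libsrt source.

```python
import struct

# Position flags (FF): where this packet sits within a message
POSITION = {0b10: "first fragment", 0b00: "middle fragment",
            0b01: "last fragment", 0b11: "single packet (not fragmented)"}

def parse_srt_header(data: bytes) -> dict:
    """Parse the 16-byte SRT packet header (fields as in the figure above)."""
    w0, w1, timestamp, dst_socket_id = struct.unpack("!IIII", data[:16])
    if w0 >> 31:                        # top bit set: control packet
        return {"type": "control", "control_type": (w0 >> 16) & 0x7FFF}
    return {
        "type": "data",
        "seq": w0 & 0x7FFFFFFF,         # 31-bit packet sequence number
        "position": POSITION[(w1 >> 30) & 0b11],  # FF flags
        "encrypted": bool((w1 >> 27) & 0b11),     # KK flags
        "retransmitted": bool((w1 >> 26) & 1),    # R flag
        "timestamp": timestamp,         # 32-bit timestamp
        "dst_socket_id": dst_socket_id, # SRT-defined socket id
    }
```

For example, a single unfragmented data packet with sequence number 5 parses back to its flag values without any table lookup beyond the bit masks above.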

The following figure shows the structure of a control message. The table on the right lists the general types of control messages; among them, ACK, NACK, and ACKACK are the ones worth particular attention.

Control message

2.3 SRT packet loss retransmission

2.3.1 Send/Receive Buffer

SRT maintains both a send buffer at the sender and a receive buffer at the receiver. The send buffer sends strictly according to timestamp intervals; the timer defaults to 10 milliseconds.

2.3.2 ACK

The most common retransmission mechanism is ACK. Taking the figure above as an example, suppose the send buffer transmits five packets, 1 through 5, to the receive buffer. After successfully receiving them, the receiver sends an ACK, a positive acknowledgment, to indicate success. Once the sender receives that acknowledgment, it reclaims space by deleting packets 1 through 5 from the send buffer and prepares to send packet 6.

The send buffer processes data strictly according to timestamp intervals. At certain intervals (related to ACK, ACKACK, and round-trip time), the receiver sends ACKs to the sender, allowing the sender to remove acknowledged packets from the send buffer and reclaim that space. The Latency Window is the delay window in the receive buffer; its role is to release data gradually according to the timestamps, with strict timestamp checking at a default period of 10 milliseconds.

2.3.3 ACK/ACKACK/RTT

After receiving a data packet, the receiver feeds back an ACK indicating successful reception. For example, on the left of the figure above, the receiver sends ACK(11) to the sender after receiving the eleventh packet; upon receiving ACK(11), the sender replies with an ACKACK to acknowledge receipt of ACK(11). The biggest significance of this exchange is that it lets the receiver calculate the RTT: the difference between the time an ACK is sent and the time the corresponding ACKACK arrives is the round-trip time, the time one full round trip consumes. SRT cannot measure one-way latency, so RTT/2 is used to approximate the one-way cost. An ACK (from the receiver) triggers an ACKACK (from the sender) with almost no additional delay. RTT reflects current network quality: a high RTT indicates large network delay, a low RTT indicates low delay. RTT is calculated at the receiver and carried back to the sender in ACK messages, so the sender also learns the current network quality. Since bandwidth conditions constantly change, RTT is a real-time, dynamic value that tracks the network environment.
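The ACK/ACKACK round trip described above can be sketched as a small receiver-side estimator. This is a simplification, assuming the 7/8-1/8 exponential smoothing that UDT uses; the real SRT code also tracks RTT variance.

```python
class RttEstimator:
    """Receiver-side RTT measurement from the ACK/ACKACK exchange."""

    def __init__(self):
        self.pending = {}   # ack_no -> time that ACK was sent
        self.rtt = None     # smoothed RTT, in seconds

    def on_ack_sent(self, ack_no: int, now: float) -> None:
        self.pending[ack_no] = now

    def on_ackack_received(self, ack_no: int, now: float) -> float:
        # One full round trip: ACK out, ACKACK back
        sample = now - self.pending.pop(ack_no)
        if self.rtt is None:
            self.rtt = sample
        else:
            # EWMA smoothing as in UDT: rtt = 7/8 * rtt + 1/8 * sample
            self.rtt = 0.875 * self.rtt + 0.125 * sample
        return self.rtt

    def one_way_estimate(self):
        # SRT cannot measure a single direction, so RTT/2 is used
        return None if self.rtt is None else self.rtt / 2
```

For instance, if ACK(11) goes out at t = 0 and its ACKACK arrives at t = 0.1 s, the first RTT sample is 100 ms, and later samples are folded in with the 7/8-1/8 weighting.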

2.3.4 ACK information

The ACK message carries a wealth of key information. Last Acknowledged Packet Sequence Number indicates how far reception has progressed. The RTT field lets the sender evaluate network quality, and changes in RTT signal changes in the network. Available Buffer Size indicates how much buffer space remains at the receiver. SRT can also be used for file transfer, where Available Buffer Size is very useful for congestion control: if the transfer is too fast, the receiver's buffer cannot keep up, so the sender must consider not only the network but also whether the receiver has enough buffer capacity to absorb a huge file in a short time. Packets Receiving Rate is the receive rate counted in packets per second, while Receiving Rate is the receive rate in bits per second.

In short, the receiver sends an ACK to the sender every 10 milliseconds, carrying key data such as the network RTT, the receiver's buffer status, and the receiver's bitrate. These three pieces of data are crucial: they directly reflect the state of the network path between sender and receiver.

2.3.5 NACK

Normally, QUIC and TCP provide only the ACK mechanism: the receiver tells the sender which packets were successfully received. WebRTC (RTP/RTCP), by contrast, often chooses NACK: the receiver tells the sender which packets were not received, returning a list of the missing packets.

SRT supports both ACK and NACK, and in my opinion the reason is aggressiveness in claiming bandwidth. For example, the receiver may send an ACK indicating successful reception, but the ACK packet itself may be lost, causing the sender to conclude that the data was lost and retransmit it after a timeout.

Another situation: periodic NACKs may cause the sender to transmit extra packets. For example, the sender successfully sends packets 5 and 6 but receives no ACK; when the retransmission timer fires, it sends packets 5 and 6 again. The sender then receives a NACK from the receiver stating that packets 5 and 6 were not received, so it sends them a third time. Within one cycle the sender transmitted the same packets multiple times. From the protocol's design we can see that SRT consumes more bandwidth than QUIC and similar protocols under packet loss: once loss occurs, SRT retransmits more than other protocols, using this aggressive claiming of bandwidth to keep audio and video in sync.

2.4 SRT sends based on time

Sending by time is a standard feature of audio/video transmission. The encoder normally emits data at its own pace, but it can be too optimistic: sending exactly at the encoder's output rate can push the actual bitrate above what was expected. In that case SRT may not output packets fast enough, because SRT is ultimately throttled by a configured maximum bandwidth that is set too low.

2.5 Configurable bit rate

SRT has three relevant configuration options: INPUTBW, the encoder's input bandwidth; MAXBW, the maximum bandwidth; and OVERHEAD(%), the retransmission overhead ratio. The configuration rules, shown in the figure above, are:

If INPUTBW and OVERHEAD(%) are not configured and only MAXBW is, the maximum output bitrate is simply MAXBW;

If MAXBW and INPUTBW are not configured and only OVERHEAD(%) is, SRT measures the input rate internally, and the maximum output bitrate is measured input rate × (100 + OVERHEAD) / 100; OVERHEAD defaults to 25%;

In the third case, OVERHEAD(%) and INPUTBW are configured and MAXBW is not; the maximum output bitrate is INPUTBW × (100 + OVERHEAD) / 100.

SRT's biggest feature is configurability: if a value is configured, it is used; if not, the actually measured bitrate is used. In practice we usually choose not to configure fixed values, especially on the public Internet, because a single configuration cannot cope with the network's variation over different periods. Instead we let SRT use the measured encoding bitrate: measured bitrate × (100 + OVERHEAD) / 100.
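The three cases above can be collapsed into one small resolver. This is a sketch of the decision order as described in this section, not the exact libsrt logic; the function name and the single-function shape are my own.

```python
def srt_max_bandwidth(maxbw=0, inputbw=0, overhead_pct=25,
                      measured_bitrate=0):
    """Resolve the effective maximum output bitrate (bits per second).

    maxbw / inputbw of 0 mean "not configured", mirroring the text above.
    """
    if maxbw:                        # case 1: MAXBW wins outright
        return maxbw
    if inputbw:                      # case 3: INPUTBW + OVERHEAD(%)
        return inputbw * (100 + overhead_pct) // 100
    # case 2: nothing fixed, fall back to the internally measured rate
    return measured_bitrate * (100 + overhead_pct) // 100
```

For example, with only INPUTBW = 1 Mbps configured and the default 25% overhead, the cap works out to 1.25 Mbps.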

2.6 SRT's congestion control: simple, too simple

Next we discuss SRT's flow control. Under a loss-retransmission mechanism, packets lost in transit are resent. Suppose the path between sender and receiver has 1 Mbps of bandwidth, and background traffic and noise reduce it by 100 kbps to 900 kbps, causing 10% packet loss. The sender, seeing missing ACKs or receiving NACKs, retransmits that 10% of the data. But the available bandwidth is now only 900 kbps while the offered load, including retransmissions, is 1.1 Mbps: just as the pipe narrows, the traffic sent increases rather than decreases. The network deteriorates further, congestion and stalls become frequent, and eventually the connection collapses. SRT's simple loss retransmission alone cannot solve congestion; the best response to congestion is to limit traffic and reduce pressure on the link.

SRT's congestion control is very simple, implemented only in congctrl.cpp. It involves the following variables:

m_iFlowWindowSize represents the receiver's buffer size; m_dCWndSize is derived from the bytes delivered per unit time and (RTT + 10 ms); the send buffer holds the data the sender has not yet had acknowledged. In summary the algorithm is: use m_iFlowWindowSize to check whether the receiver's buffer is exhausted, and at the same time check whether the amount of data sent per RTT exceeds the congestion window. The receiver-buffer check mainly comes into play when transferring files.
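In the spirit of that description, the send gate boils down to two short functions. This is a sketch of the logic as summarized above, not a transcription of congctrl.cpp; the unit choices (packets, milliseconds) are assumptions.

```python
def cwnd_size(recv_rate_pkts_per_sec: float, rtt_ms: float) -> float:
    """Congestion window from delivery rate and (RTT + 10 ms), in packets,
    matching the m_dCWndSize description above."""
    return recv_rate_pkts_per_sec * (rtt_ms + 10) / 1000.0

def can_send(inflight_pkts: int, flow_window: int, cwnd: float) -> bool:
    """A new packet may go out only while the unacknowledged backlog stays
    below both the receiver's flow window and the congestion window."""
    return inflight_pkts < min(flow_window, cwnd)
```

With a delivery rate of 1000 packets/s and a 90 ms RTT the window is 100 packets, so sending stalls as soon as 100 packets are in flight, regardless of how large the receiver's buffer is.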

In my opinion, SRT has not done enough to control congestion.

2.7 Summary of SRT protocol

SRT has clear advantages in connection speed: a connection is established after a successful two-round-trip handshake. It also has an excellent loss-retransmission strategy, with rich control messages (ACK, ACKACK, NACK) and rich statistics (RTT, receive bitrate, and so on). At the same time, we found that after packet loss the bandwidth occupied by SRT's transmission remains large: the higher the loss rate, the more bandwidth transmission consumes.

SRT sends audio/video data according to time, at the actually measured encoding bitrate; but its congestion control is too simple, checking only whether the receiver's buffer can keep up and whether the encoded bitrate is being sent too fast.

3.1 Application of SRT in SRS 4.0

3.1.1 The last mile: publishing over SRT

We recommend using SRT from the encoder to the edge node when publishing a stream. There are two main purposes: to improve source-stream quality with SRT-based adaptive bitrate encoding, and to have sender and receiver cooperate to solve the last-mile publishing problem (this is my understanding of SRT). SRS supports receiving SRT streams: after an edge SRS receives an SRT stream from the publisher, it converts it to RTMP and distributes it to the central node, from which all edge nodes can serve RTMP pulls.

3.1.2 SRT address format

As a transport protocol, one drawback of SRT is that its address carries no defined semantics: from the address alone we cannot tell whether it is a publish address or a play address. So how do we distinguish them?

To make it easy for SRT encoders to publish, SRS 4.0 supports simple encoder configuration: server IP, server port, and streamid. According to SRT's official documentation, streams are identified by StreamID. The simplified address format is shown in the figure above, and the format with a vhost (virtual host) is shown in the following figure, where m=publish / m=request distinguishes publish and play addresses. Clearly, compared with RTMP, SRT addresses are less readable, long, and complicated.
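As a concrete illustration, here is what publishing and playing through SRS over SRT can look like with FFmpeg. This is a hypothetical sketch based on the streamid convention in the SRS documentation; the host, port, and stream name are placeholders, and the exact syntax may differ between SRS versions.

```shell
# Publish (m=publish marks this as a push address)
ffmpeg -re -i input.flv -c copy -f mpegts \
  "srt://127.0.0.1:10080?streamid=#!::r=live/livestream,m=publish"

# Play (m=request marks this as a pull address)
ffplay "srt://127.0.0.1:10080?streamid=#!::r=live/livestream,m=request"
```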

3.1.3 SRT test under various network conditions

Testing SRT under various network conditions, it is easy to see that a higher packet loss rate leads to higher bandwidth consumption. When the network is poor or congested, the sender transmits more data, which worsens conditions further and raises the loss rate, a vicious circle. In addition, a larger RTT increases latency, which in turn also raises the loss rate and bandwidth consumption.

The solution we propose is to predict the network bandwidth from the current send bitrate, RTT, inflight, and other data, and then dynamically adjust the encoding bitrate according to the predicted bandwidth, adapting to real-time bandwidth, avoiding congestion, and improving video smoothness.

3.1.4 GCC algorithm adaptive coding architecture

The figure above shows the architecture of Google's congestion control algorithm, GCC. As shown, the sender transmits packets to the receiver, which consists of several components. The inter-packet delay variation is computed as d(i, j) = (Rj - Sj) - (Ri - Si), where S is the send time and R is the receive time; filtering the trend of these differences yields m(ti). A filter at the receiver (described as a Kalman filter) also maintains an adaptive threshold; by comparing m(ti) against the threshold, the algorithm decides whether to increase or decrease the rate estimate Ar, which is then fed back to the sender.
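The delay-gradient core of GCC can be sketched in a few lines. This is a simplified illustration of the d(i, j) formula and the three-way detector decision above; real GCC filters m(t) (Kalman or trendline) and adapts the threshold over time, which is omitted here.

```python
def delay_gradients(send_times, recv_times):
    """d(i, j) = (Rj - Sj) - (Ri - Si) for consecutive packets: the
    one-way delay variation that GCC filters into m(t)."""
    return [(recv_times[i] - send_times[i])
            - (recv_times[i - 1] - send_times[i - 1])
            for i in range(1, len(send_times))]

def detector(m: float, threshold: float) -> str:
    """Three-way decision that drives the rate estimate Ar."""
    if m > threshold:
        return "overuse"      # queues building: decrease the rate
    if m < -threshold:
        return "underuse"     # queues draining: rate may be increased
    return "normal"           # hold the current estimate
```

With send times [0, 0.02, 0.04] s and receive times [0.01, 0.035, 0.065] s, the one-way delay grows by 5 ms and then 10 ms per packet, a rising queue that the detector flags as overuse once it crosses the threshold.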

3.2 SRT push stream with adaptive bit rate

3.2.1 Key parameters of SRT-based adaptive bitrate encoding

The key parameters of SRT-based adaptive bitrate encoding are shown on the left of the figure above. rtt_min is the minimum RTT within 1 s; send_bitrate_max is the maximum send bitrate within 1 s, normally sampled about every 200 to 300 ms; inflight is the number of packets sent but not yet ACKed, presumably still on the transmission link; BDP (bandwidth-delay product) = send_bitrate_max × rtt_min, the maximum amount of data one RTT can hold. If BDP is greater than 1.2 × inflight, the network is in good condition and the encoding bitrate can be raised; if BDP is less than 0.8 × inflight, the network is in poor condition and the bitrate should be lowered; otherwise the current bitrate is kept unchanged. This effectively avoids network congestion.

3.2.2 SRT-based adaptive bitrate encoding: an example

We verified the effect with a test: streams are published over the Internet from an SRT encoder in the United States to an SRS 4.0 node set up in Hangzhou. The open-source test address is shown in the figure. FFmpeg is configured with an encoding bitrate of 1000 kbps, and during transmission the bitrate is adjusted dynamically according to the actual egress bandwidth. From curve 1 on the right of the figure, the adaptive bitrate reaches up to 1400 kbps, a good adaptive effect; in actual testing, playback is smooth without noticeable stalls, and recovers even when occasional bandwidth jitter occurs.

4. SRT and QUIC

Next, we compare SRT and QUIC and summarize their characteristics.

4.1 Advantages and disadvantages of SRT 1.4

The advantages and disadvantages of SRT 1.4 can be briefly summarized as follows. Advantages: audio and video are sent and received according to timestamps, which effectively preserves their timing; the multiple loss-correction mechanisms (ACK / ACKACK / NACK) effectively reduce latency and loss; and SRT exposes rich transport-layer data such as RTT, packet loss rate, and receive rate to the upper layer. Its shortcomings cannot be ignored either: SRT's congestion control is too simple, and a BBR-style algorithm would need to be integrated at the transport layer; native SRT also does not support connection migration.

Based on these characteristics, I think SRT is best suited to transmission from the encoder to the nearest node, using the RTT and other information SRT measures to drive adaptive bitrate encoding. SRT also suits environments with fixed network nodes and stable network conditions, where properly configuring parameters such as latency, send/recv buffer sizes, and overhead rate can pay off many times over.

4.2 Advantages and disadvantages of QUIC

QUIC's advantages are fast connection setup and pluggable congestion control, including CUBIC and BBR. For loss retransmission, QUIC supports more ACK blocks, up to 256, and calculates RTT more accurately (QUIC uses independent packet numbers, and retransmitted packets receive new numbers, which greatly helps accurate RTT measurement). Finally, QUIC supports connection migration.

Of course, QUIC also has shortcomings. First, its packet header is larger, so header overhead is a higher proportion of what is sent. Second, loss retransmission natively relies only on ACK. Third, QUIC does not tolerate sustained loss: once loss occurs and is not recovered before the timeout, the connection is dropped.

Based on these pros and cons, I think QUIC is better suited to environments with higher packet loss rates, because it has 0-RTT fast connection, larger ACK blocks for loss retransmission, and very good congestion control. QUIC also suits long-distance transmission: when the network RTT is high, 0-RTT re-establishment after a disconnection makes data transfer more efficient.

4.3 Mediago service: RTMP over QUIC

Mediago supports transporting RTMP live streams over QUIC: RTMP-over-TCP publish/play, RTMP-over-QUIC publish/play, and FLV are all supported. The server supports RTMP origin-fetch between TCP-based servers and between QUIC-based servers. There is a test link in the figure above; interested readers can try it for themselves.

Reprinted from: https://segmentfault.com/a/1190000022071085

Originally published at blog.csdn.net/u014162133/article/details/105296430