WebRTC SDP detailed explanation and analysis

WebRTC (Web Real-Time Communication) brings real-time communication to web pages; it is a Web implementation of RTC. The project was open-sourced by Google and has been standardized with the IETF and W3C. In China, more and more vendors support WebRTC and its application prospects keep broadening, so we have set up a column to share Alibaba Cloud's internal WebRTC research work.

This article is the first in the Alibaba Cloud Video Cloud WebRTC technical column. It analyzes WebRTC SDP through examples and key attributes, and shares some practical experience from Alibaba Cloud technical experts; we hope it is helpful or inspiring. Follow-up articles in the column will cover WebRTC ICE, DTLS, SRTP, RTCP, and TURN in detail.

Authors:
forget Lei, senior technical expert at Alibaba Cloud, responsible for R&D of the Alibaba Cloud RTC server;
a Thai, senior development engineer at Alibaba Cloud, working on R&D of the Alibaba Cloud RTC server

SDP key attributes

Overview

In a narrow sense, WebRTC refers to the browser side. How can browsers exchange data directly with each other? They cannot do it on their own and must rely on servers, generally the following:

  1. Signaling server: exchanges room and conference media information, as well as messages during the conference. The media description uses the SDP protocol, which is the focus of this article.
  2. ICE servers: a STUN server helps two clients punch holes through NAT to establish a P2P connection, and a TURN server relays the traffic directly if that connection fails. ICE's address information is called a Candidate, which can be exchanged inside SDP or separately through Trickle.
  3. SFU or MCU server: in a multi-party meeting, having each endpoint send data directly to every other participant is called MESH, but MESH has obvious limitations. An SFU forwards streams, so a client uploads only one stream and the SFU delivers it to the other clients. An MCU goes further and mixes streams, so each client needs only one uplink stream and one downlink stream.

Note: Besides transport, another important WebRTC feature is security, which is provided by DTLS. Some DTLS information is carried in SDP. Later articles will introduce DTLS.

Next, we formally introduce the SDP protocol.

What's SDP

The SDP key attribute diagram at the beginning of this article gives a global view of SDP. SDP describes media sessions, network information, security features, transmission strategies, and so on. Each SDP attribute in the figure plays a different role in different application scenarios and should not be underestimated.

Next, the official definition: SDP (Session Description Protocol) is a text-based session description protocol. It is not a transport protocol itself and relies on other protocols (such as SIP and HTTP) to exchange the necessary media information, which is used for media negotiation between two session entities.

The Offer and Answer of WebRTC include SDP. Related RFCs include:

  1. 1998, RFC2327
  2. 2006, RFC4566

WebRTC SDP example analysis

Offer and Answer

WebRTC uses the Offer-Answer model to exchange SDP. There is SDP in Offer and also in Answer. For example, Alice and Bob communicate via WebRTC:

// Alice Offer
v=0
o=- 2397106153131073818 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE video
a=msid-semantic: WMS gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
m=video 9 UDP/TLS/RTP/SAVPF 96 97
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:l5KU
a=ice-pwd:+Sxmm3PoJUERpeHYL0HW4/T9
a=ice-options:trickle
a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE:DA:64:A0:37:7E:61:CB:9D:91:9B:44:F6:C9:AC:3B:37:1C:00:15:4C:5A:B5:67:74
a=setup:actpass
a=mid:video
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=ssrc-group:FID 2527104241
a=ssrc:2527104241 cname:JPmKBgFHH5YVFyaJ
a=ssrc:2527104241 msid:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS c7072509-df47-4828-ad03-7d0274585a56
a=ssrc:2527104241 mslabel:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
a=ssrc:2527104241 label:c7072509-df47-4828-ad03-7d0274585a56

// Bob Answer
v=0
o=- 5443219974135798586 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE video
a=msid-semantic: WMS uiZ7cB0hsFDRGgTIMNp6TajUK9dOoHi43HVs
m=video 9 UDP/TLS/RTP/SAVPF 96 97
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:MUZf
a=ice-pwd:4QhikLcmGXnCfAzHDB++ZjM5
a=ice-options:trickle
a=fingerprint:sha-256 2A:5A:B8:43:66:05:B3:6A:E9:46:36:DF:DF:20:11:6A:F6:11:EA:D9:4E:26:E3:CE:5A:3A:C6:8D:03:49:7B:DE
a=setup:active
a=mid:video
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=ssrc-group:FID 3587783331
a=ssrc:3587783331 cname:INxZnBV2Sty1zlmN
a=ssrc:3587783331 msid:uiZ7cB0hsFDRGgTIMNp6TajUK9dOoHi43HVs a3b297e7-cdbe-464e-a32c-347465ace055
a=ssrc:3587783331 mslabel:uiZ7cB0hsFDRGgTIMNp6TajUK9dOoHi43HVs
a=ssrc:3587783331 label:a3b297e7-cdbe-464e-a32c-347465ace055

Remark: In the Chrome browser, first open webrtc-internals, then open the Alice page and click the Share button, then open the Bob page and click Share; the Offer and Answer above then appear.

After SDP exchange, Candidate will be exchanged:

// Alice Candidate
candidate:1912876010 1 udp 2122260223 30.2.220.94 52832 typ host generation 0 ufrag l5KU network-id 1 network-cost 10
candidate:1015535386 1 tcp 1518280447 30.2.220.94 9 typ host tcptype active generation 0 ufrag l5KU network-id 1 network-cost 10

// Bob Candidate
candidate:1912876010 1 udp 2122260223 30.2.220.94 51551 typ host generation 0 ufrag MUZf network-id 1 network-cost 10

Finally, the Candidate pair that Alice and Bob use to communicate is the UDP one:

[Figure: the selected Candidate pair]

Video information sent by Alice:

[Figure: video sent by Alice]

Video information received by Alice (Bob's video):

[Figure: video received by Alice]

Generally speaking, the publisher initiates the Offer and the receiver gives the Answer. For example, when a client pushes a stream to an SFU, the client sends an Offer, the SFU replies with an Answer, the client pushes the stream to the SFU, and the SFU forwards it to other clients. Both Licode and Janus work this way. With this approach, a client that needs to pull other clients' streams generally uses another PeerConnection to receive the SFU's Offer, generate an Answer, and respond to the SFU.

However, the publisher does not have to initiate the Offer; the receiver can offer and the publisher can answer. For example, with an SFU such as MediaSoup, the client first sends an Offer to the SFU, and the SFU only checks the media capabilities in it. The SFU then generates its own Offer (containing the streams of the other clients in the meeting; if there are none, it carries no SSRC) and sends it to the client, and the client replies with an Answer. The advantage is that when other clients join or streams change (for example, video is turned off and on again), a re-offer can be used: the SFU uniformly initiates a new Offer and the client responds, so there is only one interaction mode between the SFU and the client.

SDP Structure

An SDP description is divided into two parts: the session-level description and the media-level description. Refer to RFC 4566 for the exact composition; fields marked with an asterisk (*) are optional. The common content is as follows:

Session description (session-level description)
         v=  (protocol version)
         o=  (originator and session identifier)
         s=  (session name)
         c=* (connection information -- not required if included in all media)
         One or more Time descriptions ("t=" and "r=" lines; see below)
         a=* (zero or more session attribute lines)
         Zero or more Media descriptions

Time description
         t=  (time the session is active)

Media description (media-level description), if present
         m=  (media name and transport address)
         c=* (connection information -- optional if included at session level)
         a=* (zero or more media attribute lines)

Compare this with Alice's Offer (video only, no audio):

// Session description
v=0
o=- 2397106153131073818 2 IN IP4 127.0.0.1
s=-
c=IN IP4 0.0.0.0

// Time description
t=0 0

// Session Attributes
a=group:BUNDLE video
a=msid-semantic: WMS gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS

// Media description
m=video 9 UDP/TLS/RTP/SAVPF 96 97
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:l5KU
a=ice-pwd:+Sxmm3PoJUERpeHYL0HW4/T9
a=ice-options:trickle
a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE:DA:64:A0:37:7E:61:CB:9D:91:9B:44:F6:C9:AC:3B:37:1C:00:15:4C:5A:B5:67:74
a=setup:actpass
a=mid:video
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=ssrc-group:FID 2527104241
a=ssrc:2527104241 cname:JPmKBgFHH5YVFyaJ
a=ssrc:2527104241 msid:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS c7072509-df47-4828-ad03-7d0274585a56
a=ssrc:2527104241 mslabel:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
a=ssrc:2527104241 label:c7072509-df47-4828-ad03-7d0274585a56

SDP lines are order dependent. For example, the lines that follow a=rtpmap:96 relate to that payload type until the next a=rtpmap line or some other attribute appears.
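Because the meaning of a line depends on which m= line precedes it, a parser usually starts by grouping lines into the session-level part and one block per m= line. The following is a minimal sketch of that grouping; the function name split_sections and the shortened offer are illustrative assumptions, not part of any WebRTC API.

```python
def split_sections(sdp):
    """Return (session_lines, media_sections); each media section starts at an m= line."""
    session, media = [], []
    for raw in sdp.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("m="):
            media.append([line])      # a new media-level description begins
        elif media:
            media[-1].append(line)    # belongs to the most recent m= line
        else:
            session.append(line)      # everything before the first m= line
    return session, media


# Shortened version of Alice's Offer, used only for illustration.
offer = """v=0
o=- 2397106153131073818 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE video
m=video 9 UDP/TLS/RTP/SAVPF 96 97
c=IN IP4 0.0.0.0
a=mid:video
a=rtpmap:96 VP8/90000
"""
session, media = split_sections(offer)
print(len(session), "session-level lines,", len(media), "media section(s)")
```

Keeping each media section as its own list makes it easy to interpret attributes such as a=mid or a=rtpmap in the context of the right m= line.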

SDP lines have no unified schema, that is, there is no single fixed rule that can parse every line. The SDP grammar only describes the general structure of attributes; the specific form of each attribute is defined separately, in RFC 4566 for example:

a=rtpmap:<payload type> <encoding name>/<clock rate> [/<encoding parameters>]

When parsing SDP, each line has the form key=.... When the key is a, the rest of the line takes one of two forms; refer to RFC 4566:

a=<attribute>
a=<attribute>:<value>

For example, c=IN IP4 0.0.0.0, and the key is c.
For example, a=rtcp-mux, the key is a, the attribute is rtcp-mux, and there is no value.
For example, a=rtpmap:96 VP8/90000, key is a, attribute is rtpmap, value=96 VP8/90000.

Note that not every colon (:) separates <attribute> from <value>: the value itself may contain colons, so only the first colon is the separator. For example:

a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=ssrc:2527104241 msid:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
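The parsing rule described above can be sketched in a few lines of Python. This is only an illustration: parse_sdp_line is a hypothetical helper name, and the sketch assumes single-character keys as in the examples of this article.

```python
def parse_sdp_line(line):
    """Split one SDP line into (key, attribute, value).

    For non-attribute lines (v=, o=, c=, m=, ...) the attribute is None and
    the value is everything after "key=".
    """
    key, sep, rest = line.strip().partition("=")
    if not sep or len(key) != 1:
        raise ValueError(f"not an SDP line: {line!r}")
    if key != "a":
        return key, None, rest
    # Split only at the FIRST colon: the value itself may contain colons.
    attribute, colon, value = rest.partition(":")
    return key, attribute, (value if colon else None)


print(parse_sdp_line("c=IN IP4 0.0.0.0"))
print(parse_sdp_line("a=rtcp-mux"))
print(parse_sdp_line("a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE"))
```

Using partition rather than split keeps everything after the first colon intact, which matters for fingerprint, extmap, and ssrc values.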

Session Level Field

The SDP description fields at the session level include: v, o, s, c, b, t.

  • v (version)
    SDP protocol version; the value is fixed at 0.

  • o (origin)
    Identifies the originator of the session.

  • s (session name)
    The name of the session. Each SDP has exactly one s line, and its value cannot be empty.

  • c (connection data)
    Carries the connection information of the session, which is essentially the IP address.
    This field can appear in the session-level description and in each media-level description. If both levels have a c line, the media-level c line takes precedence.
    Because WebRTC exchanges address information through ICE candidates, the c line is not used, but that does not make it useless. In SIP video conference scenarios the c line is essential, and this field comes up again at the end of the article.

  • b (bandwidth)
    The recommended bandwidth for the session or media.

  • t (timing)
    Specifies the start and end time of the session. If both are 0, the session is permanent.

For a more detailed description of the session level field, please refer to RFC 4566 .

Media Codecs

After the session level description is completed, there will be zero or more media level descriptions, such as:

// Session Description
v=0
......

// Audio Media Description
m=audio 9 UDP/TLS/RTP/SAVPF 111
......

// Video Media Description
m=video 9 UDP/TLS/RTP/SAVPF 96 97
......

This SDP describes one audio and one video. For the format of the M line, refer to RFC 4566:

m=<media> <port> <proto> <fmt> ...

The numbers at the end, 111 and 96 97, are the fmt values, which represent the audio and video media codecs. The rtpmap, rtcp-fb, and fmtp attributes that follow describe them in more detail.

m=audio 9 UDP/TLS/RTP/SAVPF 111
a=mid:audio
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1

m=video 9 UDP/TLS/RTP/SAVPF 96 97
a=mid:video
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96

Remark: The M line types are not limited to audio and video; there are also application (for example BFCP), text, and other media types.

Remark: The a=mid attribute can be regarded as the unique ID of each M description. For example, with a=mid:audio, the string audio is the ID of that M description. The mid value can also be a number, such as a=mid:0, in which case 0 is the ID of that M description. The mid values are typically combined with BUNDLE grouping to describe the transport strategy: for example, a=group:BUNDLE audio video means that in this session the M descriptions whose mid is audio and video are multiplexed onto one transport.

Remark: The number 9 in the M line is the transport port of that media type. In the RTC scenario, the address information in the ICE candidates is used for data transmission, so the M line port is not used. In the SIP scenario, however, the M line port is very important: it is the RTP port and must be an even number. Combined with the IP address in the session-level C line, it gives the transport address of the SIP media stream.

Remark: RTX means retransmission. For example, video payload type 97 is the retransmission of apt=96; in other words, payload type 97 adds retransmission on top of 96 (VP8).
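To make the relationship between rtpmap, fmtp, and RTX concrete, here is a small sketch that builds a payload-type table for one M description and then looks up which payload type each RTX entry retransmits. The helper name parse_codecs and the dictionary layout are assumptions made for this example.

```python
def parse_codecs(media_lines):
    """Map payload type -> {"name", "clock_rate", "fmtp"} for one media section."""
    codecs = {}
    for line in media_lines:
        if line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            name, clock_rate = encoding.split("/")[:2]
            entry = codecs.setdefault(int(pt), {"fmtp": {}})
            entry.update(name=name, clock_rate=int(clock_rate))
        elif line.startswith("a=fmtp:"):
            pt, params = line[len("a=fmtp:"):].split(" ", 1)
            entry = codecs.setdefault(int(pt), {"fmtp": {}})
            for param in params.split(";"):
                if "=" in param:
                    k, v = param.split("=", 1)
                    entry["fmtp"][k.strip()] = v.strip()
    return codecs


video = [
    "m=video 9 UDP/TLS/RTP/SAVPF 96 97",
    "a=rtpmap:96 VP8/90000",
    "a=rtpmap:97 rtx/90000",
    "a=fmtp:97 apt=96",
]
codecs = parse_codecs(video)
# For each RTX payload type, record the payload type it retransmits (apt).
rtx_for = {int(c["fmtp"]["apt"]): pt
           for pt, c in codecs.items() if c.get("name") == "rtx"}
print(rtx_for)
```

Here rtx_for maps the primary payload type to its RTX payload type, so a sender can find the right retransmission format for a lost VP8 packet.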

The individual media streams are identified by SSRC:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=ssrc:2582129002 cname:8Y1pmIKBijmWeALu
a=ssrc:2582129002 msid:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H bab38910-40cd-4581-9a20-e3f558abb397
a=ssrc:2582129002 mslabel:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H
a=ssrc:2582129002 label:bab38910-40cd-4581-9a20-e3f558abb397

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=ssrc:565530905 cname:8Y1pmIKBijmWeALu
a=ssrc:565530905 msid:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H 2c533cfe-b6bf-41a8-93f0-1ca031436702
a=ssrc:565530905 mslabel:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H
a=ssrc:565530905 label:2c533cfe-b6bf-41a8-93f0-1ca031436702

Remark: The SSRC lines describe the media streams to be sent, and both the Offer and the Answer can contain SSRCs. For example, when a client communicates with MediaSoup, MediaSoup always sends the Offer to the client. MediaSoup's Offer contains the SSRCs of the media streams it wants to send (the other clients' streams that it forwards to this client), and the client's Answer contains the SSRC of the stream the client itself wants to push; the direction is sendrecv.

Remark: msid corresponds to MediaStream.id and identifies the media source, so these SSRCs can belong to different media sources.

How is the final encoding determined? The other party states it in the Answer. For example, the Offer above lists many codecs, and the Answer selects among them:

m=audio 9 UDP/TLS/RTP/SAVPF 111
m=video 9 UDP/TLS/RTP/SAVPF 100 102 127 125 108 124
a=rtpmap:100 H264/90000
a=rtpmap:102 H264/90000
a=rtpmap:127 H264/90000
a=rtpmap:125 H264/90000
a=rtpmap:108 red/90000
a=rtpmap:124 ulpfec/90000

Although the video payload types 100, 102, 127, and 125 are different, they are all H.264, while 108 (red) and 124 (ulpfec) are FEC formats used alongside H.264.

PlanB and UnifiedPlan

The Media Codecs section above did not say how multiple streams are specified. In practice, audio and video can each have multiple SSRCs, and the encoding of each SSRC may or may not be the same. For example, a mobile terminal joining an Internet video conference may use H.264, while other terminals may use other encodings.

If the SSRCs use different encodings, putting them in the same M description causes problems. This is the key difference between PlanB and UnifiedPlan. With PlanB there is only one M (audio) and one M (video), and the streams in each must share the same encoding; multiple media streams are distinguished by SSRC. With UnifiedPlan there can be multiple M (audio) and M (video) descriptions, and each stream has its own M description, so different encodings can be supported.

PlanB and UnifiedPlan are in fact two different WebRTC SDP negotiation methods for the multi-source scenario. In terms of Stream and Track, a Stream may contain an AudioTrack and a VideoTrack, and multiple Streams bring more Tracks. If each Track corresponds to exactly one M description of its own, that is UnifiedPlan. If each M line describes multiple Tracks (track ids), that is PlanB.

Note: When there is only one audio stream and one video stream, the PlanB and UnifiedPlan formats are compatible with each other.

Remark: Chrome originally supported PlanB, and the latest version also supports Unified Plan; please refer to Need to implement WebRTC "Unified Plan" for multistream.

Refer to the figure below for PlanB:

[Figure: PlanB]

Refer to the figure below for UnifiedPlan:

[Figure: UnifiedPlan]

Candidate

A Candidate is a candidate transport address. The client generates multiple Candidates, for example of host type and relay type, over UDP and TCP, as shown below:

sdpMid: audio, sdpMLineIndex: 0, candidate:2213672593 1 udp 2122260223 30.2.228.19 51068 typ host
sdpMid: video, sdpMLineIndex: 1, candidate:2213672593 1 udp 2122260223 30.2.228.19 55061 typ host

sdpMid: audio, sdpMLineIndex: 0, candidate:3446803041 1 tcp 1518280447 30.2.228.19 9 typ host
sdpMid: video, sdpMLineIndex: 1, candidate:3446803041 1 tcp 1518280447 30.2.228.19 9 typ host

sdpMid: video, sdpMLineIndex: 1, candidate:150963819 1 udp 41885439 182.92.80.26 54400 typ relay raddr 42.120.74.91 rport 37714
sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 59241 typ relay raddr 42.120.74.91 rport 49618

Remark: We removed the trailing attributes, for example generation 0 ufrag kce9 network-id 1 network-cost 10. They belong to the Candidate and are related to connectivity checks.

The client itself generates 6 Candidates: 3 for audio and 3 for video, 2 over TCP and 4 over UDP, 4 of type host and 2 of type relay. The other party will of course have many Candidates too. The next step is to pair your Candidates with the other party's Candidates (ICE Connectivity Checks) to form CandidatePairs, which are the transmission channels. A Candidate also carries network attributes; for example, network-cost is used in ICE Connectivity Checks.
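The counts above can be reproduced by parsing the candidate strings. The sketch below is an illustration only: parse_candidate is a hypothetical helper that reads the six fixed leading fields and treats the remainder as name/value pairs, which matches the candidates shown in this article.

```python
from collections import Counter


def parse_candidate(text):
    """Parse the fixed leading fields of a 'candidate:...' string."""
    fields = text.split()
    info = {
        "foundation": fields[0].split(":", 1)[1],
        "component": int(fields[1]),
        "transport": fields[2].lower(),
        "priority": int(fields[3]),
        "address": fields[4],
        "port": int(fields[5]),
    }
    # The remaining fields are name/value pairs such as "typ host" or "raddr 1.2.3.4".
    rest = fields[6:]
    info.update(zip(rest[0::2], rest[1::2]))
    return info


candidates = [
    "candidate:2213672593 1 udp 2122260223 30.2.228.19 51068 typ host",
    "candidate:2213672593 1 udp 2122260223 30.2.228.19 55061 typ host",
    "candidate:3446803041 1 tcp 1518280447 30.2.228.19 9 typ host",
    "candidate:3446803041 1 tcp 1518280447 30.2.228.19 9 typ host",
    "candidate:150963819 1 udp 41885439 182.92.80.26 54400 typ relay raddr 42.120.74.91 rport 37714",
    "candidate:150963819 1 udp 41885439 182.92.80.26 59241 typ relay raddr 42.120.74.91 rport 49618",
]
parsed = [parse_candidate(c) for c in candidates]
by_transport = Counter(c["transport"] for c in parsed)
by_type = Counter(c["typ"] for c in parsed)
print(by_transport, by_type)
```

Note that the host candidates have a much higher priority value than the relay candidates, which is why a direct path is preferred when it works.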

Remark: There are also Candidates of type srflx and prflx. Their definitions and differences will be introduced in the later ICE articles.

Remark: ICE Connectivity Checks, which involve the STUN protocol, will be analyzed in detail later. The ICE-related information in SDP is summarized below.

Both SDP and Candidates are exchanged through signaling. Suppose the other party only provides a relay Candidate, for example:

sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 51542 typ relay raddr 42.120.74.91 rport 56380

In this case, the CandidatePair that finally connects must be Relay to Relay, as shown in the following figures:

[Figure: Relay to Relay CandidatePair]

[Figure: statistics of the transmission channel]

From these figures, we can see the send and receive rates of the transmission channel, the packet counts, the RTT, and the packet loss rate.

In fact, since our client also has a host Candidate, it also tries to connect directly by pairing its host Candidate with the other party's relay Candidate:

sdpMid: audio, sdpMLineIndex: 0, candidate:2213672593 1 udp 2122260223 30.2.228.19 51068 typ host

Statistics Conn-audio-1-1
googActiveConnection    false

[Figure: statistics of the inactive CandidatePair]

Of course, this CandidatePair is not usable because no connection is established.

Remark: WebRTC can switch between multiple Candidates. We will analyze this in the article on ICE Connectivity Checks.

The client above generated two relay Candidates, one for audio and one for video. Why is only the audio one used? That is what BUNDLE, discussed next, is about.

Bundle and RTCP-MUX

During transmission, media channels can be multiplexed in two ways: audio and video can share a channel, and RTCP and RTP can share a channel.

Multiplexing RTCP and RTP means that the Sender uses one transmission channel (a single port) to send both RTP and RTCP:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=rtcp-mux

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=rtcp-mux

In this case, the Receiver must be ready to receive RTCP data on the RTP port and needs to reserve some resources, such as RTCP bandwidth.

When audio and video are multiplexed, only one Candidate is used for transmission in the end. For example, here are the client's own SDP Offer and its two relay Candidates:

a=group:BUNDLE audio video

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=mid:audio

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=mid:video

sdpMid: video, sdpMLineIndex: 1, candidate:150963819 1 udp 41885439 182.92.80.26 54400 typ relay raddr 42.120.74.91 rport 37714
sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 59241 typ relay raddr 42.120.74.91 rport 49618

This means that audio and video may each have their own Candidates, but if the other party also uses BUNDLE, only one Candidate is used in the end. For example, if the other party's Answer is:

a=group:BUNDLE audio video

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=mid:audio

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=mid:video

sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 51542 typ relay raddr 42.120.74.91 rport 56380

In the end, audio and video are transmitted over a single Candidate, as shown below:

[Figure: audio and video sharing one Candidate]

rtcp-mux multiplexes RTP and RTCP onto a single port, which simplifies NAT traversal. BUNDLE multiplexes multiple media streams onto the same port, which not only simplifies ICE-related processing such as candidate harvesting but also further simplifies NAT traversal.

rtcp-mux is an important SDP attribute related to RTC transmission. The principles of its SDP negotiation are as follows:

  1. If the Offer carries the rtcp-mux attribute and the answerer wants to multiplex RTP and RTCP onto a single port, the Answer must also carry this attribute.
  2. If the Offer does not carry the rtcp-mux attribute, the Answer must not carry it either, and the answerer must not multiplex RTP and RTCP onto a single port.
  3. The negotiation and use of rtcp-mux must be bidirectional.
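The three rules above reduce to a simple decision on the answering side. The following sketch encodes them; the function names are hypothetical and the code only models the decision, not real SDP generation.

```python
def answer_rtcp_mux(offer_has_mux, answerer_wants_mux):
    """Decide whether the Answer may carry a=rtcp-mux."""
    # Rule 2: never answer with rtcp-mux if the Offer did not carry it.
    # Rule 1: if the Offer carried it, include it only when the answerer wants it.
    return offer_has_mux and answerer_wants_mux


def transports_needed(mux_negotiated):
    """Number of transport channels: one shared, or separate RTP and RTCP."""
    return 1 if mux_negotiated else 2


for offer_has_mux in (True, False):
    negotiated = answer_rtcp_mux(offer_has_mux, answerer_wants_mux=True)
    print(offer_has_mux, negotiated, transports_needed(negotiated))
```

Because the result applies to both directions (rule 3), both sides can derive the number of channels to set up from the same negotiated value.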

For example, suppose a client subscribes to a server's stream and the client's Offer does not carry the rtcp-mux attribute. The server then assumes that the client does not support rtcp-mux and does not follow the multiplexing process. Instead, it creates two transmission channels, one for RTP and one for RTCP, and only when ICE and DTLS succeed on both channels does it consider the subscription transport established and start sending to the client.

Just imagine: if you leave the rtcp-mux attribute out of the Offer by mistake, you will wait forever for the server to become ready. SDP looks like nothing more than simple text, but only after working on real projects and falling into a few pits do you understand more deeply what the SDP attributes mean and how they work in the RTC scenario.

Remark: For more details on rtcp-mux negotiation, please refer to RFC 8035.

Remark: For how to distinguish RTP from RTCP by header fields in the rtcp-mux scenario, please refer to RFC 5761.

ICE Connectivity

Here we only explain the information related to ICE Connectivity Checks in SDP, and we will analyze the specific process separately in other articles.

Information related to ICE in SDP includes:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=ice-ufrag:kce9
a=ice-pwd:M31WxfrwmrFvPws4+tPdbsCE
a=ice-options:trickle

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=ice-ufrag:kce9
a=ice-pwd:M31WxfrwmrFvPws4+tPdbsCE
a=ice-options:trickle

ufrag and pwd are the username fragment and password used by ICE's short-term credential authentication. trickle indicates that the SDP does not have to contain candidate information: Candidates are exchanged separately through signaling, so that connectivity checks and Candidate harvesting can proceed in parallel, which speeds up session establishment.

DTLS

Here we only explain the information about DTLS in SDP. The specific DTLS handshake process will be analyzed separately in DTLS-related technical articles.

The information related to DTLS in SDP includes:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=fingerprint:sha-256 B0:A2:B3:AB:0B:A3:44:22:B1:C8:69:52:ED:04:E8:5A:A4:C3:7A:A6:55:F3:BA:76:62:26:4B:F7:9F:DD:F1:BD
a=setup:actpass

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=fingerprint:sha-256 B0:A2:B3:AB:0B:A3:44:22:B1:C8:69:52:ED:04:E8:5A:A4:C3:7A:A6:55:F3:BA:76:62:26:4B:F7:9F:DD:F1:BD
a=setup:actpass

The fingerprint is a hash of the certificate used in the DTLS handshake; it is used to detect tampering with the client and server certificates.

In addition, setup describes the DTLS role, that is, who is the DTLS Client (active) and who is the DTLS Server (passive). A side that can do both uses actpass. Here we offer actpass, so it is up to the other party to determine the final DTLS role in its Answer:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=fingerprint:sha-256 B1:FD:D6:2D:94:4E:33:A1:8C:9D:EF:ED:EB:AC:CC:2D:E2:37:15:9B:24:8C:BF:F2:7D:6A:B3:81:23:AA:13:54
a=setup:active

The other party is active, that is, the DTLS Client, so we can only be the DTLS Server, and the other party sends the DTLS ClientHello to start the DTLS handshake.
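The role decision can be written down as a small lookup. This is a sketch under assumptions: dtls_roles is a hypothetical helper, and the table of allowed answers for an active or passive Offer follows the usual a=setup offer/answer rules rather than anything shown in the examples above.

```python
def dtls_roles(offer_setup, answer_setup):
    """Return (offerer_role, answerer_role), each 'client' or 'server'."""
    # Assumed pairing rules: actpass lets the answerer choose; otherwise
    # the answerer must take the opposite role of the offerer.
    allowed = {
        "actpass": {"active", "passive"},
        "active": {"passive"},
        "passive": {"active"},
    }
    if answer_setup not in allowed.get(offer_setup, set()):
        raise ValueError(f"invalid setup pair: {offer_setup}/{answer_setup}")
    # The side that ends up 'active' sends the DTLS ClientHello.
    if answer_setup == "active":
        return "server", "client"
    return "client", "server"


print(dtls_roles("actpass", "active"))
```

With the Offer and Answer shown above (actpass and active), the offerer ends up as the DTLS Server and waits for the ClientHello.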

Stream Direction

There are four media stream directions: sendonly, recvonly, sendrecv, and inactive. They can appear in both the session-level description and the media-level description.

  • sendonly means that data is only sent. For example, a client pushing a stream to an SFU carries the sendonly attribute in its Offer (or Answer).
  • recvonly means that data is only received. For example, a client subscribing to a stream from an SFU carries the recvonly attribute in its Offer (or Answer).
  • sendrecv means that data flows in both directions. For example, a client joining a video conference needs to publish its own stream and subscribe to other people's streams, so it carries the sendrecv attribute in its Offer (or Answer).
  • inactive means that sending data is forbidden. For example, in an RTP-based video conference, when the host temporarily mutes user A, the media-level description of user A's audio should carry the inactive attribute, indicating that no more audio data may be sent.

NOTE: According to RFC 4566, the sendonly and recvonly attributes apply only to the media, not to the associated media control protocol. For example, in an RTP-based media session, RTCP packets are still sent in recvonly mode, and RTCP packets are still received and processed normally in sendonly mode.

These four direction attributes are very important, and the SDP must be checked carefully when it is assembled to make sure the stream directions are correct.

For example, suppose a client subscribes to a server's stream, but its Offer carries sendonly instead of recvonly. Even though the signaling semantics say "subscribe", some servers check SDP attributes very thoroughly and strictly (as they should), so in this scenario the server will not send media to the client, and the server's Answer may not carry any SSRC at all.
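A strict server can be thought of as applying a direction table like the one below. This is a sketch, assuming the standard offer/answer direction rules (an offered sendonly may only be answered with recvonly or inactive, and so on); the names ALLOWED_ANSWER and server_will_send are made up for the illustration.

```python
# Directions an answerer may use for each offered direction
# (assumed from the standard SDP offer/answer rules).
ALLOWED_ANSWER = {
    "sendrecv": {"sendrecv", "sendonly", "recvonly", "inactive"},
    "sendonly": {"recvonly", "inactive"},
    "recvonly": {"sendonly", "inactive"},
    "inactive": {"inactive"},
}


def server_will_send(client_offer_direction):
    """True if the Offer allows the answerer (server) to send media to the client."""
    sending = {"sendrecv", "sendonly"}
    return bool(ALLOWED_ANSWER[client_offer_direction] & sending)


for direction in ("recvonly", "sendonly"):
    print(direction, server_will_send(direction))
```

A subscribing client that offers sendonly by mistake leaves the server no valid sending direction to answer with, which is exactly the failure described above.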

RTCP Feedback

Next, let's talk about rtcp-fb, a media-level SDP attribute. It tells us which RTCP feedback messages the media session can respond to, and it is an important SDP attribute for QoS.

m=video 9 UDP/TLS/RTP/SAVPF 96
a=mid:video
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack pli

In the SDP above, the M description is a video with VP8 encoding and payload type 96. The three rtcp-fb attributes indicate that, for payload type 96, the session supports TWCC for network congestion control; supports NACK handling for ARQ and can retransmit lost RTP packets; and supports FIR and PLI handling for key frames and can send key frames on request.

When working with SIP, I ran into a pitfall: after sending a PLI request to a certain model of SIP device, no key frame arrived. After some digging, I finally found that the device's rtcp-fb description was as follows:

m=video 16402 RTP/AVP  34
a=rtpmap:34 H263/90000
a=fmtp:34 CIF4=1;CIF=1;QCIF=1;SQCIF=1
a=sendrecv
a=rtcp-fb:* ccm tmmbr
a=rtcp-fb:* ccm fir

In other words, this device only supports FIR requests and cannot handle PLI requests (PS: why didn't I check the rtcp feedback capability in the SDP earlier, tears). I want to stress this point: for professional and rigorous systems or devices, the SDP fully reflects the capabilities they have, and it also lets us discover the capabilities they lack. Every SDP attribute has its own meaning and must not be ignored.

Note: rtcp-fb cannot be used in the session-level description; it can only be used in the media-level description, and the proto field of its M description must specify AVPF.

Note: The form a=rtcp-fb:* ccm fir also exists. The asterisk is a wildcard, meaning that all media codecs described by that M line support FIR handling and key frame feedback.
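The check that would have avoided the pitfall above is easy to automate. The sketch below collects the rtcp-fb values that apply to one payload type, honoring the * wildcard; feedback_for is a hypothetical helper name and the input is the SIP device's description quoted earlier.

```python
def feedback_for(media_lines, payload_type):
    """Collect rtcp-fb values that apply to payload_type, honoring the '*' wildcard."""
    supported = set()
    for line in media_lines:
        if not line.startswith("a=rtcp-fb:"):
            continue
        pt, feedback = line[len("a=rtcp-fb:"):].split(" ", 1)
        if pt == "*" or pt == str(payload_type):
            supported.add(feedback.strip())
    return supported


sip_video = [
    "m=video 16402 RTP/AVP  34",
    "a=rtpmap:34 H263/90000",
    "a=sendrecv",
    "a=rtcp-fb:* ccm tmmbr",
    "a=rtcp-fb:* ccm fir",
]
fb = feedback_for(sip_video, 34)
print("FIR:", "ccm fir" in fb, "PLI:", "nack pli" in fb)
```

Before requesting a key frame, a sender could consult this set and fall back to FIR when nack pli is absent.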

Compare with SIP SDP

The differences between SDP in the RTC scenario and SDP in the SIP scenario show up at three levels: transmission, media, and signaling.

Transmission Level

  1. Connection setup. In the RTC scenario, the audio and video media connection is generally established with ICE + DTLS. The SIP scenario has no such process, so there are no ICE/DTLS related SDP attributes such as ufrag, pwd, setup, and fingerprint.

  2. Port multiplexing. In the RTC scenario, audio and video streams and RTP/RTCP generally share a single port; each stream is distinguished by SSRC, and RTP/RTCP are distinguished by header field values in the packet. In the SIP scenario, ports are not multiplexed, so there is no rtcp-mux attribute and no grouping attribute such as BUNDLE. Audio RTP, audio RTCP, video RTP, and video RTCP each use their own port, four in total, so streams and RTP/RTCP can naturally be distinguished by port and there is no SSRC attribute.

  3. Link detection. In the RTC scenario, ICE's STUN checks are generally used to discover the peer's egress address after NAT mapping, called srflx. In the SIP scenario, you need to implement peer address discovery yourself to obtain the SIP device's egress address after NAT mapping.

  4. Address information. In the RTC scenario, peer address information is exchanged through the candidates in SDP; in the SIP scenario, it is exchanged through the IP of the C line and the port of the M line.

// RTC scenario
a=candidate:1 1 udp 2013266431 30.27.136.138 14306 typ host

// SIP scenario
c=IN IP4 30.41.5.131
m=audio 2352 RTP/AVP 107 108 114 104 105 9 18 8 0 101 123
m=video 2374 RTP/AVP 97 126 96 34 123

Media Level

  1. Screen sharing. In the SIP scenario, the BFCP protocol is used to negotiate screen sharing, and the a=content attribute distinguishes the main stream from the shared stream (slides). In the RTC scenario, screen sharing is negotiated through external/service signaling; the SDP descriptions of the main stream and the shared stream are identical and not distinguished.
  2. Media codec. At present, audio and video in the RTC scenario are generally encoded with Opus + H.264/VP8. In the SIP scenario, many SIP devices do not support Opus and use older audio codecs such as G722, PCMA, and PCMU. For video, H.264 is generally supported and VP8 generally is not.

Signaling Level

  1. SDP exchange. Both use the Offer/Answer model. In the RTC scenario, SDP is exchanged mainly over HTTP/TCP and is usually carried in the HTTP body. In the SIP scenario, SDP can be exchanged over UDP/TCP/TLS and is carried in the INVITE and 200 OK messages.

Summary

The textual SDP format itself is very simple. The difficulty lies in the complex attributes, and their meanings, that have been added for different application scenarios (such as traditional SIP video conferencing or RTC). These SDP attributes are scattered across numerous RFCs and drafts, and it is hard to understand and master them fully without considerable effort (PS: whenever I say this, ten thousand horses gallop through my heart. WebRTC involves too many RFCs and they all refer to each other. After reading them all, be prepared for your eyesight to drop by 0.2).

In the next article, we will focus on WebRTC ICE, including connectivity checks, state transitions, trickle, and nomination. Thanks for reading.

Origin blog.51cto.com/14968479/2561319