Technology Sharing | Converged Conference Protocols Demystified

AnyRTC's converged conference solution supports the H.323 protocol, the SIP protocol, the GB28181 national standard, private protocols, and more. When talking with customers, we are often asked which is better, SIP or H.323. Another common question: the customer built an H.323 video conferencing system years ago, so how can it interwork with Internet conferencing services such as Tencent Meeting or Zoom? This article discusses how systems that use different protocols interwork, and how control protocols in the video conferencing field have developed and evolved.

When it comes to video conferencing, I have to start with the H.323 protocol. Many people may not be familiar with it: H.323 was born in the mid-1990s (its first version was approved in 1996) and may be older than many of today's young developers. H.323 was designed for multimedia communication over packet-switched networks; in short, it is the forerunner of VoIP signaling protocols. The standard is formulated by the ITU (International Telecommunication Union). ITU documents and standards have historically not been as friendly and open as the IETF's RFCs, and they were not as easy to obtain from the Internet for free. The H.323 protocol flow is also much more complicated than that of SIP. Here I will explain it by analogy with the SIP flow, so that everyone can get familiar with it simply and quickly.

1. Introduction to H.323

H.323 defines four main components: the gatekeeper (GK), the gateway, the terminal, and the multipoint control unit (MCU). The terminal is equivalent to the UA (User Agent) in SIP. The gatekeeper is essentially an all-in-one service whose main functions include terminal authentication, address resolution, bandwidth management, and routing control. It is equivalent to the SIP server, except that SIP assigns these functions to different servers (SIP Proxy Server, SIP Redirect Server, and SIP Registrar, although real implementations often combine them in one network element), while the GK has them all. As the name implies, the gateway handles translation to and interworking with other protocols. The multipoint control unit is what we usually call the MCU in video conferencing. The very name MCU comes from H.323, which shows the historical standing of H.323 in this field.

H.323 protocol process

H.323 defines a family of protocols, the core of which are H.225 and H.245. Let's look at a typical H.323 call flow.

[Figure: typical H.323 call flow]

First come the H.225 RAS (Registration, Admission, and Status) messages between the terminal and the GK, which the terminal uses to register with the GK, to request call admission, and to report status. This is similar to the SIP flow, in which the UA sends REGISTER (and periodic registration refreshes) to the SIP registrar, which in practice is often the same box as the SIP Proxy. In commercial implementations, the SIP Proxy usually also enables Call Admission Control (CAC).
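
The RAS exchange described above can be pictured with a toy gatekeeper. This is only a minimal sketch, not a real H.323 stack: the class name `ToyGatekeeper`, its methods, the addresses, and the reject-reason strings are hypothetical and chosen for illustration. Only the message abbreviations (RRQ/RCF for registration, ARQ/ACF/ARJ for admission, DRQ/DCF for disengage) come from H.225 RAS.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGatekeeper:
    """Toy model of the gatekeeper side of H.225 RAS (illustrative only)."""

    total_bandwidth_kbps: int
    registrations: dict = field(default_factory=dict)  # alias -> signaling address
    used_kbps: int = 0

    def handle_rrq(self, alias, signaling_addr):
        """Registration Request: remember where the alias can be reached."""
        self.registrations[alias] = signaling_addr
        return ("RCF", alias)  # Registration Confirm

    def handle_arq(self, caller, callee, kbps):
        """Admission Request: check the caller, resolve the callee, check bandwidth."""
        if caller not in self.registrations:
            return ("ARJ", "callerNotRegistered")
        if callee not in self.registrations:
            return ("ARJ", "calledPartyNotRegistered")
        if self.used_kbps + kbps > self.total_bandwidth_kbps:
            return ("ARJ", "requestDenied")
        self.used_kbps += kbps
        # Admission Confirm carries the address the caller should send SETUP to.
        return ("ACF", self.registrations[callee])

    def handle_drq(self, kbps):
        """Disengage Request: give the bandwidth back when the call ends."""
        self.used_kbps = max(0, self.used_kbps - kbps)
        return ("DCF", None)
```

In this sketch a single object performs registration, address resolution, and bandwidth control, which mirrors the point that the GK bundles functions SIP spreads across several server roles.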

Once the terminal has registered and the GK has admitted the call, the H.225 call control procedure follows. SETUP carries the addresses of the calling and called parties, and the GK routes the call to the called party according to the called address. The called party replies with a CONNECT message that carries the transport-layer address (IP address + port number) of the H.245 control channel. Does it sound a bit like SIP INVITE and 200 OK? The difference is that the SIP INVITE/200 OK/ACK exchange not only routes the call but also completes media capability negotiation through the SDP carried in the message body. The H.225 procedure only routes the call and establishes the H.245 media control channel; media negotiation is left to H.245.

Description of media capabilities. SIP uses SDP to describe and negotiate media capabilities between the calling and called parties, while H.323 uses H.245. H.245 has three procedures: capability exchange, master-slave determination, and media channel establishment.

In capability exchange (Terminal Capability Set), each H.323 terminal describes its media capabilities and their priorities in a capability table. Interchangeable capabilities are grouped into alternative capability sets, and the combinations that can be processed at the same time are listed as simultaneous capabilities; for example, when a certain video codec is in use, only certain audio codecs may be used alongside it. Master-slave determination (Master Slave Determination) resolves conflicts between the two ends of the control channel, and in a multipoint video conference it decides which end acts as the MCU. After that comes media channel establishment (Open Logical Channel): the two parties can each open a one-way channel toward the other end, or open a two-way channel in a single step.

Does it sound a bit like SDP negotiation in SIP? Unlike SDP's offer/answer model, however, H.323 media negotiation is much more complicated, and of course more powerful. It negotiates not only individual audio and video capabilities but also which combinations can run together, and it can establish asymmetric channels as well as symmetric ones, that is, different codecs for sending and receiving. Once the media channels are established, media can flow, and as with SIP the media streams are RTP streams.

In addition, H.245 has two useful commands, Flow Control and Fast Update. When the receiver detects packet loss, it can use them to ask the sender to lower its bit rate and to send a fresh full picture (a key frame) so that decoding can recover.
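
To make the capability exchange and master-slave determination more concrete, here is a minimal sketch under heavy simplifying assumptions. The dictionaries are a hypothetical stand-in for a real TerminalCapabilitySet (which is ASN.1 encoded and far richer), the codec lists and terminal type numbers are illustrative, and the tie-break in `master_slave` is simplified: the actual H.245 procedure compares random status determination numbers with a more detailed rule.

```python
# Hypothetical, simplified capability sets. Each entry in "simultaneous" is a
# combination the terminal can run at once; inside it, each media type lists
# alternative codecs in order of preference.
CAPS_A = {
    "simultaneous": [
        {"video": ["H.264", "H.263"], "audio": ["G.722", "G.711"]},
        {"video": ["H.265"], "audio": ["G.711"]},  # heavy video, light audio only
    ]
}
CAPS_B = {
    "simultaneous": [
        {"video": ["H.263", "H.264"], "audio": ["G.711"]},
    ]
}


def pick_send_codecs(sender, receiver):
    """Pick one codec per media type for the sender -> receiver direction.

    Returns the first sender combination for which the receiver supports a
    codec of every media type within a single simultaneous combination.
    """
    for s_combo in sender["simultaneous"]:
        for r_combo in receiver["simultaneous"]:
            choice = {}
            for media, alternatives in s_combo.items():
                usable = [c for c in alternatives if c in r_combo.get(media, [])]
                if not usable:
                    break  # this pair of combinations cannot work together
                choice[media] = usable[0]
            else:
                return choice
    return None


def master_slave(local_type, local_number, remote_type, remote_number):
    """Simplified master-slave decision: higher terminal type wins, the
    random number breaks ties, and equal numbers force a retry."""
    if local_type != remote_type:
        return "master" if local_type > remote_type else "slave"
    if local_number == remote_number:
        return "indeterminate"
    return "master" if local_number > remote_number else "slave"
```

With these sample sets, `pick_send_codecs(CAPS_A, CAPS_B)` selects H.264 video while `pick_send_codecs(CAPS_B, CAPS_A)` selects H.263, which is the asymmetric case: each direction is chosen independently and may end up with a different codec.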

2. Comparison of H.323 and SIP protocols

SIP (Session Initiation Protocol) is an IP telephony signaling protocol proposed by the IETF (Internet Engineering Task Force). As its name implies, SIP is used to initiate sessions. It controls the establishment and termination of multimedia sessions with multiple participants, and it can dynamically adjust session attributes such as bandwidth requirements, the media types carried (voice, video, data, and so on), media codec formats, and multicast or unicast support. It is mainly used on the Internet.

The H.323 standard is formulated by the ITU (International Telecommunication Union) and is a standard communication protocol. The inheritance from the traditional PSTN shows everywhere in H.323; for example, addressing uses the E.164 number format. When the terminal sends the called number, H.323 supports not only en bloc sending (as with a SIP UA, the terminal collects the whole number and sends it at once) but also overlap sending, in which digits are sent while they are still being collected. That is also a typical feature of the traditional PSTN.
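
The difference between the two sending modes can be sketched as follows. This is a simplified illustration: the function names are hypothetical, acknowledgements such as SETUP ACKNOWLEDGE are omitted, and the extra digits are shown in INFORMATION messages as in Q.931, on which H.225 call signaling is based.

```python
def en_bloc(number):
    """Collect the whole number first, then send it in a single SETUP."""
    return [("SETUP", number)]


def overlap(number, first_digits=1):
    """Send SETUP early, then each later digit in its own INFORMATION message."""
    messages = [("SETUP", number[:first_digits])]
    messages += [("INFORMATION", digit) for digit in number[first_digits:]]
    return messages


def collected_number(messages):
    """The number the receiving side has accumulated once all messages arrive."""
    return "".join(digits for _, digits in messages)
```

Both modes deliver the same number in the end; overlap sending simply lets the network start working on the call before the user has finished dialing.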

To this day, H.323 remains the preferred protocol for interworking between video conferencing systems from different manufacturers, because it is strictly defined and interoperates well across vendors. The flip side is poor extensibility, and few manufacturers extend the H.323 protocol. There are a few exceptions, but everyone in the industry knows who they are, so I won't name them here.

In terms of encoding, H.323 uses ASN.1 binary encoding, which is efficient and produces short messages but is hard to read. SIP uses plain-text encoding, which is highly extensible and very friendly to developers and users. Of course, you could say that packet capture software such as Wireshark can decode both H.323 and SIP. But if you pull a raw tcpdump capture straight off the server, the SIP text can be read directly, while the H.323 messages are unintelligible. At the transport layer, H.323 carries RAS messages over UDP and the other H.225 and H.245 messages over TCP, while SIP can use UDP or TCP.
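
As a small illustration of how readable SIP is, the sketch below parses a trimmed INVITE with a few lines of ordinary string handling. The message is a hypothetical example (addresses, tags, and Call-ID are made up, and several headers a real INVITE would carry are left out), and `parse_sip` is a naive helper for this article, not a conforming SIP parser.

```python
# A trimmed, hypothetical SIP INVITE with an SDP body.
SIP_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
    "m=video 51372 RTP/AVP 96\r\n"
    "a=rtpmap:96 H264/90000\r\n"
)


def parse_sip(raw):
    """Split a SIP request into method, URI, headers, and offered media types."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, _version = lines[0].split(" ")
    headers = dict(line.split(": ", 1) for line in lines[1:])
    # Each SDP "m=" line starts one media description: m=<media> <port> ...
    media = [line[2:].split(" ")[0] for line in body.split("\r\n")
             if line.startswith("m=")]
    return method, uri, headers, media
```

The same message also shows the earlier point about INVITE: call routing information (the request line and headers) and the media offer (the SDP body) travel together in one text message.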

Regarding media negotiation, H.323 is indeed more powerful and complete than SIP. The main reason is historical: the CPUs and GPUs in video terminals were far less powerful than they are today, high-performance codecs consumed a lot of terminal CPU, and capabilities differed greatly from one terminal to another. Media negotiation was therefore critical, and the thorough negotiation capability that H.245 provides was very valuable at the time. Nowadays, with the exponential leap in chip computing power, even the most entry-level video terminals support the current mainstream audio and video codecs, and media negotiation has become much simpler.

3. Summary

Things always move toward integration. To simplify the protocol flow and reduce the complexity and delay of call setup, H.323 introduced the Fast Start mechanism, which merges the call control procedure and media channel establishment, much as SIP INVITE and 200 OK carry SDP to complete media negotiation and establish the media channel at the same time. SIP's extensibility lets it shine in many areas of communication: the national standard GB28181 is a video surveillance protocol built on SIP, and widely used services such as FaceTime and VoLTE were also developed on the basis of SIP. The future of converged conferencing must therefore be a complete video communication solution based on standard communication protocols.
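
The Fast Start idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the dictionaries stand in for the real ASN.1 messages, the function names and the proposal fields (`media`, `direction`, `codec`) are hypothetical, and the selection policy (first acceptable proposal per media type and direction) is just one plausible choice.

```python
def build_setup(called, proposals):
    """Caller: put proposed logical channels straight into SETUP."""
    return {"msg": "SETUP", "called": called, "fastStart": list(proposals)}


def answer_setup(setup, supported_codecs, h245_address):
    """Callee: accept at most one proposal per (media, direction) pair."""
    accepted, seen = [], set()
    for proposal in setup.get("fastStart", []):
        key = (proposal["media"], proposal["direction"])
        if key not in seen and proposal["codec"] in supported_codecs:
            seen.add(key)
            accepted.append(proposal)
    if accepted:
        # Media channels are agreed without a separate H.245 exchange.
        return {"msg": "CONNECT", "fastStart": accepted}
    # Nothing acceptable: fall back to the separate H.245 control channel.
    return {"msg": "CONNECT", "h245Address": h245_address}
```

Seen this way, the proposals in SETUP play a role similar to the SDP offer in INVITE, and the accepted subset in CONNECT plays the role of the SDP answer in 200 OK.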

Origin blog.csdn.net/anyRTC/article/details/130880759