Nginx-based media server technology

 

The open source streaming media server nginx-rtmp-module, which is widely used in China, has long suffered from limited functionality and the difficulty of deploying it as a cluster. In this LiveVideoStack online sharing session, Zhu Jianping, development engineer on the PingOS open source project team and RTC R&D engineer at UCloud, explained in detail how the PingOS streaming media server, built on nginx-rtmp-module, implements http-flv, http-ts, hls+, multi-process operation, relay push, back-to-source (origin pull), and clustered deployment.

Text / Zhu Jianping

Edited by / LiveVideoStack

Live playback

https://www2.tutormeetplus.com/v2/render/playback?mode=playback&token=006643cdea15499d96f19ab676924e88

 

1. Nginx streaming extensions: http-flv, http-ts, hls+

 

The original nginx-rtmp-module follows essentially the same model as most streaming servers, including SRS: one producer and n consumers per stream. The problem with nginx-rtmp-module is that it only implements this consumption model for RTMP, which makes it relatively hard to extend to http-flv or http-ts. Because an rtmp-session is used only by the RTMP protocol, extending to http-flv first requires understanding the basic distribution model (shown in the figure above): all producers and consumers are attached to the same stream, where the producer receives data from the network and each consumer reads data from the buffer and sends it out.
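
For reference, this distribution model sits behind an ordinary nginx-rtmp configuration. The sketch below is a minimal example of such a setup; the port and application name are arbitrary placeholders, and the directives are standard nginx-rtmp-module ones.

rtmp {
    server {
        listen 1935;            # default RTMP port
        chunk_size 4096;

        application live {
            live on;            # the 1-producer / n-consumer live model
        }
    }
}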

 

To deliver flv data, the original rtmp-session can be reused. When the server receives an HTTP request, it creates an rtmp-session; this session is not tied to the network, it is purely a logical session. The session is then injected into the stream. If it is injected as a consumer, it can obtain data from the stream and distribute it.

So when the server receives an http-flv request, it can create such a logical session and inject it into the stream. In principle what it obtains is rtmp data, but what we need is flv data. Since the two formats are very similar, the rtmp data can easily be restored to flv data by adding the flv tag-header.

Following this idea, within the producer/consumer model a consumer can reuse the existing distribution process, and the http-flv protocol can be implemented by creating an http-fake-session. Extending this further, an http-fake-session can also be created as a producer and associated with an http client. Once associated, the http client downloads data from the remote server and hands it to the producer, and the producer distributes that data to the downstream rtmp-sessions through the normal distribution model. This indirectly implements an http back-to-source function. With this approach, both playback and stream ingest over http-flv can be implemented quickly.
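
On the configuration side, http-flv playback typically surfaces as an http location that bridges the request to the rtmp application through such a fake session. The sketch below is only an illustration of that shape: the flv_live directive and its parameters are assumptions, not the exact PingOS syntax, so the real directive names should be taken from the project documentation.

http {
    server {
        listen 80;

        location /flv {
            # Hypothetical directive: attach this HTTP request to the rtmp
            # application "live" on port 1935 via an http-fake-session.
            flv_live 1935 app=live;
        }
    }
}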

In the same way, we can continue extending other protocols along these lines. After receiving an http request, we create a similar rtmp-fake-session (a logical session, not tied to the network) and insert it into the stream as a consumer, so that the data to be distributed downstream can be obtained from the stream. Note, however, that what the stream initially holds is rtmp data, not ts data, so ts data cannot be obtained from it directly.

1.1 Implementation of http-flv in Nginx

 

To implement http-flv on top of Nginx, the following details deserve attention. First, the implementation reuses Nginx's distribution model and its http function modules (Nginx has a fairly complete http protocol stack, including HTTP/1.0 and HTTP/1.1).

In some online businesses, customers may need to append a suffix to the http-flv download URL. With the earlier approach, we had to filter the suffix in code, and more complex requirements, such as toggling whether http chunked transfer encoding is enabled, could only be handled by modifying code as well. If http-flv is implemented on Nginx by reusing the existing http modules, these operations can be handled through the nginx http rewrite functionality. Using nginx's native http features to build http-flv therefore brings clear benefits, such as a significantly smaller amount of code.
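
For example, a suffix can be stripped with nginx's standard rewrite directive before the request reaches the flv handler. In the sketch below, only rewrite and chunked_transfer_encoding are core nginx directives; flv_live remains the placeholder assumption used above.

location /flv {
    # Strip a trailing ".flv" so /flv/room1.flv and /flv/room1 map to the same stream.
    rewrite ^(.+)\.flv$ $1 last;

    chunked_transfer_encoding on;   # toggled in config instead of in C code

    flv_live 1935 app=live;         # hypothetical handler directive (see above)
}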

I have also seen a setup where the http module is reused but the rtmp distribution process is not. This forces http-flv to reimplement the distribution process, and business control becomes very complicated. For example, when someone requests playback, the business server needs to be notified. If rtmp and http-flv are implemented separately, then each of them has to report to the business server on its own, which doubles the code and logic to maintain and noticeably increases development and maintenance costs.

The simplest solution, therefore, is for flv to carry no business-specific processing at all and to perform only a format conversion at distribution time: rtmp delivery sends the data in rtmp format as before, while flv delivery simply prepends the flv tag-header to the same rtmp data before sending it. This saves any extra development at the business layer.

http-flv playback implementation

 

The figure shows how the rtmp cache supports both the rtmp and http-flv protocols: the two share a single cache. In fact rtmp itself already carries flv data, just with the tag-header stripped off. The only difference between http-flv delivery and rtmp delivery is the send function: http-flv calls the http send function and prepends the flv tag-header, while rtmp delivery calls the native send function with the rtmp protocol header. Sharing one block of memory between the two saves memory, unifies the business logic, and reduces development cost.

Implementation of http-flv back-to-source

 

The figure shows the implementation of http-flv back-to-source in nginx. The idea is similar to http-flv playback: when back-to-source is needed, an http client is created whose job is to download the http data locally. Before doing so, the http client creates an rtmp fake session and injects it into the stream as a producer. The http client then starts downloading data from the network and splits the downloaded flv data into rtmp data.

Why split it into rtmp data? Because the cache used for rtmp pushed streams holds rtmp-format data, the flv data downloaded from the network must first be split into rtmp data and then placed into the cache. Afterwards the data is converted to rtmp or flv format as actually required. Combined with the rtmp fake session logic used for http-flv playback, this makes the http-flv back-to-source operation quick to implement.

1.2 Implementation of http-ts in Nginx

 

The figure shows the implementation of http-ts in Nginx. The idea is basically the same as for http-flv, but the operation differs: http-ts requires an independent buffer for caching. The data formats of http-ts and http-flv differ considerably. Converting between rtmp and flv only requires splitting the data into small blocks and prepending a header to each; even if the last frame or block is short, no padding is needed.

TS data is different and much stricter: every packet must be exactly 188 bytes, consisting of the ts header and the payload, and if a data block is smaller than 188 bytes it must be padded. rtmp data blocks have no such strict length requirement. Converting flv data into ts data therefore costs some extra computation.

Because the ts format differs so much from flv, the ts buffer is kept completely separate from the rtmp buffer. This is not enabled by default and must be turned on in the server configuration. Once enabled, a mirrored copy of ts data is generated from the rtmp buffer, and this ts data is used only by the http-ts and hls protocols. The server also contains the native hls service; we did not change it, but added the hls+ service, which uses this same buffer.

Both http-ts and hls+ register their own fake sessions, again for the sake of unified business handling. For example, when a playback request arrives, the business server needs to be informed that the request exists. For network and event notification interfaces like this, everyone wants to write the business logic once, rather than writing one notification path for hls playback, another for ts, and so on; otherwise the business code keeps growing until the service becomes nearly unmaintainable. The fake session therefore plays a very large role: it completely isolates the network layer from the business layer. Even if the server's own delivery protocol is not rtmp, an rtmp-session can still be created and attached for the business server.

In general, the only difference between http-ts and http-flv is where the data is taken from: http-flv reads from the rtmp buffer, while http-ts reads from the ts buffer.

Once you understand the http-flv protocol flow, the http-ts implementation is not hard to follow.

1.3 Implementation of hls+ in Nginx

 

The figure shows the implementation of hls+ in nginx. hls+ differs from traditional hls: traditional hls keeps no state on the server side, which simply holds a large number of fragments that clients keep downloading, whereas hls+ records the state of each client.

As for how to record each client's state, I initially tried creating a dummy connection for each hls+ client to track it, but this made the business much more complicated and caused many problems later, including extra code, bugs, and maintenance cost. So we changed the approach and again implemented it with a fake session, inserted as a consumer, with each incoming http connection bound to it by session ID. Since the client does not know the session ID when it sends its first hls request, a connection that arrives without a session ID is treated as a first-time client. That client receives a 302 reply containing a new address with a session ID embedded. Once the client has the session ID, it includes it when requesting the m3u8 again, so the server can read the session ID and identify the client. In this way, the playback state of each client can be tracked through its session ID.

Why record this state? Mainly because the server does not write data to the hard disk but keeps it in memory. It has to know the download progress of each user and each client, and locate the ts data in memory according to that progress. hls+ and http-ts share one ts buffer, and hls+ locates the ts content in that buffer in real time. So for hls+ no real ts files are ever generated; only the offset of each file in memory is recorded, and hls+ therefore has no disk read/write problem at all. When running a traditional hls service you may have hit this bottleneck before: mechanical hard disks are slow to read and write, and the common workaround is to mount a virtual (memory-backed) disk and map the hls directory into memory. With hls+ the mount can be skipped, memory consumption stays modest, and if hls+ and http-ts are needed at the same time, memory utilization is very high.
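
For comparison, the traditional workaround mentioned above looks like the standard nginx-rtmp hls configuration below, with hls_path pointed at a memory-backed directory such as /dev/shm; hls+ makes this path juggling unnecessary because segments never materialize as files. The directives here are stock nginx-rtmp ones, and the paths and durations are placeholders.

application live {
    live on;

    # Classic file-based HLS (here written to a RAM-backed directory),
    # i.e. the approach that hls+ replaces.
    hls on;
    hls_path /dev/shm/hls;      # tmpfs-style directory so segment I/O hits memory
    hls_fragment 3s;
    hls_playlist_length 18s;
}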

2. Static push and pull streaming

 

Static push and pull streaming mainly exists to satisfy clustering requirements. If a single server cannot support the concurrency of the service, server scalability has to be considered; and if users are scattered across the country, the servers also need to be connected to one another. If the business is not too complicated, static push and pull streaming is a workable choice.

 

The static push/pull streaming configuration is shown in the figure above. Looking first at static pull: there is a target origin site. With static back-to-source, the destination address is written in the configuration file, and that origin address can be changed at will.
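
A static pull is configured directly against the fixed origin address. The sketch below uses nginx-rtmp's standard pull directive; the host name is a placeholder.

application live {
    live on;

    # Always pull missing streams from the fixed origin (static back-to-source).
    pull rtmp://origin-a.example.com/live;
}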

 

The figure shows a simple static push model: if the host's stream is pushed to origin server A, then we must guarantee that server A's address never changes.

 

In addition, a complete streaming media system needs both static pull and static push. If a viewer requests playback from server C, server C will pull the stream from server A regardless of whether server A actually has the stream. This model is therefore only suitable for relatively simple business scenarios.
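
Static push is the mirror image: every published stream is unconditionally relayed to the configured targets. Again this is the stock nginx-rtmp push directive, with placeholder addresses.

application live {
    live on;

    # Relay every incoming stream to origin A (and optionally further targets).
    push rtmp://origin-a.example.com/live;
    push rtmp://record-cluster.example.com/rec;
}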

3. Dynamic control: dynamic back-to-source, dynamic relay push, authentication

Compared with the "blind" pushing and pulling of the static approach, dynamic push and pull streaming fits the needs of most users better.

 

Nginx's RTMP service triggers each function at different stages. Take oclp_play as an example: when someone starts playing, a play message is triggered carrying a start parameter; during playback, play messages continue to be triggered, now carrying the update parameter; and when playback ends, a final play message is triggered with the down parameter. With these parameters we can notify the business server of the playback request and of the specific stage it is in.
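
A hedged sketch of what reporting these stages could look like in the configuration: oclp_play is the directive named in the text, but the exact parameter syntax (stage list, callback URL) is an assumption and should be checked against the PingOS documentation.

application live {
    live on;

    # Notify the business server when playback starts, while it continues,
    # and when it stops (the start / update / down stages described above).
    # Syntax is illustrative, not the verified PingOS form.
    oclp_play http://biz.example.com/api/on_play stage=start,update,down;
}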

 

3.1 Dynamic back-to-source

 

The push side has similar operations: publish is also divided into the same three stages. play and publish are mainly used for authentication; if the business server returns 404 or any other non-200 result during the start stage, the server interrupts the current play request, and the same applies to publish. pull and push, on the other hand, are mainly used for dynamic pulling. When the server receives a play request and finds that the target stream does not exist locally, that is, no publish stream is present, the start stage of pull is triggered. If, after the start request is sent, the business server returns 302 with a new rtmp or http-flv address in the Location header, the server pulls the rtmp or flv stream from that target server. This process is called dynamic pull.
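
Dynamic back-to-source then hangs off the pull notification: when the stream is absent locally, the start stage asks the business server, and a 302 reply whose Location carries a new rtmp or http-flv address redirects the pull. The oclp_pull directive name comes from the text; its parameter syntax below is an assumption.

application live {
    live on;

    # Ask the scheduler where to pull from when the requested stream is missing.
    # 302 + Location: rtmp://... or http://....flv  => pull from that server;
    # non-200 => abort the play request.
    oclp_pull http://biz.example.com/api/on_pull stage=start;
}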

3.2 Dynamic relay push

 

The counterpart of dynamic pull is dynamic push, which works in roughly the same way. When a stream is pushed to the server, the server sends a start request to the configured destination address. If the response contains a new rtmp address, the media server pushes the stream to that new address; this is the dynamic relay push operation.
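
Dynamic relay is symmetric and driven by oclp_push (directive name from the text, parameters assumed): on publish, the media server asks the business server, and a 302 carrying a new rtmp address makes it relay the stream there.

application live {
    live on;

    # On publish, ask where this stream should be forwarded.
    # 302 + new rtmp address => relay the stream there; non-200 => do not relay.
    # Syntax is illustrative only.
    oclp_push http://biz.example.com/api/on_push stage=start,down;
}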

All of this presupposes a 302 response. If you do not want the stream to be pushed out, return 400 or some other non-200 code to the server and the relay is interrupted. oclp_stream is rarely used; it is triggered only when a stream is created or disappears. As long as either a play or a publish still exists, the stream's life cycle is considered not yet over; only when both have disappeared is the stream considered ended. Likewise, if a stream has not been published yet but is requested for the first time, start is still triggered and the stream is considered created, just without a producer. This scenario has few applications, and only systems with demanding business requirements are likely to use this message.

 

The figure above shows a configuration example, which mainly covers the business server address to query and which stages of the play operation the server should report, and so on.

 

Clustered deployment relies on the business (scheduling) server. When back-to-source is needed, edge server B queries the business server during the oclp_pull stage. The business server replies to edge server B with a 302 address containing the source address, and edge server B then pulls the stream from the indicated media server A, achieving dynamic back-to-source.

 

Dynamic relay push is mainly about pushing local streams out. In CDN services, different clusters are responsible for different functions: for example, some clusters handle recording and some only transcoding. We want the core machines to forward the streams that need transcoding or recording to the corresponding clusters on demand, so dynamic relay is very important. If the business involves these different types of processing, the oclp_push configuration needs to be added to enable dynamic relay.

3.3 Authentication

 

In the authentication operation, only publish and play are authenticated.

 

If the response to play is 200, playback is allowed; if it is 403, playback is rejected. The same applies to publish. The business server thus controls whether a customer's service request is allowed.

How is the authentication token carried when the front end plays or publishes?

 

Mainly through variables: args=k=v&pargs=$pargs

When the play query is sent, if args=k=v&pargs=$pargs is configured, these parameters are included in the request, so all custom rtmp parameters can be passed through.
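
Putting the two together, a hedged authentication setup could look like the sketch below: the client appends a token to its rtmp/flv URL, and the args=k=v&pargs=$pargs setting quoted above forwards the custom parameters to the business server, which answers 200 to allow or 403 to reject. Everything beyond that literal args string (directive syntax, URLs, the token name) is an assumption.

application live {
    live on;

    # Forward custom URL parameters (e.g. a token) to the auth backend.
    # 200 => allow, 403 => reject, for both publish and play.
    oclp_publish http://biz.example.com/api/auth_publish stage=start args=k=v&pargs=$pargs;
    oclp_play    http://biz.example.com/api/auth_play    stage=start args=k=v&pargs=$pargs;
}

# Client side (illustrative URLs and token name):
#   push: rtmp://server/live/room1?token=abc123
#   play: http://server/flv/room1.flv?token=abc123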

4. Multi-process: inter-process back-to-source

 

Multi-process operation in native nginx-rtmp has many bugs. The current approach is to record the stream list in shared memory across processes; if the process handling a play has no local stream, it looks up the stream list and pulls the stream from the target process over a unix socket. Note that this inter-process back-to-source does not trigger the oclp_play, oclp_publish, or oclp_pull messages.
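
On the nginx side, enabling multiple processes is just the normal worker setting; the shared-memory stream lookup and unix-socket pull described above happen inside the module. For comparison, stock nginx-rtmp-module offers rtmp_auto_push, which republishes every stream to all workers over unix sockets, whereas the approach described here pulls on demand instead. All directives below are standard; the values are placeholders.

# Run several workers; cross-worker plays are resolved internally via the
# shared-memory stream list and a unix-socket pull.
worker_processes 4;

# For comparison only: the stock nginx-rtmp multi-worker relay mechanism.
# rtmp_auto_push on;
# rtmp_socket_dir /tmp;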

5. More operating instructions

PingOS:https://github.com/im-pingo/pingos
