In-depth analysis of HLS protocol

Table of contents

  • What does HLS mean?
  • Learn how HLS works
  • Component Details
  • Features provided by HLS
  • Disadvantages of HLS

What does HLS mean?

HLS (HTTP Live Streaming) is a live streaming protocol that uses widely deployed HTTP technology to deliver real-time video and audio to large audiences.

Originally developed by Apple in 2009, it has found widespread adoption in devices ranging from desktops and mobile phones to smart TVs. The iPhone, with its significant mobile market share and default HLS support, was an excellent booster for its initial adoption.


Learn how HLS works

Let's understand how HLS works at a high level. From a client (any web player) perspective, these are the steps it takes to understand and play media:

  • The client makes a GET request to the server to get the master playlist.
  • The client selects the best bitrate from the list based on the user's network conditions.
  • Next, it makes a GET request to get the playlist corresponding to that bitrate.
  • It parses that playlist, fetches the individual media segments, and starts playing them.
  • It loops through the last two steps.

That is a rough outline of how clients use HLS. Now let's look at the actual contents of these playlists and segments. To generate this data, I'm going to use the media world's swiss army knife: ffmpeg.

Let's create a sample video with a simple command:

ffmpeg -f lavfi -i testsrc -t 30 -pix_fmt yuv420p testsrc.mp4

This creates a sample video with a duration of 30 seconds.

Now we create an HLS playlist with the following command:

ffmpeg -i testsrc.mp4 -c:v copy -c:a copy -f hls -hls_segment_filename data%02d.ts index.m3u8

This will create a playlist in the same directory with the files:

  • index.m3u8
  • data00.ts
  • data01.ts
  • data02.ts

Don't know what they are? Let's take a look at what this command does and what these files mean.

We start by breaking down the command.

We use -i to tell ffmpeg to take the MP4 video as input. Note that this can also accept RTMP input directly, with a command like this:

ffmpeg -listen 1 -i rtmp://127.0.0.1:1938/live -c:v copy -c:a copy -f hls -hls_segment_filename data%02d.ts index.m3u8

This is another exciting aspect that we will discuss further in this article.

Next, -c:v copy and -c:a copy. These copy the already-encoded audio/video data from the input to the output without re-encoding. You could also re-encode into a different format here, for example to produce multiple qualities for an HLS playlist.

-f tells ffmpeg what our output format is, in this case HLS.

-hls_segment_filename tells ffmpeg what format we want segment filenames to be in.

The last argument is the name of our playlist.

Now let's look at the generated files.

Open the playlist, index.m3u8, in an editor:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.000000,
data00.ts
#EXTINF:10.000000,
data01.ts
#EXTINF:10.000000,
data02.ts
#EXT-X-ENDLIST

Let's understand these tags:

  • EXTM3U: M3U (MP3 URL, or in full: Moving Picture Experts Group Audio Layer 3 Uniform Resource Locator) is the name of the multimedia playlist format. This line indicates that we are using the extended M3U format. According to the HLS specification: "It MUST be the first line of every Media Playlist and every Master Playlist."
  • EXT-X-VERSION: This tag indicates the compatibility version of our playlist file. The playlist format has been iterated more than once, and older clients don't understand newer versions, so servers must include this tag in any playlist whose format version is greater than 1. If it occurs more than once, the client rejects the entire playlist.
  • These two tags are specified under the spec's Basic Tags section.
  • We will now move on to what are called Media Playlist Tags. The first of these is EXT-X-TARGETDURATION. This is a required tag that tells the client the longest segment duration it can expect, in seconds.
  • EXT-X-MEDIA-SEQUENCE: This tag gives the sequence number of the first media segment. It is optional and defaults to 0 if absent. Its only constraint is that it must appear before the first media segment.
  • EXTINF: This tag belongs to the Media Segment Tags and gives the duration and metadata of each media segment. According to the specification, its format is: <duration>,[<title>]. In any format above version 3, the duration must be specified as a floating-point number. The title is mostly metadata for humans and is optional; ffmpeg seems to skip it by default.
  • The line following each EXTINF specifies where to find that media segment. In our case they are in the same folder, so they are listed directly, but they could also live in a subfolder, say 480p, in which case they would be named like 480p/data01.ts. Note that this data01 naming is the template we gave in the command above and can easily be changed.
  • EXT-X-ENDLIST: I guess this is the hardest tag to understand? There is absolutely no way to tell what it does by looking at it. Jokes aside, one interesting property, besides indicating that the list is over, is that it can appear anywhere in the playlist file, so if you build these files by hand you don't need to agonize over its exact position.
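
To make these tags concrete, here is a minimal Python sketch that parses the media playlist above. It handles only the tags just discussed and is nowhere near spec-complete.

```python
def parse_media_playlist(text):
    """Pull out (duration, uri) pairs and a few header tags from an
    extended-M3U media playlist. Simplified: most tags are ignored."""
    info = {"target_duration": None, "media_sequence": 0, "ended": False}
    segments = []
    pending = None  # duration announced by the preceding #EXTINF
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-TARGETDURATION:"):
            info["target_duration"] = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            info["media_sequence"] = int(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            # format is <duration>,[<title>]
            pending = float(line.split(":", 1)[1].split(",")[0])
        elif line == "#EXT-X-ENDLIST":
            info["ended"] = True
        elif line and not line.startswith("#"):
            segments.append((pending, line))  # URI line after #EXTINF
            pending = None
    return info, segments

playlist = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.000000,
data00.ts
#EXTINF:10.000000,
data01.ts
#EXTINF:10.000000,
data02.ts
#EXT-X-ENDLIST"""

info, segments = parse_media_playlist(playlist)
```

Running this on the playlist above yields a target duration of 10, three 10-second segments, and ended=True.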

Astute readers may have noticed that there is only a single bitrate listed here. That's deliberate: I wanted to start by explaining the basic structure with a simple example.

To understand ABR, let's use an already hosted URL, for example: https://demo.unified-streaming.com/k8s/features/stable/video/tears-of-steel/tears-of-steel.ism/.m3u8

When downloading a playlist, it looks like this:

#EXTM3U
#EXT-X-VERSION:1
## Created with Unified Streaming Platform (version=1.11.20-26889)

# variants
#EXT-X-STREAM-INF:BANDWIDTH=493000,CODECS="mp4a.40.2,avc1.66.30",RESOLUTION=224x100,FRAME-RATE=24
tears-of-steel-audio_eng=64008-video_eng=401000.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=932000,CODECS="mp4a.40.2,avc1.66.30",RESOLUTION=448x200,FRAME-RATE=24
tears-of-steel-audio_eng=128002-video_eng=751000.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1197000,CODECS="mp4a.40.2,avc1.77.31",RESOLUTION=784x350,FRAME-RATE=24
tears-of-steel-audio_eng=128002-video_eng=1001000.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1727000,CODECS="mp4a.40.2,avc1.100.40",RESOLUTION=1680x750,FRAME-RATE=24,VIDEO-RANGE=SDR
tears-of-steel-audio_eng=128002-video_eng=1501000.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2468000,CODECS="mp4a.40.2,avc1.100.40",RESOLUTION=1680x750,FRAME-RATE=24,VIDEO-RANGE=SDR
tears-of-steel-audio_eng=128002-video_eng=2200000.m3u8

# variants
#EXT-X-STREAM-INF:BANDWIDTH=68000,CODECS="mp4a.40.2"
tears-of-steel-audio_eng=64008.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=136000,CODECS="mp4a.40.2"
tears-of-steel-audio_eng=128002.m3u8

Here we see a new tag, EXT-X-STREAM-INF. This tag provides information about each variant available for the stream and is followed by any number of attributes.


  • BANDWIDTH: This is a mandatory attribute whose value tells us how many bits per second the variant requires.
  • CODECS: This attribute specifies the codecs used for video and audio respectively.
  • RESOLUTION: This is an optional attribute giving the pixel resolution of the variant.
  • VIDEO-RANGE: We can ignore this attribute, as it is not specified in the spec, although at the time of writing there is an open issue on hls.js to add support for it.
  • SUBTITLES: An optional attribute, specified as a quoted string. Its value must match the GROUP-ID attribute of an EXT-X-MEDIA tag whose TYPE attribute is SUBTITLES.
  • CLOSED-CAPTIONS: An optional attribute, much like SUBTITLES. It differs on two points: its value can be either a quoted string or NONE, and the matching EXT-X-MEDIA tag must have TYPE CLOSED-CAPTIONS.

The line following each EXT-X-STREAM-INF tag specifies where to find the corresponding variant stream, which is itself a playlist similar to what we saw before.
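
As a sketch of how a player might consume these attributes, the snippet below splits an attribute list and selects a variant by BANDWIDTH. The regex-based splitting is a simplification of the spec's attribute-list grammar, and the bandwidth threshold is an assumed input.

```python
import re

# Split an attribute list like BANDWIDTH=493000,CODECS="mp4a.40.2,avc1.66.30"
# without breaking on the comma inside the quoted CODECS value.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')

def parse_stream_inf(tag):
    """Parse the attributes of one #EXT-X-STREAM-INF tag into a dict."""
    attr_list = tag.split(":", 1)[1]
    return {k: v.strip('"') for k, v in ATTR_RE.findall(attr_list)}

def pick_variant(master_text, available_bps):
    """Return the URI of the highest-BANDWIDTH variant that fits."""
    lines = master_text.splitlines()
    best_uri, best_bw = None, -1
    for tag, uri in zip(lines, lines[1:]):
        if tag.startswith("#EXT-X-STREAM-INF:"):
            bw = int(parse_stream_inf(tag)["BANDWIDTH"])
            if best_bw < bw <= available_bps:
                best_uri, best_bw = uri, bw
    return best_uri

master = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=493000,CODECS="mp4a.40.2,avc1.66.30",RESOLUTION=224x100,FRAME-RATE=24
tears-of-steel-audio_eng=64008-video_eng=401000.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=932000,CODECS="mp4a.40.2,avc1.66.30",RESOLUTION=448x200,FRAME-RATE=24
tears-of-steel-audio_eng=128002-video_eng=751000.m3u8"""

chosen = pick_variant(master, available_bps=1_000_000)
```

With a 1 Mbit/s budget, the 932 kbit/s variant is chosen over the 493 kbit/s one.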

RTMP input

Earlier, we saw an RTMP input to the ffmpeg command. It helps us understand another property of live HLS playlists. Revisiting the command, it looks like this:

ffmpeg -listen 1 -i rtmp://127.0.0.1:1938/live -c:v copy -c:a copy -f hls -hls_segment_filename data%02d.ts index.m3u8

Running this, the command will appear to be stuck: ffmpeg is listening and waiting for an RTMP connection.

Now, let us fulfill your dream of becoming a streamer! We could use any RTMP source, even ffmpeg itself, but for our demo we'll use OBS to send data to our ffmpeg server listening for RTMP input. Create a simple screen-capture input in OBS, then navigate to the streaming section in Settings.

Set the service to Custom and the server to rtmp://127.0.0.1:1938/live with an empty stream key. Now click Apply and press OK.

This brings us back to the default OBS view.

When we click Start Streaming, we stream to our local RTMP-to-HLS server. You will notice new files appearing in the folder where you ran the ffmpeg command:

  • index.m3u8
  • data files with template data{0-9}{0-9}.ts

Let's check index.m3u8, the result I captured is:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:3
#EXTINF:4.167000,
data03.ts
#EXTINF:4.167000,
data04.ts
#EXTINF:4.166000,
data05.ts
#EXTINF:4.167000,
data06.ts
#EXTINF:4.167000,
data07.ts

Notice that there is no closing playlist tag? That's because this is a live playlist that keeps changing over time as new segments arrive. For example, after a while, my local playlist looks like this:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:9
#EXTINF:4.167000,
data09.ts
#EXTINF:4.167000,
data10.ts
#EXTINF:4.166000,
data11.ts
#EXTINF:4.167000,
data12.ts
#EXTINF:0.640667,
data13.ts
#EXT-X-ENDLIST

Old segments are removed from the playlist as new ones are appended. This keeps clients pointed at the most up-to-date data while an event is live.
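
This sliding-window behaviour can be simulated with a small Python sketch; the window size of 5 and the data%02d.ts naming are assumptions modelled on the captured output above.

```python
def render_live_playlist(segments, window=5, target=4):
    """Render a live media playlist containing only the newest
    `window` segments. EXT-X-MEDIA-SEQUENCE counts how many have
    already rolled off; there is no EXT-X-ENDLIST while live."""
    first = max(0, len(segments) - window)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first}",
    ]
    for duration, uri in segments[first:]:
        lines.append(f"#EXTINF:{duration:.6f},")
        lines.append(uri)
    return "\n".join(lines)

# Simulate a stream that has produced 8 segments so far.
segs = [(4.167, f"data{i:02d}.ts") for i in range(8)]
print(render_live_playlist(segs))
```

With 8 segments and a window of 5, the rendered playlist starts at media sequence 3 and lists data03.ts through data07.ts, matching the shape of the capture above.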

This covers playlists to an extent. Now let's talk about fragment files.

Container format

Media segments use a container format to store the actual encoded video/audio data. A detailed discussion of container formats is beyond the scope of this article.

HLS originally supported only the MPEG-2 TS container. This set it apart from other HTTP-based streaming protocols; DASH, for example, has always used fMP4 (Fragmented MP4). Apple eventually announced fMP4 support in HLS, and it is now part of the official specification.

Media segment size

The specification does not fix a media segment size. Typically segments are around 10 seconds, but another part of the specification effectively constrains the choice: a client must have at least three segments fetched and ready before playback can start. That means with 10-second segments, a 30-second buffer must be loaded before the client starts playing. The segment size can be changed as needed, and it is a common tuning parameter when reducing latency.
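
The three-segment rule makes the rough startup latency simple arithmetic:

```python
def startup_buffer_seconds(segment_duration, min_segments=3):
    """Rough startup latency implied by the three-segment rule."""
    return segment_duration * min_segments

print(startup_buffer_seconds(10))  # 10 s segments -> 30 s buffer
print(startup_buffer_seconds(2))   # 2 s segments  ->  6 s buffer
```

This is why shrinking the segment size is the first lever people reach for when chasing lower latency.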

Component Details

  • Ingest: Ingest marks the point at which our system receives input from the user. This can happen in a variety of formats over different protocols, but platforms like Twitch and YouTube have popularized RTMP. This seems to be changing with the introduction of WebRTC support in OBS, making it possible to use WHIP to ingest content with lower latency.
  • Processing: Media processing includes transcoding, demultiplexing, etc., depending on the ingest format used. Input here will be converted to MPEG-TS or fMP4 segments indexed in a playlist and ready to use.
  • Delivery: Delivery is an important aspect of ensuring the scalability of HLS. Web servers host playlists and clips, which are then delivered to end users via a CDN. The CDN caches these GET requests and scales itself to return static data, making the protocol highly scalable when requested under higher traffic conditions.
  • Playback: Smooth playback is critical to a good user experience. ABR must be implemented well on the player side so that users experience minimal degradation while streaming.

Features provided by HLS

  • Distribution via a common format: delivering over HTTP makes HLS easily accessible anywhere. Using HTTP, HLS gets the underlying TCP implementation's retransmission handling out of the box, and since so much of the internet runs on HTTP, it works automatically across consumer devices. This is harder for UDP-based protocols such as RTP, which sometimes require things like NAT traversal to work properly.
  • Easy CDN-level caching: scaling HLS is easy because HTTP caching is widely supported. Offloading it to a good CDN minimizes the load on origin servers while ensuring the same audio + video data is delivered to all consumers.
  • Adaptive Bitrate Streaming (ABR): HLS supports ABR, so users with poor internet connections can quickly switch to a lower bitrate and continue to enjoy the stream.
  • Built-in ad insertion: HLS provides easy-to-use tags to dynamically insert ads into the stream.
  • Closed captioning: HLS also supports embedded closed captions and even DRM-protected content.
  • Fragmented MP4: using fMP4 helps reduce encoding and delivery costs and improves compatibility with MPEG-DASH.

Disadvantages of HLS

  • HLS is known for its high latency. An easy way to make it slightly better is to fine-tune the segment size and use a lower-latency ingest protocol such as RTMP, but for a truly live experience, plain HLS is not enough. Two solutions exist: the community's LHLS and Apple's LL-HLS. Each solves the problem in its own way, and both promise decent latencies.
  • The HLS specification requires a client to load at least three segments before starting playback. If you set the segment size to 10 seconds, the stream starts with a 30-second delay. This makes tuning the segment size very important.

Final thoughts

Having open standards aids adoption and growth while making interoperability between open and closed software/hardware stacks easier. HLS plays exactly that role in live streaming: it's widely adopted, simple, and powerful. Rather than reinventing the wheel from scratch, it builds on top of an already widely accepted protocol.

This article is reprinted; the copyright belongs to the original author, Fenil Jain. If you reprint it, please credit the source: https://www.nxrte.com/jishu/yinshipin/28911.html

