Pitfall! Short videos on your page loading slowly and stuttering? An Alibaba P8 veteran teaches you two tricks to open short videos in seconds!

Author: Xianyu Technology - cloud o

Preface

With the rise of short video, it is everywhere in the major apps: feed streams, detail pages, and so on. Giving users a good viewing experience is becoming more and more important. In most feeds there is a noticeable wait when you swipe to a video, which is not a great experience. In response, we launched a round of optimization with two goals: videos start playing in under a second, and the playback experience is smooth. Seeing is believing: in the comparison picture below, the left side is before optimization and the right side is after:

Problem analysis

Video format selection

Before formally analyzing the problem, one thing needs stating up front: the videos on our homepage are all 320p, H.264-encoded MP4 files.

  • H.264 & H.265
    H.264, also known as MPEG-4 AVC (Advanced Video Coding), is a video compression standard and a widely used format for high-definition video recording, compression and distribution. It is well known as the codec standard for Blu-ray discs: every Blu-ray player must be able to decode H.264. Compared with earlier standards, H.264 introduced features such as multi-reference-frame motion compensation, variable-block-size motion compensation, and intra-frame predictive coding. With these features, H.264 delivers higher video quality at a lower bit rate than earlier coding standards.
    The coding architecture of H.265/HEVC is broadly similar to that of H.264/AVC, with the familiar modules: intra prediction, inter prediction, transform, quantization, deblocking filter, and entropy coding. In the HEVC architecture, however, coding is organized around three basic units: the coding unit (CU), the prediction unit (PU) and the transform unit (TU).
    Overall, H.265 offers higher compression efficiency, a lower transmission bit rate, and better video quality. Using H.265 may look like the obvious choice, but we chose H.264 here for one reason: H.264 is supported on a far wider range of devices.
    PS: Xianyu will launch H.265 video on the item detail page in the near future, so stay tuned!

  • TS & FLV & MP4

TS, in full MPEG2-TS, where TS stands for "Transport Stream", is a container format used by high-definition camcorders. The defining characteristic of MPEG2-TS is that any segment of the video stream can be decoded independently. The following command converts MP4 to TS; in our test the TS file (4.3 MB) came out about 10% larger than the MP4 file (3.9 MB).

```bash
ffmpeg -i input.mp4 -c copy output.ts
```

FLV is short for FLASH VIDEO, a streaming media format that appeared with the introduction of Flash MX. Its very small file size and very fast loading made watching video over the Internet practical, and it solved the problem of SWF files exported from Flash being too bulky to use well online. FLV supports only one audio stream and one video stream, and cannot carry multiple audio streams in one file; a 48 kHz audio sampling rate is not supported, nor is H.265 video encoding. With the same codec, the file size is almost the same as MP4.

```bash
ffmpeg -i input.mp4 -c copy output.flv
```

MP4, or MPEG-4 Part 14, is a well-known standard digital multimedia container format with the extension .mp4. It mainly stores digital audio and digital video, but can also hold subtitles and still images. Because the container supports streamable layouts, MP4 can be streamed during network transmission. Its compatibility is excellent: almost all mobile devices support it, and it plays in browsers and on desktop systems. Based on the characteristics of these container formats, our final choice was MP4.

Playback process

What does the playback process of a video look like on the client? Where does a slow start come from? Can the time-consuming steps be fixed quickly and at low cost? Understanding the playback process helps locate a breakthrough. From loading to playing, a video goes through three stages:

  • Read (IO): "get" the content, either from "local" storage or from a "server"
  • Parse: "understand" the content, with reference to its "format & protocol"
  • Render: "present" the content through the speakers and screen

Clearly, changing content acquisition from "server" to "local" saves a lot of time at very low cost, which makes it a good entry point. And so it proved: our optimization revolves around exactly this.

PS: The network library and player we use are internal to the group and already carry many optimizations. This article does not cover network protocol or player optimizations.

Technical solutions

Given the analysis above, what we need to do is cache part of the MP4 file in advance and play the local file when the feed scrolls to the video. Since the user may keep watching, once the local data has been played, playback must continue with data downloaded from the network. Two problems have to be solved:

  • How much data should be downloaded in advance
  • How to switch to network data once the cached data has been played

Location of the moov box

For the first question, we have to analyze the MP4 file structure to see how much data should be downloaded. An MP4 file is composed of many boxes, and boxes can be nested inside other boxes:

The MP4 format is not described in detail here, but one thing stands out: the moov box is essential for playback. It provides the width and height, duration, bit rate, encoding format, frame list, key-frame list, and so on. The player cannot start without the moov box, so the downloaded data must contain the moov box plus a few dozen frames.
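
For illustration, here is a minimal sketch of walking the top-level boxes of an MP4 file to see where moov (and mdat) sit. The 4-byte size / 4-byte type box header comes from the ISO BMFF spec; the class itself and its output format are just ours:

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Minimal sketch: walk the top-level boxes of an MP4 (ISO BMFF) file and
// print each box's type, offset and size, so we can see where moov sits.
public class MoovLocator {
    public static void main(String[] args) throws IOException {
        long offset = 0;
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            while (true) {
                long size = in.readInt() & 0xFFFFFFFFL;   // 4-byte box size
                byte[] type = new byte[4];
                in.readFully(type);                       // 4-character box type
                long headerLen = 8;
                if (size == 1) {                          // "largesize": 64-bit size follows
                    size = in.readLong();
                    headerLen = 16;
                }
                System.out.printf("%s @ offset %d, %d bytes%n",
                        new String(type, StandardCharsets.US_ASCII), offset, size);
                long toSkip = size - headerLen;           // skip over the box payload
                while (toSkip > 0) {
                    long skipped = in.skip(toSkip);
                    if (skipped <= 0) return;
                    toSkip -= skipped;
                }
                offset += size;
            }
        } catch (EOFException end) {
            // end of file: all top-level boxes have been printed
        }
    }
}
```

On a well-formed "fast start" file this prints ftyp and moov first and mdat last; the moov-at-the-tail case discussed later shows mdat before moov.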

A simple calculation: Xianyu short videos run up to about 30 s, the feed resolution is 320p, and the bit rate is 1141 kb/s. The ftyp + moov data comes to roughly 31 KB (opening the file shows that mdat starts at byte 31754). At roughly 30 fps, 10 frames is about 1/3 s of stream, so header information + 10 frames of data is approximately: 31 KB + (1141 kb/s × 1/3 s) ÷ 8 ≈ 31 KB + 48 KB ≈ 79 KB
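
The same estimate in code, for the record; the ~30 fps frame rate and the method name are our assumptions:

```java
// Rough preload-size estimate: container header plus ~10 frames of stream data.
public class PreloadEstimate {
    static long estimatePreloadBytes(long headerBytes, long bitrateKbps, int frames, int fps) {
        // kb/s -> bytes/s, then take frames/fps seconds' worth of stream
        long streamBytes = bitrateKbps * 1000 / 8 * frames / fps;
        return headerBytes + streamBytes;
    }

    public static void main(String[] args) {
        // 31754-byte ftyp+moov header, 1141 kb/s stream, 10 frames at ~30 fps
        System.out.println(estimatePreloadBytes(31754, 1141, 10, 30)); // 79295 ≈ 79 KB
    }
}
```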

Proxy

The second question: how to switch to network data once the cached data has been played? One option: when the local data finishes, hand the player a network address and tell it the offset already downloaded, so it continues downloading from the network. This seems feasible, but the player has to provide support: a callback when local data playback completes, plus setting a network url with an offset. The server also has to support the Range parameter, and a new network connection must be established at the switch, which can cause a stall.

In the end, we chose the proxy approach: a proxy acts as the intermediary, responsible for preloading data and feeding it to the player, with the switching logic handled inside the proxy. Before adding the proxy, the flow looks like this:

After adding the proxy, it looks like this:

The benefits are obvious: we can do many things inside the proxy, such as switching between locally cached data and network data, or even talking to the CDN over other protocols. Assume for now that preloading is already complete, and look at how the player interacts with the proxy. For playback, the player is given a localhost url provided by the proxy, so the proxy server receives the network request and returns the locally preloaded data to the player. The player is completely unaware of the proxy and preload modules; both the player and the preload module are clients of the proxy, and their calling logic is identical. The illustration is as follows:

The data-loading process, step by step (a minimal code sketch follows the list):

  • The client initiates an http request for data, as shown by arrow 1
  • If the requested data exists in the file cache, it is returned directly, as shown by arrow 2
  • If the local file cache does not hold enough data, a network request is made to the CDN, as shown by arrow 3
  • The network data received is written to the file cache, as shown by arrow 4
  • The requested data is returned to the client, as shown by arrow 2
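
To make the mechanism concrete, here is a minimal sketch of such a local proxy. It assumes one video url with a cache file the preloader has partially filled; the class names and the simplifications (no request parsing, one client at a time, no LRU eviction) are ours, not the production code:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URL;

// Minimal sketch of the proxy idea: a local HTTP server that serves cached
// bytes first, then continues from the CDN when the cache runs out.
public class LocalVideoProxy implements Runnable {
    private final ServerSocket server;
    private final String cdnUrl;   // upstream source of the video
    private final File cacheFile;  // bytes preloaded so far

    public LocalVideoProxy(String cdnUrl, File cacheFile) throws IOException {
        this.server = new ServerSocket(0); // pick any free local port
        this.cdnUrl = cdnUrl;
        this.cacheFile = cacheFile;
        new Thread(this).start();
    }

    // The player is handed this URL instead of the CDN URL.
    public String playbackUrl() {
        return "http://127.0.0.1:" + server.getLocalPort() + "/video";
    }

    @Override
    public void run() {
        while (!server.isClosed()) {
            try (Socket client = server.accept();
                 OutputStream out = client.getOutputStream()) {
                out.write("HTTP/1.1 200 OK\r\nContent-Type: video/mp4\r\n\r\n".getBytes("US-ASCII"));
                long served = serveCachedBytes(out);  // arrow 2: local cache first
                serveNetworkBytes(out, served);       // arrows 3/4: top up from the CDN
            } catch (IOException ignored) {
                // player closed the connection, or the network failed
            }
        }
    }

    private long serveCachedBytes(OutputStream out) throws IOException {
        long served = 0;
        try (InputStream in = new FileInputStream(cacheFile)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                served += n;
            }
        }
        return served;
    }

    private void serveNetworkBytes(OutputStream out, long offset) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(cdnUrl).openConnection();
        conn.setRequestProperty("Range", "bytes=" + offset + "-"); // continue where the cache ended
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                // a real proxy would also append these bytes to the cache file (arrow 4)
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A player would then be handed playbackUrl() instead of the CDN url, exactly as in the diagram above.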

Implementation modules

Preload module

With the technical approach settled, the preload module still involves a lot of work. Once the list's network data has been parsed, video preloading is triggered. First an md5 value is generated from the url, then we check whether a task for that md5 already exists; if so, it is not submitted again. Once created, a task is submitted to a thread pool and processed on a background thread. When the network switches from Wifi to 3G, tasks are cancelled so the user's mobile data is not consumed.
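
A sketch of that task bookkeeping; the class names and pool size are illustrative, not the actual Xianyu code:

```java
import java.security.MessageDigest;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the preload scheduler described above: one task per url, keyed
// by the url's md5 so duplicates are not resubmitted, executed on a
// background pool, and cancellable when Wifi is lost.
public class PreloadScheduler {
    private final ExecutorService pool = Executors.newFixedThreadPool(3);
    private final Map<String, Future<?>> tasks = new ConcurrentHashMap<>();

    public void submit(String videoUrl, Runnable preloadTask) {
        String key = md5(videoUrl);
        // an existing task for this url is not submitted again
        tasks.computeIfAbsent(key, k -> pool.submit(preloadTask));
    }

    // Called when the network switches from Wifi to mobile data.
    public void cancelAll() {
        for (Future<?> f : tasks.values()) f.cancel(true);
        tasks.clear();
    }

    private static String md5(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes("UTF-8"));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```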

When a preload task runs in the thread pool, the flow is as follows: first obtain the local proxy url, then initiate an http request. The proxy receives and handles the http request and does the actual preloading work. The preload task terminates after reading the specified amount of data, at which point preloading is complete. The flow chart is as follows:
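
A sketch of a single preload task under those rules; names are illustrative, and the local proxy url is the one handed out by the proxy module described below:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of one preload task: request the proxy's local url for the video,
// read roughly the estimated amount (~79 KB above), then stop. As a side
// effect, the proxy has now written those bytes into the file cache.
public class PreloadTask implements Runnable {
    private final String localProxyUrl; // e.g. http://127.0.0.1:port/video
    private final long bytesToPreload;

    public PreloadTask(String localProxyUrl, long bytesToPreload) {
        this.localProxyUrl = localProxyUrl;
        this.bytesToPreload = bytesToPreload;
    }

    @Override
    public void run() {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(localProxyUrl).openConnection();
            try (InputStream in = conn.getInputStream()) {
                byte[] buf = new byte[8192];
                long read = 0;
                int n;
                // stop as soon as the target amount has flowed through the proxy
                while (read < bytesToPreload && (n = in.read(buf)) != -1) {
                    read += n;
                }
            }
        } catch (IOException ignored) {
            // preload is best-effort; playback falls back to the network
        } finally {
            if (conn != null) conn.disconnect();
        }
    }
}
```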

When the user swipes quickly, how do we still guarantee that videos open in under a second? The preload module maintains a state machine for each task: during a fling, tasks for items that have been flung past are paused, and the task for the item about to be displayed is raised in priority so that it executes first.
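
A sketch of what that per-task state handling could look like; the states, fields and method names are all illustrative:

```java
// Illustrative state machine for preload tasks during a fling.
enum State { QUEUED, RUNNING, PAUSED, DONE }

class StatefulPreloadTask {
    volatile State state = State.QUEUED;
    int priority;

    void pause()  { if (state == State.RUNNING) state = State.PAUSED; }
    void resume() { if (state == State.PAUSED) state = State.QUEUED; }
}

class FlingHandler {
    // Called while the list is flinging: pause items flung past and
    // promote the task for the item about to land on screen.
    void onFling(java.util.List<StatefulPreloadTask> passed, StatefulPreloadTask landing) {
        for (StatefulPreloadTask t : passed) t.pause();
        landing.priority = Integer.MAX_VALUE; // execute the visible one first
        landing.resume();
    }
}
```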

Proxy module

Inside the proxy there is a local httpServer responsible for intercepting http requests from the player and the preload module. A client passes the CDN url with its request, and the proxy fetches fresh data from the CDN when there is no local cache. Because several places request data from the proxy, a thread pool is used to serve multiple client connections in parallel, so one client is never blocked by another's request. File caching uses an LruDiskCache: once the configured size is exceeded, old cache files are deleted, which is an easy point to overlook when using a file cache. Since videos in our scenario play straight through and there is no seeking, the file cache stays simple and no file segmentation is needed. Inside the proxy, the same url is mapped to a single client, so if preloading and playback run at the same time there is only one copy of the data and nothing is downloaded twice. Here is a schematic of the proxy's internal structure:
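
As a sketch of the "same url maps to one client" rule (class names are hypothetical and the actual fetch logic is elided):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: the preloader and the player asking for the same url share one
// fetcher, so the bytes are only downloaded from the CDN once.
public class ProxyClients {
    private final Map<String, UrlFetcher> clients = new ConcurrentHashMap<>();

    public UrlFetcher clientFor(String url) {
        return clients.computeIfAbsent(url, UrlFetcher::new);
    }

    // Placeholder for the object that owns the cache file and the single
    // CDN connection for one url; multiple readers attach to it.
    public static class UrlFetcher {
        public final String url;
        public UrlFetcher(String url) { this.url = url; }
    }
}
```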

Problems encountered

In testing, we found that some videos still started very slowly. Careful checking showed the expected amount of data was cached locally, yet there was still a long wait before playback. These videos share one feature: the moov box is at the end of the file. A video with moov at the end plays only after the entire file has been downloaded, because the moov box holds the key information mentioned in the MP4 format analysis above. There are two solutions:

  • Solution one:

The server ensures the moov box is at the front when transcoding, and the video server corrects any existing videos found with moov in the wrong position.

PS: To check where moov sits in a file, open it in a hex editor and search for the characters "moov". On the Mac you can also use MediaParser, and the ffmpeg command can generate MP4 files with moov at the head or at the tail.

For example:

Copy 1.mp4 into a new file whose moov box ends up at the tail:

```bash
ffmpeg -i 1.mp4 -c copy -f mp4 output.mp4
```

Copy 1.mp4 into a new file whose moov box is at the head:

```bash
ffmpeg -i 1.mp4 -c copy -f mp4 -movflags faststart output2.mp4
```

  • Solution two:

Here the moov box's position is not modified; instead, the playback side handles it. The player probes the stream information, and if moov is not at the front, it requests the tail of the file. Specifically: initiate an HTTP request for the MP4 and read the beginning of the response body. If moov is found at the beginning, continue reading into mdat. If it is not there, read the start of mdat, immediately RESET the connection, and then fetch the tail of the file with a Range header: the first HTTP request already returned the Content-Length, so the full size of the MP4 file is known and the tail data can be requested by Range. The schematic is as follows.
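
A sketch of the tail fetch, assuming the first request already yielded the Content-Length; the class and method names are ours:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of solution two's tail fetch: the first request already told us
// Content-Length, so a Range request can pull just the last part of the
// file, where the moov box lives.
public class MoovTailFetcher {
    public static byte[] fetchTail(String videoUrl, long contentLength, int tailBytes)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(videoUrl).openConnection();
        long start = contentLength - tailBytes;
        conn.setRequestProperty("Range", "bytes=" + start + "-" + (contentLength - 1));
        byte[] tail = new byte[tailBytes];
        try (InputStream in = conn.getInputStream()) {
            int off = 0, n;
            while (off < tailBytes && (n = in.read(tail, off, tailBytes - off)) != -1) {
                off += n;
            }
        } finally {
            conn.disconnect();
        }
        return tail;
    }
}
```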

The disadvantage of this scheme is that videos with moov at the end cost two extra http connections.

Summary

This article introduced common video encodings, container formats, and the influence of the moov header's position on playback. Analysis of the playback process showed us where to attack the problem. In short, everything revolves around data preloading: the network requests are completed ahead of time, playback reads straight from the cache, and subsequent replays also come from the cache. This solves both the slow first start and the playback caching problem, killing two birds with one stone. The proxy is the core idea of the solution, and the local localhost url is the key link: the video preload module and the player module are completely decoupled, so the design still works if the player is replaced. With that, the instant-start optimization of video feeds is complete; data collected after launch puts video start time at around 800 ms.

Looking back, we could go a step further and derive the preload amount from the data actually received, caching accurate information rather than a fixed value. Deeper optimizations are also possible to make watching videos even smoother.

Android audio and video development, from beginner to expert

The audio/video learning roadmap and study-note PDFs below can be downloaded free from my GitHub; remember to give it a Star~


1. Getting started:

1. Drawing images
  1. Drawing an image with ImageView
  2. Drawing an image with SurfaceView
  3. Drawing an image with a custom View

2. AudioRecord API in detail

3. Recording with AudioRecord and generating a wav file (a minimal sketch follows the list)

  • Create an AudioRecord object
  • Initialize a buffer
  • Start recording
  • Create an output stream; as sound data is read from the AudioRecord into the buffer, write the buffer into the stream
  • Close the stream
  • Stop recording
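
A minimal sketch of these steps with the Android AudioRecord API; permission handling and the 44-byte WAV header are omitted, and everything apart from the Android API calls is our naming:

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import java.io.FileOutputStream;
import java.io.IOException;

// Minimal sketch of the recording steps above, writing raw PCM to a file.
public class PcmRecorder {
    private volatile boolean recording = true;

    public void recordTo(String pcmPath) throws IOException {
        int sampleRate = 44100;
        // 2. initialize a buffer of at least the minimum size
        int bufSize = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        // 1. create the AudioRecord object
        AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.MIC,
                sampleRate, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, bufSize);
        byte[] buf = new byte[bufSize];
        try (FileOutputStream out = new FileOutputStream(pcmPath)) {
            record.startRecording();               // 3. start recording
            while (recording) {                    // 4. pump data into the stream
                int n = record.read(buf, 0, buf.length);
                if (n > 0) out.write(buf, 0, n);
            }
        } finally {                                // 5/6. close the stream, stop recording
            record.stop();
            record.release();
        }
    }

    public void stop() { recording = false; }
}
```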

4. Playing PCM audio with AudioTrack
  1. Basic use of AudioTrack
  2. AudioTrack in detail
  3. AudioTrack vs. MediaPlayer

5. Capturing video data with the Camera API
  1. Previewing Camera data
  2. Getting the NV21 data callback

6. Parsing and muxing mp4 files with the MediaExtractor and MediaMuxer APIs
  1. Introduction to the MediaExtractor API
  2. Introduction to the MediaMuxer API
  3. Usage scenarios

7. MediaCodec API in detail
  1. Introduction to MediaCodec
  2. MediaCodec API description
  3. MediaCodec flow control

The article is already long and there is a lot of remaining content, so the rest can only be shown as a directory:

2. Intermediate and advanced:

  • Android OpenGL ES development (1): Introduction to OpenGL ES
  • Android OpenGL ES development (2): Setting up the OpenGL ES environment
  • Android OpenGL ES development (3): Defining shapes in OpenGL ES
  • Android OpenGL ES development (4): Drawing shapes in OpenGL ES
  • Android OpenGL ES development (5): Using projection and camera views in OpenGL ES
  • Android OpenGL ES development (6): Adding motion effects in OpenGL ES
  • Android OpenGL ES development (7): Responding to touch events in OpenGL ES
  • Android OpenGL ES development (8): The OpenGL ES shading language GLSL
  • Android OpenGL ES development (9): Texture mapping in OpenGL ES
  • Android OpenGL ES development (10): Interacting with shaders through GLES20
  • Displaying an image with OpenGL
  • Drawing the Camera preview with GLSurfaceView and implementing photo capture
  • Recording video with OpenGL ES and implementing a video watermark effect

Advanced exploration:

  • Study audio and video coding in depth, e.g. H.264 and AAC, and learn to use open-source codec libraries such as x264 and JM
  • Study audio/video network protocols in depth, such as rtmp and hls, and packaging formats such as flv and mp4
  • Study open-source projects in the audio/video field in depth, such as webrtc, ffmpeg, ijkplayer, librtmp, etc.
  • Port the ffmpeg library to the Android platform and, drawing on the experience above, write a simple audio and video player
  • Port the x264 library to the Android platform and, drawing on the experience above, implement soft H.264 encoding of video data
  • Port the librtmp library to the Android platform and, drawing on the experience above, implement RTMP streaming on Android

Audio and video codec technology

  • Audio and video codec technology (1): MPEG-4/H.264 AVC codec standard
  • Audio and video codec technology (2): AAC audio coding technology

Streaming protocol

  • Streaming media protocol (1): HLS protocol
  • Streaming media protocol (2): RTMP protocol

Multimedia file format

  • Multimedia file format (1): MP4 format
  • Multimedia file format (2): FLV format
  • Multimedia file format (3): M3U8 format
  • Multimedia file format (4): TS format
  • Multimedia file format (5): PCM / WAV format

FFmpeg learning record

  • FFmpeg command-line tool learning (1): ffprobe, a tool for viewing media file header information
  • FFmpeg command-line tool learning (2): ffplay, a tool for playing media files
  • FFmpeg command-line tool learning (3): ffmpeg, a tool for converting media files
  • FFmpeg command-line tool learning (4): FFmpeg capture devices
  • FFmpeg command-line tool learning (5): Adjusting audio and video playback speed with FFmpeg

  • FFmpeg learning (1): Introduction to FFmpeg
  • FFmpeg learning (2): Installing FFmpeg on the Mac
  • FFmpeg learning (3): Porting FFmpeg to the Android platform
  • FFmpeg learning (4): FFmpeg API introduction and general API analysis
  • FFmpeg learning (5): FFmpeg codec API analysis
  • FFmpeg learning (6): Analysis of FFmpeg core modules libavformat and libavcodec

  • FFmpeg structure learning (1): AVFormatContext analysis
  • FFmpeg structure learning (2): AVStream analysis
  • FFmpeg structure learning (3): AVPacket analysis
  • FFmpeg structure learning (4): AVFrame analysis
  • FFmpeg structure learning (5): AVCodec analysis
  • FFmpeg structure learning (6): AVCodecContext analysis
  • FFmpeg structure learning (7): AVIOContext analysis
  • FFmpeg structure learning (8): Relationships between the important structures in FFmpeg

The PDF version of the audio and video study notes above can be downloaded free from my GitHub; remember to give it a Star~


Final words

Welcome to follow my Jianshu blog, where I share practical Android material and discuss Android technology.
If you have thoughts on this article, or any technical questions, leave a message in the comments to discuss~

Give it a like before you go~


Source: blog.csdn.net/Androiddddd/article/details/112572219