Rate Control (1): Understanding Rate Control Modes (x264, x265, vpx)

What is "rate control"? It is the mechanism an encoder uses to decide how many bits to allocate to each frame of video.


The goal of (lossy) video encoding is to save as many bits as possible while preserving as much quality as possible. Rate control is the key tool for balancing bitrate against quality.

There are many ways to control the bitrate. Below you will learn about "1-pass", "2-pass", "CBR", "VBR", "VBV encoding", "CRF", and more.

The following sections give simple examples of the different rate control modes and explain, from an end user's point of view, when to use which one. The details of rate-distortion optimization (RDO) are not covered here.

Preface: Variable Bitrate vs. Constant Bitrate

Many readers will know these terms from audio encoders, especially those who lived through the MP3 era. Historically, audio encoding started out with constant bitrate (CBR) and only later moved on to variable bitrate (VBR). Within a given limit, VBR makes it possible to achieve the highest quality with the fewest bits.

Simply put, VBR lets the encoder spend more bits on parts that are difficult to encode and fewer bits on parts that are easy to encode. What do "difficult" and "easy" mean here? In general, video with a lot of motion needs more bits, and video with rich spatial detail and complex textures is also harder to encode.

What are the encoding scenarios?

Which rate control mode to choose often depends on your application. The most common scenarios are:

  1. Archiving: You compress a file to store it on a hard drive or in cloud storage. You want the best possible quality at the lowest possible bitrate, but you don't care about the exact size of the compressed file.

  2. Streaming: You want to deliver a file over the network. You need to make sure the bitrate does not exceed the available bandwidth, or you need to provide versions at different bitrates for different bandwidths (for example, an online player switching from HD to SD when the network connection is poor).

  3. Live streaming: Similar to 2, but you need to encode as fast as possible (in real time), and you cannot know the video content in advance.

  4. Device-oriented encoding: For example, you want to store a file on a DVD or Blu-ray disc, so the encoded file needs a specific size (just filling the disc).

Understanding the usage scenarios can help you choose the rate control mode.

Rate Control Modes

The following sections introduce the different rate control modes using the x264, x265, and libvpx encoders in ffmpeg. You can find detailed parameter descriptions in the ffmpeg documentation.

Note: By default, encoders do not "stuff" bits. This means that when a frame is simple to encode, it may use fewer bits than the target, and the encoder will not waste bits just to reach the target.

Constant QP (CQP)

The quantization parameter (QP) controls how strongly the video is compressed. The larger the QP, the higher the compression and the lower the quality; the smaller the QP, the lower the compression and the higher the quality. In H.264 and H.265, QP is an integer from 0 to 51 (for 8-bit video). You can easily encode with a fixed QP in x264 and x265. Note: libvpx does not have a fixed QP mode.

ffmpeg -i <input> -c:v libx264 -qp 23 <output>
ffmpeg -i <input> -c:v libx265 -x265-params qp=23 <output>

You can refer to this tutorial to learn more about the principles of QP.
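
To see why the number 6 keeps coming up in this article, here is a small sketch of how QP maps to the quantizer step size (Qstep) in H.264: the step size doubles every time QP increases by 6, starting from six base values for QP 0 to 5. The function name `qstep` is our own; the table follows the commonly published H.264 values for 8-bit video, and H.265 follows the same doubling-every-6 relationship.

```python
# Sketch: H.264 quantizer step size (Qstep) as a function of QP (8-bit video).
# The six base values cover QP 0..5; every +6 in QP doubles the step size.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Return the quantizer step size for an integer QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be an integer in 0..51 for 8-bit H.264")
    return BASE_QSTEP[qp % 6] * 2 ** (qp // 6)


if __name__ == "__main__":
    for qp in (4, 22, 23, 28, 51):
        print(f"QP {qp:2d} -> Qstep {qstep(qp):g}")
```

A larger step means coarser quantization of the transform coefficients, which is why a higher QP produces smaller files at lower quality.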

Unless you know exactly what you are doing, don't use this mode. With CQP, the bitrate fluctuates greatly depending on scene complexity, and you have no control over the actual bitrate.

Good for: video coding research.

Bad for: almost everything else.

Average Bitrate (ABR)

In this mode you give the encoder a target bitrate, and the encoder works out how to reach it:

ffmpeg -i <input> -c:v libx264 -b:v 1M <output>
ffmpeg -i <input> -c:v libx265 -b:v 1M <output>
ffmpeg -i <input> -c:v libvpx-vp9 -b:v 1M <output>

Avoid using this mode! One of the main x264 developers says you should never use it. Why? Because the encoder does not know what content is still to come, it has to guess how to reach the target bitrate. As a result, the bitrate keeps changing, especially at the beginning of the encode. For HTTP Adaptive Streaming (HAS), this can cause large quality fluctuations within a short period of time.

Note that ABR is not a constant bitrate mode but a variable bitrate mode.

Good for: fast encodes.

Bad for: almost everything else.

Constant Bitrate (CBR)

If you need a truly constant bitrate, you can force the encoder to maintain it by setting nal-hrd:

ffmpeg -i <input> -c:v libx264 -x264-params "nal-hrd=cbr:force-cfr=1" -b:v 1M -minrate 1M -maxrate 1M -bufsize 2M <output>

The output file must be an MPEG-2 TS file, because MP4 does not support NAL stuffing (filler data). Note that for simple content this mode wastes bandwidth, but it guarantees a constant bitrate across the entire stream. You can find more use cases here. CBR makes sense for some applications, but in general you will want the bitrate to drop whenever the content allows it.

The command to use CBR for VP9 is as follows:

ffmpeg -i <input> -c:v libvpx-vp9 -b:v 1M -maxrate 1M -minrate 1M <output>
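
The bandwidth cost of CBR on simple content can be illustrated with a deliberately simplified model: every frame interval is assumed to carry exactly bitrate / fps bits, frames that need fewer bits are padded with filler, and frames that need more are assumed to be capped by the encoder (for example by raising QP). Real CBR encoding spreads bits through the VBV buffer instead, so treat `cbr_filler_fraction` (a hypothetical helper) as an intuition aid only.

```python
# Simplified per-frame CBR model (ignores the VBV buffer that real encoders use).
def cbr_filler_fraction(frame_bits: list[float], bitrate: float, fps: float) -> float:
    """Return the fraction of the transmitted bits that are filler (padding).

    frame_bits: bits each frame would need at the desired quality (hypothetical input).
    """
    per_frame = bitrate / fps  # constant bit budget per frame interval
    # Assume the encoder caps oversized frames at the budget (e.g. by raising QP).
    coded = [min(bits, per_frame) for bits in frame_bits]
    # Whatever is left in each interval must be sent as filler data.
    filler = sum(per_frame - c for c in coded)
    return filler / (per_frame * len(frame_bits))


if __name__ == "__main__":
    # Hypothetical 1 Mbit/s, 25 fps stream: 40,000 bits per frame interval.
    needs = [40_000, 10_000, 10_000, 60_000]
    print(f"{cbr_filler_fraction(needs, 1_000_000, 25):.1%} of the stream is padding")
```

With these made-up frame sizes, the two easy frames leave 60,000 of the 160,000 transmitted bits as padding, which is the wasted bandwidth described above.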

Good for: keeping a constant bitrate; live video streaming (for example, Twitch).

Bad for: archiving; scenarios where bandwidth must be used efficiently.

2-Pass Average Bitrate (2-Pass ABR)

If the encoder is allowed to encode the video twice (or more), it can look ahead and know what is coming. In the first pass it calculates the cost of encoding each part of the video, and in the second pass it uses the available bits more efficiently. For a given bitrate, this mode produces the best output quality.

For x264:

ffmpeg -i <input> -c:v libx264 -b:v 1M -pass 1 -f null /dev/null
ffmpeg -i <input> -c:v libx264 -b:v 1M -pass 2 <output>.mp4

For x265:

ffmpeg -i <input> -c:v libx265 -b:v 1M -x265-params pass=1 -f null /dev/null
ffmpeg -i <input> -c:v libx265 -b:v 1M -x265-params pass=2 <output>.mp4

For VP9:

ffmpeg -i <input> -c:v libvpx-vp9 -b:v 1M -pass 1 -f null /dev/null
ffmpeg -i <input> -c:v libvpx-vp9 -b:v 1M -pass 2 <output>.webm
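
The core idea of 2-pass encoding can be sketched as follows: the first pass records a complexity value per segment, and the second pass divides the total bit budget (target bitrate × duration) in proportion to those values, instead of giving every segment the same share. This linear split is a simplification of our own; real encoders compress the complexity curve (x264 exposes this as `qcomp`) and work per frame, and the complexity numbers below are made up.

```python
# Simplified 2-pass bit allocation (not the actual x264/x265/libvpx algorithm).
def two_pass_allocation(complexities: list[float], target_bitrate: float,
                        segment_seconds: float) -> list[float]:
    """Split the total bit budget across segments in proportion to complexity."""
    total_budget = target_bitrate * segment_seconds * len(complexities)
    total_complexity = sum(complexities)
    return [total_budget * c / total_complexity for c in complexities]


def constant_allocation(complexities: list[float], target_bitrate: float,
                        segment_seconds: float) -> list[float]:
    """Give every segment the same share, regardless of its content."""
    return [target_bitrate * segment_seconds for _ in complexities]


if __name__ == "__main__":
    # Hypothetical pass-1 results for four 10-second segments at 1 Mbit/s:
    # a calm opening, then a complex third segment.
    complexity = [1.0, 1.0, 4.0, 2.0]
    for name, alloc in (("2-pass", two_pass_allocation(complexity, 1e6, 10)),
                        ("constant", constant_allocation(complexity, 1e6, 10))):
        print(name, [f"{bits / 1e6:.0f} Mbit" for bits in alloc])
```

Both schemes spend the same 40 Mbit in total; the 2-pass version simply moves bits from the calm opening to the complex third segment.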

This is the easiest way to encode a file for streaming. There are two caveats, however. First, you don't know the quality of the final result, so you have to experiment to make sure the chosen bitrate is sufficient even for complex content. Second, the bitrate can still have local peaks, which means you may send more data than a client is able to receive. For choosing a bitrate, you can refer to YouTube's recommended upload settings, but note that these are intended for high-quality uploads; in practice you can usually choose a lower bitrate.

Good for: hitting a specific bitrate; device-oriented encoding.

Bad for: cases where you need fast encoding (for example, live streaming).
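
For device-oriented encoding, the target bitrate for a 2-pass encode usually comes from the file size you want to hit: total bits divided by duration, minus whatever the audio track takes. The helper below is a hypothetical sketch; it assumes 1 MB = 1,000,000 bytes and lets you reserve an assumed fraction for container overhead, which you would have to estimate for your own muxer.

```python
def video_bitrate_for_size(target_mb: float, duration_s: float,
                           audio_kbps: float = 0.0, overhead: float = 0.0) -> float:
    """Return the video bitrate in kbit/s needed to fill target_mb megabytes.

    Assumptions: 1 MB = 1_000_000 bytes; `overhead` is the fraction of the file
    reserved for container/muxing overhead (an estimate you must supply).
    """
    total_kbits = target_mb * 1_000_000 * 8 / 1000  # megabytes -> kilobits
    usable_kbits = total_kbits * (1.0 - overhead)
    return usable_kbits / duration_s - audio_kbps


if __name__ == "__main__":
    # Hypothetical example: a 90-minute film with 128 kbit/s audio into 700 MB.
    kbps = video_bitrate_for_size(700, 90 * 60, audio_kbps=128)
    print(f"-b:v {int(kbps)}k")
```

The printed value can be used as the -b:v argument in both passes of the 2-pass commands above.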

Constant Quality (CQ) / Constant Rate Factor (CRF)

CRF aims to keep the quality constant across the entire video stream.

ffmpeg -i <input> -c:v libx264 -crf 23 <output>
ffmpeg -i <input> -c:v libx265 -crf 28 <output>
ffmpeg -i <input> -c:v libvpx-vp9 -crf 30 -b:v 0 <output>

In H.264 and H.265, CRF ranges from 0 to 51 (like QP). The default is 23 for x264 and 28 for x265. Increasing CRF by 6 roughly halves the bitrate, and decreasing it by 6 roughly doubles it. For VP9, CRF ranges from 0 to 63, with recommended values between 15 and 35.
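
The ±6 rule of thumb for x264/x265 translates directly into an exponential formula, which is handy for choosing a starting point once you have one test encode. This hypothetical helper only restates that rough relationship; it does not apply to VP9's 0 to 63 scale, and actual bitrates depend heavily on the content.

```python
def estimate_bitrate(known_kbps: float, known_crf: float, new_crf: float) -> float:
    """Rule-of-thumb bitrate estimate for x264/x265 when changing CRF.

    Each +6 CRF roughly halves the bitrate; each -6 roughly doubles it.
    """
    return known_kbps * 2 ** ((known_crf - new_crf) / 6)


if __name__ == "__main__":
    # Hypothetical: a test encode at CRF 23 came out at 2000 kbit/s.
    for crf in (17, 20, 23, 26, 29):
        print(f"CRF {crf}: ~{estimate_bitrate(2000, 23, crf):.0f} kbit/s")
```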

The downside of this mode is that you cannot know in advance what the final bitrate will be or how much it will fluctuate.

Good for: archiving; achieving the best possible quality.

Bad for: streaming; anything that requires a specific bitrate (or file size).

Video Buffering Verifier (VBV)

VBV ensures that the bitrate does not exceed a given maximum. This is very useful for streaming, because you can be sure that you will not send more bits than you promised. VBV can be combined with 2-pass VBR (applied in both passes) or with CRF.

ffmpeg -i <input> -c:v libx264 -crf 23 -maxrate 1M -bufsize 2M <output>
ffmpeg -i <input> -c:v libx265 -crf 28 -x265-params vbv-maxrate=1000:vbv-bufsize=2000 <output>
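
Note the different units in the two lines above: ffmpeg's -maxrate and -bufsize accept SI suffixes (1M = 1,000,000 bit/s), while x265's vbv-maxrate and vbv-bufsize are given in kbit/s, which is why 1M and 2M become 1000 and 2000. The hypothetical helper below performs that conversion; it only handles the plain k, M, and G suffixes, although ffmpeg accepts other forms as well.

```python
# SI multipliers for the plain suffixes used in this article (k, M, G).
SI_SUFFIX = {"k": 10**3, "M": 10**6, "G": 10**9}


def ffmpeg_rate_to_kbps(value: str) -> int:
    """Convert an ffmpeg-style rate such as '1M' or '750k' to kbit/s."""
    suffix = value[-1]
    if suffix in SI_SUFFIX:
        return round(float(value[:-1]) * SI_SUFFIX[suffix] / 1000)
    return round(float(value) / 1000)  # plain number of bits per second


def x265_vbv_params(maxrate: str, bufsize: str) -> str:
    """Build the VBV part of an -x265-params string from ffmpeg-style rates."""
    return (f"vbv-maxrate={ffmpeg_rate_to_kbps(maxrate)}"
            f":vbv-bufsize={ffmpeg_rate_to_kbps(bufsize)}")


if __name__ == "__main__":
    print(x265_vbv_params("1M", "2M"))
```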

VP9 has a similar mode. It is not called VBV, but the principle is the same:

ffmpeg -i <input> -c:v libvpx-vp9 -crf 30 -b:v 2M <output>

If you use VBV for live streaming and need faster encoding, add the -tune zerolatency and -preset ultrafast options. This trades some quality for encoding speed.

VBV can also be combined with 2-pass ABR, which gives constrained ABR-VBV encoding. For x264:

ffmpeg -i <input> -c:v libx264 -b:v 1M -maxrate 1M -bufsize 2M -pass 1 -f null /dev/null
ffmpeg -i <input> -c:v libx264 -b:v 1M -maxrate 1M -bufsize 2M -pass 2 <output>

For x265:

ffmpeg -i <input> -c:v libx265 -b:v 1M -x265-params pass=1:vbv-maxrate=1000:vbv-bufsize=2000 -f null /dev/null
ffmpeg -i <input> -c:v libx265 -b:v 1M -x265-params pass=2:vbv-maxrate=1000:vbv-bufsize=2000 <output>

For VP9:

ffmpeg -i <input> -c:v libvpx-vp9 -b:v 1M -maxrate 1M -bufsize 2M -pass 1 -f null /dev/null
ffmpeg -i <input> -c:v libvpx-vp9 -b:v 1M -maxrate 1M -bufsize 2M -pass 2 <output>

How should you set bufsize? It depends on how much bitrate variability you are willing to accept. A good default is to set bufsize to twice the maximum rate. If the client buffer is relatively small, set bufsize equal to maxrate. If you want to restrict the bitrate more strictly, set bufsize to half the maximum rate or less.
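
To get a feel for what bufsize does, here is a simplified leaky-bucket check: the decoder's buffer is assumed to fill at maxrate up to bufsize bits and to drain by each frame's size when that frame is decoded; a frame larger than the current buffer fill is a violation. This is a hypothetical checker for illustration, not x264's rate control, which instead raises QP ahead of time to avoid such frames. The starting fill of 90% mirrors x264's default vbv-init of 0.9.

```python
def vbv_violations(frame_bits: list[float], maxrate: float, bufsize: float,
                   fps: float, initial_fill: float = 0.9) -> list[int]:
    """Return indices of frames that would underflow a simplified VBV buffer.

    maxrate is in bit/s and bufsize in bits, as with ffmpeg's -maxrate/-bufsize.
    """
    fill = initial_fill * bufsize
    refill_per_frame = maxrate / fps  # bits arriving during one frame interval
    violations = []
    for i, bits in enumerate(frame_bits):
        if bits > fill:
            violations.append(i)  # frame is larger than what has arrived so far
            fill = 0.0            # simplification: treat the buffer as drained
        else:
            fill -= bits
        fill = min(bufsize, fill + refill_per_frame)  # buffer cannot exceed bufsize
    return violations


if __name__ == "__main__":
    # Hypothetical 1 Mbit/s, 25 fps stream with one large frame (e.g. a scene cut).
    frames = [40_000] * 5 + [500_000] + [40_000] * 4
    for factor in (2.0, 1.0, 0.5):  # bufsize = factor * maxrate
        bad = vbv_violations(frames, 1_000_000, factor * 1_000_000, 25)
        print(f"bufsize = {factor} x maxrate -> violating frames: {bad}")
```

With these made-up numbers, a buffer of 2× or 1× maxrate absorbs the large frame, while 0.5× maxrate cannot: a smaller buffer keeps the bitrate tighter but leaves the encoder less room for expensive frames.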

Good for: bandwidth-constrained streaming; live streaming (with CRF, 1-pass); VoD streaming.

Bad for: archiving.
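
The recommendations in this article can be summarized as a small lookup table keyed by the four scenarios listed at the beginning. The labels and the `suggest` function are hypothetical names for this sketch; the mapping itself only restates the recommendations given for each mode above.

```python
# The four scenarios from the beginning of the article, mapped to the modes
# this article recommends for them (informal labels chosen for this sketch).
RECOMMENDED = {
    "archive": ["CRF"],
    "streaming": ["2-pass ABR + VBV", "CRF + VBV"],
    "live streaming": ["CRF + VBV (1-pass)", "CBR"],
    "device-oriented encoding": ["2-pass ABR"],
}

# Modes the article advises against outside of research or quick tests.
AVOID = {"CQP", "1-pass ABR"}


def suggest(scenario: str) -> list[str]:
    """Return the recommended rate control modes for a scenario."""
    try:
        return RECOMMENDED[scenario]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED))
        raise ValueError(f"unknown scenario {scenario!r}; expected one of: {known}") from None


if __name__ == "__main__":
    for name in RECOMMENDED:
        print(f"{name}: {', '.join(suggest(name))}")
```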

Comparative Experiment

The following compares the different rate control algorithms. Three 30-second segments were taken from each of the Big Buck Bunny and Tears of Steel sequences and encoded with libx264, using default settings for everything except the rate control mode. Several target bitrates (750, 1500, 3000, and 7500 kbit/s), maximum bitrates (for VBV), and QP/CRF values (17, 23, 29, and 35) were tested.
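
If you want to repeat this kind of comparison on your own sequences, the runs can be generated programmatically. The sketch below builds the x264 command lines for CQP, CRF, ABR, ABR+VBV, 2-pass ABR, and CRF+VBV using only flags shown earlier in this article. The file naming, the reuse of the target bitrates as VBV maximum rates, and bufsize = 2 × maxrate are our own assumptions, since the exact VBV settings of the original experiment are not given.

```python
from itertools import product

TARGETS_KBPS = [750, 1500, 3000, 7500]  # target / maximum bitrates
QUALITY_VALUES = [17, 23, 29, 35]        # QP (CQP) and CRF values
NULL_SINK = "/dev/null"                  # use "NUL" on Windows


def x264(src: str, *args: str) -> list[str]:
    """Base ffmpeg/libx264 command followed by mode-specific arguments."""
    return ["ffmpeg", "-i", src, "-c:v", "libx264", *args]


def vbv(kbps: int) -> list[str]:
    # Assumption: bufsize = 2 x maxrate, the default suggested above.
    return ["-maxrate", f"{kbps}k", "-bufsize", f"{2 * kbps}k"]


def build_matrix(src: str, stem: str) -> dict[str, list[list[str]]]:
    """Map each run label to the ffmpeg command(s) that produce it."""
    runs: dict[str, list[list[str]]] = {}
    for q in QUALITY_VALUES:
        runs[f"cqp{q}"] = [x264(src, "-qp", str(q), f"{stem}_cqp{q}.mp4")]
        runs[f"crf{q}"] = [x264(src, "-crf", str(q), f"{stem}_crf{q}.mp4")]
    for kbps in TARGETS_KBPS:
        rate = f"{kbps}k"
        runs[f"abr{kbps}"] = [x264(src, "-b:v", rate, f"{stem}_abr{kbps}.mp4")]
        runs[f"abr_vbv{kbps}"] = [
            x264(src, "-b:v", rate, *vbv(kbps), f"{stem}_abr_vbv{kbps}.mp4")]
        runs[f"2pass{kbps}"] = [  # both passes must run in this order
            x264(src, "-b:v", rate, "-pass", "1", "-f", "null", NULL_SINK),
            x264(src, "-b:v", rate, "-pass", "2", f"{stem}_2pass{kbps}.mp4"),
        ]
    for q, kbps in product(QUALITY_VALUES, TARGETS_KBPS):
        runs[f"crf{q}_vbv{kbps}"] = [
            x264(src, "-crf", str(q), *vbv(kbps), f"{stem}_crf{q}_vbv{kbps}.mp4")]
    return runs


if __name__ == "__main__":
    matrix = build_matrix("bbb_segment1.mp4", "bbb1")
    print(len(matrix), "runs")
    print(" ".join(matrix["2pass3000"][1]))
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. If you run several 2-pass jobs in parallel, give each one its own `-passlogfile` so that the first-pass statistics do not overwrite each other.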

Note that this experiment is far from exhaustive; you can try more sequences and other encoders.

The figure below shows the results for the different rate control modes: 3000 kbit/s on the left and 7500 kbit/s on the right. The results for the other two bitrates look similar and are not shown. Each line shows how the bitrate of the stream changes over time in one mode.

BBB1 shows that ABR (blue-green line) and ABR+VBV (purple line) misestimate the complexity of the video at the beginning. In fact, the beginning of Big Buck Bunny is fairly calm, with little motion, so only a few bits are needed to maintain quality. The 2-pass mode estimates this correctly and starts with a low bitrate, saving bandwidth. The rich spatial detail in the last third of the video then makes the 2-pass mode spend many bits, more than it saved at the beginning.

For BBB2, all modes actually behave better than expected, although 2-pass fluctuates more than the others.

Next, CQP and CRF are compared. Only the results for CRF/QP values of 17 and 23 are shown here. CRF performs better.

Finally, the results of CRF+VBV at different bitrates. Choosing suitable CRF and maximum bitrate values usually takes several attempts and depends entirely on the source video.

Translated from Understanding Rate Control Modes (x264, x265, vpx)

Origin blog.csdn.net/Dillon2015/article/details/105825814