Introduction to ffmpeg - basic concepts of audio and video

Everyone has watched movies (video) and listened to music (audio), and most people at least know that an mp4 is a video file and an mp3 is an audio file.

What attributes does an audio or video file have? Taking a video as an example, we can view the information of a media file with the ffmpeg -i command.

» ffmpeg -i r1ori.mp4
ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with Apple LLVM version 10.0.0 (clang-1000.10.44.4)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/jdk1.8.0_251.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/jdk1.8.0_251.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gpl --enable-libmp3lame --enable-libopus --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-chromaprint --enable-frei0r --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfdk-aac --enable-libfontconfig --enable-libfreetype --enable-libgme --enable-libgsm --enable-libmodplug --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-librsvg --enable-librtmp --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtesseract --enable-libtwolame --enable-libvidstab --enable-libwavpack --enable-libwebp --enable-libzmq --enable-opencl --enable-openssl --enable-videotoolbox --enable-libopenjpeg --disable-decoder=jpeg2000 --extra-cflags=-I/usr/local/Cellar/openjpeg/2.3.0/include/openjpeg-2.3 --enable-nonfree
  libavutil      56. 22.100 / 56. 22.100
  libavcodec     58. 35.100 / 58. 35.100
  libavformat    58. 20.100 / 58. 20.100
  libavdevice    58.  5.100 / 58.  5.100
  libavfilter     7. 40.101 /  7. 40.101
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  3.100 /  5.  3.100
  libswresample   3.  3.100 /  3.  3.100
  libpostproc    55.  3.100 / 55.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'r1ori.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:00:58.53, start: 0.000000, bitrate: 1870 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 544x960, 1732 kb/s, 29.83 fps, 29.83 tbr, 11456 tbn, 59.67 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 129 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

Besides the information about the video, the output also includes the version banner and the configuration that ffmpeg was compiled with. You can add the -hide_banner option to hide this banner. The complete command is as follows:

» ffmpeg -i r1ori.mp4 -hide_banner
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'r1ori.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:00:58.53, start: 0.000000, bitrate: 1870 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 544x960, 1732 kb/s, 29.83 fps, 29.83 tbr, 11456 tbn, 59.67 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 129 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
At least one output file must be specified

The lines worth looking at are:

  1. Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'r1ori.mp4': Input #0 is the first file passed with the ffmpeg -i option; the index starts from 0. In other words, we can pass multiple input files, and ffmpeg also supports writing multiple output files.
  2. Metadata holds the metadata of the file.
  3. The Duration line shows that the playback duration is 58.53 seconds, the start time is 0, and the bit rate of the whole file is 1870 kbit/s.
  4. Stream #0:0(und): Video: h264: the first stream of the file is a video stream. The codec is H264 with the High profile (avc1 is the codec tag stored in the container), each frame is represented as yuv420p, the resolution is 544x960, the bit rate of the video stream is 1732 kbit/s, and the frame rate is 29.83 frames per second.
  5. Stream #0:1(und): Audio: aac: the second stream of the file is an audio stream. The codec is AAC with the LC profile (mp4a is the codec tag stored in the container), the sampling rate is 44.1 kHz, the channel layout is stereo, and the bit rate is 129 kbit/s.
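As a rough illustration of reading these fields programmatically, here is a minimal Python sketch that pulls the numbers out of the Duration line shown above. The function name parse_duration_line and the regular expression are assumptions made for this example, not part of ffmpeg; the banner text is meant for humans, so a real script would probably be better served by ffprobe's machine-readable output.

```python
import re

# Sample line copied from the ffmpeg output above.
DURATION_LINE = "  Duration: 00:00:58.53, start: 0.000000, bitrate: 1870 kb/s"


def parse_duration_line(line):
    """Return (duration_seconds, start_seconds, bitrate_kbps) from a Duration line.

    Hypothetical helper: it assumes the exact layout printed by the ffmpeg
    build shown in this article.
    """
    match = re.search(
        r"Duration: (\d+):(\d+):(\d+(?:\.\d+)?), "
        r"start: (\d+(?:\.\d+)?), bitrate: (\d+) kb/s",
        line,
    )
    if match is None:
        raise ValueError("not a Duration line: " + line)
    hours, minutes, seconds, start, bitrate = match.groups()
    duration = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return duration, float(start), int(bitrate)


duration, start, bitrate = parse_duration_line(DURATION_LINE)
print(duration, start, bitrate)
```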

Some unfamiliar terms have appeared; we will introduce them in turn.

Container

As in the video file above, different data streams (video streams, audio streams, subtitle streams, and so on) are packaged together in one file, which we call a container. The familiar mp4, avi, and rmvb are all multimedia container formats. Under normal circumstances, the suffix of a multimedia file is its container format.

We can understand a container as a bottle, a jar, or something like that.

Encoding and decoding (codec)

Encoding: recording and storing video and audio in a certain format or specification is called encoding. Encoding can be understood as the processing of the contents of the container. A codec (coder/decoder) is the component that encodes and decodes.

Common video encoding formats are h264, h265, etc., and common audio encoding formats are mp3, aac, etc.

Decoding: turning the encoded video and audio data back into uncompressed raw video and audio data. For example, if we want to add an echo to a piece of audio, we need to decode the audio file first, apply the effect to the raw data, and then encode it again.
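To make the echo example concrete, the sketch below works on a short list of already-decoded samples. It is only an illustration under stated assumptions: add_echo is a hypothetical helper, the samples are plain floats rather than a real decoded file, and the decode and encode steps themselves are left out.

```python
def add_echo(samples, delay, decay):
    """Mix a delayed, quieter copy of the signal back into itself.

    samples: decoded (raw) audio samples as floats
    delay:   echo delay, in samples
    decay:   volume of the echo relative to the original (0..1)
    """
    out = list(samples)
    for i in range(delay, len(samples)):
        out[i] += decay * samples[i - delay]
    return out


# A single click followed by silence: the echo shows up two samples later.
pcm = [1.0, 0.0, 0.0, 0.0, 0.0]
print(add_echo(pcm, delay=2, decay=0.5))
```

This kind of processing is only possible on raw samples, which is why the compressed file has to be decoded before the effect and encoded again afterwards.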

Software decoding (often called "soft decoding"): the CPU decodes the video file through software.

Hardware decoding (often called "hard decoding"): to reduce the load on the CPU, the GPU processes part of the video data that the CPU would otherwise process.

Software decoding has to process a large amount of video data, so it is very CPU-intensive; a single ffmpeg command can saturate the CPU.

In comparison, hardware decoding is very efficient, but its shortcomings are also obvious: it does not handle subtitles, picture quality, and so on as well as software decoding does. If I remember correctly, the Qiniu cloud platform (a relatively professional audio and video platform) does not support hardware decoding yet.

ffmpeg is the most commonly used open source library for software decoding. It decodes in software using codec implementations for formats such as H264, H265, and MPEG-4.

In today's audio and video field, ffmpeg supports almost all audio and video codecs and is very powerful.

Transcoding: converting a video from one encoding or format to another. For example, to convert an flv file to an mp4 file:

ffmpeg -i input.flv output.mp4
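If you want to drive the same conversion from a script, one possible sketch in Python looks like this. The helper names build_transcode_cmd and transcode are made up for this example; the command itself is the one shown above, with -hide_banner added to keep the log short. The conversion only runs if an ffmpeg binary is found on the PATH.

```python
import shutil
import subprocess


def build_transcode_cmd(src, dst):
    """Build the command shown above as an argument list."""
    return ["ffmpeg", "-hide_banner", "-i", src, dst]


def transcode(src, dst):
    """Run the conversion; return True on success, False otherwise.

    Returns False without running anything if ffmpeg is not installed.
    """
    if shutil.which("ffmpeg") is None:
        return False
    result = subprocess.run(build_transcode_cmd(src, dst))
    return result.returncode == 0


print(build_transcode_cmd("input.flv", "output.mp4"))
```

Passing the arguments as a list instead of one string avoids shell quoting problems with file names that contain spaces.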

Bit rate

Bit rate, also known as code rate, is the number of bits output by the encoder per second. The unit is Kbps: b stands for bit, p for per, and s for second. Case matters here: B is a byte, the usual unit for file size, and b is a bit, so 1 KB = 8 Kb.

For example, under the same compression algorithm (we will introduce several different compression algorithms later), the higher the bit rate, the higher the video quality.

For a compressed file, it follows that the bit rate can be roughly calculated as file size / duration.

For example, the size of r1ori.mp4 is 13.7 MB and the duration is about 59 seconds, so its bit rate is approximately (13.7 x 1024 x 8) / 59 ≈ 1900 kb/s, which is close to the 1870 kb/s reported by ffmpeg.

Formula: 1MB=8Mb=1024KB=8192Kb

Because the file size and duration are rounded, and the file also contains metadata and container overhead, this calculation only gives an approximate value.
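The rough estimate above can be reproduced with a few lines of Python. This is just the article's arithmetic written out; the function name estimate_bitrate_kbps is an assumption for the example, and it follows the 1 MB = 1024 KB = 8192 Kb convention used in the formula above.

```python
def estimate_bitrate_kbps(size_mb, duration_s):
    """Rough bit rate: file size in kilobits divided by duration in seconds."""
    size_kbit = size_mb * 1024 * 8  # 1 MB = 1024 KB = 8192 Kb
    return size_kbit / duration_s


# r1ori.mp4: about 13.7 MB, about 59 seconds.
estimate = estimate_bitrate_kbps(13.7, 59)
reported = 1870  # kb/s, from the Duration line printed by ffmpeg

print(round(estimate))
print(f"difference from ffmpeg's figure: {abs(estimate - reported) / reported:.1%}")
```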

Fixed rate and variable rate

In the early years, audio encoding used a constant bit rate (Constant Bitrate, CBR); variable bit rate (Variable Bitrate, VBR) came later. With CBR, the bit rate output by the encoder is fixed, so it is difficult to handle both "calm pictures" and "dramatic pictures" well. With VBR, the encoder can adapt: when there is more detail and the picture changes violently it uses more bits, and for relatively calm pictures it uses fewer bits. For a given output quality, VBR therefore has the advantage, and we give priority to variable bit rate for storage.
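The difference can be sketched with a toy model. Everything here is hypothetical: the complexity scores are invented, and real encoders use far more elaborate rate control than a proportional split. The point is only that both strategies spend the same total budget but distribute it differently.

```python
def allocate_cbr(total_bits, complexities):
    """CBR: every frame gets the same share, whatever its content."""
    share = total_bits / len(complexities)
    return [share for _ in complexities]


def allocate_vbr(total_bits, complexities):
    """Toy VBR: each frame's share is proportional to its complexity score."""
    total = sum(complexities)
    return [total_bits * c / total for c in complexities]


# Hypothetical scores: two calm frames, one dramatic frame, one in between.
complexities = [1, 1, 6, 2]
budget = 1000  # total bits to spend on these four frames

print(allocate_cbr(budget, complexities))
print(allocate_vbr(budget, complexities))
```

In this toy model the dramatic frame is starved under CBR while the calm frames receive more bits than they need; the VBR split moves those bits to where the detail is.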

Frame and frame rate

A frame refers to a picture.

Frame rate (frames per second, fps) is the number of frames output per second; you can also think of it as how many times the picture is refreshed per second.

Anyone who plays games knows this from experience: when the game lags, the picture jumps from frame to frame and the motion is far from smooth.

The frame rate affects the smoothness of the picture. The higher the frame rate, the smoother the picture.

Because of persistence of vision (after an image seen by the human eye disappears, the eye continues to retain it for about 1/24 of a second), the minimum frame rate required for ordinary film and video is 24 fps, which means each frame is shown for 1/24 ≈ 0.042 seconds.
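The relationship between frame rate and time per frame is simple enough to check in a few lines of Python. The helper names are made up for this sketch; the duration and frame rate of the sample file are taken from the ffmpeg output earlier in the article.

```python
def frame_duration(fps):
    """Seconds each frame stays on screen at the given frame rate."""
    return 1.0 / fps


def approx_frame_count(duration_s, fps):
    """Approximate number of frames in a clip of the given length."""
    return round(duration_s * fps)


print(round(frame_duration(24), 3))       # time per frame at 24 fps
print(approx_frame_count(58.53, 29.83))   # r1ori.mp4: 58.53 s at 29.83 fps
```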

Resolution

Resolution should be familiar to everyone: video websites commonly offer options such as Blu-ray 1080P, ultra-clear 720P, and high-definition 540P.

Resolution can be understood as the size of the video screen, that is, the width and height of the video. 720P means the height is 720 pixels.

Now that we understand bit rate and frame rate, we can see that a higher resolution does not necessarily mean a clearer video. What matters more is how bit rate, frame rate, and resolution are balanced against each other.

In general, we prefer videos that are both small and sharp: they are convenient to store, and they look good.
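One rough way to see this balance is to work out how many bits the encoder can spend on each pixel of each frame. This "bits per pixel" figure is a rule-of-thumb measure that is not part of the original discussion, and the function name is an assumption for this sketch. It treats ffmpeg's kb/s as 1000 bits per second.

```python
def bits_per_pixel(bitrate_kbps, width, height, fps):
    """Average number of bits available for each pixel of each frame."""
    return bitrate_kbps * 1000 / (width * height * fps)


# The video stream of r1ori.mp4: 1732 kb/s, 544x960, 29.83 fps.
print(round(bits_per_pixel(1732, 544, 960, 29.83), 3))

# The same bit rate stretched over a 1920x1080 picture at the same frame rate.
print(round(bits_per_pixel(1732, 1920, 1080, 29.83), 3))
```

At the same bit rate, the larger picture has roughly a quarter of the bits per pixel, which is why raising the resolution alone does not make a video clearer.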

Lossy and Lossless

First of all, what is the raw data of audio and video? Raw data refers to the data collected by audio and video equipment without any processing. The raw data of audio is in pcm format, and the raw data of video is in yuv format.

Lossy and lossless, meaning with or without loss, are terms from multimedia data compression. Lossy compression is also called destructive compression. That does not mean the data cannot be decompressed afterwards; it means some of the original information is discarded. Our common mp3 and mp4 files, for example, use lossy compression.

Take audio encoding as an example. The sound in an audio file comes from nature; we capture it with recording technology and then store it according to a certain algorithm.

At this stage, the sound we have stored cannot be completely restored to the sound of nature, so in that sense any audio encoding is lossy.

Some readers may ask: I read in an article that the raw audio data is in pcm format, so isn't pcm lossless?

In fact, pcm encoding only comes very close to lossless; it achieves the highest fidelity of the signal. For that reason, pcm encoding is treated as lossless by convention.

If I want to hear the most authentic sound from nature, why compress good audio at all?

  1. Raw data is too large to store.
  2. Even if it is stored, it is inconvenient to transmit and requires huge bandwidth.
  3. Video compression ratios are now very high, and the result, such as the 4K and 8K video that everyone is familiar with, seems to fully meet our needs.
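To get a feel for how large raw data is, the sketch below estimates the uncompressed bit rate of the sample file's two streams and compares it with the compressed figures reported by ffmpeg. Some assumptions are baked in: yuv420p stores 12 bits per pixel, the raw audio is taken to be 16-bit pcm (the article does not state a bit depth), and kb/s means 1000 bits per second. The function names are invented for the example.

```python
def raw_yuv420p_kbps(width, height, fps):
    """Bit rate of uncompressed yuv420p video (12 bits per pixel)."""
    return width * height * 12 * fps / 1000


def raw_pcm_kbps(sample_rate, channels, bits_per_sample):
    """Bit rate of uncompressed pcm audio."""
    return sample_rate * channels * bits_per_sample / 1000


raw_video = raw_yuv420p_kbps(544, 960, 29.83)  # compressed stream: 1732 kb/s
raw_audio = raw_pcm_kbps(44100, 2, 16)         # compressed stream: 129 kb/s

print(round(raw_video), "kb/s raw video, about", round(raw_video / 1732), "times the h264 stream")
print(round(raw_audio, 1), "kb/s raw audio, about", round(raw_audio / 129), "times the aac stream")
```

Even for this small 544x960 clip, the raw video would need on the order of a hundred times the bandwidth of the compressed stream.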

Multiplexers and Demultiplexers

Two operations come up constantly when working with containers (note that these apply to the container, not to the encoded data inside it).

Taking the audio and video data out of the container is called decapsulation (demuxing); it is done by the demuxer, also known as the demultiplexer.

Packing the processed audio and video data into a container is called encapsulation (muxing); it is done by the muxer, also known as the multiplexer.
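As an illustration, the following Python sketch builds ffmpeg commands that split a file into a video-only and an audio-only file and then pack them back together, all without re-encoding. The helper names and the output file names are hypothetical; the options used are -an (drop audio), -vn (drop video), and -c copy (copy the streams as they are, so only the container work is done).

```python
def demux_cmds(src, video_out, audio_out):
    """Commands that copy the video and the audio stream into separate files."""
    return [
        ["ffmpeg", "-i", src, "-an", "-c:v", "copy", video_out],  # video only
        ["ffmpeg", "-i", src, "-vn", "-c:a", "copy", audio_out],  # audio only
    ]


def mux_cmd(video_in, audio_in, dst):
    """Command that packs one video file and one audio file into a container."""
    return ["ffmpeg", "-i", video_in, "-i", audio_in, "-c", "copy", dst]


# Hypothetical file names, printed as shell-style command lines.
for cmd in demux_cmds("r1ori.mp4", "video_only.mp4", "audio_only.m4a"):
    print(" ".join(cmd))
print(" ".join(mux_cmd("video_only.mp4", "audio_only.m4a", "remuxed.mp4")))
```

Because the streams are copied rather than decoded and encoded again, these operations are fast and do not change the quality.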

We will keep adding audio and video concepts to this article. If you find any concept difficult to understand, leave me a message and I will collect the questions and add explanations.

Origin blog.csdn.net/yinshipin007/article/details/126394743