Audio and video development - FFmpeg

  Audio and video development is a large and complex topic. This post only scratches the surface, combined with a look at the new examples in OEIP (an open source project).

  

    It can open FLV and MP4 files as well as RTMP streams, decode the audio and video data, and play the audio with SDL.

  

    It can capture microphone / loopback (mixed) audio together with video, and write them to a media file or push them out over the RTMP protocol.

Main image attributes

  An image is described by its width / height / channel count / pixel format (U8 / U16 / F32), plus an arrangement format such as RGBA or YUV. Channel count and pixel format work as in OpenCV: CV_8UC1 / CV_8UC4 mean one-channel / four-channel U8. The arrangement formats fall roughly into two families. The RGBA family (RGBA, BGRA, BGR, R and so on) is generally used for game textures; RGBA / BGR / R describe the channel layout, and suffixed names such as RGBA32 or RGBAF32 also encode the per-channel data format. The YUV family is generally used by media: capture devices, audio/video files, streaming and the like. YUV formats are classified in two parts. The first is how many pixels share one U/V pair, as in YUV420, YUV422 and YUV444: all of them have one Y per pixel, but in YUV420 one U/V pair covers four pixels, in YUV422 one U/V pair covers two pixels, and in YUV444 every pixel has its own U/V. The per-channel data format of YUV is generally U8.

  We can compare how many bytes some common formats occupy, for example at 1080P.

  • RGBA32: 1920 * 1080 * 4 = 8,294,400, since each pixel has four channels (R, G, B, A).
  • YUV420: 1920 * 1080 + 960 * 540 * 2 = 3,110,400; the first term is the Y plane, the second the U and V planes.
  • YUV422: 1920 * 1080 + 960 * 1080 * 2 = 4,147,200; the first term is the Y plane, the second the U and V planes.
  • YUV444: 1920 * 1080 * 3 = 6,220,800; every pixel has all three YUV channels.

  The second classification is the memory arrangement. YUV420, for example, can be subdivided into YUV420I, YUV420P and YUV420SP (NV12). Simply put, the P suffix means the three planes Y, U and V are stored separately; SP means Y is stored separately while U and V are interleaved; and I means Y, U and V are all interleaved, similar to the RGBA arrangement.
YUV420P / YUV422P are generally used in video files and for pushing/pulling streams. I think this is because Y (luma) carries the part the human eye is most sensitive to; early black-and-white TV transmitted only Y and everyone could still watch, so this layout is the most convenient for compatibility: the U/V planes can simply be dropped, and the Y / U / V planes do not have to be contiguous in memory. YUV420I / YUV422I (interleaved) are generally produced by capture devices; the data is contiguous in memory and convenient to process. YUV420SP (NV12) is also common on capture devices. This format is simple: first, there is only one Y plane; second, the interleaved UV plane has the same width as Y and is appended right after it, so the whole YUV image is one contiguous block of memory.
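
As a quick illustration, here is a minimal sketch (not OEIP code) of how the plane pointers fall out for a tightly packed YUV420P buffer versus NV12, assuming an even width/height and no row padding:

```c
#include <stdint.h>

/* Minimal sketch: plane offsets for tightly packed (no row padding) buffers.
   Assumes width and height are even. */
typedef struct {
    uint8_t *y, *u, *v;   /* v is NULL for NV12, where U and V are interleaved */
} YuvPlanes;

/* YUV420P: Y plane, then U plane (w/2 x h/2), then V plane (w/2 x h/2). */
static YuvPlanes map_yuv420p(uint8_t *buf, int w, int h) {
    YuvPlanes p;
    p.y = buf;
    p.u = buf + w * h;
    p.v = buf + w * h + (w / 2) * (h / 2);
    return p;
}

/* YUV420SP (NV12): Y plane, then one interleaved UV plane of width w, height h/2. */
static YuvPlanes map_nv12(uint8_t *buf, int w, int h) {
    YuvPlanes p;
    p.y = buf;
    p.u = buf + w * h;    /* UVUVUV..., same row width as Y */
    p.v = 0;
    return p;
}
```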

Basic audio properties

  Sampling rate: the highest frequency the human ear can hear is roughly 20-22 kHz. To fully reproduce a frequency, the sampling rate must be at least twice that frequency, which is why 44100 Hz is the common sampling rate.

  Channel count: mono and stereo are the common cases; configurations with more channels exist but are less usual.

  Data format: generally S16 (16-bit signed integer) or F32 (32-bit floating point). U8 and F64 (double) are rarely used: the dynamic range U8 can represent is too small, while F64 is more than needed.

  Like images, multi-channel audio can be arranged as P (planar) or I (interleaved). Audio is typically captured interleaved and transmitted/stored planar. In the OEIP project everything is converted to a suitable format, generally mono S16, so there is no need to worry about P versus I.
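
For illustration, a minimal sketch (not the actual OEIP code) of the kind of conversion meant here, downmixing interleaved stereo S16 to mono S16 by averaging the two channels:

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal sketch: downmix interleaved stereo S16 samples (L R L R ...)
   to mono S16 by averaging the two channels. */
static void stereo_s16_to_mono(const int16_t *in, int16_t *out, size_t frames) {
    for (size_t i = 0; i < frames; ++i) {
        int32_t mixed = ((int32_t)in[2 * i] + (int32_t)in[2 * i + 1]) / 2;
        out[i] = (int16_t)mixed;
    }
}
```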

Basic concepts of audio and video development

  With the basics above, let's go over some concepts (these are my own rough understandings; if anything is wrong, feel free to point it out).

  Encoding: compressing the raw audio/video data. Simply put, one 1080P YUV420 frame is 1920 * 1080 * 3/2 = 3,110,400 bytes, so one second at 25 frames is almost 78 MB. Video goes YUV -> H264 and audio goes PCM -> AAC; that process is encoding. In FFmpeg, the corresponding flow is AVFrame -> AVPacket.

  Decoding: converting the compressed audio/video data back into raw audio/video, i.e. H264 -> YUV and AAC -> PCM. In FFmpeg this is AVPacket -> AVFrame.

  In encoding and decoding, the raw size of each video frame is determined by the image width and height together with the pixel format and arrangement. For audio, the number of samples per frame is fixed for a given codec, e.g. 1024 for AAC (with a special 2048 case) and 1152 for MP3; the corresponding fields are AVFrame.nb_samples and AVCodecContext.frame_size. So for two-channel S16 AAC, the data per frame is 2 * sizeof(S16) * 1024 bytes. The bit rate determines the encoding quality: in general, a higher bit rate gives better quality but a larger file or more network traffic. For 1080P online presentation a bit rate of 4M is typical; 1M also works, but the picture may smear when there is motion. There are also different rate-control strategies; choose one based on your needs. This part is explained in detail elsewhere on the internet.
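
A tiny check of the numbers quoted above (the function and its print output are just for illustration):

```c
#include <stdio.h>
#include <stdint.h>

/* Quick check of the figures quoted above. */
int main(void) {
    long yuv420_frame = 1920 * 1080 * 3 / 2;               /* 3,110,400 bytes per 1080P YUV420 frame */
    long per_second   = yuv420_frame * 25;                 /* 77,760,000 bytes, ~78 MB of raw video per second at 25 fps */
    long aac_frame    = 2 * (long)sizeof(int16_t) * 1024;  /* two-channel S16, 1024 samples: 4096 bytes per AAC frame */
    printf("%ld %ld %ld\n", yuv420_frame, per_second, aac_frame);
    return 0;
}
```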

  Media file / container: formats such as FLV and MP4 store the encoded data in different ways, and different containers support different encoding formats. Most containers support H264 / AAC, which is why these two encodings are the most commonly used.

  Streaming protocol: RTMP, RTSP and the like; they package media data plus control information for transmission over the network.

  Audio and video streams: streams are divided into audio streams, video streams, subtitle streams, etc. A media file can contain one or more streams. Every frame within a video stream has the same attributes (width, height, pixel format, etc.); the stream is the encoded form of the raw video.

  Muxing (multiplexing): combining, for example, an audio stream and a video stream into one media file.

  Demuxing (demultiplexing): the reverse, splitting a media file into its audio and video streams.

Main FFmpeg objects

  AVFormatContext: represents a streaming protocol or a media file; for a protocol, it resolves the media information carried over that protocol. Its main job is reading/writing files, including reading/writing the file header and trailer. You can think of one AVFormatContext as one media file.

  AVCodec: the codec. Note that the encoder and the decoder for the same codec ID are different objects. It mainly contains function pointers that describe how to do frame -> packet / packet -> frame.

  AVCodecContext: the codec context. In simple terms, AVCodec says how to encode/decode, and AVCodecContext tells it the relevant attribute settings: for video, the width and height, whether the encoding uses B frames, the GOP size, and so on. If we pretend AVCodec and AVCodecContext together form a class, AVCodec is the set of methods and AVCodecContext the set of member variables.

  AVStream: a media file generally contains at least one audio or video stream; AVStream is the link between muxing/demuxing and the codec. You can think of AVStream as holding the coded audio/video stream information. It also carries codec parameters corresponding to AVCodecContext; which side is copied from which differs between muxing and demuxing, as discussed below.

  AVFrame: raw audio/video data; each frame has a fixed size.

  AVPacket: encoded audio/video data; packets have variable size.
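
To make the raw-versus-encoded distinction concrete, here is a minimal sketch of allocating and freeing these two objects with the standard FFmpeg calls; the width/height/format values are just example settings:

```c
#include <libavcodec/avcodec.h>
#include <libavutil/frame.h>
#include <libavutil/pixfmt.h>

/* Sketch: an AVFrame holds raw data whose size is fixed by its format,
   an AVPacket holds encoded data of variable size. */
static int frame_packet_example(void) {
    AVFrame  *frame = av_frame_alloc();
    AVPacket *pkt   = av_packet_alloc();
    if (!frame || !pkt)
        return -1;

    frame->format = AV_PIX_FMT_YUV420P;   /* example settings for a 1080P video frame */
    frame->width  = 1920;
    frame->height = 1080;
    av_frame_get_buffer(frame, 0);        /* allocates the Y/U/V planes */

    /* ... hand frame / pkt to the codec APIs described below ... */

    av_frame_free(&frame);
    av_packet_free(&pkt);
    return 0;
}
```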

Analysis of common FFmpeg APIs

  APIs for reading and parsing a media file:

  avformat_open_input: open an AVFormatContext from a media file path or protocol URL.

  avformat_find_stream_info: find the audio and video streams, and their indexes, in the AVFormatContext.

  avcodec_find_decoder: find the decoder corresponding to the audio/video stream located by its index.

  avcodec_alloc_context3: allocate a decoder context from the decoder.

  avcodec_parameters_to_context: copy the stream's codec parameters (image width/height, basic audio properties such as frame_size) into the decoder context.

  avcodec_open2: open the decoder context.

  av_read_frame: read each AVPacket from the media file's AVFormatContext.

  avcodec_send_packet: send the AVPacket to the decoder of the stream indicated by its index.

  avcodec_receive_frame: obtain the raw data produced by the decoder. In a video stream, because of B frames and P frames, one AVPacket does not necessarily yield one AVFrame: a P frame may depend on frames before and after it, so it may take several packets before frames come out, and then several frames may come out at once. That is why the API is split into the avcodec_send_packet / avcodec_receive_frame pair.
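
Putting the read path together, here is a minimal sketch of the flow described above for a video stream; error handling is reduced to early returns and the function name is just for illustration:

```c
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>

/* Minimal sketch of the read path: open a file/URL, find the video stream,
   open its decoder, then drive the send_packet / receive_frame loop. */
static int decode_video(const char *url) {
    AVFormatContext *fmt = NULL;
    if (avformat_open_input(&fmt, url, NULL, NULL) < 0) return -1;
    if (avformat_find_stream_info(fmt, NULL) < 0) return -1;

    int vindex = -1;
    for (unsigned i = 0; i < fmt->nb_streams; i++)
        if (fmt->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO)
            vindex = (int)i;
    if (vindex < 0) return -1;

    const AVCodec *dec = avcodec_find_decoder(fmt->streams[vindex]->codecpar->codec_id);
    AVCodecContext *ctx = avcodec_alloc_context3(dec);
    avcodec_parameters_to_context(ctx, fmt->streams[vindex]->codecpar);
    if (avcodec_open2(ctx, dec, NULL) < 0) return -1;

    AVPacket *pkt  = av_packet_alloc();
    AVFrame *frame = av_frame_alloc();
    while (av_read_frame(fmt, pkt) >= 0) {
        if (pkt->stream_index == vindex) {
            avcodec_send_packet(ctx, pkt);
            /* one packet may yield zero, one, or several frames */
            while (avcodec_receive_frame(ctx, frame) == 0) {
                /* for a planar YUV frame, frame->data[0..2] hold the Y/U/V planes */
            }
        }
        av_packet_unref(pkt);
    }
    av_frame_free(&frame);
    av_packet_free(&pkt);
    avcodec_free_context(&ctx);
    avformat_close_input(&fmt);
    return 0;
}
```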

  The corresponding APIs for writing a media file (the non-custom-IO case):

  avformat_alloc_output_context2: allocate an AVFormatContext for the given output format. Some values are fixed per format: in FLV, for example, the time base of both the audio and video streams is milliseconds; I tried changing this value, and avformat_write_header simply changed it back.

  avcodec_find_encoder / avcodec_find_encoder_by_name: choose the desired encoder.

  avcodec_alloc_context3: allocate an encoder context from the chosen encoder. Unlike the decoding flow above, here we must fill in the relevant information ourselves: the encoder needs to know the image width and height, the bit rate, the GOP size and other settings.

  avcodec_open2: open the encoder context.

  avformat_new_stream: create the corresponding audio/video stream on the AVFormatContext for each encoder.

  avcodec_parameters_from_context: copy the parameters set on the encoder context to the stream.

  avio_open: open the IO context for the protocol/URL, which handles how the protocol header and content are read and written.

  avformat_write_header: write the file header.

  avcodec_send_frame: send raw data to the encoder for compression.

  avcodec_receive_packet: get the encoded data. As with decoding, because of P/B frames one frame does not necessarily produce one packet; it may take several frames before a series of packets can be obtained.

  av_interleaved_write_frame: write the encoded audio and video packets into the media file, interleaved.

  av_write_trailer: finish writing; this fills in the values that can only be calculated after all data has been written.
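
And a minimal sketch of the write path described above, video only, with the per-frame encode loop shown as comments since it needs a frame source; the H264/FLV choice and the numeric settings are just example values:

```c
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>

/* Minimal sketch of the write path (many checks omitted): create an output
   context, configure an encoder, create the stream, then write header,
   packets and trailer. */
static int write_video(const char *url, int w, int h, int fps) {
    AVFormatContext *fmt = NULL;
    avformat_alloc_output_context2(&fmt, NULL, "flv", url);

    const AVCodec *enc = avcodec_find_encoder(AV_CODEC_ID_H264);
    AVCodecContext *ctx = avcodec_alloc_context3(enc);
    ctx->width     = w;
    ctx->height    = h;
    ctx->pix_fmt   = AV_PIX_FMT_YUV420P;
    ctx->time_base = (AVRational){1, fps};
    ctx->bit_rate  = 4000000;            /* ~4M for 1080P, as discussed above */
    ctx->gop_size  = fps * 2;            /* example GOP setting */
    if (avcodec_open2(ctx, enc, NULL) < 0) return -1;

    AVStream *st = avformat_new_stream(fmt, NULL);
    avcodec_parameters_from_context(st->codecpar, ctx);

    if (!(fmt->oformat->flags & AVFMT_NOFILE))
        avio_open(&fmt->pb, url, AVIO_FLAG_WRITE);
    avformat_write_header(fmt, NULL);    /* may adjust st->time_base (e.g. 1/1000 for FLV) */

    AVPacket *pkt = av_packet_alloc();
    /* for each raw AVFrame 'frame' with frame->pts in ctx->time_base units:
         avcodec_send_frame(ctx, frame);
         while (avcodec_receive_packet(ctx, pkt) == 0) {
             av_packet_rescale_ts(pkt, ctx->time_base, st->time_base);
             pkt->stream_index = st->index;
             av_interleaved_write_frame(fmt, pkt);
         }                                                                    */

    av_write_trailer(fmt);
    av_packet_free(&pkt);
    avcodec_free_context(&ctx);
    if (!(fmt->oformat->flags & AVFMT_NOFILE))
        avio_closep(&fmt->pb);
    avformat_free_context(fmt);
    return 0;
}
```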

  There is also the case of using custom IO: image/audio key frame data can be written out directly through the IO, and when reading, the attributes of the corresponding audio/video frames are taken directly from the stream itself, which is useful for video streams whose width, height and other attributes are not fixed in advance.

  The difference between reading and writing can be seen from the APIs. When reading, the AVFormatContext already contains all the information inside the media file: obtain the decoding information from each stream, open the decoders, read each packet from the AVFormatContext, and hand it to the matching decoder to decode. When writing, first create a blank AVFormatContext, then open the chosen encoders and create the streams, then write the encoded data frame by frame to the streams, and finally finalize the file.

  A related pitfall: media files generally use planar YUV formats for images, and the YUV planes are stored with an alignment concept. For example, if the width is 1080 but the alignment is 32, the stored row width of the Y plane may be 1088, with indexes 1080-1087 of every row zero-padded. All of the image processing in OEIP runs on the GPU and the whole pipeline needs tightly packed data, so av_image_copy_to_buffer / av_image_fill_arrays must be used to handle this.
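
A minimal sketch of copying a possibly padded decoded frame into a tightly packed buffer with the av_image_copy_to_buffer call mentioned above (the helper name is just for illustration):

```c
#include <libavutil/imgutils.h>
#include <libavutil/frame.h>

/* Sketch: copy a decoded frame whose linesize may be padded (e.g. 1088 for a
   width of 1080) into a tightly packed buffer with no row padding.
   av_image_fill_arrays does the reverse, mapping a packed buffer back to planes. */
static int frame_to_packed(const AVFrame *frame, uint8_t *dst, int dst_size) {
    return av_image_copy_to_buffer(dst, dst_size,
                                   (const uint8_t * const *)frame->data,
                                   frame->linesize,
                                   (enum AVPixelFormat)frame->format,
                                   frame->width, frame->height,
                                   1 /* alignment 1 = tightly packed */);
}
```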

  Another planar-format pitfall: media files mostly use planar multi-channel audio, while audio capture and playback devices generally use interleaved formats, so swr_convert is generally used for the conversion. Audio playback with the SDL library only takes a few API calls, so I will not cover it here; you can look at how it is handled in the OEIP code. Audio capture on Windows uses WASAPI.
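
For illustration, a minimal sketch of setting up swr_convert for one common case, planar float stereo (typical decoded AAC) to interleaved S16; note this uses the pre-FFmpeg-5.1 channel layout API, while newer versions use AVChannelLayout with swr_alloc_set_opts2:

```c
#include <libswresample/swresample.h>
#include <libavutil/channel_layout.h>
#include <libavutil/samplefmt.h>

/* Sketch: convert planar float stereo to interleaved S16 stereo for a
   playback device. Pre-5.1 channel layout API. */
static SwrContext *make_resampler(int sample_rate) {
    SwrContext *swr = swr_alloc_set_opts(NULL,
        AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16,  sample_rate,   /* output: interleaved S16 */
        AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_FLTP, sample_rate,   /* input: planar float */
        0, NULL);
    if (!swr || swr_init(swr) < 0)
        return NULL;
    return swr;
}

/* Then, per decoded frame:
   swr_convert(swr, out_planes, out_nb_samples,
               (const uint8_t **)frame->extended_data, frame->nb_samples); */
```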

  The time base: every audio/video stream has a time base, and it matters. In FLV, both audio and video use (1, 1000); in MP4, the video time base is (1, 90000) and the audio time base is generally the sampling rate. The time base can be understood simply as the length of one tick within a second: for an FLV stream a tick is one millisecond, while for an MP4 video stream a tick is 1/90000 of a second. What does this mean in practice? If your video is 25 fps, then in FLV consecutive frames are 40 ticks apart, while in MP4 they are 3600 ticks apart. When encoding, the pts / dts / duration on frames must be expressed in units of the corresponding time base. Note that in OEIP all times exposed to the user are in milliseconds, and the conversion is handled internally.
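
A small sketch of the same 40 ms frame interval expressed in the FLV and MP4 video time bases with av_rescale_q:

```c
#include <libavutil/mathematics.h>
#include <libavutil/rational.h>

/* Sketch: convert a 25 fps frame time from milliseconds (the unit OEIP exposes
   to the user) to the FLV (1/1000) and MP4 video (1/90000) time bases. */
static void timebase_example(void) {
    AVRational ms   = {1, 1000};
    AVRational flv  = {1, 1000};
    AVRational mp4v = {1, 90000};

    int64_t pts_ms  = 40;                              /* one frame at 25 fps = 40 ms */
    int64_t pts_flv = av_rescale_q(pts_ms, ms, flv);   /* 40 ticks in FLV */
    int64_t pts_mp4 = av_rescale_q(pts_ms, ms, mp4v);  /* 3600 ticks in MP4 video */
    (void)pts_flv; (void)pts_mp4;
}
```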

  With this, OEIP now offers media files / protocols as available input and output sources. They can likewise be used easily in Unity3D / UE4: for example, the content of a media file / stream can be rendered directly to a Texture2D in Unity3D / UE4 for display, or Texture2D / RTT data from Unity3D / UE4 can be saved to a video file or pushed out as a stream.

  Reference:
  https://www.cnblogs.com/leisure_chn/category/1351812.html (leisure_chn's FFmpeg development series)
