Audio and Video Learning Path D1 (Principles of Audio and Video Recording)

I would like to use this series of posts to record my personal audio and video learning journey. If it helps you, I will be honored.

The overall framework of audio and video recording:


Camera:

The data collected by the camera can be represented in RGB or YUV. RGB is easy to understand: the three primary colors red, green, and blue. YUV will be covered later.

Image frame:

If the camera captures at, say, 25 frames per second, that means 25 still images are produced every second.
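
To make the numbers concrete, the frame-rate arithmetic can be sketched in a few lines (the values are illustrative):

```python
# At 25 frames per second, each frame covers 1/25 of a second.
fps = 25
frame_interval_ms = 1000 / fps   # 40.0 ms between consecutive frames
frames_in_one_minute = fps * 60  # 1500 frames per minute

print(frame_interval_ms)     # 40.0
print(frames_in_one_minute)  # 1500
```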

Image Processing:

What is this module for? For example, if the captured image is too dark, we can brighten it a bit; that kind of adjustment happens in this module.
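
A brightening pass like the one described can be sketched as below. The `brighten` helper and the list-of-tuples frame layout are toy assumptions for illustration, not a real imaging API (production code would use numpy arrays or a GPU shader):

```python
def brighten(frame, delta):
    """Add `delta` to every RGB component, clamping each to [0, 255].

    `frame` is a list of (r, g, b) tuples -- a toy stand-in for real
    pixel data.
    """
    return [tuple(min(255, max(0, c + delta)) for c in px) for px in frame]

frame = [(100, 150, 200), (250, 10, 0)]
print(brighten(frame, 30))  # [(130, 180, 230), (255, 40, 30)]
```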

Image frame queue:

That is to say, the captured frames are first placed into an image frame queue, where they wait for the encoding thread to fetch them and encode them.
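
A minimal sketch of that capture/encode handoff, using Python's standard `queue` and `threading` modules. The `capture`/`encode` functions and the string "frames" are illustrative stand-ins for real capture and codec code:

```python
import queue
import threading

frame_queue = queue.Queue(maxsize=8)  # bounded: capture can't outrun encoding

def capture(n_frames):
    """Producer: the capture side pushes frames into the queue."""
    for i in range(n_frames):
        frame_queue.put(f"frame-{i}")  # blocks if the encoder falls behind
    frame_queue.put(None)              # sentinel: capture finished

def encode(out):
    """Consumer: the encoding thread drains the queue."""
    while True:
        frame = frame_queue.get()
        if frame is None:
            break
        out.append(f"encoded({frame})")

encoded = []
t = threading.Thread(target=capture, args=(3,))
t.start()
encode(encoded)
t.join()
print(encoded)  # ['encoded(frame-0)', 'encoded(frame-1)', 'encoded(frame-2)']
```

The bounded queue is the important design choice: it applies back-pressure, so a slow encoder makes the capture side wait instead of letting memory grow without limit.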

Video encoding:

Why encode the video? Because without encoding, the video would be very, very large. For example, a single downloaded 1080p picture might be around 3 MB, and a video file contains many such pictures, so an uncompressed video would consume an enormous amount of space. This module is therefore necessary. The compressed data is then written into a file according to a certain format.
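
A quick back-of-the-envelope calculation shows why compression is necessary, assuming 1080p at 25 fps and 3 bytes per pixel (uncompressed 8-bit RGB):

```python
# Uncompressed size of one second of 1080p video at 25 fps.
width, height, fps = 1920, 1080, 25
bytes_per_pixel = 3  # 8-bit R, G, B

frame_bytes = width * height * bytes_per_pixel  # one raw frame
second_bytes = frame_bytes * fps                # one second of raw frames

print(frame_bytes)   # 6220800  (~6 MB per frame)
print(second_bytes)  # 155520000 (~150 MB per second)
```

Roughly 150 MB for a single second of raw video is why every practical recorder compresses before writing to disk.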

---

Microphone:

The data collected by the microphone is represented as PCM (pulse-code modulation) samples.

Sample frame:

A so-called sampling frame groups N sample points together so they can be compressed as one frame of audio data. (If that is hard to picture, compare it with video: an ordinary 1080p recording fuses 1920*1080 pixels into one frame of picture data.)

How much data should be collected before encoding and compressing? A common sample rate is 44.1 kHz, meaning 44,100 sample points are collected in one second. So how many sample points should make up one frame for compression? You could compress a whole second of data as one frame, but the latency would be relatively large, so frames are made smaller; for example, twenty-odd milliseconds of data can be compressed as one frame.
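
The trade-off between frame size and latency can be checked numerically; the frame sizes below are just example values at a 44.1 kHz sample rate:

```python
# How long one audio frame lasts for different frame sizes at 44.1 kHz.
sample_rate = 44100  # samples per second

for samples_per_frame in (441, 1024, 4410, 44100):
    duration_ms = samples_per_frame / sample_rate * 1000
    print(samples_per_frame, "samples ->", round(duration_ms, 1), "ms")
```

A 1024-sample frame lasts about 23 ms, which matches the "twenty-odd milliseconds" rule of thumb above, while a 44,100-sample frame would add a full second of latency.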

When I cover PCM later, I will discuss how many sample points per frame is most reasonable; for now, just understand the idea.

Audio processing:

Sometimes operations are applied to the sound here, such as voice changing and similar effects.

Sample frame queue:

This is similar to the image frame queue: the collected data is placed into the queue first, and the encoding thread reads it from the queue and compresses it.

---

Multiplexer:

When our audio and video data are written into a file, they are written according to certain rules, so that during playback the file can be parsed by the reverse rules and the audio and video data extracted and played.
These rules are the media container formats, such as MP4 and AVI.
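
The interleaving a muxer performs can be sketched with a toy example: two packet lists of `(stream, timestamp_ms)` tuples merged into one timestamp-ordered stream. The tuples are illustrative stand-ins, not a real container API:

```python
import heapq

# Toy "muxer": interleave audio and video packets into one stream,
# ordered by timestamp, the way a container writes them to a file.
video = [("video", 0), ("video", 40), ("video", 80)]
audio = [("audio", 0), ("audio", 23), ("audio", 46), ("audio", 70)]

muxed = list(heapq.merge(video, audio, key=lambda pkt: pkt[1]))
print(muxed)
```

Real muxers do much more (headers, indexes, codec metadata), but the core job is exactly this timestamp-ordered interleaving.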

Clock:

During playback, audio and video must stay synchronized. Without clock information, some operations (such as seeking during playback) can leave the picture and sound out of sync: the video might be playing the data of the first second while the audio plays the data of the tenth second. So this clock must be added at recording time (audio and video share the same clock while recording), and every captured frame is stamped with a corresponding timestamp.
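
A minimal sketch of the shared-clock idea: one clock object stamps both audio and video frames so their timestamps lie on the same timeline. `RecordingClock` and `stamp` are hypothetical names for illustration, not a real API:

```python
import time

class RecordingClock:
    """One monotonic clock shared by the audio and video capture paths."""
    def __init__(self):
        self.start = time.monotonic()

    def now_ms(self):
        # Milliseconds elapsed since recording began.
        return (time.monotonic() - self.start) * 1000

def stamp(frame, clock):
    """Attach a presentation timestamp from the shared clock."""
    return {"data": frame, "pts_ms": clock.now_ms()}

clock = RecordingClock()  # ONE clock for both streams

video_packet = stamp("video-frame-0", clock)
audio_packet = stamp("audio-frame-0", clock)
# Both packets carry timestamps on the same timeline, so after a seek
# the player can realign picture and sound by comparing pts_ms values.
```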


Origin blog.csdn.net/qq_25704799/article/details/130716075