My latest insights on learning audio and video development

Foreword:

Hello everyone. It has been a long time since I wrote an original article. This one shares an introductory learning route for audio and video development. I have written about this before, but this time I am summarizing what I have learned from recent contact with people at different levels. The focus is still on getting started, and the technical points covered are all ones that companies currently expect you to have mastered!

Let me say in advance that the article inevitably has shortcomings, and I hope everyone will point them out!

1. Is the threshold for learning audio and video high?

Undoubtedly, the audio and video industry has technical barriers, and learning materials are scarce. There are very few audio and video books on the market; if you don't believe me, search the various websites yourself.

Not only books: even the available tools are few.

In short, the learning materials are very unfriendly to newcomers!

2. How to learn embedded audio and video?

I personally think that you first need a basic grasp of audio and video theory, for example:

  • 1. What are pixels and resolution?

  • 2. What is frame rate and bit rate?

  • 3. What are RGB, YUV, and PCM?

These are only a few examples. They come up as practical engineering issues at work, so I won't expand on them here; just know that they are basic knowledge points you must master.
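To show how these concepts connect, here is a minimal sketch of my own (the function names are hypothetical, not from any SDK). It computes the size of one uncompressed frame for two common pixel layouts, plus the raw data rates of video and PCM audio. It assumes 8 bits per component and even width and height for YUV 4:2:0.

```python
def raw_frame_bytes(width: int, height: int, pixel_format: str) -> int:
    """Size in bytes of one uncompressed frame (8 bits per component)."""
    pixels = width * height  # resolution = width x height pixels
    if pixel_format == "rgb24":
        # Every pixel stores R, G and B: 3 bytes per pixel.
        return pixels * 3
    if pixel_format == "yuv420p":
        # Full-resolution Y plane plus U and V planes at quarter resolution.
        # Assumes width and height are both even.
        return pixels * 3 // 2
    raise ValueError(f"unsupported pixel format: {pixel_format}")


def raw_video_bitrate(width: int, height: int, fps: int, pixel_format: str) -> int:
    """Bits per second of uncompressed video: frame size x 8 x frame rate."""
    return raw_frame_bytes(width, height, pixel_format) * 8 * fps


def pcm_bitrate(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate * bits_per_sample * channels
```

For a 1920x1080 frame this gives 6,220,800 bytes in RGB24 and 3,110,400 bytes in YUV 4:2:0, and 48 kHz 16-bit stereo PCM comes to 1,536,000 bit/s. Numbers like these are why raw data has to be encoded before it is stored or transmitted.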

Theory alone is definitely not enough, though. In embedded work especially we have to deal with hardware, so you must get a development board. Of the chip platforms on the market, the ones I personally recommend are:

  • 1. Rockchip

  • 2. HiSilicon

Relatively speaking, HiSilicon has more learning materials, but Rockchip's AI examples are good and come with many recognition algorithm models!

In this article, I will use the Rockchip development board I have as an example to show you how to learn.

First of all, we must read the official documents that come with the SDK:

[Images: the official SDK documentation]

Because our focus is on audio and video, we look at the multimedia framework first. Each chip platform ships its own packaged multimedia framework (MPP for short):

[Image: the MPP multimedia framework]

A closer look at the MPP platform:

[Image: the MPP platform in detail]

Using the manual, you can summarize the process yourself: encoding, audio and video processing, decoding!

The manual describes a great many interfaces, and you don't need to read all of them. At this point you can study them alongside the official MPP source code, but what I recommend is compiling the official project demos, flashing them onto the development board, and looking at the actual result:

[Image: the official encoding demo]

Take the encoding demo above as an example. It can also send the encoded bitstream to a client over the RTSP protocol, so you can pull the stream with VLC or ffmpeg and watch the pictures captured by the camera in real time. Then study the code carefully, and you will quickly grasp the encoding process. Read the code together with the manual mentioned earlier: every interface can be found there, which is why I said there is no need to memorize these interfaces by rote. It's not necessary at all!

While working through this encoding demo, you can modify some parameters, such as the bit rate or frame rate mentioned above, and then check whether the picture changes. Practical experiments like this let you master the basic theory quickly, without memorizing it by rote!
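Before running such an experiment, it can help to estimate what a parameter change should do. The sketch below is my own illustration (hypothetical function names, not the SDK's API). It assumes a constant target bit rate and YUV 4:2:0 input, and computes the average number of bytes the encoder can spend per frame and the resulting compression ratio.

```python
def average_frame_bytes(bitrate_bps: int, fps: float) -> float:
    """Average encoded bytes available per frame at a given bit rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return bitrate_bps / 8 / fps


def compression_ratio(width: int, height: int, fps: float, bitrate_bps: int) -> float:
    """Raw YUV 4:2:0 data rate divided by the encoded bit rate."""
    if bitrate_bps <= 0:
        raise ValueError("bitrate must be positive")
    raw_bits_per_frame = width * height * 3 // 2 * 8  # 1.5 bytes per pixel
    return raw_bits_per_frame * fps / bitrate_bps
```

At 2 Mbit/s and 25 fps, each frame gets 10,000 bytes on average. Lowering the bit rate while keeping the frame rate shrinks that budget, which is one reason the picture can get blurrier; lowering the frame rate at the same bit rate enlarges it.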

Going a little further, you can adapt different sensors, which involves a little driver knowledge. Once you have adapted one sensor, you will basically have the experience for the next. In day-to-day development we also often run into the following functional requirements:

  • OSD overlay display

  • motion detection

  • smart encoding

  • capture

  • zoom display

  • privacy mask

The vendor provides official examples for all of these. We can learn from the examples and then adapt them to the business requirements in actual development.

3. What other audio and video knowledge points need to be mastered?

1. Streaming media protocol:

In interviews and in actual work, the streaming media transport protocols I have come across most are:

  • RTSP

  • RTMP

So how do we learn these protocols?

I believe most people start with theory, that is, by reading some blogs. From my personal point of view there is nothing wrong with this. But I also believe that most beginners are still confused after reading, or still haven't grasped what all unfamiliar transport protocols have in common:

  • That is, protocol header + protocol body
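To make the header + body idea concrete, here is a minimal sketch that splits an RTSP message into its start line, header fields, and body. RTSP is text based: the header block ends with an empty line, and Content-Length gives the body size. The function name and the sample response are my own illustration, not taken from a real capture.

```python
from typing import Dict, Tuple


def parse_rtsp_message(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split an RTSP message into (start line, headers, body)."""
    head, separator, rest = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("header block is incomplete (no empty line found)")
    lines = head.decode("utf-8").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    # The body length comes from the header; no Content-Length means no body.
    length = int(headers.get("content-length", "0"))
    return start_line, headers, rest[:length]


# Hand-written sample: a DESCRIBE response carrying a tiny SDP body.
SAMPLE_SDP = b"v=0\r\nm=video 0 RTP/AVP 96\r\na=rtpmap:96 H264/90000\r\n"
SAMPLE_RESPONSE = (
    b"RTSP/1.0 200 OK\r\n"
    b"CSeq: 2\r\n"
    b"Content-Type: application/sdp\r\n"
    b"Content-Length: " + str(len(SAMPLE_SDP)).encode("ascii") + b"\r\n"
    b"\r\n" + SAMPLE_SDP
)
```

Calling `parse_rtsp_message(SAMPLE_RESPONSE)` returns the status line, a dictionary of header fields, and the SDP body as separate pieces, which is the same split you look for in any new protocol.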

Even knowing this commonality, you may still be confused, because a single protocol contains so many knowledge points, and you are meeting it for the first time.

Whenever we learn and master something new, we have to struggle with it for a while before we understand it more deeply.

So when learning, especially a streaming media transport protocol, I recommend learning it by capturing packets. For the RTSP encoding demo above, for example, you can capture packets with Wireshark and analyze them:

[Image: Wireshark packet capture]

Pulling the stream with ffplay:

[Image: ffplay pulling the stream]

Start packet capture analysis:

  • The RTSP request/response exchange:

[Image: RTSP exchange in the packet capture]

  • The RTP bitstream transmission:

[Image: RTP packets in the packet capture]

  • The RTCP transmission records:

[Image: RTCP packets in the packet capture]

Practical analysis experiments like this will give you a deeper understanding of the RTSP protocol.
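The fields that Wireshark shows for each RTP packet come from the 12-byte fixed header defined in RFC 3550. As a rough sketch (the class and function names are my own), the header can be decoded like this; the code also skips any CSRC list and header extension and removes padding so that only the payload is returned.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_packet(packet: bytes) -> Tuple[RtpHeader, bytes]:
    """Decode the RTP fixed header (RFC 3550) and return (header, payload)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    header = RtpHeader(
        version=b0 >> 6,            # 2 bits
        padding=bool(b0 & 0x20),    # 1 bit
        extension=bool(b0 & 0x10),  # 1 bit
        csrc_count=b0 & 0x0F,       # 4 bits
        marker=bool(b1 & 0x80),     # 1 bit
        payload_type=b1 & 0x7F,     # 7 bits
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )
    offset = 12 + 4 * header.csrc_count  # each CSRC entry is 4 bytes
    if header.extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        # The extension length is counted in 32-bit words after its 4-byte header.
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if header.padding:
        end -= packet[-1]  # the last byte holds the number of padding bytes
    if offset > end:
        raise ValueError("truncated RTP packet")
    return header, packet[offset:end]
```

Comparing the decoded sequence numbers and timestamps with what the capture shows is a quick way to check your understanding of the packet layout.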

The last step is implementing it in code. There is plenty of code on the Internet that you can refer to when implementing it in C. RFC 3550 itself also contains sample code:

https://www.rfc-editor.org/rfc/rfc3550
[Image: sample code from RFC 3550]

That is my experience of learning streaming media transport protocols!

2. Container formats and H.264:

Here are some common container formats to learn:

  • FLV

  • MP4

  • TS
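Container formats follow the same header + body pattern as the protocols above. As one small illustration (my own sketch, with a hypothetical function name), the 9-byte header at the start of an FLV file can be decoded like this:

```python
import struct
from typing import Dict, Union


def parse_flv_header(data: bytes) -> Dict[str, Union[int, bool]]:
    """Decode the 9-byte file header at the start of an FLV file."""
    if len(data) < 9:
        raise ValueError("an FLV header is 9 bytes long")
    # 3-byte signature, 1-byte version, 1-byte flags, 4-byte big-endian offset.
    signature, version, flags, data_offset = struct.unpack(">3sBBI", data[:9])
    if signature != b"FLV":
        raise ValueError("not an FLV file: bad signature")
    return {
        "version": version,
        "has_audio": bool(flags & 0x04),  # bit 2 of the flags byte
        "has_video": bool(flags & 0x01),  # bit 0 of the flags byte
        "data_offset": data_offset,       # where the tag data starts
    }
```

Reading the first bytes of a real file with `open(path, "rb").read(9)` and passing them to this function is an easy first experiment before moving on to the tags that follow the header.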

Next comes encoding and decoding the bitstream. For this you must first learn H.264. The main things to master are:

  • What are I, B, P frames?

  • NALU bitstream structure

  • How RTP packetizes H.264, that is, how the raw H.264 stream is packed into RTP packets
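As a starting point for the NALU structure, here is a minimal sketch that splits a raw H.264 stream in Annex B format (NAL units separated by 00 00 01 or 00 00 00 01 start codes) and reads the one-byte NAL header. The function names and the sample bytes are my own; the sample is made up for illustration and is not decodable video.

```python
from typing import Dict, List

# A few common nal_unit_type values from the H.264 specification.
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}


def split_annexb(stream: bytes) -> List[bytes]:
    """Return the NAL units of an Annex B byte stream, without start codes."""
    start_code = b"\x00\x00\x01"  # the 4-byte form just has one more leading zero
    nalus = []
    pos = stream.find(start_code)
    while pos != -1:
        begin = pos + 3
        nxt = stream.find(start_code, begin)
        end = len(stream) if nxt == -1 else nxt
        # A NAL unit never ends in 0x00, so trailing zeros belong to the
        # next start code (or are stream padding) and can be stripped.
        nalu = stream[begin:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus


def parse_nal_header(nalu: bytes) -> Dict[str, int]:
    """Decode the first byte of a NAL unit."""
    first = nalu[0]
    return {
        "forbidden_zero_bit": first >> 7,
        "nal_ref_idc": (first >> 5) & 0x03,
        "nal_unit_type": first & 0x1F,
    }


# Made-up sample: SPS, PPS and one IDR slice header byte with dummy payloads.
SAMPLE_STREAM = (
    b"\x00\x00\x00\x01\x67\x42\x00\x1f"
    b"\x00\x00\x00\x01\x68\xce\x3c\x80"
    b"\x00\x00\x01\x65\x88\x84"
)
```

Running `split_annexb` on a stream saved from the board and printing `NAL_TYPE_NAMES.get(...)` for each unit lets you see the SPS, PPS, and slice ordering for yourself.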

There are many details I have not covered here. This is only an outline of the knowledge you need to master.
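One of those details is how a NAL unit that is too large for a single packet gets carried over RTP. The sketch below follows the FU-A fragmentation scheme from RFC 6184 as I understand it; the function name and the default payload limit of 1400 bytes are my own assumptions. It builds only the RTP payloads, not the 12-byte RTP headers.

```python
from typing import List

FU_A_TYPE = 28  # NAL unit type value that marks an FU-A fragment in RFC 6184


def packetize_h264(nalu: bytes, max_payload: int = 1400) -> List[bytes]:
    """Turn one NAL unit into one or more RTP payloads."""
    if not nalu:
        raise ValueError("empty NAL unit")
    if max_payload < 3:
        raise ValueError("max_payload must leave room for the 2 FU bytes")
    if len(nalu) <= max_payload:
        return [nalu]  # small enough: send as a single NAL unit packet

    nal_header = nalu[0]
    # FU indicator: keep the F and NRI bits, replace the type with 28.
    fu_indicator = (nal_header & 0xE0) | FU_A_TYPE
    original_type = nal_header & 0x1F
    body = nalu[1:]  # the original NAL header byte is not sent as data
    chunk_size = max_payload - 2

    payloads = []
    for offset in range(0, len(body), chunk_size):
        is_start = offset == 0
        is_end = offset + chunk_size >= len(body)
        # FU header: S bit, E bit, reserved bit (0), then the original type.
        fu_header = (is_start << 7) | (is_end << 6) | original_type
        payloads.append(bytes([fu_indicator, fu_header]) + body[offset:offset + chunk_size])
    return payloads
```

A receiver reverses this by rebuilding the NAL header from the top three bits of the FU indicator and the low five bits of the FU header, then joining the fragment bodies in sequence-number order.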

Multimedia open source libraries:

  • ffmpeg

  • gstreamer

Here I recommend learning ffmpeg. How to learn it is more than I can cover today, so I will publish a later article on that!

v4l2:

If you are more interested in the lower layers, you can also learn V4L2 in the Linux kernel:

[Image: V4L2 in the Linux kernel]

Finally:

I have not expanded on audio in this article. My personal view is that once you have mastered how to learn video, learning audio is similar, and you will naturally develop your own learning method!

Well, that's all for today's sharing. The article is bound to have shortcomings!

Origin blog.csdn.net/Dada_ping/article/details/130023554