A growth roadmap for audio and video development, with a knowledge summary

Audio and video development draws on speech signal processing, digital image processing, information theory, container formats, codecs, streaming media protocols, network transmission, rendering, and algorithms. Audio and video play an increasingly important role in everyday applications such as video conferencing, live streaming, short video, media players, and voice chat, so working in this field is meaningful, with opportunities and challenges side by side. This article covers six areas: audio and video development basics, advanced growth, work directions, open source libraries, streaming media protocols, and books.

Contents

1. Basics of audio and video development
   1. Audio basics
   2. General basics
   3. Video basics
2. Advanced growth of audio and video
   1. Advanced audio
   2. General advanced
   3. Video advanced
3. Audio and video work direction
4. Audio and video open source library
   1. Multimedia processing
   2. Streaming media transmission
   3. Player
   4. Codec
   5. Audio processing
   6. Streaming media server
   7. Audio and video analysis
   8. Video rendering
5. Streaming media protocol
   1. Streaming media transmission protocol
   2. Streaming media application protocol
   3. WebRTC signaling protocol
   4. Audio and video coding formats
   5. Audio and video container formats
6. Audio and video books
   1. Audio
   2. Video
   3. Programming languages


1. Basics of audio and video development

1. Audio Basics

Audio basics include: sampling rate, number of channels and channel layout, sample format, PCM and waveforms, sound quality, audio coding formats, and audio container formats. For more detailed audio and video concepts, see: Entering the World of Audio and Video - Basic Concepts of Audio and Video.
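To show how sampling rate, channel count, and sample format combine, here is a minimal sketch that computes the data rate of uncompressed PCM. The function names are made up for this illustration, and the calculation assumes interleaved integer PCM whose bit depth is a multiple of 8.

```python
def pcm_byte_rate(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Bytes of uncompressed PCM produced per second of audio."""
    # One sample per channel per sampling instant, each bits_per_sample wide.
    return sample_rate_hz * channels * bits_per_sample // 8


def pcm_size_bytes(duration_s: float, sample_rate_hz: int,
                   channels: int, bits_per_sample: int) -> int:
    """Approximate size in bytes of a PCM recording of the given duration."""
    return int(duration_s * pcm_byte_rate(sample_rate_hz, channels, bits_per_sample))


if __name__ == "__main__":
    # CD-quality audio: 44.1 kHz, stereo, 16-bit samples (s16le).
    print(pcm_byte_rate(44100, 2, 16))
    print(pcm_size_bytes(60, 44100, 2, 16))
```

For 44.1 kHz, stereo, 16-bit audio this comes to 176,400 bytes per second, which is why raw PCM is normally compressed with a codec such as AAC or Opus before storage or transmission.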

2. General basics

General basics include: coding principles, C/C++ fundamentals, video analysis tools, common FFmpeg commands, and platform-specific multimedia APIs.

3. Video Basics

Video basics include: frame rate, bit rate, resolution, pixel format, color space, I/P/B frames, DTS and PTS, YUV and RGB, bit depth and color gamut, video coding formats, and video container formats. A figure in the original post summarizes these topics.
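As a rough illustration of how resolution, pixel format, and frame rate determine the size of raw video, the sketch below computes the bytes per frame for two common 8-bit pixel formats. The function names are hypothetical; the format names "yuv420p" and "rgb24" follow FFmpeg's naming, and no row padding or alignment is assumed.

```python
def frame_size_bytes(width: int, height: int, pix_fmt: str) -> int:
    """Size of one uncompressed 8-bit frame, without row padding."""
    if pix_fmt == "rgb24":
        # Three bytes (R, G, B) per pixel.
        return width * height * 3
    if pix_fmt == "yuv420p":
        # Full-resolution luma plane plus two chroma planes
        # subsampled by 2 horizontally and vertically.
        chroma_w = (width + 1) // 2
        chroma_h = (height + 1) // 2
        return width * height + 2 * chroma_w * chroma_h
    raise ValueError(f"unsupported pixel format: {pix_fmt}")


def raw_bitrate_bps(width: int, height: int, fps: int, pix_fmt: str) -> int:
    """Bit rate of the uncompressed stream in bits per second."""
    return frame_size_bytes(width, height, pix_fmt) * 8 * fps


if __name__ == "__main__":
    print(frame_size_bytes(1920, 1080, "yuv420p"))
    print(raw_bitrate_bps(1920, 1080, 30, "yuv420p"))
```

A 1920x1080 yuv420p frame takes 3,110,400 bytes, half the size of the same frame in rgb24, and at 30 fps the raw stream is about 746 Mbit/s. This gap between raw and practical bit rates is what video codecs exist to close.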

2. Advanced growth of audio and video

1. Advanced audio

Advanced growth is likewise divided into audio, general, and video. Advanced audio topics include: recording, microphone capture, audio encoding and decoding, audio playback, audio analysis, and sound effects.

2. General advanced

General advanced topics include: streaming media protocols, audio and video transmission, audio and video synchronization during playback, platform-specific multimedia applications, use of the FFmpeg APIs, OpenGL rendering, and audio and video editing.

3. Video advanced

Advanced video topics include: video recording, camera capture, video encoding and decoding, video playback, filter effects, and video transcoding. These build on the audio and video basics above. A figure in the original post summarizes them.

3. Audio and video work direction

I believe many of you have hesitated or felt lost at the crossroads when choosing a direction in audio and video work. My personal suggestion is to choose the direction you like; if you have not found what you love yet, keep looking. To borrow Mr. Lei's words, "don't be afraid to make a choice." Once you have chosen, treat your job as a career rather than working for work's sake; to borrow Jobs' words, "Do what you love, love what you do." I group audio and video work into nine directions (a personal view that may not be precise), as follows: live streaming, transmission, algorithms, video players, streaming media backend, short video, audio playback, video editing, and image processing.

4. Audio and video open source library

1. Multimedia processing

Multimedia processing libraries include FFmpeg, Libav, and GStreamer. FFmpeg is the most commonly used audio and video processing library; its modules cover container formats (muxing and demuxing), codecs, filters, image scaling, and audio resampling.

2. Streaming media transmission

Streaming media transmission libraries include WebRTC and live555. WebRTC is the most commonly used RTC library; its better-known modules are JitterBuffer, NetEQ, the pacer, and network bandwidth estimation.

3. Player

Players include ijkplayer, ExoPlayer, and VLC. ijkplayer is a cross-platform player open sourced by Bilibili, ExoPlayer is an Android player open sourced by Google, and VLC is open sourced by the non-profit organization VideoLAN.

4. Codec

Common codecs include AAC, MP3, Opus, VP9, x264, and AV1. AAC is generally used for on-demand and short video, while Opus is used for RTC and live streaming. VP9 is a video coding format open sourced by Google, x264 is an H.264 encoder provided by VideoLAN, and AV1 is a new-generation video coding format open sourced by the Alliance for Open Media (AOMedia).

5. Audio processing

Open source libraries for audio processing include SoX, SoundTouch, and Speex. SoX is known as the Swiss Army knife of audio processing; it can apply a wide range of sound effects and filters. SoundTouch changes tempo and pitch, either together or independently (for example, changing speed without changing pitch). Speex is strictly speaking a codec, but it also ships rich audio processing modules: PLC (packet loss concealment), VAD (voice activity detection), DTX (discontinuous transmission), AEC (acoustic echo cancellation), and NS (noise suppression).

6. Streaming media server

The mainstream open source streaming media servers are SRS and Janus. SRS is a simple and efficient video server that supports RTMP, WebRTC, HLS, HTTP-FLV, and SRT. Janus is an open source WebRTC server from Meetecho; strictly speaking, it is a gateway.

7. Audio and video analysis

Audio and video development is inseparable from analysis tools, so it is important to master them. Common tools include, but are not limited to, Mp4Parser, VideoEye, and Audacity. Mp4Parser analyzes the MP4 format and its box structure. VideoEye is a Windows video stream analysis tool open sourced by Lei Xiaohua, known in the community as "Lei Shen" (a tribute here to his open source spirit). Audacity is an open source audio editor that can add various sound effects and display audio waveforms for analysis.

8. Video rendering

Open source projects related to video rendering include GPUImage, Grafika, and LearnOpenGL. GPUImage can be used to add various filter effects. Grafika is a collection of Android rendering examples open sourced by a Google engineer. LearnOpenGL is the code repository that accompanies the LearnOpenGL tutorial website.

The relevant open source websites and addresses are as follows:

FFmpeg https://ffmpeg.org/
WebRTC https://webrtc.org.cn/
RTC community https://rtcdeveloper.agora.io/
RFC protocol https://www.rfc-editor.org/rfc/
OpenGL https://learnopengl-cn.github.io/
GPUImage https://github.com/BradLarson/GPUImage
VideoLan https://www.videolan.org/projects/
AOMedia https://aomedia.org/
xiph.org https://gitlab.xiph.org/
VP9 https://www.encoding.com/vp9/
soundtouch http://soundtouch.surina.net/
sox http://sox.sourceforge.net/

5. Streaming media protocol

1. Streaming media transmission protocol

Common streaming media transmission protocols are RTP, SRTP, RTMP, RTSP, and RTCP. RTP (Real-time Transport Protocol) carries real-time media, and SRTP (Secure Real-time Transport Protocol) adds encryption and authentication on top of RTP to prevent audio and video data from being intercepted. RTMP (Real-Time Messaging Protocol) is a TCP-based real-time messaging protocol whose specification was published by Adobe; its variants include RTMPE, RTMPS, and RTMPT. RTSP (Real Time Streaming Protocol) controls streaming sessions with methods such as OPTIONS, DESCRIBE, SETUP, PLAY, PAUSE, and TEARDOWN. RTCP (RTP Control Protocol) is the control protocol that accompanies RTP; it reports transmission statistics such as packet loss and delay.
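To make RTP more concrete, the following is a minimal sketch that parses the 12-byte fixed RTP header laid out in RFC 3550. The RtpHeader type and parse_rtp_header function are names invented for this example; CSRC entries, header extensions, and the payload are deliberately ignored.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


if __name__ == "__main__":
    # Hand-built sample: version 2, marker set, dynamic payload type 96.
    sample = bytes([0x80, 0xE0]) + struct.pack("!HII", 1, 3000, 0x12345678)
    print(parse_rtp_header(sample))
```

The sequence number is what a receiver uses to detect loss and reordering, and the timestamp drives jitter calculation and playout; RTCP reports are built from statistics on these fields.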

2. Streaming media application protocol

Streaming media application protocols include HLS and DASH. HLS (HTTP Live Streaming) is Apple's streaming protocol; it uses m3u8 playlists and TS segments. DASH is a streaming protocol widely used by Google; it uses fMP4 segments and supports seamless switching among multiple bit rates (adaptive bitrate streaming).
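As an illustration of the m3u8 playlist that HLS relies on, here is a minimal sketch that extracts segment durations and URIs from a media playlist. It only understands the #EXTINF tag from RFC 8216 and skips every other tag; the function name and the sample playlist are made up for this example, and a real player would also need to handle master playlists, encryption, and live updates.

```python
from typing import List, Tuple


def parse_media_playlist(text: str) -> List[Tuple[float, str]]:
    """Return (duration_seconds, uri) pairs from an HLS media playlist."""
    segments = []
    pending_duration = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#EXTINF:"):
            # Format is "#EXTINF:<duration>,[<title>]".
            value = line[len("#EXTINF:"):]
            pending_duration = float(value.split(",", 1)[0])
        elif line and not line.startswith("#") and pending_duration is not None:
            # The first non-tag line after #EXTINF is the segment URI.
            segments.append((pending_duration, line))
            pending_duration = None
    return segments


# Hypothetical playlist used only for this example.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXTINF:5.5,
segment2.ts
#EXT-X-ENDLIST
"""

if __name__ == "__main__":
    segs = parse_media_playlist(SAMPLE_PLAYLIST)
    print(segs)
    print("total duration:", sum(d for d, _ in segs))
```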

3. WebRTC signaling protocol

The protocols involved in WebRTC session setup are SDP, ICE, STUN, and TURN: SDP describes the media session, while ICE uses STUN and TURN to traverse NAT. WebRTC's media transport also uses the streaming media transmission protocols mentioned above.
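To give a feel for the NAT traversal side, the sketch below builds a bare STUN Binding Request, the message a client sends to a STUN server to learn its public address. It follows the 20-byte header layout of the STUN specification (message type, length, magic cookie, transaction ID) and carries no attributes; the function name is invented for this example, and sending the request or parsing the response is out of scope.

```python
import os
import struct
from typing import Optional

STUN_BINDING_REQUEST = 0x0001   # message type: Binding Request
STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by the STUN specification


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Build a 20-byte STUN Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)  # 96-bit random transaction ID
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Header: type (16 bits), message length (16 bits, 0 because there
    # are no attributes), magic cookie (32 bits), then the transaction ID.
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id


if __name__ == "__main__":
    print(build_binding_request().hex())
```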

4. Audio and video coding formats

Common audio coding formats are MP3, AAC, Opus, FLAC, AC3, EAC3, AMR-NB, and PCM (for example PCM_S16LE). Video coding formats include H.264, HEVC, VP9, MPEG-4, and AV1. For more on codecs, see: Entering the World of Audio and Video - Audio and Video Coding and Entering the World of Audio and Video - Audio and Video Decoding.

5. Audio and video container formats

Commonly used video container formats are mp4, mov, mkv, webm, flv, avi, ts, mpg, and wmv. Commonly used audio container formats are mp3, m4a, flac, ogg, wav, wma, and amr. A container format wraps multimedia information together with the audio, video, and subtitle streams. The multimedia information includes duration, resolution, frame rate, bit rate, sampling rate, and number of channels, which are the basic concepts introduced earlier. An audio or video stream consists of frames obtained by encoding and compressing the raw data, while a subtitle stream generally consists of text or bitmaps in a specific format. For more on container formats, see my earlier articles: Entering the World of Audio and Video - Audio Encapsulation Format and Entering the World of Audio and Video - Video Encapsulation Format.
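To illustrate what "container" means in practice, here is a minimal sketch that lists the top-level boxes of an MP4 (ISO base media file format) byte buffer. Each box starts with a 32-bit big-endian size and a four-character type; a size of 1 means a 64-bit size follows, and a size of 0 means the box runs to the end of the data. The function name and the hand-built sample are made up for this example, and nested boxes such as those inside moov are not descended into.

```python
import struct
from typing import List, Tuple


def list_top_level_boxes(data: bytes) -> List[Tuple[str, int, int]]:
    """Return (type, offset, size) for each top-level MP4 box in data."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_len = 8
        if size == 1:
            # 64-bit "largesize" stored right after the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_len = 16
        elif size == 0:
            # Box extends to the end of the file.
            size = len(data) - offset
        if size < header_len:
            break  # malformed box; stop rather than loop forever
        boxes.append((box_type.decode("ascii", "replace"), offset, size))
        offset += size
    return boxes


if __name__ == "__main__":
    # Hand-built sample: a 16-byte ftyp box, an empty free box,
    # and an mdat box that runs to the end of the buffer.
    sample = (
        struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00\x00\x02\x00"
        + struct.pack(">I4s", 8, b"free")
        + struct.pack(">I4s", 0, b"mdat") + b"abcd"
    )
    for box in list_top_level_boxes(sample):
        print(box)
```

On a real file, the same loop would typically show boxes such as ftyp, moov (the multimedia information), and mdat (the encoded streams), which matches the split between metadata and stream data described above.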

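The protocols and formats involved are as follows:

Streaming transport: RTP (RFC 3550), SRTP (RFC 3711), RTMP (Adobe), RTSP (RFC 7826), RTCP (RFC 3550; reduced-size RTCP in RFC 5506)
Streaming application: HLS (RFC 8216), DASH (ISO/IEC 23009)
WebRTC protocols: SDP (RFC 8866; offer/answer model in RFC 3264), ICE (RFC 5245), NAT, TURN, STUN
Coding formats (audio): aac, mp3, opus, ac3, etc.
Coding formats (video): h264, h265, vp9, av1, etc.
Container formats (audio): mp3, m4a, ogg, wav, etc.
Container formats (video): mp4, mkv, flv, ts, avi, etc.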

6. Audio and video books

1. Audio

Audio books include: The music of theory, DSP Noise Reduction, Audio Signal Processing and Coding, Digital Audio Principles and Applications, etc.

2. Video

Video books include: Digital Image and Video Processing, Advanced Audio and Video Development Guide, Video Coding Explained from All Angles, The New-Generation Video Compression Coding Standard H.264/AVC, New-Generation High-Efficiency Video Coding H.265/HEVC, Digital Image Processing (Gonzalez), Multimedia Signal Encoding and Transmission, OpenGL Programming Guide, WebRTC Native Development in Practice, FFmpeg from Entry to Mastery, Online Video Technology Essentials, etc.

3. Programming languages

Books related to C/C++ include: The C Programming Language, Thinking in C++, C++ Primer Plus, The C++ Programming Language, and Effective C++. I also recommend "Programmer's Self-cultivation". For more books, see my earlier blog post: Books I have read in those years.

C and C++ standard documentation can be found on the cppreference website. C standards include C89, C95, C99, C11, C17, and C23; C++ standards include C++98, C++11, C++14, C++17, C++20, and C++23. By studying the reference documentation, you can quickly learn the C/C++ header files, the library facilities they provide, and the differences between versions.

Concluding remarks

I hope everyone keeps growing on the road of audio and video development: love technology, delay gratification, and code your dreams. Having chosen a distant destination, you will have to travel through trials and tribulations, so do not fear the obstacles along the way. Heroes are not judged by their origins, and hard work will eventually pay off.
