Technical points of C++ audio and video development

Generally speaking, audio and video development has a real technical barrier to entry. In my view, it takes three to five years of steady accumulation in this field to gain a comprehensive and deep understanding of the relevant development knowledge.

Technically speaking, the knowledge to accumulate falls into the following two categories:

1. C/C++ general development knowledge

The main programming languages for audio and video development are C and C++.

This expertise is general-purpose: it is not tied to any particular industry, and it forms part of a programmer's technical foundation.

You can focus on the following aspects:

- How computer systems work at the lowest level
- Operating system principles
- How programs are compiled, linked, and loaded
- The ideas behind C/C++ language features: how they work under the hood, where they are applicable, and what problems they carry
- Software design principles and design patterns
- Data structures and algorithms
- Principles of multithreaded, concurrent programming
- Network programming
- Cross-platform development
- Operating system APIs
- Software debugging

2. Professional knowledge in the audio and video field

This is the domain knowledge specific to working in the audio and video industry.

There is a great deal of it, and each functional module has a considerable body of specialist knowledge behind it.

Audio and video development can be divided into two parts, covering roughly the following content:

- Audio and video client development
  - Client application development
  - Audio and video engine development
    - Audio and video engine SDK
    - Audio and video engine framework
    - Audio and video engine functional modules:
      - Audio/video capture
      - Audio/video rendering
      - Audio/video data processing
      - Audio/video encoding/decoding
      - Recording
      - Streaming
      - Audio and video synchronization
- Streaming media server development
  - General server development knowledge; pay attention to the following points: high stability, high performance, high concurrency, and high availability
  - Streaming server development:
    - SFU vs. MCU
    - Streaming media protocol conversion
    - Audio and video transmission protocols under weak networks
    - Recording & transcoding
    - ...
Among the items above, client application development, the audio and video engine SDK, the audio and video engine framework, and general server development mainly draw on general C/C++ development knowledge; still, to design these parts well you must have a fairly deep understanding of audio/video technology and of the product's business.

Audio and video architects usually pay the most attention to these parts.

Developing the engine's underlying functional modules and the SFU/MCU streaming media servers, on the other hand, is tightly bound to audio and video domain expertise.

Audio and video capture module

Video data can be obtained in the following ways (a minimal capture sketch follows these lists):
- USB cameras
- Professional hardware video capture cards (both soft-compression and hard-compression cards exist)
- Network cameras (supporting the RTSP protocol)
- Screen-recording APIs provided by the operating system
- Reading audio/video files and decoding them
- Subscribing to a stream on a streaming server

Audio data can be obtained in the following ways:
- Sound cards
- Loopback capture of the sound the speakers are playing (depends on operating system APIs)
- Reading audio/video files and decoding them
- Subscribing to a stream on a streaming server
- Network cameras that support audio input (RTSP protocol)
- Video capture cards that support audio input
- On mobile phones, the operating system SDK provides audio and video capture interfaces
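For illustration, here is a minimal sketch, assuming FFmpeg 5.x development headers on Linux, of opening one of these capture sources (a USB camera through the v4l2 input format; the device path /dev/video0 is a placeholder) with libavdevice:

```cpp
// Minimal capture sketch: open a camera through libavdevice and pull packets.
// Assumes FFmpeg 5.x headers; on Windows you would look up "dshow" instead of
// "v4l2" and pass a device name like "video=USB Camera".
extern "C" {
#include <libavdevice/avdevice.h>
#include <libavformat/avformat.h>
}

int main() {
    avdevice_register_all();  // registers capture input formats (v4l2, dshow, ...)

    const AVInputFormat* ifmt = av_find_input_format("v4l2");
    if (!ifmt) return 1;

    AVFormatContext* fmt_ctx = nullptr;
    // A capture device is opened exactly like a file/URL; options such as
    // resolution and framerate could be passed via the last parameter.
    if (avformat_open_input(&fmt_ctx, "/dev/video0", ifmt, nullptr) < 0)
        return 1;

    AVPacket* pkt = av_packet_alloc();
    for (int i = 0; i < 100; ++i) {   // grab 100 frames, then stop
        if (av_read_frame(fmt_ctx, pkt) < 0) break;
        // pkt->data now holds one captured frame (raw or MJPEG, device-dependent)
        av_packet_unref(pkt);
    }
    av_packet_free(&pkt);
    avformat_close_input(&fmt_ctx);
    return 0;
}
```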
Audio/video rendering
- Video rendering usually requires some knowledge of OpenGL, and audio rendering some knowledge of OpenAL.
- The open source library SDL can be used to implement a rendering module quickly (see the sketch below).
- Under Windows, the DirectShow framework comes with video and audio rendering filters provided by the operating system (you can inspect them with GraphEdit).
- The renderers in DirectShow embody an audio/video synchronization strategy; of course, you can also implement the synchronization module entirely yourself.
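Below is a minimal sketch of an SDL2 video rendering loop; the gray-frame get_next_frame() stub is a stand-in for a real decoder, and the fixed 1280x720 size is an assumption for illustration:

```cpp
// Minimal SDL2 rendering sketch: upload YUV420P frames to a streaming texture.
#include <SDL2/SDL.h>
#include <cstdint>
#include <cstring>
#include <vector>

static const int W = 1280, H = 720;

// Stub frame source: yields 100 mid-gray frames. Real code would pull
// decoded frames from the video decoder instead.
static bool get_next_frame(uint8_t* y, uint8_t* u, uint8_t* v) {
    static int frames_left = 100;
    if (frames_left-- <= 0) return false;
    std::memset(y, 128, W * H);
    std::memset(u, 128, W * H / 4);
    std::memset(v, 128, W * H / 4);
    return true;
}

int main(int, char**) {
    SDL_Init(SDL_INIT_VIDEO);
    SDL_Window* win = SDL_CreateWindow("preview", SDL_WINDOWPOS_CENTERED,
                                       SDL_WINDOWPOS_CENTERED, W, H, 0);
    SDL_Renderer* ren = SDL_CreateRenderer(win, -1, SDL_RENDERER_ACCELERATED);
    // IYUV is planar YUV 4:2:0, the layout most video decoders output.
    SDL_Texture* tex = SDL_CreateTexture(ren, SDL_PIXELFORMAT_IYUV,
                                         SDL_TEXTUREACCESS_STREAMING, W, H);

    std::vector<uint8_t> y(W * H), u(W * H / 4), v(W * H / 4);
    while (get_next_frame(y.data(), u.data(), v.data())) {
        SDL_UpdateYUVTexture(tex, nullptr, y.data(), W,
                             u.data(), W / 2, v.data(), W / 2);
        SDL_RenderClear(ren);
        SDL_RenderCopy(ren, tex, nullptr, nullptr);
        SDL_RenderPresent(ren);
        SDL_Delay(40);  // crude 25 fps pacing; a real player syncs to audio
    }

    SDL_DestroyTexture(tex);
    SDL_DestroyRenderer(ren);
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}
```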
Audio/video data processing

These modules apply algorithmic processing to raw video or audio data, before encoding or after decoding.
- Video processing mainly includes resolution conversion, color space conversion, frame rate conversion, image enhancement, multi-channel video stitching, adding subtitles, overlaying logo images, and so on. It has a large impact on overall performance and often requires SIMD assembly optimization or GPU acceleration (a libswscale conversion sketch follows this list).
- Audio processing mainly includes echo cancellation, noise suppression, automatic gain control, mixing, and so on. This area involves a fair amount of signal processing and mathematics, and it is one of the more complex parts.
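As an example of the video side, here is a minimal sketch of resolution plus color space conversion with FFmpeg's libswscale; the sizes and pixel formats are illustrative, and in real code the source frame would come from a decoder:

```cpp
// Minimal libswscale sketch: 1920x1080 YUV420P -> 1280x720 RGB24.
extern "C" {
#include <libavutil/frame.h>
#include <libswscale/swscale.h>
}

int main() {
    AVFrame* src = av_frame_alloc();
    src->width = 1920; src->height = 1080; src->format = AV_PIX_FMT_YUV420P;
    av_frame_get_buffer(src, 0);   // allocate the source planes

    AVFrame* dst = av_frame_alloc();
    dst->width = 1280; dst->height = 720; dst->format = AV_PIX_FMT_RGB24;
    av_frame_get_buffer(dst, 0);

    // SWS_BILINEAR trades a little quality for speed; SWS_BICUBIC is sharper.
    SwsContext* sws = sws_getContext(src->width, src->height, AV_PIX_FMT_YUV420P,
                                     dst->width, dst->height, AV_PIX_FMT_RGB24,
                                     SWS_BILINEAR, nullptr, nullptr, nullptr);
    sws_scale(sws, src->data, src->linesize, 0, src->height,
              dst->data, dst->linesize);   // one call converts one frame

    sws_freeContext(sws);
    av_frame_free(&src);
    av_frame_free(&dst);
    return 0;
}
```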
Audio/video encoding/decoding

Video encoding/decoding
- Understand the basic principles of video coding, and be familiar with the key encoder parameters and the bitstream format.
- H.264 is currently the most widely used, and H.265 is gradually being adopted; there are many other video codecs, such as AVS, VP8, and VP9.
- Video coding has a large impact on the performance of an audio/video engine, so this part generally needs GPU acceleration. Current Intel integrated graphics support H.264 and H.265 quite well, whereas NVIDIA's discrete cards limit the number of concurrent encoding sessions; mobile phones generally have dedicated hardware acceleration modules. On sufficiently powerful hardware, the open source x264 encoder is also worth considering (see the encoder sketch after this list).

Audio encoding/decoding
- Understand the basic principles of audio coding, and be familiar with the key parameters and stream formats.
- AAC is currently the most widely used, and there are many other audio codecs, such as G.711, G.722, and Opus.
- On a PC, audio-related modules have no obvious performance impact; on HiSilicon embedded systems, however, their impact cannot be ignored, because HiSilicon provides essentially no hardware acceleration for audio and the ARM CPU is comparatively weak.
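To make "key encoder parameters" concrete, here is a minimal sketch of opening an H.264 encoder through FFmpeg's libavcodec with the libx264 backend; the resolution, bitrate, GOP, and preset values are illustrative, not recommendations:

```cpp
// Minimal encoder-setup sketch: H.264 via libavcodec's libx264 wrapper.
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/opt.h>
}

int main() {
    const AVCodec* codec = avcodec_find_encoder_by_name("libx264");
    if (!codec) return 1;              // FFmpeg built without x264 support

    AVCodecContext* enc = avcodec_alloc_context3(codec);
    enc->width     = 1280;
    enc->height    = 720;
    enc->pix_fmt   = AV_PIX_FMT_YUV420P;
    enc->time_base = {1, 25};          // 25 fps
    enc->bit_rate  = 2000000;          // 2 Mbps target
    enc->gop_size  = 50;               // a keyframe every 2 seconds
    av_opt_set(enc->priv_data, "preset", "veryfast", 0);   // x264 speed/quality knob
    av_opt_set(enc->priv_data, "tune", "zerolatency", 0);  // for live scenarios

    if (avcodec_open2(enc, codec, nullptr) < 0) return 1;

    // Raw frames would now be fed with avcodec_send_frame() and compressed
    // packets drained with avcodec_receive_packet().
    avcodec_free_context(&enc);
    return 0;
}
```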
Recording
- Requires understanding container formats such as FLV, MP4, and TS.
- Pay attention to how the software handles special recording modes, for example recordings with an added opening and closing segment.
- For MP4 recording, pay attention to the impact of placing the moov box at the beginning versus the end of the file, both on writing the file and on on-demand playback of the recording (see the sketch after this list).
- Strategies for evenly interleaving audio and video while recording.
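A minimal sketch of the moov issue with FFmpeg's libavformat: by default the MP4 muxer writes the moov box at the end of the file, and the "faststart" movflag relocates it to the front when the file is closed, which lets on-demand playback start before the whole file has been downloaded. Stream creation and packet writing are elided here:

```cpp
// Minimal MP4 muxing sketch showing moov placement control via movflags.
extern "C" {
#include <libavformat/avformat.h>
#include <libavutil/dict.h>
}

int main() {
    AVFormatContext* oc = nullptr;
    avformat_alloc_output_context2(&oc, nullptr, "mp4", "record.mp4");
    if (!oc) return 1;
    if (avio_open(&oc->pb, "record.mp4", AVIO_FLAG_WRITE) < 0) return 1;

    // Without this option the moov box lands at the end of the file.
    AVDictionary* opts = nullptr;
    av_dict_set(&opts, "movflags", "faststart", 0);

    // avformat_new_stream() calls for the audio/video tracks would go here.
    if (avformat_write_header(oc, &opts) < 0) return 1;
    // ... av_interleaved_write_frame(oc, pkt) for each packet ...
    av_write_trailer(oc);   // with faststart, moov is relocated here

    av_dict_free(&opts);
    avio_closep(&oc->pb);
    avformat_free_context(oc);
    return 0;
}
```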
Streaming
- Understand the working principles of interactive video, live streaming, and video on demand.
- Know the key evaluation metrics:
  - Latency
  - First-screen (first frame) time
  - Audio/video synchronization
  - Smoothness
  - Picture quality / sound quality
- Understand the following audio and video transport protocols (an RTMP push sketch follows this list):
  - RTMP
  - HTTP + FLV / WebSocket + FLV
  - HLS
  - RTP & RTCP
  - RTSP
  - SIP
  - WebRTC
  - H.323
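As an example on the RTMP side, here is a minimal sketch of pushing a stream with libavformat. RTMP carries FLV, so the output format is "flv" even though the URL scheme is rtmp://; the URL is a placeholder and stream creation is elided:

```cpp
// Minimal RTMP publish sketch with libavformat.
extern "C" {
#include <libavformat/avformat.h>
}

int main() {
    avformat_network_init();

    const char* url = "rtmp://example.com/live/stream_key";  // placeholder URL
    AVFormatContext* oc = nullptr;
    avformat_alloc_output_context2(&oc, nullptr, "flv", url);
    if (!oc) return 1;

    // avformat_new_stream() calls (copied or freshly encoded) would go here.

    if (avio_open(&oc->pb, url, AVIO_FLAG_WRITE) < 0)  // TCP connect + RTMP handshake
        return 1;
    if (avformat_write_header(oc, nullptr) < 0) return 1;
    // ... av_interleaved_write_frame() per packet, paced by dts for live push ...
    av_write_trailer(oc);

    avio_closep(&oc->pb);
    avformat_free_context(oc);
    avformat_network_deinit();
    return 0;
}
```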
Audio and video transmission protocols under weak networks
- Understand how the TCP protocol stack works.
- Reliable UDP transport protocols:
  - KCP
  - SRT
  - QUIC
- FEC + packet loss retransmission mechanisms (such as NACK; a receiver-side sketch follows this list)
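To illustrate the NACK idea, here is a receiver-side sketch (written for this article, not taken from any particular library): the receiver tracks RTP-style 16-bit sequence numbers, detects gaps, and reports the missing numbers so the sender can retransmit them. In a real engine the reported list would be rate-limited, aged out, and packed into RTCP NACK feedback messages (RFC 4585):

```cpp
// Illustrative receiver-side gap detection for a NACK retransmission scheme.
#include <cstdint>
#include <vector>

class NackTracker {
public:
    // Called for every received packet; returns sequence numbers newly
    // detected as missing when `seq` arrives.
    std::vector<uint16_t> on_packet(uint16_t seq) {
        std::vector<uint16_t> missing;
        if (!started_) { started_ = true; highest_ = seq; return missing; }
        // 16-bit sequence numbers wrap around, so compare modulo 2^16.
        int16_t ahead = static_cast<int16_t>(seq - highest_);
        if (ahead > 0) {
            for (uint16_t s = highest_ + 1; s != seq; ++s)
                missing.push_back(s);   // lost (or still in flight) packets
            highest_ = seq;
        }
        // ahead <= 0: a duplicate, a reordered packet, or a retransmission.
        return missing;
    }

private:
    bool started_ = false;
    uint16_t highest_ = 0;
};
```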
Audio and video development does not start entirely from scratch: there are many open source libraries you can rely on. To use these libraries well, however, you need a solid grasp of the audio and video expertise described above.

The more common audio and video open source libraries are as follows:

FFmpeg
- You can use the ffmpeg command line directly for common tasks such as transcoding and slicing.
- You can build your own audio and video modules on top of the FFmpeg API (a minimal example follows this entry).
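To get a feel for the API, here is a minimal sketch that opens a media file and dumps its stream layout, the usual first step before building a decode or transcode pipeline; "input.mp4" is a placeholder path:

```cpp
// Minimal FFmpeg API sketch: probe a file and print its streams (like ffprobe).
extern "C" {
#include <libavformat/avformat.h>
}

int main() {
    AVFormatContext* fmt_ctx = nullptr;
    if (avformat_open_input(&fmt_ctx, "input.mp4", nullptr, nullptr) < 0)
        return 1;
    if (avformat_find_stream_info(fmt_ctx, nullptr) < 0)
        return 1;
    av_dump_format(fmt_ctx, 0, "input.mp4", 0);  // last arg 0 = treat as input
    avformat_close_input(&fmt_ctx);
    return 0;
}
```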
live555
- A fairly complete RTSP library.
x264
- The most commonly used H.264 encoding library.
fdk-aac
- A commonly used AAC encoder/decoder library.
librtmp
- Supports the RTMP protocol; it needs further hardening when you productize.
PJSIP
- Supports the SIP protocol.
WebRTC
- Google's open source WebRTC library has an excellent audio/video engine; its real-time network estimation is well worth studying, and its echo cancellation module is also well known.
SDL
- A well-known audio and video rendering library.
SRS
- A well-known RTMP streaming server in China; it supports HLS and HTTP+FLV, and version 4.0 adds WebRTC support.
OWT
- Intel's open source WebRTC suite, with a WebRTC client SDK and a distributed WebRTC MCU server.
OpenCV
- The famous computer vision and image algorithm library.

In addition, video encoding and decoding can be built on Intel Media SDK and NVIDIA NVENC.

On HiSilicon embedded platforms, the chips (such as the Hi3531D) provide audio/video capture, audio/video rendering, video encoding/decoding, video image processing, and other core functions in hardware; development there has to be done with the SDK that HiSilicon provides.

