[Deep Learning] Using FFmpeg and GStreamer for video streaming, encoding and decoding (1): FFmpeg

Why Video Codecs

Video streams need to be encoded and decoded mainly because raw video data is very large: transmitting it directly would consume enormous network bandwidth and storage space. By encoding and decoding, the video data can be compressed into a much smaller form for more efficient transmission and storage.
Specifically, encoding converts the original video data into a compressed representation, and decoding restores the compressed data back to the original video. Both steps follow well-defined algorithms and rules designed to achieve the highest possible compression ratio with the least data loss. Commonly used video codecs include H.264, H.265, VP8 and VP9.

The main advantages of video codecs are:

Save storage space: Through video encoding, the size of video files can be compressed to about 10%~50% of the original size, thus saving storage space.

Reduce network bandwidth: Video encoding can reduce the data volume of video streams, thereby reducing the network bandwidth required for data transmission.

Improve video transmission quality: Video encoding can reduce delay and jitter during data transmission, thereby improving video transmission quality and stability.

Encrypted video stream: By encoding the video, the video stream can be encrypted to protect data security and privacy.
In short, video codecs are an essential means of achieving efficient video transmission and storage, and are widely used in video surveillance, live streaming, video calls and other fields.
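For a rough feel of how much encoding saves, one quick experiment is to re-encode a clip with H.264 and compare file sizes (a minimal sketch using the ffmpeg command-line tool; input.mp4 is a placeholder file name and an FFmpeg build with libx264 is assumed):

ffmpeg -i input.mp4 -c:v libx264 -crf 23 -c:a copy output_h264.mp4
ls -lh input.mp4 output_h264.mp4   # compare the two file sizes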

Network Bandwidth

The network bandwidth in video streaming refers to the network transmission bandwidth occupied during video streaming. In the process of video streaming, the video stream needs to be transmitted to the client through the network, so a certain network bandwidth is required to support data transmission.
Network bandwidth refers to the amount of data that can be transmitted per unit of time, usually expressed in Mbps (megabits per second) or Gbps (gigabits per second). The larger the network bandwidth, the larger the amount of data transmitted and the faster the transmission speed. Therefore, in the process of video streaming, it is necessary to ensure that the network bandwidth is large enough to transmit video data quickly and stably.
The bandwidth available for video streaming is affected by various factors, such as link capacity limits, network congestion, and transmission distance. If the bandwidth is insufficient, problems such as video stuttering, blurred images and out-of-sync audio will occur, hurting the viewing experience. Therefore, when streaming video, the available network bandwidth must be taken into account and an appropriate transmission protocol, bit rate and resolution chosen in order to achieve good transmission results.
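Before choosing a protocol, bit rate and resolution, it helps to inspect what a given source actually delivers; a possible ffprobe invocation looks like this (ffprobe ships with FFmpeg; input.mp4 is a placeholder and could also be an RTSP URL):

ffprobe -v error -select_streams v:0 -show_entries stream=codec_name,width,height,avg_frame_rate,bit_rate -of default=noprint_wrappers=1 input.mp4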

Common Video Encoding Formats

H.264/AVC: H.264/AVC is a widely used video coding format, which has the advantages of high compression ratio, high image quality and low delay. It usually adopts CBR (constant bit rate) or VBR (variable bit rate) encoding method, and the bandwidth consumed depends on factors such as video resolution, frame rate, and bit rate.

H.265/HEVC: H.265/HEVC is the successor to H.264/AVC, with a higher compression ratio and a lower bit rate, so it can reduce bandwidth consumption at the same image quality. However, compared to H.264/AVC, H.265/HEVC is computationally more complex and requires more powerful hardware for encoding and decoding.
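As a rough comparison of the two codecs, the same source can be encoded with both and the resulting file sizes compared (a sketch; it assumes an FFmpeg build with both libx264 and libx265 enabled, input.mp4 is a placeholder, and CRF 23/28 are commonly used, roughly comparable quality settings rather than fixed rules):

ffmpeg -i input.mp4 -c:v libx264 -crf 23 -c:a copy out_h264.mp4
ffmpeg -i input.mp4 -c:v libx265 -crf 28 -c:a copy out_h265.mp4
ls -lh out_h264.mp4 out_h265.mp4   # the H.265 output is typically noticeably smaller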

Video resolutions and typical bandwidth consumption

1080p, CIF, D1, and 720p are standard representations of video resolutions, and the specific explanations are as follows:

1080p: Also known as full HD, it refers to a video format with a video resolution of 1920×1080 pixels, where "p" means progressive scanning.

CIF: Refers to the format with a video resolution of 352×288 pixels, where "CIF" stands for "Common Intermediate Format", which is a standard format.

D1: Refers to the format with a video resolution of 720×576 pixels, where "D1" stands for "Digital 1", which is a standard format.

720p: Also known as high-definition, it refers to a format with a video resolution of 1280×720 pixels, where "p" means progressive scanning.
Different video resolutions consume different amounts of bandwidth; in general, the higher the resolution, the more bandwidth is required. Typical empirical values are:

CIF (352×288): roughly 0.2–0.5 Mbps

D1 (720×576): roughly 1–2 Mbps

720p (1280×720): roughly 2–4 Mbps

1080p (1920×1080): roughly 5–10 Mbps

Note, however, that actual bandwidth consumption is also affected by the encoding method, frame rate, bit rate, compression ratio, network environment and device performance, so the real figure needs to be evaluated case by case.
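To turn these empirical figures into a rough capacity estimate, a back-of-the-envelope calculation like the one below can help (the 4 Mbps per 720p stream and the ~30% headroom are assumptions, not fixed values):

# 720p streams that fit on a 100M port, keeping ~30% headroom
echo $(( 100 * 70 / 100 / 4 ))    # ≈ 17 streams
# 720p streams that fit on a Gigabit port, same headroom
echo $(( 1000 * 70 / 100 / 4 ))   # ≈ 175 streams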

Gigabit Ethernet port and 100M Ethernet port

The difference between a Gigabit Ethernet port and a 100M (Fast Ethernet) port lies mainly in the transmission rate: a Gigabit port runs at 1 Gbps (1000 Mbps), while a 100M port runs at 100 Mbps, so the Gigabit port is ten times faster and can carry correspondingly more video traffic.
As for cables with only 4 wires (2 twisted pairs): the port will still link up, because Ethernet auto-negotiation adapts the rate and duplex mode to the cable, but 1000BASE-T requires all 4 pairs, so a 2-pair cable typically falls back to 100 Mbps. To actually reach Gigabit speeds and keep the link stable, it is recommended to use a cable with all 8 wires (4 pairs) that meets the Gigabit Ethernet standard.
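On Linux, the rate a port has actually negotiated can be checked with ethtool (eth0 is a placeholder interface name; root privileges are usually required):

sudo ethtool eth0 | grep -i speed    # e.g. "Speed: 1000Mb/s" for a Gigabit link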

The difference between hardware codec and software codec

The main difference between hardware codec and software codec lies in the implementation of codec.
Software encoding and decoding runs on the CPU, using software algorithms to encode and decode the video data. The advantage of this approach is flexibility: it can run on different platforms and systems, supports many codec formats and algorithms, and has good compatibility. The disadvantage is that it is relatively slow; high-definition and high-frame-rate video in particular consumes a lot of CPU resources and can easily make the system stutter.
Hardware encoding and decoding is performed by the GPU or a dedicated hardware module, which can process video data very quickly. The advantage of this approach is speed: high-definition and high-frame-rate video can be processed in real time, while the CPU load and power consumption are reduced. The disadvantage is that hardware codecs usually support only a limited set of formats and algorithms, are less flexible, and compatibility varies between platforms and systems.
In general, hardware codecs and software codecs have their own advantages and disadvantages, and they need to be selected according to specific application scenarios and requirements. For scenarios that require high-speed codec and real-time processing, hardware codec is a better choice; for scenarios with higher requirements for compatibility and flexibility, software codec is more suitable.
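With FFmpeg you can list the hardware acceleration methods compiled into the local build and, where supported, switch from the software libx264 encoder to a hardware one (a sketch; h264_nvenc assumes an NVIDIA GPU and an FFmpeg build with NVENC enabled, and input.mp4 is a placeholder):

ffmpeg -hwaccels                                   # list available hardware accelerators
ffmpeg -i input.mp4 -c:v libx264 out_sw.mp4        # software encoding on the CPU
ffmpeg -i input.mp4 -c:v h264_nvenc out_hw.mp4     # hardware encoding on the GPU, if available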

Introduction to streaming tools

GStreamer: A powerful multimedia framework that can be used for audio and video acquisition, encoding, decoding, processing and transmission.

FFmpeg: An open source audio and video processing tool that can be used for audio and video acquisition, encoding, decoding, transcoding, processing, etc.

Both FFmpeg and GStreamer are streaming media processing frameworks, widely used for audio/video encoding and decoding, transcoding, filtering, and capture.
FFmpeg is an open source audio and video processing library that supports codec, transcoding, filtering and other operations in various audio and video formats. It can run on multiple platforms, including Windows, Linux, macOS, etc., and is widely used in multimedia software development, streaming media servers, video editing, transcoding, and other fields. FFmpeg provides a series of command line tools, which can conveniently perform various audio and video processing operations, such as playback, recording, transcoding, etc. At the same time, FFmpeg also provides a series of API interfaces, allowing developers to integrate audio and video processing functions in their own applications.
GStreamer is an open source multimedia framework based on plug-ins, which can perform operations such as audio and video collection, codec, processing and playback. It provides a set of components and plug-in architecture, which can realize different functions by combining different plug-ins. GStreamer can run on multiple platforms, including Linux, macOS, Windows, etc., and is widely used in digital TV, audio and video editing, streaming media servers, and other fields. GStreamer provides a complete set of API interfaces, which can realize audio and video processing functions by writing codes in C/C++, Python and other languages.
As for the differences: FFmpeg focuses more on audio and video encoding and decoding itself, and is widely used for media format conversion and transcoding, while GStreamer focuses more on providing an overall pipeline framework for audio and video processing, and is widely used for capture, streaming transmission and playback in multimedia software, digital TV, streaming media servers and similar applications.
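The difference in style also shows on the command line: FFmpeg expresses a job as input and output options, while GStreamer describes a pipeline of linked elements. As a small illustration, both of the following encode a ten-second test pattern to H.264 (a sketch; it assumes gst-launch-1.0 and its x264 plugin are installed):

ffmpeg -f lavfi -i testsrc=duration=10 -pix_fmt yuv420p -c:v libx264 out_ffmpeg.mp4
gst-launch-1.0 videotestsrc num-buffers=300 ! x264enc ! h264parse ! mp4mux ! filesink location=out_gst.mp4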

Install the FFmpeg library

Install the necessary dependencies

Before compiling FFmpeg, you need to install some necessary dependencies, including:
x264 (the H.264 encoder library)
nasm (the assembler required to build x264)

Install nasm first; once it is in place, x264 can be compiled and installed.

wget https://www.nasm.us/pub/nasm/releasebuilds/2.14/nasm-2.14.tar.gz
tar zxvf /home/imagedepth/streamWork/nasm-2.14.tar.gz -C ./
cd nasm-2.14
./configure
make
make install
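If the build went well, the assembler should now be on the PATH; a quick sanity check:

nasm -v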

Install x264 as follows:

git clone https://code.videolan.org/videolan/x264.git
cd x264
./configure --prefix=/usr/x264/ --includedir=/usr/local/include --libdir=/usr/local/lib --enable-shared
make
make install
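To confirm that pkg-config can see the freshly installed x264 (FFmpeg's configure step relies on this), a quick check along these lines can help; if it fails, see the PKG_CONFIG_PATH fix below:

pkg-config --modversion x264
ls /usr/local/lib/libx264*        # the shared library produced by --enable-shared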

If you see: ERROR: x264 not found using pkg-config

vim /etc/profile
# Append the following line at the end (adjust it to wherever your x264 was installed):
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
source /etc/profile
# Then re-run ./configure ... and the error should be gone

Build and install FFmpeg

git clone https://git.ffmpeg.org/ffmpeg.git ffmpeg
cd ffmpeg
./configure --enable-gpl --enable-libx264 --enable-shared
make
make install
# Add the dynamic-library search path so that libx264.so.164 can be found at runtime
sudo sh -c 'echo "/usr/local/lib" > /etc/ld.so.conf.d/local.conf'
sudo ldconfig

After installation, by default the executables go to /usr/local/bin, libraries to /usr/local/lib, configuration files to /usr/local/etc, other resource files to /usr/local/share, and header files to /usr/local/include.
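A quick way to verify the installed binary and confirm that x264 support was actually compiled in:

ffmpeg -version
ffmpeg -encoders 2>/dev/null | grep 264    # should list libx264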

The code

#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>
#include <libavutil/imgutils.h>
}

int main(int argc, char* argv[]) {
    // Input stream URL (a Dahua camera in this example)
    const char* input_url = "rtsp://admin:[email protected]:554/cam/realmonitor?channel=1&subtype=0";
    // Output video width
    const int output_width = 640;
    // Output video height
    const int output_height = 480;

    // Initialize FFmpeg networking (av_register_all() is no longer required in FFmpeg 4+)
    avformat_network_init();

    // Open the input stream
    AVFormatContext* input_format_context = avformat_alloc_context();
    if (avformat_open_input(&input_format_context, input_url, nullptr, nullptr) != 0) {
        std::cerr << "Could not open input stream: " << input_url << std::endl;
        return EXIT_FAILURE;
    }
    // Retrieve stream information
    if (avformat_find_stream_info(input_format_context, nullptr) < 0) {
        std::cerr << "Could not find stream information." << std::endl;
        return EXIT_FAILURE;
    }
    // Find the index of the video stream
    int video_stream_index = -1;
    for (unsigned int i = 0; i < input_format_context->nb_streams; i++) {
        if (input_format_context->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
            video_stream_index = i;
            break;
        }
    }
    if (video_stream_index == -1) {
        std::cerr << "Could not find video stream." << std::endl;
        return EXIT_FAILURE;
    }
    // Find a decoder for the video stream
    AVCodecParameters* codec_parameters = input_format_context->streams[video_stream_index]->codecpar;
    const AVCodec* codec = avcodec_find_decoder(codec_parameters->codec_id);
    if (!codec) {
        std::cerr << "Could not find decoder." << std::endl;
        return EXIT_FAILURE;
    }
    // Open the decoder
    AVCodecContext* codec_context = avcodec_alloc_context3(codec);
    if (avcodec_parameters_to_context(codec_context, codec_parameters) < 0) {
        std::cerr << "Could not create codec context." << std::endl;
        return EXIT_FAILURE;
    }
    if (avcodec_open2(codec_context, codec, nullptr) < 0) {
        std::cerr << "Could not open codec." << std::endl;
        return EXIT_FAILURE;
    }
    // Allocate the frame that receives decoded data
    AVFrame* frame = av_frame_alloc();
    if (!frame) {
        std::cerr << "Could not allocate frame." << std::endl;
        return EXIT_FAILURE;
    }
    // Allocate the frame that holds the RGB-converted image
    AVFrame* frame_rgb = av_frame_alloc();
    if (!frame_rgb) {
        std::cerr << "Could not allocate RGB frame." << std::endl;
        return EXIT_FAILURE;
    }
    int num_bytes = av_image_get_buffer_size(AV_PIX_FMT_RGB24, output_width, output_height, 1);
    uint8_t* buffer = (uint8_t*)av_malloc(num_bytes * sizeof(uint8_t));
    av_image_fill_arrays(frame_rgb->data, frame_rgb->linesize, buffer, AV_PIX_FMT_RGB24, output_width, output_height, 1);
    // Initialize the image converter (decoder pixel format -> RGB24, scaled to the output size)
    SwsContext* sws_context = sws_getContext(codec_context->width, codec_context->height, codec_context->pix_fmt, output_width, output_height, AV_PIX_FMT_RGB24, SWS_BILINEAR, nullptr, nullptr, nullptr);
    // Read packets from the stream
    AVPacket* packet = av_packet_alloc();
    while (av_read_frame(input_format_context, packet) >= 0) {
        if (packet->stream_index == video_stream_index) {
            // Send the packet to the decoder
            int ret = avcodec_send_packet(codec_context, packet);
            if (ret < 0) {
                std::cerr << "Error sending a packet for decoding." << std::endl;
                av_packet_unref(packet);
                continue;
            }
            // Drain all frames produced by this packet
            while (ret >= 0) {
                ret = avcodec_receive_frame(codec_context, frame);
                if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
                    break;
                }
                else if (ret < 0) {
                    std::cerr << "Error during decoding." << std::endl;
                    return EXIT_FAILURE;
                }
                // Convert the decoded frame to RGB24
                sws_scale(sws_context, (const uint8_t* const*)frame->data, frame->linesize, 0, codec_context->height, frame_rgb->data, frame_rgb->linesize);
                // Wrap the RGB data in an OpenCV Mat (no copy)
                cv::Mat mat(output_height, output_width, CV_8UC3, frame_rgb->data[0]);
                // Save the image (or display it with imshow)
                cv::imwrite("output.jpg", mat, {cv::IMWRITE_JPEG_QUALITY, 80});
                // cv::imshow("Video", mat);
                // cv::waitKey(1);
            }
        }
        av_packet_unref(packet);
    }
    // Release resources
    av_packet_free(&packet);
    sws_freeContext(sws_context);
    av_free(buffer);
    av_frame_free(&frame_rgb);
    av_frame_free(&frame);
    avcodec_free_context(&codec_context);
    avformat_close_input(&input_format_context);
    avformat_network_deinit();
    return EXIT_SUCCESS;
}

The corresponding CMakeLists.txt:

cmake_minimum_required(VERSION 3.0)
project(pro)
add_definitions(-std=c++11)
option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Release)
#set(CMAKE_BUILD_TYPE Debug)
set(EXECUTABLE_OUTPUT_PATH ${PROJECT_SOURCE_DIR}/workspace)
set(ffmpeg_libs_DIR /usr/local/lib)
set(ffmpeg_headers_DIR /usr/local/include)
set(OpenCV_DIR   "/usr/local/opencv/lib64/cmake/opencv4")

find_package(OpenCV)
include_directories(
    ${PROJECT_SOURCE_DIR}/src
    ${OpenCV_INCLUDE_DIRS}
    ${ffmpeg_headers_DIR} 
)
link_directories(${OpenCV_LIBRARY_DIRS} ${ffmpeg_libs_DIR} )
set(CMAKE_CXX_FLAGS  "${CMAKE_CXX_FLAGS} -std=c++11 -Wall -O0 -Wfatal-errors -pthread -w -g")
# Recursively add the source files
file(GLOB_RECURSE cpp_srcs ${PROJECT_SOURCE_DIR}/src/*.cpp)
file(GLOB_RECURSE c_srcs ${PROJECT_SOURCE_DIR}/src/*.c)
add_executable(pro ${cpp_srcs} ${c_srcs})


target_link_libraries(pro ${OpenCV_LIBS} avcodec avformat avutil swscale avfilter swresample z bz2 x264 pthread)
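With the source file under src/ and the CMakeLists.txt above in the project root, a typical out-of-source build and run looks like this (the RTSP URL in the code must of course point at a camera that is actually reachable):

mkdir build && cd build
cmake ..
make
cd ../workspace && ./pro     # the binary is placed in workspace/ by EXECUTABLE_OUTPUT_PATH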

Pull result: the decoded frames from the RTSP stream are saved as output.jpg.

Origin blog.csdn.net/hh1357102/article/details/129865113