Hardware decoding example


Environment:

Intel® Pentium® Silver N6000 @ 1.10GHz, 4 cores, 4 threads

GPU model

lspci -nn | grep -i vga

00:02.0 VGA compatible controller [0300]: Intel Corporation JasperLake [UHD Graphics] [8086:4e71] (rev 01)

sudo lshw -C display

*-display
description: VGA compatible controller
product: JasperLake [UHD Graphics]
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
logical name: /dev/fb0
version: 01
width: 64 bits
clock: 33MHz
capabilities: pciexpress msi pm vga_controller bus_master cap_list rom fb
configuration: depth=32 driver=i915 latency=0 mode=800x480 resolution=1920,1080 visual=truecolor xres=800 yres=480
resources: iomemory:600-5ff iomemory:400-3ff irq:137 memory:6000000000-6000ffffff memory:4000000000-400fffffff ioport:3000(size=64) memory:c0000-dffff

Test the system's I/O write speed: create a test file and write 1 GB of data to it using direct I/O.

dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct

1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 23.9032 s, 44.9 MB/s

Test the system's I/O read speed: read 1 GB of data back from the test file and discard it into /dev/null.

dd if=testfile of=/dev/null bs=1G count=1 iflag=direct

1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.54137 s, 423 MB/s
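As a quick sanity check on the dd summaries, the throughput can be recomputed by hand. The helper below is only an illustrative sketch (the name throughput_mb_per_s is made up here); it assumes the decimal megabytes (1 MB = 1,000,000 bytes) that dd uses in its summary line.

```c
/* Throughput in decimal megabytes per second (1 MB = 1,000,000 bytes).
 * Hypothetical helper, not part of dd or FFmpeg. */
double throughput_mb_per_s(unsigned long long bytes, double seconds)
{
    return (double)bytes / seconds / 1e6;
}
```

With the figures above, throughput_mb_per_s(1073741824ULL, 23.9032) comes to about 44.9 and throughput_mb_per_s(1073741824ULL, 2.54137) to about 422.5, which dd prints as 423 after rounding.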

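Usage:
./hw_decode vaapi juren-30s.mp4 juren-30s.yuv
Verify playback (the pixel format must match what was actually written; with VAAPI the downloaded frames are often nv12, so try -pixel_format nv12 if the colors look wrong):
ffplay -video_size 1920x1080 -pixel_format yuv420p juren-30s.yuv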

av_hwdevice_iterate_types(type)

If the device type you pass in is wrong, the example falls back to this function to enumerate the supported hardware acceleration methods, which is equivalent to the following command.

ffmpeg -hwaccels

type corresponds to one of the following enum values; the matching value is looked up from your input, such as vaapi or qsv:
type = av_hwdevice_find_type_by_name(argv[1])
enum AVHWDeviceType {
    AV_HWDEVICE_TYPE_NONE,
    AV_HWDEVICE_TYPE_VDPAU,
    AV_HWDEVICE_TYPE_CUDA,
    AV_HWDEVICE_TYPE_VAAPI,
    AV_HWDEVICE_TYPE_DXVA2,
    AV_HWDEVICE_TYPE_QSV,
    AV_HWDEVICE_TYPE_VIDEOTOOLBOX,
    AV_HWDEVICE_TYPE_D3D11VA,
    AV_HWDEVICE_TYPE_DRM,
    AV_HWDEVICE_TYPE_OPENCL,
    AV_HWDEVICE_TYPE_MEDIACODEC,
    AV_HWDEVICE_TYPE_VULKAN,
};

This function finds the video stream in input_ctx and, through the decoder argument, returns the decoder that matches it.

av_find_best_stream(input_ctx, AVMEDIA_TYPE_VIDEO, -1, -1, &decoder, 0);

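There is also a manual way to find it by walking the streams:

/* ic is the AVFormatContext (input_ctx in the full example below) */
int video_index = -1;
for (unsigned int i = 0; i < ic->nb_streams; i++) {
    if (ic->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
        video_index = i;
    }
}

/* only dereference the stream if a video stream was actually found */
if (video_index >= 0)
    ZlogInfo("id = %d\n", ic->streams[video_index]->codecpar->codec_id);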

av_hwdevice_get_type_name(type)

The function above is the inverse of av_hwdevice_find_type_by_name: av_hwdevice_find_type_by_name("qsv") returns AV_HWDEVICE_TYPE_QSV, while av_hwdevice_get_type_name(AV_HWDEVICE_TYPE_QSV) returns "qsv".

av_hwframe_transfer_data(sw_frame, frame, 0)

This function uses a lot of CPU. In a test on an i9 PC (28 cores, 56 threads), hardware decoding used about 15% CPU in total, of which this function alone accounted for 14%, so the hardware decoding itself took only about 1%.
As an aside, the following function converts pixel formats as well, but it works on frames that are already in system memory, so it can convert a frame after the download rather than replace the GPU-to-CPU transfer.

int sws_scale(struct SwsContext *c, const uint8_t *const srcSlice[],
              const int srcStride[], int srcSliceY, int srcSliceH,
              uint8_t *const dst[], const int dstStride[]);

// Copy only the "metadata" fields from src to dst.

av_frame_copy_props(sw_frame, frame);

@return the buffer size in bytes, a negative error code in case of failure
int av_image_get_buffer_size(enum AVPixelFormat pix_fmt, int width, int height, int align);

This computes how many bytes are needed to store an image with the given parameters.
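For the common 8-bit 4:2:0 formats the result is easy to work out by hand, which helps when checking a dump file's size. The sketch below is an assumption-laden illustration, not FFmpeg code: yuv420_frame_bytes is a made-up helper that covers only yuv420p and nv12 with align set to 1 (no row padding), whereas av_image_get_buffer_size handles every pixel format and alignment.

```c
#include <stddef.h>

/* Bytes for one 8-bit 4:2:0 frame (yuv420p or nv12) with no row padding.
 * Hypothetical helper: one full-size luma plane plus two chroma planes
 * (or one interleaved plane) at half resolution in each direction. */
size_t yuv420_frame_bytes(size_t width, size_t height)
{
    size_t chroma_w = (width + 1) / 2;   /* round up for odd widths  */
    size_t chroma_h = (height + 1) / 2;  /* round up for odd heights */
    return width * height + 2 * chroma_w * chroma_h;
}
```

For a 1920x1080 frame this gives 3110400 bytes, which should match av_image_get_buffer_size(AV_PIX_FMT_NV12, 1920, 1080, 1).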

int av_hwframe_transfer_get_formats(AVBufferRef *hwframe_ctx,
                                    enum AVHWFrameTransferDirection dir,
                                    enum AVPixelFormat **formats, int flags);

The function above returns the pixel formats that av_hwframe_transfer_data can use for the given hardware frames context, in the requested transfer direction.

int av_hwframe_map(AVFrame *dst, const AVFrame *src, int flags);

The function above performs mapping. With it, the step that converts YUV/NV12 to RGB moves from CPU conversion to GPU conversion.
Simply using av_hwframe_map in place of the original call brought the time down to 2/3 of the original.
The following compares the two functions: total CPU idle remaining (id, shown as CPU), GPU busy (GPU), and the CPU used by the application itself (cpu).

av_hwframe_transfer_data(sw_pframe, praw_frame, 0);//1080p50 CPU:65 GPU:75 cpu:128%
av_hwframe_map(sw_pframe, praw_frame, 0);//1080p50 CPU:75 GPU:72 cpu:103%
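To put the 1080p50 figures in perspective, it helps to know how much raw data crosses from GPU to CPU each second. The helper below is a hypothetical sketch; it assumes 8-bit 4:2:0 frames (nv12 or yuv420p) with no row padding, which the measurements above do not state explicitly.

```c
/* Bytes per second of raw 8-bit 4:2:0 video (nv12 or yuv420p), no row padding.
 * The pixel format is an assumption; the measurements only say "1080p50". */
unsigned long long raw_420_bytes_per_second(unsigned width, unsigned height,
                                            unsigned fps)
{
    unsigned long long luma   = (unsigned long long)width * height;
    unsigned long long chroma = 2ULL * ((width + 1) / 2) * ((height + 1) / 2);
    return (luma + chroma) * fps;
}
```

Under that assumption, raw_420_bytes_per_second(1920, 1080, 50) is 155520000, roughly 155 MB of frame data per second that a transfer has to copy.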

av_hwframe_unmap

#include <stdio.h>

#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/pixdesc.h>
#include <libavutil/hwcontext.h>
#include <libavutil/opt.h>
#include <libavutil/avassert.h>
#include <libavutil/imgutils.h>

static AVBufferRef *hw_device_ctx = NULL;
static enum AVPixelFormat hw_pix_fmt;
static FILE *output_file = NULL;

static int hw_decoder_init(AVCodecContext *ctx, const enum AVHWDeviceType type)
{
    int err = 0;

    if ((err = av_hwdevice_ctx_create(&hw_device_ctx, type,
                                      NULL, NULL, 0)) < 0) {
        fprintf(stderr, "Failed to create specified HW device.\n");
        return err;
    }
    ctx->hw_device_ctx = av_buffer_ref(hw_device_ctx);

    return err;
}


static enum AVPixelFormat get_hw_format(AVCodecContext *ctx,
                                        const enum AVPixelFormat *pix_fmts)
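{
    const enum AVPixelFormat *p;

    /* pix_fmts is terminated by AV_PIX_FMT_NONE (-1) */
    for (p = pix_fmts; *p != AV_PIX_FMT_NONE; p++) {
        /* make sure the hardware pixel format we need is supported */
        if (*p == hw_pix_fmt)
            return *p;
    }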

    fprintf(stderr, "Failed to get HW surface format.\n");
    return AV_PIX_FMT_NONE;
}

static int decode_write(AVCodecContext *avctx, AVPacket *packet)
{
    AVFrame *frame = NULL, *sw_frame = NULL;
    AVFrame *tmp_frame = NULL;
    uint8_t *buffer = NULL;
    int size;
    int ret = 0;

    ret = avcodec_send_packet(avctx, packet);
    if (ret < 0) {
        fprintf(stderr, "Error during decoding\n");
        return ret;
    }

    while (1) {
        if (!(frame = av_frame_alloc()) || !(sw_frame = av_frame_alloc())) {
            fprintf(stderr, "Can not alloc frame\n");
            ret = AVERROR(ENOMEM);
            goto fail;
        }

        ret = avcodec_receive_frame(avctx, frame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
            av_frame_free(&frame);
            av_frame_free(&sw_frame);
            return 0;
        } else if (ret < 0) {
            fprintf(stderr, "Error while decoding\n");
            goto fail;
        }

        if (frame->format == hw_pix_fmt) {
            /* retrieve data from GPU to CPU */
            if ((ret = av_hwframe_transfer_data(sw_frame, frame, 0)) < 0) {
                fprintf(stderr, "Error transferring the data to system memory\n");
                goto fail;
            }
            tmp_frame = sw_frame;
        } else
            tmp_frame = frame;

        size = av_image_get_buffer_size(tmp_frame->format, tmp_frame->width,
                                        tmp_frame->height, 1);
        buffer = av_malloc(size);
        if (!buffer) {
            fprintf(stderr, "Can not alloc buffer\n");
            ret = AVERROR(ENOMEM);
            goto fail;
        }
        ret = av_image_copy_to_buffer(buffer, size,
                                      (const uint8_t * const *)tmp_frame->data,
                                      (const int *)tmp_frame->linesize, tmp_frame->format,
                                      tmp_frame->width, tmp_frame->height, 1);
        if (ret < 0) {
            fprintf(stderr, "Can not copy image to buffer\n");
            goto fail;
        }

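        /* fwrite returns the number of items written (never negative),
         * so compare it against the requested size */
        if (fwrite(buffer, 1, size, output_file) != (size_t)size) {
            fprintf(stderr, "Failed to dump raw data.\n");
            ret = AVERROR(EIO);
            goto fail;
        }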

    fail:
        av_frame_free(&frame);
        av_frame_free(&sw_frame);
        av_freep(&buffer);
        if (ret < 0)
            return ret;
    }
}

int main(int argc, char *argv[])
{
    AVFormatContext *input_ctx = NULL;
    int video_stream, ret;
    AVStream *video = NULL;
    AVCodecContext *decoder_ctx = NULL;
    const AVCodec *decoder = NULL;
    AVPacket *packet = NULL;
    enum AVHWDeviceType type;
    int i;

    if (argc < 4) {
        fprintf(stderr, "Usage: %s <device type> <input file> <output file>\n", argv[0]);
        return -1;
    }

    type = av_hwdevice_find_type_by_name(argv[1]);
    if (type == AV_HWDEVICE_TYPE_NONE) {
        fprintf(stderr, "Device type %s is not supported.\n", argv[1]);
        fprintf(stderr, "Available device types:");
        while((type = av_hwdevice_iterate_types(type)) != AV_HWDEVICE_TYPE_NONE)
            fprintf(stderr, " %s", av_hwdevice_get_type_name(type));
        fprintf(stderr, "\n");
        return -1;
    }

    packet = av_packet_alloc();
    if (!packet) {
        fprintf(stderr, "Failed to allocate AVPacket\n");
        return -1;
    }

    /* open the input file */
    if (avformat_open_input(&input_ctx, argv[2], NULL, NULL) != 0) {
        fprintf(stderr, "Cannot open input file '%s'\n", argv[2]);
        return -1;
    }

    if (avformat_find_stream_info(input_ctx, NULL) < 0) {
        fprintf(stderr, "Cannot find input stream information.\n");
        return -1;
    }

    /* find the video stream information */
    ret = av_find_best_stream(input_ctx, AVMEDIA_TYPE_VIDEO, -1, -1, &decoder, 0);
    if (ret < 0) {
        fprintf(stderr, "Cannot find a video stream in the input file\n");
        return -1;
    }
    video_stream = ret;

    for (i = 0;; i++) {
        /* Why loop here? A decoder such as H.264 may be backed by several
         * hardware decoding methods (NVIDIA, Intel, and Intel itself offers
         * both qsv and vaapi), so each one is enumerated in turn.
         * Each method corresponds to one AVCodecHWConfig struct. */
        const AVCodecHWConfig *config = avcodec_get_hw_config(decoder, i);
        if (!config) {
            fprintf(stderr, "Decoder %s does not support device type %s.\n",
                    decoder->name, av_hwdevice_get_type_name(type));
            return -1;
        }
        if (config->methods & AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX &&
            config->device_type == type) {
            /* for qsv, for example, type is AV_HWDEVICE_TYPE_QSV here */
            hw_pix_fmt = config->pix_fmt; /* the supported hardware pixel format */
            break;
        }
    }

    if (!(decoder_ctx = avcodec_alloc_context3(decoder)))
        return AVERROR(ENOMEM);

    video = input_ctx->streams[video_stream];
    if (avcodec_parameters_to_context(decoder_ctx, video->codecpar) < 0)
        return -1;

    decoder_ctx->get_format  = get_hw_format;

    if (hw_decoder_init(decoder_ctx, type) < 0)
        return -1;

    if ((ret = avcodec_open2(decoder_ctx, decoder, NULL)) < 0) {
        fprintf(stderr, "Failed to open codec for stream #%d\n", video_stream);
        return -1;
    }

    /* open the file to dump raw data */
    output_file = fopen(argv[3], "w+b");
    if (!output_file) {
        fprintf(stderr, "Cannot open output file '%s'\n", argv[3]);
        return -1;
    }

    /* actual decoding and dump the raw data */
    while (ret >= 0) {
        if ((ret = av_read_frame(input_ctx, packet)) < 0)
            break;

        if (video_stream == packet->stream_index)
            ret = decode_write(decoder_ctx, packet);

        av_packet_unref(packet);
    }

    /* flush the decoder */
    ret = decode_write(decoder_ctx, NULL);

    if (output_file)
        fclose(output_file);
    av_packet_free(&packet);
    avcodec_free_context(&decoder_ctx);
    avformat_close_input(&input_ctx);
    av_buffer_unref(&hw_device_ctx);

    return 0;
}

enum {
    /**
     * The mapping must be readable.
     */
    AV_HWFRAME_MAP_READ      = 1 << 0,
    /**
     * The mapping must be writeable.
     */
    AV_HWFRAME_MAP_WRITE     = 1 << 1,
    /**
     * The mapped frame will be overwritten completely in subsequent
     * operations, so the current frame data need not be loaded.  Any values
     * which are not overwritten are unspecified.
     */
    AV_HWFRAME_MAP_OVERWRITE = 1 << 2,
    /**
     * The mapping must be direct.  That is, there must not be any copying in
     * the map or unmap steps.  Note that performance of direct mappings may
     * be much lower than normal memory.
     */
    AV_HWFRAME_MAP_DIRECT    = 1 << 3,
};

AV_HWFRAME_MAP_READ: the mapped frame data can be read.
AV_HWFRAME_MAP_WRITE: the mapped frame data can be written.
AV_HWFRAME_MAP_OVERWRITE: the mapped frame will be completely overwritten, so its current contents need not be loaded.
AV_HWFRAME_MAP_DIRECT: the mapping must be direct, with no copying in the map or unmap steps, as the header comment above states.
Passing 0 uses the default behavior. In testing, AV_HWFRAME_MAP_DIRECT and 0 showed no difference in CPU usage, which suggests that on this setup the default mapping was already direct.
This function exists because data that lives in GPU hardware cannot be used directly, so it has to be mapped first. The mapping may involve a copy, but that copy should be done by DMA rather than by the CPU: it is an exchange between the GPU and the memory hardware that the CPU does not drive, so the CPU cost is low. AV_HWFRAME_MAP_DIRECT is a direct mapping without a copy, which lets us access the memory on the GPU.
The src of this function must be a hardware frame; dst can be an ordinary AVFrame or a hardware frame.
flags: mapping flags that specify how the mapping behaves. They can be a bitwise OR combination of the values above.
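Combining flags with bitwise OR, and testing for them afterwards, can be sketched as follows. The constants here are local copies of the values from the enum above, renamed (MAP_READ and so on) so that the snippet compiles without the FFmpeg headers; map_flags_has and map_flags_read_direct are hypothetical helpers, not FFmpeg functions.

```c
/* Local copies of the AV_HWFRAME_MAP_* values listed above, renamed so the
 * sketch does not depend on the FFmpeg headers. */
enum {
    MAP_READ      = 1 << 0,
    MAP_WRITE     = 1 << 1,
    MAP_OVERWRITE = 1 << 2,
    MAP_DIRECT    = 1 << 3,
};

/* Returns nonzero when every bit of `wanted` is set in `flags`. */
int map_flags_has(int flags, int wanted)
{
    return (flags & wanted) == wanted;
}

/* Example combination: a readable mapping that must also be direct. */
int map_flags_read_direct(void)
{
    return MAP_READ | MAP_DIRECT;
}
```

In real code the same combination would be written as AV_HWFRAME_MAP_READ | AV_HWFRAME_MAP_DIRECT and passed as the flags argument.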

int av_hwframe_map(AVFrame *dst, const AVFrame *src, int flags)

Testing the functions above showed that
copying 1080p50 data to shared memory costs about 10 points of CPU idle (id),
sending 1080p50 data over UDP costs almost no idle,
and the mapped data is placed in the GPU's video memory.

dst->data[0]:139931650500608 dst->data[0]:139931646380032
dst->data[0]:139931642701824 dst->data[0]:139931638581248

int av_hwframe_transfer_data(AVFrame *dst, const AVFrame *src, int flags);

At least one of the src and dst of this function must be a hardware frame, and it performs a plain copy. It can produce different pixel formats, provided the format is one of those reported by the function below, and it cannot change the resolution.

            // The list is returned through formats and ends with AV_PIX_FMT_NONE.
            // The first argument must reference a hardware frames context.
            enum AVPixelFormat *formats = NULL;
            int err = av_hwframe_transfer_get_formats(device_ref, AV_HWFRAME_TRANSFER_DIRECTION_FROM, &formats, 0);
            if (err < 0)
            {
                // error handling
                av_buffer_unref(&device_ref);
                cout << "av_hwframe_transfer_get_formats failed: " << err << endl;
            }
            else
            {
                // walk the list and print each format name
                for (int i = 0; formats[i] != AV_PIX_FMT_NONE; i++)
                {
                    const char *format_name = av_get_pix_fmt_name(formats[i]);
                    cout << "Format:" << (format_name ? format_name : "unknown") << endl;
                }
                av_free(formats); // the caller owns the returned list
            }

enum AVHWFrameTransferDirection {
    /**
     * Transfer the data from the queried hw frame.
     */
    AV_HWFRAME_TRANSFER_DIRECTION_FROM,

    /**
     * Transfer the data to the queried hw frame.
     */
    AV_HWFRAME_TRANSFER_DIRECTION_TO,
};
