Using yolov8-pose as a case study, let's learn how to write a DeepStream probe callback function

1. Description of pipeline elements:

In the given code, the following elements are used to build the DeepStream pipeline:

  1. source: The input element that reads the video stream from a file. It can be any input source supported by GStreamer, such as a file, camera, or network stream.

  2. streammux: The stream multiplexer. It combines multiple input streams into one stream and packs multiple frames into batched data, so that video from different sources becomes a single, unified input.

  3. pgie: The primary inference engine (Primary GIE), which performs object detection. It runs inference according to a given configuration file, recognizes objects in the image, and extracts their features.

  4. nvtracker: This element is the tracker in DeepStream, which is used to track the recognized objects. It uses previously recognized object features and features in the current frame for matching and tracking to achieve continuous tracking of objects.

  5. nvvidconv: This element is a video converter for converting video frame format from NV12 to RGBA. In some cases, it is necessary to convert video frames from one format to another to accommodate different elements.

  6. nvosd: This element is an On-Screen Display (OSD) element, which is used to draw recognition results, bounding boxes, labels and other information on the converted RGBA buffer.

  7. nvvidconv_postosd: A second video converter that converts the RGBA frames back to NV12 format. This is common before sending video frames to the encoder.

  8. caps: This element is the Caps Filter element, which is used to set the constraints of the video format. It can specify specific formats and parameters for input or output streams to ensure stream compatibility.

  9. encoder: This element is a video encoder used to encode raw video frames into a specific video encoding format, such as H.264 or H.265. It sets the bitrate, encoding quality, etc. according to the specified parameters.

  10. rtppay: This element is used to pack encoded data into RTP (Real-time Transport Protocol) packets. RTP is a commonly used real-time streaming protocol.

  11. sink: A udpsink element that sends the RTP packets to the network over UDP. The destination is specified by the target IP address and port number.

These are the key elements used in the code; each plays a different role in the DeepStream pipeline, covering video input, inference, tracking, conversion, on-screen drawing, and output.
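
To make the element list concrete, here is a minimal sketch of how a few of these elements might be created and configured with gst_element_factory_make() and g_object_set(). The variable names, property values (resolution, bitrate, UDP destination) and the config file name are illustrative assumptions, not taken from the original code.

GstElement *streammux = gst_element_factory_make("nvstreammux", "stream-muxer");
GstElement *pgie      = gst_element_factory_make("nvinfer", "primary-inference");
GstElement *encoder   = gst_element_factory_make("nvv4l2h264enc", "h264-encoder");
GstElement *sink      = gst_element_factory_make("udpsink", "udp-sink");

/* Batch a single 1920x1080 stream (assumed resolution) */
g_object_set(G_OBJECT(streammux), "batch-size", 1,
             "width", 1920, "height", 1080,
             "batched-push-timeout", 40000, NULL);

/* Point the primary GIE at its config file (assumed file name) */
g_object_set(G_OBJECT(pgie), "config-file-path",
             "config_infer_primary_yolov8_pose.txt", NULL);

/* Encoder bitrate and UDP destination (assumed values) */
g_object_set(G_OBJECT(encoder), "bitrate", 4000000, NULL);
g_object_set(G_OBJECT(sink), "host", "224.224.255.255", "port", 5400,
             "async", FALSE, "sync", 1, NULL);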

2. Construction of the pipeline:

This is a DeepStream pipeline built with GStreamer. Let's explain the main build process in the code step by step:

  1. First, the variables and parameters required to build the pipeline are defined, including the GstElement pointers, bitrate, encoding format, port number, etc.

  2. Next, the individual GStreamer elements, such as source, streammux, pgie and nvtracker, are created. These elements handle input, inference, tracking, and output of the video stream.

  3. Parameters are set for each element. For example, the streammux batch size and output resolution, the pgie configuration file path, the nvtracker properties, etc.

  4. The elements are added to the pipeline. gst_bin_add_many() adds them to the GStreamer pipeline so it can manage and link them.

  5. The data flow between elements is connected. gst_element_link_many() links the elements together and defines the path the data takes.

  6. Probes are added. gst_pad_add_probe() attaches probes to pgie_src_pad and osd_sink_pad for fetching metadata and manipulating buffers.

  7. An RTSP server is created. gst_rtsp_server_new() creates the server, its service port is set, and the RTSP stream is mounted on it.

  8. The pipeline state is set to "playing". gst_element_set_state() puts the pipeline into the PLAYING state, starting the processing and output of the video stream.

  9. The main loop is started. g_main_loop_run() starts GStreamer's main loop, which processes events and messages.

  10. Wait for exit. The main loop runs until an exit signal is received.

  11. Clean up and release resources. After the main loop exits, the pipeline state is set to NULL, the pipeline is released, and the remaining resources are cleaned up.

The above is the main flow of the code that builds the DeepStream pipeline: it reads a video file, performs inference and tracking, and then publishes the processed result to the network as an RTSP stream.
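
As a rough illustration of steps 4-9, here is a condensed sketch of how the pieces might be assembled. The element variables are assumed to have been created as in the sketch above, the RTSP mount point "/ds-test" and the port numbers are illustrative assumptions, and error handling is omitted.

#include <gst/rtsp-server/rtsp-server.h>

/* 4. Add the elements to the pipeline */
gst_bin_add_many(GST_BIN(pipeline), source, streammux, pgie, nvtracker,
                 nvvidconv, nvosd, nvvidconv_postosd, caps, encoder, rtppay, sink, NULL);

/* 5. Link them in processing order (the source -> streammux link usually goes
      through a request pad named sink_0 and is done separately) */
gst_element_link_many(streammux, pgie, nvtracker, nvvidconv, nvosd,
                      nvvidconv_postosd, caps, encoder, rtppay, sink, NULL);

/* 7. Publish the UDP/RTP stream over RTSP (port and mount point are assumptions) */
GstRTSPServer *server = gst_rtsp_server_new();
g_object_set(server, "service", "8554", NULL);
GstRTSPMountPoints *mounts = gst_rtsp_server_get_mount_points(server);
GstRTSPMediaFactory *factory = gst_rtsp_media_factory_new();
gst_rtsp_media_factory_set_launch(factory,
    "( udpsrc name=pay0 port=5400 caps=\"application/x-rtp, media=video, "
    "encoding-name=H264, payload=96\" )");
gst_rtsp_media_factory_set_shared(factory, TRUE);
gst_rtsp_mount_points_add_factory(mounts, "/ds-test", factory);
g_object_unref(mounts);
gst_rtsp_server_attach(server, NULL);

/* 8-9. Start playback and run the main loop */
gst_element_set_state(pipeline, GST_STATE_PLAYING);
g_main_loop_run(loop);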

3. Main functions of the pgie probe function

The main functions of this pgie probe callback (originally a Python function that we will convert to C++) are as follows:

  1. Get the buffer of GStreamer, and get batch metadata from it.

  2. Traverse the metadata of each frame.

  3. For each frame, iterate over its user metadata.

  4. If the type of user metadata is tensor output, convert it to the NvDsInferTensorMeta type.

  5. Get information about the input shape and output layer of the model.

  6. Convert the data of the output layer from C type to Python numpy array.

  7. Post-processing the output of the model, including adjusting dimensions, adding fake class probabilities, mapping coordinates to screen sizes, etc.

  8. Further post-processing is performed on the processed output, including non-maximum suppression, etc.

  9. If there are valid prediction results, these results are added to the frame's object metadata and displayed on the frame.

  10. Update the frame-rate (FPS) measurement for the stream.

  11. Mark the frame as having been inferred.

The general steps to implement this function in C++ are as follows:

  1. Get the buffer of GStreamer, and get batch metadata from it. This step uses the gst_buffer_get_nvds_batch_meta() function to get batch metadata.

  2. Traverse the metadata of each frame. This step can be done in C++ using standard iterators or loops.

  3. For each frame, iterate over its user metadata. This step can be done in C++ using standard iterators or loops.

  4. If the type of user metadata is tensor output, convert it to the NvDsInferTensorMeta type. This is done by checking user_meta->base_meta.meta_type and casting user_meta->user_meta_data.

  5. Get information about the input shape and output layers of the model. This step uses NvDsInferNetworkInfo and NvDsInferLayerInfo from the tensor metadata.

  6. Convert the data of the output layer from C type to C++ array or vector. This step can be done in C++ using standard arrays or vectors.

  7. Post-process the output of the model: adjust dimensions, add fake class probabilities, map coordinates to the screen size, etc.

  8. Perform further post-processing on the result, including non-maximum suppression; this step may require implementing the corresponding algorithm in C++ (see the NMS sketch at the end of this section).

  9. If there are valid prediction results, add them to the frame's object metadata and display them on the frame. This can be done with DeepStream APIs such as nvds_add_obj_meta_to_frame() and nvds_add_display_meta_to_frame().

  10. Update the frame-rate (FPS) measurement. This is typically done with a simple per-stream FPS counter.

  11. Mark the frame as having been inferred. In C++ this can be done by setting frame_meta->bInferDone to TRUE.

These are the general steps to convert the Python function to C++. The exact implementation will vary with your needs and environment.
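
Step 8 above mentions non-maximum suppression. Since the snippets in section 4 stop before post-processing, here is a minimal, self-contained C++ sketch of greedy IoU-based NMS over axis-aligned boxes. The Detection struct and the IoU threshold are illustrative assumptions; a real implementation would operate on the decoded yolov8-pose boxes and keep the keypoints alongside each kept box.

#include <algorithm>
#include <vector>

// Hypothetical decoded detection: x1,y1,x2,y2 box corners plus confidence.
struct Detection {
    float x1, y1, x2, y2;
    float confidence;
};

static float iou(const Detection &a, const Detection &b) {
    float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    float iw = std::max(0.0f, ix2 - ix1), ih = std::max(0.0f, iy2 - iy1);
    float inter = iw * ih;
    float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    float uni = areaA + areaB - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: keep the highest-confidence box, drop boxes that overlap it too much.
static std::vector<Detection> nms(std::vector<Detection> dets, float iou_threshold = 0.45f) {
    std::sort(dets.begin(), dets.end(),
              [](const Detection &a, const Detection &b) { return a.confidence > b.confidence; });
    std::vector<Detection> kept;
    std::vector<bool> removed(dets.size(), false);
    for (size_t i = 0; i < dets.size(); ++i) {
        if (removed[i]) continue;
        kept.push_back(dets[i]);
        for (size_t j = i + 1; j < dets.size(); ++j) {
            if (!removed[j] && iou(dets[i], dets[j]) > iou_threshold)
                removed[j] = true;
        }
    }
    return kept;
}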

4. Implement this callback function step by step

4.1 Get the buffer of GStreamer

static GstPadProbeReturn pose_src_pad_buffer_probe(GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
    g_print("pose_src_pad_buffer_probe called\n");

    // Get the GstBuffer
    GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER(info);
    if (!buf) {
        g_print("Unable to get GstBuffer\n");
        return GST_PAD_PROBE_OK;
    }

    // Get the batch metadata
    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(buf);
    if (!batch_meta) {
        g_print("Unable to get batch metadata\n");
        return GST_PAD_PROBE_OK;
    }

    // Print some information
    g_print("Successfully got GstBuffer and batch metadata\n");
    g_print("Batch meta frame count: %d\n", batch_meta->num_frames_in_batch);

    return GST_PAD_PROBE_OK;
}
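
The original snippets do not show how this probe is attached to the pipeline. A minimal sketch, assuming the primary GIE element is stored in a variable named pgie (as created in section 2), would look like this:

GstPad *pgie_src_pad = gst_element_get_static_pad(pgie, "src");
if (!pgie_src_pad) {
    g_printerr("Unable to get src pad of pgie\n");
} else {
    gst_pad_add_probe(pgie_src_pad, GST_PAD_PROBE_TYPE_BUFFER,
                      pose_src_pad_buffer_probe, NULL, NULL);
    gst_object_unref(pgie_src_pad);
}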

4.2 Traversal: batch metadata -> frame_meta_list -> user metadata

static GstPadProbeReturn pose_src_pad_buffer_probe(GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
    g_print("pose_src_pad_buffer_probe called\n");

    // Get the GstBuffer
    GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER(info);
    if (!buf) {
        g_print("Unable to get GstBuffer\n");
        return GST_PAD_PROBE_OK;
    }

    // Get the batch metadata
    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(buf);
    if (!batch_meta) {
        g_print("Unable to get batch metadata\n");
        return GST_PAD_PROBE_OK;
    }

    // Iterate over the metadata of each frame
    for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next) {
        NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)(l_frame->data);

        // For each frame, iterate over its user metadata
        for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list; l_user != NULL; l_user = l_user->next) {
            NvDsUserMeta *user_meta = (NvDsUserMeta *)(l_user->data);
            g_print("Successfully got user metadata\n");
            g_print("User metadata type: %d\n", user_meta->base_meta.meta_type);
        }
    }

    return GST_PAD_PROBE_OK;
}
Running this, the probe prints, for example:

User metadata type: 12

This is part of the definition of the NvDsMetaType enumeration:

typedef enum
{
  NVDS_META_INVALID = 0,
  NVDS_META_FRAME_INFO,
  NVDS_META_EVENT_MSG,
  NVDS_META_STREAM_INFO,
  NVDS_META_SOURCE_INFO,
  NVDS_META_USER,
  NVDS_META_RESERVED_1,
  NVDS_META_RESERVED_2,
  NVDS_META_RESERVED_3,
  NVDS_META_RESERVED_4,
  NVDS_META_RESERVED_5,
  NVDS_META_RESERVED_6,
  NVDSINFER_TENSOR_OUTPUT_META = 12,
  /* More types */
} NvDsMetaType;

This means that this user metadata is inference tensor output metadata (NVDSINFER_TENSOR_OUTPUT_META), which contains the raw results of model inference.

4.3 Extract this data

The NvDsInferTensorMeta structure (and which header file defines it) is documented here; there are also comments in my code:
https://docs.nvidia.com/metropolis/deepstream/4.0/dev-guide/DeepStream_Development_Guide/baggage/structNvDsInferTensorMeta.html

static GstPadProbeReturn pose_src_pad_buffer_probe(GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
    g_print("pose_src_pad_buffer_probe called\n");

    // Get the GstBuffer
    GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER(info);
    if (!buf) {
        g_print("Unable to get GstBuffer\n");
        return GST_PAD_PROBE_OK;
    }

    // Get the batch metadata
    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(buf);
    if (!batch_meta) {
        g_print("Unable to get batch metadata\n");
        return GST_PAD_PROBE_OK;
    }

    // Iterate over the metadata of each frame
    for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next) {
        NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)(l_frame->data);

        // For each frame, iterate over its user metadata
        for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list; l_user != NULL; l_user = l_user->next)
        {
            NvDsUserMeta *user_meta = (NvDsUserMeta *)(l_user->data);

            // If the user metadata type is tensor output (12 here), cast it to NvDsInferTensorMeta
            if (user_meta->base_meta.meta_type == 12) {
                NvDsInferTensorMeta *tensor_meta = (NvDsInferTensorMeta *)(user_meta->user_meta_data);
                g_print("Successfully casted user metadata to tensor metadata\n");
            }
        }
    }

    return GST_PAD_PROBE_OK;
}

4.4 Use the casted tensor_meta to get the model's input and output information

This step is done to make sure the data is being read correctly. Since this project uses yolov8-pose, the input is 3x640x640 and the output is 56x8400.

56 = bbox(4) + confidence(1) + cls(0) + keypoints(3 x 17) = 4 + 1 + 0 + 51 = 56

If yolov7-pose were used instead, the output dimension would be 57:

bbox(4) + confidence(1) + cls(1) + keypoints(3 x 17) = 4 + 1 + 1 + 51 = 57
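
Since the output layer is 56x8400 (attributes x candidates) and DeepStream exposes it as a flat buffer, attribute c of candidate i would typically sit at index c * 8400 + i, assuming a row-major, channels-first layout. A tiny helper illustrating this indexing (the helper name, the layout assumption, and the attribute order are mine, not from the original code):

// Assumed layout: out[attr * num_candidates + candidate], with 56 attributes per candidate:
// 0..3 = box (typically cx, cy, w, h for yolov8), 4 = confidence,
// 5..55 = 17 keypoints x (x, y, score).
static inline float get_attr(const float *out, unsigned int num_candidates,
                             unsigned int attr, unsigned int candidate)
{
    return out[attr * num_candidates + candidate];
}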

static GstPadProbeReturn pose_src_pad_buffer_probe(GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
    g_print("pose_src_pad_buffer_probe called\n");

    // Get the GstBuffer
    GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER(info);
    if (!buf) {
        g_print("Unable to get GstBuffer\n");
        return GST_PAD_PROBE_OK;
    }

    // Get the batch metadata
    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(buf);
    if (!batch_meta) {
        g_print("Unable to get batch metadata\n");
        return GST_PAD_PROBE_OK;
    }

    // Iterate over the metadata of each frame
    for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next) {
        NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)(l_frame->data);

        // For each frame, iterate over its user metadata
        for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list; l_user != NULL; l_user = l_user->next)
        {
            NvDsUserMeta *user_meta = (NvDsUserMeta *)(l_user->data);

            // If the user metadata type is tensor output (12 here), cast it to NvDsInferTensorMeta
            if (user_meta->base_meta.meta_type == 12) {
                NvDsInferTensorMeta *tensor_meta = (NvDsInferTensorMeta *)(user_meta->user_meta_data);

                // Get the model's input shape
                NvDsInferNetworkInfo network_info = tensor_meta->network_info;
                g_print("Model input shape: %d x %d x %d\n", network_info.channels, network_info.height, network_info.width);

                // Get the model's output layer information
                for (unsigned int i = 0; i < tensor_meta->num_output_layers; i++) {
                    NvDsInferLayerInfo output_layer_info = tensor_meta->output_layers_info[i];
                    NvDsInferDims dims = output_layer_info.inferDims;
                    g_print("Output layer %d: %s, dimensions: ", i, output_layer_info.layerName);
                    for (unsigned int j = 0; j < dims.numDims; j++) {
                        g_print("%d ", dims.d[j]);
                    }
                    g_print("\n");
                }
            }
        }
    }

    return GST_PAD_PROBE_OK;
}

This matches the engine information reported by TensorRT:

INFO: [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT images          3x640x640       
1   OUTPUT kFLOAT output0         56x8400  

And here is what our probe printed:

Model input shape: 3 x 640 x 640
Output layer 0: output0, dimensions: 56 8400
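
The natural next step (step 6 in section 3, not shown in the original snippets) is to read the raw output data itself. A minimal sketch, assuming output-tensor-meta is enabled on the nvinfer element so that out_buf_ptrs_host holds host-side FP32 buffers, and assuming the 56 x 8400 layout discussed above; the confidence threshold is an illustrative value:

// Inside the "if (user_meta->base_meta.meta_type == 12)" branch, after reading dims:
NvDsInferLayerInfo out_layer = tensor_meta->output_layers_info[0];
float *out = (float *)tensor_meta->out_buf_ptrs_host[0];        // assumes an FP32 host buffer

const unsigned int num_attrs      = out_layer.inferDims.d[0];   // 56
const unsigned int num_candidates = out_layer.inferDims.d[1];   // 8400
const float conf_threshold = 0.25f;                             // illustrative threshold

for (unsigned int i = 0; i < num_candidates; i++) {
    // Attribute c of candidate i is assumed to live at out[c * num_candidates + i]
    float conf = out[4 * num_candidates + i];
    if (conf < conf_threshold)
        continue;

    float cx = out[0 * num_candidates + i];
    float cy = out[1 * num_candidates + i];
    float w  = out[2 * num_candidates + i];
    float h  = out[3 * num_candidates + i];
    g_print("candidate %u: conf=%.2f box=(%.1f, %.1f, %.1f, %.1f)\n", i, conf, cx, cy, w, h);
    // The 17 keypoints would follow at attributes 5..55; non-maximum suppression and
    // scaling the coordinates back to the frame resolution would come next.
}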

Original post: blog.csdn.net/bobchen1017/article/details/131669184