MediaPipe learning notes

Study documents

1. Google MediaPipe: The technical implementation behind on-device machine learning [complete solution] - Jishu Community

2. [Reprint] Google MediaPipe: The technical implementation behind on-device machine learning [complete solution]

3. MediaPipe framework structure - take a look

Handtracking package

1. Mediapipe - Encapsulate Mediapipe handtracking into a dynamic link library dll/so to embed gesture recognition function in desktop applications_HW140701's blog-CSDN blog

2. Mediapipe - Encapsulate Mediapipe handtracking into a dynamic link library dll/so to embed gesture recognition function in desktop applications - StubbornHuang Blog

HolisticTracking package

1. Mediapipe – Encapsulate Mediapipe HolisticTracking into a dynamic link library dll/so to embed full-body joint point recognition, gesture recognition, and hand-up and hand-off detection and recognition functions in desktop applications_HW140701’s Blog-CSDN Blog

2. Mediapipe – Encapsulate Mediapipe HolisticTracking into a dynamic link library dll/so to embed full-body joint point recognition, gesture recognition, and hand-up and hand-off detection and recognition functions in desktop applications - StubbornHuang Blog

Github link:

GitHub - HW140701/GoogleMediapipePackageDll: package google mediapipe hand and holistic tracking into a dynamic link library

Learning notes:

1. Common machine learning pipelines

A common machine learning pipeline runs from data input, through data preprocessing, engine inference, and result rendering, to data output:

1. Image Transform: the incoming video or image must be transformed, e.g. resized or cropped, to a size the model accepts, and possibly rotated;
2. Image To Tensor: the processed image is converted to a type the model recognizes, such as a tensor; if the GPU is used for inference, the CPU tensor must also be converted to a GPU tensor, which involves OpenGL and related operations. If GL is already in use, the previous Image Transform step may itself be done on the GPU;
3. Inference: the core of the pipeline; given the model and the input tensor, it produces the output tensor;
4. Tensor To Landmarks: the output tensor must be decoded into landmark and detection information, such as x and y coordinates, which also takes considerable work;
5. Renderer: once the landmark coordinates are obtained, they are rendered onto the original image, and the final image is shown on the phone screen or in the video.

2. Comparison between common machine learning pipelines and MediaPipe

 

MediaPipe provides many preset Calculators, which fall mainly into four categories:

1. Calculators for processing media data such as images
2. TensorFlow- and TFLite-related Calculators for inference
3. Post-processing Calculators
4. Auxiliary Calculators
 

Calculators are all implemented in C++. To write one, you declare its interface, that is, its input and output packet types, and implement the Open, Process, and Close methods.
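The Open/Process/Close lifecycle can be imitated with a toy base class. This is not the real MediaPipe API (which lives in mediapipe/framework/calculator_framework.h and uses CalculatorContract, CalculatorContext, and absl::Status), just a sketch of the shape:

```cpp
#include <vector>

// Toy stand-in for mediapipe::CalculatorContext: packets in, packets out.
struct CalculatorContext {
  std::vector<double> inputs;   // packets arriving at this timestamp
  std::vector<double> outputs;  // packets emitted at this timestamp
};

// Lifecycle mirrors a real Calculator: Open once, Process per timestamp,
// Close once when the stream ends.
class Calculator {
 public:
  virtual ~Calculator() = default;
  virtual bool Open(CalculatorContext*) { return true; }
  virtual bool Process(CalculatorContext*) = 0;
  virtual bool Close(CalculatorContext*) { return true; }
};

// Example node: doubles every incoming value, analogous to a simple
// media-transform Calculator.
class DoubleCalculator : public Calculator {
 public:
  bool Process(CalculatorContext* cc) override {
    for (double v : cc->inputs) cc->outputs.push_back(v * 2.0);
    return true;
  }
};
```

The framework, not user code, drives these methods: it calls Open when the graph starts, Process each time a new input timestamp is ready, and Close on shutdown.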

3. Source code structure of MediaPipe

The core source tree of MediaPipe is organized as follows:

1. BUILD: the Bazel build file;
2. calculators: the computing units (nodes) of the graph structure;
3. docs: development documentation;
4. examples: MediaPipe application examples;
5. framework: the framework core, including calculator attributes, context environment, data-flow management and scheduling, queues, thread pools, timestamps, etc.;
6. gpu: OpenGL dependency files;
7. graphs: the graph definitions for the various MediaPipe examples (edge detection, face detection, pose tracking, etc.);
8. java: dependencies for Android application development;
9. mediapipe.tulsiproj: related project configuration files;
10. models: the TFLite models for each solution;
11. modules: sample components;
12. objc: Objective-C related files;
13. util: utility code.

The acceleration logic lives mainly in framework. Its source includes the calculator base classes, the calculator data-type definitions, calculator state control, calculator context management, the input and output streams of the graph, scheduler queues, thread pools, timestamp synchronization, and so on. The following mainly analyzes how the scheduler queue (scheduler_queue), thread pool (thread_pool), and timestamp (timestamp) cooperate to synchronize data-flow timestamps by scheduling the data flow, and then drive GPU computation and rendering, thereby maximizing the data throughput of the MediaPipe pipeline.
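The core ordering idea can be sketched in a few lines. This is not the actual mediapipe scheduler (which also handles node priorities, threading, and back-pressure); it just shows how a min-heap keyed on timestamps delivers packets in timestamp order regardless of arrival order:

```cpp
#include <queue>
#include <vector>

// Toy model of a timestamp-ordered scheduler queue, loosely inspired by
// mediapipe's scheduler_queue; real scheduling is far richer.
struct Packet { long timestamp; int value; };

struct LaterTimestampFirst {
  bool operator()(const Packet& a, const Packet& b) const {
    return a.timestamp > b.timestamp;  // min-heap: earliest timestamp on top
  }
};

// Drains packets in timestamp order even if they arrived out of order,
// which is the property downstream calculators rely on.
std::vector<Packet> DrainInOrder(std::vector<Packet> arrivals) {
  std::priority_queue<Packet, std::vector<Packet>, LaterTimestampFirst> q;
  for (const Packet& p : arrivals) q.push(p);
  std::vector<Packet> out;
  while (!q.empty()) { out.push_back(q.top()); q.pop(); }
  return out;
}
```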

4. Some understanding of the MediaPipe framework

(1) Using MediaPipe, a machine learning task can be constructed as a data-flow pipeline represented as a graph of modules, which can include inference models and streaming-media processing functions.

(2) MediaPipe abstracts individual perception models into modules and connects them into maintainable graphs, which makes complex perception pipelines easier to build, reuse, and maintain.

(3) With MediaPipe, the data-stream processing pipeline is built as a graph of modular components, including inference models and media-processing functions. Video and audio streams enter the graph, are processed by the pipeline of functional modules, and the final result data, such as object detections or face landmark annotations, flows out of the graph. A typical input is the video frames captured by a camera; after processing by each module in the graph, the result is output to the display.

(4) MediaPipe does not define the internal structure of a neural network; instead, it specifies a larger-scale processing graph that embeds one or more models.
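As a concrete illustration of such a graph, here is a minimal CalculatorGraphConfig in MediaPipe's protobuf text format, adapted from the framework's hello-world example; it simply chains two PassThroughCalculator nodes between the graph's input and output streams:

```protobuf
# A toy CalculatorGraphConfig (pbtxt). Real solution graphs chain image
# transforms, inference nodes, and renderers in exactly this shape.
input_stream: "in"
output_stream: "out"
node {
  calculator: "PassThroughCalculator"
  input_stream: "in"
  output_stream: "out1"
}
node {
  calculator: "PassThroughCalculator"
  input_stream: "out1"
  output_stream: "out"
}
```

Replacing a PassThroughCalculator with, say, an inference calculator changes the graph's behavior without touching the surrounding pipeline, which is the modularity point (2) and (3) describe.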

5. Understanding MediaPipe build files


cc_binary declares a build rule that compiles and links an executable file.

*_binary rules generate an executable program in the corresponding language: cc_binary produces a C++ executable, and java_binary produces a Java executable.

The executable file name (target name) is given by the name attribute, whose value is a string.
The srcs attribute specifies the source files; its value is a list of strings.

srcs represents source files and deps represents dependencies.

The attribute visibility = ["//visibility:public"] in cc_library makes the library visible to all packages.

Google projects also make heavy use of the absl::Status class (from Abseil) to report whether an operation succeeded; MediaPipe's Calculator methods return it.
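The attributes above can be seen together in a small, hypothetical BUILD file; the target and file names here are invented for illustration:

```python
# A minimal, illustrative Bazel BUILD file (Starlark syntax).
cc_library(
    name = "hand_utils",                   # target name (string)
    srcs = ["hand_utils.cc"],              # source files (list of strings)
    hdrs = ["hand_utils.h"],
    visibility = ["//visibility:public"],  # visible to all packages
)

cc_binary(
    name = "hand_tracking_demo",           # produces an executable
    srcs = ["main.cc"],
    deps = [":hand_utils"],                # dependencies on other targets
)
```

Running `bazel build //path/to/package:hand_tracking_demo` would compile the library, then compile and link the executable against it.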

6. Some files that each MediaPipe solution depends on when compiling

1. Handtracking

2. Face_detection


Origin blog.csdn.net/pingchangxin_6/article/details/128148053