Mediapipe – Encapsulating Mediapipe HolisticTracking into a dynamic link library (dll/so) to embed whole-body joint point detection, gesture recognition, and arm raise/lower detection into desktop applications.

1 Mediapipe HolisticTracking

Mediapipe Holistic Tracking can track and detect a total of 543 key points on the body, face, and hands in real time: 468 landmarks on the face, 33 key points on the body, and 21 key points on each hand.

For the storage order of all the key points and the annotated index map, please refer to my other article: https://www.stubbornhuang.com/1916/

Test program for the packaged dll (screenshot).

1.1 The purpose of encapsulating Mediapipe Holistic Tracking

Previously, I tried to encapsulate Mediapipe's hand tracking module, HandTracking, as a dynamic library, essentially to perform gesture recognition and arm raise/lower detection based on HandTracking alone. Gesture recognition worked fine, but the raise/lower detection mainly compared the position of the wrist joint against the height of the video frame, so arm raise/lower detection based on HandTracking was not very accurate.

Given this problem, and building on the earlier experience of encapsulating HandTracking, if Mediapipe's HolisticTracking function can be encapsulated so that whole-body joint points are detected directly, then arm raise/lower detection only needs to compare the y coordinates of the wrist and the elbow, which is undoubtedly much more accurate.

2 Building the Mediapipe C++ compilation environment on Windows

You can refer to my article:

Compile Mediapipe Windows C++ environment.

If you can get through the above compilation process, the build environment is set up correctly and you can move on to the following steps.

3 Encapsulating Mediapipe HolisticTracking

The dll source code, build files, and test programs are open-sourced on GitHub:

Stars are welcome!

The bazel build files and source files required for this project are located in the repository's dll/holistic_tracking_dll folder, and the test project for the packaged dll is the MediapipeTest.sln project under dll_use_example.

3.1 dll interface design

The dll interface design is as follows:

#ifndef HOLISTIC_TRACKING_API_H
#define HOLISTIC_TRACKING_API_H

#define EXPORT

/* Define the dynamic link library (dll) export macro */
#include <malloc.h>
#ifdef _WIN32
#ifdef EXPORT
#define EXPORT_API __declspec(dllexport)
#else
#define EXPORT_API __declspec(dllimport)
#endif
#else
#include <stdlib.h>

#ifdef EXPORT
#define EXPORT_API __attribute__((visibility ("default")))
#else
#endif

#endif


#ifdef __cplusplus
extern "C" {
#endif

#ifndef EXPORT_API
#define EXPORT_API
#endif

	/*
	@brief Initialize Google Mediapipe
	@param[in] model_path Path of the graph/model file to load
	@return Operation result
		0 failure
		1 success
	*/
	EXPORT_API int MediapipeHolisticTrackingInit(const char* model_path);

	/*
	@brief Detect a video frame
	@param[in] image_width Video frame width
	@param[in] image_height Video frame height
	@param[in] image_data Video frame data
	@param[out] detect_result Detection results (arm and gesture codes)
	@param[in] show_result_image Whether to display the result image
	@return Operation result
		0 failure
		1 success
	*/
	EXPORT_API int MediapipeHolisticTrackingDetectFrameDirect(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image = false);

	/*
	@brief Detect from the camera
	@param[in] show_image Whether to display the result image
	@return Operation result
		0 failure
		1 success
	*/
	EXPORT_API int MediapipeHolisticTrackingDetectCamera(bool show_image = false);

	/*
	@brief Release Google Mediapipe
	@return Operation result
		0 failure
		1 success
	*/
	EXPORT_API int MediapipeHolisticTrackingRelease();


#ifdef __cplusplus
}
#endif 

#endif // !HOLISTIC_TRACKING_API_H

3.2 dll calling process

First, pass the path of the Holistic graph/model file to the MediapipeHolisticTrackingInit interface function to initialize the Holistic graph.

Then call the MediapipeHolisticTrackingDetectFrameDirect interface function for each OpenCV video frame that needs to be detected; the detection results are returned through the given int array pointer. If you need the coordinates of the joint points of each part, you can refer to the project code and add a new detection interface that returns coordinate data.

Alternatively, the MediapipeHolisticTrackingDetectCamera interface function opens the camera with OpenCV inside the dll and runs holistic tracking directly; this function is only intended for testing.

Finally, after all video frames have been processed, release the graph through the MediapipeHolisticTrackingRelease function.
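
Putting these steps together, a minimal caller might look like the sketch below. It assumes the application links against the dll's import library, that OpenCV is available on the calling side, and that holistic_tracking_cpu.pbtxt sits next to the executable (adjust the path to your setup); the four-element result array matches the indices written by the implementation shown later (left arm, right arm, left hand gesture, right hand gesture).

#include <opencv2/opencv.hpp>
#include "HolisticTrackingApi.h"

int main()
{
	// Initialize the graph with the holistic tracking pbtxt file (path is an assumption).
	if (MediapipeHolisticTrackingInit("holistic_tracking_cpu.pbtxt") != 1)
		return -1;

	cv::VideoCapture capture(0);
	cv::Mat frame;
	int detect_result[4] = { 0 };	// left arm, right arm, left hand gesture, right hand gesture

	while (capture.read(frame) && !frame.empty())
	{
		// The dll expects a BGR frame; it converts to RGB and flips internally.
		if (MediapipeHolisticTrackingDetectFrameDirect(frame.cols, frame.rows, frame.data, detect_result, true) != 1)
			break;
		// detect_result now holds the arm and gesture codes for this frame.
	}

	MediapipeHolisticTrackingRelease();
	return 0;
}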

3.3 Encapsulation process

3.3.1 How to determine the graph's input and output data streams

For the CPU version of holistic tracking, the holistic_tracking_cpu.pbtxt file can be found in the mediapipe\mediapipe\graphs\holistic_tracking folder of Mediapipe's official repository.

The contents of the file are as follows:

# Tracks and renders pose + hands + face landmarks.

# CPU image. (ImageFrame)
input_stream: "input_video"

# CPU image with rendered results. (ImageFrame)
output_stream: "output_video"

# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for downstream nodes
# (calculators and subgraphs) in the graph to finish their tasks before it
# passes through another image. All images that come in while waiting are
# dropped, limiting the number of in-flight images in most part of the graph to
# 1. This prevents the downstream nodes from queuing up incoming images and data
# excessively, which leads to increased latency and memory usage, unwanted in
# real-time mobile applications. It also eliminates unnecessarily computation,
# e.g., the output produced by a node may get dropped downstream if the
# subsequent nodes are still busy processing previous inputs.
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:output_video"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
  node_options: {
    [type.googleapis.com/mediapipe.FlowLimiterCalculatorOptions] {
      max_in_flight: 1
      max_in_queue: 1
      # Timeout is disabled (set to 0) as first frame processing can take more
      # than 1 second.
      in_flight_timeout: 0
    }
  }
}

node {
  calculator: "HolisticLandmarkCpu"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "POSE_LANDMARKS:pose_landmarks"
  output_stream: "POSE_ROI:pose_roi"
  output_stream: "POSE_DETECTION:pose_detection"
  output_stream: "FACE_LANDMARKS:face_landmarks"
  output_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
  output_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
}

# Gets image size.
node {
  calculator: "ImagePropertiesCalculator"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "SIZE:image_size"
}

# Converts pose, hands and face landmarks to a render data vector.
node {
  calculator: "HolisticTrackingToRenderData"
  input_stream: "IMAGE_SIZE:image_size"
  input_stream: "POSE_LANDMARKS:pose_landmarks"
  input_stream: "POSE_ROI:pose_roi"
  input_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
  input_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
  input_stream: "FACE_LANDMARKS:face_landmarks"
  output_stream: "RENDER_DATA_VECTOR:render_data_vector"
}

# Draws annotations and overlays them on top of the input images.
node {
  calculator: "AnnotationOverlayCalculator"
  input_stream: "IMAGE:throttled_input_video"
  input_stream: "VECTOR:render_data_vector"
  output_stream: "IMAGE:output_video"
}

Lines 3-7 of holistic_tracking_cpu.pbtxt show that the graph's original input stream is input_video and that its final output stream is output_video. Lines 39-48 of the file then contain the following node:

node {
  calculator: "HolisticLandmarkCpu"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "POSE_LANDMARKS:pose_landmarks"
  output_stream: "POSE_ROI:pose_roi"
  output_stream: "POSE_DETECTION:pose_detection"
  output_stream: "FACE_LANDMARKS:face_landmarks"
  output_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
  output_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
}

This node exposes the output streams we need for the key points of the body, face, and hands: POSE_LANDMARKS, FACE_LANDMARKS, LEFT_HAND_LANDMARKS, and RIGHT_HAND_LANDMARKS. With this information, we can attach the output streams we need to the graph when initializing the Holistic graph.

3.3.2 Code implementation

3.3.2.1 Holistic tracking key point detection class

HolisticTrackingDetect.h

#ifndef HOLISTIC_TRACKING_DETECT_H
#define HOLISTIC_TRACKING_DETECT_H

#include <cstdlib>
#include "absl/flags/flag.h"
#include "absl/flags/parse.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/port/file_helpers.h"
#include "mediapipe/framework/port/opencv_highgui_inc.h"
#include "mediapipe/framework/port/opencv_imgproc_inc.h"
#include "mediapipe/framework/port/opencv_video_inc.h"
#include "mediapipe/framework/port/parse_text_proto.h"
#include "mediapipe/framework/port/status.h"

#include "mediapipe/framework/formats/detection.pb.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/formats/rect.pb.h"

namespace GoogleMediapipeDetect {

	class HolisticTrackingDetect
	{
	public:
		HolisticTrackingDetect();
		virtual~HolisticTrackingDetect();

	public:
		int InitModel(const char* model_path);
		int DetectImageDirect(int image_width, int image_height, void* image_data ,int* detect_result,bool show_result_image = false);
		int DetectCamera(bool show_image = false);
		int Release();

	private:
		absl::Status Mediapipe_InitGraph(const char* model_path);
		absl::Status Mediapipe_RunMPPGraph_Direct(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image = false);
		absl::Status Mediapipe_RunMPPGraph_Camera(bool show_image = false);
		absl::Status Mediapipe_ReleaseGraph();

	private:
		bool m_bIsInit;
		bool m_bIsRelease;

		mediapipe::CalculatorGraph m_Graph;

		const char* m_Video_InputStreamName;

		const char* m_Video_OutputStreamName;
		const char* m_PoseLandmarks_OutputStreamName;
		const char* m_LeftHandLandmarks_OutputStreamName;
		const char* m_RightHandLandmarks_OutputStreamName;
		const char* m_FaceLandmarks_OutputStreamName;
		

		std::unique_ptr<mediapipe::OutputStreamPoller> m_pVideoPoller;
		std::unique_ptr<mediapipe::OutputStreamPoller> m_pPoseLandmarksPoller;
		std::unique_ptr<mediapipe::OutputStreamPoller> m_pLeftHandLandmarksPoller;
		std::unique_ptr<mediapipe::OutputStreamPoller> m_pRightHandLandmarksPoller;
		std::unique_ptr<mediapipe::OutputStreamPoller> m_pFaceLandmarksPoller;
	};
}

#endif // !HOLISTIC_TRACKING_DETECT_H

HolisticTrackingDetect.cpp

#include <vector>

#include "HolisticTrackingDetect.h"
#include "GestureRecognition.h"
#include "ArmUpAndDownRecognition.h"

GoogleMediapipeDetect::HolisticTrackingDetect::HolisticTrackingDetect()
{
	m_bIsInit = false;
	m_bIsRelease = false;

	m_Video_InputStreamName = "input_video";

	m_Video_OutputStreamName = "output_video";
	m_PoseLandmarks_OutputStreamName = "pose_landmarks";
	m_LeftHandLandmarks_OutputStreamName = "left_hand_landmarks";
	m_RightHandLandmarks_OutputStreamName = "right_hand_landmarks";
	m_FaceLandmarks_OutputStreamName = "face_landmarks";

	m_pVideoPoller = nullptr;
	m_pPoseLandmarksPoller = nullptr;
	m_pLeftHandLandmarksPoller = nullptr;
	m_pRightHandLandmarksPoller = nullptr;
	m_pFaceLandmarksPoller = nullptr;
}

GoogleMediapipeDetect::HolisticTrackingDetect::~HolisticTrackingDetect()
{
	if (m_bIsInit && !m_bIsRelease)
	{
		Release();
	}
}

int GoogleMediapipeDetect::HolisticTrackingDetect::InitModel(const char* model_path)
{
	absl::Status run_status = Mediapipe_InitGraph(model_path);
	if (!run_status.ok())
		return 0;
	m_bIsInit = true;
	return  1;
}

int GoogleMediapipeDetect::HolisticTrackingDetect::DetectImageDirect(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image)
{
	if (!m_bIsInit)
		return 0;

	absl::Status run_status = Mediapipe_RunMPPGraph_Direct(image_width, image_height, image_data, detect_result, show_result_image);
	if (!run_status.ok()) {
		return 0;
	}
	return 1;
}

int GoogleMediapipeDetect::HolisticTrackingDetect::DetectCamera(bool show_image)
{
	if (!m_bIsInit)
		return 0;

	absl::Status run_status = Mediapipe_RunMPPGraph_Camera(show_image);
	if (!run_status.ok()) {
		return 0;
	}
	return 1;

}

int GoogleMediapipeDetect::HolisticTrackingDetect::Release()
{
	absl::Status run_status = Mediapipe_ReleaseGraph();
	if (!run_status.ok()) {
		return 0;
	}
	m_bIsRelease = true;
	return 1;
}

absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_InitGraph(const char* model_path)
{
	std::string calculator_graph_config_contents;
	MP_RETURN_IF_ERROR(mediapipe::file::GetContents(model_path, &calculator_graph_config_contents));
	std::cout << "mediapipe::file::GetContents success" << std::endl;

	mediapipe::CalculatorGraphConfig config =
		mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(
			calculator_graph_config_contents);

	MP_RETURN_IF_ERROR(m_Graph.Initialize(config));
	std::cout << "m_Graph.Initialize(config) success" << std::endl;

	// 1 Video output
	auto videoOutputStream = m_Graph.AddOutputStreamPoller(m_Video_OutputStreamName);
	assert(videoOutputStream.ok());
	m_pVideoPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(videoOutputStream.value()));

	// 2 PoseLandmarks output
	mediapipe::StatusOrPoller poseLandmarks = m_Graph.AddOutputStreamPoller(m_PoseLandmarks_OutputStreamName);
	assert(poseLandmarks.ok());
	m_pPoseLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(poseLandmarks.value()));

	// 3 LeftHandLandmarks output
	mediapipe::StatusOrPoller leftHandLandmarks = m_Graph.AddOutputStreamPoller(m_LeftHandLandmarks_OutputStreamName);
	assert(leftHandLandmarks.ok());
	m_pLeftHandLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(leftHandLandmarks.value()));

	// 4 RightHandLandmarks output
	mediapipe::StatusOrPoller rightHandLandmarks = m_Graph.AddOutputStreamPoller(m_RightHandLandmarks_OutputStreamName);
	assert(rightHandLandmarks.ok());
	m_pRightHandLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(rightHandLandmarks.value()));

	// 5 FaceLandmarks output
	mediapipe::StatusOrPoller faceLandmarks = m_Graph.AddOutputStreamPoller(m_FaceLandmarks_OutputStreamName);
	assert(faceLandmarks.ok());
	m_pFaceLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(faceLandmarks.value()));

	MP_RETURN_IF_ERROR(m_Graph.StartRun({}));
	std::cout << "----------------Graph StartRun Success---------------------" << std::endl;
	return absl::OkStatus();
}

absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_RunMPPGraph_Direct(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image)
{
	/*----- 1 Construct a cv::Mat from the raw frame data -----*/
	cv::Mat camera_frame(cv::Size(image_width, image_height), CV_8UC3, (uchar*)image_data);
	cv::cvtColor(camera_frame, camera_frame, cv::COLOR_BGR2RGB);
	// Flip the input image horizontally
	cv::flip(camera_frame, camera_frame, 1);
	//std::cout << "cv::Mat object constructed" << std::endl;

	/*----- 2 Convert the OpenCV Mat to a mediapipe::ImageFrame -----*/
	auto input_frame = absl::make_unique<mediapipe::ImageFrame>(
		mediapipe::ImageFormat::SRGB, camera_frame.cols, camera_frame.rows,
		mediapipe::ImageFrame::kDefaultAlignmentBoundary);
	cv::Mat input_frame_mat = mediapipe::formats::MatView(input_frame.get());
	camera_frame.copyTo(input_frame_mat);
	//std::cout << "OpenCV Mat converted to ImageFrame" << std::endl;

	/*----- 3 Send the frame into the graph for inference -----*/
	size_t frame_timestamp_us =
		(double)cv::getTickCount() / (double)cv::getTickFrequency() * 1e6;

	MP_RETURN_IF_ERROR(m_Graph.AddPacketToInputStream(
		m_Video_InputStreamName, mediapipe::Adopt(input_frame.release())
		.At(mediapipe::Timestamp(frame_timestamp_us))));
	//std::cout << "Frame sent to the graph for inference" << std::endl;

	/*----- 4 Fetch the results -----*/

	// 1 Video output frame
	mediapipe::Packet packet;
	if (!m_pVideoPoller->Next(&packet))
	{
		return absl::InvalidArgumentError("no next packet");
	}
	if (show_result_image)
	{
		// Get the mediapipe::ImageFrame result from the video output stream
		auto& output_frame = packet.Get<mediapipe::ImageFrame>();

		// Convert the mediapipe::ImageFrame to a cv::Mat
		cv::Mat output_frame_mat = mediapipe::formats::MatView(&output_frame);

		// Display the cv::Mat result
		cv::cvtColor(output_frame_mat, output_frame_mat, cv::COLOR_RGB2BGR);
		cv::Mat dst;
		cv::resize(output_frame_mat, dst, cv::Size(output_frame_mat.cols, output_frame_mat.rows));
		cv::imshow("MediapipeHolistic", dst);
		cv::waitKey(1);
	}

	// 2 PoseLandmarks
	mediapipe::Packet poseLandmarksPacket;
	int left_arm_result = (int)ArmUpDown::NoResult;
	int right_arm_result = (int)ArmUpDown::NoResult;
	if (m_pPoseLandmarksPoller->QueueSize() != 0)
	{
		if (m_pPoseLandmarksPoller->Next(&poseLandmarksPacket))
		{
			auto& output_landmarks = poseLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
			//std::cout << "PoseLandmarks size:" << output_landmarks.landmark_size() << std::endl;

			std::vector<Point2D> posePoints;
			posePoints.clear();

			for (int i = 0; i < output_landmarks.landmark_size(); ++i)
			{
				Point2D tempPoint2D;
				const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
				float x = landmark.x() * camera_frame.cols;
				float y = landmark.y() * camera_frame.rows;
				tempPoint2D.x = x;
				tempPoint2D.y = y;

				posePoints.emplace_back(tempPoint2D);
			}

			ArmUpAndDownRecognition armUpAndDownRecognition;
			armUpAndDownRecognition.RecognizeProcess(posePoints, left_arm_result, right_arm_result);
			//std::cout << "Arm up/down recognition result: " << left_arm_result << " " << right_arm_result << std::endl;
		}
	}
	detect_result[0] = left_arm_result;
	detect_result[1] = right_arm_result;

	// 3 LeftHandLandmarks
	mediapipe::Packet leftHandLandmarksPacket;
	int leftHandDetectResult = Gesture::NoGesture;
	if (m_pLeftHandLandmarksPoller->QueueSize() > 0)
	{
		if (m_pLeftHandLandmarksPoller->Next(&leftHandLandmarksPacket))
		{
			auto& output_landmarks = leftHandLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
			//std::cout << "LeftHandLandmarks size:" << output_landmarks.landmark_size() << std::endl;

			std::vector<Point2D> singleGesturePoints;
			singleGesturePoints.clear();

			for (int i = 0; i < output_landmarks.landmark_size(); ++i)
			{
				Point2D tempPoint2D;
				const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
				float x = landmark.x() * camera_frame.cols;
				float y = landmark.y() * camera_frame.rows;
				tempPoint2D.x = x;
				tempPoint2D.y = y;

				singleGesturePoints.emplace_back(tempPoint2D);
			}

			GestureRecognition gestureRecognition;
			leftHandDetectResult = gestureRecognition.RecognizeProcess(singleGesturePoints);
			//std::cout << "Left hand gesture recognition result: " << leftHandDetectResult << std::endl;
		}
	}
	detect_result[2] = leftHandDetectResult;

	// 4 RightHandLandmarks
	mediapipe::Packet rightHandLandmarksPacket;
	int rightHandDetectResult = Gesture::NoGesture;
	if (m_pRightHandLandmarksPoller->QueueSize() > 0)
	{
		if (m_pRightHandLandmarksPoller->Next(&rightHandLandmarksPacket))
		{
			auto& output_landmarks = rightHandLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
			//std::cout << "RightHandLandmarks size:" << output_landmarks.landmark_size() << std::endl;

			std::vector<Point2D> singleGesturePoints;
			singleGesturePoints.clear();

			for (int i = 0; i < output_landmarks.landmark_size(); ++i)
			{
				Point2D tempPoint2D;
				const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
				float x = landmark.x() * camera_frame.cols;
				float y = landmark.y() * camera_frame.rows;
				tempPoint2D.x = x;
				tempPoint2D.y = y;

				singleGesturePoints.emplace_back(tempPoint2D);
			}

			GestureRecognition gestureRecognition;
			rightHandDetectResult = gestureRecognition.RecognizeProcess(singleGesturePoints);
			//std::cout << "Right hand gesture recognition result: " << rightHandDetectResult << std::endl;
		}
	}
	detect_result[3] = rightHandDetectResult;

	// 5 FaceLandmarks
	//mediapipe::Packet faceLandmarksPacket;
	//if (m_pFaceLandmarksPoller->QueueSize() > 0)
	//{
	//	if (m_pFaceLandmarksPoller->Next(&faceLandmarksPacket))
	//	{
	//		auto& output_landmarks = faceLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
	//		std::cout << "FaceLandmarks size:" << output_landmarks.landmark_size() << std::endl;

	//		for (int i = 0; i < output_landmarks.landmark_size(); ++i)
	//		{
	//			const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
	//			float x = landmark.x() * camera_frame.cols;
	//			float y = landmark.y() * camera_frame.rows;
	//		}
	//	}
	//}

	return absl::OkStatus();
}


absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_RunMPPGraph_Camera(bool show_image)
{
	std::string cvWindowName = "MediapipeHolistic";

	// Open the camera with OpenCV
	cv::VideoCapture capture(0);
	if (!capture.isOpened())
	{
		return absl::InvalidArgumentError("cv camera is not open");
	}

	bool grab_frames = true;
	while (grab_frames) {

		// Grab a video frame from the camera
		cv::Mat camera_frame_raw;
		capture >> camera_frame_raw;
		if (camera_frame_raw.empty())
			break;

		cv::Mat camera_frame;
		cv::cvtColor(camera_frame_raw, camera_frame, cv::COLOR_BGR2RGB);
		cv::flip(camera_frame, camera_frame, 1);

		// Convert the OpenCV Mat to a mediapipe::ImageFrame
		auto input_frame = absl::make_unique<mediapipe::ImageFrame>(
			mediapipe::ImageFormat::SRGB, camera_frame.cols, camera_frame.rows,
			mediapipe::ImageFrame::kDefaultAlignmentBoundary);
		cv::Mat input_frame_mat = mediapipe::formats::MatView(input_frame.get());
		camera_frame.copyTo(input_frame_mat);

		// Send the frame into the graph for inference
		size_t frame_timestamp_us =
			(double)cv::getTickCount() / (double)cv::getTickFrequency() * 1e6;

		MP_RETURN_IF_ERROR(m_Graph.AddPacketToInputStream(
			m_Video_InputStreamName, mediapipe::Adopt(input_frame.release())
			.At(mediapipe::Timestamp(frame_timestamp_us))));

		// Fetch the inference result
		mediapipe::Packet packet;
		if (!m_pVideoPoller->Next(&packet)) break;

		if (show_image)
		{
			// Get the mediapipe::ImageFrame result from the video output stream
			auto& output_frame = packet.Get<mediapipe::ImageFrame>();

			// Convert the mediapipe::ImageFrame to a cv::Mat
			cv::Mat output_frame_mat = mediapipe::formats::MatView(&output_frame);

			// Display the cv::Mat result
			cv::cvtColor(output_frame_mat, output_frame_mat, cv::COLOR_RGB2BGR);
			cv::Mat dst;
			cv::resize(output_frame_mat, dst, cv::Size(output_frame_mat.cols / 2, output_frame_mat.rows / 2));
			cv::imshow(cvWindowName, dst);
			cv::waitKey(1);
		}
	}
	if (show_image)
		cv::destroyWindow(cvWindowName);

	return absl::OkStatus();
}

absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_ReleaseGraph()
{
	MP_RETURN_IF_ERROR(m_Graph.CloseInputStream(m_Video_InputStreamName));
	MP_RETURN_IF_ERROR(m_Graph.CloseInputStream(m_Video_OutputStreamName));
	MP_RETURN_IF_ERROR(m_Graph.CloseInputStream(m_PoseLandmarks_OutputStreamName));
	MP_RETURN_IF_ERROR(m_Graph.CloseInputStream(m_LeftHandLandmarks_OutputStreamName));
	MP_RETURN_IF_ERROR(m_Graph.CloseInputStream(m_RightHandLandmarks_OutputStreamName));
	MP_RETURN_IF_ERROR(m_Graph.CloseInputStream(m_FaceLandmarks_OutputStreamName));

	return m_Graph.WaitUntilDone();
}

Look carefully at the code of the detection class above: following the pbtxt file described in Section 3.3.1, the output streams we need are attached to the graph in the Mediapipe_InitGraph function that initializes the graph, and their packets are then processed in the detection functions Mediapipe_RunMPPGraph_Direct and Mediapipe_RunMPPGraph_Camera.

3.3.2.2 Gesture detection class

GestureRecognition.h

#ifndef GESTURE_RECOGNITION_H
#define GESTURE_RECOGNITION_H

#include <vector>

#include "TrackingDataStructure.h"

namespace GoogleMediapipeDetect {

	class GestureRecognition
	{
	public:
		GestureRecognition();
		virtual~GestureRecognition();

	public:
		int RecognizeProcess(const std::vector<Point2D>& single_hand_joint_points);

	private:
		float Vector2DAngle(const Vector2D& vec1, const Vector2D& vec2);
	};
}

#endif // !GESTURE_RECOGNITION_H

GestureRecognition.cpp

#include "GestureRecognition.h"

GoogleMediapipeDetect::GestureRecognition::GestureRecognition()
{
}

GoogleMediapipeDetect::GestureRecognition::~GestureRecognition()
{
}

int GoogleMediapipeDetect::GestureRecognition::RecognizeProcess(const std::vector<Point2D>& single_hand_joint_points)
{
	if (single_hand_joint_points.size() != 21)
		return Gesture::NoGesture;

	// Thumb angle
	Vector2D thumb_vec1;
	thumb_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[2].x;
	thumb_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[2].y;

	Vector2D thumb_vec2;
	thumb_vec2.x = single_hand_joint_points[3].x - single_hand_joint_points[4].x;
	thumb_vec2.y = single_hand_joint_points[3].y - single_hand_joint_points[4].y;

	float thumb_angle = Vector2DAngle(thumb_vec1, thumb_vec2);
	//std::cout << "thumb_angle = " << thumb_angle << std::endl;
	//std::cout << "thumb.y = " << single_hand_joint_vector[0].y << std::endl;


	// Index finger angle
	Vector2D index_vec1;
	index_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[6].x;
	index_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[6].y;

	Vector2D index_vec2;
	index_vec2.x = single_hand_joint_points[7].x - single_hand_joint_points[8].x;
	index_vec2.y = single_hand_joint_points[7].y - single_hand_joint_points[8].y;

	float index_angle = Vector2DAngle(index_vec1, index_vec2);
	//std::cout << "index_angle = " << index_angle << std::endl;


	// Middle finger angle
	Vector2D middle_vec1;
	middle_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[10].x;
	middle_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[10].y;

	Vector2D middle_vec2;
	middle_vec2.x = single_hand_joint_points[11].x - single_hand_joint_points[12].x;
	middle_vec2.y = single_hand_joint_points[11].y - single_hand_joint_points[12].y;

	float middle_angle = Vector2DAngle(middle_vec1, middle_vec2);
	//std::cout << "middle_angle = " << middle_angle << std::endl;


	// Ring finger angle
	Vector2D ring_vec1;
	ring_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[14].x;
	ring_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[14].y;

	Vector2D ring_vec2;
	ring_vec2.x = single_hand_joint_points[15].x - single_hand_joint_points[16].x;
	ring_vec2.y = single_hand_joint_points[15].y - single_hand_joint_points[16].y;

	float ring_angle = Vector2DAngle(ring_vec1, ring_vec2);
	//std::cout << "ring_angle = " << ring_angle << std::endl;

	// Pinky (little finger) angle
	Vector2D pink_vec1;
	pink_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[18].x;
	pink_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[18].y;

	Vector2D pink_vec2;
	pink_vec2.x = single_hand_joint_points[19].x - single_hand_joint_points[20].x;
	pink_vec2.y = single_hand_joint_points[19].y - single_hand_joint_points[20].y;

	float pink_angle = Vector2DAngle(pink_vec1, pink_vec2);
	//std::cout << "pink_angle = " << pink_angle << std::endl;


	// Classify the gesture based on the finger angles
	float angle_threshold = 65;
	float thumb_angle_threshold = 40;

	int result = -1;
	if ((thumb_angle > thumb_angle_threshold) && (index_angle > angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
		result = Gesture::Fist;
	else if ((thumb_angle > 5) && (index_angle < angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
		result = Gesture::One;
	else if ((thumb_angle > thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
		result = Gesture::Two;
	else if ((thumb_angle > thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle > angle_threshold))
		result = Gesture::Three;
	else if ((thumb_angle > thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle < angle_threshold))
		result = Gesture::Four;
	else if ((thumb_angle < thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle < angle_threshold))
		result = Gesture::Five;
	else if ((thumb_angle < thumb_angle_threshold) && (index_angle > angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle < angle_threshold))
		result = Gesture::Six;
	else if ((thumb_angle < thumb_angle_threshold) && (index_angle > angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
		result = Gesture::ThumbUp;
	else if ((thumb_angle > 5) && (index_angle > angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle < angle_threshold))
		result = Gesture::Ok;
	else
		result = Gesture::NoGesture;

	return result;
}

float GoogleMediapipeDetect::GestureRecognition::Vector2DAngle(const Vector2D& vec1, const Vector2D& vec2)
{
	double PI = 3.141592653;
	float t = (vec1.x * vec2.x + vec1.y * vec2.y) / (sqrt(pow(vec1.x, 2) + pow(vec1.y, 2)) * sqrt(pow(vec2.x, 2) + pow(vec2.y, 2)));
	float angle = acos(t) * (180 / PI);
	return angle;
}

3.3.2.3 Arm raising and lowering detection class

ArmUpAndDownRecognition.h

#ifndef ARM_UP_AND_DOWN_RECOGNITION_H
#define ARM_UP_AND_DOWN_RECOGNITION_H

#include <vector>

#include "TrackingDataStructure.h"

namespace GoogleMediapipeDetect {

	class ArmUpAndDownRecognition
	{
	public:
		ArmUpAndDownRecognition();
		virtual~ArmUpAndDownRecognition();

	public:
		bool RecognizeProcess(const std::vector<Point2D>& pose_joint_points,int& left_arm_result,int& right_arm_result);
	};
}

#endif // !ARM_UP_AND_DOWN_RECOGNITION_H

ArmUpAndDownRecognition.cpp

#include "ArmUpAndDownRecognition.h"

GoogleMediapipeDetect::ArmUpAndDownRecognition::ArmUpAndDownRecognition()
{
}

GoogleMediapipeDetect::ArmUpAndDownRecognition::~ArmUpAndDownRecognition()
{
}

bool GoogleMediapipeDetect::ArmUpAndDownRecognition::RecognizeProcess(const std::vector<Point2D>& pose_joint_points, int& left_arm_result, int& right_arm_result)
{
	if (pose_joint_points.size() != 33)
		return false;

	Point2D left_elbow = pose_joint_points[13];
	Point2D right_elbow = pose_joint_points[14];

	Point2D left_wrist = pose_joint_points[15];
	Point2D right_wrist = pose_joint_points[16];

	// Detect the left arm
	if (left_wrist.y > left_elbow.y)
	{
		left_arm_result = (int)ArmUpDown::ArmDown;
	}
	else if (left_wrist.y < left_elbow.y)
	{
		left_arm_result = (int)ArmUpDown::ArmUp;
	}
	else
	{
		left_arm_result = (int)ArmUpDown::NoResult;
	}

	// Detect the right arm (compare the right wrist against the right elbow)
	if (right_wrist.y > right_elbow.y)
	{
		right_arm_result = (int)ArmUpDown::ArmDown;
	}
	else if (right_wrist.y < right_elbow.y)
	{
		right_arm_result = (int)ArmUpDown::ArmUp;
	}
	else
	{
		right_arm_result = (int)ArmUpDown::NoResult;
	}

	return true;
}

For the other classes used by the project, please refer to the GitHub repository.
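
One of those classes is TrackingDataStructure.h, which declares the Point2D, Vector2D, Gesture, and ArmUpDown types used throughout the listings above. It is not reproduced here; the sketch below is a hypothetical reconstruction inferred purely from how these types are used, and the actual member names and enum values are the ones defined in the repository.

// Hypothetical reconstruction of TrackingDataStructure.h, inferred from usage above.
#ifndef TRACKING_DATA_STRUCTURE_H
#define TRACKING_DATA_STRUCTURE_H

namespace GoogleMediapipeDetect {

	// 2D point in pixel coordinates (normalized landmark scaled by image size).
	struct Point2D
	{
		float x = 0.0f;
		float y = 0.0f;
	};

	// 2D vector used for the finger angle computation.
	struct Vector2D
	{
		float x = 0.0f;
		float y = 0.0f;
	};

	// Gesture codes returned by GestureRecognition::RecognizeProcess.
	// The actual numeric values are those defined in the repository.
	enum Gesture
	{
		NoGesture = -1,
		One,
		Two,
		Three,
		Four,
		Five,
		Six,
		ThumbUp,
		Ok,
		Fist
	};

	// Arm states returned by ArmUpAndDownRecognition::RecognizeProcess.
	enum ArmUpDown
	{
		NoResult = -1,
		ArmUp,
		ArmDown
	};
}

#endif // !TRACKING_DATA_STRUCTURE_H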

4 Compiling and debugging the project with bazel

4.1 Compilation

4.1.1 File locations and the bazel BUILD file

Create a new folder named holistic_tracking_dll in the mediapipe\mediapipe\examples\desktop directory of the Mediapipe repository.

Copy all files under dll/holistic_tracking_dll of the Github project to the mediapipe\mediapipe\examples\desktop\holistic_tracking_dll folder.

The bazel BUILD file used to compile the project is as follows:

# Copyright 2020 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

licenses(["notice"])

package(default_visibility = ["//mediapipe/examples:__subpackages__"])

cc_binary(
    name = "holistic_tracking_cpu",
    deps = [
        "//mediapipe/examples/desktop:demo_run_graph_main",
        "//mediapipe/graphs/holistic_tracking:holistic_tracking_cpu_graph_deps",
    ],
)

cc_binary(
    name = "MediapipeHolisticTracking",
    srcs = [
        "HolisticTrackingApi.h",
        "HolisticTrackingApi.cpp",
        "HolisticTrackingDetect.h",
        "HolisticTrackingDetect.cpp",
        "GestureRecognition.h",
        "GestureRecognition.cpp",
        "TrackingDataStructure.h",
        "ArmUpAndDownRecognition.h",
        "ArmUpAndDownRecognition.cpp",
    ],
    linkshared = True,
    deps = [
        "//mediapipe/graphs/holistic_tracking:holistic_tracking_cpu_graph_deps",
    ],
)


# Linux only
cc_binary(
    name = "holistic_tracking_gpu",
    deps = [
        "//mediapipe/examples/desktop:demo_run_graph_main_gpu",
        "//mediapipe/graphs/holistic_tracking:holistic_tracking_gpu_deps",
    ],
)

Of these, the target

cc_binary(
    name = "MediapipeHolisticTracking",
    srcs = [
        "HolisticTrackingApi.h",
        "HolisticTrackingApi.cpp",
        "HolisticTrackingDetect.h",
        "HolisticTrackingDetect.cpp",
        "GestureRecognition.h",
        "GestureRecognition.cpp",
        "TrackingDataStructure.h",
        "ArmUpAndDownRecognition.h",
        "ArmUpAndDownRecognition.cpp",
    ],
    linkshared = True,
    deps = [
        "//mediapipe/graphs/holistic_tracking:holistic_tracking_cpu_graph_deps",
    ],
)

is the build configuration for the dll produced by this project.

4.1.2 Compilation

4.1.2.1 Release mode

Example command:

bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 --action_env PYTHON_BIN_PATH="D:\\Anaconda\\python.exe" mediapipe/examples/desktop/holistic_tracking_dll:MediapipeHolisticTracking --verbose_failures

4.1.2.2 Debug mode

Example command:

bazel build -c dbg --define MEDIAPIPE_DISABLE_GPU=1 --action_env PYTHON_BIN_PATH="D:\\Anaconda\\python.exe" mediapipe/examples/desktop/holistic_tracking_dll:MediapipeHolisticTracking --verbose_failures

The dll compiled in Debug mode contains debug symbols, so you can set breakpoints in Visual Studio and debug it directly.

4.1.2.3 Compilation output directory

If the compilation goes well, the MediapipeHolisticTracking.dll file will be generated in the mediapipe\bazel-bin\mediapipe\examples\desktop\holistic_tracking_dll directory.
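
To quickly verify the generated dll from a separate desktop application without an import library, you can also load it at runtime. The following is a minimal sketch using the Win32 LoadLibraryA/GetProcAddress API; the exported function names come from HolisticTrackingApi.h above, while the dll path and graph file path are placeholders you would adapt to your own environment.

#include <windows.h>
#include <iostream>

// Function pointer types matching the exported C interface.
typedef int (*InitFunc)(const char*);
typedef int (*ReleaseFunc)();

int main()
{
	HMODULE module = LoadLibraryA("MediapipeHolisticTracking.dll");
	if (module == nullptr)
	{
		std::cout << "Failed to load MediapipeHolisticTracking.dll" << std::endl;
		return -1;
	}

	InitFunc init = (InitFunc)GetProcAddress(module, "MediapipeHolisticTrackingInit");
	ReleaseFunc release = (ReleaseFunc)GetProcAddress(module, "MediapipeHolisticTrackingRelease");
	if (init != nullptr && release != nullptr)
	{
		// Placeholder path to the holistic tracking graph file.
		if (init("holistic_tracking_cpu.pbtxt") == 1)
		{
			// ... call MediapipeHolisticTrackingDetectFrameDirect on each frame here ...
			release();
		}
	}

	FreeLibrary(module);
	return 0;
}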

4.2 Debugging

This section describes how to debug the dynamic library.

First, compile in Debug mode as described in Section 4.1.2.2. After compilation, both the MediapipeHolisticTracking.dll and MediapipeHolisticTracking.pdb files will be generated under mediapipe\bazel-bin\mediapipe\examples\desktop\holistic_tracking_dll; the PDB file provides the debug symbols needed for source-level debugging.

Then copy the MediapipeHolisticTracking.dll and MediapipeHolisticTracking.pdb files to the directory of the application that calls the dll, set breakpoints in the source files, run the caller in debug mode, and the breakpoints will be hit so you can debug as usual.

If you are interested, you can visit my personal website: https://www.stubbornhuang.com/

Original article: https://blog.csdn.net/HW140701/article/details/122606320