[Hand tracking] Reproducing the hands solution from google/mediapipe

Reference links:

1) github code link: https://github.com/google/mediapipe

2) Documentation: https://google.github.io/mediapipe

3) Python environment configuration document: https://google.github.io/mediapipe/getting_started/python

4) Documentation for simple API calls: https://google.github.io/mediapipe/solutions/hands#python-solution-api

0. Environment preparation

Python environment configuration document: https://google.github.io/mediapipe/getting_started/python

ubuntu20.04
cuda11.2
python3.8
opencv-python==4.1.2.30
mediapipe==0.8.2

sudo apt install -y protobuf-compiler
sudo apt install -y cmake
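
With the system packages in place, the Python packages can be installed with pip, pinned to the versions listed above so they match the code in this post:

pip install mediapipe==0.8.2 opencv-python==4.1.2.30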

1. Introduction

A brief note: almost everything you need is in the documentation at the first link. In Python, you simply install the mediapipe package from PyPI and call its API.

Documentation: https://google.github.io/mediapipe

github code link: https://github.com/google/mediapipe

2. Experiment

The code is taken from the Python solution API documentation: https://google.github.io/mediapipe/solutions/hands#python-solution-api

The code below is copied directly from the official documentation so that you can modify it yourself. It shows two ways of reading input. The first reads image file names from a list; in practice you can rewrite this to scan a directory and collect the files into a list (see the sketch right below). The second reads from a webcam; this can likewise be rewritten to read a video file directly, which is handy on a desktop machine without a camera (see the sketch after the code block).
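
Since file_list is not defined in the official snippet, here is a minimal sketch of how to build it, assuming the test images live in a hypothetical ./images directory:

import glob

# Collect the image files in the directory into the list the example expects.
file_list = sorted(glob.glob('./images/*.jpg') + glob.glob('./images/*.png'))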

import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

# For static images:
hands = mp_hands.Hands(
    static_image_mode=True,
    max_num_hands=2,
    min_detection_confidence=0.5)
for idx, file in enumerate(file_list):
  # Read an image and flip it around the y-axis for correct handedness output
  # (the model expects mirrored, selfie-style input).
  image = cv2.flip(cv2.imread(file), 1)
  # Convert the BGR image to RGB before processing.
  results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

  # Print handedness and draw hand landmarks on the image.
  print('Handedness:', results.multi_handedness)
  if not results.multi_hand_landmarks:
    continue
  image_height, image_width, _ = image.shape
  annotated_image = image.copy()
  for hand_landmarks in results.multi_hand_landmarks:
    print('hand_landmarks:', hand_landmarks)
    print(
        f'Index finger tip coordinates: (',
        f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * image_width}, '
        f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * image_height})'
    )
    mp_drawing.draw_landmarks(
        annotated_image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
  cv2.imwrite(
      '/tmp/annotated_image' + str(idx) + '.png', cv2.flip(annotated_image, 1))
hands.close()

# For webcam input:
hands = mp_hands.Hands(
    min_detection_confidence=0.5, min_tracking_confidence=0.5)
cap = cv2.VideoCapture(0)
while cap.isOpened():
  success, image = cap.read()
  if not success:
    print("Ignoring empty camera frame.")
    # If loading a video, use 'break' instead of 'continue'.
    continue

  # Flip the image horizontally for a later selfie-view display, and convert
  # the BGR image to RGB.
  image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
  # To improve performance, optionally mark the image as not writeable to
  # pass by reference.
  image.flags.writeable = False
  results = hands.process(image)

  # Draw the hand annotations on the image.
  image.flags.writeable = True
  image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
  if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
      mp_drawing.draw_landmarks(
          image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
  cv2.imshow('MediaPipe Hands', image)
  if cv2.waitKey(5) & 0xFF == 27:
    break
hands.close()
cap.release()
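
To read a video file instead of the webcam, only the capture source and the empty-frame handling change; the processing loop stays the same. A minimal sketch, assuming a hypothetical file input.mp4 (the selfie flip is dropped because a recorded video is not mirrored):

import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

hands = mp_hands.Hands(
    min_detection_confidence=0.5, min_tracking_confidence=0.5)
cap = cv2.VideoCapture('input.mp4')  # hypothetical video path instead of camera index 0
while cap.isOpened():
  success, image = cap.read()
  if not success:
    break  # end of the video file
  # MediaPipe expects RGB input; OpenCV reads frames as BGR.
  results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
  if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
      mp_drawing.draw_landmarks(
          image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
  cv2.imshow('MediaPipe Hands', image)
  if cv2.waitKey(5) & 0xFF == 27:  # press Esc to quit
    break
hands.close()
cap.release()
cv2.destroyAllWindows()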

The result of running the test:

3. Interpretation of hand_landmarks in the output

Taking the landmark output as an example: a hand has 21 key points. For each one, x and y are normalized by the image width and height respectively, and z is an estimate of the landmark depth with the wrist as the origin (smaller values are closer to the camera).

hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x

Combining the expression above with the HandLandmark names, you can read out the coordinates of any of the 21 key points.
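
For example, a small sketch that converts the normalized index finger tip back to pixel coordinates, reusing the image and results variables from the static-image loop above:

# Landmarks of the first detected hand.
hand_landmarks = results.multi_hand_landmarks[0]
tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
image_height, image_width, _ = image.shape
# x and y are normalized, so scale them back to pixels; z stays relative.
x_px, y_px = int(tip.x * image_width), int(tip.y * image_height)
print('Index finger tip (px):', x_px, y_px, 'relative depth:', tip.z)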


Origin blog.csdn.net/qq_35975447/article/details/113933444