Fatigue detection - detecting closed eyes (detailed code tutorial)

Introduction

Drowsiness is common while driving, and it endangers both the driver and everyone else on the road. A system that can detect drowsiness in real time would therefore be of great practical value.

Implementation steps

Idea: Fatigued drivers tend to doze off, so we judge whether the driver is drowsy based on how often, and for how long, their eyes stay closed.

Detailed implementation steps

【1】Eye key point detection.

(Figure: MediaPipe Face Mesh eye landmark reference)

We use Face Mesh to detect eye key points, and Face Mesh returns 468 face key points:
Since we focus on driver drowsiness detection, among the 468 points, we only need landmark points belonging to the eye area. There are 32 landmark points (16 points each) in the eye area. To calculate the EAR, we only need 12 points (6 points for each eye).

The above picture is used as a reference. The 12 selected landmark points are as follows:

For left eye: [362, 385, 387, 263, 373, 380]

For right eye: [33, 160, 158, 133, 153, 144]

The selected landmark points are arranged in order: P1, P2, P3, P4, P5, P6.

```python
import cv2
import numpy as np
import matplotlib.pyplot as plt
import mediapipe as mp

mp_facemesh = mp.solutions.face_mesh
mp_drawing  = mp.solutions.drawing_utils
denormalize_coordinates = mp_drawing._normalized_to_pixel_coordinates

%matplotlib inline
```

Get the landmark (index) points for both eyes:

```python
# Landmark points corresponding to the left eye
all_left_eye_idxs = list(mp_facemesh.FACEMESH_LEFT_EYE)
# Flatten and remove duplicates
all_left_eye_idxs = set(np.ravel(all_left_eye_idxs))

# Landmark points corresponding to the right eye
all_right_eye_idxs = list(mp_facemesh.FACEMESH_RIGHT_EYE)
all_right_eye_idxs = set(np.ravel(all_right_eye_idxs))

# Combined for plotting - landmark points for both eyes
all_idxs = all_left_eye_idxs.union(all_right_eye_idxs)

# The chosen 12 points:   P1,  P2,  P3,  P4,  P5,  P6
chosen_left_eye_idxs  = [362, 385, 387, 263, 373, 380]
chosen_right_eye_idxs = [33,  160, 158, 133, 153, 144]
all_chosen_idxs = chosen_left_eye_idxs + chosen_right_eye_idxs
```

【2】Detect whether the eyes are closed - calculate the eye aspect ratio (EAR).

To detect whether the eyes are closed, we use the Eye Aspect Ratio (EAR) formula:

EAR = (||P2 - P6|| + ||P3 - P5||) / (2 * ||P1 - P4||)

The EAR formula returns a single scalar that reflects how open the eye is:

  1. We will use Mediapipe's Face Mesh solution to detect and retrieve the relevant landmarks in the eye region (points P1-P6 in the figure below).
  2. From these points, the eye aspect ratio (EAR) is computed as the ratio between the height and the width of the eye.
    The EAR is almost constant while the eye is open and falls close to zero as the eye closes. It is partially insensitive to the individual person and to head pose. The aspect ratio of an open eye shows only small differences between individuals, and it is fully invariant to uniform scaling of the image and in-plane rotation of the face. Since both eyes blink simultaneously, the EAR of the two eyes is averaged.
    (Figure: detected eye landmarks and the EAR signal over a video sequence)

Top: landmarks Pi detected in frames with open and closed eyes.

Bottom: the eye aspect ratio (EAR) plotted over several frames of a video sequence; each blink appears as a sharp dip.

First, we must calculate the Eye Aspect Ratio for each eye:

|| · || denotes the L2 norm, used here to compute the distance between two landmark points.

To calculate the final EAR value, the authors recommend taking the average of the two EAR values.

EAR_avg = (EAR_left + EAR_right) / 2

Generally speaking, the average EAR value is in the range [0.0, 0.40]. The EAR value decreases rapidly during the "eyes-closed" movement.
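As a quick sanity check on the formula, the EAR can be computed by hand from six hypothetical pixel coordinates (the points below are invented for illustration, ordered P1-P6 with P1 and P4 at the eye corners):

```python
# Hypothetical (x, y) pixel coordinates for one open eye, ordered P1..P6.
pts = [(100, 50), (110, 45), (122, 45), (132, 50), (122, 55), (110, 55)]

def dist(a, b):
    """Euclidean (L2) distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

# EAR = (||P2 - P6|| + ||P3 - P5||) / (2 * ||P1 - P4||)
ear = (dist(pts[1], pts[5]) + dist(pts[2], pts[4])) / (2.0 * dist(pts[0], pts[3]))
print(round(ear, 2))  # vertical gaps of 10 px against a width of 32 px -> 0.31
```

Shrinking the two vertical gaps toward zero (closing the eye) drives the numerator, and hence the EAR, toward zero, which is exactly the drop the detector relies on.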

Now that we are familiar with the EAR formula, let's define the three required functions: distance(…), get_ear(…), and calculate_avg_ear(…).

```python
def distance(point_1, point_2):
    """Calculate the L2 norm (Euclidean distance) between two points."""
    dist = sum([(i - j) ** 2 for i, j in zip(point_1, point_2)]) ** 0.5
    return dist
```
The get_ear() function takes the .landmark attribute as an argument. At each index position there is a NormalizedLandmark object, which holds the normalized x, y, and z coordinate values.
```python
def get_ear(landmarks, refer_idxs, frame_width, frame_height):
    """
    Calculate Eye Aspect Ratio for one eye.

    Args:
        landmarks: (list) Detected landmarks list
        refer_idxs: (list) Index positions of the chosen landmarks
                           in order P1, P2, P3, P4, P5, P6
        frame_width: (int) Width of captured frame
        frame_height: (int) Height of captured frame

    Returns:
        ear: (float) Eye aspect ratio
    """
    try:
        # Convert the normalized landmarks to pixel coordinates
        coords_points = []
        for i in refer_idxs:
            lm = landmarks[i]
            coord = denormalize_coordinates(lm.x, lm.y,
                                            frame_width, frame_height)
            coords_points.append(coord)

        # Euclidean distances between the vertical and horizontal landmark pairs
        P2_P6 = distance(coords_points[1], coords_points[5])
        P3_P5 = distance(coords_points[2], coords_points[4])
        P1_P4 = distance(coords_points[0], coords_points[3])

        # Compute the eye aspect ratio
        ear = (P2_P6 + P3_P5) / (2.0 * P1_P4)

    except Exception:
        ear = 0.0
        coords_points = None

    return ear, coords_points
```

Finally, the calculate_avg_ear(…) function is defined:

```python
def calculate_avg_ear(landmarks, left_eye_idxs, right_eye_idxs, image_w, image_h):
    """Calculate the average Eye Aspect Ratio over both eyes."""

    left_ear, left_lm_coordinates = get_ear(
        landmarks, left_eye_idxs, image_w, image_h
    )
    right_ear, right_lm_coordinates = get_ear(
        landmarks, right_eye_idxs, image_w, image_h
    )
    Avg_EAR = (left_ear + right_ear) / 2.0

    return Avg_EAR, (left_lm_coordinates, right_lm_coordinates)
```

Let's test the EAR formula. We will calculate the average EAR value of the previously used image and another image with the eyes closed.

```python
image_eyes_open  = cv2.imread("test-open-eyes.jpg")[:, :, ::-1]   # BGR -> RGB
image_eyes_close = cv2.imread("test-close-eyes.jpg")[:, :, ::-1]

for idx, image in enumerate([image_eyes_open, image_eyes_close]):

    image = np.ascontiguousarray(image)
    imgH, imgW, _ = image.shape

    # Creating a copy of the original image for plotting the EAR value
    custom_chosen_lmk_image = image.copy()

    # Running inference using static_image_mode
    with mp_facemesh.FaceMesh(refine_landmarks=True) as face_mesh:
        results = face_mesh.process(image).multi_face_landmarks

        # If detections are available.
        if results:
            for face_id, face_landmarks in enumerate(results):
                landmarks = face_landmarks.landmark
                EAR, _ = calculate_avg_ear(
                    landmarks,
                    chosen_left_eye_idxs,
                    chosen_right_eye_idxs,
                    imgW,
                    imgH
                )

                # Print the EAR value on the custom_chosen_lmk_image.
                cv2.putText(custom_chosen_lmk_image,
                            f"EAR: {round(EAR, 2)}", (1, 24),
                            cv2.FONT_HERSHEY_COMPLEX,
                            0.9, (255, 255, 255), 2
                )

                # plot() is a visualization helper defined elsewhere in the
                # original article; it draws the face mesh and the chosen
                # eye landmarks next to the annotated image.
                plot(img_dt=image.copy(),
                     img_eye_lmks_chosen=custom_chosen_lmk_image,
                     face_landmarks=face_landmarks,
                     ts_thickness=1,
                     ts_circle_radius=3,
                     lmk_circle_radius=3
                )
```

Result:


As you can see, the EAR value is 0.28 when the eyes are open and 0.08 when the eyes are closed (close to zero).
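Given those two readings, any cutoff between 0.08 and 0.28 separates open from closed eyes. The threshold below is an assumed value for illustration, not one prescribed by the article:

```python
EAR_THRESH = 0.15  # assumed cutoff, chosen between the measured 0.28 (open) and 0.08 (closed)

def eyes_closed(avg_ear, thresh=EAR_THRESH):
    """Return True when the averaged EAR indicates closed eyes."""
    return avg_ear < thresh

print(eyes_closed(0.28))  # open-eye reading   -> False
print(eyes_closed(0.08))  # closed-eye reading -> True
```

In practice the cutoff should be tuned per camera and subject, since absolute EAR values vary slightly between individuals.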

【3】Design a real-time detection system.


First, we declare two thresholds and a counter:

  • EAR_THRESH: the threshold used to check whether the current EAR value indicates closed eyes (EAR < EAR_THRESH).
  • D_TIME: a counter variable that tracks how long the condition EAR < EAR_THRESH has currently been true.
  • WAIT_TIME: the maximum time for which EAR < EAR_THRESH may hold before the alarm is raised.

When the application starts, we record the current time (in seconds) in a variable t1 and read the incoming frame.

Next, each frame is preprocessed and passed through Mediapipe's Face Mesh pipeline.

  • If landmark detections are available, we retrieve the relevant eye landmarks (Pi). Otherwise, we reset t1 and D_TIME (resetting here keeps the algorithm consistent).
  • With the retrieved eye landmarks, the average EAR value over both eyes is calculated.
  • If EAR < EAR_THRESH, we record the current time t2 and add the difference t2 - t1 to D_TIME; then we set t1 = t2 for the next frame.
  • If D_TIME >= WAIT_TIME, we raise the alarm; otherwise we move on to the next frame.
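The bullet points above can be sketched as a small, testable update function. EAR_THRESH, WAIT_TIME, and the simulated frame data below are assumptions for illustration; in the real application, the EAR for each webcam frame would come from calculate_avg_ear and the timestamps from time.perf_counter():

```python
EAR_THRESH = 0.15  # assumed threshold; tune for your camera and subject
WAIT_TIME = 2.0    # assumed limit: seconds of continuous eye closure before the alarm

def update_drowsy_timer(ear, d_time, t1, t2, ear_thresh=EAR_THRESH):
    """Accumulate closed-eye time across frames.

    ear is the average EAR for the frame, or None when no face was detected.
    Returns the updated (d_time, t1) pair.
    """
    if ear is not None and ear < ear_thresh:
        d_time += t2 - t1  # eyes closed: add the elapsed time
    else:
        d_time = 0.0       # eyes open or no detection: reset the counter
    return d_time, t2      # t2 becomes t1 for the next frame

# Simulated (ear, timestamp) pairs: the eyes close at t = 1.0 and stay closed.
frames = [(0.30, 0.0), (0.28, 0.5), (0.10, 1.0), (0.09, 1.5), (0.08, 2.0), (0.07, 3.5)]
d_time, t1 = 0.0, frames[0][1]
for ear, t2 in frames:
    d_time, t1 = update_drowsy_timer(ear, d_time, t1, t2)
    if d_time >= WAIT_TIME:
        print(f"ALARM at t={t2}: eyes closed for {d_time:.1f} s")
```

In the live loop, t2 would be taken with time.perf_counter() on every frame, and the alarm branch would trigger a sound rather than a print.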


Origin blog.csdn.net/ALiLiLiYa/article/details/132515440