Computer Vision: Driver Fatigue Detection

Table of contents

Preface

Key points explained

Detailed code explanation

Results display

Direction for improvement (yawning fatigue detection method)

Direction for improvement (nodding to detect fatigue)

GUI interface design display


Preface

In the last blog we covered how to detect faces and locate facial key points, including 5-point and 68-point positioning. Once the landmarks are located, we can use them for related tasks such as closed-eye detection, which can be applied to driver fatigue detection, or to remind people who spend long hours at a computer that blinking too rarely may cause dry eyes.

Key points explained

This post mainly explains how to detect fatigued driving through eye closure, so we first need to understand how a computer can decide whether a person's eyes are closed. As the last blog showed, we first have the computer detect the face, and then search for key points within the detected face region. Here we use 68-point landmark detection.

Each eye is described by 6 key points, and with them we can determine whether a blink has occurred.

Looking at the 6 key points of the eye, when the eye is open, the Euclidean distances between points 2 and 6 and between points 3 and 5 are relatively large, while the horizontal distance between points 1 and 4 changes only slightly. This lets us define a formula, the eye aspect ratio (EAR):

                                          EAR=\frac{\lVert P_2-P_6\rVert + \lVert P_3-P_5\rVert}{2\lVert P_1-P_4\rVert}

In terms of the diagram, we take the distance between points 2 and 6 and the distance between points 3 and 5, sum them, and divide by twice the distance between points 1 and 4; all of these are Euclidean (non-negative) distances. The EAR value is therefore larger when the eyes are open and smaller when the eyes are closed. We then pick a threshold ourselves: if EAR stays below this threshold for more than a few consecutive video frames, we assume the driver has closed their eyes.
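
To make the formula concrete, here is a minimal sketch with made-up eye coordinates (they do not come from any real frame), using the same scipy distance call that appears in the code later in this post:

from scipy.spatial import distance as dist

# made-up landmark coordinates, for illustration only
open_eye   = [(0, 0), (10, -6), (20, -6), (30, 0), (20, 6), (10, 6)]   # P1..P6, eye open
closed_eye = [(0, 0), (10, -1), (20, -1), (30, 0), (20, 1), (10, 1)]   # P1..P6, eye nearly closed

def ear(eye):
	A = dist.euclidean(eye[1], eye[5])   # ||P2 - P6||
	B = dist.euclidean(eye[2], eye[4])   # ||P3 - P5||
	C = dist.euclidean(eye[0], eye[3])   # ||P1 - P4||
	return (A + B) / (2.0 * C)

print(ear(open_eye))    # 0.4
print(ear(closed_eye))  # roughly 0.07

With values like these, a threshold such as 0.3 cleanly separates the open and closed states.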

Verification in the original paper shows that this method is quite accurate and robust.

Detailed code explanation

First we import the toolkit, which also includes the toolkit for calculating Euclidean distance.

from scipy.spatial import distance as dist
from collections import OrderedDict
import numpy as np
import argparse
import time
import dlib
import cv2

Next we define the index ranges of the 68 facial landmarks.

FACIAL_LANDMARKS_68_IDXS = OrderedDict([
	("mouth", (48, 68)),
	("right_eyebrow", (17, 22)),
	("left_eyebrow", (22, 27)),
	("right_eye", (36, 42)),
	("left_eye", (42, 48)),
	("nose", (27, 36)),
	("jaw", (0, 17))
])

Here"jaw", (0, 17) represents the key point marks of the chin position, which are 0-17 points respectively.
Then we pass the paths of the required files into the program via command-line arguments: the key point detection model and an optional input video.

ap = argparse.ArgumentParser()
ap.add_argument("-p", "--shape-predictor", required=True,
	help="path to facial landmark predictor")
ap.add_argument("-v", "--video", type=str, default="",
	help="path to input video file")
args = vars(ap.parse_args())
EYE_AR_THRESH = 0.3
EYE_AR_CONSEC_FRAMES = 3

These two parameters are very important. EYE_AR_THRESH is the EAR threshold: above it, the person's eyes are considered open; below it, the driver may be closing their eyes. EYE_AR_CONSEC_FRAMES means the EAR value must stay below the threshold for three or more consecutive frames before we count it as a closed eye. Why three frames? Because a single low frame may just be noise or some other momentary disturbance.
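
As a small illustration of this rule (the EAR sequence below is made up), a single low frame is ignored, while three consecutive low frames register one closure:

ears = [0.35, 0.12, 0.34, 0.11, 0.10, 0.09, 0.33]  # one noisy dip, then a real three-frame closure
counter, total = 0, 0
for ear in ears:
	if ear < 0.3:
		counter += 1
	else:
		if counter >= 3:
			total += 1
		counter = 0
print(total)  # 1 - the single-frame dip is not counted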

COUNTER = 0
TOTAL = 0

Then we set up two counters. Each frame where EAR is below the threshold increments COUNTER by one; when the eyes open again and COUNTER has reached 3 or more, TOTAL is incremented by one, recording one eye-closure event.

print("[INFO] loading facial landmark predictor...")
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(args["shape_predictor"])

Both of these are familiar by now: one is the face detector and the other is the key point predictor; here we simply instantiate them.

(lStart, lEnd) = FACIAL_LANDMARKS_68_IDXS["left_eye"]
(rStart, rEnd) = FACIAL_LANDMARKS_68_IDXS["right_eye"]

Then we take just two ROI index ranges from the landmark table: the left-eye region and the right-eye region.

print("[INFO] starting video stream thread...")
vs = cv2.VideoCapture(args["video"])

Then we read in the video.
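
The script reads from the video file passed on the command line. If you want to test with a webcam instead, one simple variant (not part of the original code) is to fall back to device 0 when no --video path is given:

# hypothetical variant: use the webcam when no video file is supplied
if args["video"] == "":
	vs = cv2.VideoCapture(0)
else:
	vs = cv2.VideoCapture(args["video"])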

while True:
	# preprocessing
	frame = vs.read()[1]
	if frame is None:
		break
	(h, w) = frame.shape[:2]
	width=1200
	r = width / float(w)
	dim = (width, int(h * r))
	frame = cv2.resize(frame, dim, interpolation=cv2.INTER_AREA)
	gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

Here we enlarge the video frame a little; if the frame is too small, faces may not be detected. We set the width to 1200, resize the height in the same proportion, and finally convert the frame to grayscale.

	rects = detector(gray, 0)

Face detection is performed here, returning the rectangle (the four coordinates) of each detected face. Note that the detector is run on the grayscale image.

	for rect in rects:
		# get the landmark coordinates
		shape = predictor(gray, rect)
		shape = shape_to_np(shape)

Here we loop over each detected face rectangle and predict its 68 key points.

def shape_to_np(shape, dtype="int"):
	# create a 68x2 array
	coords = np.zeros((shape.num_parts, 2), dtype=dtype)
	# loop over every key point
	# and record its (x, y) coordinates
	for i in range(0, shape.num_parts):
		coords[i] = (shape.part(i).x, shape.part(i).y)
	return coords

This helper converts the dlib shape object into a NumPy array of (x, y) key point coordinates.

		leftEye = shape[lStart:lEnd]
		rightEye = shape[rStart:rEnd]
		leftEAR = eye_aspect_ratio(leftEye)
		rightEAR = eye_aspect_ratio(rightEye)

Then we calculate the EAR values for the left eye and the right eye separately; the eye_aspect_ratio function here computes the EAR value.

def eye_aspect_ratio(eye):
	# compute the two vertical distances
	A = dist.euclidean(eye[1], eye[5])
	B = dist.euclidean(eye[2], eye[4])
	# compute the horizontal distance
	C = dist.euclidean(eye[0], eye[3])
	# EAR value
	ear = (A + B) / (2.0 * C)
	return ear

Here dist.euclidean computes the Euclidean distance, so the function matches the EAR formula exactly.
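
If you prefer to avoid the scipy dependency, the same EAR can be computed with NumPy alone; this is just an equivalent formulation (the name eye_aspect_ratio_np is only illustrative), not what the script above uses:

def eye_aspect_ratio_np(eye):
	# np is numpy, already imported at the top of the script
	eye = np.asarray(eye, dtype="float")
	A = np.linalg.norm(eye[1] - eye[5])  # vertical distance ||P2 - P6||
	B = np.linalg.norm(eye[2] - eye[4])  # vertical distance ||P3 - P5||
	C = np.linalg.norm(eye[0] - eye[3])  # horizontal distance ||P1 - P4||
	return (A + B) / (2.0 * C)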

		ear = (leftEAR + rightEAR) / 2.0

		# draw the eye regions
		leftEyeHull = cv2.convexHull(leftEye)
		rightEyeHull = cv2.convexHull(rightEye)
		cv2.drawContours(frame, [leftEyeHull], -1, (0, 255, 0), 1)
		cv2.drawContours(frame, [rightEyeHull], -1, (0, 255, 0), 1)

The EAR is computed for the left eye and the right eye and then averaged; the eye regions are then outlined using their convex hulls, one for the left eye and one for the right eye.

		if ear < EYE_AR_THRESH:
			COUNTER += 1

		else:
			# if the eyes were closed for several consecutive frames, count one closure
			if COUNTER >= EYE_AR_CONSEC_FRAMES:
				TOTAL += 1

			# reset the counter
			COUNTER = 0

		# display the counts on the frame
		cv2.putText(frame, "Blinks: {}".format(TOTAL), (10, 30),
			cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
		cv2.putText(frame, "EAR: {:.2f}".format(ear), (300, 30),
			cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

	cv2.imshow("Frame", frame)
	key = cv2.waitKey(10) & 0xFF
 
	if key == 27:
		break

vs.release()
cv2.destroyAllWindows()

Finally, the threshold judgment is applied: if EAR stays below 0.3 for three or more consecutive frames, TOTAL is incremented by one, recording one eye-closure event. The EAR value and the TOTAL count are then drawn onto the video frame, completing the whole detection loop.
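
Assuming the script is saved as, say, detect_blinks.py (the file names here are only examples), it can be run against a test video by passing dlib's 68-point model to --shape-predictor:

python detect_blinks.py --shape-predictor shape_predictor_68_face_landmarks.dat --video test.mp4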

Results display

Direction for improvement (yawning fatigue detection method)

In fatigue detection, relying on blink detection alone may not be accurate enough, so we should combine it with other cues that indicate driver fatigue. Yawning and nodding are two such cues. Let's first consider yawning, and start by looking at the key points of the mouth.

We apply the same idea used for blink detection to the mouth in order to detect whether it is open. The corresponding formula, the mouth aspect ratio (MAR), is:

                                                MAR=\frac{\lVert P_2-P_6\rVert + \lVert P_3-P_5\rVert}{2\lVert P_1-P_4\rVert}

def mouth_aspect_ratio(mouth):
	A = np.linalg.norm(mouth[2] - mouth[9])  # 51, 59
	B = np.linalg.norm(mouth[4] - mouth[7])  # 53, 57
	C = np.linalg.norm(mouth[0] - mouth[6])  # 49, 55
	mar = (A + B) / (2.0 * C)
	return mar

Here we choose six points in the mouth area to determine whether the driver has opened his mouth!

MAR_THRESH = 0.5
MOUTH_AR_CONSEC_FRAMES = 3

We also need to set a threshold, and the explanation is the same as for blink detection.

(mStart, mEnd) = FACIAL_LANDMARKS_68_IDXS["mouth"]

First, we get the corresponding mouth area among the 68 key points.

		mouth = shape[mStart:mEnd]
		mar = mouth_aspect_ratio(mouth)

Then the mouth_aspect_ratio function computes the MAR value, after which we compute the convex hull of the mouth region and draw it.

		mouthHull = cv2.convexHull(mouth)
		cv2.drawContours(frame, [mouthHull], -1, (0, 255, 0), 1)
		left = rect.left()  # draw the face bounding box
		top = rect.top()
		right = rect.right()
		bottom = rect.bottom()
		cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 3)

One addition here: we also draw the face bounding box.

		if mar > MAR_THRESH:  # mouth-open threshold 0.5
			mCOUNTER += 1
			cv2.putText(frame, "Yawning!", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
		else:
			# if the mouth stayed open for at least 3 consecutive frames, count one yawn
			if mCOUNTER >= MOUTH_AR_CONSEC_FRAMES:  # threshold: 3
				mTOTAL += 1
			# reset the mouth frame counter
			mCOUNTER = 0
		cv2.putText(frame, "Yawning: {}".format(mTOTAL), (150, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
		cv2.putText(frame, "mCOUNTER: {}".format(mCOUNTER), (300, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
		cv2.putText(frame, "MAR: {:.2f}".format(mar), (480, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

Then make a judgment and show it in the video!

Direction for improvement (nodding to detect fatigue)

Detection process:
2D face key point detection; 3D face model matching; solving the conversion relationship between 3D points and corresponding 2D points; solving Euler angles based on the rotation matrix.
The pose of an object relative to the camera can be represented by a rotation matrix and a translation vector.
![](https://img-blog.csdnimg.cn/a8286dc98d624f4183eed96daab991e2.png)

1. Euler angles

Simply put, Euler angles are the rotation angles of an object around the three coordinate axes (x, y, z) of a coordinate system.
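
As a small sketch of this idea (illustration only, using one common axis convention; OpenCV's decomposeProjectionMatrix has its own), the three Euler angles can be turned into elementary rotation matrices and composed:

import numpy as np

def rotation_from_euler(pitch, yaw, roll):
	# pitch, yaw, roll in radians: rotations about the x, y and z axes respectively
	Rx = np.array([[1, 0, 0],
	               [0, np.cos(pitch), -np.sin(pitch)],
	               [0, np.sin(pitch), np.cos(pitch)]])
	Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
	               [0, 1, 0],
	               [-np.sin(yaw), 0, np.cos(yaw)]])
	Rz = np.array([[np.cos(roll), -np.sin(roll), 0],
	               [np.sin(roll), np.cos(roll), 0],
	               [0, 0, 1]])
	return Rz @ Ry @ Rx  # one common composition order; conventions differ between libraries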

2. Conversion between world coordinate system and other coordinate systems

Conversion from the world coordinate system to the camera coordinate system:

                                \begin{pmatrix} X\\ Y\\ Z \end{pmatrix}=R\begin{pmatrix} U\\ V\\ W \end{pmatrix}+T=[R|T]\begin{pmatrix} U\\ V\\ W\\ 1 \end{pmatrix}

Conversion from camera coordinate system to pixel coordinate system:

                                   s\begin{pmatrix} u\\ v\\ 1 \end{pmatrix}=\begin{pmatrix} f_x& 0& c_x\\ 0& f_y& c_y\\ 0& 0& 1\end{pmatrix}\begin{pmatrix} X\\ Y\\ Z \end{pmatrix}

Therefore, the relationship between the pixel coordinate system and the world coordinate system is as follows:

                                 s\begin{pmatrix} u\\ v\\ 1 \end{pmatrix}=\begin{pmatrix} f_x& 0& c_x\\ 0& f_y& c_y\\ 0& 0& 1\end{pmatrix}[R|T]\begin{pmatrix} U\\ V\\ W\\ 1 \end{pmatrix}
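
Putting the two equations together: a point in world coordinates is first moved into the camera frame by [R|T] and then projected by the intrinsic matrix. A minimal numerical sketch (the K, R and T here are made up, not the calibration values defined below):

import numpy as np

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])      # made-up intrinsics
R = np.eye(3)                        # no rotation
T = np.array([[0.0], [0.0], [10.0]]) # 10 units in front of the camera

Pw = np.array([[1.0], [2.0], [0.0]]) # a 3D point in world coordinates
Pc = R @ Pw + T                      # world -> camera coordinates
uv = K @ Pc                          # camera -> homogeneous pixel coordinates
u, v = uv[0, 0] / uv[2, 0], uv[1, 0] / uv[2, 0]
print(u, v)  # 380.0 360.0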

Then we define the 3D reference model points according to the paper:

object_pts = np.float32([[6.825897, 6.760612, 4.402142],  # 33 left eyebrow, upper-left corner
                         [1.330353, 7.122144, 6.903745],  # 29 left eyebrow, right corner
                         [-1.330353, 7.122144, 6.903745], # 34 right eyebrow, left corner
                         [-6.825897, 6.760612, 4.402142], # 38 right eyebrow, upper-right corner
                         [5.311432, 5.485328, 3.987654],  # 13 left eye, upper-left corner
                         [1.789930, 5.393625, 4.413414],  # 17 left eye, upper-right corner
                         [-1.789930, 5.393625, 4.413414], # 25 right eye, upper-left corner
                         [-5.311432, 5.485328, 3.987654], # 21 right eye, upper-right corner
                         [2.005628, 1.409845, 6.165652],  # 55 nose, upper-left corner
                         [-2.005628, 1.409845, 6.165652], # 49 nose, upper-right corner
                         [2.774015, -2.080775, 5.048531], # 43 mouth, upper-left corner
                         [-2.774015, -2.080775, 5.048531],# 39 mouth, upper-right corner
                         [0.000000, -3.116408, 6.097667], # 45 mouth, lower center
                         [0.000000, -7.415691, 4.070434]])# 6 chin corner

K = [6.5308391993466671e+002, 0.0, 3.1950000000000000e+002,
     0.0, 6.5308391993466671e+002, 2.3950000000000000e+002,
     0.0, 0.0, 1.0]  # equivalent to the matrix [fx, 0, cx; 0, fy, cy; 0, 0, 1]
# image (uv) coordinate system: camera distortion parameters [k1, k2, p1, p2, k3]
D = [7.0834633684407095e-002, 6.9140193737175351e-002, 0.0, 0.0, -1.3073460323689292e+000]
reprojectsrc = np.float32([[10.0, 10.0, 10.0],
                           [10.0, 10.0, -10.0],
                           [10.0, -10.0, -10.0],
                           [10.0, -10.0, 10.0],
                           [-10.0, 10.0, 10.0],
                           [-10.0, 10.0, -10.0],
                           [-10.0, -10.0, -10.0],
                           [-10.0, -10.0, 10.0]])
# the 12 edges used to draw the reference cube
line_pairs = [[0, 1], [1, 2], [2, 3], [3, 0],
              [4, 5], [5, 6], [6, 7], [7, 4],
              [0, 4], [1, 5], [2, 6], [3, 7]]

Here reprojectsrc holds the eight corners of a reference cube and line_pairs lists the twelve edges connecting them; both are used later to draw the projected cube on the frame.

cam_matrix = np.array(K).reshape(3, 3).astype(np.float32)
dist_coeffs = np.array(D).reshape(5, 1).astype(np.float32)

Here we reshape K into the 3x3 camera matrix and D into the 5x1 distortion coefficient vector.

import math  # needed for the angle conversions below (not included in the earlier import block)

def get_head_pose(shape):  # head pose estimation
	# fill in the 2D reference points (pixel coordinates); numbering follows https://ibug.doc.ic.ac.uk/resources/300-W/
	# 17 left eyebrow upper-left / 21 left eyebrow right corner / 22 right eyebrow upper-left / 26 right eyebrow upper-right /
	# 36 left eye upper-left / 39 left eye upper-right / 42 right eye upper-left / 45 right eye upper-right /
	# 31 nose upper-left / 35 nose upper-right / 48 mouth left corner / 54 mouth right corner / 57 mouth lower center / 8 chin
	image_pts = np.float32([shape[17], shape[21], shape[22], shape[26], shape[36],
							shape[39], shape[42], shape[45], shape[31], shape[35],
							shape[48], shape[54], shape[57], shape[8]])
	# solvePnP computes the pose, i.e. the rotation and translation:
	# rotation_vec is the rotation vector, translation_vec the translation vector; cam_matrix corresponds to K, dist_coeffs to D
	_, rotation_vec, translation_vec = cv2.solvePnP(object_pts, image_pts, cam_matrix, dist_coeffs)
	# projectPoints reprojects the cube corners into the image (inputs: 3D points, intrinsics, distortion, r, t; output: 2D points)
	reprojectdst, _ = cv2.projectPoints(reprojectsrc, rotation_vec, translation_vec, cam_matrix, dist_coeffs)
	reprojectdst = tuple(map(tuple, reprojectdst.reshape(8, 2)))  # reshape to 8 rows x 2 columns

	# calculate the Euler angles
	# see https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#decomposeprojectionmatrix
	rotation_mat, _ = cv2.Rodrigues(rotation_vec)  # Rodrigues formula (converts the rotation vector into a rotation matrix)
	pose_mat = cv2.hconcat((rotation_mat, translation_vec))  # horizontal concatenation (vconcat would be vertical)
	# decomposeProjectionMatrix decomposes the projection matrix into a camera matrix, a rotation matrix and Euler angles
	_, _, _, _, _, _, euler_angle = cv2.decomposeProjectionMatrix(pose_mat)

	pitch, yaw, roll = [math.radians(_) for _ in euler_angle]

	pitch = math.degrees(math.asin(math.sin(pitch)))
	roll = -math.degrees(math.asin(math.sin(roll)))
	yaw = math.degrees(math.asin(math.sin(yaw)))
	print('pitch:{}, yaw:{}, roll:{}'.format(pitch, yaw, roll))

	return reprojectdst, euler_angle  # reprojected cube corners, Euler angles

Here we pick 14 of the 68 landmarks as 2D image points, solve for the head pose that maps the 3D reference points onto them, and finally obtain the Euler angles through OpenCV, which lets us judge whether the driver is nodding.

HAR_THRESH = 0.3
NOD_AR_CONSEC_FRAMES = 3
hCOUNTER = 0
hTOTAL = 0

Similarly here we also need to set a threshold and counter!

		reprojectdst, euler_angle = get_head_pose(shape)
		har = euler_angle[0, 0]  # take the pitch rotation angle
		if har > HAR_THRESH:  # nod threshold 0.3
			hCOUNTER += 1
		else:
			# if the head stayed down for at least 3 consecutive frames, count one drowsy nod
			if hCOUNTER >= NOD_AR_CONSEC_FRAMES:  # threshold: 3
				hTOTAL += 1
			# reset the nod frame counter
			hCOUNTER = 0

		# draw the 12 edges of the reference cube
		for start, end in line_pairs:
			cv2.line(frame, (int(reprojectdst[start][0]),int(reprojectdst[start][1])), (int(reprojectdst[end][0]),int(reprojectdst[end][1])), (0, 0, 255))
		# display the angle results
		cv2.putText(frame, "X: " + "{:7.2f}".format(euler_angle[0, 0]), (10, 90), cv2.FONT_HERSHEY_SIMPLEX, 0.75,
					(0, 255, 0), thickness=2)  # GREEN
		cv2.putText(frame, "Y: " + "{:7.2f}".format(euler_angle[1, 0]), (150, 90), cv2.FONT_HERSHEY_SIMPLEX, 0.75,
					(255, 0, 0), thickness=2)  # BLUE
		cv2.putText(frame, "Z: " + "{:7.2f}".format(euler_angle[2, 0]), (300, 90), cv2.FONT_HERSHEY_SIMPLEX, 0.75,
					(0, 0, 255), thickness=2)  # RED
		cv2.putText(frame, "Nod: {}".format(hTOTAL), (450, 90), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 0), 2)

		for (x, y) in shape:
			cv2.circle(frame, (x, y), 1, (0, 0, 255), -1)

	if TOTAL >= 50 or mTOTAL >= 15:
		cv2.putText(frame, "SLEEP!!!", (100, 200), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 3)

Here again we perform the threshold judgments and display the information in the video.
The final effect is as follows:

GUI interface design display

If you find the blogger's articles useful, feel free to follow; a like, favorite, and share would be even better. That is the greatest support you can give me!
