Jetson Nano in Practice series: face recognition with the ResNet deep residual network in dlib, accelerated on the GPU

0. Preface

  This article describes how to use the deep residual network (ResNet) in dlib to implement real-time face recognition. The basic development environment is as follows:

Installed software    Version
CUDA                  10.2.89
cuDNN                 8.0.0.180
OpenCV                4.4.0
TensorFlow            2.3.1
JetPack               4.4.1
Platform              Jetson Nano

  I previously tried face detection with OpenCV and face recognition with the face_recognition module built on dlib, but the recognition accuracy was not ideal, especially for Asian faces, which were easily recognized as the same person. This article instead uses the deep residual network (ResNet) in dlib to implement face recognition. Note that this article does not cover building or training the deep residual network itself; it uses the related pre-trained models to implement this function.

1. Resource preparation

  Download the related models and weights from the official dlib file server: http://dlib.net/files/. The three files loaded below are the ones we need.

detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')            # CNN face detector
sp       = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')               # 68-point landmark predictor
facerec  = dlib.face_recognition_model_v1('dlib_face_recognition_resnet_model_v1.dat') # ResNet feature extractor
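
  The model files are distributed as .bz2 archives. Below is a minimal sketch for fetching and unpacking the three files with the Python standard library (file names as listed on dlib.net/files):

import bz2
import urllib.request

MODEL_FILES = [
    'mmod_human_face_detector.dat',
    'shape_predictor_68_face_landmarks.dat',
    'dlib_face_recognition_resnet_model_v1.dat',
]

for name in MODEL_FILES:
    # Download the compressed model from the dlib file server...
    urllib.request.urlretrieve('http://dlib.net/files/%s.bz2' % name, name + '.bz2')
    # ...then decompress it next to the script.
    with bz2.open(name + '.bz2') as src, open(name, 'wb') as dst:
        dst.write(src.read())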

2. Coding


2.1 Classifying face data and saving feature vectors and labels locally

  The pre-trained ResNet model is used to extract the feature data of each face, and the face feature data and corresponding name labels are saved to local files for later real-time face recognition. What exactly is the face feature vector? I am still not entirely clear on that; I only know that it is a 128-dimensional embedding that expresses a person's facial features, and that descriptors of the same person lie close together in Euclidean distance.
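
  As a quick illustration of that idea, here is a minimal sketch of the comparison itself; the two descriptors are random placeholders here, while in practice they come from facerec.compute_face_descriptor(), and dlib's model documentation suggests a threshold of about 0.6:

import numpy as np

# Two placeholder 128-D descriptors; in practice these come from
# facerec.compute_face_descriptor() on two face images.
d1 = np.random.rand(128)
d2 = np.random.rand(128)

distance = np.linalg.norm(d1 - d2)   # Euclidean distance between the embeddings
print('distance:', distance)
print('same person' if distance < 0.6 else 'different person')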

import os
import cv2
import dlib
import numpy as np
import json

detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
sp = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
facerec = dlib.face_recognition_model_v1('dlib_face_recognition_resnet_model_v1.dat')

imagePATH = '/home/colin/works/face_recognition_resnet/data/'  # enrollment images, named like <name>_xxx.jpg

data = np.zeros((1, 128))   # placeholder row; stripped again after the loop
labels = []

for file in os.listdir(imagePATH):
    if '.jpg' in file or '.png' in file:
        labelName = file.split('_')[0]   # the person's name is the part before the '_'
        print('current image:', file)
        print('current label:', labelName)

        img = cv2.imread(imagePATH + file)
        # Downscale very large images to keep CNN detection time reasonable
        if img.shape[0] * img.shape[1] > 500000:
            img = cv2.resize(img, (0, 0), fx=0.5, fy=0.5)
        dets = detector(img, 1)
        for k, d in enumerate(dets):
            rec = dlib.rectangle(d.rect.left(), d.rect.top(), d.rect.right(), d.rect.bottom())
            shape = sp(img, rec)                                           # 68 facial landmarks
            face_descriptor = facerec.compute_face_descriptor(img, shape)  # 128-D feature vector
            faceArray = np.array(face_descriptor).reshape((1, 128))
            data = np.concatenate((data, faceArray))
            labels.append(labelName)
            cv2.rectangle(img, (rec.left(), rec.top()), (rec.right(), rec.bottom()), (0, 255, 0), 2)
        cv2.imshow('img', img)
        cv2.waitKey(2)

data = data[1:, :]   # drop the placeholder row
np.savetxt('faceData.txt', data, fmt='%f')

labelFile = open('labels.txt', 'w')
json.dump(labels, labelFile)
labelFile.close()

cv2.destroyAllWindows()
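
  After a run, faceData.txt holds one 128-value row per detected face and labels.txt holds the matching names as a JSON list. A small sanity-check sketch for the two output files:

import json
import numpy as np

# Ensure 2-D shape even when only a single face was enrolled
data = np.atleast_2d(np.loadtxt('faceData.txt', dtype=float))
with open('labels.txt') as f:
    labels = json.load(f)

# Every saved descriptor row should have a matching name label
print(data.shape, len(labels))
assert data.shape[0] == len(labels)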

2.2 Face detection

detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
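
  The CNN detector returns mmod_rectangle objects, each carrying a bounding box (.rect) and a confidence score. A minimal sketch of running it on a single image (test.jpg is just a placeholder path):

import cv2
import dlib

detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')

img = cv2.imread('test.jpg')   # placeholder image path
dets = detector(img, 1)        # 1 = upsample once before detecting, helps with smaller faces
for d in dets:
    print('box:', d.rect.left(), d.rect.top(), d.rect.right(), d.rect.bottom(),
          'confidence:', d.confidence)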

2.3 Face recognition

# Candidate capture sizes: 640x480 or 320x240
def gstreamer_pipeline(
    capture_width=320,
    capture_height=240,
    display_width=320,
    display_height=240,
    framerate=30,
    flip_method=0,
):
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        "width=(int)%d, height=(int)%d, "
        "format=(string)NV12, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )

# Match threshold on the Euclidean distance between descriptors. It is never
# defined in the original snippet, so an assumed value is set here; dlib
# suggests about 0.6, and smaller values are stricter.
threshold = 0.6

def findNearestClassForImage(face_descriptor, faceLabel):
    global threshold
    # Distance from this descriptor to every saved descriptor
    temp = np.asarray(face_descriptor) - data
    e = np.linalg.norm(temp, axis=1, keepdims=True)
    min_distance = e.min()
    print('distance: ', min_distance)
    if min_distance > threshold:
        return 'unknown'
    index = np.argmin(e)
    return faceLabel[index]

def recognition(img):
    # Note: OpenCV frames are BGR while dlib's models were trained on RGB;
    # converting with cv2.cvtColor first may improve accuracy.
    dets = detector(img, 1)
    for k, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            k, d.rect.left(), d.rect.top(), d.rect.right(), d.rect.bottom()))
        rec = dlib.rectangle(d.rect.left(), d.rect.top(), d.rect.right(), d.rect.bottom())
        shape = sp(img, rec)
        face_descriptor = facerec.compute_face_descriptor(img, shape)

        class_pre = findNearestClassForImage(face_descriptor, label)
        print(class_pre)
        cv2.rectangle(img, (rec.left(), rec.top() + 10), (rec.right(), rec.bottom()), (0, 255, 0), 2)
        cv2.putText(img, class_pre, (rec.left(), rec.top()), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2, cv2.LINE_AA)

        # mark_add() comes from the author's own image_shop helper module (not shown here)
        img = image_shop.mark_add(rec.left(), rec.right(), rec.top(), rec.bottom(), img)

    return img


# Directory holding faceData.txt and labels.txt. Not defined in the original
# snippet, so an assumed default is used here; adjust to your own path.
filePATH = './'

def data_load():
    global label, data, filePATH
    # Load the name labels saved by the enrollment script...
    labelFile = open(filePATH + 'labels.txt', 'r')
    label = json.load(labelFile)
    labelFile.close()

    # ...and the matching 128-D descriptors, one row per face
    data = np.loadtxt(filePATH + 'faceData.txt', dtype=float)


def face_recognition_livevideo(window_name, camera_idx):
    cv2.namedWindow(window_name)

    # Open the CSI camera through the GStreamer pipeline
    # (note: the camera_idx argument is actually used as the flip_method here)
    cap = cv2.VideoCapture(gstreamer_pipeline(flip_method=camera_idx), cv2.CAP_GSTREAMER)
    
    while cap.isOpened():
        ok, frame = cap.read() #read 1 frame
        if not ok:
            break
        
        resImage = recognition(frame)

        #display
        cv2.imshow(window_name, resImage)
        c = cv2.waitKey(1)
        if c & 0xFF == ord('q'):
            break

    #close
    cap.release()
    cv2.destroyAllWindows()    


if __name__ == '__main__':
    data_load()
    face_recognition_livevideo('Find Face', 0)

2.4 Using the GPU for acceleration

  dlib will use CUDA automatically as long as it was built with CUDA support. If your dlib build does not have it enabled, you may need to reinstall dlib, adding "-DDLIB_USE_CUDA=1" when compiling and installing; you can refer to my previous blog post about installing the dlib library.
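
  A quick way to verify that the installed dlib build actually has CUDA enabled, using flags dlib exposes in its Python API:

import dlib

print(dlib.DLIB_USE_CUDA)           # True if this dlib build was compiled with CUDA support
print(dlib.cuda.get_num_devices())  # number of CUDA devices dlib can see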

3. Demo results

  The result works quite well, and dlib successfully offloads the computation to CUDA. With the GPU doing the heavy lifting, CPU load stays reasonably low.

Source: blog.csdn.net/qq_33475105/article/details/111994267