Jetson Nano hands-on series: face recognition based on dlib + FaceNet

0. Preface

0.1. What is FaceNet?

  FaceNet is the face recognition model proposed by Google researchers Florian Schroff, Dmitry Kalenichenko, and James Philbin in 2015. It is old by deep-learning standards, but it was the first to do away with training separate networks for face verification and face identification, unifying both tasks in a single framework. FaceNet is used both to verify whether two face images belong to the same person and to identify who a person is. Unlike earlier approaches, FaceNet maps each face directly into a multi-dimensional embedding space and measures similarity by the Euclidean distance between embeddings: the smaller the distance, the more similar the two faces. FaceNet trains a deep neural network as the image-mapping function with a triplet-based loss, and the network directly outputs a vector in a 128-dimensional embedding space. As shown in Figure 1, the deep learning framework can be treated as a black box: we simply feed it a Batch, that is, a batch of face image samples. Deep Architecture refers to the deep network used, here the 22-layer GoogLeNet proposed by Google in 2014.
[Figure 1: FaceNet model structure: Batch → Deep Architecture → L2 normalization → Embedding → Triplet Loss]
  L2 refers to feature normalization, which maps all image features onto a hypersphere. Embedding is the feature vector produced by the GoogLeNet network described above and then normalized by L2. At the end of FaceNet sits its new loss function, Triplet Loss. Training FaceNet means making, for as many triplets as possible, the distance between the Anchor and the Positive smaller than the distance between the Anchor and the Negative; in other words, images of the same person are pulled as close together as possible, and images of different people are pushed as far apart as possible.

(i) Negatives: regions whose Intersection-over-Union (IoU) ratio with any ground-truth face is less than 0.3;
(ii) Positives: regions with IoU above 0.65 to a ground-truth face;
(iii) Part faces: regions with IoU between 0.4 and 0.65 to a ground-truth face.

[Figure: Triplet Loss learning, pulling the Anchor and Positive together and pushing the Anchor and Negative apart]
  As shown in the figure above, the Triplet Loss at the end of the network aims to learn separability between features directly. Traditional loss functions usually operate on single samples or on pairs and tend to map all images of one class into the same region of the feature space, whereas Triplet Loss tries to separate one person's face images from everyone else's.
[Figure: the triplet of anchor, positive, and negative samples used by the loss]
  That is, each training step takes three face images, denoted $x_i^a$, $x_i^p$ and $x_i^n$, where the anchor $a$ and the positive $p$ come from the same person and the negative $n$ comes from a different person. The triplet loss optimizes these distances directly, which handles facial feature representation well, although it requires a very large amount of training data to achieve its very good results.
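  For reference, the triplet loss from the FaceNet paper can be written as follows, where $f(\cdot)$ is the embedding network and $\alpha$ is the margin enforced between positive and negative pairs:

L = \sum_{i=1}^{N} \left[ \lVert f(x_i^a) - f(x_i^p) \rVert_2^2 - \lVert f(x_i^a) - f(x_i^n) \rVert_2^2 + \alpha \right]_+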

0.2. The advantages of FaceNet

   FaceNet places no restrictions requiring face alignment. Its advantage is that only minimal processing of the picture is needed (the face region simply has to be cropped; no additional preprocessing such as 3D alignment is required) before it can be used as model input, while the model still achieves very high accuracy on standard datasets. FaceNet does not require alignment the way DeepFace and DeepID do, and once the final representation is obtained there is no need to retrain a classifier as DeepID does: the distance is computed directly, which is simple and effective.
Paper download portal: https://arxiv.org/pdf/1503.03832.pdf
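  Since recognition reduces to a distance comparison between embeddings, face verification takes only a few lines. The sketch below assumes two embeddings have already been computed by the network; the names emb_a and emb_b and the 0.7 threshold are illustrative rather than values fixed by the paper.

import numpy as np

def same_person(emb_a, emb_b, threshold=0.7):
    # two faces are treated as the same person if the Euclidean distance
    # between their embeddings falls below the chosen threshold
    dist = np.linalg.norm(emb_a - emb_b)
    return dist < threshold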

1. Environment preparation

1.1. Dependent library installation

sudo apt-get install libopenblas-dev gfortran
pip3 install scipy
pip3 install scikit-learn
pip3 install Pillow

   Among the dependencies above, downloading scipy often fails with errors such as "Read timed out." due to network problems. It is strongly recommended to download the source package directly and install it locally. The scipy source portal: https://github.com/scipy/scipy/releases . After the download is complete, install it locally:

pip3 install scipy-1.2.3.tar.gz
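   A quick way to confirm the dependencies are usable is to import them and check the scipy version; staying on scipy 1.2.x matters because scipy.misc.imresize, which the inference code below relies on, was removed in scipy 1.3:

import scipy, sklearn, PIL
print(scipy.__version__)          # expect 1.2.x
from scipy.misc import imresize   # should import without error on 1.2.x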


1.2. Clone the FaceNet project source code

git clone https://github.com/davidsandberg/facenet ./

   Enter the facenet src directory, open a Python shell, and try importing the facenet module to check that its built-in functions can be listed.
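   A minimal check, assuming it is run from the repo's src directory (or that directory has been added to PYTHONPATH):

import facenet
# list the public helpers; load_model and prewhiten should be among them
print([name for name in dir(facenet) if not name.startswith('_')])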

1.3. Download the pre-trained model

   Resource portal: https://github.com/davidsandberg/facenet
The pre-trained model used here is 20180402-114759.zip; unzip it and place it in the working directory.
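A quick sanity check that the frozen graph is where the scripts below expect it (the folder name here is assumed to match the zip; the .pb file can also be copied straight into the working directory):

import os
# the extracted folder should contain the frozen graph 20180402-114759.pb
# alongside the checkpoint files
print(os.listdir('20180402-114759'))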

2. Coding


2.1. Create a FaceNet model class with its constructor, inference function, and destructor

import tensorflow.compat.v1 as tf
from scipy import misc    # misc.imresize needs scipy <= 1.2.x
import facenet
import numpy as np

tf.disable_v2_behavior()  # required if a TensorFlow 2.x runtime is installed

# FaceNet network class
class FaceNetModel():
    def __init__(self, model_file):
        # load the frozen graph (.pb) into the default graph
        facenet.load_model(model_file)

        # input/output tensors of the pre-trained model
        self.image_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        self.phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")
        self.embeddings_op = tf.get_default_graph().get_tensor_by_name("embeddings:0")

        self.sess = tf.Session()

    def get_descriptor(self, image):
        # resize to the 160x160 input expected by the model, then whiten
        image = misc.imresize(image, (160, 160), interp="bilinear")
        image = facenet.prewhiten(image)
        images = np.stack([image])

        feed_dict = {self.image_placeholder: images,
                     self.phase_train_placeholder: False}

        # run the network and return the embedding vector for this face
        emb = self.sess.run(self.embeddings_op, feed_dict=feed_dict)
        return emb[0, :]

    def __del__(self):
        self.sess.close()

def get_FaceNetModel(model_file):
    return FaceNetModel(model_file)
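   A minimal usage sketch of the class above, assuming it is saved as facenet_model.py and that a cropped face image face1.jpg exists (both names are placeholders):

import cv2
import facenet_model

model = facenet_model.get_FaceNetModel('20180402-114759.pb')
rgb = cv2.cvtColor(cv2.imread('face1.jpg'), cv2.COLOR_BGR2RGB)
emb = model.get_descriptor(rgb)
print(emb.shape)   # length of the embedding vector produced by this model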

2.2. Face registration

import dlib
import cv2
import os
import facenet_model
import pickle

font = cv2.FONT_HERSHEY_SIMPLEX
detector = dlib.get_frontal_face_detector()

face_net = facenet_model.get_FaceNetModel('20180402-114759.pb')
imagePATH = '/home/colin/works/face_recognition_facenet/dataset/processed/Colin/'

def create_known(path):
    global font
    person_names    = []
    face_features   = []

    print("creating known face lib...")
    for file_name in os.listdir(path):
        if '.jpg' in file_name or '.png' in file_name:
            #read the image and convert it from BGR to RGB
            image = cv2.imread(path + file_name)
            rgb_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            
            #detect face
            dets = detector(rgb_img)
            
            if(len(dets) == 0):
                continue
            
            # use the first detected face, crop it and compute its embedding
            det = dets[0]
            face_img = rgb_img[det.top():det.bottom(), det.left():det.right()]
            descriptor = face_net.get_descriptor(face_img)

            # file names are expected to look like <person_name>_<index>.jpg
            person_name = file_name[:file_name.rfind('_')]
            person_names.append(person_name)

            face_features.append(descriptor)
            print('Appending ' + person_name + '...')

    with open('train.pkl', 'wb') as f:
        pickle.dump(person_names, f)
        pickle.dump(face_features, f)

    print('Face Library Created!')
    #return person_names, face_features

if __name__ == '__main__':
    create_known(imagePATH)
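   An optional check that registration worked, assuming train.pkl was just written by the script above:

import pickle

with open('train.pkl', 'rb') as f:
    names = pickle.load(f)
    feats = pickle.load(f)
print('%d face images registered for: %s' % (len(names), sorted(set(names))))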

2.3. Face recognition

import cv2
import os
import time
import numpy as np
import facenet_model
import dlib
import pickle

import image_shop    # custom helper module providing mark_add(), used below
################################ Global variable ######################################
person_names = []
face_features = []
imagePATH = '/home/colin/works/face_recognition_facenet/dataset/processed/Colin/'
detector = dlib.get_frontal_face_detector()
face_net = facenet_model.get_FaceNetModel('20180402-114759.pb')
########################################################################################

# 640 480 320 240
def gstreamer_pipeline(
    capture_width=320,
    capture_height=240,
    display_width=320,
    display_height=240,
    framerate=30,
    flip_method=0,
):
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        "width=(int)%d, height=(int)%d, "
        "format=(string)NV12, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )

def train_data_load():
    global person_names, face_features

    with open('train.pkl','rb') as f:
        person_names=pickle.load(f)
        face_features=pickle.load(f)

        
def facenet_recognition(image):
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    dets = detector(rgb_image)
    #dets = detector(gray_image, 2)
    for det in dets:
        if(det.top() < 0) or (det.bottom() < 0) or (det.left() < 0) or (det.right() < 0):
            continue
        
        #get the face area
        face_img = rgb_image[det.top():det.bottom(), det.left():det.right()]
        #get the face descriptor
        descriptor = face_net.get_descriptor(face_img)
        
        min_dist = 0.7       # distance threshold: larger distances stay 'unknown'
        person_name = 'unknown'
        for i in range(len(face_features)):
            dist = np.linalg.norm(descriptor-face_features[i])
            print('dist:', dist)
            if dist < min_dist:
                min_dist = dist
                person_name = person_names[i]

        cv2.rectangle(image, (det.left(), det.top()+10), (det.right(), det.bottom()), (0, 255, 0), 2)
        cv2.putText(image, person_name , (det.left(),det.top()), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255,0,0), 2, cv2.LINE_AA)

        image = image_shop.mark_add(det.left(), det.right(), det.top(), det.bottom(), image)

    return image

def face_recognition_livevideo(window_name, camera_idx):
    cv2.namedWindow(window_name)

    # CSI camera: build the GStreamer pipeline (the second argument is passed
    # through as the nvvidconv flip-method)
    cap = cv2.VideoCapture(gstreamer_pipeline(flip_method=camera_idx), cv2.CAP_GSTREAMER)
    
    while cap.isOpened():
        ok, frame = cap.read() #read 1 frame
        if not ok:
            break
        resImage = facenet_recognition(frame)
        #display
        cv2.imshow(window_name, resImage)
        c = cv2.waitKey(1)
        if c & 0xFF == ord('q'):
            break

    #close
    cap.release()
    cv2.destroyAllWindows()    


if __name__ == '__main__':
    train_data_load()
    face_recognition_livevideo('Find Face', 0)

3. Demo results

[Screenshot: live face recognition demo]

References

1) "Introduction to the development of artificial intelligence examples based on the NVIDIA Jetson platform"


Original article: blog.csdn.net/qq_33475105/article/details/111994319