0. Preface
0.1. What is FaceNet?
FaceNet is a face recognition model proposed in 2015 by Google engineers Florian Schroff, Dmitry Kalenichenko, and James Philbin. It is no longer new, but it was the first to remove the need to train two separate neural networks for face verification and face recognition, unifying both tasks in one framework. FaceNet is mainly used for face verification (deciding whether two faces belong to the same person) and face recognition (finding out who the person is). Unlike earlier face recognition methods, FaceNet maps the features of each face to a point in a multi-dimensional space and measures similarity by the Euclidean distance between points: the smaller the Euclidean distance, the more similar the two faces.
FaceNet trains a deep neural network that maps images to this space, using a loss function based on triplets. The network directly outputs a 128-dimensional embedding vector. As shown in Figure 1, the deep learning framework is treated as a black box: we only need to feed it a Batch, which is a batch of face image samples. Deep Architecture is the deep network itself, here GoogLeNet, the 22-layer network proposed by Google in 2014.
L2 refers to L2 normalization of the features, which maps all image features onto a hypersphere. Embedding is the feature vector output by the GoogLeNet network above after L2 normalization. At the end of FaceNet is its new loss function, Triplet Loss. Training aims to make, for as many triplets as possible, the distance between Anchor and Positive smaller than the distance between Anchor and Negative; that is, images of the same person are pulled as close together as possible, and images of different people are pushed as far apart as possible.
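The effect of L2 normalization can be illustrated with a minimal sketch. It uses tiny 2-dimensional vectors instead of real 128-dimensional embeddings, and the helper names l2_normalize and euclidean_distance are hypothetical, chosen only for this illustration.

```python
import math


def l2_normalize(vector, eps=1e-10):
    """Scale a vector to unit Euclidean length, i.e. onto the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    anchor = l2_normalize([3.0, 4.0])       # becomes [0.6, 0.8]
    positive = l2_normalize([6.0, 8.5])     # nearly the same direction as anchor
    negative = l2_normalize([-4.0, 3.0])    # perpendicular to anchor
    print("anchor-positive:", euclidean_distance(anchor, positive))
    print("anchor-negative:", euclidean_distance(anchor, negative))
```

After normalization only the direction of a vector matters, so the distance between any two embeddings lies between 0 (same direction) and 2 (opposite directions).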
(i) Negatives: regions whose Intersection-over-Union (IoU) ratio with every ground-truth face is less than 0.3;
(ii) Positives: regions whose IoU with a ground-truth face is above 0.65;
(iii) Part faces: regions whose IoU with a ground-truth face is between 0.4 and 0.65.
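The IoU thresholds in the list above can be made concrete with a short sketch. The box layout (left, top, right, bottom), the function names, and the "unused" label for regions that fall between 0.3 and 0.4 are assumptions made for this illustration; the list itself does not say what happens to such regions.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (left, top, right, bottom)."""
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])
    # Width and height of the overlap are zero when the boxes do not intersect.
    intersection = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def label_region(region, ground_truth_boxes):
    """Assign a region to one of the categories listed above by its best IoU."""
    best = max((iou(region, gt) for gt in ground_truth_boxes), default=0.0)
    if best < 0.3:
        return "negative"
    if best > 0.65:
        return "positive"
    if best >= 0.4:
        return "part"
    return "unused"  # assumption: the 0.3 to 0.4 gap is not covered by the list


if __name__ == "__main__":
    faces = [(0, 0, 10, 10)]
    print(label_region((0, 0, 10, 10), faces))
    print(label_region((0, 0, 10, 20), faces))
    print(label_region((20, 20, 30, 30), faces))
```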
As shown in the figure above, the purpose of Triplet Loss at the end of the network is to learn the separability between features directly. Traditional loss functions usually work on single samples or pairs of samples (Single or Double Loss) and tend to map the face images of one class to the same region of the space, while Triplet Loss tries to separate the face images of one individual from the images of other people.
That is, each training step takes three face images, denoted $x_i^a$, $x_i^p$, and $x_i^n$, where $a$ and $p$ are face images of the same person and $n$ is a face image of another person. Triplet Loss optimizes the distances directly, which solves the problem of facial feature representation with very good results, but it requires a very large amount of data.
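As a rough illustration of the triplet idea, the loss for a single triplet can be sketched in plain Python. The function names are hypothetical, the embeddings are toy 2-dimensional vectors, and the margin of 0.2 is the value reported in the FaceNet paper; this is a sketch of the formula, not the training code of the project used below.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(||a - p||^2 - ||a - n||^2 + margin, 0) for one triplet.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least the margin.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    # Easy triplet: the positive coincides with the anchor, the negative is far away.
    print(triplet_loss(anchor, [0.0, 0.0], [1.0, 0.0]))
    # Hard triplet: the roles are swapped, so the loss is positive.
    print(triplet_loss(anchor, [1.0, 0.0], [0.0, 0.0]))
```

Only triplets with a positive loss contribute a training signal, which is one reason the method needs a large amount of data.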
0.2. Advantages of FaceNet
FaceNet does not require face alignment, unlike DeepFace and DeepID. Very little image processing is needed: the face area only has to be cropped, and no additional preprocessing such as 3D alignment is required before the image is used as model input. At the same time, the model achieves a very high accuracy rate on the data set. Once FaceNet has produced the final representation, there is also no need to train a further classification model as DeepID does; the distance is simply computed directly, which is simple and effective.
Paper download portal: https://arxiv.org/pdf/1503.03832.pdf
1. Environment preparation
1.1. Installing the dependent libraries
sudo apt-get install libopenblas-dev gfortran
pip3 install scipy
pip3 install scikit-learn
pip3 install Pillow
Because of network problems, downloading scipy with pip may fail with errors such as "Read timed out". In that case it is strongly recommended to download the source package directly from https://github.com/scipy/scipy/releases and, once the download is complete, install it locally:
pip3 install scipy-1.2.3.tar.gz
1.2. Clone the FaceNet project source code
git clone https://github.com/davidsandberg/facenet ./
Enter the FaceNet src directory, open the python environment, try to import the facenet runtime library, and see if the relevant built-in functions can be listed.
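The import check described above can be sketched as follows. The helper list_public_names is hypothetical, and the sketch assumes the Python interpreter is started inside the src directory so that the facenet module can be found.

```python
import importlib


def list_public_names(module_name):
    """Import a module by name and return its public attribute names."""
    module = importlib.import_module(module_name)
    return sorted(name for name in dir(module) if not name.startswith("_"))


if __name__ == "__main__":
    try:
        # Expected to list functions such as load_model and prewhiten,
        # which the code later in this article calls.
        print(list_public_names("facenet"))
    except ImportError as exc:
        print("facenet could not be imported:", exc)
```

If the import fails, the usual causes are running the interpreter outside the src directory or a missing dependency such as TensorFlow.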
1.3. Download the pre-trained model
Resource portal: https://github.com/davidsandberg/facenet
The pre-trained model used here is 20180402-114759.zip. Decompress it and place the model file 20180402-114759.pb in the working path, where the code below expects to find it.
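The archive can also be unpacked from Python. The sketch below is a minimal example using the standard zipfile module; the function name extract_model is hypothetical, and the internal layout of the archive is not assumed, so the function simply reports where any .pb files ended up.

```python
import zipfile
from pathlib import Path


def extract_model(archive_path, target_dir="."):
    """Extract the whole archive and return the paths of the extracted .pb files."""
    target = Path(target_dir)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    return [target / name for name in names if name.endswith(".pb")]
```

Calling extract_model("20180402-114759.zip") in the working path would return the location of the frozen model file. If the file sits in a subdirectory, either move it or pass the full path to the model class in section 2.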
2. Coding
2.1. Create a new FaceNet model class with a constructor, an inference function, and a destructor (saved as facenet_model.py, the module name imported by the later scripts)
import tensorflow.compat.v1 as tf
from scipy import misc  # misc.imresize needs SciPy < 1.3 (1.2.3 is installed above) and Pillow
import facenet
import numpy as np


# FaceNet network class
class FaceNetModel():
    def __init__(self, model_file):
        # Load the frozen model into its own graph and bind the session to that graph
        self.graph = tf.Graph()
        with self.graph.as_default():
            facenet.load_model(model_file)
        self.image_placeholder = self.graph.get_tensor_by_name("input:0")
        self.phase_train_placeholder = self.graph.get_tensor_by_name("phase_train:0")
        self.embeddings_op = self.graph.get_tensor_by_name("embeddings:0")
        self.sess = tf.Session(graph=self.graph)

    def get_descriptor(self, image):
        # Resize to the 160x160 network input, then normalize the pixel values
        image = misc.imresize(image, (160, 160), interp="bilinear")
        image = facenet.prewhiten(image)
        images = np.stack([image])  # batch containing a single image
        feed_dict = {
            self.image_placeholder: images,
            self.phase_train_placeholder: False,
        }
        emb = self.sess.run(self.embeddings_op, feed_dict=feed_dict)
        return emb[0, :]

    def __del__(self):
        # The session does not exist if the constructor failed early
        sess = getattr(self, "sess", None)
        if sess is not None:
            sess.close()


def get_FaceNetModel(model_file):
    return FaceNetModel(model_file)
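The preprocessing inside get_descriptor can be illustrated without TensorFlow. The sketch below shows what the prewhitening step is understood to do (zero mean and unit standard deviation over all pixels of one image) and how np.stack turns one image into a batch of one. The names prewhiten_sketch and make_batch are hypothetical; this is an approximation for illustration, not the library's source.

```python
import numpy as np


def prewhiten_sketch(image):
    """Normalize one image to zero mean and unit standard deviation."""
    image = np.asarray(image, dtype=np.float64)
    mean = image.mean()
    std = image.std()
    # A lower bound on the divisor avoids division by zero for flat images.
    std_adj = max(std, 1.0 / np.sqrt(image.size))
    return (image - mean) / std_adj


def make_batch(image):
    """Add a leading batch axis: (H, W, C) becomes (1, H, W, C)."""
    return np.stack([image])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.integers(0, 256, size=(160, 160, 3)).astype(np.uint8)
    whitened = prewhiten_sketch(face)
    print(whitened.mean(), whitened.std())
    print(make_batch(whitened).shape)
```

Normalizing each image separately makes the descriptor less sensitive to overall brightness and contrast differences between the registration photos and the live camera frames.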
2.2. Face registration
import dlib
import cv2
import os
import facenet_model
import pickle

detector = dlib.get_frontal_face_detector()
face_net = facenet_model.get_FaceNetModel('20180402-114759.pb')
imagePATH = '/home/colin/works/face_recognition_facenet/dataset/processed/Colin/'


def create_known(path):
    person_names = []
    face_features = []
    print("creating known face lib...")
    for file_name in os.listdir(path):
        if not file_name.lower().endswith(('.jpg', '.png')):
            continue
        # read the image and convert it from BGR to RGB
        image = cv2.imread(os.path.join(path, file_name))
        if image is None:
            continue  # unreadable file
        rgb_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # detect faces and keep the first one
        dets = detector(rgb_img)
        if len(dets) == 0:
            continue
        det = dets[0]
        # dlib can report coordinates outside the image, so clamp them at 0
        top, left = max(det.top(), 0), max(det.left(), 0)
        face_img = rgb_img[top:det.bottom(), left:det.right()]
        descriptor = face_net.get_descriptor(face_img)
        # the part of the file name before the last '_' is used as the person's name
        person_name = file_name[:file_name.rfind('_')]
        person_names.append(person_name)
        face_features.append(descriptor)
        print('Appending ' + person_name + '...')
    # names and features are written as two consecutive pickle objects
    with open('train.pkl', 'wb') as f:
        pickle.dump(person_names, f)
        pickle.dump(face_features, f)
    print('Face Library Created!')
    # return person_names, face_features


if __name__ == '__main__':
    create_known(imagePATH)
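Two details of the registration script are easy to miss: how the person's name is derived from the file name, and the fact that train.pkl holds two pickled objects that must be read back in the same order. The sketch below illustrates both with in-memory data. The function names are hypothetical, and the fallback for file names without an underscore is an addition for illustration (the script above would silently drop the last character of such a name).

```python
import io
import pickle


def label_from_filename(file_name):
    """Return the part before the last '_', e.g. 'Colin_0001.jpg' gives 'Colin'."""
    cut = file_name.rfind("_")
    if cut == -1:
        # Assumed fallback: use the name without its extension.
        return file_name.rsplit(".", 1)[0]
    return file_name[:cut]


def save_library(stream, names, features):
    """Write names and features as two consecutive pickle objects."""
    pickle.dump(names, stream)
    pickle.dump(features, stream)


def load_library(stream):
    """Read the two objects back in the order they were written."""
    names = pickle.load(stream)
    features = pickle.load(stream)
    return names, features


if __name__ == "__main__":
    buffer = io.BytesIO()
    save_library(buffer, [label_from_filename("Colin_0001.jpg")], [[0.1, 0.2]])
    buffer.seek(0)
    print(load_library(buffer))
```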
2.3. Face recognition
import cv2
import os
import time
import numpy as np
import facenet_model
import dlib
import pickle
import image_shop  # local helper module that provides mark_add; its source is not listed in this article

################################ Global variables ######################################
person_names = []
face_features = []
imagePATH = '/home/colin/works/face_recognition_facenet/dataset/processed/Colin/'
detector = dlib.get_frontal_face_detector()
face_net = facenet_model.get_FaceNetModel('20180402-114759.pb')
########################################################################################
# 640 480 320 240
def gstreamer_pipeline(
    capture_width=320,
    capture_height=240,
    display_width=320,
    display_height=240,
    framerate=30,
    flip_method=0,
):
    # Build the GStreamer pipeline string for the CSI camera
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        "width=(int)%d, height=(int)%d, "
        "format=(string)NV12, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )
def train_data_load():
    # Load the names and features in the order in which they were written
    global person_names, face_features
    with open('train.pkl', 'rb') as f:
        person_names = pickle.load(f)
        face_features = pickle.load(f)
def facenet_recognition(image):
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    dets = detector(rgb_image)
    # dets = detector(gray_image, 2)
    for det in dets:
        # skip detections that extend past the top or left edge of the frame
        if (det.top() < 0) or (det.bottom() < 0) or (det.left() < 0) or (det.right() < 0):
            continue
        # get the face area
        face_img = rgb_image[det.top():det.bottom(), det.left():det.right()]
        if face_img.size == 0:
            continue
        # get the face descriptor
        descriptor = face_net.get_descriptor(face_img)
        # distance threshold: a face stays 'unknown' unless a known face is closer than this
        min_dist = 0.7
        person_name = 'unknown'
        for i in range(len(face_features)):
            dist = np.linalg.norm(descriptor - face_features[i])
            print('dist:', dist)
            if dist < min_dist:
                min_dist = dist
                person_name = person_names[i]
        cv2.rectangle(image, (det.left(), det.top() + 10), (det.right(), det.bottom()), (0, 255, 0), 2)
        cv2.putText(image, person_name, (det.left(), det.top()), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0), 2, cv2.LINE_AA)
        image = image_shop.mark_add(det.left(), det.right(), det.top(), det.bottom(), image)
    return image
def face_recognition_livevideo(window_name, flip_method):
    # The second argument is passed to the pipeline as flip_method (0 = no rotation)
    cv2.namedWindow(window_name)
    # open the CSI camera through the GStreamer pipeline
    cap = cv2.VideoCapture(gstreamer_pipeline(flip_method=flip_method), cv2.CAP_GSTREAMER)
    while cap.isOpened():
        ok, frame = cap.read()  # read one frame
        if not ok:
            break
        resImage = facenet_recognition(frame)
        # display
        cv2.imshow(window_name, resImage)
        c = cv2.waitKey(1)
        if c & 0xFF == ord('q'):
            break
    # close
    cap.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    train_data_load()
    face_recognition_livevideo('Find Face', 0)
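The matching loop in facenet_recognition compares the descriptor with every stored feature one at a time. As a possible variation, the same nearest-match search can be written with a single NumPy call. The function name find_best_match is hypothetical, and the threshold of 0.7 is the value used in the script above.

```python
import numpy as np


def find_best_match(descriptor, known_features, known_names, threshold=0.7):
    """Return (name, distance) of the closest known face, or 'unknown'."""
    if len(known_features) == 0:
        return "unknown", None
    known = np.asarray(known_features, dtype=np.float64)
    query = np.asarray(descriptor, dtype=np.float64)
    # One Euclidean distance per stored feature vector.
    distances = np.linalg.norm(known - query, axis=1)
    best = int(np.argmin(distances))
    best_distance = float(distances[best])
    if best_distance < threshold:
        return known_names[best], best_distance
    return "unknown", best_distance


if __name__ == "__main__":
    features = [[0.0, 0.0], [1.0, 0.0]]
    names = ["Alice", "Bob"]
    print(find_best_match([0.9, 0.0], features, names))
    print(find_best_match([5.0, 5.0], features, names))
```

Returning the distance along with the name makes it easier to tune the threshold, which trades false matches against faces wrongly reported as unknown.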
3. Demo effect
Reference appendix
1) "Introduction to the development of artificial intelligence examples based on the NVIDIA Jetson platform"