FaceNet face feature extraction
FaceNet is a deep neural network for extracting features from facial images. It was proposed by Google researchers Schroff et al.
Paper: https://arxiv.org/abs/1503.03832
FaceNet takes a face image as input and compresses it into a vector of 128 numbers that represents the essential characteristics of the face. This vector is called an embedding, because all the relevant information from the face image is embedded into it.
So how do we perform face recognition with FaceNet?
A common approach is to compute the embedding of a face image and calculate its distance to the embeddings of known faces, usually with cosine similarity or Euclidean distance. If the new embedding is close enough to the embedding of a known face, we assume that both faces belong to the same person.
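To make the comparison concrete, here is a minimal NumPy sketch that compares a "known" embedding with two others using both measures. The embeddings are random stand-ins, not real FaceNet outputs. Because FaceNet embeddings are L2-normalized (unit length), the two measures carry the same information: the squared Euclidean distance equals 2 - 2 * cosine similarity.

```python
import numpy as np

def l2_normalize(x: np.ndarray, epsilon: float = 1e-10) -> np.ndarray:
    """Scale a vector to unit length (same idea as FaceNet.l2_normalize)."""
    return x / np.sqrt(max(np.sum(np.square(x)), epsilon))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: 1 = same direction, -1 = opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between a and b."""
    return float(np.linalg.norm(a - b))

rng = np.random.default_rng(0)
# Hypothetical 128-dimensional stand-ins for real FaceNet embeddings
known = l2_normalize(rng.normal(size=128))
same_person = l2_normalize(known + 0.02 * rng.normal(size=128))  # small perturbation
other_person = l2_normalize(rng.normal(size=128))                # unrelated vector

for name, emb in [("same", same_person), ("other", other_person)]:
    cos = cosine_similarity(known, emb)
    dist = euclidean_distance(known, emb)
    # For unit vectors: dist**2 == 2 - 2 * cos
    print(f"{name}: cosine={cos:.3f} euclidean={dist:.3f}")
```

A higher cosine similarity and a lower Euclidean distance both mean "more similar"; which one to threshold is a matter of convention.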
So the question is, how does FaceNet know what to extract from a face image?
To train a face recognizer, we need many face images; as with most machine learning problems, training typically requires thousands of different images. When training starts, the network's weights are random, so it produces essentially random vectors for each image, which means the embeddings are scattered randomly.
Training repeats the following steps:
- Randomly select an anchor image;
- Randomly select a positive image of the same person as the anchor;
- Randomly select a negative image of a different person;
- Adjust the parameters of the FaceNet network so that the positive image ends up closer to the anchor than the negative image.
We repeat these four steps until the parameters no longer need to change, or the changes are so small that they have no effect. After training, all faces of the same person are close to each other and far away from the faces of other people.
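Step 4 is what the FaceNet paper formalizes as the triplet loss: using squared L2 distances, each triplet contributes max(||a - p||^2 - ||a - n||^2 + margin, 0), so the loss is zero only when the negative is farther from the anchor than the positive by at least the margin. Below is a minimal NumPy sketch of that loss on made-up 2-D embeddings; the margin of 0.2 is an assumed value, and a real training loop would compute this on network outputs and backpropagate through it.

```python
import numpy as np

def triplet_loss(anchor: np.ndarray, positive: np.ndarray, negative: np.ndarray,
                 margin: float = 0.2) -> float:
    """Mean triplet loss for batches of embeddings with shape (batch, dim).

    A triplet is penalized unless its anchor-negative squared distance exceeds
    the anchor-positive squared distance by at least `margin`.
    """
    pos_dist = np.sum(np.square(anchor - positive), axis=-1)
    neg_dist = np.sum(np.square(anchor - negative), axis=-1)
    return float(np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0)))

# Hypothetical 2-D toy embeddings, one triplet per row
anchor = np.array([[0.0, 1.0], [0.0, 1.0]])
positive = np.array([[0.0, 0.9], [1.0, 0.0]])    # row 0: close positive, row 1: far positive
negative = np.array([[1.0, 0.0], [0.0, 0.95]])   # row 0: far negative, row 1: close negative

print(triplet_loss(anchor, positive, negative))  # only row 1 contributes
```

The first triplet is already "solved" and contributes nothing; the second one, where the negative sits closer than the positive, produces the gradient that pushes the embeddings apart during training.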
The complete code of the FaceNet class (faceNet.py):
```python
# faceNet.py
import cv2
import stow
import typing
import numpy as np
import onnxruntime as ort


class FaceNet:
    """FaceNet class object, which can be used for simplified face recognition
    """
    def __init__(
        self,
        detector: object,
        onnx_model_path: str = "models/faceNet.onnx",
        anchors: typing.Union[str, dict] = 'faces',
        force_cpu: bool = False,
        threshold: float = 0.5,
        color: tuple = (255, 255, 255),
        thickness: int = 2,
    ) -> None:
        """Object for face recognition
        Params:
            detector: (object) - detector object to detect faces in image
            onnx_model_path: (str) - path to onnx model
            force_cpu: (bool) - if True, onnx model will be run on CPU
            anchors: (str or dict) - path to directory with faces or dictionary with anchor names as keys and anchor encodings as values
            threshold: (float) - cosine similarity threshold for face recognition
            color: (tuple) - color of bounding box and text
            thickness: (int) - thickness of bounding box and text
        """
        if not stow.exists(onnx_model_path):
            raise Exception(f"Model doesn't exist in {onnx_model_path}")

        self.detector = detector
        self.threshold = threshold
        self.color = color
        self.thickness = thickness

        # Use CUDA only if onnxruntime was built with GPU support and force_cpu is off
        if ort.get_device() == "GPU" and not force_cpu:
            providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
        else:
            providers = ['CPUExecutionProvider']
        self.ort_sess = ort.InferenceSession(onnx_model_path, providers=providers)

        # Model input is NHWC (batch, height, width, channels); keep (height, width)
        model_input = self.ort_sess.get_inputs()[0]
        self.input_name = model_input.name
        self.input_shape = tuple(model_input.shape[1:3])

        self.anchors = self.load_anchors(anchors) if isinstance(anchors, str) else anchors

    def normalize(self, img: np.ndarray) -> np.ndarray:
        """Normalize image to zero mean and unit standard deviation
        Args:
            img: (np.ndarray) - image to be normalized
        Returns:
            img: (np.ndarray) - normalized image
        """
        mean, std = img.mean(), img.std()
        return (img - mean) / std

    def l2_normalize(self, x: np.ndarray, axis: int = -1, epsilon: float = 1e-10) -> np.ndarray:
        """l2 normalization function
        Args:
            x: (np.ndarray) - input array
            axis: (int) - axis to normalize
            epsilon: (float) - epsilon to avoid division by zero
        Returns:
            x: (np.ndarray) - normalized array
        """
        output = x / np.sqrt(np.maximum(np.sum(np.square(x), axis=axis, keepdims=True), epsilon))
        return output

    def detect_save_faces(self, image: np.ndarray, output_dir: str = "faces"):
        """Detect faces in given image and save them to output_dir
        Args:
            image: (np.ndarray) - image to be processed
            output_dir: (str) - directory where faces will be saved
        Returns:
            bool: (bool) - True if faces were detected and saved
        """
        face_crops = [image[t:b, l:r] for t, l, b, r in self.detector(image, return_tlbr=True)]
        if not face_crops:
            return False

        stow.mkdir(output_dir)
        for index, crop in enumerate(face_crops):
            output_path = stow.join(output_dir, f"face_{index}.png")
            cv2.imwrite(output_path, crop)
            print("Crop saved to:", output_path)

        # Reload anchors so the newly saved faces are recognized immediately
        self.anchors = self.load_anchors(output_dir)
        return True

    def load_anchors(self, faces_path: str):
        """Generate anchors for given faces path
        Args:
            faces_path: (str) - path to directory with faces
        Returns:
            anchors: (dict) - dictionary with anchor names as keys and anchor encodings as values
        """
        anchors = {}
        if not stow.exists(faces_path):
            return {}

        for face_path in stow.ls(faces_path):
            face_image = cv2.imread(face_path.path)
            if face_image is None:  # skip files OpenCV cannot read as images
                continue
            anchors[stow.basename(face_path)] = self.encode(face_image)
        return anchors

    def encode(self, face_image: np.ndarray) -> np.ndarray:
        """Encode face image with FaceNet model
        Args:
            face_image: (np.ndarray) - face image to be encoded
        Returns:
            face_encoding: (np.ndarray) - face encoding
        """
        face = self.normalize(face_image)
        # cv2.resize expects (width, height), so reverse (height, width)
        face = cv2.resize(face, self.input_shape[::-1]).astype(np.float32)
        encode = self.ort_sess.run(None, {self.input_name: np.expand_dims(face, axis=0)})[0][0]
        normalized_encode = self.l2_normalize(encode)
        return normalized_encode

    def cosine_distance(self, a: np.ndarray, b: typing.Union[np.ndarray, list]) -> np.ndarray:
        """Cosine similarity between vector a and each vector in b (higher means more similar)
        Args:
            a: (np.ndarray) - first vector
            b: (np.ndarray or list) - list of vectors to compare against
        Returns:
            similarity: (np.ndarray) - one cosine similarity per vector in b
        """
        a = np.asarray(a)
        b = np.asarray(b)
        # Take the norm of b row by row; the norm of the whole matrix would shrink every score
        return np.dot(b, a) / (np.linalg.norm(a) * np.linalg.norm(b, axis=-1))

    def draw(self, image: np.ndarray, face_crops: dict):
        """Draw face crops on image
        Args:
            image: (np.ndarray) - image to be drawn on
            face_crops: (dict) - dictionary with face crops as values and face names as keys
        Returns:
            image: (np.ndarray) - image with drawn face crops
        """
        for value in face_crops.values():
            t, l, b, r = value["tlbr"]
            cv2.rectangle(image, (l, t), (r, b), self.color, self.thickness)
            cv2.putText(image, stow.name(value['name']), (l, t - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, self.color, self.thickness)
        return image

    def __call__(self, frame: np.ndarray) -> np.ndarray:
        """Face recognition pipeline
        Args:
            frame: (np.ndarray) - image to be processed
        Returns:
            frame: (np.ndarray) - image with drawn face recognition results
        """
        face_crops = {index: {"name": "Unknown", "tlbr": tlbr} for index, tlbr in enumerate(self.detector(frame, return_tlbr=True))}

        if self.anchors:  # without anchors every face stays "Unknown"
            anchor_names = list(self.anchors.keys())
            anchor_encodings = list(self.anchors.values())
            for value in face_crops.values():
                t, l, b, r = value["tlbr"]
                face = frame[t:b, l:r]
                if face.size == 0:  # skip empty crops
                    continue
                similarities = self.cosine_distance(self.encode(face), anchor_encodings)
                if np.max(similarities) > self.threshold:
                    value["name"] = anchor_names[np.argmax(similarities)]

        frame = self.draw(frame, face_crops)
        return frame
```
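Stripped of the detector and drawing code, the recognition step in __call__ is a nearest-anchor search with a threshold. The sketch below isolates just that logic with made-up 2-D encodings (real FaceNet encodings have 128 dimensions); match_face is a hypothetical helper, not part of the original code. Note that the anchor norms are taken per row (axis=-1): dividing by the norm of the whole anchor matrix would shrink every score as more anchors are added and break the fixed threshold.

```python
import numpy as np

def match_face(encoding: np.ndarray, anchors: dict, threshold: float = 0.5) -> str:
    """Return the name of the most similar anchor, or "Unknown".

    anchors: dict mapping names to encodings (like FaceNet.anchors).
    """
    if not anchors:
        return "Unknown"
    names = list(anchors.keys())
    matrix = np.array(list(anchors.values()))  # shape (N, dim)
    # Cosine similarity of the encoding with every anchor row
    sims = matrix @ encoding / (np.linalg.norm(matrix, axis=-1) * np.linalg.norm(encoding))
    best = int(np.argmax(sims))
    return names[best] if sims[best] > threshold else "Unknown"

anchors = {"alice.png": np.array([1.0, 0.0]), "bob.png": np.array([0.0, 1.0])}
print(match_face(np.array([0.9, 0.1]), anchors))   # alice.png
print(match_face(np.array([-1.0, 0.0]), anchors))  # Unknown
```

Raising the threshold makes false matches less likely at the cost of labeling more genuine matches as "Unknown".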
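The FaceNet class loads an ONNX model, so the Keras weights first have to be converted to the .onnx format. We can use the faceNet/convert_to_onnx.py script to do the conversion: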
```python
# faceNet/convert_to_onnx.py
import os
import tensorflow as tf
import tf2onnx
from architecture import InceptionResNetV2

if __name__ == '__main__':
    # Weights can be downloaded from https://drive.google.com/drive/folders/1scGoVCQp-cNwKTKOUqevCP1N2LlyXU3l?usp=sharing
    # Put the facenet_keras_weights.h5 file in the models folder
    facenet_weights_path = "models/facenet_keras_weights.h5"
    onnx_model_output_path = "models/faceNet.onnx"

    if not os.path.exists(facenet_weights_path):
        raise Exception(
            f"Model doesn't exist in {facenet_weights_path}, download weights from "
            "https://drive.google.com/drive/folders/1scGoVCQp-cNwKTKOUqevCP1N2LlyXU3l?usp=sharing"
        )

    # Build the Keras FaceNet (InceptionResNetV2) architecture and load the pretrained weights
    faceNet = InceptionResNetV2()
    faceNet.load_weights(facenet_weights_path)

    # Describe the model input and export the Keras model to ONNX
    spec = (tf.TensorSpec(faceNet.inputs[0].shape, tf.float32, name="image_input"),)
    tf2onnx.convert.from_keras(faceNet, input_signature=spec, output_path=onnx_model_output_path)
```
First, download the weights from the link given in the code and place facenet_keras_weights.h5 in the models folder. Then run faceNet/convert_to_onnx.py with Python to convert the model to the .onnx format.
Once we have the model, we can open the main.py script and run real-time face recognition on the webcam with the following code:
```python
# main.py
from utils import FPSmetric
from engine import Engine
from faceDetection import MPFaceDetection
from faceNet.faceNet import FaceNet

if __name__ == '__main__':
    facenet = FaceNet(
        detector=MPFaceDetection(),
        onnx_model_path="models/faceNet.onnx",
        anchors="faces",
        force_cpu=True,
    )
    engine = Engine(webcam_id=0, show=True, custom_objects=[facenet, FPSmetric()])

    # Optional: save the face crop(s) from the first webcam frame that contains a face as anchors;
    # remove this loop to use only the anchors already in the "faces" folder
    while not facenet.detect_save_faces(engine.process_webcam(return_frame=True), output_dir="faces"):
        continue

    engine.run()
```
We pass FaceNet the path to the model and the path to the anchors directory ("faces"). The anchors must be images of face crops: the model encodes them when it loads and displays the corresponding name, taken from the file name, when a match is found.
Next, we create an Engine object, which is responsible for processing image, video, or webcam streams. Here it processes the webcam (webcam_id=0), and the show parameter controls whether the processed frames are displayed.
We can also add an FPSmetric to measure how fast the face recognition runs.
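FPSmetric comes from the project's utils module, and its internals are not shown here. A common way to implement such a metric is to keep the timestamps of the last few frames and divide the number of frame intervals by the elapsed time. The sketch below does that with a hypothetical SimpleFPS class; it is an illustration, not the project's FPSmetric.

```python
import time
from collections import deque

class SimpleFPS:
    """Hypothetical FPS counter averaging over the last `window` frame intervals."""

    def __init__(self, window: int = 30):
        # window + 1 timestamps span `window` intervals
        self.timestamps = deque(maxlen=window + 1)

    def tick(self) -> float:
        """Call once per processed frame; returns the current FPS estimate."""
        self.timestamps.append(time.perf_counter())
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0

fps = SimpleFPS()
for _ in range(5):
    time.sleep(0.01)  # stand-in for processing one frame
    current = fps.tick()
print(f"{current:.1f} FPS")
```

Averaging over a sliding window keeps the displayed number stable instead of jumping with every individual frame time.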
Finally, we pass the facenet object to the custom_objects parameter. Other objects, such as "pencil sketch", "background removal", or anything else we want, can be added here as well.
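The Engine source is not shown, but since FaceNet.__call__ takes a frame and returns an annotated frame, custom_objects presumably works as a chain of callables, each receiving the previous one's output. A minimal sketch of that assumed behavior follows; run_custom_objects, Invert, and DrawBorder are hypothetical names, not part of the project.

```python
import numpy as np

def run_custom_objects(frame: np.ndarray, custom_objects: list) -> np.ndarray:
    """Apply each callable to the frame in order, feeding its output to the next."""
    for obj in custom_objects:
        frame = obj(frame)
    return frame

class Invert:
    """Toy stand-in for an effect such as background removal."""
    def __call__(self, frame: np.ndarray) -> np.ndarray:
        return 255 - frame

class DrawBorder:
    """Toy stand-in for an object that annotates the frame, like FaceNet.draw."""
    def __call__(self, frame: np.ndarray) -> np.ndarray:
        frame = frame.copy()
        frame[0, :] = 0  # blacken the top row
        return frame

frame = np.full((4, 4, 3), 200, dtype=np.uint8)
out = run_custom_objects(frame, [Invert(), DrawBorder()])
print(out[0, 0], out[1, 1])  # [0 0 0] [55 55 55]
```

With this design, order matters: an object later in the list sees the output of the earlier ones, so effects that draw on top of the frame usually go last.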
We also use a loop that keeps grabbing webcam frames until it finds one with a face; the face is then cropped and saved as an anchor:
```python
while not facenet.detect_save_faces(engine.process_webcam(return_frame=True), output_dir="faces"):
    continue
```
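detect_save_faces and __call__ slice the frame directly with the detector's (top, left, bottom, right) box. If a detector returns a box that extends past the frame edge, for example a negative top when a face is partly out of view, NumPy treats the negative index as counting from the end, and the crop silently comes out empty or wrong. A small clipping helper avoids this. crop_tlbr is a hypothetical name, and whether MPFaceDetection already clips its boxes is not shown in the source.

```python
import numpy as np

def crop_tlbr(image: np.ndarray, tlbr) -> np.ndarray:
    """Crop image to a (top, left, bottom, right) box, clipped to the image bounds."""
    h, w = image.shape[:2]
    t, l, b, r = (int(v) for v in tlbr)
    t, b = max(0, t), min(h, b)
    l, r = max(0, l), min(w, r)
    return image[t:b, l:r]

image = np.zeros((100, 120, 3), dtype=np.uint8)
print(image[-5:40, 10:50].shape)                 # (0, 40, 3): -5 wraps around to row 95
print(crop_tlbr(image, (-5, 10, 40, 50)).shape)  # (40, 40, 3)
```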
This gives us a system that performs real-time face recognition on our CPU at around 30 fps, which is more than enough for our needs!
Source code on GitHub: https://github.com/pythonlessons/background_removal