Real-time face recognition training based on FaceNet

FaceNet face feature extraction

FaceNet is a deep neural network for extracting features from facial images. It was proposed by Google researchers Schroff et al.

Paper address: https://arxiv.org/abs/1503.03832

The working principle of FaceNet is to take a face image as input, compress it, and output a 128-dimensional vector that represents the basic characteristics of the face. This vector is called an embedding (all relevant information from the face image is embedded into the vector).

So, how to realize face recognition through FaceNet?

A common approach is to take the embedding of an image and compute its distance to the embeddings of known faces, usually with cosine similarity or the Euclidean distance formula. If the computed distance to the embedding of a known face is close enough, we assume that both faces belong to the same person.
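As a minimal sketch of this comparison with NumPy (the embeddings below are random placeholders, not real FaceNet outputs, and the 0.5 threshold is only illustrative):

# embedding comparison sketch (illustrative only)
import numpy as np

known_embedding = np.random.rand(128)   # embedding of a known face
query_embedding = np.random.rand(128)   # embedding of the face we want to identify

# Cosine similarity: values closer to 1 mean more similar faces
cosine_similarity = np.dot(known_embedding, query_embedding) / (
    np.linalg.norm(known_embedding) * np.linalg.norm(query_embedding))

# Euclidean distance: smaller values mean more similar faces
euclidean_distance = np.linalg.norm(known_embedding - query_embedding)

same_person = cosine_similarity > 0.5   # illustrative threshold
print(cosine_similarity, euclidean_distance, same_person)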

So the question is, how does FaceNet know what to extract from a face image?

In order to train a face recognizer, we need many images of faces. As with every machine learning problem, training typically requires thousands of different images. When we start the training process, the model generates random vectors for each image, which means that the embeddings of the images are randomly distributed.

Learning steps:

  1. Randomly select an anchor image;
  2. Randomly select a positive sample image of the same person as the anchor;
  3. Randomly select a negative sample image of a different person than the anchor;
  4. Adjust the parameters of the FaceNet neural network so that the positive sample ends up closer to the anchor than the negative sample.

We repeat these four steps until further changes are no longer required or are so small that they have no effect. After training, embeddings of the same person's face end up close to each other, and far away from embeddings of other people's faces.
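This objective is known as the triplet loss. Below is a minimal NumPy sketch of the idea; the embeddings and the margin value are placeholders for illustration, not the actual FaceNet training code:

# triplet loss sketch (illustrative only)
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Penalize cases where the positive is not at least `margin` closer to the anchor than the negative
    pos_dist = np.sum(np.square(anchor - positive))
    neg_dist = np.sum(np.square(anchor - negative))
    return max(pos_dist - neg_dist + margin, 0.0)

# Placeholder 128-dimensional embeddings
anchor, positive, negative = (np.random.rand(128) for _ in range(3))
print(triplet_loss(anchor, positive, negative))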

The complete faceNet.py object code:

# faceNet.py
import cv2
import stow
import typing
import numpy as np
import onnxruntime as ort

class FaceNet:
    """FaceNet class object, which can be used for simplified face recognition
    """
    def __init__(
        self, 
        detector: object,
        onnx_model_path: str = "models/faceNet.onnx", 
        anchors: typing.Union[str, dict] = 'faces',
        force_cpu: bool = False,
        threshold: float = 0.5,
        color: tuple = (255, 255, 255),
        thickness: int = 2,
        ) -> None:
        
        """Object for face recognition
        Params:
            detector: (object) - detector object to detect faces in image
            onnx_model_path: (str) - path to onnx model
            force_cpu: (bool) - if True, onnx model will be run on CPU
            anchors: (str or dict) - path to directory with faces or dictionary with anchor names as keys and anchor encodings as values
            threshold: (float) - threshold for face recognition
            color: (tuple) - color of bounding box and text
            thickness: (int) - thickness of bounding box and text
        """
        
        if not stow.exists(onnx_model_path):
            raise Exception(f"Model doesn't exists in {onnx_model_path}")

        self.detector = detector
        self.threshold = threshold
        self.color = color
        self.thickness = thickness

        providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']

        providers = providers if ort.get_device() == "GPU" and not force_cpu else providers[::-1]

        self.ort_sess = ort.InferenceSession(onnx_model_path, providers=providers)

        self.input_shape = self.ort_sess._inputs_meta[0].shape[1:3]
        
        self.anchors = self.load_anchors(anchors) if isinstance(anchors, str) else anchors

    def normalize(self, img: np.ndarray) -> np.ndarray:
        
        """Normalize image
        Args:
            img: (np.ndarray) - image to be normalized
        Returns:
            img: (np.ndarray) - normalized image
        """
        
        mean, std = img.mean(), img.std()
        return (img - mean) / std

    def l2_normalize(self, x: np.ndarray, axis: int = -1, epsilon: float = 1e-10) -> np.ndarray:
        
        """l2 normalization function
        Args:
            x: (np.ndarray) - input array
            axis: (int) - axis to normalize
            epsilon: (float) - epsilon to avoid division by zero
        Returns:
            x: (np.ndarray) - normalized array
        """
        
        output = x / np.sqrt(np.maximum(np.sum(np.square(x), axis=axis, keepdims=True), epsilon))
        return output

    def detect_save_faces(self, image: np.ndarray, output_dir: str = "faces"):
        
        """Detect faces in given image and save them to output_dir
        Args:
            image: (np.ndarray) - image to be processed
            output_dir: (str) - directory where faces will be saved
        Returns:
            bool: (bool) - True if faces were detected and saved
        """
        
        face_crops = [image[t:b, l:r] for t, l, b, r in self.detector(image, return_tlbr=True)]

        if face_crops == []: 
            return False

        stow.mkdir(output_dir)

        for index, crop in enumerate(face_crops):
            output_path = stow.join(output_dir, f"face_{str(index)}.png")
            cv2.imwrite(output_path, crop)
            print("Crop saved to:", output_path)

        self.anchors = self.load_anchors(output_dir)
        
        return True

    
    def load_anchors(self, faces_path: str):
        
        """Generate anchors for given faces path
        Args:
            faces_path: (str) - path to directory with faces
        Returns:
            anchors: (dict) - dictionary with anchor names as keys and anchor encodings as values
        """
        
        anchors = {}
        if not stow.exists(faces_path):
            return {}

        for face_path in stow.ls(faces_path):
            anchors[stow.basename(face_path)] = self.encode(cv2.imread(face_path.path))

        return anchors

    def encode(self, face_image: np.ndarray) -> np.ndarray:
        """Encode face image with FaceNet model
        Args:
            face_image: (np.ndarray) - face image to be encoded
            
        Returns:
            face_encoding: (np.ndarray) - face encoding
        """
        face = self.normalize(face_image)
        face = cv2.resize(face, self.input_shape).astype(np.float32)

        encode = self.ort_sess.run(None, {self.ort_sess._inputs_meta[0].name: np.expand_dims(face, axis=0)})[0][0]
        normalized_encode = self.l2_normalize(encode)

        return normalized_encode

    def cosine_distance(self, a: np.ndarray, b: typing.Union[np.ndarray, list]) -> np.ndarray:
        
        """Cosine distance between wectors a and b
        Args:
            a: (np.ndarray) - first vector
            b: (np.ndarray) - second list of vectors
        Returns:
            distance: (float) - cosine distance
        """
        
        if isinstance(a, list):
            a = np.array(a)

        if isinstance(b, list):
            b = np.array(b)

        return np.dot(a, b.T) / (np.linalg.norm(a) * np.linalg.norm(b))

    def draw(self, image: np.ndarray, face_crops: dict):
        
        """Draw face crops on image
        Args:
            image: (np.ndarray) - image to be drawn on
            face_crops: (dict) - dictionary with face crops as values and face names as keys
        Returns:
            image: (np.ndarray) - image with drawn face crops
        """
        
        for value in face_crops.values():
            t, l, b, r = value["tlbr"]
            cv2.rectangle(image, (l, t), (r, b), self.color, self.thickness)
            cv2.putText(image, stow.name(value['name']), (l, t - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, self.color, self.thickness)

        return image

    def __call__(self, frame: np.ndarray) -> np.ndarray:
        
        """Face recognition pipeline
        Args:
            frame: (np.ndarray) - image to be processed
        Returns:
            frame: (np.ndarray) - image with drawn face recognition results
        """
        
        face_crops = {index: {"name": "Unknown", "tlbr": tlbr} for index, tlbr in enumerate(self.detector(frame, return_tlbr=True))}
        for key, value in face_crops.items():
            t, l, b, r = value["tlbr"]
            face_encoding = self.encode(frame[t:b, l:r])
            distances = self.cosine_distance(face_encoding, list(self.anchors.values()))
            if np.max(distances) > self.threshold:
                face_crops[key]["name"] = list(self.anchors.keys())[np.argmax(distances)]

        frame = self.draw(frame, face_crops)

        return frame
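Once the ONNX model is in place (see the conversion script below), we can sanity-check the class on two saved face crops before wiring it into the webcam pipeline. A minimal sketch; faces/face_0.png and faces/face_1.png are assumed to be crops previously saved by detect_save_faces:

# standalone sanity check (illustrative only)
import cv2
from faceDetection import MPFaceDetection
from faceNet.faceNet import FaceNet

facenet = FaceNet(detector=MPFaceDetection(), anchors={}, force_cpu=True)

# Encode two cropped face images and compare them with cosine distance
face_a = facenet.encode(cv2.imread("faces/face_0.png"))
face_b = facenet.encode(cv2.imread("faces/face_1.png"))
print("Cosine similarity:", facenet.cosine_distance(face_a, face_b))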

The FaceNet class above runs the model with ONNX Runtime, so the original Keras model first has to be converted to ONNX. We can use the faceNet/convert_to_onnx.py script to do the conversion:

# faceNet/convert_to_onnx.py
import os
import tensorflow as tf
import tf2onnx
from architecture import InceptionResNetV2

if __name__ == '__main__':
    """ weights can be downloaded from https://drive.google.com/drive/folders/1scGoVCQp-cNwKTKOUqevCP1N2LlyXU3l?usp=sharing
    Put facenet_keras_weights.h5 file in model folder
    """
    facenet_weights_path = "models/facenet_keras_weights.h5"
    onnx_model_output_path = "models/faceNet.onnx"

    if not os.path.exists(facenet_weights_path):
        raise Exception(f"Model doesn't exists in {facenet_weights_path}, download weights from \
            https://drive.google.com/drive/folders/1scGoVCQp-cNwKTKOUqevCP1N2LlyXU3l?usp=sharing")

    faceNet = InceptionResNetV2()
    faceNet.load_weights(facenet_weights_path) 

    spec = (tf.TensorSpec(faceNet.inputs[0].shape, tf.float32, name="image_input"),)
    tf2onnx.convert.from_keras(faceNet, output_path=onnx_model_output_path, input_signature=spec)

First, download the weights from the link given in the code and place them in the models folder. Then run faceNet/convert_to_onnx.py with Python to convert the model to .onnx format.
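To confirm the export worked, we can load the resulting model with onnxruntime and inspect its input; this is just a quick sanity check, not part of the original script:

# quick ONNX sanity check (illustrative only)
import onnxruntime as ort

sess = ort.InferenceSession("models/faceNet.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)  # expect something like image_input [None, 160, 160, 3]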

Once we have the model, we can open the main.py script and run real-time face recognition on the webcam with the following code:

# main.py
from utils import FPSmetric
from engine import Engine
from faceDetection import MPFaceDetection
from faceNet.faceNet import FaceNet

if __name__ == '__main__':
    facenet = FaceNet(
        detector = MPFaceDetection(),
        onnx_model_path = "models/faceNet.onnx", 
        anchors = "faces",
        force_cpu = True,
    )
    engine = Engine(webcam_id=0, show=True, custom_objects=[facenet, FPSmetric()])

    # save first face crop as anchor, otherwise don't use
    while not facenet.detect_save_faces(engine.process_webcam(return_frame=True), output_dir="faces"):
        continue

    engine.run()

Apart from the path to the ONNX model, we give the FaceNet object the path where the anchors are saved; each anchor must be an image with a cropped face. This guarantees that the model will load these anchors and display the corresponding name when a match is found.

Next, we need to create an Engine object, which is responsible for processing image, video, or webcam streams; whether the processed frames are displayed is controlled by the "show" parameter.

We can also add an FPSmetric object to see how fast the face recognition runs.

Finally, we need to pass the "facenet" object to the "custom_objects" parameter. Here we could add more objects, such as "pencil sketch", "background removal", or other entities we want.

We can also add a loop that keeps grabbing webcam frames until a face is found in one of them, then crops and saves it as an anchor:

while not facenet.detect_save_faces(engine.process_webcam(return_frame=True), output_dir="faces"):
    continue

This way we created a system that can do real-time face recognition on our CPU, and it runs at around 30 fps, which is more than enough for us!

source:

GitHub address: https://github.com/pythonlessons/background_removal

Origin: blog.csdn.net/m0_73122726/article/details/128507703