Face recognition with Python, Keras, and OpenCV

Keras is a good entry point into artificial intelligence. It is a high-level Python neural network framework that has been integrated into TensorFlow as its default high-level API, providing a more convenient interface on top of TensorFlow. If TensorFlow is the Java or C++ of this world, then Keras is its Python: as a high-level wrapper over TensorFlow, it can be used together with TensorFlow to build models quickly, and it is officially supported by the TensorFlow team. When a GPU is available on the machine, the code will automatically use it for parallel computation.
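As a tiny illustration of how quickly a model can be assembled with Keras, here is a minimal sketch (the layer sizes here are arbitrary and unrelated to the face model built later in this article):

# A minimal Keras sketch: a small classifier assembled in a few lines.
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(32, input_shape=(64,)))  # fully connected layer with 32 units
model.add(Activation('relu'))
model.add(Dense(2))                      # two output classes
model.add(Activation('softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()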

OpenCV stands for Open Source Computer Vision Library, an open-source computer vision library; in other words, it is an open-source set of API functions for computer vision. This means that (1) you can use it for development, whether for research or for commercial applications; (2) the source code of every API function is public, so you can see how it is implemented internally; and (3) you can modify the OpenCV source code and compile the specific API functions you need. As a library, however, it only provides APIs for commonly used, classic, and popular algorithms. A typical computer vision algorithm includes the following steps: (1) data acquisition (for OpenCV, a picture); (2) preprocessing; (3) feature extraction; (4) feature selection; (5) classifier design and training; (6) classification. OpenCV provides APIs for all six of these parts.
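For example, the first three of these stages might look like the following minimal sketch with OpenCV (sample.jpg is just a placeholder file name):

import cv2

image = cv2.imread('sample.jpg')                 # (1) data acquisition: read a picture
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # (2) preprocessing: grayscale conversion
blurred = cv2.GaussianBlur(gray, (5, 5), 0)      # (2) preprocessing: denoising
edges = cv2.Canny(blurred, 100, 200)             # (3) feature extraction: edge features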

With Keras and OpenCV introduced, let's look at the final result of the model (the mosaic in the screenshot is a bit rough, forgive me).

I used a few hundred training images of myself and my friends, mainly to gauge the effect. As shown in the figure below, "me" is the blogger himself and the others are friends, which realises the basic face recognition function. I am interested in computer vision and wanted to give it a try. (The code only implements the basic functionality; there is still a lot to learn from the people who do deep learning seriously.)


Implementation steps

Step 1: Face detection

This step uses the haarcascade_frontalface_alt2.xml classifier shipped with the OpenCV package. The classifier detects faces; we then write our own code to draw a box around each face, which completes the first step of face recognition: face detection.

Create a new face_recognition_file project and add a data_preparation.py file to it for preparing the face data:

import cv2


def detect_face():
    # Load the Haar cascade classifier that ships with OpenCV
    face_cascade = cv2.CascadeClassifier('/anaconda3/share/OpenCV/haarcascades/haarcascade_frontalface_alt2.xml')
    # Open the camera
    camera = cv2.VideoCapture(0)
    # Read the video stream frame by frame
    while True:
        ret, frame = camera.read()
        # Convert the frame to grayscale
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Detect faces
        faces = face_cascade.detectMultiScale(gray, 1.1, 5)
        # (x, y) is the top-left corner of the face rectangle, w is its width, h its height
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)

        cv2.imshow('camera', frame)
        if cv2.waitKey(1) & 0xff == ord("q"):
            break
    camera.release()
    cv2.destroyAllWindows()

The effect is as follows:

Step 2: Data preparation

In this step, you only need to adjust the code from the first step slightly so that it saves the face images from the video stream.

Modify the data_preparation.py code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Seven'
import cv2


def CatchPICFromVideo(catch_num, path_name):
    # Load the Haar cascade classifier
    face_cascade = cv2.CascadeClassifier('/anaconda3/share/OpenCV/haarcascades/haarcascade_frontalface_alt2.xml')
    # Open the camera
    camera = cv2.VideoCapture(0)
    num = 0
    while True:
        ret, frame = camera.read()
        # Convert the frame to grayscale and detect faces
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.1, 5)
        for (x, y, w, h) in faces:
            # Crop the face region and save it to disk
            img_name = f'{path_name}/{str(num)}.jpg'
            image = frame[y:y + h, x:x + w]
            print(img_name)
            cv2.imwrite(img_name, image)
            num += 1
            if num > catch_num:
                break

            # Draw a rectangle around the face
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
            # Show how many face images have been captured so far
            font = cv2.FONT_HERSHEY_SIMPLEX
            cv2.putText(frame, f'num:{str(num)}', (x + 30, y + 30), font, 1, (255, 0, 255), 4)
        if num > catch_num:
            break
        # Display the frame
        cv2.imshow('camera', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    camera.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    CatchPICFromVideo(100, './data/me')

Create a data folder in the current working directory, with a me subfolder for your own pictures and an others subfolder for other people's pictures.
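If you prefer to create these folders from Python rather than by hand, a minimal sketch (assuming the data/me and data/others names used in this article) could be:

import os

for person in ('me', 'others'):
    # Create data/me and data/others; exist_ok avoids an error if they already exist
    os.makedirs(os.path.join('data', person), exist_ok=True)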

Step 3: Image data preprocessing

Images fed to a convolutional neural network need a fixed size, so we use OpenCV to bring every image to a uniform size before it enters the network. Create a face_dataset.py file in the working directory and define a resize_image function that adjusts the image size, then read the pictures into a Python list for further processing, labelling the data with 0 for "me" and 1 for "others". In face_dataset.py, also define a read_path function that reads images from a given path into a Python list, and a load_dataset function that calls read_path, converts the images into a NumPy multidimensional array (more convenient for machine learning), and derives the 0/1 labels from the directory name in each image path.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Seven'
import os
import numpy as np
import cv2

# Target image size
IMAGE_SIZE = 64


# Resize an image to the target size, padding it to a square first
def resize_image(image, height=IMAGE_SIZE, width=IMAGE_SIZE):
    top, bottom, left, right = 0, 0, 0, 0
    # Get the image dimensions
    h, w, _ = image.shape
    # Find the longest edge
    longest_edge = max(h, w)
    # Work out how much padding the short edge needs to match the long edge
    if h < longest_edge:
        d = longest_edge - h
        top = d // 2
        bottom = d - top
    elif w < longest_edge:
        d = longest_edge - w
        left = d // 2
        right = d - left

    # Padding colour (black)
    BLACK = [0, 0, 0]
    # Pad the original image
    constant = cv2.copyMakeBorder(image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=BLACK)
    # Resize to the target size and return
    return cv2.resize(constant, (height, width))


images, labels = list(), list()


# Read the training data
def read_path(path_name):
    for dir_item in os.listdir(path_name):
        # Build an absolute path
        full_path = os.path.abspath(os.path.join(path_name, dir_item))
        # Recurse into subdirectories
        if os.path.isdir(full_path):
            read_path(full_path)
        else:
            if dir_item.endswith('.jpg'):
                image = cv2.imread(full_path)
                image = resize_image(image, IMAGE_SIZE, IMAGE_SIZE)
                images.append(image)
                labels.append(path_name)
    print(labels)
    return images, labels


# Load the training data from the given path
def load_dataset(path_name):
    images, labels = read_path(path_name)
    # Convert the image list to a NumPy array for matrix-based computation
    images = np.array(images)
    print(images.shape)
    # Label images from the 'me' folder as 0 and everything else as 1
    labels = np.array([0 if label.endswith('me') else 1 for label in labels])
    return images, labels


if __name__ == '__main__':
    images, labels = load_dataset(os.getcwd()+'/data')
    print('load over')

Step 4: Use Keras to build a convolutional neural network

Create a face_train.py file in the working directory; the code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Seven'
import random
import numpy as np
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.layers import Conv2D, MaxPool2D
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.models import load_model
from keras import backend as K
from face_dataset import load_dataset, resize_image, IMAGE_SIZE
import warnings
warnings.filterwarnings('ignore')


class Dataset:
    def __init__(self, path_name):
        # Training set
        self.train_images = None
        self.train_labels = None
        # Validation set (not used here)
        # self.valid_images = None
        # self.valid_labels = None
        # Test set
        self.test_images = None
        self.test_labels = None
        # Path to load the data from
        self.path_name = path_name
        # Dimension ordering used by the current backend
        self.input_shape = None

    def load(self, img_rows=IMAGE_SIZE, img_cols=IMAGE_SIZE, img_channels=3, nb_classes=2):
        # Load the dataset into memory
        images, labels = load_dataset(self.path_name)
        train_images, test_images, train_labels, test_labels = train_test_split(images, labels, test_size=0.3,
                                                                                random_state=random.randint(0, 10))
        # Arrange the image dimensions according to the backend's ordering
        if K.image_dim_ordering() == 'th':
            train_images = train_images.reshape(train_images.shape[0], img_channels, img_rows, img_cols)
            test_images = test_images.reshape(test_images.shape[0], img_channels, img_rows, img_cols)
            self.input_shape = (img_channels, img_rows, img_cols)
        else:
            train_images = train_images.reshape(train_images.shape[0], img_rows, img_cols, img_channels)
            test_images = test_images.reshape(test_images.shape[0], img_rows, img_cols, img_channels)
            self.input_shape = (img_rows, img_cols, img_channels)

        # Print the number of training and test samples
        print(train_images.shape[0], 'train samples')
        print(test_images.shape[0], 'test samples')
        # The model uses categorical_crossentropy as its loss function, so the class labels must be
        # one-hot encoded according to the number of classes nb_classes. With only two classes here,
        # the labels become two-dimensional after conversion.
        train_labels = np_utils.to_categorical(train_labels, nb_classes)
        test_labels = np_utils.to_categorical(test_labels, nb_classes)
        # Convert pixel data to float so it can be normalised
        train_images = train_images.astype('float32')
        test_images = test_images.astype('float32')
        # Normalise pixel values to the 0-1 range
        train_images /= 255.0
        test_images /= 255.0
        self.train_images = train_images
        self.test_images = test_images
        self.train_labels = train_labels
        self.test_labels = test_labels


# CNN model class
class Model:
    def __init__(self):
        self.model = None

    # Build the model
    def build_model(self, dataset, nb_classes=2):
        # Start from an empty network: a linear stack of layers added in order,
        # formally known as a sequential (linearly stacked) model
        self.model = Sequential()

        # The following calls add the CNN layers one by one; each add() appends one layer
        self.model.add(Conv2D(32, 3, 3, border_mode='same',
                                     input_shape=dataset.input_shape))  # 1  2D convolution layer
        self.model.add(Activation('relu'))  # 2  activation layer

        self.model.add(Conv2D(32, 3, 3))  # 3  2D convolution layer
        self.model.add(Activation('relu'))  # 4  activation layer

        self.model.add(MaxPool2D(pool_size=(2, 2)))  # 5  pooling layer
        self.model.add(Dropout(0.25))  # 6  Dropout layer

        self.model.add(Conv2D(64, 3, 3, border_mode='same'))  # 7  2D convolution layer
        self.model.add(Activation('relu'))  # 8  activation layer

        self.model.add(Conv2D(64, 3, 3))  # 9  2D convolution layer
        self.model.add(Activation('relu'))  # 10 activation layer

        self.model.add(MaxPool2D(pool_size=(2, 2)))  # 11 pooling layer
        self.model.add(Dropout(0.25))  # 12 Dropout layer

        self.model.add(Flatten())  # 13 Flatten layer
        self.model.add(Dense(512))  # 14 Dense (fully connected) layer
        self.model.add(Activation('relu'))  # 15 activation layer
        self.model.add(Dropout(0.5))  # 16 Dropout layer
        self.model.add(Dense(nb_classes))  # 17 Dense layer
        self.model.add(Activation('softmax'))  # 18 classification layer, outputs the final result

        # Print a summary of the model
        self.model.summary()

    # Train the model
    def train(self, dataset, batch_size=20, nb_epoch=100, data_augmentation=True):
        # Use an SGD + momentum optimiser for training; first create the optimiser object
        sgd = SGD(lr=0.01, decay=1e-6,
                  momentum=0.9, nesterov=True)
        # Configure the model for training
        self.model.compile(loss='categorical_crossentropy',
                           optimizer=sgd,
                           metrics=['accuracy'])

        # Without data augmentation. Augmentation creates new training data from the data we
        # provide, using rotations, flips, added noise and so on, deliberately enlarging the
        # training set and giving the model more to learn from.
        if not data_augmentation:
            self.model.fit(dataset.train_images,
                           dataset.train_labels,
                           batch_size=batch_size,
                           nb_epoch=nb_epoch,
                           validation_data=(dataset.test_images, dataset.test_labels),
                           shuffle=True)
        # With real-time data augmentation
        else:
            # Define a data generator for augmentation. It returns a generator object datagen;
            # each call yields one batch of data (generated in order), which saves memory.
            # It is essentially a Python generator.
            datagen = ImageDataGenerator(
                featurewise_center=False,  # whether to centre the input data (zero mean over the dataset)
                samplewise_center=False,  # whether to centre each sample to zero mean
                featurewise_std_normalization=False,  # whether to divide inputs by the dataset's standard deviation
                samplewise_std_normalization=False,  # whether to divide each sample by its own standard deviation
                zca_whitening=False,  # whether to apply ZCA whitening
                rotation_range=20,  # random rotation range in degrees (0-180) during augmentation
                width_shift_range=0.2,  # random horizontal shift (fraction of image width, between 0 and 1)
                height_shift_range=0.2,  # same as above, but vertical
                horizontal_flip=True,  # whether to apply random horizontal flips
                vertical_flip=False)  # whether to apply random vertical flips

            # Fit the generator on the whole training set, required for featurewise
            # normalisation, ZCA whitening and similar preprocessing
            datagen.fit(dataset.train_images)

            # Train the model using the generator
            self.model.fit_generator(datagen.flow(dataset.train_images, dataset.train_labels,
                                                  batch_size=batch_size),
                                     samples_per_epoch=dataset.train_images.shape[0],
                                     nb_epoch=nb_epoch,
                                     validation_data=(dataset.test_images, dataset.test_labels))

    MODEL_PATH = './face.model.h5'

    def save_model(self, file_path=MODEL_PATH):
        self.model.save(file_path)

    def load_model(self, file_path=MODEL_PATH):
        self.model = load_model(file_path)

    def evaluate(self, dataset):
        score = self.model.evaluate(dataset.test_images, dataset.test_labels, verbose=1)
        # print("%s: %.2f%%" % (self.model.metrics_names[1], score[1] * 100))
        print(f'{self.model.metrics_names[1]}:{score[1] * 100}%')

    # Recognise a face
    def face_predict(self, image):
        # Again determine the dimension ordering from the backend
        if K.image_dim_ordering() == 'th' and image.shape != (1, 3, IMAGE_SIZE, IMAGE_SIZE):
            image = resize_image(image)  # the size must match the training set: IMAGE_SIZE x IMAGE_SIZE
            image = image.reshape((1, 3, IMAGE_SIZE, IMAGE_SIZE))  # unlike training, we predict on a single image here
        elif K.image_dim_ordering() == 'tf' and image.shape != (1, IMAGE_SIZE, IMAGE_SIZE, 3):
            image = resize_image(image)
            image = image.reshape((1, IMAGE_SIZE, IMAGE_SIZE, 3))

        # Convert to float and normalise
        image = image.astype('float32')
        image /= 255.0

        # Probability of the input belonging to each class; with two classes, this gives
        # the probabilities for class 0 and class 1
        result = self.model.predict_proba(image)
        print('result:', result)

        # Class prediction: 0 or 1
        result = self.model.predict_classes(image)

        # Return the predicted class
        return result[0]


if __name__ == '__main__':
    dataset = Dataset('./data/')
    dataset.load()

    # Train the model (comment this block out when you only want to evaluate)
    model = Model()
    model.build_model(dataset)
    model.train(dataset)
    model.save_model(file_path='./model/me.face.model.h5')

    # Evaluate the model
    # model = Model()
    # model.load_model(file_path='./model/me.face.model.h5')
    # model.evaluate(dataset)




In the code above, after loading the data I first use scikit-learn to split the dataset into a training set (used to train the model) and a test set (used to evaluate the trained model). scikit-learn is a classic Python machine learning library that works well together with NumPy. The train_test_split function splits the dataset into training and test sets at a 70%/30% ratio; note that by default it shuffles the data, so the training and test distributions stay similar. After splitting the data, we also have to consider the image data format expected by the Keras backend. When the backend is TensorFlow, the backend's image_data_format attribute is 'channels_last', meaning image data has the dimension order (rows, cols, channels); when the backend is Theano, image_data_format is 'channels_first' and the order is (channels, rows, cols). To make the program more robust, we therefore check this attribute and adjust the dimension order of the image data accordingly.
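The code above checks K.image_dim_ordering(), which returns 'th' or 'tf' in older Keras versions; newer releases expose the same information as K.image_data_format(), which returns 'channels_first' or 'channels_last'. A minimal sketch of the same dimension-ordering check with that newer attribute (assuming a recent Keras installation) would be:

from keras import backend as K

IMAGE_SIZE = 64  # same value as in face_dataset.py

if K.image_data_format() == 'channels_first':
    input_shape = (3, IMAGE_SIZE, IMAGE_SIZE)   # (channels, rows, cols), e.g. a Theano backend
else:
    input_shape = (IMAGE_SIZE, IMAGE_SIZE, 3)   # (rows, cols, channels), the TensorFlow default
print(input_shape)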

Step 5: Test

Create a new findme.py file to test the project:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Seven'

import cv2
from face_train import Model

if __name__ == '__main__':
    # Load the trained model
    model = Model()
    model.load_model(file_path='./model/me.face.model.h5')

    # Colour of the rectangle drawn around faces
    color = (0, 255, 0)

    # Capture the live video stream from the given camera
    camera = cv2.VideoCapture(0)

    # Local path of the face detection classifier
    cascade_path = "/anaconda3/share/OpenCV/haarcascades/haarcascade_frontalface_alt2.xml"
    # Load the classifier once, outside the loop
    cascade = cv2.CascadeClassifier(cascade_path)

    # Detect and recognise faces in a loop
    while True:
        ret, frame = camera.read()  # read one video frame

        # Convert to grayscale to reduce the computational cost
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Use the classifier to find the face regions
        faces = cascade.detectMultiScale(gray, 1.1, 5)
        if len(faces) > 0:
            for (x, y, w, h) in faces:
                # Crop the face region and pass it to the model to identify who it is
                image = frame[y: y + h, x: x + w]
                faceID = model.face_predict(image)

                # If it is "me"
                if faceID == 0:
                    cv2.rectangle(frame, (x, y), (x + w, y + h), color, thickness=2)

                    # Label who it is
                    cv2.putText(frame, 'me',
                                (x + 30, y + 30),  # position
                                cv2.FONT_HERSHEY_SIMPLEX,  # font
                                1,  # font scale
                                (255, 0, 255),  # colour
                                2)  # line thickness
                else:
                    cv2.rectangle(frame, (x, y), (x + w, y + h), color, thickness=2)

                    # Label who it is
                    cv2.putText(frame, 'others',
                                (x + 30, y + 30),  # position
                                cv2.FONT_HERSHEY_SIMPLEX,  # font
                                1,  # font scale
                                (255, 0, 255),  # colour
                                2)

        cv2.imshow("camera", frame)

        # Wait 1 ms for a key press
        k = cv2.waitKey(1)
        # Quit the loop if 'q' is pressed
        if k & 0xFF == ord('q'):
            break

    # Release the camera and destroy all windows
    camera.release()
    cv2.destroyAllWindows()

To sum up:

In general, you have to admire the power of Python: simple, readable code combined with strong third-party library support gives Python a clear advantage in machine learning and deep learning. Of course, Java may still be more efficient than Python when dealing with big data.

Different languages have their own strengths and weaknesses; I hope to keep dabbling and keep learning, hahahaha.


Origin: blog.csdn.net/gf19960103/article/details/91038858