Computer vision: image processing and face recognition (with code and a practical demonstration)

A walkthrough of the simplest image processing and face recognition workflows.


Foreword

If NLP is the jewel in the crown of artificial intelligence, then CV is its eyes. This article introduces the principles and theory behind image processing and face recognition, then demonstrates both through working code.


1. Image processing

1. Image processing content

Image digitization
Image transformation
Image enhancement
Image restoration
Image compression coding
Image segmentation
Image analysis and description
Image recognition and classification

Image digitization: representing image information in digital form, i.e., encoding the gray value or color value of each pixel so that a computer can recognize and process it.
Image transformation: spatially transforming an image from one coordinate system to another; it can change the scale, rotation angle, position, and other characteristics of the image.
Image enhancement: improving image quality by increasing contrast and bringing out details and colors to better meet application needs.
Image restoration: recovering the original image from a degraded one, for example through filtering and denoising; it can also improve image quality.
Image compression coding: representing image information with fewer bytes to reduce storage space and transmission time.
Image segmentation: dividing an image into distinct regions or objects for subsequent processing, based either on features such as color, texture, and shape, or on deep learning algorithms.
Image analysis and description: analyzing an image, extracting its features, and describing it as a series of numbers for subsequent processing.
Image recognition and classification: identifying the objects in an image and assigning class labels for subsequent processing.
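To make image digitization concrete, here is a minimal sketch (example.jpg is a placeholder path): a decoded image is simply an array of encoded per-pixel values that a computer can process.

import cv2

img = cv2.imread("example.jpg")  # placeholder path
print(img.shape)   # e.g. (height, width, 3): three color channels per pixel
print(img.dtype)   # uint8: each channel value is encoded as an integer 0-255
print(img[0, 0])   # the encoded color value of the top-left pixel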

2. Common algorithms

Geometric transformation of images (distortion correction; scaling with bilinear interpolation; rotation; stitching)

Image transforms (Fourier, cosine, Walsh-Hadamard, KL transform, wavelet transform)

Frequency-domain processing (enhancement: high-frequency boosting, homomorphic filtering; smoothing and denoising: low-pass filtering)
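As a minimal sketch of a few of these algorithms with OpenCV (example.jpg is a placeholder path; the low-pass smoothing is approximated here by its common spatial-domain counterpart, a Gaussian filter):

import cv2

img = cv2.imread("example.jpg")  # placeholder path
h, w = img.shape[:2]

# Geometric transforms: bilinear-interpolation scaling and rotation
scaled = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_LINEAR)
M = cv2.getRotationMatrix2D((w / 2, h / 2), 45, 1.0)  # rotate 45 degrees about the center
rotated = cv2.warpAffine(img, M, (w, h))

# Smoothing / denoising: Gaussian low-pass filtering
smoothed = cv2.GaussianBlur(img, (5, 5), 0)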


3. Image data augmentation

Image data augmentation creates new training images by applying a series of transformations to an existing dataset. Its purpose is to let machine learning models better capture the variability of images and generalize to new data. Common techniques include rotation, scaling, cropping, flipping, and color transformations.

Image data augmentation in deep learning:

• Color Jittering: color-based augmentation via changes to image brightness, saturation, and contrast
• PCA Jittering: first compute the mean and standard deviation of the three RGB channels, then compute the covariance matrix over the whole training set and perform eigendecomposition; the resulting eigenvectors and eigenvalues drive the jittering
• Random Scale: scale transformation
• Random Crop: crop and zoom the image using random interpolation, including the Scale Jittering method (used by the VGG and ResNet models) or scale and aspect-ratio augmentation
• Horizontal/Vertical Flip: horizontal and vertical flipping
• Shift: translation transformation
• Rotation/Reflection: rotation / affine transformation
• Noise: Gaussian noise, blurring
• Label Shuffle: augmentation for class-imbalanced data
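A minimal sketch of a few of the techniques above (random crop, horizontal flip, and color jittering simplified to brightness only; the thresholds and ranges are illustrative assumptions):

import cv2
import numpy as np

def augment(img):
    # Random crop to 90% of the original size, then resize back
    h, w = img.shape[:2]
    ch, cw = int(h * 0.9), int(w * 0.9)
    y = np.random.randint(0, h - ch + 1)
    x = np.random.randint(0, w - cw + 1)
    img = cv2.resize(img[y:y + ch, x:x + cw], (w, h))

    # Random horizontal flip
    if np.random.rand() < 0.5:
        img = cv2.flip(img, 1)

    # Brightness jitter (color jittering reduced to brightness for brevity)
    factor = np.random.uniform(0.8, 1.2)
    img = np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)
    return img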

Image enhancement code demo:

# Image enhancement / sharpening algorithms:
# 1) histogram equalization 2) Laplacian operator 3) log transform 4) gamma transform
# 5) CLAHE 6) Retinex-SSR 7) Retinex-MSR
# The Laplacian-based enhancement applies filtering via spatial convolution.
# All methods are compared on the same image:
# Histogram equalization suits low-contrast images and enhances image detail.
# The Laplacian operator enhances local image contrast.
# The log transform works well when both overall contrast and gray values are low.
# The gamma transform works well for low-contrast images with high overall brightness (e.g., camera overexposure).

import cv2
import numpy as np
import matplotlib.pyplot as plt


# Histogram equalization enhancement
def hist(image):
    b, g, r = cv2.split(image)  # cv2.imread loads channels in B, G, R order
    b1 = cv2.equalizeHist(b)
    g1 = cv2.equalizeHist(g)
    r1 = cv2.equalizeHist(r)
    image_equal_clo = cv2.merge([b1, g1, r1])
    return image_equal_clo


# Laplacian operator (sharpening kernel: identity plus the Laplacian)
def laplacian(image):
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
    image_lap = cv2.filter2D(image, -1, kernel)  # ddepth=-1 keeps the source depth
    return image_lap


# Log transform
def log(image):
    # Work in float so the small log values are not truncated
    image_log = np.log(np.array(image, dtype=np.float32) + 1)
    cv2.normalize(image_log, image_log, 0, 255, cv2.NORM_MINMAX)
    # Convert back to an 8-bit image for display
    image_log = cv2.convertScaleAbs(image_log)
    return image_log


# Gamma transform
def gamma(image):
    fgamma = 2
    image_gamma = np.uint8(np.power((np.array(image) / 255.0), fgamma) * 255.0)
    cv2.normalize(image_gamma, image_gamma, 0, 255, cv2.NORM_MINMAX)
    cv2.convertScaleAbs(image_gamma, image_gamma)
    return image_gamma


# Contrast-limited adaptive histogram equalization (CLAHE)
def clahe(image):
    b, g, r = cv2.split(image)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    b = clahe.apply(b)
    g = clahe.apply(g)
    r = clahe.apply(r)
    image_clahe = cv2.merge([b, g, r])
    return image_clahe


# Replace zero pixels with the smallest nonzero value so the log is well-defined
def replaceZeroes(data):
    min_nonzero = min(data[np.nonzero(data)])
    data[data == 0] = min_nonzero
    return data


# Retinex SSR (single-scale retinex)
def SSR(src_img, size):
    L_blur = cv2.GaussianBlur(src_img, (size, size), 0)
    img = replaceZeroes(src_img)
    L_blur = replaceZeroes(L_blur)

    # Retinex: log R = log I - log L, where L is the blurred illumination estimate
    dst_Img = cv2.log(img / 255.0)
    dst_Lblur = cv2.log(L_blur / 255.0)
    log_R = cv2.subtract(dst_Img, dst_Lblur)

    dst_R = cv2.normalize(log_R, None, 0, 255, cv2.NORM_MINMAX)
    log_uint8 = cv2.convertScaleAbs(dst_R)
    return log_uint8


def SSR_image(image):
    size = 3
    b_gray, g_gray, r_gray = cv2.split(image)
    b_gray = SSR(b_gray, size)
    g_gray = SSR(g_gray, size)
    r_gray = SSR(r_gray, size)
    result = cv2.merge([b_gray, g_gray, r_gray])
    return result


# Retinex MSR (multi-scale retinex)
def MSR(img, scales):
    weight = 1 / 3.0
    scales_size = len(scales)
    h, w = img.shape[:2]
    log_R = np.zeros((h, w), dtype=np.float32)

    for i in range(scales_size):
        img = replaceZeroes(img)
        L_blur = cv2.GaussianBlur(img, (scales[i], scales[i]), 0)
        L_blur = replaceZeroes(L_blur)
        dst_Img = cv2.log(img / 255.0)
        dst_Lblur = cv2.log(L_blur / 255.0)
        # Accumulate the weighted single-scale results: log R = log I - log L
        log_R += weight * cv2.subtract(dst_Img, dst_Lblur)

    dst_R = cv2.normalize(log_R, None, 0, 255, cv2.NORM_MINMAX)
    log_uint8 = cv2.convertScaleAbs(dst_R)
    return log_uint8


def MSR_image(image):
    scales = [15, 101, 301]  # [3,5,9]
    b_gray, g_gray, r_gray = cv2.split(image)
    b_gray = MSR(b_gray, scales)
    g_gray = MSR(g_gray, scales)
    r_gray = MSR(r_gray, scales)
    result = cv2.merge([b_gray, g_gray, r_gray])
    return result


if __name__ == "__main__":
    image = cv2.imread("example.jpg")

    plt.subplot(4, 2, 1)
    plt.imshow(image)
    plt.axis('off')
    plt.title('Original')

    # Histogram equalization enhancement
    image_equal_clo = hist(image)

    plt.subplot(4, 2, 2)
    plt.imshow(image_equal_clo)
    plt.axis('off')
    plt.title('equal_enhance')

    # Laplacian enhancement
    image_lap = laplacian(image)

    plt.subplot(4, 2, 3)
    plt.imshow(image_lap)
    plt.axis('off')
    plt.title('laplacian_enhance')

    # Log transform enhancement
    image_log = log(image)

    plt.subplot(4, 2, 4)
    plt.imshow(image_log)
    plt.axis('off')
    plt.title('log_enhance')

    # Gamma transform
    image_gamma = gamma(image)

    plt.subplot(4, 2, 5)
    plt.imshow(image_gamma)
    plt.axis('off')
    plt.title('gamma_enhance')

    # CLAHE
    image_clahe = clahe(image)

    plt.subplot(4, 2, 6)
    plt.imshow(image_clahe)
    plt.axis('off')
    plt.title('CLAHE')

    # retinex_ssr
    image_ssr = SSR_image(image)

    plt.subplot(4, 2, 7)
    plt.imshow(image_ssr)
    plt.axis('off')
    plt.title('SSR')

    # retinex_msr
    image_msr = MSR_image(image)

    plt.subplot(4, 2, 8)
    plt.imshow(image_msr)
    plt.axis('off')
    plt.title('MSR')

    plt.show()
Original image: (figure omitted)

Processed results: (figure omitted)

Methods compared: 1) histogram equalization 2) Laplacian operator 3) log transform 4) gamma transform 5) CLAHE 6) Retinex-SSR 7) Retinex-MSR

1. Histogram equalization: low-contrast images are well suited to histogram equalization for enhancing image detail.

2. The Laplacian operator enhances local image contrast.

3. The log transform works best when both the overall contrast and the gray values are low.

4. The gamma transform is most effective for low-contrast images with high overall brightness (e.g., camera overexposure).

5. CLAHE and the Retinex methods give better overall results.

When reading an image with cv2.imread(), the default channel order for color images is B, G, R — the R and B channels are swapped relative to the familiar RGB. plt.imshow(), however, assumes the channel order R, G, B, which is why the displayed images take on a blue tint.
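A minimal fix, reusing the image variable from the demo above, is to convert the channel order before display:

image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image_rgb)  # now displays with the expected colors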

2. Face recognition

1. Face recognition content

Face Detection: locating the faces in an image (an object detection task).

Face Alignment: locating the coordinates of the facial key points; the number of points is a preset fixed value, commonly 5, 68, or 90.

Face Feature Extraction: converting a face image into a fixed-length vector of values that represents the face. The key point coordinates are first rotated, scaled, etc. to align the face, and then the features are extracted and the vector computed.

Face Compare: measuring the similarity between two faces. The inputs are two face features produced by the feature extraction step; the output is the similarity between them.

Face Matching: given any two face images, judging whether the faces in them belong to the same person.

Face matching process: detect the face in each image and find its position, extract the face's features, and finally compare the distance between the features of the two faces. If the distance is below a threshold, the faces are considered the same person; otherwise, different people.
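This process maps directly onto the face_recognition library used later in this article; a minimal sketch (the file names are placeholders, and 0.6 is the library's commonly used default tolerance):

import face_recognition

img_a = face_recognition.load_image_file("a.jpg")
img_b = face_recognition.load_image_file("b.jpg")

# Detect each face and extract one 128-d feature vector per image
enc_a = face_recognition.face_encodings(img_a)[0]
enc_b = face_recognition.face_encodings(img_b)[0]

# Compare: a feature distance below the threshold means the same person
distance = face_recognition.face_distance([enc_a], enc_b)[0]
print("same person" if distance < 0.6 else "different person")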

Face matching modes:

1:1 — identity verification, comparing one face against one claimed identity.

1:N — comparing one face against the N faces in a gallery; for example, an attendance machine whose face database contains face photos of everyone in the company.

 

2. Core comparison process

The first step is face detection, the second is face alignment, and the third is feature extraction; these three steps are performed on every photo. At comparison time, the extracted features are compared to decide whether the two faces belong to the same person.

Face detection:

The classifier examines each position of a sliding window and judges whether it contains a face. Feeding the window into a regression model helps refine the detection: for a window that contains a face, the model outputs how the box should be corrected — Δx, Δy, Δw, Δh, i.e., the adjustments to its coordinates, width, and height — so the face can be located more accurately.
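As an illustration, here is one common parameterization of that correction (the R-CNN-style box encoding; the article does not specify which encoding is meant, so this choice is an assumption):

import numpy as np

def apply_box_deltas(box, deltas):
    # box: (x, y, w, h); deltas: the predicted (dx, dy, dw, dh)
    x, y, w, h = box
    dx, dy, dw, dh = deltas
    cx = x + w / 2 + dx * w      # shift the box center
    cy = y + h / 2 + dy * h
    nw = w * np.exp(dw)          # rescale width and height
    nh = h * np.exp(dh)
    return np.array([cx - nw / 2, cy - nh / 2, nw, nh])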

Face feature extraction algorithms: traditional methods include local texture models, global texture models, and shape regression models. More popular now are deep convolutional neural networks, recurrent neural networks, or convolutional networks with 3DMM parameters (3DMM parameters carry three-dimensional information), as well as cascaded deep neural networks.

Face comparison methods:

The mainstream approach is deep learning: a deep convolutional neural network (DCNN) replaces the earlier hand-crafted feature extractors for picking out the distinguishing features of a face. The many parameters in a DCNN are learned rather than specified by people, and learned features outperform hand-designed ones. The resulting feature vector typically has 128, 256, 512, or 1024 dimensions, and comparison is done on these vectors, usually with Euclidean distance or cosine similarity. Evaluation of face comparison covers both speed — the time to compute a single face feature vector plus the comparison time — and accuracy, measured by ACC and ROC.
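A minimal sketch of the two distance measures on hypothetical 128-d feature vectors:

import numpy as np

def euclidean_distance(f1, f2):
    return np.linalg.norm(f1 - f2)

def cosine_similarity(f1, f2):
    return np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2))

a = np.random.randn(128).astype(np.float32)  # stand-ins for real face features
b = np.random.randn(128).astype(np.float32)
print(euclidean_distance(a, b), cosine_similarity(a, b))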

An ordinary comparison is a cheap operation — essentially the distance between two points, often just one inner product of two vectors. But in 1:N comparison, when the gallery N is very large, a single query photo triggers an enormous number of comparisons: with a gallery of 1 million faces, one search means roughly a million comparisons. Since total latency still matters, various techniques are used to speed this up.
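One simple speed-up is vectorization: with L2-normalized features, comparing one query against the whole gallery is a single matrix-vector product rather than N separate comparisons. A sketch with random stand-in data (production systems add approximate nearest-neighbor indexes on top of this):

import numpy as np

N = 100_000  # stand-in for a large gallery (e.g. one million faces)
gallery = np.random.randn(N, 128).astype(np.float32)
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

query = np.random.randn(128).astype(np.float32)
query /= np.linalg.norm(query)

scores = gallery @ query          # cosine similarity to every gallery face at once
best = int(np.argmax(scores))     # index of the closest match
print(best, scores[best])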

3. Algorithms

Common choices include dlib, FaceNet, ArcFace, etc.

• The basic idea of FaceNet is that photos of the same face, taken from different angles and in different poses, should cluster tightly (high cohesion), while different faces should stay far apart (low coupling). It proposes a CNN trained with triplet mining (the triplet loss).
• The main idea of ArcFace is to use an angular margin to maximize inter-class distance and minimize intra-class distance. It uses the arc-cosine function to compute the angle between the feature and the target weight, adds an angular margin penalty m to the target angle, recovers the target logit through the cosine function, rescales all logits with a fixed feature norm, and then proceeds exactly as in the softmax loss.
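A minimal NumPy sketch of the ArcFace logit modification described above (s = 64 and m = 0.5 follow the paper's common settings; training-time numerical safeguards are omitted):

import numpy as np

def arcface_logits(features, weights, labels, s=64.0, m=0.5):
    # Normalize features and class weights so their dot product is cos(theta)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos_theta = f @ w.T

    # Add the angular margin m only to each sample's target-class angle
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    rows = np.arange(len(labels))
    cos_theta[rows, labels] = np.cos(theta[rows, labels] + m)

    # Rescale with the fixed feature norm s; the result feeds a softmax loss
    return s * cos_theta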

3. Code and practical display 

The code is based on the face_recognition library, which is fairly simple to use. When installing it, pay attention to its dependencies — dlib, opencv-python, and cmake. There are some pitfalls during installation: notably, the library is sensitive to whether your Python is 32-bit or 64-bit, so check your environment configuration.

Key point identification:

from PIL import Image, ImageDraw
import face_recognition

# Load the jpg file into a numpy array
image = face_recognition.load_image_file("biden.jpg")

# Find all facial features in all the faces in the image
face_landmarks_list = face_recognition.face_landmarks(image)

print("I found {} face(s) in this photograph.".format(len(face_landmarks_list)))

# Create a PIL imagedraw object so we can draw on the picture
pil_image = Image.fromarray(image)
d = ImageDraw.Draw(pil_image)

for face_landmarks in face_landmarks_list:

    # Print the location of each facial feature in this image
    for facial_feature in face_landmarks.keys():
        print("The {} in this face has the following points: {}".format(facial_feature, face_landmarks[facial_feature]))

    # Let's trace out each facial feature in the image with a line!
    for facial_feature in face_landmarks.keys():
        d.line(face_landmarks[facial_feature], width=5)

# Show the picture
pil_image.show()

Recognition effect: (figure omitted)

Output key point coordinates:

I found 1 face(s) in this photograph.
The chin in this face has the following points: [(182, 120), (184, 135), (187, 150), (191, 165), (197, 178), (207, 189), (219, 198), (230, 205), (243, 205), (255, 201), (264, 191), (272, 179), (278, 167), (281, 153), (281, 140), (281, 126), (280, 113)]
The left_eyebrow in this face has the following points: [(194, 112), (199, 105), (208, 103), (218, 104), (226, 108)]
The right_eyebrow in this face has the following points: [(241, 107), (249, 103), (257, 101), (266, 101), (272, 107)]
The nose_bridge in this face has the following points: [(235, 119), (236, 128), (237, 137), (238, 146)]
The nose_tip in this face has the following points: [(227, 152), (233, 153), (238, 154), (244, 152), (248, 150)]
The left_eye in this face has the following points: [(205, 122), (210, 119), (216, 120), (223, 124), (217, 125), (210, 125)]
The right_eye in this face has the following points: [(247, 122), (252, 117), (258, 116), (264, 118), (259, 121), (253, 122)]
The top_lip in this face has the following points: [(215, 169), (223, 166), (233, 164), (239, 165), (245, 163), (254, 163), (262, 165), (259, 166), (246, 166), (239, 167), (233, 167), (217, 169)]
The bottom_lip in this face has the following points: [(262, 165), (256, 179), (247, 186), (240, 187), (234, 187), (223, 182), (215, 169), (217, 169), (233, 181), (240, 181), (246, 180), (259, 166)]

There are nine facial feature groups in total (chin, left and right eyebrows, nose bridge, nose tip, left and right eyes, top and bottom lips).

Face makeup:

A simple virtual-makeup demo:

from PIL import Image, ImageDraw
import face_recognition

# Load the jpg file into a numpy array
image = face_recognition.load_image_file("biden.jpg")

# Find all facial features in all the faces in the image
face_landmarks_list = face_recognition.face_landmarks(image)

pil_image = Image.fromarray(image)
for face_landmarks in face_landmarks_list:
    d = ImageDraw.Draw(pil_image, 'RGBA')

    # Make the eyebrows into a nightmare
    d.polygon(face_landmarks['left_eyebrow'], fill=(68, 54, 39, 128))
    d.polygon(face_landmarks['right_eyebrow'], fill=(68, 54, 39, 128))
    d.line(face_landmarks['left_eyebrow'], fill=(68, 54, 39, 150), width=5)
    d.line(face_landmarks['right_eyebrow'], fill=(68, 54, 39, 150), width=5)

    # Gloss the lips
    d.polygon(face_landmarks['top_lip'], fill=(150, 0, 0, 128))
    d.polygon(face_landmarks['bottom_lip'], fill=(150, 0, 0, 128))
    d.line(face_landmarks['top_lip'], fill=(150, 0, 0, 64), width=8)
    d.line(face_landmarks['bottom_lip'], fill=(150, 0, 0, 64), width=8)

    # Sparkle the eyes
    d.polygon(face_landmarks['left_eye'], fill=(255, 255, 255, 30))
    d.polygon(face_landmarks['right_eye'], fill=(255, 255, 255, 30))

    # Apply some eyeliner
    d.line(face_landmarks['left_eye'] + [face_landmarks['left_eye'][0]], fill=(0, 0, 0, 110), width=6)
    d.line(face_landmarks['right_eye'] + [face_landmarks['right_eye'][0]], fill=(0, 0, 0, 110), width=6)

    pil_image.show()

The effect is as follows: (figure omitted)

(a bit ex...)

Face comparison:

The simplest 1:N comparison needs only a few pictures in the same directory:

import face_recognition

# Load the jpg files into numpy arrays
biden_image = face_recognition.load_image_file("biden.jpg")
obama_image = face_recognition.load_image_file("obama.jpg")
unknown_image = face_recognition.load_image_file("ikun.jpg")

# Get the face encodings for each face in each image file
# Since there could be more than one face in each image, it returns a list of encodings.
# But since I know each image only has one face, I only care about the first encoding in each image, so I grab index 0.
try:
    biden_face_encoding = face_recognition.face_encodings(biden_image)[0]
    obama_face_encoding = face_recognition.face_encodings(obama_image)[0]
    unknown_face_encoding = face_recognition.face_encodings(unknown_image)[0]
except IndexError:
    print("I wasn't able to locate any faces in at least one of the images. Check the image files. Aborting...")
    quit()

known_faces = [
    biden_face_encoding,
    obama_face_encoding
]

# results is an array of True/False telling if the unknown face matched anyone in the known_faces array
results = face_recognition.compare_faces(known_faces, unknown_face_encoding)

print("这张照片里的人是obama吗? {}".format(results[0]))
print("这张照片里的人是biden吗? {}".format(results[1]))
print("这个人是没见过的吗? {}".format(not True in results))

Summary

The above covers simple image processing and face recognition. The code and content are fairly basic and suitable for getting started.

Originally published at blog.csdn.net/weixin_46451009/article/details/128938606