A preliminary exploration of image recognition: looking for eyes

Recently, my lab supervisor gave us a first simple assignment: given a picture and a crop of the eyes taken from a person in that picture, design an algorithm that finds the position of the eyes in the full picture.

First, a look at the results.

Example 1, input pictures (the full picture and the eye crop):
[two images]

Example 1, output picture:
[image]

Example 2, input pictures (the full picture and the eye crop):
[two images]

Example 2, output picture:
[image]

The prerequisite knowledge for this task (a basic understanding is enough) is the properties of vector multiplication (the dot product) and the concept of convolution.

Convolution, put simply, multiplies two matrices element by element (note: not matrix multiplication) and sums the products. Usually a smaller matrix serves as the filter. It slides over the larger picture from the upper-left corner to the lower-right corner, and at each position the convolution sum is computed with the small matrix under it (a sub-matrix of the picture with the same size as the filter). The position with the largest sum is the upper-left corner of the best-matching sub-matrix. (Strictly speaking, because the filter is not flipped, this operation is cross-correlation; this post keeps the term convolution.)
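To make the sliding step concrete, here is a minimal sketch on a tiny hand-made matrix. The function name `sliding_sums` and the sample values are made up for illustration and are not part of the original code.

```python
import numpy as np

def sliding_sums(image, kernel):
    """Sum of elementwise products of `kernel` with every same-sized window of `image`."""
    H, W = image.shape
    h, w = kernel.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            window = image[r:r + h, c:c + w]
            out[r, c] = np.sum(window * kernel)  # elementwise product, not matrix product
    return out

# A 4x4 "picture" that contains the 2x2 filter at row 1, column 1
image = np.array([[0, 0, 0, 0],
                  [0, 1, 2, 0],
                  [0, 3, 4, 0],
                  [0, 0, 0, 0]], dtype=float)
kernel = np.array([[1, 2],
                   [3, 4]], dtype=float)

sums = sliding_sums(image, kernel)
best = np.unravel_index(np.argmax(sums), sums.shape)
print(sums)
print(best)  # upper-left corner of the window with the largest sum
```

In this small case the largest sum (30) sits at row 1, column 1, which is exactly where the filter was pasted in.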

Why is the largest convolution sum the best match? Convolution multiplies the corresponding points of two matrices and adds the products, which gives the same result as flattening both matrices into one-dimensional vectors and taking their dot product. Since a·b = |a||b|cos θ, for vectors of fixed length the dot product is largest when the two vectors are parallel, because then the angle is 0 and cos 0 = 1. Two identical vectors are of course parallel, which is why the convolution sum is largest at the matching position.
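The two claims in this argument can be checked numerically. The sketch below uses made-up 2x2 matrices and a hypothetical helper `cosine`; it only illustrates the reasoning and is not taken from the original code.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[2.0, 4.0], [6.0, 8.0]])  # parallel to a (b = 2 * a)
c = np.array([[4.0, 3.0], [2.0, 1.0]])  # same entries as a, arranged differently

def cosine(x, y):
    """Cosine of the angle between two matrices viewed as flat vectors."""
    x, y = x.ravel(), y.ravel()
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Claim 1: the sum of elementwise products equals the dot product of the flattened matrices
elementwise = np.sum(a * c)
flattened = np.dot(a.ravel(), c.ravel())
print(elementwise, flattened)

# Claim 2: the cosine reaches 1 only for parallel vectors
cos_parallel = cosine(a, b)
cos_other = cosine(a, c)
print(cos_parallel, cos_other)
```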

However, this is still not quite right. You may ask: the large matrix can be divided into many small matrices, so what happens if the values of those small matrices differ greatly from one another (a very bright region, for example), so that the filter has little influence on the result? The answer is simple: for each small matrix in the large matrix, subtract its own mean before convolving it with the filter matrix, so that large differences in overall value no longer dominate.
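A small numeric sketch of the problem and the fix, using made-up windows: `exact` is an exact copy of the filter and `bright` is a uniformly bright patch. The helper names `raw_score` and `centered_score` are hypothetical.

```python
import numpy as np

kernel = np.array([[1.0, 2.0], [3.0, 4.0]])
exact = kernel.copy()            # window identical to the filter
bright = np.full((2, 2), 200.0)  # uniformly bright window, nothing like the filter

def raw_score(window, kernel):
    return np.sum(window * kernel)

def centered_score(window, kernel):
    # Subtract the window's own mean before multiplying
    return np.sum((window - window.mean()) * kernel)

raw_exact, raw_bright = raw_score(exact, kernel), raw_score(bright, kernel)
cen_exact, cen_bright = centered_score(exact, kernel), centered_score(bright, kernel)
print(raw_exact, raw_bright)  # the bright patch wins on the raw sum
print(cen_exact, cen_bright)  # after mean subtraction the flat bright patch scores 0
```

Mean subtraction alone removes the response to flat bright areas, but a window that straddles the edge of a bright patch can still score high; dividing by the lengths of the two vectors, as the final formula does, is what keeps the score bounded.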

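Since each small matrix in the large matrix has its mean subtracted, should the filter have its own mean subtracted as well?
It is not strictly necessary: a mean-subtracted small matrix sums to zero, so subtracting a constant from the filter does not change the sum of products. It does, however, keep the computed values relatively small.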
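The following sketch checks this on random data (the arrays and seed are arbitrary, chosen only for illustration): once the window is mean-subtracted, centering the filter leaves the sum of products unchanged and only shrinks the filter's length.

```python
import numpy as np

rng = np.random.default_rng(0)
window = rng.uniform(0, 255, size=(5, 5))
kernel = rng.uniform(0, 255, size=(5, 5))

centered_window = window - window.mean()

# Sum of products with the raw filter and with the mean-subtracted filter
with_raw_kernel = np.sum(centered_window * kernel)
with_centered_kernel = np.sum(centered_window * (kernel - kernel.mean()))
print(with_raw_kernel, with_centered_kernel)

# The filter's length (used in the denominator) does get smaller
norm_raw = np.linalg.norm(kernel)
norm_centered = np.linalg.norm(kernel - kernel.mean())
print(norm_raw, norm_centered)
```

Because the filter's length is the same constant at every position, this choice does not move the location of the maximum; it only rescales the scores.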

Formula

After the analysis above, plus dividing by a normalizing denominator (the product of the two vectors' lengths), the final version simply computes the correlation coefficient. The larger the correlation coefficient, the better the match.
[Image: the correlation coefficient formula]
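The formula image is not available here. Reconstructed from the code in this post, it presumably has the following form. The symbols are assumptions chosen for this sketch: $T$ is the filter (the eye crop), $I_{r,c}$ is the sub-matrix of the full picture whose upper-left corner is at row $r$, column $c$, and a bar denotes the mean over all elements.

```latex
\rho(r, c) =
\frac{\sum_{i,j} \bigl(I_{r,c}(i,j) - \bar{I}_{r,c}\bigr)\bigl(T(i,j) - \bar{T}\bigr)}
     {\sqrt{\sum_{i,j} \bigl(I_{r,c}(i,j) - \bar{I}_{r,c}\bigr)^{2}}\;
      \sqrt{\sum_{i,j} \bigl(T(i,j) - \bar{T}\bigr)^{2}}}
```

The numerator is the mean-subtracted convolution sum, and the denominator is the product of the two vectors' lengths, so $\rho$ is the cosine of the angle between them and lies between $-1$ and $1$.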

That completes the structure of the algorithm. The code is very concise:

import math

import matplotlib.pyplot as plt  # plt is used to display the picture
import numpy as np
from PIL import Image

def find_piece_in_pic(whole_pic, part_pic) :
    # Both parameters are the file paths of the two pictures

    part = Image.open(part_pic)
    # Convert to grayscale
    part = part.convert('L')
    part = np.array(part).astype('float64')

    whole = Image.open(whole_pic)
    whole = whole.convert('L')
    whole = np.array(whole).astype('float64')

    H, W = whole.shape
    h, w = part.shape

    # Subtract the filter's mean (no int() truncation, consistent with the windows below)
    part = part - np.average(part)

    res = np.zeros(whole.shape)

    for r in range(H - h + 1) :
        for c in range(W - w + 1) :
            cur_whole = whole[r : r + h, c : c + w]
            cur_whole = cur_whole - np.average(cur_whole)
            temp1 = (math.sqrt(np.sum(part * part)) * math.sqrt(np.sum(cur_whole * cur_whole)))
            if temp1 == 0 : continue
            temp = np.sum(cur_whole * part) / temp1
            res[r ,c] = temp

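    # Row and column of the best match as plain integers (the first one if there are ties)
    topr, topc = np.unravel_index(np.argmax(res), res.shape)
    print(topr, topc)
    print(np.max(res))

    plt.figure()
    plt.imshow(whole, cmap='gray')  # grayscale images need this parameter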
    plt.gca().add_patch(plt.Rectangle((topc,topr), w, h, color='black'))
    plt.show()
    

find_piece_in_pic('einstain.png', 'eye.png')
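To try the same idea without image files, here is a vectorized sketch that works directly on arrays. It is not the author's function: the name `match_template`, the random test data, and the use of `sliding_window_view` (available in NumPy 1.20 and later) are all choices made for this illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def match_template(whole, part):
    """Return (row, col, score) of the window of `whole` most correlated with `part`."""
    h, w = part.shape
    part_c = part - part.mean()
    # windows[r, c] is the h-by-w sub-matrix whose upper-left corner is (r, c)
    windows = sliding_window_view(whole, (h, w))
    centered = windows - windows.mean(axis=(2, 3), keepdims=True)
    numerator = np.sum(centered * part_c, axis=(2, 3))
    denominator = np.sqrt(np.sum(centered ** 2, axis=(2, 3)) * np.sum(part_c ** 2))
    # Flat windows (zero denominator) keep a score of 0, as in the loop version
    scores = np.divide(numerator, denominator,
                       out=np.zeros_like(numerator), where=denominator > 0)
    r, c = np.unravel_index(np.argmax(scores), scores.shape)
    return int(r), int(c), float(scores[r, c])

# Synthetic check: cut a patch out of a random "picture" and look for it again
rng = np.random.default_rng(0)
whole = rng.uniform(0, 255, size=(40, 50))
part = whole[12:20, 30:40].copy()
row, col, score = match_template(whole, part)
print(row, col, score)
```

The patch is cut from row 12, column 30, so the search should return that position with a correlation coefficient of essentially 1. This version trades memory for speed, since it materializes every window at once; for large pictures the loop version uses far less memory.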

Matlab function

Such a simple and widely applicable algorithm is of course already included in Matlab, where it becomes extremely simple because everything is wrapped in a built-in function.
Again taking Einstein and his charming big eyes as the example:

eye = rgb2gray(imread('eye.png'));
einstain = rgb2gray(imread('einstain.png'));
imshowpair(einstain,eye,'montage')

c = normxcorr2(eye,einstain); % this is the key function
figure, surf(c), shading flat
[ypeak, xpeak] = find(c==max(c(:)));

yoffSet = ypeak-size(eye,1);
xoffSet = xpeak-size(eye,2);

figure
imshow(einstain);
imrect(gca, [xoffSet+1, yoffSet+1, size(eye,2), size(eye,1)]);

Origin blog.csdn.net/weixin_43867940/article/details/106162416