OpenCV-Python ID card information recognition

This article uses OpenCV-Python and CnOcr to realize the case of ID card information recognition. To identify the text information in the ID card, it is divided into three major steps: 1. Detection and search of the ID card area through preprocessing; 2. Extraction of the text information of the ID card; 3. Identification of the text information of the ID card. Let's take a look at the specific process of identification CnOcr official website . Identification process video

Pre-environment

The environment here needs to install OpenCV-Python, Numpy and CnOcr. The Python version used in this article is 3.6, and the OpenCV-Python version is 3.4.1.15. If you are a student of version 4.x, there may be some Api operations that are different. The installation and introduction of these dependencies, I will not go into details here, they are all installed using Pip.

identification process

First, import the required dependencies cv2, numpy, cnocr and create a show image function for later use:

import cv2
import numpy as np
from cnocr import CnOcr


def show(image, window_name):
    cv2.namedWindow(window_name, 0)
    cv2.imshow(window_name, image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    
# 加载CnOcr的模型
ocr = CnOcr(model_name='densenet_lite_136-gru')

ID area lookup

After a series of processing through the grayscale processing of the loaded image –> filter processing –> binary processing –> edge detection –> expansion processing –> contour search –> perspective transformation (correction) –> image rotation –> fixed image size, We can clearly cut out the specific area of the ID card.

The original image

Use OpenCV's imread method to read local images.

image = cv2.imread('card.png')
show(image, "image")

insert image description here

grayscale processing

Convert the three-channel BGR image to a grayscale image, because the following OpenCV operations need to be based on grayscale images.

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
show(gray, "gray")

insert image description here

median filter

Use filtering, also known as blurring, to reduce unwanted noise.

blur = cv2.medianBlur(gray, 7)
show(blur, "blur")

insert image description here

binary processing

Binary processing, either black or white. Here, through cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU, OpenCV's Otsu method is used to binarize the image to process the image. The processed image can clearly distinguish the background and ID card area.

threshold = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")

insert image description here

edge detection

Use the most commonly used edge detection method in OpenCV, Canny, to detect the edges in the image.

canny = cv2.Canny(threshold, 100, 150)
show(canny, "canny")

insert image description here

edge swelling

In order to make the edge of the edge detection in the previous step more coherent, use expansion processing to expand the white edge, that is, the edge line becomes thicker.

kernel = np.ones((3, 3), np.uint8)
dilate = cv2.dilate(canny, kernel, iterations=5)
show(dilate, "dilate")

insert image description here

contour detection

Use findContours to detect the contour of the image with edge expansion, you can clearly see that there are still a lot of noise in the background part, and the part of the ID card that needs to be recognized is also circled by the contour.

binary, contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
image_copy = image.copy()
res = cv2.drawContours(image_copy, contours, -1, (255, 0, 0), 20)
show(res, "res")

insert image description here

Contour sorting

After sorting the areas of the contours, we can accurately extract the contours of the ID card.

contours = sorted(contours, key=cv2.contourArea, reverse=True)[0]
image_copy = image.copy()
res = cv2.drawContours(image_copy, contours, -1, (255, 0, 0), 20)
show(res, "contours")

insert image description here

perspective transformation

The four vertices of the contour are extracted by approximating the contour, and sorted in order, and then the perspective transformation is performed on the selected image area through warpPerspective, that is, the selected image is corrected.

epsilon = 0.02 * cv2.arcLength(contours, True)
approx = cv2.approxPolyDP(contours, epsilon, True)
n = []
for x, y in zip(approx[:, 0, 0], approx[:, 0, 1]):
    n.append((x, y))
n = sorted(n)
sort_point = []
n_point1 = n[:2]
n_point1.sort(key=lambda x: x[1])
sort_point.extend(n_point1)
n_point2 = n[2:4]
n_point2.sort(key=lambda x: x[1])
n_point2.reverse()
sort_point.extend(n_point2)
p1 = np.array(sort_point, dtype=np.float32)
h = sort_point[1][1] - sort_point[0][1]
w = sort_point[2][0] - sort_point[1][0]
pts2 = np.array([[0, 0], [0, h], [w, h], [w, 0]], dtype=np.float32)

# 生成变换矩阵
M = cv2.getPerspectiveTransform(p1, pts2)
# 进行透视变换
dst = cv2.warpPerspective(image, M, (w, h))
# print(dst.shape)
show(dst, "dst")

insert image description here

fixed image size

To make the image positive, by judging the width and height of the image, if the width<height, rotate the image by 90°. And resize the image to the specified size. It is convenient to process the image later.

if w < h:
    dst = np.rot90(dst)
resize = cv2.resize(dst, (1084, 669), interpolation=cv2.INTER_AREA)
show(resize, "resize")

insert image description here

Detect ID card text position

After grayscale, binary filtering and opening and closing operations, the primary key of the text area in the image is revealed.

temp_image = resize.copy()
gray = cv2.cvtColor(resize, cv2.COLOR_BGR2GRAY)
show(gray, "gray")
threshold = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")
blur = cv2.medianBlur(threshold, 5)
show(blur, "blur")
kernel = np.ones((3, 3), np.uint8)
morph_open = cv2.morphologyEx(blur, cv2.MORPH_OPEN, kernel)
show(morph_open, "morph_open")

insert image description here

Extremely inflated

Given a relatively large convolution box, the expansion process is performed to deepen and enlarge the white area. Areas of text are more visible.

kernel = np.ones((7, 7), np.uint8)
dilate = cv2.dilate(morph_open, kernel, iterations=6)
show(dilate, "dilate")

insert image description here

Contour Find Text Area

Use contour search to find out the white blocky area.

binary, contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
resize_copy = resize.copy()
res = cv2.drawContours(resize_copy, contours, -1, (255, 0, 0), 2)
show(res, "res")

insert image description here

Filter out the text area

After the contour detection in the previous step, we found that there are some noises in the selected contour. Through the observation of the image, use the approximate contour, and then use the following logic to filter out the text area. And define the text description information, and add the location information of the text area to the specified collection. At this point, it can be clearly seen that all the required text areas have been extracted.

labels = ['姓名', '性别', '民族', '出生年', '出生月', '出生日', '住址', '公民身份证号码']
positions = []
data_areas = {
    
    }
resize_copy = resize.copy()
for contour in contours:
    epsilon = 0.002 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    x, y, w, h = cv2.boundingRect(approx)
    if h > 50 and x < 670:
        res = cv2.rectangle(resize_copy, (x, y), (x + w, y + h), (0, 255, 0), 2)
        area = gray[y:(y + h), x:(x + w)]
        blur = cv2.medianBlur(area, 3)
        data_area = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
        positions.append((x, y))
        data_areas['{}-{}'.format(x, y)] = data_area

show(res, "res")

insert image description here

Sort the text area

It is found that the text area is ordered from bottom to top, and the area of the x-axis from left to right is unordered, so use the following logic to sort the text area

positions.sort(key=lambda p: p[1])
result = []
index = 0
while index < len(positions) - 1:
    if positions[index + 1][1] - positions[index][1] < 10:
        temp_list = [positions[index + 1], positions[index]]
        for i in range(index + 1, len(positions)):
            if positions[i + 1][1] - positions[i][1] < 10:
                temp_list.append(positions[i + 1])
            else:
                break
        temp_list.sort(key=lambda p: p[0])
        positions[index:(index + len(temp_list))] = temp_list
        index = index + len(temp_list) - 1
    else:
        index += 1

recognize text

Use CnOcr to recognize the text areas one by one, and finally output the recognition results.

for index in range(len(positions)):
    position = positions[index]
    data_area = data_areas['{}-{}'.format(position[0], position[1])]
    ocr_data = ocr.ocr(data_area)
    ocr_result = ''.join([''.join(result[0]) for result in ocr_data]).replace(' ', '')
    # print('{}：{}'.format(labels[index], ocr_result))
    result.append('{}：{}'.format(labels[index], ocr_result))
    show(data_area, "data_area")

for item in result:
    print(item)
show(res, "res")

insert image description here

epilogue

Through the above steps, the ID card information is successfully extracted, and some digital parameters in the process may be slightly adjusted in different scenarios.
Put all the code below:

the code

import cv2
import numpy as np
from cnocr import CnOcr

def show(image, window_name):
    cv2.namedWindow(window_name, 0)
    cv2.imshow(window_name, image)
    # 0任意键终止窗口
    cv2.waitKey(0)
    cv2.destroyAllWindows()


ocr = CnOcr(model_name='densenet_lite_136-gru')

image = cv2.imread('card.png')
show(image, "image")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
show(gray, "gray")
blur = cv2.medianBlur(gray, 7)
show(blur, "blur")
threshold = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")
canny = cv2.Canny(threshold, 100, 150)
show(canny, "canny")
kernel = np.ones((3, 3), np.uint8)
dilate = cv2.dilate(canny, kernel, iterations=5)
show(dilate, "dilate")
binary, contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
image_copy = image.copy()
res = cv2.drawContours(image_copy, contours, -1, (255, 0, 0), 20)
show(res, "res")
contours = sorted(contours, key=cv2.contourArea, reverse=True)[0]
image_copy = image.copy()
res = cv2.drawContours(image_copy, contours, -1, (255, 0, 0), 20)
show(res, "contours")
epsilon = 0.02 * cv2.arcLength(contours, True)
approx = cv2.approxPolyDP(contours, epsilon, True)
n = []
for x, y in zip(approx[:, 0, 0], approx[:, 0, 1]):
    n.append((x, y))
n = sorted(n)
sort_point = []
n_point1 = n[:2]
n_point1.sort(key=lambda x: x[1])
sort_point.extend(n_point1)
n_point2 = n[2:4]
n_point2.sort(key=lambda x: x[1])
n_point2.reverse()
sort_point.extend(n_point2)
p1 = np.array(sort_point, dtype=np.float32)
h = sort_point[1][1] - sort_point[0][1]
w = sort_point[2][0] - sort_point[1][0]
pts2 = np.array([[0, 0], [0, h], [w, h], [w, 0]], dtype=np.float32)

M = cv2.getPerspectiveTransform(p1, pts2)
dst = cv2.warpPerspective(image, M, (w, h))
# print(dst.shape)
show(dst, "dst")
if w < h:
    dst = np.rot90(dst)
resize = cv2.resize(dst, (1084, 669), interpolation=cv2.INTER_AREA)
show(resize, "resize")
temp_image = resize.copy()
gray = cv2.cvtColor(resize, cv2.COLOR_BGR2GRAY)
show(gray, "gray")
threshold = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")
blur = cv2.medianBlur(threshold, 5)
show(blur, "blur")
kernel = np.ones((3, 3), np.uint8)
morph_open = cv2.morphologyEx(blur, cv2.MORPH_OPEN, kernel)
show(morph_open, "morph_open")
kernel = np.ones((7, 7), np.uint8)
dilate = cv2.dilate(morph_open, kernel, iterations=6)
show(dilate, "dilate")
binary, contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
resize_copy = resize.copy()
res = cv2.drawContours(resize_copy, contours, -1, (255, 0, 0), 2)
show(res, "res")
labels = ['姓名', '性别', '民族', '出生年', '出生月', '出生日', '住址', '公民身份证号码']
positions = []
data_areas = {
    
    }
resize_copy = resize.copy()
for contour in contours:
    epsilon = 0.002 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    x, y, w, h = cv2.boundingRect(approx)
    if h > 50 and x < 670:
        res = cv2.rectangle(resize_copy, (x, y), (x + w, y + h), (0, 255, 0), 2)
        area = gray[y:(y + h), x:(x + w)]
        blur = cv2.medianBlur(area, 3)
        data_area = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
        positions.append((x, y))
        data_areas['{}-{}'.format(x, y)] = data_area

show(res, "res")

positions.sort(key=lambda p: p[1])
result = []
index = 0
while index < len(positions) - 1:
    if positions[index + 1][1] - positions[index][1] < 10:
        temp_list = [positions[index + 1], positions[index]]
        for i in range(index + 1, len(positions)):
            if positions[i + 1][1] - positions[i][1] < 10:
                temp_list.append(positions[i + 1])
            else:
                break
        temp_list.sort(key=lambda p: p[0])
        positions[index:(index + len(temp_list))] = temp_list
        index = index + len(temp_list) - 1
    else:
        index += 1
for index in range(len(positions)):
    position = positions[index]
    data_area = data_areas['{}-{}'.format(position[0], position[1])]
    ocr_data = ocr.ocr(data_area)
    ocr_result = ''.join([''.join(result[0]) for result in ocr_data]).replace(' ', '')
    # print('{}：{}'.format(labels[index], ocr_result))
    result.append('{}：{}'.format(labels[index], ocr_result))
    show(data_area, "data_area")

for item in result:
    print(item)
show(res, "res")