OpenCV-Python ID card information recognition
This article uses OpenCV-Python and CnOcr to recognize the information on an ID card. The work is split into three major steps: 1. preprocess the image to detect and locate the ID card region; 2. extract the text regions on the card; 3. recognize the text in those regions. The detailed process is described below.
Prerequisites
You need OpenCV-Python, NumPy and CnOcr, all of which can be installed with pip. This article uses Python 3.6 and OpenCV-Python 3.4.1.15. If you use OpenCV 4.x, some APIs behave differently. Most notably, cv2.findContours returns two values (contours, hierarchy) instead of three. I will not go into installation details here.
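The findContours difference is easy to absorb: OpenCV 3.x returns `(image, contours, hierarchy)` and 4.x returns `(contours, hierarchy)`, so the last two elements are the same in both. A minimal sketch of a version-agnostic wrapper follows; `find_contours_compat` is a hypothetical helper name, and the tuples in the example only mimic the two return shapes, so no OpenCV call is made.

```python
def find_contours_compat(find_contours_result):
    """Return (contours, hierarchy) from a cv2.findContours result.

    OpenCV 3.x returns (image, contours, hierarchy) and OpenCV 4.x returns
    (contours, hierarchy); the last two elements match in both versions.
    """
    contours, hierarchy = find_contours_result[-2:]
    return contours, hierarchy


# Usage with a real image (not executed here):
# contours, hierarchy = find_contours_compat(
#     cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE))

# Stand-in tuples that imitate the two return shapes
v3_style = ("image", ["c1", "c2"], "hier")
v4_style = (["c1", "c2"], "hier")
print(find_contours_compat(v3_style) == find_contours_compat(v4_style))  # True
```

The code below uses the equivalent inline form `cv2.findContours(...)[-2:]`.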
Identification process
First, import the required dependencies (cv2, numpy, cnocr) and define a show() helper for displaying images:
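```python
import cv2
import numpy as np
from cnocr import CnOcr


def show(image, window_name):
    """Display an image in a resizable window and wait for a key press."""
    cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)  # WINDOW_NORMAL == 0
    cv2.imshow(window_name, image)
    cv2.waitKey(0)  # 0 = wait indefinitely for any key
    cv2.destroyAllWindows()


# Load the CnOcr recognition model
ocr = CnOcr(model_name='densenet_lite_136-gru')
```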
Locating the ID card region
The loaded image goes through these steps: grayscale conversion -> filtering -> binarization -> edge detection -> dilation -> contour search -> perspective transformation (correction) -> rotation -> resizing to a fixed size. After them, the ID card region can be cropped out cleanly.
The original image
Use OpenCV's imread method to read local images.
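```python
image = cv2.imread('card.png')
if image is None:  # imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError('card.png could not be read')
show(image, "image")
```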
Grayscale conversion
Convert the three-channel BGR image to a single-channel grayscale image, because the Otsu thresholding used below only supports 8-bit single-channel images.
```python
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
show(gray, "gray")
```
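For reference, the conversion is a weighted sum of the three channels. The NumPy sketch below reproduces the weights documented for `cv2.COLOR_BGR2GRAY` (Y = 0.299 R + 0.587 G + 0.114 B, read from B, G, R channel order). `bgr_to_gray` is a hypothetical helper for illustration only; OpenCV's own fixed-point implementation may round some pixels slightly differently.

```python
import numpy as np


def bgr_to_gray(image_bgr):
    """Luma conversion with the cv2.COLOR_BGR2GRAY weights.
    Channel 0 is blue, 1 is green, 2 is red (OpenCV's BGR order)."""
    b = image_bgr[..., 0].astype(np.float64)
    g = image_bgr[..., 1].astype(np.float64)
    r = image_bgr[..., 2].astype(np.float64)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)


# A 1x4 BGR image: pure blue, pure green, pure red, white
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
print(bgr_to_gray(pixels).tolist())  # [[29, 150, 76, 255]]
```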
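Median filtering
Blur the image to suppress noise. A median filter replaces each pixel with the median of its neighborhood (7×7 here), which removes small specks while keeping edges relatively sharp.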
```python
blur = cv2.medianBlur(gray, 7)
show(blur, "blur")
```
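To see why a median filter removes isolated noise, here is a naive NumPy sketch of the operation. `median_filter` is a hypothetical helper; replicating edge pixels at the border is an assumption made for this sketch, and the loop-based version is far slower than cv2.medianBlur.

```python
import numpy as np


def median_filter(gray, ksize=3):
    """Naive median filter: each output pixel is the median of the
    ksize x ksize window centred on it. Borders replicate edge pixels
    (an assumption for this sketch)."""
    if ksize % 2 == 0 or ksize < 3:
        raise ValueError("ksize must be odd and >= 3")
    pad = ksize // 2
    padded = np.pad(gray, pad, mode="edge")
    out = np.empty_like(gray)
    rows, cols = gray.shape
    for y in range(rows):
        for x in range(cols):
            window = padded[y:y + ksize, x:x + ksize]
            out[y, x] = np.median(window)
    return out


# A flat grey patch with one "salt" pixel and one "pepper" pixel
img = np.full((5, 5), 100, dtype=np.uint8)
img[1, 1] = 255
img[3, 3] = 0
print((median_filter(img, 3) == 100).all())  # True: both outliers are gone
```

A mean (box) filter would instead smear the 255 and 0 values into their neighbors, which is why the median is preferred for this kind of speckle.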
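Binarization
Binarization turns every pixel either black or white. With cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU, OpenCV chooses the threshold automatically using Otsu's method (the 0 passed as the threshold is ignored) and inverts the result, so pixels brighter than the threshold become black and darker ones become white. The resulting image clearly separates the ID card region from the background.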
```python
# With THRESH_OTSU the threshold argument (0) is ignored; [1] selects the image
threshold = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")
```
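Otsu's method picks the threshold t that maximizes the between-class variance of the pixels at or below t and those above t. A NumPy sketch of that search is below. `otsu_threshold` and `binarize_inv` are hypothetical helpers. When several t separate the classes equally well, this sketch returns the smallest one, which is not necessarily the value OpenCV reports.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the t maximising the between-class variance of the classes
    {pixel <= t} and {pixel > t} for a uint8 image (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # weight of the class <= t
    mu = np.cumsum(prob * np.arange(256))   # unnormalised mean of that class
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0    # an empty class contributes nothing
    return int(np.argmax(sigma_b))


def binarize_inv(gray, t):
    """Same rule as THRESH_BINARY_INV: 0 where pixel > t, 255 otherwise."""
    return np.where(gray > t, 0, 255).astype(np.uint8)


# Dark pixels (50) on a bright background (200)
img = np.array([[200, 200, 200, 200],
                [200,  50,  50, 200],
                [200,  50,  50, 200]], dtype=np.uint8)
t = otsu_threshold(img)
print(t)                              # 50
print(binarize_inv(img, t).tolist())  # dark pixels become 255
```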
Edge detection
Detect edges in the binary image with Canny, the most widely used edge detector in OpenCV. The arguments 100 and 150 are its lower and upper hysteresis thresholds.
```python
# 100 = lower threshold (edge linking), 150 = upper threshold (strong edges)
canny = cv2.Canny(threshold, 100, 150)
show(canny, "canny")
```
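Canny's last stage is hysteresis thresholding, and that is where 100 and 150 come in. Pixels above the upper threshold are strong edges; pixels between the two thresholds are kept only if they connect to a strong edge. The sketch below shows only that stage on a made-up array of gradient magnitudes. The Gaussian smoothing, Sobel gradients and non-maximum suppression that precede it are omitted. `hysteresis` is a hypothetical helper, and its handling of values exactly equal to a threshold is not meant to match OpenCV.

```python
from collections import deque

import numpy as np


def hysteresis(magnitude, low, high):
    """Keep pixels with magnitude >= high ("strong") plus any pixels with
    magnitude >= low that are 8-connected to a strong pixel."""
    strong = magnitude >= high
    candidate = magnitude >= low
    keep = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))
    rows, cols = magnitude.shape
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < rows and 0 <= nx < cols
                        and candidate[ny, nx] and not keep[ny, nx]):
                    keep[ny, nx] = True
                    queue.append((ny, nx))
    return keep.astype(np.uint8) * 255


# One strong pixel, a weak run attached to it, and an isolated weak pixel
mag = np.array([[200, 120, 120, 0, 120],
                [  0,   0,   0, 0,   0]], dtype=np.int32)
print(hysteresis(mag, low=100, high=150).tolist())
# [[255, 255, 255, 0, 0], [0, 0, 0, 0, 0]]
```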
Edge dilation
To make the edges found in the previous step more continuous, dilate them: the white edge lines become thicker, which closes small gaps. Here a 3×3 kernel is applied five times.
```python
kernel = np.ones((3, 3), np.uint8)
# Each iteration grows white regions by one pixel on every side
dilate = cv2.dilate(canny, kernel, iterations=5)
show(dilate, "dilate")
```
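Binary dilation with a square kernel sets a pixel whenever any pixel under the kernel is set, so each iteration with a 3×3 kernel thickens lines by one pixel per side. The NumPy sketch below shows this. `dilate` is a hypothetical helper, and treating pixels outside the image as 0 is an assumption of the sketch.

```python
import numpy as np


def dilate(binary, ksize=3, iterations=1):
    """Binary dilation with a ksize x ksize square of ones: a pixel becomes
    255 if any pixel under the kernel is non-zero. Pixels outside the image
    count as 0 (an assumption for this sketch)."""
    out = binary > 0
    pad = ksize // 2
    for _ in range(iterations):
        padded = np.pad(out, pad, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        for dy in range(ksize):
            for dx in range(ksize):
                grown |= padded[dy:dy + out.shape[0], dx:dx + out.shape[1]]
        out = grown
    return out.astype(np.uint8) * 255


edge = np.zeros((7, 7), dtype=np.uint8)
edge[3, 3] = 255
print((dilate(edge, 3, 1) > 0).sum())  # 9  (a 3x3 block)
print((dilate(edge, 3, 2) > 0).sum())  # 25 (a 5x5 block)
```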
Contour detection
Run findContours on the dilated edge image. cv2.RETR_EXTERNAL keeps only the outermost contours, and cv2.CHAIN_APPROX_SIMPLE stores only the end points of straight segments. The drawn result shows that the background still produces many noisy contours, but the ID card itself is also enclosed by a contour.
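```python
# OpenCV 3.x returns (image, contours, hierarchy); 4.x returns (contours, hierarchy).
# Taking the last two values works with both.
contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
image_copy = image.copy()
# Draw all contours (index -1) in blue (BGR order) with a thickness of 20 px
res = cv2.drawContours(image_copy, contours, -1, (255, 0, 0), 20)
show(res, "res")
```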
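CHAIN_APPROX_SIMPLE keeps only the points where a boundary changes direction, so an axis-aligned rectangle is stored as its four corners. The pure-Python sketch below illustrates that idea on an ordered, closed list of boundary pixels. `compress_chain` is a hypothetical helper, not OpenCV's implementation.

```python
def compress_chain(points):
    """Drop every boundary point that lies in the middle of a straight run,
    keeping only points where the step direction changes.
    `points` is a closed, ordered list of (x, y) pixel coordinates."""
    n = len(points)
    kept = []
    for i in range(n):
        prev_pt = points[i - 1]          # index -1 wraps to the last point
        cur = points[i]
        nxt = points[(i + 1) % n]
        d_in = (cur[0] - prev_pt[0], cur[1] - prev_pt[1])
        d_out = (nxt[0] - cur[0], nxt[1] - cur[1])
        if d_in != d_out:
            kept.append(cur)
    return kept


# Boundary of a 4x4-pixel square, traced clockwise from the top-left
square = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2),
          (3, 3), (2, 3), (1, 3), (0, 3), (0, 2), (0, 1)]
print(compress_chain(square))  # [(0, 0), (3, 0), (3, 3), (0, 3)]
```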
Contour sorting
Sorting the contours by area and keeping the largest one isolates the contour of the ID card.
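```python
# Keep only the largest contour, which should be the ID card.
# From here on `contours` holds a single contour rather than a list.
contours = max(contours, key=cv2.contourArea)
image_copy = image.copy()
# drawContours expects a list of contours, so wrap the single contour
res = cv2.drawContours(image_copy, [contours], -1, (255, 0, 0), 20)
show(res, "contours")
```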
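The OpenCV documentation says cv2.contourArea is computed with Green's formula. For a polygon that reduces to the shoelace formula. The pure-Python sketch below applies it to two made-up contours and picks the largest, mirroring the selection step above. `polygon_area` is a hypothetical helper.

```python
def polygon_area(points):
    """Area of a simple polygon given as ordered (x, y) vertices,
    computed with the shoelace formula (absolute value, like the
    default cv2.contourArea)."""
    total = 0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


card = [(0, 0), (100, 0), (100, 60), (0, 60)]   # large rectangle
speck = [(5, 5), (8, 5), (8, 7)]                 # small noise triangle
contours = [speck, card]
largest = max(contours, key=polygon_area)
print(polygon_area(card), largest is card)  # 6000.0 True
```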
Perspective transformation
Approximate the card contour with a polygon to obtain its four vertices. Sort them into a fixed order (top-left, bottom-left, bottom-right, top-right), then use warpPerspective to map the selected region onto a rectangle, which corrects the perspective distortion.
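```python
# Approximate the card contour with a polygon; epsilon is 2% of its perimeter
epsilon = 0.02 * cv2.arcLength(contours, True)
approx = cv2.approxPolyDP(contours, epsilon, True)
if len(approx) != 4:
    raise ValueError('expected 4 corners, got {}'.format(len(approx)))

# Order the corners: sort by x to split the left and right pairs,
# then sort each pair by y -> top-left, bottom-left, bottom-right, top-right
n = sorted((int(x), int(y)) for x, y in approx.reshape(-1, 2))
left = sorted(n[:2], key=lambda p: p[1])                  # top-left, bottom-left
right = sorted(n[2:4], key=lambda p: p[1], reverse=True)  # bottom-right, top-right
sort_point = left + right
p1 = np.array(sort_point, dtype=np.float32)

h = sort_point[1][1] - sort_point[0][1]  # height of the left edge
w = sort_point[2][0] - sort_point[1][0]  # width of the bottom edge
pts2 = np.array([[0, 0], [0, h], [w, h], [w, 0]], dtype=np.float32)

# Build the perspective transformation matrix
M = cv2.getPerspectiveTransform(p1, pts2)
# Apply the perspective transformation; dsize is (width, height)
dst = cv2.warpPerspective(image, M, (w, h))
show(dst, "dst")
```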
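The two ideas in this step can be sketched without OpenCV: the corner-ordering rule, and the 8-unknown linear system whose solution is the 3×3 perspective matrix (with its bottom-right entry fixed to 1). `order_corners`, `perspective_matrix` and `warp_point` are hypothetical helpers, and the corner coordinates are made up. This is the standard textbook formulation, not necessarily how OpenCV solves it internally. Note that computing w and h as coordinate differences assumes the card is roughly axis-aligned. For a strongly rotated card, the Euclidean edge lengths would give a more faithful aspect ratio.

```python
import numpy as np


def order_corners(points):
    """Order four (x, y) corners as top-left, bottom-left, bottom-right,
    top-right using the tutorial's rule: sort by x, then sort the left pair
    by y ascending and the right pair by y descending."""
    pts = sorted(points)
    left = sorted(pts[:2], key=lambda p: p[1])
    right = sorted(pts[2:], key=lambda p: p[1], reverse=True)
    return left + right


def perspective_matrix(src, dst):
    """Solve the 8x8 linear system for the homography H (with H[2, 2] = 1)
    that maps the four src points onto the four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3 x + h4 y + h5) / (h6 x + h7 y + 1), rearranged the same way
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(a, dtype=np.float64), np.array(b, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)


def warp_point(H, point):
    """Map one (x, y) point through H using homogeneous coordinates."""
    u, v, s = H @ np.array([point[0], point[1], 1.0])
    return u / s, v / s


# Four corners of a skewed card, in arbitrary order (made-up coordinates)
corners = [(230, 40), (20, 30), (240, 170), (10, 160)]
ordered = order_corners(corners)
h = ordered[1][1] - ordered[0][1]  # left edge height, as in the tutorial
w = ordered[2][0] - ordered[1][0]  # bottom edge width, as in the tutorial
target = [(0, 0), (0, h), (w, h), (w, 0)]
H = perspective_matrix(ordered, target)
print(ordered)  # [(20, 30), (10, 160), (240, 170), (230, 40)]
print(all(np.allclose(warp_point(H, p), q) for p, q in zip(ordered, target)))  # True
```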
Fixed image size
To get the card into landscape orientation, compare its width and height: if the width is smaller than the height, rotate the image by 90°. Then resize it to a fixed size (1084×669) so that the pixel thresholds used in later steps apply.
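```python
# Rotate portrait-oriented results by 90° (np.rot90 turns counterclockwise)
if w < h:
    dst = np.rot90(dst)
# Scale to a fixed size; cv2.resize takes (width, height)
resize = cv2.resize(dst, (1084, 669), interpolation=cv2.INTER_AREA)
show(resize, "resize")
```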
Detecting the text positions
Converting the card to grayscale, binarizing it, applying a median filter and then a morphological opening brings out the rough shape of the text regions.
```python
gray = cv2.cvtColor(resize, cv2.COLOR_BGR2GRAY)
show(gray, "gray")
# Otsu binarization, inverted so that dark text becomes white
threshold = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")
# Remove salt-and-pepper noise
blur = cv2.medianBlur(threshold, 5)
show(blur, "blur")
# Opening (erosion followed by dilation) removes small white specks
kernel = np.ones((3, 3), np.uint8)
morph_open = cv2.morphologyEx(blur, cv2.MORPH_OPEN, kernel)
show(morph_open, "morph_open")
```
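An opening is an erosion followed by a dilation with the same kernel. Any white shape too small to contain the kernel disappears in the erosion and does not come back, while larger shapes are restored. The NumPy sketch below demonstrates this with a 3×3 square kernel. `erode`, `dilate` and `opening` are hypothetical helpers. Treating out-of-image pixels as set during erosion, so that borders do not erode, is an assumption of the sketch.

```python
import numpy as np


def erode(mask, ksize=3):
    """A pixel stays set only if every pixel under the square kernel is set.
    Out-of-image pixels count as set, so borders do not erode (assumed)."""
    pad = ksize // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=True)
    out = np.ones_like(mask)
    for dy in range(ksize):
        for dx in range(ksize):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out


def dilate(mask, ksize=3):
    """A pixel becomes set if any pixel under the square kernel is set."""
    pad = ksize // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(ksize):
        for dx in range(ksize):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out


def opening(binary, ksize=3):
    """Morphological opening = erosion followed by dilation (uint8 in/out)."""
    mask = binary > 0
    return dilate(erode(mask, ksize), ksize).astype(np.uint8) * 255


img = np.zeros((8, 8), dtype=np.uint8)
img[1, 6] = 255        # isolated speck
img[3:6, 1:4] = 255    # 3x3 "stroke"
opened = opening(img)
print(opened[1, 6], (opened > 0).sum())  # 0 9
```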
Heavy dilation
Dilate with a larger 7×7 kernel for six iterations to enlarge the white areas. This merges the characters of each field into a solid block, so the text regions stand out clearly.
```python
kernel = np.ones((7, 7), np.uint8)
# A 7x7 kernel grows white regions by 3 px per side on each iteration
dilate = cv2.dilate(morph_open, kernel, iterations=6)
show(dilate, "dilate")
```
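To see why a large kernel turns a line of characters into one block, the sketch below counts 8-connected white regions before and after dilation on a made-up line of three "characters" separated by 6 px gaps. A 3×3 kernel leaves them separate, while a 7×7 kernel (3 px of growth per side) joins them. `count_components` and `dilate_mask` are hypothetical helpers written only for this illustration.

```python
from collections import deque

import numpy as np


def count_components(mask):
    """Number of 8-connected groups of set pixels (BFS labelling)."""
    seen = np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    count = 0
    for sy in range(rows):
        for sx in range(cols):
            if mask[sy, sx] and not seen[sy, sx]:
                count += 1
                seen[sy, sx] = True
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and mask[ny, nx] and not seen[ny, nx]):
                                seen[ny, nx] = True
                                queue.append((ny, nx))
    return count


def dilate_mask(mask, ksize, iterations=1):
    """Binary dilation with a ksize x ksize square; outside pixels count as unset."""
    pad = ksize // 2
    out = mask.copy()
    for _ in range(iterations):
        padded = np.pad(out, pad, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        for dy in range(ksize):
            for dx in range(ksize):
                grown |= padded[dy:dy + out.shape[0], dx:dx + out.shape[1]]
        out = grown
    return out


# Three 4x4 "characters" on one text line, separated by 6-pixel gaps
line = np.zeros((20, 40), dtype=bool)
for x0 in (5, 15, 25):
    line[8:12, x0:x0 + 4] = True
print(count_components(line))                    # 3
print(count_components(dilate_mask(line, 3)))    # 3: gaps still open
print(count_components(dilate_mask(line, 7)))    # 1: merged into one block
```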
Finding text regions with contours
Use a contour search to find the white, block-shaped regions.
```python
# [-2:] keeps (contours, hierarchy) on both OpenCV 3.x and 4.x
contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
resize_copy = resize.copy()
res = cv2.drawContours(resize_copy, contours, -1, (255, 0, 0), 2)
show(res, "res")
```
Filtering the text regions
The contours found in the previous step still include some noise. Based on inspection of the image, each contour is approximated by a polygon, and its bounding rectangle is kept only if it is taller than 50 px and starts left of x = 670. The code also defines the label for each text field and stores the position and binarized crop of every kept region. At this point all the required text regions have been extracted.
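```python
# Field labels in reading order: name, gender, ethnicity, birth year,
# birth month, birth day, address, citizen ID number
labels = ['姓名', '性别', '民族', '出生年', '出生月', '出生日', '住址', '公民身份证号码']
positions = []   # (x, y) top-left corner of each kept text box
data_areas = {}  # "x-y" -> binarized crop of that text box
resize_copy = resize.copy()
for contour in contours:
    epsilon = 0.002 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    x, y, w, h = cv2.boundingRect(approx)
    # Keep boxes taller than 50 px that start in the left part of the card,
    # skipping small noise and (presumably) the portrait photo on the right
    if h > 50 and x < 670:
        res = cv2.rectangle(resize_copy, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # Crop the box from the grayscale card and binarize it for OCR
        area = gray[y:(y + h), x:(x + w)]
        blur = cv2.medianBlur(area, 3)
        data_area = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
        positions.append((x, y))
        data_areas['{}-{}'.format(x, y)] = data_area
show(res, "res")
```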
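The filter above boils down to "bounding box, then two comparisons". The pure-Python sketch below makes that explicit on made-up contours. `bounding_rect` and `select_text_boxes` are hypothetical helpers. The inclusive width/height convention (max - min + 1) is an assumption about how cv2.boundingRect counts pixels, and the `min_height`/`max_x` defaults are simply the tutorial's 50 and 670.

```python
def bounding_rect(points):
    """(x, y, w, h) of the upright box around integer (x, y) points.
    Width and height count pixels inclusively (assumed convention)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x, y = min(xs), min(ys)
    return x, y, max(xs) - x + 1, max(ys) - y + 1


def select_text_boxes(contours, min_height=50, max_x=670):
    """Top-left corners of boxes taller than min_height that start left of max_x."""
    selected = []
    for contour in contours:
        x, y, w, h = bounding_rect(contour)
        if h > min_height and x < max_x:
            selected.append((x, y))
    return selected


# Made-up contours on a 1084x669 card image
contours = [
    [(40, 60), (300, 60), (300, 130), (40, 130)],        # text field: kept
    [(40, 200), (90, 200), (90, 220), (40, 220)],        # short noise: dropped
    [(700, 80), (1000, 80), (1000, 500), (700, 500)],    # right-hand block: dropped
]
print(select_text_boxes(contours))  # [(40, 60)]
```

Keeping the thresholds as parameters makes it easier to retune them if the card is resized to something other than 1084×669.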
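Sorting the text regions
The regions come out of findContours roughly from bottom to top, and regions on the same line are not ordered along the x-axis. The following logic sorts them by y, then orders boxes whose y values differ by less than 10 px from left to right, so that they line up with labels.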
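```python
# Sort top to bottom by the y coordinate
positions.sort(key=lambda p: p[1])
result = []  # collects the "label:text" lines printed at the end
index = 0
while index < len(positions) - 1:
    # Boxes whose y values differ by less than 10 px are on the same line
    if positions[index + 1][1] - positions[index][1] < 10:
        temp_list = [positions[index], positions[index + 1]]
        # Extend the group while the following boxes stay on that line;
        # stop at len(positions) - 1 so positions[i + 1] stays in range
        for i in range(index + 1, len(positions) - 1):
            if positions[i + 1][1] - positions[i][1] < 10:
                temp_list.append(positions[i + 1])
            else:
                break
        # Order the boxes on this line from left to right
        temp_list.sort(key=lambda p: p[0])
        positions[index:(index + len(temp_list))] = temp_list
        # Continue with the first box after this line
        index += len(temp_list)
    else:
        index += 1
```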
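The same reading-order sort can be written more compactly by grouping boxes into rows first. The sketch below is an alternative formulation, not the tutorial's exact logic. It compares each box with the first box of the current row, whereas the loop above compares consecutive boxes. `reading_order` is a hypothetical helper and the positions are made up.

```python
def reading_order(positions, line_tolerance=10):
    """Sort (x, y) box corners top-to-bottom, then left-to-right within a
    row. A box joins the current row if its y is within line_tolerance of
    the row's first box."""
    rows = []
    for pos in sorted(positions, key=lambda p: p[1]):
        if rows and pos[1] - rows[-1][0][1] < line_tolerance:
            rows[-1].append(pos)
        else:
            rows.append([pos])
    return [pos for row in rows for pos in sorted(row)]


# Made-up corners, roughly in the bottom-to-top order findContours produces
positions = [(40, 560), (40, 450), (300, 330), (40, 333),
             (200, 331), (40, 210), (250, 212), (40, 90)]
print(reading_order(positions))
# [(40, 90), (40, 210), (250, 212), (40, 333), (200, 331), (300, 330), (40, 450), (40, 560)]
```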
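Recognizing the text
Recognize the text regions one by one with CnOcr and print the results. The CnOcr model names and the structure returned by ocr() have changed between CnOcr releases, so check the documentation for the version you install.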
```python
# Warn if the number of text boxes does not match the number of labels
if len(positions) != len(labels):
    print('Warning: found {} text regions, expected {}'.format(len(positions), len(labels)))
for label, position in zip(labels, positions):
    data_area = data_areas['{}-{}'.format(position[0], position[1])]
    ocr_data = ocr.ocr(data_area)
    # Join the characters of every recognized line and drop spaces
    ocr_result = ''.join([''.join(line[0]) for line in ocr_data]).replace(' ', '')
    result.append('{}:{}'.format(label, ocr_result))
    show(data_area, "data_area")
for item in result:
    print(item)
show(res, "res")
```
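If you prefer a mismatch between boxes and labels to fail loudly rather than print a warning, the formatting step can be isolated in a small function. In this sketch, `format_fields` is a hypothetical helper, and it assumes each OCR result is a list of (characters, score) lines, the shape the tutorial's `''.join(line[0])` expects. The sample OCR output is made up.

```python
def format_fields(labels, ocr_outputs):
    """Pair each label with the text recognized in the matching box.

    Each element of `ocr_outputs` is assumed to be a list of
    (characters, score) lines; spaces are removed as in the tutorial."""
    if len(labels) != len(ocr_outputs):
        raise ValueError("expected {} text boxes, found {}".format(len(labels), len(ocr_outputs)))
    lines = []
    for label, ocr_data in zip(labels, ocr_outputs):
        text = "".join("".join(line[0]) for line in ocr_data).replace(" ", "")
        lines.append("{}:{}".format(label, text))
    return lines


# Made-up OCR output for two boxes
sample = [[(["张", "三"], 0.98)], [(["男"], 0.99)]]
for item in format_fields(["姓名", "性别"], sample):
    print(item)
```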
Epilogue
With the steps above, the ID card information is extracted successfully. Some numeric parameters (kernel sizes, iteration counts, and pixel thresholds such as h > 50 and x < 670) may need adjusting for other images.
The complete code is listed below.
Complete code
```python
import cv2
import numpy as np
from cnocr import CnOcr


def show(image, window_name):
    cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)
    cv2.imshow(window_name, image)
    # 0: wait until any key is pressed, then close the window
    cv2.waitKey(0)
    cv2.destroyAllWindows()


ocr = CnOcr(model_name='densenet_lite_136-gru')

# --- Locate the ID card ---
image = cv2.imread('card.png')
if image is None:
    raise FileNotFoundError('card.png could not be read')
show(image, "image")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
show(gray, "gray")
blur = cv2.medianBlur(gray, 7)
show(blur, "blur")
threshold = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")
canny = cv2.Canny(threshold, 100, 150)
show(canny, "canny")
kernel = np.ones((3, 3), np.uint8)
dilate = cv2.dilate(canny, kernel, iterations=5)
show(dilate, "dilate")
# [-2:] works with OpenCV 3.x (3 return values) and 4.x (2 return values)
contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
image_copy = image.copy()
res = cv2.drawContours(image_copy, contours, -1, (255, 0, 0), 20)
show(res, "res")
# Keep only the largest contour (the card); `contours` is now a single contour
contours = max(contours, key=cv2.contourArea)
image_copy = image.copy()
res = cv2.drawContours(image_copy, [contours], -1, (255, 0, 0), 20)
show(res, "contours")

# --- Perspective correction ---
epsilon = 0.02 * cv2.arcLength(contours, True)
approx = cv2.approxPolyDP(contours, epsilon, True)
if len(approx) != 4:
    raise ValueError('expected 4 corners, got {}'.format(len(approx)))
n = sorted((int(x), int(y)) for x, y in approx.reshape(-1, 2))
left = sorted(n[:2], key=lambda p: p[1])
right = sorted(n[2:4], key=lambda p: p[1], reverse=True)
sort_point = left + right  # top-left, bottom-left, bottom-right, top-right
p1 = np.array(sort_point, dtype=np.float32)
h = sort_point[1][1] - sort_point[0][1]
w = sort_point[2][0] - sort_point[1][0]
pts2 = np.array([[0, 0], [0, h], [w, h], [w, 0]], dtype=np.float32)
M = cv2.getPerspectiveTransform(p1, pts2)
dst = cv2.warpPerspective(image, M, (w, h))
show(dst, "dst")
if w < h:
    dst = np.rot90(dst)
resize = cv2.resize(dst, (1084, 669), interpolation=cv2.INTER_AREA)
show(resize, "resize")

# --- Detect the text regions ---
gray = cv2.cvtColor(resize, cv2.COLOR_BGR2GRAY)
show(gray, "gray")
threshold = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")
blur = cv2.medianBlur(threshold, 5)
show(blur, "blur")
kernel = np.ones((3, 3), np.uint8)
morph_open = cv2.morphologyEx(blur, cv2.MORPH_OPEN, kernel)
show(morph_open, "morph_open")
kernel = np.ones((7, 7), np.uint8)
dilate = cv2.dilate(morph_open, kernel, iterations=6)
show(dilate, "dilate")
contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
resize_copy = resize.copy()
res = cv2.drawContours(resize_copy, contours, -1, (255, 0, 0), 2)
show(res, "res")

# Name, gender, ethnicity, birth year/month/day, address, citizen ID number
labels = ['姓名', '性别', '民族', '出生年', '出生月', '出生日', '住址', '公民身份证号码']
positions = []
data_areas = {}
resize_copy = resize.copy()
for contour in contours:
    epsilon = 0.002 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    x, y, w, h = cv2.boundingRect(approx)
    if h > 50 and x < 670:
        res = cv2.rectangle(resize_copy, (x, y), (x + w, y + h), (0, 255, 0), 2)
        area = gray[y:(y + h), x:(x + w)]
        blur = cv2.medianBlur(area, 3)
        data_area = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
        positions.append((x, y))
        data_areas['{}-{}'.format(x, y)] = data_area
show(res, "res")

# --- Sort the regions into reading order ---
positions.sort(key=lambda p: p[1])
index = 0
while index < len(positions) - 1:
    if positions[index + 1][1] - positions[index][1] < 10:
        temp_list = [positions[index], positions[index + 1]]
        for i in range(index + 1, len(positions) - 1):
            if positions[i + 1][1] - positions[i][1] < 10:
                temp_list.append(positions[i + 1])
            else:
                break
        temp_list.sort(key=lambda p: p[0])
        positions[index:(index + len(temp_list))] = temp_list
        index += len(temp_list)
    else:
        index += 1

# --- Recognize the text ---
if len(positions) != len(labels):
    print('Warning: found {} text regions, expected {}'.format(len(positions), len(labels)))
result = []
for label, position in zip(labels, positions):
    data_area = data_areas['{}-{}'.format(position[0], position[1])]
    ocr_data = ocr.ocr(data_area)
    ocr_result = ''.join([''.join(line[0]) for line in ocr_data]).replace(' ', '')
    result.append('{}:{}'.format(label, ocr_result))
    show(data_area, "data_area")
for item in result:
    print(item)
show(res, "res")
```