Recognizing image CAPTCHAs with Python on macOS

Foreword

  • I have recently been studying CAPTCHA recognition, and this post records the preparation and usage process. I had some prior knowledge of CAPTCHA cracking, but I had only ever used it in simple ways and never wrote down the detailed steps, so every time I came back to it I had to search the Internet all over again, which is too much trouble. Here I record the key points of what I studied.

This article starts with image CAPTCHAs; I will extend the content from time to time.

Searching the Internet turns up many approaches to CAPTCHA recognition; the two modules I saw most often are pytesseract and tesserocr. Installing tesserocr failed with all kinds of errors on my machine, so in the end I settled on pytesseract.
The rest of this post records the installation and usage. The system environment here is macOS 10.14.

Installing tesseract

brew install tesseract

pytesseract depends on tesseract, so the tesseract software has to be installed first. The next step is to install the related Python packages.

Required Python packages

pip3 install pytesseract
pip3 install pillow 

pytesseract is what performs OCR text recognition on the picture. Because CAPTCHAs vary in difficulty, the picture usually needs some preprocessing first, which is what the image processing module pillow is for.
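Before running the demo, it can help to confirm that both pieces are actually in place. The following is a minimal sketch using only the standard library. The helper name check_ocr_setup is made up for this illustration, and it assumes the Tesseract executable is called tesseract and is expected on PATH.

```python
import importlib.util
import shutil


def check_ocr_setup(binary="tesseract", modules=("pytesseract", "PIL")):
    """Return a dict mapping each prerequisite name to True/False (found or not)."""
    # shutil.which returns None when the executable is not on PATH
    status = {binary: shutil.which(binary) is not None}
    for name in modules:
        # find_spec returns None when the package cannot be imported
        status[name] = importlib.util.find_spec(name) is not None
    return status


for name, found in check_ocr_setup().items():
    print(name, "found" if found else "MISSING")
```

If tesseract is reported missing, the brew step was skipped or failed; if pytesseract or PIL is missing, the pip3 step needs to be repeated for the interpreter you are running.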

A simple demo

import pytesseract
from PIL import Image
import os


def binarizing(img, threshold):
    """Take a grayscale Image object and binarize it in place."""
    pixdata = img.load()
    w, h = img.size
    # Iterate over all pixels; those below the threshold become black
    for y in range(h):
        for x in range(w):
            if pixdata[x, y] < threshold:
                pixdata[x, y] = 0
            else:
                pixdata[x, y] = 255
    return img


_temp = os.path.dirname(__file__)
file_path = os.path.join(_temp, 'code2.jpg')
print("file_path", file_path)
image = Image.open(file_path)
image = image.convert('L')
threshold = 157
# Next comes binarization:
# iterate over all pixels; those below the threshold become black
image = binarizing(image, threshold)
result = pytesseract.image_to_string(image)
print(result)

The picture used in the example

Image knowledge you need:

For a color image, whether the file format is PNG, BMP, or JPG, the object returned by the open() function of PIL's Image module is usually in mode "RGB". For a grayscale image, whatever the file format, the mode after opening is "L"; converting to this mode is the graying operation we are talking about. There are other modes as well, but since CAPTCHA processing always converts the picture to grayscale, I will not go into them here.

Mode "L"

Mode "L" is a grayscale image in which each pixel is represented by 8 bits: 0 is black, 255 is white, and the values in between are different shades of gray. In PIL, converting from mode "RGB" to mode "L" uses the following formula:

L = R * 299/1000 + G * 587/1000 + B * 114/1000
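To get a feel for the weights, the formula can be evaluated by hand for a few pixels. This is only a sketch of the arithmetic: the function name rgb_to_l and the use of round() are my own choices, and Pillow's internal rounding may differ by one gray level for some colors.

```python
def rgb_to_l(r, g, b):
    """Weighted gray value from the formula above, rounded to an integer."""
    return round((r * 299 + g * 587 + b * 114) / 1000)


print(rgb_to_l(255, 255, 255))  # white stays 255
print(rgb_to_l(255, 0, 0))      # pure red   -> 76
print(rgb_to_l(0, 255, 0))      # pure green -> 150
print(rgb_to_l(0, 0, 255))      # pure blue  -> 29
```

Green carries the largest weight, so a pure green pixel ends up much lighter in the gray image than a pure blue one. This matters when choosing a binarization threshold for colored CAPTCHA characters.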

The picture after graying

After graying, we still have to binarize the picture.

Binarization operation

As the name suggests, binarization means that every pixel in the image can take only one of two values: black (gray level 0) or white (gray level 255). The benefit of binarization is that it separates the useful information in the picture from the useless information. For example, after a CAPTCHA picture is binarized, the pixels of the code itself are black while the background and interference points are white, which makes later pixel-level processing of the code very convenient. For simple image CAPTCHAs this is basically enough, but if there are interference lines, they have to be removed in a further step.
The corresponding code:

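def binarizing(img, threshold):
    """Take a grayscale Image object and binarize it in place."""
    pixdata = img.load()
    w, h = img.size
    # Iterate over all pixels and compare each one with the threshold
    for y in range(h):
        for x in range(w):
            if pixdata[x, y] < threshold:
                pixdata[x, y] = 0  # below the threshold: set to 0 (black)
            else:
                pixdata[x, y] = 255  # at or above the threshold: set to 255 (white)
    return img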

The picture at this point

You can see that the picture has become much sharper, and recognition works better on it now.
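The pixel-by-pixel loop is easy to follow but slow on larger pictures. As a hedged alternative, the same thresholding can be expressed as a 256-entry lookup table and handed to Pillow's Image.point() method. The helper names below are made up for this sketch, and gray_img is assumed to be an image already converted to mode "L".

```python
def build_threshold_table(threshold):
    """256-entry lookup table: values below threshold -> 0, all others -> 255."""
    return [0 if value < threshold else 255 for value in range(256)]


def binarize_with_table(gray_img, threshold=157):
    """Return a new binarized image; gray_img is assumed to be a mode "L" PIL image."""
    # Image.point maps every pixel value through the lookup table
    return gray_img.point(build_threshold_table(threshold))
```

With the demo above, this would be called as binarize_with_table(image, threshold) after image.convert('L'). Unlike binarizing(), it returns a new image rather than modifying the original in place.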

Removing interference lines

The common approaches are the 4-neighborhood and 8-neighborhood algorithms. To picture an X-neighborhood algorithm, think of the nine-key grid of a phone keypad: key 5 is the pixel being judged, the 4-neighborhood checks the pixels above, below, left, and right of it, and the 8-neighborhood checks all eight surrounding pixels. If the number of those 4 or 8 points whose value is 255 is greater than a certain threshold, the point is judged to be noise. The threshold can be adjusted to suit the actual situation.
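The neighborhood rule described above can be sketched in plain Python on a small grid of 0/255 values. The names, the default threshold of 6, and the choice not to count positions outside the picture as white are all assumptions made for this illustration, not part of any library.

```python
# Offsets (row, column) relative to the pixel being judged ("key 5")
NEIGHBORS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
NEIGHBORS_8 = NEIGHBORS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_white_neighbors(grid, row, col, offsets):
    """Count neighbors equal to 255; positions outside the grid are ignored."""
    count = 0
    for dr, dc in offsets:
        r, c = row + dr, col + dc
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 255:
            count += 1
    return count


def remove_isolated_points(grid, offsets=NEIGHBORS_8, threshold=6):
    """Return a copy of grid in which black pixels judged to be noise are white."""
    cleaned = [row[:] for row in grid]
    for r in range(len(grid)):
        for c in range(len(grid[0])):
            # Counts are taken from the original grid so earlier changes
            # do not influence later decisions
            if grid[r][c] == 0 and count_white_neighbors(grid, r, c, offsets) > threshold:
                cleaned[r][c] = 255
    return cleaned


dot = [[255, 255, 255],
       [255,   0, 255],
       [255, 255, 255]]
print(remove_isolated_points(dot)[1][1])  # the lone black point becomes 255
```

A single black point surrounded by white is removed, while a solid 2x2 block survives because each of its pixels has only five white neighbors. Thin strokes can lose their end points with this rule, which is one reason the threshold has to be tuned per CAPTCHA style.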

Processing with cv2

Besides pillow, the cv2 module can also be used for the processing.
Installation

pip install opencv-python

Sample code

# -*- coding: utf-8 -*-
# @Time : 2020-01-08 18:01
# @Author : 陈祥安
# @File : cv2_demo.py
# @WeChat official account: Python学习开发

import cv2
import numpy as np
import os

_temp = os.path.dirname(__file__)
file_path = os.path.join(_temp, 'code2.jpg')


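def remove_noise(img, k=4):
    # 8-neighborhood filter
    img2 = img.copy()

    # img: image data to process, k: filter threshold
    rows, cols = img2.shape  # NumPy shape is (height, width)

    def get_neighbors(img3, r, c):
        # Count non-black pixels in the 3x3 window around (r, c), itself included
        count = 0
        for i in [r - 1, r, r + 1]:
            for j in [c - 1, c, c + 1]:
                if img3[i, j] > 10:  # treated as white
                    count += 1
        return count

    # Two nested loops visit every pixel
    for x in range(rows):
        for y in range(cols):
            if x == 0 or y == 0 or x == rows - 1 or y == cols - 1:
                img2[x, y] = 255  # border pixels are set to white
            else:
                n = get_neighbors(img2, x, y)  # number of white neighbors
                if n > k:
                    img2[x, y] = 255
    return img2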


img = cv2.imread(file_path)

# Convert the picture to grayscale: a weighted reduction from three channels to one
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
t, gray2 = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
cv2.imshow('threshold', gray2)
result = remove_noise(gray2)
cv2.imshow('8neighbors', result)

cv2.waitKey(0)

#cv2.destroyAllWindows()

Origin www.cnblogs.com/c-x-a/p/12168010.html