Python teaching | Python verification code identification

General introduction

When a Python crawler scrapes certain websites, it may run into verification codes (CAPTCHAs) that it has to recognize. Most verification codes in use today fall into four categories:

1. Arithmetic verification code

2. Slider verification code

3. Image recognition verification code

4. Voice verification code

This post focuses on image recognition verification codes, and only on simple ones. Getting a higher and more reliable recognition rate takes considerable effort spent training your own font library.

Identifying the verification code usually involves these steps:

1. Grayscale processing

2. Binarization

3. Remove borders (if any)

4. Noise reduction

5. Cutting characters or inclination correction

6. Training font library

7. Identify

Of these 7 steps, the first three are basic. Whether you need step 4 or 5 depends on the actual images. Cutting the verification code into characters does not necessarily raise the recognition rate by much, and sometimes it lowers it.

This post does not cover training a font library; please look that up yourself. It does not explain basic Python syntax either.

The main Python libraries used are Pillow (a Python image processing library), OpenCV (an advanced image processing library), and pytesseract (a recognition library that wraps the Tesseract OCR engine).

Grayscale processing & binarization

Grayscale processing converts the color verification code image into a gray image.

Binarization turns the gray image into a pure black-and-white image, which helps the later processing and recognition steps.
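
To make the two steps concrete, here is a minimal pure-Python sketch that works on nested lists instead of OpenCV arrays. The function names `to_gray` and `binarize` are made up for this illustration. The luma weights are the standard BT.601 ones, and the threshold of 140 is borrowed from the default in the full source at the end of this post.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 gray values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, threshold=140):
    """Map every gray value to 0 (black) or 255 (white)."""
    return [
        [0 if value < threshold else 255 for value in row]
        for row in gray_image
    ]


rgb = [
    [(255, 255, 255), (30, 30, 200)],
    [(200, 220, 210), (10, 10, 10)],
]
gray = to_gray(rgb)
binary = binarize(gray)
print(binary)  # [[255, 0], [255, 0]]
```

A fixed threshold like this is the simplest choice. The adaptive threshold used in the OpenCV code picks a local threshold per region instead, which tends to cope better with uneven backgrounds.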

OpenCV has ready-made methods for both grayscale conversion and binarization. The function below converts the image to grayscale, applies an adaptive threshold, and writes the result to the out_img folder.

Code:

import cv2

# Adaptive-threshold binarization
def _get_dynamic_binary_image(filedir, img_name):
    filename = './out_img/' + img_name.split('.')[0] + '-binary.jpg'
    img_name = filedir + '/' + img_name
    print('.....' + img_name)
    im = cv2.imread(img_name)
    im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)  # convert to grayscale
    # binarize with a Gaussian-weighted local threshold (block size 21, constant 1)
    th1 = cv2.adaptiveThreshold(im, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 1)
    cv2.imwrite(filename, th1)
    return th1

Remove borders

If the verification code has a border, we need to remove it: traverse the pixels, find all the points on the four edges, and set them to white. The border here is two pixels wide.

Note: OpenCV stores an image as a matrix indexed as img[row, column], so the height comes first and the width second, the reverse of the usual (x, y) order. In the code below, x therefore runs over the rows (height) and y over the columns (width).

Code:

# Remove the border
def clear_border(img, img_name):
    filename = './out_img/' + img_name.split('.')[0] + '-clearBorder.jpg'
    h, w = img.shape[:2]
    # img[x, y]: x is the row (height), y is the column (width)
    for y in range(0, w):
        for x in range(0, h):
            # >= so that the far edges are also cleared two pixels deep
            if y < 2 or y >= w - 2:
                img[x, y] = 255
            if x < 2 or x >= h - 2:
                img[x, y] = 255

    cv2.imwrite(filename, img)
    return img
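
The height-first indexing is easy to get wrong, so here is a small pure-Python sketch of the same idea on a nested list, where `grid[row][col]` plays the role of `img[row, col]`. The `border` and `white` parameters are assumptions added for illustration, and the function writes no files.

```python
def clear_border(grid, border=2, white=255):
    """Set a frame of `border` pixels around the image to white, in place."""
    height = len(grid)     # number of rows
    width = len(grid[0])   # number of columns
    for row in range(height):
        for col in range(width):
            on_frame = (
                row < border or row >= height - border
                or col < border or col >= width - border
            )
            if on_frame:
                grid[row][col] = white
    return grid


grid = [[0] * 6 for _ in range(5)]  # 5 rows (height) x 6 columns (width), all black
clear_border(grid)
for row in grid:
    print(row)
```

With a 5 x 6 image and a 2-pixel frame, only the two middle pixels of the middle row stay black.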


Noise reduction

Noise reduction is an important step in verification code processing. I use point noise reduction and line noise reduction here.

The idea of line noise reduction is to look at the four points adjacent to the current point (the green points marked in the picture) and count how many of them are white. If more than two are white, the current point is treated as white too, and in this way the whole interference line is removed. This method has limits, however: it can only remove thin interference lines and cannot remove particularly thick ones.

Code:

# Interference-line noise reduction
def interference_line(img, img_name):
    filename = './out_img/' + img_name.split('.')[0] + '-interferenceline.jpg'
    h, w = img.shape[:2]
    # Careful: OpenCV matrix indices are reversed
    # img[1, 2] -> 1: position along the height, 2: position along the width
    for y in range(1, w - 1):
        for x in range(1, h - 1):
            count = 0
            if img[x, y - 1] > 245:
                count = count + 1
            if img[x, y + 1] > 245:
                count = count + 1
            if img[x - 1, y] > 245:
                count = count + 1
            if img[x + 1, y] > 245:
                count = count + 1
            if count > 2:
                img[x, y] = 255
    cv2.imwrite(filename, img)
    return img
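
The rule looks as if it should fail on a long line, since a pixel in the middle of a one-pixel line has only two white neighbors. It works because the image is modified in place: once the first pixel of the line turns white, the next pixel gains a third white neighbor, and the change propagates along the line. The sketch below shows this on a nested list. `remove_thin_lines` is a hypothetical name and the toy image is an assumption.

```python
WHITE, BLACK = 255, 0


def remove_thin_lines(grid, white_level=245):
    """Whiten interior pixels that have more than two white 4-neighbors (in place)."""
    height, width = len(grid), len(grid[0])
    for col in range(1, width - 1):
        for row in range(1, height - 1):
            neighbors = (
                grid[row][col - 1], grid[row][col + 1],
                grid[row - 1][col], grid[row + 1][col],
            )
            if sum(1 for value in neighbors if value > white_level) > 2:
                grid[row][col] = WHITE
    return grid


# A 5 x 7 white image with a one-pixel horizontal line in row 2 (border already white).
grid = [[WHITE] * 7 for _ in range(5)]
for col in range(1, 6):
    grid[2][col] = BLACK

remove_thin_lines(grid)
print(all(value == WHITE for row in grid for value in row))  # True
```

A line two pixels thick survives the same pass, because each of its pixels keeps at least two black neighbors. This matches the limitation described above.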

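The idea of point noise reduction is similar to that of line noise reduction, except that the points examined depend on the position: a corner point has a 4-point neighborhood, an edge point a 6-point neighborhood, and an interior point a 9-point neighborhood. The code sums the pixel values in the neighborhood and sets the point to black when the sum is at or below a threshold. The comments mark each case.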

Code:

# Point noise reduction
def interference_point(img, img_name, x=0, y=0):
    """
    9-point neighborhood: the 3 x 3 box centered on the current point,
    used to judge how dark the surroundings of the point are.
    x and y are overwritten by the loops and kept only so existing calls still work.
    """
    filename = './out_img/' + img_name.split('.')[0] + '-interferencePoint.jpg'
    # TODO: check the lower limit of the image width and height
    height, width = img.shape[:2]

    # img[x, y]: x is the row (height), y is the column (width).
    # The ranges cover the last row and column so the edge branches are reached.
    for y in range(0, width):
        for x in range(0, height):
            cur_pixel = img[x, y]  # value of the current pixel
            if y == 0:  # first column
                if x == 0:  # top-left corner, 4-point neighborhood
                    # the 3 points next to the current point
                    total = int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x + 1, y]) \
                        + int(img[x + 1, y + 1])
                    if total <= 2 * 245:
                        img[x, y] = 0
                elif x == height - 1:  # bottom-left corner
                    total = int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x - 1, y]) \
                        + int(img[x - 1, y + 1])
                    if total <= 2 * 245:
                        img[x, y] = 0
                else:  # left edge, not a corner: 6-point neighborhood
                    total = int(img[x - 1, y]) \
                        + int(img[x - 1, y + 1]) \
                        + int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x + 1, y]) \
                        + int(img[x + 1, y + 1])
                    if total <= 3 * 245:
                        img[x, y] = 0
            elif y == width - 1:  # last column
                if x == 0:  # top-right corner
                    # the 3 points next to the current point
                    total = int(cur_pixel) \
                        + int(img[x + 1, y]) \
                        + int(img[x + 1, y - 1]) \
                        + int(img[x, y - 1])
                    if total <= 2 * 245:
                        img[x, y] = 0
                elif x == height - 1:  # bottom-right corner
                    total = int(cur_pixel) \
                        + int(img[x, y - 1]) \
                        + int(img[x - 1, y]) \
                        + int(img[x - 1, y - 1])
                    if total <= 2 * 245:
                        img[x, y] = 0
                else:  # right edge, not a corner: 6-point neighborhood
                    total = int(cur_pixel) \
                        + int(img[x - 1, y]) \
                        + int(img[x + 1, y]) \
                        + int(img[x, y - 1]) \
                        + int(img[x - 1, y - 1]) \
                        + int(img[x + 1, y - 1])
                    if total <= 3 * 245:
                        img[x, y] = 0
            else:  # y is not on the left or right edge
                if x == 0:  # top edge, not a corner
                    total = int(img[x, y - 1]) \
                        + int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x + 1, y - 1]) \
                        + int(img[x + 1, y]) \
                        + int(img[x + 1, y + 1])
                    if total <= 3 * 245:
                        img[x, y] = 0
                elif x == height - 1:  # bottom edge, not a corner
                    total = int(img[x, y - 1]) \
                        + int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x - 1, y - 1]) \
                        + int(img[x - 1, y]) \
                        + int(img[x - 1, y + 1])
                    if total <= 3 * 245:
                        img[x, y] = 0
                else:  # interior point with a full 9-point neighborhood
                    total = int(img[x - 1, y - 1]) \
                        + int(img[x - 1, y]) \
                        + int(img[x - 1, y + 1]) \
                        + int(img[x, y - 1]) \
                        + int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x + 1, y - 1]) \
                        + int(img[x + 1, y]) \
                        + int(img[x + 1, y + 1])
                    if total <= 4 * 245:
                        img[x, y] = 0
    cv2.imwrite(filename, img)
    return img
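
The nine branches all apply one rule with different neighborhood sizes, so they can be folded into a single loop by clipping the 3 x 3 window at the image edges. The sketch below is a pure-Python restatement on a nested list, not the author's code. `fill_dark_neighborhoods` is a hypothetical name, and it assumes an image of at least 2 x 2 pixels so that the window holds 4, 6 or 9 points.

```python
def fill_dark_neighborhoods(grid, level=245):
    """Blacken a pixel when its clipped 3x3 window sums to at most (n // 2) * level."""
    height, width = len(grid), len(grid[0])
    for col in range(width):
        for row in range(height):
            rows = range(max(row - 1, 0), min(row + 2, height))
            cols = range(max(col - 1, 0), min(col + 2, width))
            values = [int(grid[r][c]) for r in rows for c in cols]
            # len(values) is 4 at a corner, 6 on an edge and 9 in the interior,
            # which gives the thresholds 2 * level, 3 * level and 4 * level.
            if sum(values) <= (len(values) // 2) * level:
                grid[row][col] = 0
    return grid


# A white speck inside a dark area is filled in.
speck = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
print(fill_dark_neighborhoods(speck))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Note what the thresholds imply on a binarized image: a pixel only turns black when most of its window is already dark, so a lone black dot on a white background is left as it is.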

In fact, at this step, these characters can already be recognized without character cutting. The recognition rate for these three types of verification codes is now above 50%.

Character cutting

Character cutting is usually used for characters that stick together in the verification code. Stuck characters are hard to recognize, so we cut them into single characters before recognition.

The idea of character cutting is to find a black point and then traverse the black points adjacent to it until all connected black points have been visited, recording the topmost, bottommost, rightmost, and leftmost of these points. These four extremes are taken to bound one character. The scan then continues onward until the next black point is found, and the steps above are repeated. Finally, each character is cut out using its four points.

The red points in the picture are the four points of each character marked after the code runs, and the cutting is based on these four points (the picture contains some errors; it is only meant to convey the idea).

You can also see that m and 2 are stuck together and the code treats them as one character, so we need to check the width of each character. If a character is too wide, we assume it is two characters stuck together and cut it down the middle.

Code that determines the four points of each character:

from queue import Queue

def cfs(im, x_fd, y_fd):
    '''Use a queue and a set of visited pixel coordinates instead of plain
    recursion, which avoids the excessive recursion depth of a recursive cfs.
    '''
    xaxis = []
    yaxis = []
    visited = set()
    q = Queue()
    q.put((x_fd, y_fd))
    visited.add((x_fd, y_fd))
    offsets = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # 4-neighborhood
    h, w = im.shape[:2]

    while not q.empty():
        x, y = q.get()

        for xoffset, yoffset in offsets:
            x_neighbor, y_neighbor = x + xoffset, y + yoffset

            if (x_neighbor, y_neighbor) in visited:
                continue  # already visited
            visited.add((x_neighbor, y_neighbor))

            # Skip points outside the image. A negative index does not raise
            # IndexError; it wraps around to the opposite edge.
            if not (0 <= x_neighbor < h and 0 <= y_neighbor < w):
                continue
            if im[x_neighbor, y_neighbor] == 0:
                xaxis.append(x_neighbor)
                yaxis.append(y_neighbor)
                q.put((x_neighbor, y_neighbor))

    if len(xaxis) == 0 or len(yaxis) == 0:
        # the start point has no black neighbors
        xmax = x_fd + 1
        xmin = x_fd
        ymax = y_fd + 1
        ymin = y_fd
    else:
        xmax = max(xaxis)
        xmin = min(xaxis)
        ymax = max(yaxis)
        ymin = min(yaxis)

    # x is the row and y the column here, so the column range is returned first
    return ymax, ymin, xmax, xmin

def detectFgPix(im, xmax):
    '''Search for the start point of the next block, column by column.
    '''
    h, w = im.shape[:2]
    for y_fd in range(xmax + 1, w):
        for x_fd in range(h):
            if im[x_fd, y_fd] == 0:
                return x_fd, y_fd
    return None  # no black point left

def CFS(im):
    '''Find the cutting positions of the characters.
    '''
    zoneL = []   # width L of each block
    zoneWB = []  # X-axis [start, end] of each block
    zoneHB = []  # Y-axis [start, end] of each block
    xmax = 0     # X coordinate where the previous block ended; initialized here
    for i in range(10):
        try:
            x_fd, y_fd = detectFgPix(im, xmax)
            xmax, xmin, ymax, ymin = cfs(im, x_fd, y_fd)
            L = xmax - xmin
            zoneL.append(L)
            zoneWB.append([xmin, xmax])
            zoneHB.append([ymin, ymax])
        except TypeError:
            # detectFgPix returned None: there are no more blocks
            return zoneL, zoneWB, zoneHB

    return zoneL, zoneWB, zoneHB
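
To see the traversal in isolation, here is a small pure-Python sketch of the same breadth-first idea on a nested list. It is an illustration under assumptions rather than a copy of the listing: the names `component_box` and `find_boxes` are made up, the start point is included in the bounding box, and `collections.deque` stands in for `queue.Queue`.

```python
from collections import deque


def component_box(grid, start_row, start_col, black=0):
    """Return (col_min, col_max, row_min, row_max) of the 4-connected black blob."""
    height, width = len(grid), len(grid[0])
    queue = deque([(start_row, start_col)])
    visited = {(start_row, start_col)}
    rows, cols = [start_row], [start_col]
    while queue:
        row, col = queue.popleft()
        for d_row, d_col in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            n_row, n_col = row + d_row, col + d_col
            if not (0 <= n_row < height and 0 <= n_col < width):
                continue  # outside the image
            if (n_row, n_col) in visited or grid[n_row][n_col] != black:
                continue
            visited.add((n_row, n_col))
            rows.append(n_row)
            cols.append(n_col)
            queue.append((n_row, n_col))
    return min(cols), max(cols), min(rows), max(rows)


def find_boxes(grid, black=0):
    """Scan columns left to right and collect one box per blob."""
    height, width = len(grid), len(grid[0])
    boxes, next_col = [], 0
    while next_col < width:
        hit = next((r for r in range(height) if grid[r][next_col] == black), None)
        if hit is None:
            next_col += 1
            continue
        box = component_box(grid, hit, next_col, black)
        boxes.append(box)
        next_col = box[1] + 1  # continue to the right of this blob
    return boxes


W, B = 255, 0
grid = [
    [W, W, W, W, W, W, W],
    [W, B, B, W, W, B, W],
    [W, B, W, W, W, B, W],
    [W, W, W, W, W, B, W],
    [W, W, W, W, W, W, W],
]
print(find_boxes(grid))  # [(1, 2, 1, 2), (5, 5, 1, 3)]
```

As in the listing, the scan resumes to the right of each box, so a second blob that overlaps the first one horizontally would be skipped.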

Code that splits stuck characters:

# Cutting positions (fragment from main(); im and img_name come from the earlier steps)
im_position = CFS(im)

maxL = max(im_position[0])
minL = min(im_position[0])

# If one character is much wider than the narrowest one, treat it as two
# stuck characters and cut it down the middle
if maxL > minL + minL * 0.7:
    maxL_index = im_position[0].index(maxL)
    half = maxL // 2
    # set the widths of the two halves
    im_position[0][maxL_index] = half
    im_position[0].insert(maxL_index + 1, half)
    # set the X-axis [start, end] of the two halves
    im_position[1][maxL_index][1] = im_position[1][maxL_index][0] + half
    im_position[1].insert(maxL_index + 1, [im_position[1][maxL_index][1] + 1, im_position[1][maxL_index][1] + 1 + half])
    # both halves share the same Y-axis [start, end]
    im_position[2].insert(maxL_index + 1, im_position[2][maxL_index])

# Cut the characters; good cuts need tuned offsets, usually 1 or 2 is enough
cutting_img(im, im_position, img_name, 1, 1)
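
The width check is easier to follow as a pure function on a list of X-axis spans. The sketch below is a simplified variant under assumptions: `split_widest` is a hypothetical helper, the ratio of 1.7 mirrors the `minL + minL * 0.7` test, and the right half ends at the original end point rather than at a recomputed one.

```python
def split_widest(spans, ratio=1.7):
    """Split the widest [start, end] span in half if it exceeds ratio * narrowest width."""
    widths = [end - start for start, end in spans]
    widest, narrowest = max(widths), min(widths)
    result = [list(span) for span in spans]
    if widest <= narrowest * ratio:
        return result  # nothing looks stuck together
    index = widths.index(widest)
    start, end = spans[index]
    middle = start + widest // 2
    result[index:index + 1] = [[start, middle], [middle + 1, end]]
    return result


print(split_widest([[2, 10], [14, 31], [35, 44]]))
# [[2, 10], [14, 22], [23, 31], [35, 44]]
```

Here the middle span is 17 pixels wide against a narrowest width of 8, so it is split into two spans. The input list is left unmodified.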

Code that cuts out the characters:

def cutting_img(im, im_position, img, xoffset=1, yoffset=1):
    filename = './out_img/' + img.split('.')[0]
    # number of characters found
    im_number = len(im_position[1])
    # cut out each character
    for i in range(im_number):
        # clamp at 0: a negative slice start would count from the opposite edge
        im_start_X = max(im_position[1][i][0] - xoffset, 0)
        im_end_X = im_position[1][i][1] + xoffset
        im_start_Y = max(im_position[2][i][0] - yoffset, 0)
        im_end_Y = im_position[2][i][1] + yoffset
        cropped = im[im_start_Y:im_end_Y, im_start_X:im_end_X]
        cv2.imwrite(filename + '-cutting-' + str(i) + '.jpg', cropped)
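
When padding a box before cropping, two details are worth checking. A box that touches the left or top edge would otherwise yield a negative start index, which NumPy slicing reads as an offset from the opposite edge, and slice ends are exclusive. The helper below is a hypothetical sketch of how the bounds could be computed. The extra `+ 1` on the end indices is an assumption of this sketch, added so that the padding is the same on both sides.

```python
def padded_box(x_span, y_span, width, height, xoffset=1, yoffset=1):
    """Grow a character box by the offsets but keep it inside the image."""
    x_start = max(x_span[0] - xoffset, 0)
    x_end = min(x_span[1] + xoffset + 1, width)    # +1: slice ends are exclusive
    y_start = max(y_span[0] - yoffset, 0)
    y_end = min(y_span[1] + yoffset + 1, height)
    return x_start, x_end, y_start, y_end


print(padded_box([0, 9], [3, 20], width=60, height=22))  # (0, 11, 2, 22)
```

The result could then be used as `im[y_start:y_end, x_start:x_end]`, rows first and columns second.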

Identify

Recognition uses the pytesseract library. The main things to set are the page segmentation parameter, which differs between a line of characters and a single character, and the language parameter for Chinese or English. The recognition call itself is a single line; most of the code here just filters files.

Code:

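# Recognize the verification code (fragment from main(); img_name comes from the earlier steps)
prefix = img_name.split('.')[0]
cutting_img_num = 0
str_img = ''  # initialized before the loops so it always exists
for file in os.listdir('./out_img'):
    if fnmatch(file, '%s-cutting-*.jpg' % prefix):
        cutting_img_num += 1
for i in range(cutting_img_num):
    try:
        file = './out_img/%s-cutting-%s.jpg' % (prefix, i)
        # Recognize one character: psm 10 is a single character, psm 7 a single line of text.
        # Tesseract 4 and later expect --psm; Tesseract 3 used -psm.
        str_img = str_img + image_to_string(Image.open(file), lang='eng', config='--psm 10')
    except Exception as err:
        print('Recognition failed for %s: %s' % (file, err))
print('Cut images: %s' % cutting_img_num)
print('Recognized as: %s' % str_img)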
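
The parameter choice can be kept in one place with a small helper. The sketch below is an assumption-laden illustration: `build_config` and `recognize` are hypothetical names, the `--psm` spelling is the one used by Tesseract 4 and later, and `tessedit_char_whitelist` is a Tesseract configuration variable whose effect depends on the Tesseract version and engine.

```python
def build_config(single_char=True, whitelist=None):
    """Build the Tesseract config string: psm 10 for one character, psm 7 for one line."""
    psm = 10 if single_char else 7
    config = '--psm %d' % psm
    if whitelist:
        config += ' -c tessedit_char_whitelist=%s' % whitelist
    return config


def recognize(path, single_char=True, whitelist=None, lang='eng'):
    """Run pytesseract on one image file and return the stripped text."""
    # Imported here so build_config can be tried without Pillow or pytesseract installed.
    from PIL import Image
    import pytesseract
    text = pytesseract.image_to_string(
        Image.open(path), lang=lang, config=build_config(single_char, whitelist)
    )
    return text.strip()


print(build_config())                   # --psm 10
print(build_config(single_char=False))  # --psm 7
```

Calling `recognize` requires the Tesseract binary, pytesseract and Pillow to be installed. Restricting the whitelist to digits and letters may help on verification codes, but that is something to measure rather than assume.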

In the end, the recognition rate for this kind of stuck-together characters is about 30%, and this method only handles two characters stuck together. If more than two characters are stuck together it cannot recognize them, although judging that from the character width is not difficult. If you are interested, you can try it.

The effect of character recognition without cutting:

The recognition effect of characters that need to be cut:

This kind of code can only recognize simple verification codes. Complex verification codes are left to the reader.

Instructions:

1. Put the verification code images to be recognized into an image folder at the same level as the script, and create the out_img folder there as well. The attached script reads from ./easy_img and only processes *.jpeg files, so adjust filedir or the folder name to match.
2. Run python3 filename.
3. The images from each stage, such as binarization and noise reduction, are stored in the out_img folder. Finally, the recognition results are printed on the screen.
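
The folder setup in step 1 can also be done from Python. The helper below is a hypothetical convenience, not part of the attached script. Its default folder names and the `*.jpeg` pattern are taken from the script's `main()`.

```python
from fnmatch import fnmatch
from pathlib import Path


def prepare_folders(input_dir='./easy_img', output_dir='./out_img', pattern='*.jpeg'):
    """Create the output folder and return the sorted names of matching input images."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    source = Path(input_dir)
    if not source.is_dir():
        return []  # nothing to process yet
    return sorted(p.name for p in source.iterdir() if fnmatch(p.name, pattern))
```

Calling `prepare_folders()` before the main loop returns the list of images to process, and an empty list is a quick hint that the input folder or file extension does not match.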

Finally, here is the full source code (with cutting; if you do not want cutting, modify it yourself):

import os
from fnmatch import fnmatch
from queue import Queue

import cv2
from PIL import Image
from pytesseract import image_to_string


def clear_border(img, img_name):
    '''Remove the border.
    '''
    filename = './out_img/' + img_name.split('.')[0] + '-clearBorder.jpg'
    h, w = img.shape[:2]
    # img[x, y]: x is the row (height), y is the column (width)
    for y in range(0, w):
        for x in range(0, h):
            # >= so that the far edges are cleared as deep as the near ones
            if y < 4 or y >= w - 4:
                img[x, y] = 255
            if x < 4 or x >= h - 4:
                img[x, y] = 255

    cv2.imwrite(filename, img)
    return img

def interference_line(img, img_name):
    '''Interference-line noise reduction.
    '''
    filename = './out_img/' + img_name.split('.')[0] + '-interferenceline.jpg'
    h, w = img.shape[:2]
    # Careful: OpenCV matrix indices are reversed
    # img[1, 2] -> 1: position along the height, 2: position along the width
    for y in range(1, w - 1):
        for x in range(1, h - 1):
            count = 0
            if img[x, y - 1] > 245:
                count = count + 1
            if img[x, y + 1] > 245:
                count = count + 1
            if img[x - 1, y] > 245:
                count = count + 1
            if img[x + 1, y] > 245:
                count = count + 1
            if count > 2:
                img[x, y] = 255
    cv2.imwrite(filename, img)
    return img

def interference_point(img, img_name, x=0, y=0):
    """Point noise reduction.
    9-point neighborhood: the 3 x 3 box centered on the current point,
    used to judge how dark the surroundings of the point are.
    x and y are overwritten by the loops and kept only so existing calls still work.
    """
    filename = './out_img/' + img_name.split('.')[0] + '-interferencePoint.jpg'
    # TODO: check the lower limit of the image width and height
    height, width = img.shape[:2]

    # img[x, y]: x is the row (height), y is the column (width).
    # The ranges cover the last row and column so the edge branches are reached.
    for y in range(0, width):
        for x in range(0, height):
            cur_pixel = img[x, y]  # value of the current pixel
            if y == 0:  # first column
                if x == 0:  # top-left corner, 4-point neighborhood
                    # the 3 points next to the current point
                    total = int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x + 1, y]) \
                        + int(img[x + 1, y + 1])
                    if total <= 2 * 245:
                        img[x, y] = 0
                elif x == height - 1:  # bottom-left corner
                    total = int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x - 1, y]) \
                        + int(img[x - 1, y + 1])
                    if total <= 2 * 245:
                        img[x, y] = 0
                else:  # left edge, not a corner: 6-point neighborhood
                    total = int(img[x - 1, y]) \
                        + int(img[x - 1, y + 1]) \
                        + int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x + 1, y]) \
                        + int(img[x + 1, y + 1])
                    if total <= 3 * 245:
                        img[x, y] = 0
            elif y == width - 1:  # last column
                if x == 0:  # top-right corner
                    # the 3 points next to the current point
                    total = int(cur_pixel) \
                        + int(img[x + 1, y]) \
                        + int(img[x + 1, y - 1]) \
                        + int(img[x, y - 1])
                    if total <= 2 * 245:
                        img[x, y] = 0
                elif x == height - 1:  # bottom-right corner
                    total = int(cur_pixel) \
                        + int(img[x, y - 1]) \
                        + int(img[x - 1, y]) \
                        + int(img[x - 1, y - 1])
                    if total <= 2 * 245:
                        img[x, y] = 0
                else:  # right edge, not a corner: 6-point neighborhood
                    total = int(cur_pixel) \
                        + int(img[x - 1, y]) \
                        + int(img[x + 1, y]) \
                        + int(img[x, y - 1]) \
                        + int(img[x - 1, y - 1]) \
                        + int(img[x + 1, y - 1])
                    if total <= 3 * 245:
                        img[x, y] = 0
            else:  # y is not on the left or right edge
                if x == 0:  # top edge, not a corner
                    total = int(img[x, y - 1]) \
                        + int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x + 1, y - 1]) \
                        + int(img[x + 1, y]) \
                        + int(img[x + 1, y + 1])
                    if total <= 3 * 245:
                        img[x, y] = 0
                elif x == height - 1:  # bottom edge, not a corner
                    total = int(img[x, y - 1]) \
                        + int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x - 1, y - 1]) \
                        + int(img[x - 1, y]) \
                        + int(img[x - 1, y + 1])
                    if total <= 3 * 245:
                        img[x, y] = 0
                else:  # interior point with a full 9-point neighborhood
                    total = int(img[x - 1, y - 1]) \
                        + int(img[x - 1, y]) \
                        + int(img[x - 1, y + 1]) \
                        + int(img[x, y - 1]) \
                        + int(cur_pixel) \
                        + int(img[x, y + 1]) \
                        + int(img[x + 1, y - 1]) \
                        + int(img[x + 1, y]) \
                        + int(img[x + 1, y + 1])
                    if total <= 4 * 245:
                        img[x, y] = 0
    cv2.imwrite(filename, img)
    return img

def _get_dynamic_binary_image(filedir, img_name):
    '''Adaptive-threshold binarization.
    '''
    filename = './out_img/' + img_name.split('.')[0] + '-binary.jpg'
    img_name = filedir + '/' + img_name
    print('.....' + img_name)
    im = cv2.imread(img_name)
    im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

    th1 = cv2.adaptiveThreshold(im, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 1)
    cv2.imwrite(filename, th1)
    return th1

def _get_static_binary_image(img, threshold=140):
    '''Manual binarization with a fixed threshold.
    '''
    img = Image.open(img)
    img = img.convert('L')
    pixdata = img.load()
    w, h = img.size
    for y in range(h):
        for x in range(w):
            if pixdata[x, y] < threshold:
                pixdata[x, y] = 0
            else:
                pixdata[x, y] = 255

    return img

def cfs(im, x_fd, y_fd):
    '''Use a queue and a set of visited pixel coordinates instead of plain
    recursion, which avoids the excessive recursion depth of a recursive cfs.
    '''
    xaxis = []
    yaxis = []
    visited = set()
    q = Queue()
    q.put((x_fd, y_fd))
    visited.add((x_fd, y_fd))
    offsets = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # 4-neighborhood
    h, w = im.shape[:2]

    while not q.empty():
        x, y = q.get()

        for xoffset, yoffset in offsets:
            x_neighbor, y_neighbor = x + xoffset, y + yoffset

            if (x_neighbor, y_neighbor) in visited:
                continue  # already visited
            visited.add((x_neighbor, y_neighbor))

            # Skip points outside the image. A negative index does not raise
            # IndexError; it wraps around to the opposite edge.
            if not (0 <= x_neighbor < h and 0 <= y_neighbor < w):
                continue
            if im[x_neighbor, y_neighbor] == 0:
                xaxis.append(x_neighbor)
                yaxis.append(y_neighbor)
                q.put((x_neighbor, y_neighbor))

    if len(xaxis) == 0 or len(yaxis) == 0:
        # the start point has no black neighbors
        xmax = x_fd + 1
        xmin = x_fd
        ymax = y_fd + 1
        ymin = y_fd
    else:
        xmax = max(xaxis)
        xmin = min(xaxis)
        ymax = max(yaxis)
        ymin = min(yaxis)

    # x is the row and y the column here, so the column range is returned first
    return ymax, ymin, xmax, xmin

def detectFgPix(im, xmax):
    '''Search for the start point of the next block, column by column.
    '''
    h, w = im.shape[:2]
    for y_fd in range(xmax + 1, w):
        for x_fd in range(h):
            if im[x_fd, y_fd] == 0:
                return x_fd, y_fd
    return None  # no black point left

def CFS(im):
    '''Find the cutting positions of the characters.
    '''
    zoneL = []   # width L of each block
    zoneWB = []  # X-axis [start, end] of each block
    zoneHB = []  # Y-axis [start, end] of each block

    xmax = 0  # X coordinate where the previous block ended; initialized here
    for i in range(10):
        try:
            x_fd, y_fd = detectFgPix(im, xmax)
            xmax, xmin, ymax, ymin = cfs(im, x_fd, y_fd)
            L = xmax - xmin
            zoneL.append(L)
            zoneWB.append([xmin, xmax])
            zoneHB.append([ymin, ymax])
        except TypeError:
            # detectFgPix returned None: there are no more blocks
            return zoneL, zoneWB, zoneHB

    return zoneL, zoneWB, zoneHB

def cutting_img(im, im_position, img, xoffset=1, yoffset=1):
    filename = './out_img/' + img.split('.')[0]
    # number of characters found
    im_number = len(im_position[1])
    # cut out each character
    for i in range(im_number):
        # clamp at 0: a negative slice start would count from the opposite edge
        im_start_X = max(im_position[1][i][0] - xoffset, 0)
        im_end_X = im_position[1][i][1] + xoffset
        im_start_Y = max(im_position[2][i][0] - yoffset, 0)
        im_end_Y = im_position[2][i][1] + yoffset
        cropped = im[im_start_Y:im_end_Y, im_start_X:im_end_X]
        cv2.imwrite(filename + '-cutting-' + str(i) + '.jpg', cropped)


def main():
    filedir = './easy_img'

    for file in os.listdir(filedir):
        if fnmatch(file, '*.jpeg'):
            img_name = file
            prefix = img_name.split('.')[0]

            # adaptive-threshold binarization
            im = _get_dynamic_binary_image(filedir, img_name)

            # remove the border
            im = clear_border(im, img_name)

            # interference-line noise reduction
            im = interference_line(im, img_name)

            # point noise reduction
            im = interference_point(im, img_name)

            # cutting positions
            im_position = CFS(im)
            if not im_position[0]:
                print('No characters found in %s' % img_name)
                continue

            maxL = max(im_position[0])
            minL = min(im_position[0])

            # If one character is much wider than the narrowest one, treat it
            # as two stuck characters and cut it down the middle
            if maxL > minL + minL * 0.7:
                maxL_index = im_position[0].index(maxL)
                half = maxL // 2
                # set the widths of the two halves
                im_position[0][maxL_index] = half
                im_position[0].insert(maxL_index + 1, half)
                # set the X-axis [start, end] of the two halves
                im_position[1][maxL_index][1] = im_position[1][maxL_index][0] + half
                im_position[1].insert(maxL_index + 1, [im_position[1][maxL_index][1] + 1, im_position[1][maxL_index][1] + 1 + half])
                # both halves share the same Y-axis [start, end]
                im_position[2].insert(maxL_index + 1, im_position[2][maxL_index])

            # Cut the characters; good cuts need tuned offsets, usually 1 or 2 is enough
            cutting_img(im, im_position, img_name, 1, 1)

            # recognize the verification code
            cutting_img_num = 0
            str_img = ''  # initialized before the loops so it always exists
            for out_file in os.listdir('./out_img'):
                if fnmatch(out_file, '%s-cutting-*.jpg' % prefix):
                    cutting_img_num += 1
            for i in range(cutting_img_num):
                try:
                    out_file = './out_img/%s-cutting-%s.jpg' % (prefix, i)
                    # psm 10 is a single character, psm 7 a single line of text.
                    # Tesseract 4 and later expect --psm; Tesseract 3 used -psm.
                    str_img = str_img + image_to_string(Image.open(out_file), lang='eng', config='--psm 10')
                except Exception as err:
                    print('Recognition failed for %s: %s' % (out_file, err))
            print('Cut images: %s' % cutting_img_num)
            print('Recognized as: %s' % str_img)

if __name__ == '__main__':
    main()


Origin blog.csdn.net/2301_78095909/article/details/130880280