Denoising a simple captcha with Python's PIL library

First of all, I would like to thank the author of the blog post "Character image verification code recognition complete process and Python implementation"; most of what I know about this topic I learned from him.

To recognize a captcha, once enough samples have been collected, the first step is to process the original captcha images. Before recognition and classification, this preprocessing generally involves three basic steps: converting the color image to grayscale, binarizing the grayscale image, and denoising. Here we use a relatively simple captcha as an example to show how to denoise images with Python's PIL library.

First look at the unprocessed captcha image:

The processing is done mainly with the Image module of the PIL library.

1. Convert color images to grayscale images

First, open the image above with Image.open, which returns a PIL.Image.Image object. We can then call methods such as convert, filter, point, and putpixel on that object to process the image.
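To try these methods without the captcha file, here is a minimal sketch that builds a small stand-in image in memory with Image.new. The image size, the colors, and the choice of ImageFilter.SMOOTH are arbitrary assumptions for illustration, not part of the original post.

```python
from PIL import Image, ImageFilter

# Hypothetical stand-in for the captcha: a small RGB image filled with one color.
img = Image.new('RGB', (6, 4), (21, 10, 26))
img.putpixel((0, 0), (255, 255, 255))      # putpixel: paint the top-left pixel white

gray = img.convert('L')                     # convert: color -> grayscale (mode 'L')
smooth = gray.filter(ImageFilter.SMOOTH)    # filter: apply a built-in smoothing kernel
inverted = gray.point(lambda v: 255 - v)    # point: map every gray value v to 255 - v

print(img.mode, gray.mode, smooth.size)     # RGB L (6, 4)
print(gray.getpixel((0, 0)), inverted.getpixel((0, 0)))  # 255 0
```

Each method returns a new image (except putpixel, which modifies the image in place), so the original stays untouched.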

We can convert the above color image to grayscale through the convert method:

# encoding=utf8

from PIL import Image

def main():
	image = Image.open('RandomPicture.png')
	imgry = image.convert('L')
	imgry.save('gray.png')

if __name__ == '__main__':
	main()

Result:

The saved image shows that the original color image has been turned into a grayscale image (often loosely called a black-and-white image). What is a grayscale image? A color image is made up of pixels of different colors; similarly, a grayscale image is made up of pixels of different gray values. Any color can be expressed as a mix of the three primaries red, green, and blue. If a pixel's original color is RGB(R, G, B), we can convert it to a gray value with any of the following methods:

1. Floating-point method: Gray = R*0.3 + G*0.59 + B*0.11
2. Integer method: Gray = (R*30 + G*59 + B*11) / 100
3. Shift method: Gray = (R*76 + G*151 + B*28) >> 8
4. Average method: Gray = (R + G + B) / 3
5. Green only: Gray = G
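The five formulas translate directly into Python. This is a sketch for comparison only: the function names are made up, and integer division (//) is used for the integer and average methods so their results stay whole numbers. The sample pixel (21, 10, 26) is the top-left pixel of the captcha, which shows up in the output later.

```python
def gray_float(r, g, b):
    """1. Floating-point method."""
    return r * 0.3 + g * 0.59 + b * 0.11

def gray_int(r, g, b):
    """2. Integer method (// keeps the result an int)."""
    return (r * 30 + g * 59 + b * 11) // 100

def gray_shift(r, g, b):
    """3. Shift method: weights scaled by 256, then divided by 2**8."""
    return (r * 76 + g * 151 + b * 28) >> 8

def gray_average(r, g, b):
    """4. Average method."""
    return (r + g + b) // 3

def gray_green(r, g, b):
    """5. Green channel only."""
    return g

pixel = (21, 10, 26)  # top-left pixel of the sample captcha
for method in (gray_float, gray_int, gray_shift, gray_average, gray_green):
    print(method.__name__, method(*pixel))
```

For this pixel the methods give roughly 15.06, 15, 14, 19 and 10, so the choice of formula does shift the gray value somewhat.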

After computing Gray with any of these methods, replace each of R, G, and B with Gray to form the new color RGB(Gray, Gray, Gray). Replacing every pixel's original RGB(R, G, B) with this new color produces a grayscale image. (PIL's "L" mode stores just the single Gray value per pixel rather than three identical copies.)
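A minimal sketch of this per-pixel replacement with PIL, using the floating-point method. The helper name to_gray_rgb and the 2 x 1 sample image are hypothetical, and rounding to the nearest integer is an assumption, since pixel values must be whole numbers.

```python
from PIL import Image

def to_gray_rgb(image):
    """Replace every pixel's RGB(R, G, B) with RGB(Gray, Gray, Gray)."""
    rgb = image.convert('RGB')
    pixels = []
    for r, g, b in rgb.getdata():                    # pixels in row-major order
        gray = round(r * 0.3 + g * 0.59 + b * 0.11)  # floating-point method, rounded
        pixels.append((gray, gray, gray))
    out = Image.new('RGB', rgb.size)
    out.putdata(pixels)                              # write the new colors back
    return out

sample = Image.new('RGB', (2, 1))
sample.putpixel((0, 0), (21, 10, 26))
sample.putpixel((1, 0), (255, 255, 255))
gray = to_gray_rgb(sample)
print(gray.getpixel((0, 0)), gray.getpixel((1, 0)))
```

The result still has mode "RGB"; convert('L') does the same job but stores one value per pixel.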

Take a look at the code implementation:

# encoding=utf8

from PIL import Image

def main():
	image = Image.open('RandomPicture.png')
	print('image mode: ', image.mode)
	print(image.getpixel((0, 0)))
	print('-' * 40)
	imgry = image.convert('L')
	print('imgry mode: ', imgry.mode)
	print(imgry.getpixel((0, 0)))

if __name__ == '__main__':
	main()

Output:

image mode:  RGB
(21, 10, 26)
----------------------------------------
imgry mode:  L
15

Code explanation:

The image.mode attribute gives the mode of the current PIL.Image.Image object (the currently opened image). The mode tells us how each pixel is stored: "RGB" means each pixel is made of three color values, while "L" means each pixel is a single grayscale value.

getpixel returns the value of one pixel: an (R, G, B) tuple in RGB mode, or a single gray value in L mode. An image is a grid of pixels, each addressed by its x and y coordinates, and (0, 0) is the pixel in the top-left corner.

From the output above, before the conversion the pixel at (0, 0) has the RGB value (21, 10, 26); after converting to grayscale, the same pixel has the single value 15. Printing imgry.mode confirms that the image is now grayscale ("L"), so every pixel holds a gray value.

We can also check this by hand, plugging the three values (21, 10, 26) into the floating-point formula above:

>>> 21*0.3+10*0.59+26*0.11
15.059999999999999

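The result is about 15, the same value PIL returned, so the RGB value has indeed been reduced to a gray value with weights like those of the floating-point method. Strictly speaking, the Pillow documentation gives the conversion used by convert('L') as the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000, whose weights are very close to 0.3, 0.59 and 0.11.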
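The snippet below repeats this check on a 1 x 1 in-memory image, comparing convert('L') with both the floating-point method and the ITU-R 601-2 weights from the Pillow documentation. It is a sketch with hypothetical helper names; Pillow does the conversion with integer arithmetic internally, so for some colors its result may differ slightly from these float formulas.

```python
from PIL import Image

def float_method(r, g, b):
    """Floating-point method from the list above."""
    return r * 0.3 + g * 0.59 + b * 0.11

def itu_r_601_2(r, g, b):
    """ITU-R 601-2 luma weights, as documented for Image.convert('L')."""
    return r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000

pixel = (21, 10, 26)  # top-left pixel of the sample captcha
converted = Image.new('RGB', (1, 1), pixel).convert('L').getpixel((0, 0))
print(converted, float_method(*pixel), itu_r_601_2(*pixel))
```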

2. Grayscale image binarization

Now that we have the grayscale image, the next step is to binarize it. Binarization converts a grayscale image into an image made up of only black and white. The idea is to pick a threshold: pixels whose gray value is at or above the threshold become white, and pixels below it become black. This divides the pixels into two classes, 0 and 1 (for example, 0 for black and 1 for white), so the whole image can then be represented as a string of 0s and 1s.
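As a minimal pure-Python illustration of the idea (not PIL's implementation), the sketch below thresholds a small, made-up grid of gray values and joins the result into a string of 0s and 1s. Values below the threshold become 0 (black), matching the get_bin_table convention used later; binarize and to_bit_string are hypothetical helper names.

```python
def binarize(gray_rows, threshold=115):
    """Map each gray value to 0 (black, below threshold) or 1 (white)."""
    return [[0 if v < threshold else 1 for v in row] for row in gray_rows]

def to_bit_string(bin_rows):
    """Concatenate the rows, top to bottom, into one string of 0s and 1s."""
    return ''.join(str(bit) for row in bin_rows for bit in row)

gray_rows = [
    [15, 200, 240],
    [120, 90, 114],
]  # hypothetical 3 x 2 grayscale image
bits = binarize(gray_rows)
print(bits)                 # [[0, 1, 1], [1, 0, 0]]
print(to_bit_string(bits))  # 011100
```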

The point method performs the binarization. It accepts a lookup table with one entry for each of the 256 possible gray values and replaces every pixel value v with table[v]; the second argument, '1', requests a 1-bit (black and white) output image. The code is as follows:

# encoding=utf8

from PIL import Image

def get_bin_table(threshold=115):
	'''
	Get the mapping table from grayscale to binary
	0 means black, 1 means white
	'''
	table = []
	for i in range(256):
		if i < threshold:
			table.append(0)
		else:
			table.append(1)
	return table

def main():
	image = Image.open('RandomPicture.png')
	imgry = image.convert('L')
	table = get_bin_table()
	binary = imgry.point(table, '1')
	binary.save('binary.png')

if __name__ == '__main__':
	main()

Result:

The result shows that the grayscale image has been converted into one consisting of only black and white pixels, i.e. it has been binarized. Note that the threshold of 115 suits this particular captcha; for other types of captcha the value has to be found by experimentation.
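As an alternative that the original post does not use, Otsu's method picks a threshold automatically from the grayscale histogram by maximizing the between-class variance of the resulting dark and light groups. Below is a minimal pure-Python sketch; the function name otsu_threshold is hypothetical, and its result is only a starting point that may still need manual tuning for a given captcha.

```python
def otsu_threshold(histogram):
    """Pick a binarization threshold with Otsu's method (sketch).

    histogram: 256 pixel counts, one per gray value.
    Returns a threshold for get_bin_table: values below it become black.
    """
    total = sum(histogram)
    weighted_total = sum(value * count for value, count in enumerate(histogram))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with gray value <= t
    sum0 = 0  # sum of their gray values
    for t in range(256):
        w0 += histogram[t]
        sum0 += t * histogram[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum0 / w0                        # mean of the dark class
        mu1 = (weighted_total - sum0) / w1     # mean of the light class
        between = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance (unnormalized)
        if between > best_var:
            best_t, best_var = t, between
    return best_t + 1  # gray values <= best_t form the dark class

# Hypothetical histogram: 10 dark pixels at 20, 10 light pixels at 200.
hist = [0] * 256
hist[20] = 10
hist[200] = 10
print(otsu_threshold(hist))  # 21
```

With the captcha, imgry.histogram() returns the 256 counts for an "L" image, so something like get_bin_table(otsu_threshold(imgry.histogram())) would plug it into the code above.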

Next, let's look at the value of the pixel at (0, 0) at each stage:

# encoding=utf8

from PIL import Image

def get_bin_table(threshold=115):
	'''
	Get the mapping table from grayscale to binary
	0 means black, 1 means white
	'''
	table = []
	for i in range(256):
		if i < threshold:
			table.append(0)
		else:
			table.append(1)
	return table

def main():
	image = Image.open('RandomPicture.png')
	print('image mode: ', image.mode)
	print(image.getpixel((0, 0)))
	co = image.getcolors()
	print(co)
	print('-' * 40)
	imgry = image.convert('L')
	print('imgry mode: ', imgry.mode)
	print(imgry.getpixel((0, 0)))
	co = imgry.getcolors()
	print(co)
	print('-' * 40)
	table = get_bin_table()
	binary = imgry.point(table, '1')
	print('binary mode: ', binary.mode)
	print(binary.getpixel((0, 0)))
	co = binary.getcolors()
	print(co)

if __name__ == '__main__':
	main()

Output:

image mode:  RGB
(21, 10, 26)
None
----------------------------------------
imgry mode:  L
15
[(1, 2), (2, 3), (1, 4), (1, 5), (4, 6), (3, 8), (3, 9), (6, 10), (4, 11), (4, 12), (7, 13), (8, 14), (3, 15), (12, 16), (7, 17), (6, 18), (5, 19), (13, 20), (9, 21), (9, 22), (4, 23), (5, 24), (7, 25), (3, 26), (6, 27), (7, 28), (3, 29), (3, 30), (3, 31), (5, 32), (1, 33), (3, 35), (2, 36), (2, 37), (2, 38), (2, 39), (1, 41), (3, 42), (1, 43), (2, 44), (7, 45), (3, 46), (5, 47), (1, 48), (3, 49), (3, 50), (3, 51), (5, 52), (4, 53), (1, 54), (7, 55), (7, 56), (10, 57), (4, 58), (5, 59), (6, 60), (5, 61), (12, 62), (7, 63), (10, 64), (12, 65), (14, 66), (15, 67), (11, 68), (9, 69), (11, 70), (7, 71), (9, 72), (5, 73), (10, 74), (5, 75), (5, 76), (5, 77), (8, 78), (7, 79), (3, 80), (5, 81), (6, 82), (5, 83), (3, 84), (3, 85), (6, 86), (2, 87), (3, 88), (2, 90), (3, 91), (1, 93), (2, 94), (3, 95), (1, 96), (3, 97), (2, 99), (3, 100), (3, 101), (1, 102), (3, 104), (4, 105), (1, 106), (3, 108), (4, 110), (4, 111), (4, 112), (3, 113), (3, 114), (5, 115), (2, 116), (3, 117), (8, 118), (8, 119), (8, 120), (7, 121), (9, 122), (9, 123), (11, 124), (11, 125), (2, 126), (10, 127), (9, 128), (7, 129), (13, 130), (11, 131), (11, 132), (9, 133), (16, 134), (11, 135), (12, 136), (8, 137), (14, 138), (12, 139), (13, 140), (20, 141), (22, 142), (19, 143), (14, 144), (23, 145), (17, 146), (10, 147), (18, 148), (13, 149), (11, 150), (26, 151), (16, 152), (14, 153), (11, 154), (17, 155), (10, 156), (12, 157), (12, 158), (20, 159), (18, 160), (16, 161), (22, 162), (20, 163), (16, 164), (13, 165), (14, 166), (13, 167), (11, 168), (17, 169), (8, 170), (16, 171), (20, 172), (12, 173), (10, 174), (10, 175), (10, 176), (11, 177), (7, 178), (8, 179), (7, 180), (5, 181), (7, 182), (4, 183), (7, 184), (4, 185), (4, 186), (5, 187), (6, 188), (2, 189), (1, 190), (4, 191), (6, 192), (12, 193), (8, 194), (10, 195), (3, 196), (13, 197), (9, 198), (19, 199), (18, 200), (20, 201), (16, 202), (18, 203), (24, 204), (33, 205), (25, 206), (33, 207), (38, 208), (31, 209), (46, 210), (39, 211), (53, 212), (54, 213), (33, 214), (42, 215), (54, 216), (60, 217), (50, 218), (36, 219), (48, 220), (32, 221), (45, 222), (28, 223), (24, 224), (21, 225), (19, 226), (21, 227), (13, 228), (12, 229), (12, 230), (13, 231), (5, 232), (8, 233), (4, 234), (5, 235), (1, 236), (1, 237), (2, 238), (1, 239), (1, 240), (1, 242), (1, 243)]
----------------------------------------
binary mode:  1
0
[(503, 0), (1993, 1)]

Code explanation:

The binary mode is "1", meaning each pixel of the binarized image is stored as a single bit, here 0 or 1. The pixel at (0, 0) has the value 0, i.e. black, and the binarized image above confirms that its top-left corner is indeed black.

The code above also uses the getcolors method, which returns pixel statistics as a list of the form [(count, pixel value), (...), ...]. If the image contains more distinct colors than the maxcolors argument allows (256 by default), it returns None instead, which is why the color image printed None. The result [(503, 0), (1993, 1)] tells us that the binarized image consists of 503 black pixels and 1993 white pixels.
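To make the return format concrete, here is a rough pure-Python analogue of getcolors built on collections.Counter. It illustrates the behavior described above and is not Pillow's implementation; the function name get_colors and the sample pixels are hypothetical, and sorting by pixel value is only for readability.

```python
from collections import Counter

def get_colors(pixels, maxcolors=256):
    """Return [(count, pixel), ...], or None if there are more than maxcolors distinct values."""
    counts = Counter(pixels)
    if len(counts) > maxcolors:
        return None  # too many distinct colors, like Image.getcolors
    return [(count, value) for value, count in sorted(counts.items())]

binary_pixels = [0, 1, 1, 0, 1, 1]    # hypothetical binarized pixels
print(get_colors(binary_pixels))       # [(2, 0), (4, 1)]
print(get_colors(list(range(300))))    # None: 300 distinct values > 256
```

With Pillow itself, raising the limit, for example image.getcolors(maxcolors=image.width * image.height), returns the full list for the color image too, since an image cannot contain more distinct colors than pixels.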

binary.size gives the width and height of the binarized image: (78, 32). The image therefore has 78 x 32 = 2496 pixels, exactly 503 + 1993, arranged as 32 rows of 78 pixels each. Let's print the image as 0s and 1s and take a look:

# encoding=utf8

from PIL import Image

def get_bin_table(threshold=115):
	'''
	Get the mapping table from grayscale to binary
	0 means black, 1 means white
	'''
	table = []
	for i in range(256):
		if i < threshold:
			table.append(0)
		else:
			table.append(1)
	return table

def main():
	image = Image.open('RandomPicture.png')
	imgry = image.convert('L')
	table = get_bin_table()
	binary = imgry.point(table, '1')
	width, height = binary.size
	lis = binary.getdata() # All pixel values in row-major order; convert with list() to inspect them
	lis = list(lis)
	start = 0
	step = width
	for i in range(height):
		for p in lis[start: start+step]:
			if p == 1: # Turn the white dots into spaces for easy viewing
				p = ' '
			print(p, end=' ')
		print()
		start += step

if __name__ == '__main__':
	main()

Result:

From the output we can roughly make out the characters "959c".
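The nested loop above can also be written by slicing the flat pixel list into rows. Because getdata() returns pixels row by row, pixel (x, y) sits at index y * width + x. The sketch below is a hypothetical helper (render_ascii), tried on a made-up 3 x 2 image rather than the captcha.

```python
def render_ascii(pixels, width, black='0', white=' '):
    """Render a flat, row-major list of binary pixels as lines of text."""
    rows = []
    for start in range(0, len(pixels), width):  # each row starts at y * width
        row = pixels[start:start + width]
        rows.append(''.join(black if p == 0 else white for p in row))
    return '\n'.join(rows)

flat = [0, 1, 0,
        1, 0, 1]  # hypothetical 3 x 2 binary image
print(render_ascii(flat, 3))
```

For the captcha, print(render_ascii(list(binary.getdata()), binary.width)) should give a more compact version of the picture printed by the loop.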

3. Remove noise

As the output shows, besides the 0s that form "959c", the image contains other 0s that are noise. We need to remove as much of this noise as possible to make later recognition and training easier.

For noise removal, I again borrowed the "nine-grid" (3x3 neighborhood) method from the post "Character image verification code recognition complete process and Python implementation". Here is the implementation:

# encoding=utf8

from PIL import Image

def sum_9_region_new(img, x, y):
	'''Return the number of black pixels in the 3x3 block around (x, y); 0 for white pixels, 1 for black pixels near the border'''
	cur_pixel = img.getpixel((x, y)) # The value of the current pixel
	width = img.width
	height = img.height

	if cur_pixel == 1: # White pixels are never noise, so skip the neighborhood check
		return 0

	# This captcha has black dots along its edges, so black pixels near the border are treated as noise
	if y < 3: # top three rows (y = 0, 1, 2)
		return 1
	elif y > height - 3: # bottom two rows
		return 1
	else: # y is not near the top or bottom edge
		if x < 3: # left three columns
			return 1
		elif x == width - 1: # rightmost column
			return 1
		else: # interior pixel: examine its 3x3 (nine-grid) neighborhood
			total = img.getpixel((x - 1, y - 1)) \
				  + img.getpixel((x - 1, y)) \
				  + img.getpixel((x - 1, y + 1)) \
				  + img.getpixel((x, y - 1)) \
				  + cur_pixel \
				  + img.getpixel((x, y + 1)) \
				  + img.getpixel((x + 1, y - 1)) \
				  + img.getpixel((x + 1, y)) \
				  + img.getpixel((x + 1, y + 1))
			return 9 - total # number of black (0) pixels in the 3x3 block

def collect_noise_point(img):
	'''collect all noise'''
	noise_point_list = []
	for x in range(img.width):
		for y in range(img.height):
			res_9 = sum_9_region_new(img, x, y)
			if (0 < res_9 < 3) and img.getpixel((x, y)) == 0: # black pixel with at most one black neighbor
				pos = (x, y)
				noise_point_list.append(pos)
	return noise_point_list

def remove_noise_pixel(img, noise_point_list):
	'''According to the position information of the noise, remove the black noise of the binary image'''
	for item in noise_point_list:
		img.putpixel((item[0], item[1]), 1)

def get_bin_table(threshold=115):
	'''Get the grayscale to binary mapping table, 0 means black, 1 means white'''
	table = []
	for i in range(256):
		if i < threshold:
			table.append(0)
		else:
			table.append(1)
	return table

def main():
	image = Image.open('RandomPicture.png')
	imgry = image.convert('L')
	table = get_bin_table()
	binary = imgry.point(table, '1')
	noise_point_list = collect_noise_point(binary)
	remove_noise_pixel(binary, noise_point_list)
	binary.save('final.png')

if __name__ == '__main__':
	main()

Result:

As the resulting image shows, we have removed the noise around the image border as well as some isolated noise points.
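The same nine-grid rule can be tried without PIL on a small 2D list, which makes it easy to test: a black pixel counts as noise when its 3 x 3 block (itself included) holds fewer than three black pixels, i.e. it has at most one black neighbor. This is a simplified variant, not the code above: instead of the fixed border rules tuned for this captcha, it treats pixels outside the image as white. The function names are hypothetical.

```python
def collect_isolated_black(grid):
    """Collect (x, y) of black pixels (0) with at most one black neighbor."""
    height, width = len(grid), len(grid[0])
    noise = []
    for y in range(height):
        for x in range(width):
            if grid[y][x] != 0:
                continue  # white pixels are never noise
            black = 0  # black pixels in the 3x3 block, including (x, y)
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height and grid[ny][nx] == 0:
                        black += 1
            if black < 3:  # mirrors the 0 < res_9 < 3 test above
                noise.append((x, y))
    return noise

def remove_points(grid, points):
    """Paint the given points white (1), like remove_noise_pixel."""
    for x, y in points:
        grid[y][x] = 1

grid = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
]  # hypothetical binary image: one lone dot and one 2x2 blob
noise = collect_isolated_black(grid)
print(noise)  # [(1, 1)]
remove_points(grid, noise)
```

As in the original code, all noise points are collected first and removed afterwards, so erasing one point cannot change the decision for its neighbors.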

Besides the steps above, PIL's ImageEnhance and ImageFilter modules can apply further processing, such as increasing contrast or brightness, or sharpening. I won't go through them in detail here, because the effect of these treatments varies from image to image.
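For readers who want to experiment anyway, here is a minimal sketch of such preprocessing applied to the grayscale image before binarization. ImageEnhance.Contrast and ImageFilter.MedianFilter are standard Pillow APIs, but the contrast factor of 2.0 and the 3 x 3 median filter are placeholder values (assumptions, not tuned for this captcha), and the function name is hypothetical.

```python
from PIL import Image, ImageEnhance, ImageFilter

def enhance_before_binarizing(image, contrast=2.0, median_size=3):
    """Boost contrast, then median-filter; parameters are untuned placeholders."""
    gray = image.convert('L')
    gray = ImageEnhance.Contrast(gray).enhance(contrast)      # factor > 1.0 increases contrast
    return gray.filter(ImageFilter.MedianFilter(median_size))  # suppresses isolated specks

sample = Image.new('RGB', (8, 8), (120, 120, 120))
sample.putpixel((4, 4), (0, 0, 0))  # a single dark "noise" pixel
result = enhance_before_binarizing(sample)
print(result.mode, result.size, result.getpixel((4, 4)))
```

On this toy image the median filter replaces the lone dark pixel with the surrounding gray; on a real captcha it can also erode thin character strokes, so the filter size needs care.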

Finally, thanks to the following bloggers, whose posts were valuable references while I learned about captcha recognition:

Character image verification code recognition complete process and Python implementation

Python3.5+sklearn uses SVM to automatically identify letter verification codes

python simple verification code recognition

Python-based PIL library learning (1)
