Python: parsing numeric verification code images (quick to get started, easy to learn at a glance)

Task: given a numeric verification code image, extract the digits shown in it.

Verification code images that differ substantially need different approaches, which means different code. Since no real recognition algorithm or trained model is used here, the results may be inaccurate, and some digits may be misread or missed.

To extract the digits from a verification code image, you can follow these steps:

  1. Prepare your environment: make sure the image processing and optical character recognition (OCR) tools you need are installed on your computer.

  2. Import dependencies: depending on the tools and programming language you choose, import the corresponding libraries, for example OpenCV for image processing and the Tesseract OCR engine for character recognition.

  3. Read the verification code image: load the image with the image processing library and convert it into a suitable format, such as a grayscale or binary image.

  4. Preprocess the image: apply operations such as smoothing, binarization, and denoising to make the characters stand out and improve recognition accuracy.

  5. Recognize the characters: run OCR on the preprocessed image by calling the corresponding OCR library or API with the image as input. Depending on the tool, you may need to tune some parameters to improve accuracy.

  6. Extract the result: pull the digits out of the OCR output. An OCR tool typically returns a text string, from which you can extract the digits with string processing or a regular expression.
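
Step 6 can be sketched with a regular expression. The helper name `extract_digits` and its optional `expected_length` check are hypothetical, not part of the original post; the idea is simply to keep the digit characters from whatever text the OCR engine returns and, if you know how long the code should be, reject results of the wrong length.

```python
import re


def extract_digits(ocr_text, expected_length=None):
    """Hypothetical helper: keep only the ASCII digits 0-9 from raw OCR output."""
    # re.ASCII restricts \d to 0-9 instead of every Unicode decimal digit
    digits = "".join(re.findall(r"\d", ocr_text, flags=re.ASCII))
    # Optionally reject results whose length does not match the expected code length
    if expected_length is not None and len(digits) != expected_length:
        return None
    return digits


# Raw OCR output may contain stray spaces plus trailing whitespace such as "\n" or "\x0c"
raw = " 12 34\n\x0c"
print(extract_digits(raw))                     # prints 1234
print(extract_digits(raw, expected_length=6))  # prints None
```

Returning `None` on a length mismatch makes a failed recognition easy to detect, so the caller can retry with a different preprocessing step.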

Note that different verification code images may use different fonts, sizes, and backgrounds, and some add distortion or interference lines to make recognition harder. The implementation therefore usually has to be adapted to the specific images. Recognition accuracy also depends on image quality and on how clearly the characters are rendered.

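from PIL import Image
import pytesseract

# Load the verification code image
image_path = "21.png"  # replace with the actual image path
image = Image.open(image_path)

# Convert to 8-bit grayscale (mode "L")
gray_image = image.convert('L')

# Binarize: pixels brighter than the threshold become white (255), all others black (0)
threshold = 127
binary_image = gray_image.point(lambda p: 255 if p > threshold else 0)

# Recognize digits with Tesseract OCR; the whitelist limits the output to 0-9.
# --psm 10 treats the image as a single character; for multi-digit codes,
# --psm 7 (single text line) or --psm 8 (single word) may work better.
code = pytesseract.image_to_string(binary_image, config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')

# strip() removes trailing whitespace (such as a newline or form feed) from the OCR output
print("Recognized code:", code.strip())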
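
Why the `point()` call above binarizes the image: according to the Pillow documentation, when `Image.point` receives a function, it calls that function once for each possible pixel value and applies the resulting lookup table to the image. The pure-Python sketch below rebuilds that 256-entry table for an 8-bit grayscale image, so it needs neither Pillow nor an image file; `build_threshold_lut` and `apply_lut` are hypothetical names. It also shows that the short-circuit form `lambda p: p > threshold and 255` produces the same table, because `False` compares equal to 0 in Python.

```python
def build_threshold_lut(threshold=127):
    """Hypothetical helper: the table Image.point() would build for a mode 'L' image."""
    # One output value for each possible 8-bit input value 0..255
    return [255 if p > threshold else 0 for p in range(256)]


def apply_lut(pixels, lut):
    """Map a flat list of 8-bit gray values through the lookup table."""
    return [lut[p] for p in pixels]


lut = build_threshold_lut(127)
# The short-circuit lambda yields False (== 0) or 255, i.e. the same table
short_circuit = [(p > 127 and 255) for p in range(256)]
print(lut == short_circuit)                           # prints True
print(apply_lut([0, 100, 127, 128, 200, 255], lut))   # prints [0, 0, 0, 255, 255, 255]
```

Note that a pixel exactly equal to the threshold (127) becomes black, since the comparison is strict.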
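
import cv2
import pytesseract

# Load the verification code image
image_path = "1.png"  # replace with the actual image path
image = cv2.imread(image_path)
if image is None:  # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError(image_path)

# Convert the image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Preprocessing: denoising, edge detection, and binarization
processed_image = cv2.GaussianBlur(gray_image, (5, 5), 0)  # Gaussian blur suppresses noise
processed_image = cv2.Canny(processed_image, 50, 150)  # white edges on a black background
# Canny already outputs a 0/255 image, so this step mainly inverts it to dark outlines on white
_, processed_image = cv2.threshold(processed_image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Recognize digits with Tesseract OCR (--psm 10 treats the image as one character)
code = pytesseract.image_to_string(processed_image, config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')

print("Recognized code:", code.strip())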
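
The second and third scripts pass 0 as the threshold together with `cv2.THRESH_OTSU`; in that mode OpenCV ignores the given value, computes the threshold itself, and returns it as the first element of the result (which the scripts discard with `_`). The pure-Python sketch below illustrates Otsu's method on a flat list of 8-bit gray values: it tries every candidate threshold and keeps the one that maximizes the between-class variance. `otsu_threshold` is a hypothetical helper for understanding only; OpenCV's implementation may differ in details such as tie-breaking.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other, matching the
    'value > thresh' rule used by cv2.THRESH_BINARY.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(256):
        weight_low += hist[t]          # number of pixels with value <= t
        sum_low += t * hist[t]
        if weight_low == 0:
            continue
        weight_high = total - weight_low  # number of pixels with value > t
        if weight_high == 0:
            break
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to the constant factor 1 / total**2
        between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


pixels = [20] * 60 + [220] * 40   # a toy two-tone "image"
t = otsu_threshold(pixels)
print(t)                          # prints 20
binary = [255 if p > t else 0 for p in pixels]
print(binary.count(255))          # prints 40
```

Because the threshold adapts to each image's histogram, this tends to work better than a fixed value such as 127 when verification code images vary in brightness.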

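import cv2
import pytesseract

# Load the verification code image
image_path = "12.png"  # replace with the actual image path
image = cv2.imread(image_path)
if image is None:  # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError(image_path)

# Convert the image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Binarize with an automatic (Otsu) threshold; INV turns dark characters white on black
_, binary_image = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Remove interference lines: an opening (erosion then dilation) with a 3x3 kernel deletes
# white structures thinner than 3 px, but it can also thin or break very thin character strokes
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
clean_image = cv2.morphologyEx(binary_image, cv2.MORPH_OPEN, kernel)

# Invert back to dark text on a light background, which Tesseract 4.x and later expects
clean_image = cv2.bitwise_not(clean_image)

# Recognize digits with Tesseract OCR (--psm 10 treats the image as one character)
code = pytesseract.image_to_string(clean_image, config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')

print("Recognized code:", code.strip())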
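
The interference-line removal in the third script relies on a morphological opening: an erosion followed by a dilation with the same 3x3 kernel. The pure-Python sketch below works on a small 0/1 grid instead of an image (the helpers `erode`, `dilate`, and `open_3x3` are hypothetical) and shows why a 1-pixel-wide line disappears while a solid 3x3 stroke survives. Out-of-bounds neighbours are simply ignored here, a simplification of OpenCV's border handling.

```python
def _window(img, y, x):
    """Yield the in-bounds values of the 3x3 neighbourhood around (y, x)."""
    h, w = len(img), len(img[0])
    for ny in range(max(0, y - 1), min(h, y + 2)):
        for nx in range(max(0, x - 1), min(w, x + 2)):
            yield img[ny][nx]


def erode(img):
    # A pixel stays foreground only if its whole 3x3 neighbourhood is foreground
    return [[1 if all(_window(img, y, x)) else 0 for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img):
    # A pixel becomes foreground if any pixel in its 3x3 neighbourhood is foreground
    return [[1 if any(_window(img, y, x)) else 0 for x in range(len(img[0]))]
            for y in range(len(img))]


def open_3x3(img):
    """Morphological opening: erosion, then dilation, with a 3x3 square kernel."""
    return dilate(erode(img))


# 7x10 toy image: a 1-px interference line in row 1 and a solid 3x3 "stroke"
img = [[0] * 10 for _ in range(7)]
img[1] = [1] * 10
for y in range(3, 6):
    for x in range(2, 5):
        img[y][x] = 1

opened = open_3x3(img)
print(opened[1])              # prints [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(sum(map(sum, opened)))  # prints 9 (only the 3x3 stroke is left)
```

Erosion removes every pixel whose neighbourhood is not completely filled, which wipes out the thin line entirely; the dilation then grows the surviving centre pixel of the stroke back to its original 3x3 size.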

Image: 1.png

Image: 21.png

Image: 12.png

How to install the tessdata language packs and the Tesseract engine

For a detailed explanation, see the post below:

How to install tessdata language collection packages and tesseract driver-CSDN Blog

It is written so that even complete beginners can follow it.
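
After installing the language packs, one quick check is to look for `*.traineddata` files in the tessdata directory; each file name (for example `eng.traineddata`) corresponds to a language code. The sketch below uses only the standard library: `find_tesseract` locates the `tesseract` executable on `PATH` with `shutil.which`, and `list_languages` lists the language files in a directory you pass in. These helper names are hypothetical, and where tessdata lives depends on your OS and install method; the `TESSDATA_PREFIX` environment variable can tell Tesseract where to look, and the script below assumes it points at the tessdata directory.

```python
import os
import shutil

SUFFIX = ".traineddata"


def find_tesseract():
    """Return the full path of the tesseract executable on PATH, or None."""
    return shutil.which("tesseract")


def languages_from_filenames(names):
    """Return the sorted language codes for the *.traineddata names in `names`."""
    return sorted(n[: -len(SUFFIX)] for n in names if n.endswith(SUFFIX))


def list_languages(tessdata_dir):
    """List the language codes installed in a tessdata directory."""
    return languages_from_filenames(os.listdir(tessdata_dir))


if __name__ == "__main__":
    print("tesseract executable:", find_tesseract())
    # Assumption: TESSDATA_PREFIX points at the tessdata directory; adjust as needed
    tessdata = os.environ.get("TESSDATA_PREFIX")
    if tessdata and os.path.isdir(tessdata):
        print("installed languages:", list_languages(tessdata))
```

If `find_tesseract()` returns `None`, the executable is not on `PATH`; in that case pytesseract lets you set `pytesseract.pytesseract.tesseract_cmd` to the executable's full path.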

Origin blog.csdn.net/weixin_66547608/article/details/134135162