Image segmentation in a handwritten digit recognition system

Background

This article summarizes some technical experience with handwritten digit recognition from my student days, mainly covering digital image processing, feature extraction, and neural networks.

Although I use many of the mature algorithms that are available online, I made a number of algorithmic improvements on top of them.

For this project I also wrote my own neural network library. Everything from image processing to the final recognition step was written from scratch without any third-party libraries; OpenCV and the like were not used.

The UI layer used Qt. Cross-platform support was considered at the time, but I was still a student with little coding experience, so my base library did not handle cross-platform issues very well.

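A brief note on that base library: it was my first development library, a Boost-like C++ template library that used many of C++'s template metaprogramming features. I have since lost my love for C++, so it was abandoned long ago.

Still, developing that library strongly influenced my later coding style, and it was after this that I shifted my focus to development in C.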

At the time I built this recognition system only as practice for learning neural networks. It cannot compare with mature systems and its recognition rate is not very high, so it is suitable only for reference and study.

Introduction

Building on the basic BP algorithm and digital image processing, this article explores how to achieve robust, accurate, and efficient offline digit recognition by improving the network and the image processing algorithms, combined with practice.

Here I mainly study offline recognition of single digits. The main steps are:

overview_1

Digit sample collection

Samples follow a layout of 5 rows and 10 columns of digits. Images are obtained by scanning the sample cards, taking care to avoid distorting the samples, as shown in the figure:

overview_2

Image binarization

Global threshold segmentation and adaptive local threshold segmentation are used to achieve adaptive segmentation under different background brightness, and the results are compared.

Digit extraction

Clustering, matrix segmentation, and connected-region labeling are currently considered; their advantages and disadvantages are compared and the best-performing algorithm is selected.

Image normalization

Bilinear interpolation and nearest-neighbor interpolation are used for enlargement. To reduce the distortion caused by shrinking an image, the current plan is to use an averaging method for reduction.
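
As a rough illustration of the two resampling ideas above, here is a minimal pure-Python sketch. The function names `resize_bilinear` and `shrink_by_averaging`, the corner-aligned coordinate mapping, and the integer shrink factor are assumptions made for this example, not the project's actual code.

```python
def resize_bilinear(gray, new_h, new_w):
    """Resize a grayscale image (list of rows) with bilinear interpolation."""
    h, w = len(gray), len(gray[0])
    out = [[0] * new_w for _ in range(new_h)]
    for j in range(new_h):
        # Map the output row to a source coordinate (corners stay aligned).
        sy = j * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        for i in range(new_w):
            sx = i * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            # Interpolate along x on the two rows, then along y.
            top = gray[y0][x0] * (1 - fx) + gray[y0][x1] * fx
            bottom = gray[y1][x0] * (1 - fx) + gray[y1][x1] * fx
            out[j][i] = round(top * (1 - fy) + bottom * fy)
    return out


def shrink_by_averaging(gray, factor):
    """Shrink by an integer factor, averaging each factor x factor block."""
    new_h, new_w = len(gray) // factor, len(gray[0]) // factor
    out = []
    for j in range(new_h):
        row = []
        for i in range(new_w):
            block = [gray[j * factor + dy][i * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out
```

Averaging uses every source pixel in a block, whereas plain interpolation samples only a few of them when shrinking, which is one way to read the distortion concern mentioned above.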

Feature extraction

Two methods are mainly used: pixel-by-pixel extraction and PCA (principal component analysis) extraction.

Sample learning

A neural network based on the BP (back propagation) learning algorithm is used for recognition, with some improvements and optimizations to BP to improve the training result and moderately increase the training speed.

The improvements to BP currently consist of adding a momentum term and using an adaptive step size.

For the BP algorithm, the main steps are:

    forward computation => backward computation => weight update => iterate
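
The weight-update step with a momentum term and an adaptive step size could look roughly like the sketch below. The names `momentum_step` and `adapt_learning_rate`, and the default values for `momentum`, `grow`, and `shrink`, are hypothetical choices for illustration; the article does not give the constants actually used.

```python
def momentum_step(weights, grads, velocity, lr=0.1, momentum=0.9):
    """One weight update with a momentum term.

    velocity holds the previous update; part of it is carried over so
    that successive steps in the same direction reinforce each other.
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def adapt_learning_rate(lr, prev_error, error, grow=1.05, shrink=0.7):
    """Grow the step while the error keeps falling, shrink it otherwise."""
    return lr * grow if error < prev_error else lr * shrink
```

In a training loop, `momentum_step` would be called once per iteration with the gradients from the backward pass, and `adapt_learning_rate` once per epoch with the previous and current total error.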

To further improve the network and achieve accurate and efficient recognition, I plan to consider a multi-network ensemble for optimization.
It combines basic BP networks with different weights and hidden layers, and the classification results of the individual networks are weighted and combined to produce the final classification.
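
One simple way to combine several networks is a weighted sum of their per-class scores. The sketch below assumes each network returns a list of ten scores (one per digit) and that `network_weights` is chosen by hand or from validation accuracy; both are assumptions, since the article does not say how the weighting is done.

```python
def ensemble_predict(network_outputs, network_weights):
    """Combine per-class scores from several networks with a weighted sum.

    Returns the winning class index and the combined score list.
    """
    num_classes = len(network_outputs[0])
    combined = [0.0] * num_classes
    for scores, weight in zip(network_outputs, network_weights):
        for k, score in enumerate(scores):
            combined[k] += weight * score
    best = max(range(num_classes), key=lambda k: combined[k])
    return best, combined
```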

Threshold segmentation

Threshold segmentation is a region-based image segmentation technique. Its basic principle is to divide image pixels into several classes by setting different feature thresholds.

This article mainly deals with segmentation into two classes. Let the threshold be T and the gray level of an image pixel be f(x, y); the thresholded image g(x, y) is then defined as:

split_1

Pixels marked 1 therefore correspond to the object (the foreground), and pixels marked 0 correspond to the background. This is what is usually called image binarization.

The main problem in using threshold segmentation for binarization is choosing the threshold. In practice, whether the threshold is well chosen is decisive for the quality of the segmentation.
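
A minimal sketch of binarization with a fixed threshold follows. The function name `binarize` and the `dark_foreground` flag are assumptions for this example: for scanned handwriting the ink is usually darker than the paper, so the sketch marks pixels below T as 1 by default, while the opposite polarity is available for bright objects on a dark background.

```python
def binarize(gray, threshold, dark_foreground=True):
    """Return a 0/1 image where 1 marks the object (foreground).

    gray is a list of rows of integer gray levels in 0..255.
    """
    if dark_foreground:
        # Dark ink on light paper: pixels below the threshold are foreground.
        return [[1 if v < threshold else 0 for v in row] for row in gray]
    # Bright object on dark background: pixels at or above T are foreground.
    return [[1 if v >= threshold else 0 for v in row] for row in gray]
```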

There are three commonly used threshold segmentation methods:

Global threshold method

The information of the entire image is used to find a single optimal threshold, and only this fixed threshold is used during binarization. The computation is therefore small, but the result is poor for images with poor lighting conditions.

Local threshold method

The original image is divided into several small sub-images, and an optimal threshold is computed for each sub-image. The result is better, but the overhead is larger and the block size is hard to choose: too small and the result is easily distorted, too large and the improvement is not significant.

Dynamic threshold method

The threshold depends not only on the gray value of the pixel and the gray values of the pixels in its neighborhood, but also on the pixel's coordinates. This method is flexible, but its complexity, computation, and time overhead are all relatively large.

The book Digital Image Processing by Gonzalez gives a minimum-error threshold method. The gray-level histogram is fitted with a bimodal Gaussian density curve, using the conjugate gradient method to find the best threshold. It is very effective, but the computation is heavy and it handles images without pronounced double peaks poorly; additional single-peak detection and interpolation are needed, which makes it too complicated and difficult to implement.

split_2

In this article the image is segmented with an adaptive OTSU local threshold method, with improvements to both OTSU and the local threshold method. These improve performance as well as segmentation quality, and give better segmentation even for images with uneven brightness.

Maximum Between-Class Variance Method (OTSU)

The maximum between-class variance method, proposed by Otsu in 1979, is widely used because it is simple to compute, stable, and effective. The main idea is to select the threshold that minimizes the within-class variance or, equivalently, maximizes the between-class variance. The Otsu algorithm can also be applied to determining multiple thresholds, so it is a very good threshold selection method.
Threshold segmentation is usually done by maximizing the between-class variance, which is defined as:

otsu_1

where

|| u || Mean gray value of the whole image ||
|| u1 || Mean gray value of the pixels below the threshold T ||
|| u2 || Mean gray value of the pixels above the threshold T ||
|| n1 || Number of pixels below the threshold T ||
|| n2 || Number of pixels above the threshold T ||

Therefore, it is enough to traverse the 256 gray levels: the gray value that maximizes the between-class variance is the optimal threshold T.
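
The exhaustive search can be sketched as follows. Because the formula above survives only as an image placeholder, this example assumes the common textbook form n1 * (u1 - u)^2 + n2 * (u2 - u)^2 for the between-class variance and treats gray levels below T as the first class. The function names are hypothetical.

```python
def histogram(gray):
    """Count how many pixels have each gray level 0..255."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    return hist


def otsu_threshold_naive(hist):
    """Try every threshold and recompute both class statistics each time."""
    total = sum(hist)
    u = sum(i * h for i, h in enumerate(hist)) / total
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        n1 = sum(hist[:t])          # pixels with gray level < t
        n2 = total - n1             # pixels with gray level >= t
        if n1 == 0 or n2 == 0:
            continue                # one class is empty, nothing to separate
        u1 = sum(i * hist[i] for i in range(t)) / n1
        u2 = sum(i * hist[i] for i in range(t, 256)) / n2
        var = n1 * (u1 - u) ** 2 + n2 * (u2 - u) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```

Every candidate threshold rescans the histogram here, which is the repeated work the next section sets out to remove.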

Implementation and improvement of OTSU

However, if the means and pixel counts on both sides of the threshold are recalculated for every candidate, the amount of computation is quite large. If the result of the previous step can be reused in the next one, the computation can be greatly reduced.
Given the grayscale histogram, the total mean of the image, and the total number of pixels in the image, the recurrence is as follows:

otsu_2

To simplify the calculation further, otsu_3 can be substituted, which gives

otsu_4

Since n does not change during the recursion, it can be omitted, which gives

otsu_5
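
The recurrence formulas above survive only as image placeholders, so the following is a sketch of one standard way to get the same effect, not a transcription of them. It keeps a running count and a running gray-level sum for the lower class, and scores each threshold with n1 * n2 * (u1 - u2)^2, which equals n times the textbook between-class variance, so dropping the constant n does not change which threshold wins. The function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(hist):
    """Single-pass Otsu threshold from a 256-bin gray-level histogram."""
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    n1 = 0        # running pixel count of the class below t
    sum1 = 0      # running sum of gray levels of the class below t
    best_t, best_score = 0, -1.0
    for t in range(1, 256):
        # Moving the threshold from t-1 to t adds exactly one bin to class 1.
        n1 += hist[t - 1]
        sum1 += (t - 1) * hist[t - 1]
        n2 = total - n1
        if n1 == 0:
            continue
        if n2 == 0:
            break
        u1 = sum1 / n1
        u2 = (total_sum - sum1) / n2
        score = n1 * n2 * (u1 - u2) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

Each step now costs a constant amount of work instead of a pass over the histogram.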

Because this article targets character images, and character strokes are usually thin and cover less than 1/4 of the image, the threshold can be adjusted to obtain better segmentation. The improved threshold is

otsu_6

Implementation and improvement of local threshold

In real images, however, noise and other interference mean that OTSU threshold segmentation often fails to give satisfactory results and can cause serious segmentation errors. This is because the gray-level histogram does not necessarily have distinct peaks and valleys, and a pixel's gray value reflects only its gray level; it carries no spatial information about the pixel's neighborhood.

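Experiments show that:

When the brightness of the image is unevenly distributed, a good segmentation often cannot be obtained; large black blocks usually appear, or over-segmentation causes information to be lost.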

These problems can be reduced by dividing the image into blocks and applying OTSU segmentation to each block, but this produces an undesirable "checkerboard" effect. To avoid it, the following improved local threshold algorithm can be used:

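Traverse every pixel in the image, compute the gray-level statistics within that pixel's neighborhood, calculate the OTSU threshold from them, and apply the threshold to that pixel only.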

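This gives smooth pixel-to-pixel transitions together with good segmentation, and avoids the "checkerboard" effect. Because only one row or one column of the neighborhood changes when the pixel moves, the histogram obtained at the previous position can be updated with the new data at each step instead of being recomputed from scratch, which greatly reduces the computation and keeps it within an acceptable range.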
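
The sliding-window idea can be sketched like this. The window `radius`, the clipping of the window at the image border, the dark-foreground polarity, and the helper `otsu_threshold` are all assumptions for this example; the article's threshold adjustment for thin strokes is not included because its formula is not reproduced in the text.

```python
def otsu_threshold(hist):
    """Single-pass Otsu threshold from a 256-bin histogram."""
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    n1 = sum1 = 0
    best_t, best_score = 0, -1.0
    for t in range(1, 256):
        n1 += hist[t - 1]
        sum1 += (t - 1) * hist[t - 1]
        n2 = total - n1
        if n1 == 0:
            continue
        if n2 == 0:
            break
        score = n1 * n2 * (sum1 / n1 - (total_sum - sum1) / n2) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def local_otsu(gray, radius=7):
    """Threshold each pixel with the Otsu threshold of its own neighborhood."""
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        # Build the histogram once per row, for the window at x = 0.
        hist = [0] * 256
        for yy in range(y0, y1):
            for xx in range(min(width, radius + 1)):
                hist[gray[yy][xx]] += 1
        for x in range(width):
            if x > 0:
                # Slide right: drop the column that left, add the new one.
                leaving, entering = x - radius - 1, x + radius
                for yy in range(y0, y1):
                    if leaving >= 0:
                        hist[gray[yy][leaving]] -= 1
                    if entering < width:
                        hist[gray[yy][entering]] += 1
            t = otsu_threshold(hist)
            # Dark ink on light paper is assumed: below T means foreground.
            out[y][x] = 1 if gray[y][x] < t else 0
    return out
```

In this sketch a perfectly uniform neighborhood yields a threshold of 0, so flat regions come out as background.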

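To prevent the black blocks that noise produces in some regions, third-order smoothing can be applied before local thresholding; the effect is quite significant.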
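
The article does not say which smoothing filter was used. The sketch below reads "third-order" as a 3x3 neighborhood and uses a plain mean filter, which is an assumption, as is the function name `mean_smooth_3x3`.

```python
def mean_smooth_3x3(gray):
    """Replace each pixel with the integer mean of its 3x3 neighborhood.

    At the image border the neighborhood is clipped, so fewer pixels
    are averaged there.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = count = 0
            for yy in range(max(0, y - 1), min(height, y + 2)):
                for xx in range(max(0, x - 1), min(width, x + 2)):
                    total += gray[yy][xx]
                    count += 1
            out[y][x] = total // count
    return out
```

Smoothing spreads an isolated noisy pixel over its neighbors, so it is less likely to dominate the small local histograms used for thresholding.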

Results

Original image

split_3

Image processed by global threshold

split_4

Image processed by the improved local threshold

split_5

Summary

As the figures above show, the improved local threshold gives a clearly better result, but some shortcomings remain.

The strokes in the processed image are thicker, which tends to fill in the holes in digits, especially 4, 6, 8, and 9, which contain small holes. This still needs further improvement.

In follow-up articles I will also summarize tilt correction, digit extraction, feature extraction, and some experience and improved algorithms related to neural networks.

Finally, here are two screenshots of the hnr project's interface.

before
after


Personal homepage: TBOOX open source project
Original source: http://www.tboox.org/cn/2016/07/28/hnr-split-image/

Origin blog.csdn.net/waruqi/article/details/53201637