Digit extraction in a handwritten digit recognition system

Introduction

Digit segmentation is the process of extracting each individual digit region from an image after binarization. It is an indispensable step in digit recognition: only when the digits have been extracted accurately can they be recognized one by one.

Digit segmentation methods

There are quite a few ways to segment digits. The main ones are:

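Histogram-based segmentation

This approach counts the black pixels in each row and each column to build a row histogram and a column histogram, then uses a threshold selection method to split the image along rows and columns.

It is simple and fast, and it works quite well for digits laid out on a regular grid. It cannot segment irregularly distributed digits, however, so its use is limited.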
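As a rough illustration of the row and column histograms, here is a minimal Python sketch. It assumes the image is a list of rows with 1 for black and 0 for white, and it uses a fixed `threshold` rather than any particular threshold selection method; the names `runs` and `histogram_segment` are made up for this example.

```python
def runs(profile, threshold=0):
    """Return (start, end) spans where profile[i] > threshold (end exclusive)."""
    spans, start = [], None
    for i, count in enumerate(profile):
        if count > threshold and start is None:
            start = i
        elif count <= threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def histogram_segment(image, threshold=0):
    """Split a binary image (1 = black) into (top, bottom, left, right) boxes."""
    boxes = []
    row_hist = [sum(row) for row in image]           # black pixels per row
    for top, bottom in runs(row_hist, threshold):
        band = image[top:bottom]
        col_hist = [sum(col) for col in zip(*band)]  # black pixels per column
        for left, right in runs(col_hist, threshold):
            boxes.append((top, bottom, left, right))
    return boxes


grid = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
boxes = histogram_segment(grid)
print(boxes)
```

The sketch only works when blank rows and columns separate the digits, which is exactly the regular-grid limitation described above.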

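Clustering-based segmentation

Clustering partitions a data set into groups so that objects in the same group are highly similar while objects in different groups are not. The pixels of a digit image fit this description well, so clustering can be used to segment digits.

The main methods are distance-matrix-based clustering, k-means clustering, and fuzzy C-means clustering.

These methods place no restriction on the position or size of the digits, which makes them well suited to digits of irregular position and size. They have clear drawbacks, though. Distance-matrix-based clustering is too expensive to compute, especially on large images, and the matrix takes up a great deal of memory, so it is impractical.

Dynamic clustering methods such as k-means and fuzzy C-means solve those problems, but they are extremely sensitive to the choice of initial centers. Various algorithms exist for optimizing that choice, yet the number of clusters must still be specified by hand, and this restriction ultimately keeps these algorithms out of practical digit segmentation.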
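To make the cluster-count limitation concrete, here is a small plain-Python k-means sketch over black-pixel coordinates. It is only an illustration: the function name `kmeans`, the explicit `centers` argument, and the iteration cap are assumptions for this example, not part of the original system.

```python
def kmeans(points, centers, iterations=20):
    """Plain k-means on 2-D points; k is fixed by len(centers)."""
    centers = [tuple(map(float, c)) for c in centers]
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest center (squared Euclidean distance).
        assignment = [
            min(range(len(centers)),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return assignment, centers


# Two well-separated blobs of black pixels, given as (row, col).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 10), (0, 11), (1, 10), (1, 11)]
labels, centers = kmeans(points, [(0, 0), (0, 11)])  # k = 2, chosen by hand
merged, _ = kmeans(points, [(0, 0)])                 # k = 1: both blobs collapse
print(labels, centers, merged)
```

With the right `k` the two blobs separate cleanly; with the wrong `k` they are merged, and nothing in the algorithm itself can tell how many digits the image contains.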

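Segmentation based on connected region labeling of binary images

Connected region labeling of a binary image assigns the same label to all target pixels that satisfy a given connectivity rule.

This makes it a natural fit for digit segmentation, because each digit is itself a connected region. The method is not limited by a cluster count, works on irregularly distributed digits, and is simple and fast to implement, so it is a very good segmentation method.

This article uses this method to segment the digit regions of the image, with some improvements.

The existing connected region labeling methods for binary images fall mainly into the following categories: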

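  • Two-scan method
In the first scan, temporary labels are stored in a two-dimensional array of the same size as the image, and equivalence pairs are recorded.

When the scan ends, the equivalent labels are merged using some search method. In the second scan, every pixel whose label belongs to an equivalence set is assigned the smallest label in that set.
  • Two-way repeated scanning method
In the first scan, each target pixel is given a unique label.

The label image is then scanned repeatedly, forward and backward, propagating the smallest label within each pixel's neighborhood until no label changes.
  • Region growing method
Each pixel of the binary image is scanned in turn. When an unlabeled target pixel is found, it is pushed onto a stack and its neighborhood is labeled repeatedly, starting from that pixel, until the stack is empty.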
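For comparison, the region growing method is the shortest of the three to write down. The sketch below is a minimal Python version, assuming 8-connectivity and an image given as a list of rows with 1 for a target (black) pixel; the function name is made up for this example.

```python
def region_grow_label(image):
    """Label 8-connected regions of a binary image (1 = target pixel).

    Returns (labels, count); label 0 means background or not yet labeled.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or labels[y][x] != 0:
                continue
            count += 1                 # found an unlabeled target pixel
            labels[y][x] = count
            stack = [(y, x)]
            while stack:               # grow until the stack is empty
                cy, cx = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return labels, count


image = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1],
]
labels, count = region_grow_label(image)
print(count)
```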

Digit segmentation is only one small step in digit recognition, so it should not take up much processing time. I therefore use the two-scan method, which is relatively fast.

In the two-scan method, merging equivalent labels is the critical step, so the merging speed matters a great deal. I mainly use the following improved region labeling method:

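1. Build a hash table keyed by label; put plainly, it is just an array. Each element points to a doubly linked list that stores the coordinates of the pixels belonging to the same class.
2. Build a label matrix of the same size as the image to store the class label of each pixel.
3. Initialize the label matrix and label the black pixels in the first row and first column of the image, with the labels increasing in sequence, and save each pixel position in the hash table entry for its label.
4. Traverse the remaining rows and columns in order. If the current pixel is black, assign it the smallest label found among its left, upper-left, upper, and upper-right neighbors that are black, and merge those black neighbors into that smallest label. To merge, take the pixel list that each black neighbor's original label points to in the hash table, combine it with the list of the smallest label, and update the label values. If none of the four neighbors is black, the current pixel belongs to a new class, so create a new label and its list.
5. If the current pixel is white, give it the invalid label and continue with step 4.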
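The steps above can be sketched roughly as follows. This is not the original implementation: it uses a Python dict of lists in place of the array of doubly linked lists, and it handles the first row and column with bounds checks instead of a separate initialization pass. The names `INVALID` and `label_regions` are made up for this example, and the image is assumed to be a list of rows with 1 for black.

```python
INVALID = -1  # label for white (background) pixels


def label_regions(image):
    """Raster scan that merges equivalent labels as soon as they meet.

    Returns (marks, table): marks is the label matrix, and table maps each
    surviving label to the list of (row, col) pixels in that region.
    """
    height, width = len(image), len(image[0])
    marks = [[INVALID] * width for _ in range(height)]
    table = {}       # label -> pixel list (stand-in for the hash table)
    next_label = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue  # white pixel keeps the INVALID label
            # Labels of the black neighbors: left, upper-left, upper, upper-right.
            found = set()
            for ny, nx in ((y, x - 1), (y - 1, x - 1), (y - 1, x), (y - 1, x + 1)):
                if 0 <= ny and 0 <= nx < width and marks[ny][nx] != INVALID:
                    found.add(marks[ny][nx])
            if not found:
                label = next_label        # no black neighbor: new class
                next_label += 1
                table[label] = []
            else:
                label = min(found)
                for other in found - {label}:
                    # Relabel the absorbed region, then append its pixel list.
                    for py, px in table[other]:
                        marks[py][px] = label
                    table[label].extend(table.pop(other))
            marks[y][x] = label
            table[label].append((y, x))
    return marks, table


image = [
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1],
]
marks, table = label_regions(image)
print(sorted(table), [len(table[k]) for k in sorted(table)])
```

Because each merge relabels the absorbed pixels immediately, the label matrix is already final when the scan ends, and each entry of `table` holds exactly the pixels of one region, ready to be cropped out as a digit image.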

Filtering invalid regions

Images almost always contain some noise and invalid blocks, so some denoising and filtering of non-digit regions is usually needed before segmentation to get good results.

Because digit strokes are thin, a median filter is not a suitable way to remove salt-and-pepper noise here, since it tends to erode the strokes along with the noise.

Instead I use the simplest approach, discrete-point denoising, which removes only isolated noise points. For larger invalid regions, I compute the average height and width of the sample regions, set suitable thresholds, and filter out, one by one, the regions whose height or width clearly does not match that of a digit.

As the experimental results show, this works quite well.
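A minimal sketch of this kind of filtering is shown below. The thresholds `min_pixels`, `low`, and `high` are hypothetical values chosen for the example, not the ones used in the original system, and the width check has no lower bound because a digit such as 1 is legitimately narrow. Each region is assumed to be a list of (row, col) pixels.

```python
def block(top, left, height, width):
    """Helper for the example: a filled rectangle of (row, col) pixels."""
    return [(y, x) for y in range(top, top + height)
            for x in range(left, left + width)]


def filter_regions(regions, min_pixels=3, low=0.5, high=2.0):
    """Drop isolated specks, then drop regions far from the average size."""
    # Step 1: remove discrete noise points (regions with very few pixels).
    kept = [r for r in regions if len(r) >= min_pixels]
    if not kept:
        return []
    # Step 2: measure the bounding-box height and width of each region.
    sizes = []
    for r in kept:
        rows = [p[0] for p in r]
        cols = [p[1] for p in r]
        sizes.append((max(rows) - min(rows) + 1, max(cols) - min(cols) + 1))
    mean_h = sum(h for h, _ in sizes) / len(sizes)
    mean_w = sum(w for _, w in sizes) / len(sizes)
    # Step 3: keep regions whose size is plausible for a digit.
    return [r for r, (h, w) in zip(kept, sizes)
            if low * mean_h <= h <= high * mean_h and w <= high * mean_w]


digit_a = block(0, 0, 10, 6)
digit_b = block(1, 10, 9, 5)
digit_one = block(0, 20, 11, 2)   # tall and narrow, like the digit 1
speck = [(5, 50)]                 # isolated noise point
rule = block(30, 0, 2, 40)        # long horizontal line, not a digit
result = filter_regions([digit_a, digit_b, speck, digit_one, rule])
print(len(result))
```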

Experimental results

extract_digital_1
extract_digital_2

However, because labeling proceeds row by row, the labeling order is very sensitive to the height of each region. If the heights differ even slightly, the original order is disrupted after segmentation (especially for images with a fairly regular layout). To output the digits in order, an extra sorting pass is needed late in the processing. The steps are:

  1. Convert the region images, originally hashed by label, into a matrix indexed by the center of each region image. The resulting matrix is sparse: only the point at a region's center points to the corresponding digit image.
  2. Split the matrix into rows using a threshold. Because the points in the matrix are sparse, the threshold is easy to choose and the split works well without much tuning.
  3. Sort the region images in each row by the x-coordinate of their centers to restore the original digit layout. The result is shown below:

extract_digital_3
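The sorting pass can be sketched as follows. This simplified version works on a list of bounding boxes sorted by center instead of building the sparse center matrix, and `row_gap` is a hypothetical threshold: a box starts a new row when its center lies more than `row_gap` below the previous center.

```python
def reading_order(boxes, row_gap):
    """Order (top, left, bottom, right) boxes row by row, left to right."""
    def center(box):
        return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

    # Split into rows: walk the boxes by center y and cut at large gaps.
    rows, prev_cy = [], None
    for box in sorted(boxes, key=lambda b: center(b)[0]):
        cy = center(box)[0]
        if prev_cy is None or cy - prev_cy > row_gap:
            rows.append([])
        rows[-1].append(box)
        prev_cy = cy
    # Within each row, sort by the x-coordinate of the center.
    return [box for row in rows
            for box in sorted(row, key=lambda b: center(b)[1])]


a = (1, 0, 11, 6)
b = (0, 10, 10, 16)
c = (2, 20, 12, 26)
d = (21, 0, 31, 6)
e = (20, 10, 30, 16)
# Order produced by row-by-row labeling: whichever region starts highest comes first.
scanned = [b, a, c, e, d]
ordered = reading_order(scanned, row_gap=8)
print(ordered)
```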

Another major advantage of region labeling is that it also handles images with extremely irregular digit distribution quite well, including digits that are intertwined and otherwise hard to separate. The results are shown below:

extract_digital_4

Summary

Although segmentation with region labeling gives quite satisfactory results, it still has shortcomings. For example, it can only segment digits with good connectivity; digits with broken strokes are not segmented well. Such digits need an extra stroke-repair step before segmentation, and that repair is fairly complicated and of limited effectiveness.

A truly general-purpose digit segmentation algorithm therefore needs further research.

References

If you want to learn more about digital image processing, you can take a look at my bookcase. Digital Image Processing by Gonzalez remains a classic.


Personal homepage: TBOOX open source project
Original source: http://www.tboox.org/cn/2016/07/30/hnr-extract-digital/

Origin blog.csdn.net/waruqi/article/details/53201558