Adaptive threshold method (graphic summary)

concept:

There are many formulas for this method online that look simple but are hard to understand. This article starts from simple examples and builds up to those formulas step by step, without implementing code: once you understand the idea behind the algorithm, the implementation can take many forms, and you can even use a CV library and write no code at all.

Why did this binarization algorithm appear, and what is the fatal flaw of Otsu's method?

Otsu's method (OTSU) derives a single best global threshold from the global gray-level statistics. In scenes where the image is unevenly illuminated, it easily misclassifies slightly blurred or darker target regions as background. To avoid this, we can consider a finer-grained local adaptive threshold: as we sweep across the image, the local threshold is continuously recalculated and updated according to the brightness of each region, "refreshing" as it goes until the entire target image has been swept. Different regions of the image thus adapt to different thresholds, which is why it is called the locally adaptive threshold method.
As shown in the figure below:
[figure omitted]
I use a white line to split the image down the middle. Suppose we binarize with a global threshold: part of the text in the lower-left corner is clearly lost. So how do we humans know that the dark part also needs to be recognized, and how do we teach that to the machine?

case:

Think about it: as a book page shifts from very bright to very dark, we adapt to the change and can still tell exactly which parts should be recognized and which should not. If we give the machine the same ability, it too can gradually learn to recognize text in the "dark" places.
Think again: the key to binarization is the threshold. What can we do so that the threshold is not a fixed global value, but adjusts itself according to different parts of the image?

The moving average, a formula learned back in middle or high school, comes in handy here. First, divide the image into N small blocks:
[figure omitted]
Then, based on each small cell of the initial division, subdivide once more, giving the image in the upper-left corner of the figure below:
[figure omitted]
Suppose this gives us the four sub-areas in the upper-left corner, and the average gray values of the M pixels in each sub-area are u1, u2, u3, u4. Then the average gray value of the red block these four sub-areas make up is (uppercase) U1 = (u1 + u2 + u3 + u4) / 4, which gives a reasonably accurate average gray threshold for one cell. Following the same method, we push forward a few more cells.

⚠️ Note: many examples online work pixel by pixel, but in real projects an image may be hundreds of megabytes, several gigabytes, or more, and per-pixel processing is very inefficient. So the example here uses the averages of subdivided areas; the core idea is the same. The example uses a 2×2 grid, but in practice you can also choose 5×5, 6×6, 9×9, and so on, as the situation requires.
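As a concrete illustration of block averaging, here is a minimal NumPy sketch (the function name, the toy 4×4 image, and the even-division trimming are my own assumptions for the example, not from the article):

```python
import numpy as np

def block_means(img, block=2):
    """Split a grayscale image into block x block cells and return
    the mean gray value of each cell (a coarse summary of the image)."""
    h, w = img.shape
    bh, bw = h // block, w // block
    # trim so the image divides evenly, then average each cell
    trimmed = img[:bh * block, :bw * block].astype(np.float64)
    cells = trimmed.reshape(block, bh, block, bw)
    return cells.mean(axis=(1, 3))  # shape: (block, block)

# a toy 4x4 "image": bright on the left, dark on the right
img = np.array([[200, 200,  40,  40],
                [200, 200,  40,  40],
                [220, 220,  20,  20],
                [220, 220,  20,  20]], dtype=np.uint8)
print(block_means(img, block=2))  # left cells ~200/220, right cells ~40/20
```

Each cell mean here plays the role of one lowercase u; averaging a group of neighboring cells gives the uppercase U of a larger block.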

[figure omitted]

Now a floating global threshold UT is continuously updated according to the moving average formula:

UT = (U1 + U2 + U3 + U4 + U5) / 5
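The update above can be sketched as a running mean over the block averages seen so far (the function name and the sample block values are mine, purely illustrative):

```python
def running_threshold(block_mean_seq):
    """Update a floating threshold UT as each block mean Ui arrives:
    after k blocks, UT = (U1 + ... + Uk) / k, the moving average from the text."""
    total = 0.0
    thresholds = []
    for k, u in enumerate(block_mean_seq, start=1):
        total += u
        thresholds.append(total / k)
    return thresholds

# three bright blocks, then a dark one: the threshold is "pulled down" gradually
print(running_threshold([200, 210, 190, 60]))  # [200.0, 205.0, 200.0, 165.0]
```

In practice a windowed or exponential moving average (forgetting old blocks) reacts faster to lighting changes than this all-history mean, which is exactly the problem the article raises next.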

This threshold can "recognize" the text in the dark quite well. Why? The average brightness of U1, U2, and U3 is very high, but by the time the sweep reaches the U4 area the threshold has already been "averaged down", i.e. the brightness used to discriminate has been lowered. In the U6 and U7 areas it likewise quickly "adapts" to the change in light and accurately "reads" the target text for binarization. Question: what if the transition area at U4 cannot lower the threshold quickly enough?
That is easy to handle.

Method 1: segment the image more finely. The sweep becomes slower, but the threshold gets a long enough transition area to come down.

[figure omitted]

Method 2: take a percentage of the previous area's threshold, i.e. only keep a certain percent of the last "blackness". The attitude is: "I will learn from my predecessor's experience, but my area has its own conditions, so I will not follow someone else's threshold 100%." This percentage is found by trying several values and keeping the one with the best segmentation result.

For example:
let the gray values of the pixels in area i be [p1, p2, p3, ..., pn], let Ui be the average threshold of the area (the average gray value of the s pixels before pi), and let t be the percentage discount applied to it. The threshold actually used for the current area is then:

Ui+ = Ui × (100 − t) / 100
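As a one-line sketch of this discount (the default t = 15 is a value commonly used with this trick in the literature, not one the article specifies):

```python
def adjusted_threshold(u, t=15):
    """Scale the local mean U down by t percent: Ui+ = Ui * (100 - t) / 100.
    t is tuned by trial and error; 15 is an assumed default, not from the article."""
    return u * (100 - t) / 100

print(adjusted_threshold(200, t=15))  # 170.0
```

A pixel darker than this reduced threshold is treated as ink; raising t makes the classifier stricter about what counts as black.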

With the above foundation, how do we use this continuously changing threshold to binarize our target image?

Many examples online use a dark background with a light foreground; the case here is black text on a white background, so the target is the dark pixels. With the same definitions as above (pixels [p1, p2, p3, ..., pn] in area i, local average Ui, percentage t, and Ui+ = Ui × (100 − t) / 100), the decision for each pixel p is:

p < Ui+ ?

Multiplying by (100 − t) / 100 yields a slightly smaller gray segmentation value; the 100 here can be tuned rather than hard-coded. If the pixel falls below this reduced threshold, it is itself close to black:
if yes, output 0 (black);
if not, output 255 (white).
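To make the whole decision rule concrete, here is a minimal per-pixel sketch in the Wellner moving-average style (the window size s, the exponential approximation of the s-pixel average, the scan order, and the toy input are all my assumptions; the article itself recommends area means rather than single pixels for large images):

```python
import numpy as np

def moving_average_binarize(img, s=None, t=15):
    """Binarize black-on-white text with a per-pixel moving average.
    Each pixel is compared against (100 - t)% of the running mean of
    roughly the previous s pixels in scan order."""
    h, w = img.shape
    if s is None:
        s = max(1, w // 8)          # window width: an assumed heuristic
    flat = img.astype(np.float64).ravel()
    out = np.empty_like(flat)
    avg = flat[0]
    for i, p in enumerate(flat):
        avg += (p - avg) / s        # exponential stand-in for the s-pixel average
        # p < Ui+ ?  ->  0 (black) if yes, 255 (white) if not
        out[i] = 0 if p < avg * (100 - t) / 100 else 255
    return out.reshape(h, w).astype(np.uint8)

# toy strip: dark "ink" pixels inside a bright background
strip = np.array([[200, 200, 50, 200, 200, 60, 200, 200]], dtype=np.uint8)
print(moving_average_binarize(strip, s=4, t=15))  # dark pixels -> 0, bright -> 255
```

Because the running mean drifts down when it crosses dark regions, text in the shadowed half of a page is still separated from its (locally darker) background, which is the whole point of the method.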

That is my summary of the adaptive threshold method. The figures are all my own edits. If there are any mistakes, please point them out, and let's learn from each other.


Origin blog.csdn.net/whiteBearClimb/article/details/123872499