Wellner adaptive threshold binarization algorithm

Reference documents:
    Adaptive Thresholding for the DigitalDesk.pdf
    Adaptive Thresholding Using the Integral Image.pdf


1. The origin of the problem

      A fact of life: when a camera is pointed at a sheet of paper with black print on a white background, the image it captures is not a true black-and-white image. No matter what angle it is taken from, the image is actually grayscale or color. Unless the lighting is carefully controlled, the camera's image of a page lying on a desk is not faithful to the original. Unlike the inside of a scanner or printer, the illumination on a desktop is very hard to control: the open space may be affected by desk lamps, overhead lights, windows, moving shadows and so on. The human visual system compensates for all of this automatically, but a machine that does not take these factors into account will produce poor results.

     The problem is especially acute with high-contrast line art or text, because such originals really are pure black and white, while the camera delivers a grayscale image with many intermediate levels. Many applications must know exactly which parts of the image are pure black or pure white, for example in order to pass the text to OCR software for recognition. These systems cannot work on grayscale images (typically 8 bits per pixel), so the images must first be converted to black and white. There are many ways to do this. In some cases dithering techniques are used so that the result still looks like a grayscale image when viewed by a person. But for machine processing, such as text recognition, selective copy operations, or compositing several images, a dithered image is useless; the system needs clean lines, text, or relatively large solid blocks of black and white. The process of producing such a black-and-white image from a grayscale one is usually called thresholding.

     There are many ways to threshold an image, but the basic process is to examine each grayscale pixel and decide whether it should be white or black. This article first describes the algorithms that have been developed for thresholding and then proposes a more suitable one. That algorithm (called the fast adaptive thresholding method here) may not be the best possible, but it handles the problems described here fairly well.

2. The global threshold method

       In a sense, thresholding is an extreme form of contrast enhancement: it makes bright pixels brighter and dark pixels darker. The simplest (and most commonly used) method is to set every pixel below some threshold to black and every other pixel to white. The question then is how to choose this threshold. One possibility is to pick the middle of the range of possible values, so for an 8-bit image (values 0 to 255) the threshold would be 128. This works for images whose black pixels really lie below 128 and whose white pixels lie above it, but if the image is over- or under-exposed the result may come out entirely white or entirely black. It is therefore better to use the actual range of values in the image rather than the possible range: first find the maximum and minimum of all pixels, then take their midpoint as the threshold. An even better choice of threshold looks not only at the actual range of the image but also at its distribution. For example, if the image should end up looking like a black line drawing, or black text on white paper, then most of its pixels should be background and only a few should be black. A histogram of such an image might look like Figure 1.
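       A minimal sketch of the min/max midpoint variant just described (my own C# code, not from the paper; it assumes the 8-bit grayscale image is supplied as a flat byte array):

public static void MidpointThreshold(byte[] gray)
{
    // find the actual value range of the image
    byte min = 255, max = 0;
    foreach (byte v in gray)
    {
        if (v < min) min = v;
        if (v > max) max = v;
    }
    // midpoint of the real range, rather than a fixed 128
    int threshold = (min + max) / 2;
    for (int i = 0; i < gray.Length; i++)
        gray[i] = (byte)(gray[i] < threshold ? 0 : 255);
}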

 

                                      

                                Figure 1

       In the image above there is a large peak for the background color and a smaller peak for the black ink. The whole curve may shift left or right with the ambient light, but in any case the ideal threshold lies in the valley between the two peaks. That is fine in theory, but how well does it work in practice?

        

                                   Figure 2

      Figure 2 and its histogram show that the technique can work well. The smoothed histogram shows two clear peaks, and it is not difficult to compute a near-ideal threshold by fitting the histogram curve or simply taking the mean of the two peak positions. This is not a typical image, however, because it contains a lot of both black and white pixels. The algorithm must also be able to threshold images like Figure 3. In its histogram the smaller black peak is buried in noise, so a minimum between the two peaks cannot be found reliably.

           

                                                                          Figure 3

       In any case, a large (background) peak is always present and easy to find, so a useful thresholding strategy can be described as follows:

  1) Calculate the histogram.

     2) Smooth the histogram data over a certain radius and find the maximum of the smoothed data. The purpose of the smoothing is to reduce the effect of noise on the position of the maximum, as in Figures 2 and 3.

     3) Choose the threshold at a certain proportion of the distance between that peak and the minimum gray value actually present in the image (ignoring gray levels whose histogram count is 0).

      Experiments show that half of this distance produces reasonably good results over a wide range of images, from very bright to almost completely dark ones. In Figure 3, for example, the peak is at 215 and the minimum is 75, so a threshold of 145 is used. Figure 4 shows four images captured under different lighting conditions and the result of thresholding them with the histogram-based technique above. Although the images cover a wide range of illumination (as their histograms show), the algorithm chooses an appropriate threshold in each case, and the thresholded images are essentially identical.

                      

                          

                      

                     

                                Figure 4
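       As a concrete illustration of the histogram-based strategy above, here is a minimal C# sketch (my own code, not the paper's; it assumes an 8-bit grayscale image as a flat byte array and returns the chosen global threshold):

public static byte HistogramThreshold(byte[] gray, int smoothRadius = 4)
{
    int[] hist = new int[256];
    foreach (byte v in gray) hist[v]++;

    // 1) box-smooth the histogram to suppress noise before looking for the peak
    double[] smooth = new double[256];
    for (int i = 0; i < 256; i++)
    {
        int lo = Math.Max(0, i - smoothRadius), hi = Math.Min(255, i + smoothRadius);
        int sum = 0;
        for (int j = lo; j <= hi; j++) sum += hist[j];
        smooth[i] = (double)sum / (hi - lo + 1);
    }

    // 2) gray level of the (background) peak of the smoothed histogram
    int peak = 0;
    for (int i = 1; i < 256; i++)
        if (smooth[i] > smooth[peak]) peak = i;

    // 3) darkest gray level that actually occurs (skip empty histogram entries)
    int darkest = 0;
    while (darkest < 255 && hist[darkest] == 0) darkest++;

    // threshold halfway between the darkest value and the peak (e.g. 75 and 215 give 145)
    return (byte)(darkest + (peak - darkest) / 2);
}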

       This histogram-based global thresholding technique works well when the illumination is uniform, or varies only slightly as in the images above. But it cannot produce satisfactory results under normal office lighting. Because a single threshold is used for the whole image, some regions come out too white and others too dark, so most of the text becomes unreadable, as shown in Figure 5.

             

                                  Figure 5

       Producing good binarized images from unevenly lit pages requires an adaptive thresholding algorithm, one that varies the threshold according to the background brightness around each pixel. The discussion that follows uses Figure 5 to illustrate the effect of each new algorithm. It is a challenging test image: the illumination falls off towards the edges, and it contains black text on a white background (the large word "PaperWorks"), white text on a black background ("XEROX"), and gray text on a white background ("The best way..."), as well as various shadings and a thin horizontal black line under the word "PaperWorks".

3. Adaptive thresholding

       An ideal adaptive thresholding algorithm should produce, from an unevenly illuminated image, the same result that the global algorithm above produces from a uniformly illuminated one. To compensate for more or less illumination, the brightness of each pixel needs to be normalized before deciding whether it is black or white. The question is how to determine the background brightness at each point. One simple approach is to capture a blank page before capturing the image to be binarized and use it as a reference: for each pixel to be processed, the corresponding pixel of the reference image is subtracted from it first.

       This works very well as long as the lighting does not change at all between the reference image and the image actually being processed. However, the lighting can be affected by shadows of people, desk lamps, or other moving objects, and if the room has windows it also changes over time. One solution would be to photograph a blank page at the same place and the same moment as the page of interest, but that is about as inconvenient as using a scanner.

       Another approach is to estimate the background brightness at each pixel by making some assumptions about what the image ought to look like. For example, we can assume that the image is mostly background (i.e. white) and that black makes up only a small fraction of it, and that the background illumination changes relatively slowly. Many algorithms are feasible under these assumptions. Since there is no mathematical theory of adaptive thresholding, there is no standard or optimal way of doing it; instead, some ad hoc methods are simply used more than others. Because the methods are ad hoc, it is useful to be able to measure how well they perform. To this end, Haralick and Shapiro suggest the following criteria: regions should be uniform in gray level; their interiors should be simple, without many small holes; adjacent regions should have significantly different values; and the boundary of each region should be simple, not ragged, and spatially accurate.

       According to Pratt, no quantitative performance metric has been proposed for image binarization. The main way of evaluating an algorithm seems to be simply to look at the result and judge whether it is good. For text images there is one feasible quantitative method: process images taken under different lighting conditions with the different binarization algorithms, feed the results to an OCR system, and compare the recognized text with the original. Useful as that may be, it is not applied to the algorithms described below, where the criterion is simply whether the result looks good. For some interactive applications, such as copy-and-paste operations, the user has to wait for the binarization to finish, so speed is another important criterion. The following sections present the results produced by different adaptive thresholding algorithms.

4. Adaptive thresholding based on the Wall algorithm

       The algorithm developed by R. J. Wall for computing a threshold dynamically from the background brightness is described in Castleman, K., Digital Image Processing, Prentice-Hall Signal Processing Series, 1979. The description below essentially follows that book. First, the image is divided into smaller blocks and a histogram is computed for each block separately. A threshold is then derived for each block from the peak of its histogram. Finally, the threshold for each pixel is obtained by interpolating between the thresholds of the neighbouring blocks. Figure 6 shows the result of processing Figure 5 with this algorithm.

                                                        

                                       Figure 6

     This image was divided into 9 blocks (3×3), and the threshold of each block was chosen 20% below its histogram peak. The result is better than a global threshold, but the method is computationally expensive and slow. Another problem is that for some images the local histograms can be fooled by a large number of black or white pixels, so the threshold does not vary smoothly across the image and the result can be very poor, as in Figure 7. (A rough sketch of this block-and-interpolate scheme is given after the figure.)

            

                                Figure 7 
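      For reference, a rough sketch of the block-and-interpolate scheme as I read it (my own code, not Wall's or Castleman's; 3×3 blocks, per-block threshold 20% below the block's histogram peak, bilinear interpolation between block centres; the image is assumed to be a flat byte array in row-major order):

public static void WallThreshold(byte[] gray, int width, int height, int blocks = 3)
{
    // one threshold per block, 20% below the peak of the block's histogram
    double[,] thr = new double[blocks, blocks];
    for (int by = 0; by < blocks; by++)
        for (int bx = 0; bx < blocks; bx++)
        {
            int[] hist = new int[256];
            int x0 = bx * width / blocks, x1 = (bx + 1) * width / blocks;
            int y0 = by * height / blocks, y1 = (by + 1) * height / blocks;
            for (int y = y0; y < y1; y++)
                for (int x = x0; x < x1; x++)
                    hist[gray[y * width + x]]++;
            int peak = 0;
            for (int i = 1; i < 256; i++) if (hist[i] > hist[peak]) peak = i;
            thr[by, bx] = peak * 0.8;
        }

    // per-pixel threshold by bilinear interpolation between the block centres
    double blockW = (double)width / blocks, blockH = (double)height / blocks;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            double fx = x / blockW - 0.5, fy = y / blockH - 0.5;
            int bx = (int)Math.Floor(fx), by = (int)Math.Floor(fy);
            double tx = fx - bx, ty = fy - by;
            int bx0 = Math.Min(Math.Max(bx, 0), blocks - 1), bx1 = Math.Min(Math.Max(bx + 1, 0), blocks - 1);
            int by0 = Math.Min(Math.Max(by, 0), blocks - 1), by1 = Math.Min(Math.Max(by + 1, 0), blocks - 1);
            double t = (1 - ty) * ((1 - tx) * thr[by0, bx0] + tx * thr[by0, bx1])
                     + ty * ((1 - tx) * thr[by1, bx0] + tx * thr[by1, bx1]);
            int i = y * width + x;
            gray[i] = (byte)(gray[i] < t ? 0 : 255);
        }
}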

5. Fast adaptive thresholding

       Most of the algorithms documented in the literature are more complex than Wall's algorithm and therefore need even more running time. It is quite feasible to build a simpler and faster adaptive thresholding algorithm, which is introduced below.

       The basic idea of the algorithm is to run through the image while computing a moving average of the pixels seen. Whenever a pixel is significantly darker than this average it is set to black; otherwise it is set to white. A single pass over the image suffices, and the algorithm is simple enough to implement in hardware. It is interesting to note how similar the algorithm below is to a hardware implementation built by IBM in 1968.

      Let p_n be the pixel at position n in the image. For the moment, assume the image is one long line made by joining all of its rows in order. This causes a small anomaly at the beginning of each row, but the anomaly is smaller than the one that would result from restarting the average from scratch on every row.

        

       Suppose f_s(n) is the sum of the last s pixels at point n:

f_s(n) = \sum_{i=0}^{s-1} p_{n-i}

       The output image T(n) is 1 (black) or 0 (white) depending on whether the pixel is t percent darker than the average of the previous s pixels:

T(n) = \begin{cases} 1 & \text{if } p_n < \dfrac{f_s(n)}{s} \cdot \dfrac{100 - t}{100} \\ 0 & \text{otherwise} \end{cases}

       Using 1/8 of the image width for s and a value of 15 for t seems to work well for a variety of images. For example, with t = 15 a pixel is marked black only if it is at least 15% darker than the running average: if the average is 200, the pixel value must be below 170. Figure 8 shows the result of scanning the rows from left to right with this algorithm.

           

            Figure 8 Figure 9  

       Figure 9 shows the result of processing the same image from right to left with the same algorithm. Note that the small text at the far left is now incomplete, there are more holes in the word "PaperWorks", and the black border at the far right is much narrower. This is mainly because the background illumination of the image becomes gradually darker from left to right.

       Another question is how to start the algorithm, that is, how to choose g(0). Here g_s(n) denotes the running value the algorithm actually maintains as an approximation of f_s(n), e.g. updated as g_s(n) = g_s(n-1)(1 - 1/s) + p_n, which avoids having to store the last s pixels. One possibility for g(0) is s*p_0, but because of edge effects p_0 is not a typical value; another is 127*s (based on the mid-value of an 8-bit image). Either way, the choice affects only a small fraction of the values of g, because when g_s(n) is computed the weight of g(0) is

\left(1 - \frac{1}{s}\right)^n

      So for s = 10, g(0) contributes less than 10% of g_10(n) for any n > 6 and less than 1% for n > 22. For s = 100, the contribution of g(0) falls below 10% after 8 pixels and below 1% after 68 pixels.
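       For reference, here is a minimal sketch of this recursive running average applied to a single row (my own code, not Wellner's; it uses the 127*s initialisation discussed above, and the function name and default parameters are mine):

public static void WellnerRecursiveRow(byte[] row, int s = 16, int t = 15)
{
    double g = 127.0 * s;                    // neutral start value for g(0)
    for (int x = 0; x < row.Length; x++)
    {
        byte p = row[x];
        double mean = g / s;                 // approximate average of the previous s pixels
        byte result = (byte)(p * 100 < mean * (100 - t) ? 0 : 255);
        g = g - g / s + p;                   // fold the current pixel into the running sum
        row[x] = result;                     // overwrite only after p has been used
    }
}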

       It would be better if the mean were not computed from one direction only. Figure 12 shows the effect of computing it differently: instead of averaging pixels on one side, the pixels placed symmetrically on both sides of point n are averaged. In that case f_s(n) is defined as:

f_s(n) = \sum_{i=-s/2}^{s/2} p_{n+i}

      Another alternative is to compute the average alternately, from left to right on one line and from right to left on the next, i.e.:

f_s(n) = \sum_{i=0}^{s-1} p_{n-i} \quad \text{(lines scanned left to right)}

f_s(n) = \sum_{i=0}^{s-1} p_{n+i} \quad \text{(lines scanned right to left)}

       This produces an effect that is not much different from the center average.

        A small modification that gives better results on most images is to keep the running averages of the previous line (which was scanned in the opposite direction to the current one) and to use, as the new average, the mean of the current line's running average and the value kept from the previous line at the same column, i.e. to use:

h_s(n) = \frac{g_s(n) + g_s(n - w)}{2}

where w is the width of the image, so that g_s(n - w) is the running value computed at the same column of the line above; p_n is then compared against h_s(n)/s exactly as before.

       This brings information from the vertical direction into the threshold computation and produces the result shown below:

                 

       Note how much better the characters are segmented. This is also one of the few variants that preserves the thin horizontal line under "PaperWorks".
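       A minimal sketch of this two-row variant (my own code, not Wellner's): rows are scanned in alternating directions, the running average of the row above is remembered per column, and each pixel is compared against the mean of the two:

public static void WellnerTwoRow(byte[] img, int width, int height, int s = 16, int t = 15)
{
    double[] prevAvg = new double[width];
    for (int x = 0; x < width; x++) prevAvg[x] = 127.0;   // neutral values for the first row

    for (int y = 0; y < height; y++)
    {
        double g = 127.0 * s;
        bool leftToRight = (y & 1) == 0;                  // alternate the scan direction per row
        for (int i = 0; i < width; i++)
        {
            int x = leftToRight ? i : width - 1 - i;
            int idx = y * width + x;
            byte p = img[idx];
            g = g - g / s + p;                            // running sum for the current row
            double avg = (g / s + prevAvg[x]) / 2.0;      // combine with the row above
            img[idx] = (byte)(p * 100 < avg * (100 - t) ? 0 : 255);
            prevAvg[x] = g / s;                           // remember for the next row
        }
    }
}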

   (Translator's note: a few passages of the original paper did not read sensibly and have not been translated.)

      To summarize the above: Wellner's adaptive threshold is essentially a one-dimensional smoothing of the pixels over a given radius, after which each original pixel is compared with its smoothed value to decide whether it is black or white. A large part of the paper is devoted to where the sampled pixels should lie: entirely to the left, entirely to the right, symmetrically around the pixel, or also taking the previous row into account. In every case, however, only pixels along the row direction contribute to the smoothing. Later, Derek Bradley and Gerhard Roth proposed in their paper Adaptive Thresholding Using the Integral Image to replace the one-dimensional weighted value with the two-dimensional average over a W×W rectangular window, which removes the direction problem of one-dimensional smoothing altogether.

     For the two-dimensional smoothing they use an O(1) algorithm that is actually very simple: first the integral (summed-area) table of the whole image is computed, and then a second pass obtains the sum of the window centred on each pixel with just a few additions and subtractions of table entries.

    Here is reference code for the original Wellner algorithm, scanning each row from left to right. (I do not think it is a good idea to join all the rows into one long line as the paper suggests; if you did, you would also have to deal with the invalid padding bytes at the end of each scan line, so the code below processes each row independently.)

public static unsafe void WellneradaptiveThreshold1(FastBitmap bmp, int Radius = 5, int Threshold = 15)
{
    if (bmp == null) throw new ArgumentNullException();
    if (bmp.Handle == IntPtr.Zero) throw new ArgumentNullException();
    if (bmp.IsGrayBitmap() == false) throw new ArgumentException("Binaryzation functions can only be applied to 8bpp graymode Image.");
    if (Radius < 0 || Radius > 255) throw new ArgumentOutOfRangeException();
    if (Threshold < 0 || Threshold > 100) throw new ArgumentOutOfRangeException();
    int Width, Height, Stride, X, Y;
    int Sum, InvertThreshold, XX;
    byte* Pointer;
    Width = bmp.Width; Height = bmp.Height; Stride = bmp.Stride; Pointer = bmp.Pointer; InvertThreshold = 100 - Threshold;
    byte* Row = (byte*)Marshal.AllocHGlobal(Width);          // backup buffer for the current row
    for (Y = 0; Y < Height; Y++)
    {
        Pointer = bmp.Pointer + Stride * Y;
        Sum = *Pointer * Radius;                             // seed the running sum with the first pixel of the row
        Win32Api.CopyMemory(Row, Pointer, Width);            // copy the row before its pixels are overwritten
        for (X = 0; X < Width; X++)
        {
            XX = X - Radius;
            if (XX < 0) XX = 0;
            Sum += Row[X] - Row[XX];                         // sliding sum of the last Radius pixels
            // black if the pixel is Threshold percent darker than the local average (Sum / Radius)
            if (Row[X] * 100 * Radius < Sum * InvertThreshold)
                Pointer[X] = 0;
            else
                Pointer[X] = 255;
        }
    }
    Marshal.FreeHGlobal((IntPtr)Row);
}

  

  This is based on my own FastBitmap class; it is easy to adapt to the GDI+ Bitmap class. Note that binarization is normally applied only to grayscale images.

     The row data must be copied first, because the pixel values are changed as the calculation proceeds.

     The code for the two-dimensional (integral-image) version of the Wellner algorithm is also given for reference:

public static unsafe void WellneradaptiveThreshold2(FastBitmap bmp, int Radius = 5, int Threshold = 50)
{
    if (bmp == null) throw new ArgumentNullException();
    if (bmp.Handle == IntPtr.Zero) throw new ArgumentNullException();
    if (bmp.IsGrayBitmap() == false) throw new ArgumentException("Binaryzation functions can only be applied to 8bpp graymode Image.");
    if (Radius < 0 || Radius > 255) throw new ArgumentOutOfRangeException();
    if (Threshold < 0 || Threshold > 100) throw new ArgumentOutOfRangeException();

    int Width, Height, Stride, X, Y;
    int Sum, X1, X2, Y1, Y2, Y2Y1, InvertThreshold;
    byte* Pointer;
    Width = bmp.Width; Height = bmp.Height; Stride = bmp.Stride; Pointer = bmp.Pointer; InvertThreshold = 100 - Threshold;
    int* Integral = (int*)Marshal.AllocHGlobal(Width * Height * 4);   // integral (summed-area) table, 4 bytes per entry
    int* IndexOne, IndexTwo;

    // first pass: build the integral image, Integral[Y, X] = sum of all pixels above and to the left
    for (Y = 0; Y < Height; Y++)
    {
        Sum = 0;
        Pointer = bmp.Pointer + Stride * Y;
        IndexOne = Integral + Width * Y;
        for (X = 0; X < Width; X++)
        {
            Sum += *Pointer;                           // running sum of the current row
            if (Y == 0)
                *IndexOne = Sum;
            else
                *IndexOne = *(IndexOne - Width) + Sum; // add the totals accumulated in the row above
            IndexOne++;
            Pointer++;
        }
    }

    // second pass: threshold each pixel against the mean of its rectangular neighbourhood
    for (Y = 0; Y < Height; Y++)
    {
        Pointer = bmp.Pointer + Stride * Y;
        Y1 = Y - Radius; Y2 = Y + Radius;
        if (Y1 < 0) Y1 = 0;
        if (Y2 >= Height) Y2 = Height - 1;
        IndexOne = Integral + Y1 * Width;
        IndexTwo = Integral + Y2 * Width;
        Y2Y1 = (Y2 - Y1) * 100;
        for (X = 0; X < Width; X++)
        {
            X1 = X - Radius; X2 = X + Radius;
            if (X1 < 0) X1 = 0;
            if (X2 >= Width) X2 = Width - 1;
            // window sum from the integral image: bottom-right - top-right - bottom-left + top-left
            Sum = *(IndexTwo + X2) - *(IndexOne + X2) - *(IndexTwo + X1) + *(IndexOne + X1);
            // black if the pixel is Threshold percent darker than the window average
            if (*Pointer * (X2 - X1) * Y2Y1 < Sum * InvertThreshold)
                *Pointer = 0;
            else
                *Pointer = 255;
            Pointer++;
        }
    }
    Marshal.FreeHGlobal((IntPtr)Integral);
}

  

  The comparison if (*Pointer * (X2 - X1) * Y2Y1 < Sum * InvertThreshold) is written this way to avoid a time-consuming division and keep the program fast.

      In fact, the above way of computing the average is not the fastest; there are implementations more than twice as fast. The algorithm above also has a problem: for somewhat larger images the accumulated sums exceed the range of int, and the result becomes incorrect. In C# we could of course store the table as long instead, but that has two costs: the program uses more memory, and on the 32-bit operating systems that are still common today, 64-bit arithmetic is considerably slower than 32-bit (on a 64-bit system it is a different story).

      As for the example images shown in the paper, I could not reproduce their results with the stated parameters. I do not know whether that is because my source images are screenshots taken from the paper and their quality has degraded.

      In general, this kind of binarization based on local features gives better results in many situations than the one-size-fits-all global threshold. For Wellner's method, the size of the search radius has a strong, and somewhat unpredictable, influence on the result. For example, on the classic Lena image:

           

                          Original image                                  Otsu's method                       Wellner with S=50, T=15
