Python image processing on captchas: how to remove noise

Ahmet Aziz Beşli :

I am new to image processing, and what I'm trying to do is clear the noise from captchas.

I have several different types of captchas:

enter image description here

enter image description here

enter image description here

For the first one, this is what I did:

First Step

enter image description here

First, I converted every pixel that is not black to black. Then I found a pattern in the image that is noise and deleted it. The first captcha was easy to clean, and I read the text with tesseract.
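A two-colour conversion like this is commonly done with a luminance threshold. The sketch below is one possible version, not necessarily the conversion used here: it assumes dark pixels should stay black and everything else should become white. The name `binarize` and the threshold of 128 are illustrative choices.

```python
import numpy as np
from PIL import Image

def binarize(img, threshold=128):
    """Return a black-and-white copy: dark pixels become black, the rest white."""
    # Convert to 8-bit grayscale so each pixel is a single brightness value
    gray = np.asarray(img.convert("L"))
    # Brightness below the (assumed) threshold -> 0 (black), otherwise 255 (white)
    out = np.where(gray < threshold, 0, 255).astype(np.uint8)
    return Image.fromarray(out)
```

The threshold would need tuning per captcha type; a lower value keeps only the darkest strokes.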

But I am looking for a solution for the second and the third.

How should I approach this? I mean, what are the possible methods to clean them?

This is how I delete patterns:

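import numpy as np

# imgCropped is expected to be a module-level PIL image in RGBA mode
# (the cropped captcha); it is modified in place.
def delete(searcher, h2, w2):
    """Paint every exact occurrence of the pattern `searcher` white in imgCropped."""
    h = h2  # pattern height in pixels
    w = w2  # pattern width in pixels
    search = searcher.convert("RGBA")
    herear = np.asarray(search)      # pattern as an array
    bigar = np.asarray(imgCropped)   # captcha as an array

    hereary, herearx = herear.shape[:2]
    bigary, bigarx = bigar.shape[:2]

    # Number of top-left positions at which the pattern still fits in the image
    stopx = bigarx - herearx + 1
    stopy = bigary - hereary + 1

    pix = imgCropped.load()

    for x in range(0, stopx):
        for y in range(0, stopy):
            x2 = x + herearx
            y2 = y + hereary
            pic = bigar[y:y2, x:x2]
            test = (pic == herear)
            if test.all():
                # Exact match: overwrite the h x w region with opaque white
                for q in range(h):
                    for k in range(w):
                        pix[x+k, y+q] = (255, 255, 255, 255)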

Sorry for the variable names, I was just testing the function.

Thanks.

Ahmet Aziz Beşli :

Here is my solution:

enter image description here

First, I extracted the background pattern (edited by hand in Paint) from:

enter image description here

After that, I created a blank image and filled it with the pixels that differ between the pattern and the captcha image.

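from PIL import Image

img = Image.open("x.png").convert("RGBA")      # captcha
pattern = Image.open("y.png").convert("RGBA")  # hand-edited background pattern

pixels = img.load()
pixelsPattern = pattern.load()

# Blank (fully transparent) image, same size as the captcha (150x50 here)
new = Image.new("RGBA", img.size)
pixelNew = new.load()

# Copy only the pixels that differ from the background pattern
for i in range(img.size[0]):
    for j in range(img.size[1]):
        if pixels[i, j] != pixelsPattern[i, j]:
            pixelNew[i, j] = pixels[i, j]

new.save("differences.png")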
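The same per-pixel comparison can also be written without Python loops. This is a minimal NumPy sketch, assuming both images have the same size; the function name `difference_image` is hypothetical and not part of the original solution.

```python
import numpy as np
from PIL import Image

def difference_image(img, pattern):
    """Keep pixels of `img` that differ from `pattern`; leave the rest transparent."""
    a = np.asarray(img.convert("RGBA"))
    b = np.asarray(pattern.convert("RGBA"))
    # True wherever at least one of the four channels differs
    differs = (a != b).any(axis=-1)
    # All zeros means transparent black, like a fresh Image.new("RGBA", size)
    out = np.zeros_like(a)
    out[differs] = a[differs]
    return Image.fromarray(out)
```

It could be called as `difference_image(Image.open("x.png"), Image.open("y.png")).save("differences.png")`.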

Here are the differences:

enter image description here
Finally, I added blur and cleared the pixels that are not black.
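The post does not give the blur settings, so the following is only a sketch of what such a step might look like with Pillow. The function name `blur_and_keep_black`, the Gaussian radius of 1, and the threshold of 60 are all assumptions.

```python
from PIL import Image, ImageFilter

def blur_and_keep_black(img, radius=1, threshold=60):
    """Blur the image, then turn every pixel that is not near-black white."""
    # Flatten transparency onto white so the empty background counts as white
    rgba = img.convert("RGBA")
    flat = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
    flat.alpha_composite(rgba)
    # Blurring lightens thin or isolated specks more than thick strokes
    gray = flat.convert("L").filter(ImageFilter.GaussianBlur(radius))
    # Keep only pixels that are still dark after the blur
    return gray.point(lambda v: 0 if v < threshold else 255)
```

With these values, isolated dark pixels tend to be washed out by the blur and removed, while the interior of thicker character strokes stays black.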

Result:

enter image description here

With pytesseract the result is 2041, which is wrong for this image, but the overall success rate is around 60%.

