Generating an OCR training set with code. Boss: No data? Just "new" some

The setting sun was strong, even a little dazzling. It shone through the glass and lit up the keyboard.

This is an ordinary plastic keyboard. The gaps between the keys are full of dust and dander, and under the strong light they were all the more visible, like the view through a microscope.

After so many years on the job, I have long understood that the keyboard is only a tool. What really determines your technical level is the requirements the boss comes up with.

The cursor in the IDE kept blinking. It had not moved a single space, yet it never rested, just like my thoughts at that moment.

The scene from that morning kept replaying in my mind:

The boss said: Didn't you say that OCR technology is very mature now? Then let's build it ourselves!

I said: The frameworks are very mature. But we don't have the data...

The boss asked: What data do you need?

I answered: If you want a machine to recognize "1", you have to train it with at least 500 pictures of "1".

The boss asked: And to train "2"?

I answered: 500 pictures of "2", and they must all be different; duplicates count as one picture. Think about it: there are more than 3,000 commonly used Chinese characters, and we have no material at all! Should we buy some online...

The boss fell into deep thought, and suddenly his glasses flashed: Hey, you programmers can "new" anything. You can even "new" yourselves a wife.

I quickly explained: What we "new" is an object. (In Chinese, the word for "object" also means "partner".)

I was about to add that I had nobody to work with, just one intern.

The boss's phone rang suddenly. He covered it and said to me: I'm going on a business trip. When I come back in three days, you must have my wife... no, the data set "newed" up!


Day 1: Draw a black box and write a character in it

Early in the morning, I walked into the office carrying my soy milk.

My office is not big. It has only two workstations, one for me and one for the intern, Xiao Wang. The sign at the door, however, clearly reads "Industrial Park Software R&D Center". The boss says it will grow into a technical team of 200 people.

Xiao Wang is in his senior year and is interning here this year as a research and development engineer. He works very seriously and arrives earlier than I do every day. With good training, he should turn out well.

"Excuse me, is this the finance department?" an older man asked, poking his head through the door.

"No!"

"If it's not, it's not. Why so loud? Your room looks just like the finance department. Two people to a room: you'd be the accountant and he'd be the cashier."

I walked over to Xiao Wang. He was reading the Juejin (Nuggets) blog, which is full of big names: TF boy, Chun brother, Lin Sanxin...

Xiao Wang calls me "Chief" because I am the person in charge of the "Industrial Park Software R&D Center". I manage him, and he reports to me.

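I have a task for you, a very simple one: use Python to draw a 32*32-pixel black background, then write white text on it. Go ahead and write it.

Xiao Wang finished it quickly.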

from PIL import Image
from PIL import ImageDraw

# Draw a 32*32 black box
img = Image.new("RGB", (32, 32), "black")
# Write a character in the black box
draw = ImageDraw.Draw(img)
draw.text((0,0), "2", (255, 255, 255))
# Save the finished image
img.save("2.png")

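Xiao Wang came to report before the morning was over.

I've tried writing letters, digits and symbols on the black background image, and they all work.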

untitled-1 copy.png

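I nodded repeatedly and told Xiao Wang he had done great.

Xiao Wang asked what to do next. I said: I'll tell you tomorrow.

Day 2: Load a font first, then draw the character

The next day, Xiao Wang asked me: Chief, what's today's task? I waited all afternoon yesterday.

I said: Run yesterday's code and try outputting a Chinese character.

Xiao Wang ran the following code: draw.text((0,0), "汉", (255, 255, 255)), and it raised an error: AttributeError: 'ImageFont' object has no attribute 'getmask2'

Xiao Wang froze, but I just smiled:

You haven't loaded a font that supports Chinese characters, so drawing one directly won't work. Today's task: search Baidu and figure out how to draw Chinese characters.

It was afternoon before Xiao Wang came to report.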

han.png

He not only showed me the result, he also walked me through the code.

from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont

img = Image.new("RGB", (320, 320), "black")
draw = ImageDraw.Draw(img)
# Load a font; 320 is the font size, the same as the black box
font = ImageFont.truetype("chinese_fonts/fangzheng_heiti.TTF", 320)
# Pass the font in as an argument
draw.text((0,0), "汉", (255, 255, 255),font)
img.save("汉.png")

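Compared with yesterday's version, the difference is the call to ImageFont.truetype("path to font file", font size), which loads a font file. The font is then passed in when calling draw.text(……,font), which also makes the font size controllable.

I nodded repeatedly and told Xiao Wang he had done great. Then I went on: Chinese characters work now, so try digits and symbols again.

Xiao Wang called draw.text(",") and draw.text("2") again and drew a "," and a "2".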

2.png

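"How does that look? Now that you've seen these images, how do they look to you?" I asked Xiao Wang, rather sternly.

"Nothing special... Th-this looks fine, doesn't it?" Xiao Wang replied.

I raised my voice and tapped the screen: "They're not centered, man! The comma is so obvious. Can't you see it?"

Xiao Wang, who had been basking in pride a moment earlier, looked a little puzzled, but he stayed calm: That's easy. I'll just change the x and y coordinates in draw.text((x,y),……).

"Can you finish it today?" I asked him.

"Definitely! It's just a matter of tweaking the coordinates," Xiao Wang said, looking confident.

"Fine," I told him. "Just remember one thing: don't tune the coordinates for one particular character. Try several. Whether it's 1, 2, 3, 4 or 甲乙丙丁, everything has to be centered. Got it?"

That afternoon, as I was leaving for home, Xiao Wang said he would work overtime.

At 2 a.m., while I was in the bathroom at home, I remotely checked the company's network traffic. There was a steady stream of searches for "python" and "center font".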

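Day 3: Center the character, add salt and pepper

On the morning of the third day, I had breakfast at home, then bought steamed buns and soy milk downstairs.

As soon as I walked into the office, I found Xiao Wang slumped over his desk, not moving.

My heart skipped a beat. I hoped nothing had happened to him. Yesterday's task really had been hard for him. I shook him at once: Xiao Wang, wake up, wake up, Xiao Wang!

Xiao Wang slowly opened his eyes, yawned and stretched: Is it morning already?

Only when he noticed me beside him did it register: Oh, I'm at the office. I still haven't got "centering" working!

Chief, why does it go wrong however I tune it? x=20, y=30 works for this character, but not for another one. How do I write code that works in general? How many if / else if checks would that take?

I said: Eat your breakfast first. After that I'll teach you one thing, and you'll be able to finish the task right away.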

integration.png

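In fact, font design follows many rules. The area marked by the red box in the figure is the extent of the font: even where the inside is blank, it is still the font's territory. Inside it there is also a character area, which holds what is actually displayed. A comma, for instance, sits low. To record this, each character has an offset within the font area, which describes how far it is shifted relative to that area.

So if you want to center a character on the background, its coordinates are definitely not what you see with the naked eye, and they vary from character to character. You have to compute them from the font's width and height together with the offset.

integration 2.png

Xiao Wang was excited: once there are rules, things get easy.

Soon he had the code written.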

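from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont

width,height = 32,32 # width and height are used in several places, so define them as variables
font_size = 32
char = "好" # the character to draw

img = Image.new("RGB", (width, height), "black")
draw = ImageDraw.Draw(img)
# Load a font; 32 is the font size, the same as the black box
font = ImageFont.truetype("chinese_fonts/fangzheng_fangsong.ttf", font_size)
# Get the width and height of the rendered character
# (textsize and getoffset require Pillow older than 10.0, where they were removed)
font_width, font_height = draw.textsize(char, font)
offset_x, offset_y = font.getoffset(char)

# Compute the x, y drawing coordinates so that the character lands in the center of the image
x = (width - font_width - offset_x) // 2
y = (height - font_height - offset_y) // 2
# Pass the font in as an argument
draw.text((x,y), char, (255,255, 255),font)
img.save("好.png")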

Xiao Wang tried it out: digits, letters, symbols and Chinese characters all came out centered.

2022-07-15_235428.png
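One way to check a claim like this without eyeballing every image is to measure the blank margins around the white pixels. The sketch below is an illustration under assumptions, not part of Xiao Wang's code: `ink_margins` and `is_centered` are hypothetical helper names, and the image is represented as a plain list of rows of grayscale values (0 for black, 255 for white) so that the example runs without any image library.

```python
def ink_margins(rows):
    """Return (left, top, right, bottom) blank margins around non-black pixels.

    `rows` is a list of equal-length rows of grayscale values.
    Returns None if the image is completely black.
    """
    height = len(rows)
    width = len(rows[0])
    # Row and column indices that contain at least one non-zero pixel
    ys = [y for y, row in enumerate(rows) if any(row)]
    xs = [x for x in range(width) if any(row[x] for row in rows)]
    if not xs:
        return None
    return (xs[0], ys[0], width - 1 - xs[-1], height - 1 - ys[-1])


def is_centered(rows, tolerance=1):
    """True if opposite margins differ by at most `tolerance` pixels."""
    margins = ink_margins(rows)
    if margins is None:
        return False
    left, top, right, bottom = margins
    return abs(left - right) <= tolerance and abs(top - bottom) <= tolerance


# A 6x6 canvas with a 2x2 white block in the middle...
centered = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        centered[y][x] = 255

# ...and one with a single white pixel pushed into the corner.
corner = [[0] * 6 for _ in range(6)]
corner[0][0] = 255

print(ink_margins(centered), is_centered(centered))
print(ink_margins(corner), is_centered(corner))
```

The tolerance of one pixel allows for the rounding that integer division introduces. On a real Pillow image, `Image.getbbox()` reports the bounding box of the non-zero region, from which the same margins can be derived.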

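The key point is that draw.textsize(char, font) returns the width and height of the text, and font.getoffset(char) returns the offset. To center the text on a background, take the background's length, subtract the text's length, then subtract the offset. What is left is the gap between the character and the background. Dividing that gap by 2 splits it evenly between the two sides, and the character is centered.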
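The arithmetic can be checked with plain numbers. The sketch below assumes, as the formula in the listing implies, that the reported text size already includes the offset, so the visible strokes run from the offset to the reported size. `centered_origin` and `centered_origin_from_bbox` are hypothetical helper names, and the sizes are made-up values. The second helper takes a (left, top, right, bottom) box of the kind `ImageDraw.textbbox((0, 0), char, font=font)` returns, which may be useful on Pillow 10 and later, where `textsize` and `getoffset` are no longer available.

```python
def centered_origin(canvas, text_size, offset):
    """Drawing origin per the article's formula: (canvas - size - offset) // 2."""
    (w, h), (tw, th), (ox, oy) = canvas, text_size, offset
    return (w - tw - ox) // 2, (h - th - oy) // 2


def centered_origin_from_bbox(canvas, bbox):
    """Same origin from a (left, top, right, bottom) box measured at (0, 0)."""
    (w, h), (left, top, right, bottom) = canvas, bbox
    x = (w - (right - left)) // 2 - left
    y = (h - (bottom - top)) // 2 - top
    return x, y


canvas = (32, 32)
text_size = (20, 28)  # made-up numbers: size including the offset
offset = (2, 6)       # made-up numbers: blank space before the strokes start

x, y = centered_origin(canvas, text_size, offset)

# Where the visible strokes end up on the canvas
ink_left, ink_right = x + offset[0], x + text_size[0]
ink_top, ink_bottom = y + offset[1], y + text_size[1]
print((x, y))
print("horizontal gaps:", ink_left, canvas[0] - ink_right)
print("vertical gaps:", ink_top, canvas[1] - ink_bottom)

# The bbox form: left/top play the role of the offset, right/bottom of the size
bbox = (offset[0], offset[1], text_size[0], text_size[1])
print(centered_origin_from_bbox(canvas, bbox))
```

The origin can come out negative, as y does here. That only means drawing starts above the canvas so that the strokes themselves land in the middle.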

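Xiao Wang had not slept, but that day he was very happy. I told him he had solved a major problem for the company and sent him home early to rest.

That evening, the boss was due back from his business trip.

I took out the code I had written three days earlier and ran it.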

from __future__ import print_function
from PIL import Image
from PIL import ImageFont
from PIL import ImageDraw
import os
import shutil
import time
import cv2
import numpy as np  # needed below for the array conversion and the random noise

# Characters to generate, keyed by class number
label_dict = {0: '你', 1: '好', 2: '掘', 3: '金', 4: ':', 5: '1', 6: '+', 7: '2', 8: ',', 9: 'g', 10: 'o', 11: '!'}

# One folder per class; an existing folder is removed first so each run starts clean
for value,char in label_dict.items():
    train_images_dir = "dataset"+"/"+str(value)
    if os.path.isdir(train_images_dir):
        shutil.rmtree(train_images_dir)
    os.makedirs(train_images_dir)

def makeImage(label_dict, font_path, width=32, height=32, rotate = 0, salt = 22):

    # Take each key-value pair from the dictionary
    for value,char in label_dict.items():
        # Create an image with a black background
        img = Image.new("RGB", (width, height), "black")
        draw = ImageDraw.Draw(img)
        # Load a font; the font size is 90% of the image width
        font = ImageFont.truetype(font_path, int(width*0.9))
        # Get the width and height of the rendered character (requires Pillow older than 10.0)
        font_width, font_height = draw.textsize(char, font)
        offset_x, offset_y = font.getoffset(char)
        # Compute the x, y drawing coordinates so that the character lands in the center of the image
        x = (width - font_width - offset_x) // 2
        y = (height - font_height - offset_y) // 2
        # Draw: where, what, in which color, in which font
        draw.text((x,y), char, (255, 255, 255), font)
        # Rotate the image by the given angle
        if rotate != 0:
            img = img.rotate(rotate)

        # Convert the pixel data to a NumPy array
        np_img = np.asarray(img.getdata(), dtype='uint8')
        # Reduce 3 channels to 1 and reshape into a height x width matrix
        np_img = np_img[:, 0].reshape((height, width))
        for i in range(salt): # add noise: set random pixels to white
            temp_x = np.random.randint(0,np_img.shape[0])
            temp_y = np.random.randint(0,np_img.shape[1])
            np_img[temp_x][temp_y] = 255

        # Save the file; naming rule: dataset/<class number>/<timestamp in ms>_<rotation angle>.png
        time_value = int(round(time.time() * 1000))
        img_path = "dataset/{}/{}_{}.png".format(value, time_value, rotate)
        cv2.imwrite(img_path, np_img)

# Directory that holds the fonts
font_dir = "./chinese_fonts"
for font_name in os.listdir(font_dir):
    # Take each font in turn; every font produces one batch of images
    path_font_file = os.path.join(font_dir, font_name)
    # Rotation angles from -5 to 4 degrees (range excludes 5); every angle produces one batch
    for k in range(-5, 5, 1):
        # Generate an image for every character
        makeImage(label_dict, path_font_file, rotate = k, salt = 5-k)

Besides drawing the characters, this code also adds interference: it rotates each image slightly and sprinkles random noise on it. This kind of noise is called salt-and-pepper noise (pepper is black, salt is white, so it means black and white noise), although the code here only adds the white "salt" pixels. The documents we are asked to recognize will not always be clean, so we train on disturbed images to bring the results closer to real conditions.
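The generator listed earlier only sets pixels to white. A minimal sketch of noise that adds both salt and pepper is shown below. It is an illustration under assumptions rather than the code used for the data set: `add_salt_and_pepper` is a hypothetical helper, and the image is a flat buffer of grayscale pixels so that the example needs only the standard library.

```python
import random


def add_salt_and_pepper(pixels, n_salt, n_pepper, rng=None):
    """Return a copy of `pixels` with random pixels forced to 255 or 0.

    `pixels` is a flat bytes-like sequence of grayscale values.
    Positions are sampled without replacement, so exactly
    n_salt + n_pepper distinct pixels are overwritten.
    """
    if rng is None:
        rng = random.Random()
    noisy = bytearray(pixels)
    positions = rng.sample(range(len(noisy)), n_salt + n_pepper)
    for pos in positions[:n_salt]:
        noisy[pos] = 255  # salt: white speck
    for pos in positions[n_salt:]:
        noisy[pos] = 0    # pepper: black speck
    return noisy


width, height = 32, 32
gray = bytes([128]) * (width * height)  # uniform mid-gray test image
noisy = add_salt_and_pepper(gray, n_salt=5, n_pepper=5, rng=random.Random(0))

# Count white, black and untouched pixels
print(noisy.count(255), noisy.count(0), noisy.count(128))
```

On a black background with white strokes, pepper only shows where it lands on a stroke, while salt shows on the background. If Pillow is in use, `Image.frombytes("L", (width, height), bytes(noisy))` turns the buffer back into an image.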

2022-07-16_001029.png
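To see how far a script like this goes toward the 500 distinct pictures per character mentioned at the start, it helps to count what one run produces: one image per character for every combination of font and rotation angle. The sketch below is a back-of-the-envelope estimate. `images_per_character` and `fonts_needed` are hypothetical helpers, and the font count of 8 is a made-up input, not a figure from this article.

```python
import math


def images_per_character(n_fonts, angles):
    """One image per (font, angle) pair for each character."""
    return n_fonts * len(angles)


def fonts_needed(target_per_character, angles):
    """Smallest number of fonts that reaches the target per character."""
    return math.ceil(target_per_character / len(angles))


angles = range(-5, 5, 1)  # the same range the script iterates over: -5 .. 4

print(len(angles))
print(images_per_character(n_fonts=8, angles=angles))
print(fonts_needed(500, angles))
```

Under these assumptions, the number of fonts in the font folder is what limits the data set. Widening the angle range is another way to raise the count per character.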

In fact, the code to generate the character set is very simple. I wrote it on the afternoon of the day I argued with the boss.

What I kept agonizing over was whether to let Xiao Wang write it this time. Letting him write it would cost me several times the effort, because in the time it takes to explain the work to him I could have finished it myself. I remembered that Xiao Wang was already the company's 10th intern, and the earlier ones had each learned a little and then left.

In the end, I chose to train Xiao Wang. Do your best and leave the rest to fate.

Looking back now, although it had only been 3 days, Xiao Wang had made great progress.

Day 4: Crossover

I handed the character set to the boss and said that Xiao Wang had developed it. The boss was very happy and said he would raise Xiao Wang's salary by 200 yuan.

I was still hesitating over whether to tell Xiao Wang when he came to find me, looking a little embarrassed.

Actually, I had seen it coming.

Xiao Wang said: Chief, I've found a new job. The other company thinks highly of my abilities, especially my ability to generate character sets automatically, which they badly need, and they are doubling my salary... So...

"Okay, I wish you all the best! When do you plan to leave?" There was not the slightest ripple in my heart.

Xiao Wang said anxiously: This afternoon... would that be all right?

Sure!

Looking at the workstation Xiao Wang had left, I took a plan out of the drawer. It was the script for training Xiao Wang to complete the entire OCR recognition project.

I casually turned to a page. The knowledge point on that page was: why do image training data sets mostly use white text on a black background?

I smiled: that's because the value of black is 0 and the value of white is 255. A pixel with value 0 contributes nothing when pixel values are multiplied by weights, so the computer effectively ignores the 0s and pays attention to the 255s of the white strokes. With black text on a white background, the computer would have to focus on the 0s, which makes things hard for it.

There were also pages on how to train, how to tune, how to deploy, and so on.
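The point about zeros can be illustrated with a weighted sum, which is the basic operation a neural network applies to pixel values. The sketch below is a simplified illustration, not a claim about any particular OCR framework: `weighted_sum` and `invert` are hypothetical helpers, and the pixel and weight values are made up. It also shows how a black-on-white scan could be flipped to white-on-black before training.

```python
def weighted_sum(pixels, weights):
    """Sum of pixel * weight, as in one neuron before the activation."""
    return sum(p * w for p, w in zip(pixels, weights))


def invert(pixels):
    """Flip grayscale values so black (0) becomes white (255) and vice versa."""
    return [255 - p for p in pixels]


weights = [0.5, -1.0, 2.0, 0.25]

white_on_black = [0, 255, 0, 0]  # one stroke pixel, background is 0
black_on_white = invert(white_on_black)

# Only the stroke pixel contributes to the sum
print(weighted_sum(white_on_black, weights))
# Here every background pixel contributes and the stroke pixel contributes nothing
print(weighted_sum(black_on_white, weights))
```

In the first case the result depends only on the weight under the stroke. In the second it depends on every weight except that one, so the signal about where the stroke is has to be read from what is missing.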

Anyway, he has a good place to go, and I can be considered a businessman.

Helping others across is, in the end, helping yourself across.




Origin juejin.im/post/7120640342298198024