『OCR_recognition』chineseocr


1. Chineseocr recognition process

Step 1: Text direction detection

  • Components: a VGG16-based four-way classifier for image orientation (0°, 90°, 180°, 270°), plus small-angle skew estimation via estimate_skew_angle
  • This step can be skipped, since the downstream algorithm is robust within a certain angle range
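As a minimal sketch of how the detected orientation could be applied (the class-index convention and the use of np.rot90 here are illustrative assumptions, not the repo's exact code):

```python
import numpy as np

def correct_orientation(image, direction_class):
    """Rotate an image back upright given a four-way direction class.

    direction_class: 0, 1, 2, 3 means the image is rotated by
    0/90/180/270 degrees, so we apply the complementary rotation to undo it.
    """
    # np.rot90 rotates counter-clockwise; apply the remaining quarter-turns
    return np.rot90(image, k=(4 - direction_class) % 4)

img = np.zeros((90, 180, 3), dtype=np.uint8)     # landscape test image
upright = correct_orientation(np.rot90(img), 1)  # undo a 90-degree rotation
```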

Step 2: Text detection

  1. Use YOLO to detect text_proposals, the candidate regions containing text;
    Note: the input image is resized to a fixed 608*608. The anchor width is fixed at 8 and the heights range from 11 to 283, for 9 anchors in total. It does not matter if a single box fails to frame the full text: an anchor is defined as a positive sample when its height overlap with the ground-truth text exceeds a certain threshold.
  2. Merge the text_proposals that lie on the same line using the text line construction algorithm;
    Note: the construction method searches up to 3 boxes to the left and right of each proposal and connects two proposals whenever their height overlap exceeds a certain threshold. There are further details that require reading the code, but in short this step joins the proposals of one line together.
  3. Fit a straight line through the centers of the boxes in the same row, then use xmin and xmax together with the line parameters to obtain the four corner points of each (possibly rotated) box, giving boxes;
  4. Sort boxes by y, which orders the text lines from top to bottom;
  5. Feed each row of boxes into the CRNN one by one.
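The grouping in steps 2–4 can be sketched as follows. This is a simplified re-implementation (greedy left-to-right chaining with assumed thresholds), not the repo's exact construction algorithm:

```python
def v_overlap(a, b):
    """Vertical IoU-style overlap of two boxes given as (x1, y1, x2, y2)."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    union = max(a[3], b[3]) - min(a[1], b[1])
    return max(inter, 0) / union if union > 0 else 0.0

def build_text_lines(proposals, overlap_thresh=0.7, max_gap=24):
    """Greedily chain width-8 proposals into text lines, left to right."""
    proposals = sorted(proposals, key=lambda b: b[0])  # sort by x1
    lines = []
    for p in proposals:
        for line in lines:
            last = line[-1]
            if p[0] - last[2] <= max_gap and v_overlap(p, last) > overlap_thresh:
                line.append(p)
                break
        else:
            lines.append([p])
    # merge each line into one bounding box, then sort lines top to bottom
    merged = [(min(b[0] for b in l), min(b[1] for b in l),
               max(b[2] for b in l), max(b[3] for b in l)) for l in lines]
    return sorted(merged, key=lambda b: b[1])

lines = build_text_lines([(0, 10, 8, 30), (8, 11, 16, 31),
                          (16, 10, 24, 30), (0, 100, 8, 120)])
```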

Step 3: Text recognition

  • Each ROI image is rotated upright according to its box, and its height is compressed to 32 before entering the CNN. After 5 height-halving downsampling stages, the feature map height becomes 1 (32 / 2^5 = 1), so the CNN output has shape (batch, w, 512). This sequence is then fed into a BLSTM whose output length is w. Because the width is only downsampled 3 times (a factor of 2^3 = 8), there is one prediction for every 8 horizontal pixels of the original image, i.e. one prediction per proposal. Adjacent predictions therefore repeat, and CTC is used to de-duplicate them and produce the true label sequence.
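The de-duplication at the end of this step works by collapsing repeated frame predictions and removing the blank symbol. A minimal greedy CTC decoder (illustrative, not the repo's exact decoder):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse consecutive repeats, then drop blanks.

    Example: [a, a, blank, a, b, b] -> [a, a, b]
    """
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out
```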

2. Extracting text_proposals with Darknet

The author extracts text_proposals with a modified version of YOLO.

Step 1: The first modification is to fix the anchor width at 8; the rest follows YOLO. There are nine anchors in total across three output layers, with 3 anchors predicted per layer. Unlike the original YOLO, which resizes its input to a fixed 416*416, the input here is fixed at 608*608, and there are 2 classes: text or not text.

# text detection engine
import os

pwd = os.getcwd()
opencvFlag = 'keras'  # keras, opencv or darknet; model quality: keras > darknet > opencv
IMGSIZE = (608, 608)  # yolo3 input image size
# anchors for the keras version
keras_anchors = '8,11, 8,16, 8,23, 8,33, 8,48, 8,97, 8,139, 8,198, 8,283'
class_names = ['none', 'text']
kerasTextModel = os.path.join(pwd, "models", "text.h5")  # keras model weight file

############## darknet yolo  ##############
darknetRoot = os.path.join(os.path.curdir, "darknet")  # yolo install directory
yoloCfg     = os.path.join(pwd, "models", "text.cfg")
yoloWeights = os.path.join(pwd, "models", "text.weights")
yoloData    = os.path.join(pwd, "models", "text.data")
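The keras_anchors string above is a flat "w,h" list. A small helper (a sketch of how such a string is typically parsed, not necessarily the repo's exact function) turns it into (w, h) pairs:

```python
def parse_anchors(anchor_str):
    """Parse 'w1,h1, w2,h2, ...' into a list of (w, h) tuples."""
    values = [int(v) for v in anchor_str.split(',')]
    return list(zip(values[0::2], values[1::2]))

keras_anchors = '8,11, 8,16, 8,23, 8,33, 8,48, 8,97, 8,139, 8,198, 8,283'
anchors = parse_anchors(keras_anchors)
```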

Step 2: Next, YOLO makes predictions on three output feature maps; since anchors_num/3 = 9/3 = 3, each layer predicts anchors at 3 scales.

# Input, Model, etc. come from Keras; darknet_body, make_last_layers, compose,
# DarknetConv2D_BN_Leaky and yolo_loss are defined in the repo's yolo3 modules
from tensorflow.keras.layers import Input, Concatenate, Lambda, UpSampling2D
from tensorflow.keras.models import Model

def yolo_text(num_classes, anchors, train=False):
    imgInput = Input(shape=(None, None, 3))
    darknet = Model(imgInput, darknet_body(imgInput))
    num_anchors = len(anchors) // 3
    x, y1 = make_last_layers(darknet.output, 512, num_anchors * (num_classes + 5))

    x = compose(DarknetConv2D_BN_Leaky(256, (1, 1)),
                UpSampling2D(2))(x)
    x = Concatenate()([x, darknet.layers[152].output])
    x, y2 = make_last_layers(x, 256, num_anchors * (num_classes + 5))

    x = compose(DarknetConv2D_BN_Leaky(128, (1, 1)),
                UpSampling2D(2))(x)
    x = Concatenate()([x, darknet.layers[92].output])
    x, y3 = make_last_layers(x, 128, num_anchors * (num_classes + 5))

    out = [y1, y2, y3]
    if train:
        num_anchors = len(anchors)
        y_true = [Input(shape=(None, None, num_anchors // 3, num_classes + 5))
                  for l in range(3)]
        loss = Lambda(yolo_loss, output_shape=(4,), name='loss',
                      arguments={'anchors': anchors,
                                 'num_classes': num_classes,
                                 'ignore_thresh': 0.5})(out + y_true)

        def get_loss(loss, index):
            return loss[index]

        lossName = ['class_loss', 'xy_loss', 'wh_loss', 'confidence_loss']
        lossList = [Lambda(get_loss, output_shape=(1,), name=lossName[i],
                           arguments={'index': i})(loss) for i in range(4)]
        textModel = Model([imgInput, *y_true], lossList)
        return textModel
    else:
        textModel = Model([imgInput], out)
        return textModel
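Since the three heads y1, y2, y3 sit at strides 32, 16 and 8 of the backbone, a 608*608 input yields 19x19, 38x38 and 76x76 prediction grids, each with 3 anchors and (num_classes + 5) channels per anchor. A quick check of these shape relationships (following the standard YOLOv3 layout):

```python
def head_grid_shapes(input_size=608, num_classes=2, anchors_per_level=3):
    """Grid size and channel count of each YOLOv3 detection head."""
    strides = (32, 16, 8)  # strides of y1, y2, y3
    channels = anchors_per_level * (num_classes + 5)
    return [(input_size // s, input_size // s, channels) for s in strides]
```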

Step 3: Next, load the pre-trained model weights.

textModel.load_weights('models/text.h5')  # load pre-trained model weights

Step 4: Next, read and augment the training data.

trainLoad = data_generator(jpgPath[:num_train], anchors, num_classes,splitW)
testLoad  = data_generator(jpgPath[num_train:], anchors, num_classes,splitW)
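data_generator is the repo's batch generator. The essential shape of such a generator (a simplified stand-in for illustration, not the actual implementation) is an endless loop that yields fixed-size batches:

```python
import numpy as np

def simple_data_generator(image_paths, batch_size=4, img_size=(608, 608)):
    """Yield endless batches of blank 608x608 images as a training stand-in."""
    i = 0
    while True:
        batch = []
        for _ in range(batch_size):
            # a real generator would load, augment and letterbox
            # image_paths[i] here, and also build the y_true targets
            batch.append(np.zeros((*img_size, 3), dtype=np.float32))
            i = (i + 1) % len(image_paths)
        yield np.stack(batch)

gen = simple_data_generator(['a.jpg', 'b.jpg'], batch_size=2)
```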

Step 5: After obtaining the rotated boxes from the labels, split each box into strips of width 8.

# get_rorate, letterbox_image, resize_box and xy_rotate_box are repo helpers
def get_box_spilt(boxes,im,sizeW,SizeH,splitW=8,isRoate=False,rorateDegree=0):
    """ isRoate: whether to rotate the boxes """
    size = sizeW,SizeH
    if isRoate:
        # rotate the boxes
        im,boxes = get_rorate(boxes,im,degree=rorateDegree)
    # compress the image with letterbox padding, keeping the aspect ratio
    newIm,f  = letterbox_image(im, size)
    # after the image is resized, the boxes must be scaled accordingly
    newBoxes = resize_box(boxes,f)
    # split the boxes into width-8 strips, covering through the last strip
    newBoxes = sum(box_split(newBoxes,splitW),[])
    newBoxes = [box+[1] for box in newBoxes]
    return newBoxes,newIm
    
from math import tan

def box_split(boxes, splitW=15):
    """Split each rotated box into strips of width splitW along its baseline."""
    newBoxes = []
    for box in boxes:
        w = box['w']
        h = box['h']
        cx = box['cx']
        cy = box['cy']
        angle = box['angle']
        # four corner points of the rotated box
        x1,y1,x2,y2,x3,y3,x4,y4 = xy_rotate_box(cx,cy,w,h,angle)
        splitBoxes = []
        i = 1
        tanAngle = tan(-angle)

        while True:
            flag = 0 if i==1 else 1
            xmin = x1+(i-1)*splitW
            ymin = y1-tanAngle*splitW*i
            xmax = x1+i*splitW
            ymax = y4-(i-1)*tanAngle*splitW +flag*tanAngle*(x4-x1)
            # stop once the strip lies completely beyond the right edge
            if xmax>max(x2,x3) and xmin>max(x2,x3):
                break
            splitBoxes.append([int(xmin),int(ymin),int(xmax),int(ymax)])
            i+=1

        newBoxes.append(splitBoxes)
    return newBoxes
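For an axis-aligned box (angle = 0), the splitting above reduces to slicing the x-range into width-8 strips, where the last strip may overhang the right edge. A simplified zero-angle version makes the arithmetic concrete:

```python
def split_axis_aligned(x1, y1, x2, y2, split_w=8):
    """Split an axis-aligned box into width-split_w strips.

    The last strip may extend past x2, matching the note earlier that it
    does not matter if a box fails to frame the text exactly.
    """
    strips = []
    x = x1
    while x < x2:
        strips.append((x, y1, x + split_w, y2))
        x += split_w
    return strips

strips = split_axis_aligned(0, 0, 40, 20)  # a 40-pixel-wide box
```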

Step 6: Set up the optimizer.

import tensorflow as tf

adam = tf.keras.optimizers.Adam(learning_rate=0.0005)

Step 7: Loss definition.

def yolo_loss(args, anchors, num_classes, ignore_thresh=.5):
	# see the repo's code for the full implementation
	pass
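The Lambda layers earlier split yolo_loss into the four components named in lossName. As a hedged illustration of what those terms typically compute in a YOLOv3-style formulation (an assumed sketch, not the repo's exact loss code):

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Element-wise binary cross-entropy with clipping for stability."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def yolo_loss_terms(pred_xy, pred_wh, pred_conf, pred_cls,
                    true_xy, true_wh, true_conf, true_cls, obj_mask):
    """YOLOv3-style loss terms; xy/wh/class count only where objects exist."""
    xy_loss = np.sum(obj_mask * bce(pred_xy, true_xy))
    wh_loss = np.sum(obj_mask * 0.5 * (pred_wh - true_wh) ** 2)
    conf_loss = np.sum(bce(pred_conf, true_conf))
    class_loss = np.sum(obj_mask * bce(pred_cls, true_cls))
    return class_loss, xy_loss, wh_loss, conf_loss

# a single perfectly-predicted positive cell
obj = np.ones((1,))
terms = yolo_loss_terms(np.array([0.5]), np.array([0.2]), np.array([1.0]),
                        np.array([1.0]), np.array([0.5]), np.array([0.2]),
                        np.array([1.0]), np.array([1.0]), obj)
```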

Origin blog.csdn.net/libo1004/article/details/111722863