Preprocessing text that sits on a textured background

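Background texture often hurts recognition accuracy badly, so a preprocessing pass is sometimes needed. Textured training samples are hard to generate, and a model trained on too few samples performs poorly, so preprocessing is the practical workaround.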

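The main idea is: remove the table lines -> remove the texture -> find the text by clustering -> extract the text regions and produce the result.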

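    import cv2
    import numpy as np

    # `result` is assumed to be the single-channel uint8 image left after table removal.
    # Bilateral filtering smooths the texture while keeping character edges.
    result = cv2.bilateralFilter(result, 5, 75, 75)

    # Cluster the gray levels into 3 groups to locate the light-colored text
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    flags = cv2.KMEANS_RANDOM_CENTERS
    samples = np.float32(result.reshape((-1, 1)))
    compactness, labels, centers = cv2.kmeans(samples, 3, None, criteria, 10, flags)

    # The centers come back in arbitrary order, so sort them before taking
    # the midpoint of the two brightest clusters as the threshold
    centers = np.sort(centers.ravel())
    thresh_num = float(centers[2] + centers[1]) / 2

    # Threshold according to the clustering result
    _, new = cv2.threshold(result, thresh_num, 255, cv2.THRESH_TOZERO_INV)
    new = cv2.bitwise_not(new)
    _, new = cv2.threshold(new, thresh_num, 255, cv2.THRESH_BINARY)

    # Erode the mask (this widens the zero-valued text area), then extract the ROI
    kernel = np.ones((3, 3), np.uint8)
    new_erode = cv2.erode(new, kernel, iterations=1)
    new = cv2.bitwise_not(new_erode)
    roi = np.full(result.shape, 255, np.uint8)
    backimage = cv2.bitwise_and(roi, result, mask=new)
    backimage = cv2.bitwise_or(new_erode, backimage)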
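
It is not obvious at a glance what the three threshold-related calls compute together, so here is a small dependency-free sketch that applies the same chain to a single gray value. The helper names `threshold_chain` and `in_text_band` and the example threshold 195 are made up for illustration, and the erosion step is left out.

```python
def threshold_chain(v, t):
    """Apply the three steps to one gray value v (0..255) with threshold t."""
    a = 0 if v > t else v        # cv2.THRESH_TOZERO_INV: zero out values above t
    b = 255 - a                  # cv2.bitwise_not on an 8-bit value
    return 255 if b > t else 0   # cv2.THRESH_BINARY with maxval 255


def in_text_band(v, t):
    """True when v lies in the band that the chain maps to 0."""
    return 255 - t <= v <= t


t = 195  # example threshold, made up for illustration
# Gray values for which the chain returns 0, i.e. the values treated as text
band = [v for v in range(256) if threshold_chain(v, t) == 0]
```

Under these assumptions the chain returns 0 exactly for gray values between `255 - t` and `t`. In the final image those pixels keep their gray value and everything darker or brighter turns white. The band is empty when `t` falls below 127.5, so the approach seems to rely on the threshold landing above mid-gray.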

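The OpenCV snippet above does not include the table-removal step; it covers only the last three stages. Even so, the improvement in recognition rate is quite noticeable.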
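
To look at the clustering step in isolation, the following is a minimal pure-Python sketch of 1-D k-means on gray levels. It is an illustration, not the `cv2.kmeans` implementation: the function names `kmeans_1d` and `text_threshold`, the evenly spread starting centers, and the sample pixel values are all assumptions of this sketch. The snippet above instead starts from random centers. The order of the returned centers should not be relied on, so the sketch sorts them before taking the midpoint of the two brightest.

```python
def kmeans_1d(values, k=3, iterations=20):
    """Tiny 1-D k-means; returns the cluster centers sorted ascending."""
    lo, hi = min(values), max(values)
    # Deterministic start (a choice of this sketch): spread centers over the range
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)


def text_threshold(values):
    """Midpoint between the two brightest of three cluster centers."""
    centers = kmeans_1d(values, k=3)
    return (centers[1] + centers[2]) / 2


# Made-up gray levels: dark strokes (~20), light text (~150), bright background (~240)
pixels = [18, 20, 22] * 5 + [148, 150, 152] * 10 + [238, 240, 242] * 30
threshold = text_threshold(pixels)
```

With this sample data the threshold falls between the light-text cluster and the background cluster, which is what lets the later thresholding separate the two.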

Reposted from blog.csdn.net/wi162yyxq/article/details/103701282