OpenCV-Python OCR Text Detection

Original article: https://blog.csdn.net/wsp_1138886114/article/details/100135824

Contents

1. Morphological Text Region Detection
2. MSER + NMS Text Region Detection

2.1 MSER
2.2 NMS

1. Morphological Text Region Detection

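Morphological image operations, chiefly the basic operations of dilation and erosion, are enough to detect text in simple scenes.
Dilation expands the bright parts of an image so that the white regions grow; erosion eats away at the bright parts so that the black regions grow. A sequence of dilations and erosions makes the outlines of text regions stand out and removes some border lines, after which contour finding gives the positions of the text regions.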
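To make the two operations concrete, here is a minimal NumPy-only sketch of what dilation and erosion compute with a 3x3 rectangular kernel. The helper names `dilate3x3` and `erode3x3` are my own and are not OpenCV functions; in the real pipeline `cv2.dilate` and `cv2.erode` do this work, with configurable kernels.

```python
import numpy as np


def _windows3x3(img, pad_value):
    """Return the nine shifted views that cover each pixel's 3x3 neighborhood."""
    padded = np.pad(img, 1, mode="constant", constant_values=pad_value)
    h, w = img.shape
    return [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]


def dilate3x3(img):
    """Dilation: each pixel becomes the maximum of its neighborhood (white grows)."""
    return np.max(_windows3x3(img, pad_value=0), axis=0)


def erode3x3(img):
    """Erosion: each pixel becomes the minimum of its neighborhood (black grows)."""
    return np.min(_windows3x3(img, pad_value=255), axis=0)


# A single white pixel on a black background
img = np.zeros((7, 7), dtype=np.uint8)
img[3, 3] = 255

dilated = dilate3x3(img)      # the pixel grows into a 3x3 white block
eroded = erode3x3(dilated)    # the block shrinks back to the single pixel
print(int((dilated == 255).sum()), int((eroded == 255).sum()))
```

Dilating and then eroding with the same kernel is a morphological closing; the script below uses wider rectangular kernels so that neighboring characters merge into one horizontal blob.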
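The main steps are:

1. Read the image and convert it to grayscale.
2. Binarize the image, or denoise it first and then binarize, to simplify later processing.
3. Apply dilation and erosion to emphasize contours and remove border lines.
4. Find contours and discard boxes that do not look like text.
5. Return the bounding boxes of the detected text.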

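Example: in the code below I use the green channel, because the image is predominantly green and that channel carries the most text information.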

import cv2
import numpy as np


def match_loc(model_imgpath,test_imgpath):
    model_img = cv2.imread(model_imgpath)
    test_img = cv2.imread(test_imgpath)

    b_model, g_model, r_model = cv2.split(model_img)
    b_test, g_test, r_test = cv2.split(test_img)
    w, h = g_model.shape[::-1]

    result = cv2.matchTemplate(g_test, g_model, cv2.TM_CCOEFF_NORMED)
    (min_val, score, min_loc, max_loc) = cv2.minMaxLoc(result)
    bottom_right = (max_loc[0] + w, max_loc[1] + h)
    print('max_loc',max_loc)
    kuang = cv2.rectangle(test_img, max_loc, bottom_right, 255, 2)
    cv2.imshow('test_img',kuang)
    cv2.waitKey(0)
    return test_img,max_loc, g_test


def Morph_exam(test_img,g_test):
    sobel = cv2.Sobel(g_test, cv2.CV_8U, 1, 0, ksize=3)
    ret, binary = cv2.threshold(sobel, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY)
    cv2.imshow('binary',binary)
    cv2.waitKey(0)

    # Structuring elements: dilate to make contours stand out, erode to remove fine detail, then dilate again to strengthen the contours
    element1 = cv2.getStructuringElement(cv2.MORPH_RECT, (30, 9))
    element2 = cv2.getStructuringElement(cv2.MORPH_RECT, (24, 6))

    dilation = cv2.dilate(binary, element2, iterations=1)
    erosion = cv2.erode(dilation, element1, iterations=1)
    dilation2 = cv2.dilate(erosion, element2, iterations=2)
    cv2.imshow('dilation2',dilation2)
    cv2.waitKey(0)

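    # Find contours and filter candidate text regions
    region = []
    # [-2:] keeps (contours, hierarchy) on OpenCV 3.x (three return values) and 4.x (two return values)
    contours, hierarchy = cv2.findContours(dilation2, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:]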
    for i in range(len(contours)):
        cnt = contours[i]
        area = cv2.contourArea(cnt)
        if (area < 800):
            continue
        rect = cv2.minAreaRect(cnt)
        print("rect is: ",rect)

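        # Get the four corner points of the box, then filter rectangles that could be text based on their shape
        box = cv2.boxPoints(rect)
        box = box.astype(np.int32)  # np.int0 was removed in NumPy 2.0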
        height = abs(box[0][1] - box[2][1])
        width = abs(box[0][0] - box[2][0])
        if (height > width * 1.3):
            continue
        region.append(box)

    # Draw the selected boxes
    for box in region:
        cv2.drawContours(test_img, [box], 0, (0, 255, 0), 2)
    cv2.imshow('img', test_img)
    cv2.waitKey(0)
    return region

if __name__ == '__main__':
    model_imgpath = './HK/moban.png'
    test_imgpath = './HK/test.png'
    test_img,max_loc, g_test = match_loc(model_imgpath, test_imgpath)
    Morph_exam(test_img,g_test)


2. MSER + NMS Text Region Detection

2.1 MSER

MSER (Maximally Stable Extremal Regions) was proposed in 2002. It detects blob-like regions in an image and is based on an idea similar to the watershed algorithm.

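How it works: MSER thresholds a grayscale image repeatedly, raising the threshold from 0 to 255. Raising the threshold is like raising the water level over a landscape: as the water rises, the uneven terrain is gradually flooded. This is the idea borrowed from the watershed algorithm, with terrain height corresponding to the gray value of the image. In an image containing text, some regions (such as the characters) have a nearly uniform color (gray value), so they stay unchanged over a long range of water levels (thresholds) and are only flooded once the threshold reaches the gray value of the text itself. These regions are the maximally stable extremal regions.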
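The threshold sweep can be illustrated with a tiny synthetic image. This is only a sketch under a simplifying assumption: it counts all pixels at or below each threshold, whereas real MSER tracks the area of each connected component separately. The image and the helper `area_at` are made up for illustration.

```python
import numpy as np

# Bright background (gray 200) with a 4x4 dark "text" block (gray 50)
img = np.full((8, 8), 200, dtype=np.uint8)
img[2:6, 2:6] = 50


def area_at(image, threshold):
    """Number of pixels whose gray value is at or below the threshold."""
    return int((image <= threshold).sum())


# Sweep a few thresholds and record the region area at each one
areas = {t: area_at(img, t) for t in (40, 60, 100, 150, 199, 200)}
for t, a in areas.items():
    print(t, a)
```

The area stays at 16 pixels for every threshold from 50 up to 199 and only jumps when the background is reached at 200. That long plateau is what marks the block as a stable region.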

$q(i)=\frac{|Q_i - Q_{i-\Delta}|}{|Q_{i-\Delta}|}$
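Here $Q_i$ is a connected region at threshold $i$, $\Delta$ is a small change in the gray threshold, $|\cdot|$ denotes the area of a region, and $q(i)$ is the rate of change of region $Q_i$ at threshold $i$.
When $q(i)$ reaches a local minimum, $Q_i$ is a maximally stable extremal region.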
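As a worked example of the formula, the sketch below computes $q(i)$ for a made-up sequence of region areas, one per threshold step, and picks the step with the smallest variation. The area values and the helper name `variation` are assumptions for illustration, and the global minimum stands in for the local-minimum search for brevity.

```python
import numpy as np


def variation(areas, delta):
    """q(i) = (|Q_i| - |Q_{i-delta}|) / |Q_{i-delta}|, undefined for i < delta."""
    areas = np.asarray(areas, dtype=float)
    q = np.full(areas.shape, np.nan)
    q[delta:] = (areas[delta:] - areas[:-delta]) / areas[:-delta]
    return q


# Hypothetical area of one growing region at seven consecutive threshold steps
areas = [100, 180, 200, 204, 206, 260, 400]
q = variation(areas, delta=1)

# The step where the region changes least is the most stable one
most_stable = int(np.nanargmin(q))
print(np.round(q, 4), most_stable)
```

The region barely grows between steps 3 and 4 (204 to 206 pixels), so $q$ is smallest at step 4 and the region at that threshold would be reported as maximally stable.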

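Parameters of cv2.MSER_create():
_delta: the step Δ used in q(i); the detector compares (size_i - size_{i-delta}) / size_{i-delta}
_min_area: prune regions smaller than minArea
_max_area: prune regions larger than maxArea
_max_variation: prune regions whose size is similar to that of their children
_min_diversity: for color images, trace back to cut off MSERs with diversity less than min_diversity
_max_evolution: for color images, the number of evolution steps
_area_threshold: for color images, the area threshold that triggers re-initialization
_min_margin: for color images, ignore margins that are too small
_edge_blur_size: for color images, the aperture size for edge blur
These keyword names follow the OpenCV 3.4 documentation; newer OpenCV releases may spell them without the leading underscore.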

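Example: mser = cv2.MSER_create(_delta=2, _min_area=200, _max_variation=0.7)
MSER documentation: https://docs.opencv.org/3.4/d3/d28/classcv_1_1MSER.html
MSER has the following properties:
1. Invariance to affine transformations of image intensities.
2. Stability: only regions whose support stays nearly the same over a range of thresholds are selected.
3. Multi-scale detection without any smoothing: both small and large structures are detected.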

2.2 NMS

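NMS frequently accompanies region detection algorithms. Its job is to remove duplicate regions, and it is widely used in face recognition, object detection and similar tasks. The full name is non-maximum suppression: as the name suggests, it suppresses elements that are not the maximum. Here that means suppressing boxes that are not the largest box, that is, removing small boxes contained in larger ones.
The basic idea of NMS is to sort all boxes by score and select the highest-scoring box, then go through the remaining boxes and delete every one whose overlap (IoU) with the selected box exceeds a threshold. The process then repeats: pick the next highest-scoring box among those left, delete the boxes whose IoU with it exceeds the threshold, and so on.
In this example we set an overlap threshold (for example 0.5, meaning that if the overlap between two boxes is more than 50% of the area of one of them, that box is deleted). Note that this is an overlap ratio relative to a single box's area rather than a strict intersection-over-union. For each box that is kept, every remaining box whose overlap with it exceeds the threshold is deleted. The boxes left at the end are text boxes that no longer overlap heavily with one another.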
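The score-based procedure described above can be sketched in plain Python. This is an illustrative version under my own assumptions, not the code used in this article: the function names, the example boxes and the scores are made up, and boxes are (x1, y1, x2, y2) with inclusive pixel coordinates, matching the `+ 1` convention of the script that follows.

```python
def box_area(b):
    """Area of an (x1, y1, x2, y2) box with inclusive pixel coordinates."""
    return (b[2] - b[0] + 1) * (b[3] - b[1] + 1)


def intersection_area(a, b):
    """Area shared by two boxes, or 0 if they do not overlap."""
    w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return max(0, w) * max(0, h)


def iou(a, b):
    """Intersection over union of two boxes."""
    inter = intersection_area(a, b)
    return inter / (box_area(a) + box_area(b) - inter)


def greedy_nms(boxes, scores, iou_thresh):
    """Keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [k for k in order if iou(boxes[best], boxes[k]) <= iou_thresh]
    return keep


# IoU versus the "fraction of one box" ratio for two half-overlapping boxes
a, b = (0, 0, 9, 9), (5, 0, 14, 9)
strict_iou = iou(a, b)
ratio_of_b = intersection_area(a, b) / box_area(b)

# Two near-duplicate boxes and one separate box (hypothetical scores)
boxes = [(0, 0, 9, 9), (1, 1, 10, 10), (20, 20, 29, 29)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_thresh=0.5)
print(strict_iou, ratio_of_b, kept)
```

For the two half-overlapping boxes the strict IoU is about 0.33 while the overlap is 50% of one box's area, so the same threshold of 0.5 behaves more aggressively under the ratio used in the script below. MSER does not provide confidence scores, which is why that script sorts by the bottom coordinate instead.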

import cv2
import numpy as np


def match_loc(model_imgpath,test_imgpath):
    model_img = cv2.imread(model_imgpath)
    test_img = cv2.imread(test_imgpath)

    b_model, g_model, r_model = cv2.split(model_img)
    b_test, g_test, r_test = cv2.split(test_img)
    w, h = g_model.shape[::-1]

    result = cv2.matchTemplate(g_test, g_model, cv2.TM_CCOEFF_NORMED)
    (min_val, score, min_loc, max_loc) = cv2.minMaxLoc(result)
    bottom_right = (max_loc[0] + w, max_loc[1] + h)
    print('max_loc',max_loc)
    kuang = cv2.rectangle(test_img, max_loc, bottom_right, 255, 2)
    cv2.imshow('test_img',test_img)
    cv2.waitKey(0)
    return test_img,max_loc, g_test

def MSER(test_img,g_test):
    imgcopy = test_img.copy()
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(g_test)  # detect candidate text regions
    hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]  # convex hull of each region
    # cv2.polylines(test_img, hulls, 1, (0, 255, 0))
    # cv2.imshow('temp_img', test_img)

    # Convert the irregular hulls into axis-aligned rectangles
    keep = []
    for c in hulls:
        x, y, w, h = cv2.boundingRect(c)
        # Keep only boxes that could be text; without this filter many small boxes appear
        if (h > w * 1.3) or h < 25:
            continue
        keep.append([x, y, x + w, y + h])
        cv2.rectangle(imgcopy, (x, y), (x + w, y + h), (255, 255, 0), 1)
    cv2.imshow("imgcopy", imgcopy)
    cv2.waitKey(0)
    return keep

def NMS(boxes, overlapThresh):  # non-maximum suppression (NMS)
    if len(boxes) == 0:
        return []
    if boxes.dtype.kind == "i":
        boxes = boxes.astype("float")

    pick = []

    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    area = (x2 - x1 + 1) * (y2 - y1 + 1)

    # Sort by score (without confidence scores, sort by a coordinate instead, e.g. the bottom edge y2)
    idxs = np.argsort(y2)

    # Iterate and remove duplicate boxes
    while len(idxs) > 0:
        # Put the box with the largest y2 (lowest bottom edge) into pick
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)

        # Corners of the intersection between box i and each remaining box
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])

        # Overlap area as a fraction of each remaining box's area (not a strict IoU)
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        overlap = (w * h) / area[idxs[:last]]

        # Delete the boxes whose overlap ratio exceeds the threshold
        idxs = np.delete(idxs, np.concatenate(([last], np.where(overlap > overlapThresh)[0])))
    return boxes[pick].astype("int")

def MSER_NMS(test_img,g_test):
    test_copy = test_img.copy()
    keep = MSER(test_img, g_test)
    keep2 = np.array(keep)
    pick = NMS(keep2, 0.5)
    print("[x] after applying non-maximum, %d bounding boxes" % (len(pick)))
    for (startX, startY, endX, endY) in pick:
        cv2.rectangle(test_copy, (startX, startY), (endX, endY), (255, 185, 120), 2)
    cv2.imshow("After NMS", test_copy)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

if __name__ == '__main__':
    model_imgpath = './HK/moban1.png'
    test_imgpath = './HK/moban.png'
    test_img,max_loc, g_test = match_loc(model_imgpath, test_imgpath)
    MSER_NMS(test_img, g_test)


Special thanks:
Non-maximum suppression: https://www.pyimagesearch.com/2014/11/17/non-maximum-suppression-object-detection-python/