
Recognizing Digits with OpenCV and Python

This article shows how to recognize digits in an image using OpenCV and Python.

In the first part of this tutorial, we'll discuss what a seven-segment display is and how we can apply computer vision and image processing operations to recognize these kinds of digits (no machine learning required!).

Seven-segment displays

You're probably already familiar with seven-segment displays, even if you don't recognize the specific term. A good example of such a display is the classic digital alarm clock:

imagen-20211108135630394

Each digit on the alarm clock is represented by a seven-segment component, laid out as follows:

imagen-20211108140852713

A seven-segment display can show a total of 128 possible states:

imagen-20211108140904381
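
The figure of 128 follows from each of the seven segments being independently on or off, which gives 2^7 combinations. A quick sketch to confirm the count (the name all_states is only for illustration):

```python
from itertools import product

# Every segment is either off (0) or on (1); enumerate all 7-segment patterns
all_states = list(product((0, 1), repeat=7))
print(len(all_states))
```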

We are only interested in 10 of them, the digits 0 through 9:

imagen-20211108140913319

Our goal is to write OpenCV and Python code that recognizes each of these ten digit states in an image.

Designing an OpenCV digit recognizer

We'll use this image of a thermostat as input:

imagen-20211108140921872

The recognition steps are:

Step 1: Localize the LCD on the thermostat. This can be done with edge detection, since there is enough contrast between the plastic housing and the LCD.

Step 2: Extract the LCD. Given an input edge map, I can find contours and look for rectangular outlines; the largest rectangular region should correspond to the LCD. A perspective transform will then give me a clean extraction of the LCD.

Step 3: Extract the digit regions. Once I have the LCD itself, I can focus on extracting the digits. Since there appears to be contrast between the digit regions and the LCD background, I expect thresholding and morphological operations to accomplish this.

Step 4: Identify the digits. Recognizing the actual digits with OpenCV involves dividing each digit ROI into seven segments. From there, I can count pixels in the thresholded image to determine whether a given segment is "on" or "off".

To see how we carry out this four-step process for digit recognition with OpenCV and Python, keep reading.

Recognizing digits with computer vision and OpenCV

Let's go ahead and get started with the example. Create a new file, name it recognize_digits.py, and insert the following code:

# import the necessary packages
from imutils.perspective import four_point_transform
from imutils import contours
import imutils
import cv2
# define the dictionary of digit segments so we can identify
# each digit on the thermostat
DIGITS_LOOKUP = {
    (1, 1, 1, 0, 1, 1, 1): 0,
    (0, 0, 1, 0, 0, 1, 0): 1,
    (1, 0, 1, 1, 1, 0, 1): 2,
    (1, 0, 1, 1, 0, 1, 1): 3,
    (0, 1, 1, 1, 0, 1, 0): 4,
    (1, 1, 0, 1, 0, 1, 1): 5,
    (1, 1, 0, 1, 1, 1, 1): 6,
    (1, 0, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9
}

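We import the Python packages we need. This includes imutils, my collection of convenience functions that make working with OpenCV and Python easier. If you don't have imutils installed yet, take a moment now to install it on your system with pip: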

pip install imutils

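Next we define a Python dictionary named DIGITS_LOOKUP. Each key in the table is a seven-element tuple, one entry per segment. A 1 in the tuple means the given segment is on, and a 0 means it is off. The value is the actual digit itself: 0-9.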
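
Indexing a dictionary directly raises a KeyError when the segment pattern is not in the table, which can happen with a noisy image. As a minimal sketch, assuming we would rather get None back than crash, a lookup with a fallback might look like this. The names PARTIAL_LOOKUP and lookup_digit are hypothetical, and the table holds only three entries to keep the example short:

```python
# Hypothetical three-entry subset of a segment table, for illustration only.
# Segment order: top, top-left, top-right, center,
# bottom-left, bottom-right, bottom
PARTIAL_LOOKUP = {
    (0, 0, 1, 0, 0, 1, 0): 1,
    (1, 0, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
}

def lookup_digit(on, table=PARTIAL_LOOKUP):
    """Return the digit for a segment pattern, or None if it is unknown."""
    # dict.get returns None instead of raising KeyError for a missing key
    return table.get(tuple(on))

print(lookup_digit([1, 0, 1, 0, 0, 1, 0]))  # pattern for 7
print(lookup_digit([0, 0, 0, 0, 0, 0, 0]))  # blank display, not in the table
```

Returning None lets the caller decide whether to skip the digit, retry with different thresholds, or report an error.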

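Once we have identified the segments in the thermostat display, we can pass the tuple to our DIGITS_LOOKUP table and obtain the digit value. For reference, the dictionary uses the same segment order as Figure 2 above. Let's continue with our example: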

# load the example image
image = cv2.imread("example.jpg")
# pre-process the image by resizing it, converting it to
# grayscale, blurring it, and computing an edge map
image = imutils.resize(image, height=500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edged = cv2.Canny(blurred, 50, 200)

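We load our image.

We then preprocess the image by:

Resizing it.
Converting the image to grayscale.
Applying a Gaussian blur with a 5×5 kernel to reduce high-frequency noise.
Computing the edge map with the Canny edge detector.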

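After applying these preprocessing steps, our edge map looks like this:

imagen-20211108140933252

Notice how the outline of the LCD is clearly visible; this completes Step 1. We can now move on to Step 2, extracting the LCD itself: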

# find contours in the edge map, then sort them by their
# size in descending order
cnts = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
    cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
displayCnt = None
# loop over the contours
for c in cnts:
    # approximate the contour
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    # if the contour has four vertices, then we have found
    # the thermostat display
    if len(approx) == 4:
        displayCnt = approx
        break

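To find the LCD region, we need to extract the contours (i.e. outlines) of the regions in the edge map.

We then sort the contours by area, making sure the contours with larger areas come first in the list.

Given our sorted list of contours, we loop over them one by one and apply contour approximation.

If the approximated contour has four vertices, we assume we have found the thermostat display. This is a reasonable assumption, since the largest rectangular region in our input image should be the LCD itself.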

Once we have the four vertices, we can extract the LCD with a four-point perspective transform:

# extract the thermostat display, apply a perspective transform
# to it
warped = four_point_transform(gray, displayCnt.reshape(4, 2))
output = four_point_transform(image, displayCnt.reshape(4, 2))

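Applying this perspective transform gives us a top-down, bird's-eye view of the LCD:

imagen-20211108140948248

Obtaining this view of the LCD completes Step 2. We are now ready to extract the digits from the LCD: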

# threshold the warped image, then apply a series of morphological
# operations to cleanup the thresholded image
thresh = cv2.threshold(warped, 0, 255,
    cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (1, 5))
thresh = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)

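To get at the digits themselves, we threshold the warped image so that the dark regions (the digits) stand out against the lighter background (the LCD background):

imagen-20211108140956034

We then apply a series of morphological operations to clean up the thresholded image: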

imagen-20211108141005262

Now that we have a nicely segmented image, we once again apply contour filtering, only this time we are looking for the actual digits:

# find contours in the thresholded image, then initialize the
# digit contours lists
cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
    cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
digitCnts = []
# loop over the digit area candidates
for c in cnts:
    # compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)
    # if the contour is sufficiently large, it must be a digit
    if w >= 15 and (h >= 30 and h <= 40):
        digitCnts.append(c)

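To do this, we find contours in the thresholded image and initialize the digitCnts list, which will store the contours of the digits themselves.

We then loop over each contour.

For each contour, we compute the bounding box, check that its width and height fall within an acceptable range, and if so, add the contour to the digitCnts list.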
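
The size check can be pulled out into a small predicate so the thresholds are easy to adjust. This is a sketch in plain Python: the function name is_digit_box and the sample bounding boxes are hypothetical, and the default limits are the ones used in the script above, which presumably suit this particular image after resizing to a height of 500 pixels.

```python
def is_digit_box(w, h, min_w=15, min_h=30, max_h=40):
    """Return True if a bounding box of width w and height h
    is plausibly a digit on the display."""
    return w >= min_w and min_h <= h <= max_h

# Hypothetical bounding boxes as (x, y, w, h)
boxes = [
    (12, 20, 18, 35),   # digit-sized
    (40, 22, 4, 4),     # too small (e.g. a decimal point)
    (70, 20, 20, 36),   # digit-sized
    (5, 5, 60, 80),     # too tall (e.g. the display border)
]
digit_boxes = [b for b in boxes if is_digit_box(b[2], b[3])]
print(digit_boxes)
```

For a different display or input resolution, the limits would need to be tuned, for example as fractions of the extracted LCD height.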

If we loop over the contours in digitCnts and draw their bounding boxes on the image, the result looks like this:

imagen-20211108141017066

Sure enough, we have found the digits on the LCD! The final step is to actually identify each digit:

# sort the contours from left-to-right, then initialize the
# actual digits themselves
digitCnts = contours.sort_contours(digitCnts,
    method="left-to-right")[0]
digits = []

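Here we simply sort the digit contours from left to right based on their (x, y) coordinates.

This sorting step is necessary because there is no guarantee that the contours are already ordered from left to right (the same direction in which we read the digits).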
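
Conceptually, a left-to-right sort orders the contours by the x-coordinate of their bounding boxes. The following plain-Python sketch shows that idea on hypothetical bounding boxes; it illustrates the principle and is not a copy of the imutils implementation.

```python
# Hypothetical bounding boxes (x, y, w, h), listed in the arbitrary
# order in which a contour search might return them
boxes = [(70, 20, 20, 36), (12, 20, 18, 35), (41, 21, 19, 35)]

# Sort by the x-coordinate of the top-left corner
left_to_right = sorted(boxes, key=lambda box: box[0])
print(left_to_right)
```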

Next comes the actual digit recognition process:

# loop over each of the digits
for c in digitCnts:
    # extract the digit ROI
    (x, y, w, h) = cv2.boundingRect(c)
    roi = thresh[y:y + h, x:x + w]
    # compute the width and height of each of the 7 segments
    # we are going to examine
    (roiH, roiW) = roi.shape
    (dW, dH) = (int(roiW * 0.25), int(roiH * 0.15))
    dHC = int(roiH * 0.05)
    # define the set of 7 segments
    segments = [
        ((0, 0), (w, dH)),  # top
        ((0, 0), (dW, h // 2)), # top-left
        ((w - dW, 0), (w, h // 2)), # top-right
        ((0, (h // 2) - dHC) , (w, (h // 2) + dHC)), # center
        ((0, h // 2), (dW, h)), # bottom-left
        ((w - dW, h // 2), (w, h)), # bottom-right
        ((0, h - dH), (w, h))   # bottom
    ]
    on = [0] * len(segments)

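We loop over each digit contour. For each of these regions, we compute the bounding box and extract the digit ROI.

I've included an animated GIF of each digit ROI below:

Figure 11: Extracting each individual digit ROI by computing the bounding box and applying NumPy array slicing.

Given the digit ROI, we now need to locate and extract the seven segments of the digit display.

We compute the approximate width and height of each segment from the ROI dimensions. We then define a list of (x, y) coordinate pairs corresponding to the seven segments. This list follows the same segment order as Figure 2 above. Here is an example animated GIF that draws a green box over the segment currently being examined: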

Figure 12: An example of drawing the segment ROI for each of the seven segments of the digit.
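
To make the segment geometry concrete, the coordinate list can be computed by a standalone function and inspected for a sample ROI size. The function name segment_boxes and the 20×40 example size are assumptions for illustration; the proportions (25% of the width, 15% and 5% of the height) are the ones used in the script above.

```python
def segment_boxes(w, h):
    """Return ((xA, yA), (xB, yB)) boxes for the seven segments
    of a digit ROI that is w pixels wide and h pixels tall."""
    dW, dH = int(w * 0.25), int(h * 0.15)  # segment thickness
    dHC = int(h * 0.05)                    # half-height of the center bar
    return [
        ((0, 0), (w, dH)),                           # top
        ((0, 0), (dW, h // 2)),                      # top-left
        ((w - dW, 0), (w, h // 2)),                  # top-right
        ((0, (h // 2) - dHC), (w, (h // 2) + dHC)),  # center
        ((0, h // 2), (dW, h)),                      # bottom-left
        ((w - dW, h // 2), (w, h)),                  # bottom-right
        ((0, h - dH), (w, h)),                       # bottom
    ]

# Example: a digit ROI 20 px wide and 40 px tall
boxes = segment_boxes(20, 40)
for name, box in zip(["top", "top-left", "top-right", "center",
                      "bottom-left", "bottom-right", "bottom"], boxes):
    print(name, box)
```

Note that the boxes overlap at the corners, so the 50% fill test applied later has to tolerate some pixels belonging to a neighboring segment.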

Finally, we initialize our on list: a value of 1 in this list indicates that a given segment is "on", while a value of zero indicates that it is "off". Given the (x, y) coordinates of the seven display segments, determining whether a segment is on or off is fairly straightforward:

    # loop over the segments
    for (i, ((xA, yA), (xB, yB))) in enumerate(segments):
        # extract the segment ROI, count the total number of
        # thresholded pixels in the segment, and then compute
        # the area of the segment
        segROI = roi[yA:yB, xA:xB]
        total = cv2.countNonZero(segROI)
        area = (xB - xA) * (yB - yA)
        # if the total number of non-zero pixels is greater than
        # 50% of the area, mark the segment as "on"
        if total / float(area) > 0.5:
            on[i]= 1
    # lookup the digit and draw it on the image
    digit = DIGITS_LOOKUP[tuple(on)]
    digits.append(digit)
    cv2.rectangle(output, (x, y), (x + w, y + h), (0, 255, 0), 1)
    cv2.putText(output, str(digit), (x - 10, y - 10),
        cv2.FONT_HERSHEY_SIMPLEX, 0.65, (0, 255, 0), 2)

We start looping over the (x, y) coordinates of each segment.

We extract the segment ROI and then count the number of non-zero pixels (i.e. the pixels in the segment that are "on").

If the ratio of non-zero pixels to the total area of the segment is greater than 50%, we can assume the segment is "on" and update our on list accordingly. After looping over all seven segments, we can pass the list to DIGITS_LOOKUP to obtain the digit.
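
The 50% rule is plain arithmetic, so it can be shown without OpenCV. In this sketch the ROI is a small list of rows with made-up pixel values, and the name segment_is_on is hypothetical; the zero-area guard is an addition that the script above does not have.

```python
def segment_is_on(roi, box, min_fill=0.5):
    """Return True if more than min_fill of the pixels inside box
    are non-zero. roi is a sequence of pixel rows and box is
    ((xA, yA), (xB, yB))."""
    (xA, yA), (xB, yB) = box
    area = (xB - xA) * (yB - yA)
    if area <= 0:
        return False  # degenerate box, avoids division by zero
    lit = sum(1 for row in roi[yA:yB] for px in row[xA:xB] if px != 0)
    return lit / float(area) > min_fill

# Tiny thresholded ROI: the top two rows are lit, the bottom two are dark
roi = [
    [255, 255, 255, 255, 255, 255],
    [255, 255, 255, 255, 255, 255],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(segment_is_on(roi, ((0, 0), (6, 2))))  # fully lit region
print(segment_is_on(roi, ((0, 2), (6, 4))))  # fully dark region
print(segment_is_on(roi, ((0, 1), (6, 3))))  # exactly half lit
```

Because the comparison is strict, a segment that is exactly half filled counts as "off".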

We then draw a bounding box around the digit and display the digit on the output image. Finally, our last block of code prints the reading to the screen and displays the output image:

# display the digits
print(u"{}{}.{} \u00b0C".format(*digits))
cv2.imshow("Input", image)
cv2.imshow("Output", output)
cv2.waitKey(0)

Notice how we correctly recognized the digits on the LCD using Python and OpenCV.
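
The print format used in the script expects exactly three digits, with the decimal point before the last one. If a digit is missed or an extra contour slips through, the call to format fails or silently drops values. A minimal sketch of a stricter formatter, assuming that three-digit layout; the name format_reading is hypothetical:

```python
def format_reading(digits):
    """Format three recognized digits as a temperature string,
    with one decimal place and a degree sign."""
    if len(digits) != 3:
        raise ValueError(
            "expected 3 digits, got {}".format(len(digits)))
    return u"{}{}.{} \u00b0C".format(*digits)

print(format_reading([3, 4, 5]))
```

Raising an explicit error makes a failed recognition visible instead of printing a misleading temperature.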

Summary

In today's blog post, I demonstrated how to use OpenCV and Python to recognize digits in images.

This approach is specific to seven-segment displays (i.e. the digit displays you would typically see on digital alarm clocks).

By extracting each of the seven segments and applying basic thresholding and morphological operations, we can determine which segments are "on" and which are "off".

From there, we can look up the on/off pattern in a Python dictionary to quickly determine the actual digit, no machine learning required!

Keep in mind that applying computer vision to read digits from thermostat images tends to overcomplicate the problem itself: using a data-logging thermometer is more reliable and requires far less effort.

I hope you enjoyed today's blog post!