Health code color recognition and information extraction

Background

    With the epidemic going on for so long, I decided to build an application that recognizes the color of a health code and extracts information from it. This article uses OpenCV, PaddleOCR, and Flask to do that.

PaddleOCR

    PaddleOCR aims to create a rich, leading and practical OCR tool library to help developers train better models and implement their applications.

OpenCV

    OpenCV is a cross-platform computer vision and machine learning library released under the open-source Apache 2.0 license. It runs on Linux, Windows, Android, and macOS. It is lightweight and efficient, is written mainly in C++, provides interfaces for languages such as Python, Java, and MATLAB, and implements many general-purpose algorithms in image processing and computer vision.

Flask

    Flask is a lightweight, customizable web framework written in Python. Its core is small, flexible, and easy to pick up. It works well with an MVC-style layout, so a small team can divide the work and build a small or medium-sized website or web service in a short time. Flask is also highly extensible: you add only the features you need, so functionality can grow while the core stays simple, and its large collection of extensions makes it possible to build a customized, full-featured site.

WeChat QR code recognition

    The WeChat QR code engine combines traditional computer vision with deep learning to handle difficult cases such as multiple codes in one image and small codes in large images, and to decode robustly. It ships with OpenCV's contrib modules, so about three lines of code (plus the four model files) are enough to get WeChat's scanning ability.

import cv2

detector = cv2.wechat_qrcode_WeChatQRCode("detect.prototxt", "detect.caffemodel", "sr.prototxt", "sr.caffemodel")
img = cv2.imread("img.jpg")
res, points = detector.detectAndDecode(img)

print(res, points)
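
    As far as I know, each entry in points holds the four corner coordinates of one detected code. A minimal sketch of turning those corners into a clamped bounding box follows; bounding_box and its margin parameter are names I made up for illustration, and the corner values are sample numbers, not real detector output.

```python
def bounding_box(corners, width, height, margin=0):
    """Return (x0, y0, x1, y1) around four (x, y) corners, clamped to the image."""
    xs = [float(p[0]) for p in corners]
    ys = [float(p[1]) for p in corners]
    x0 = max(int(min(xs)) - margin, 0)
    y0 = max(int(min(ys)) - margin, 0)
    x1 = min(int(max(xs)) + margin, width)
    y1 = min(int(max(ys)) + margin, height)
    return x0, y0, x1, y1


# Sample corners for one code in a 400x300 image (illustrative values only)
corners = [(120.0, 80.0), (320.0, 82.0), (318.0, 282.0), (118.0, 280.0)]
print(bounding_box(corners, width=400, height=300, margin=10))  # (108, 70, 330, 292)
```

    With a real image this would be called as bounding_box(points[0], img.shape[1], img.shape[0]), and the region could then be cropped with img[y0:y1, x0:x1].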

    The code above returns the location of each QR code. Next, the image is thresholded against a predefined HSV color range and contours are extracted from the resulting mask; if the contours cover a non-zero area, the color is considered present.


# Detect whether the given color is present in the image
def detect_color(image, color):
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)  # convert BGR to HSV
    # color_dist maps a color name to its 'Lower' and 'Upper' HSV bounds
    inRange_hsv = cv2.inRange(hsv, color_dist[color]['Lower'], color_dist[color]['Upper'])
    # [-2] picks the contour list in both OpenCV 3 (three return values) and OpenCV 4 (two)
    contours = cv2.findContours(inRange_hsv.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    return len(contours) > 0 and draw_color_area(image, contours) > 0

# Sum the area of all contours found in the color mask
def draw_color_area(image, contours):
    allarea = 0
    for contour in contours:
        allarea += cv2.contourArea(contour)
    return allarea

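
    The color_dist table used by detect_color is not shown in this article. The sketch below is one possible shape for it, with bounds on OpenCV's 8-bit HSV scale (H from 0 to 179, S and V from 0 to 255). The numbers are commonly quoted approximations that I am assuming here, not the project's actual values, and they would need tuning against real screenshots. The two helper functions are hypothetical and use only the standard library, so a single pixel can be checked against a range without OpenCV; the conversion only approximates what cv2.cvtColor does.

```python
import colorsys

# Assumed HSV bounds (OpenCV 8-bit scale); tune against real health code images.
# Red also wraps around to hues near 179, which a single range does not cover.
color_dist = {
    "green":  {"Lower": (35, 43, 46), "Upper": (77, 255, 255)},
    "yellow": {"Lower": (26, 43, 46), "Upper": (34, 255, 255)},
    "red":    {"Lower": (0, 43, 46),  "Upper": (10, 255, 255)},
}


def bgr_to_opencv_hsv(bgr):
    """Approximate OpenCV's BGR -> HSV conversion for a single 8-bit pixel."""
    b, g, r = (c / 255.0 for c in bgr)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return round(h * 180) % 180, round(s * 255), round(v * 255)


def pixel_matches(bgr, color):
    """True if the pixel falls inside the named color's HSV range."""
    hsv = bgr_to_opencv_hsv(bgr)
    lower, upper = color_dist[color]["Lower"], color_dist[color]["Upper"]
    return all(lo <= x <= hi for lo, x, hi in zip(lower, hsv, upper))


print(bgr_to_opencv_hsv((0, 255, 0)))       # pure green -> (60, 255, 255)
print(pixel_matches((0, 255, 0), "green"))  # True
```

    A table like this makes it cheap to sanity-check a range before running the full contour pipeline on an image.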

Text recognition

    PaddleOCR handles the text recognition. The current approach is a crude one: recognize all the text in the image, then use regular expressions to pick out the relevant lines, mainly the nucleic acid test time and whether the result is negative. To run offline, it is best to download the model files in advance. Initialization code:

from paddleocr import PaddleOCR

ocr = PaddleOCR(rec_model_dir='./ocr/rec/ch/ch_PP-OCRv3_rec_infer', det_model_dir='./ocr/det/ch/ch_PP-OCRv3_det_infer', cls_model_dir='./ocr/cls/ch_ppocr_mobile_v2.0_cls_infer')

Text recognition code:

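import re

def getText(img):
    # Detect and recognize text; the angle classifier is skipped (cls=False)
    res = ocr.ocr(img, det=True, cls=False)
    pattern = re.compile('[0-9]+')
    qgtime = '暂无数据'  # default value, meaning "no data yet"
    isYin = ''
    # Each item is [box, (text, confidence)]. Newer PaddleOCR releases wrap the
    # lines in one list per image; in that case iterate over res[0] instead.
    for i in res:
        text = i[1][0]
        # Test time: a line with digits plus "小时" (hours) or "天" (days)
        if ("小时" in text or "天" in text) and pattern.search(text):
            qgtime = text
        # Test result: a line with "阴" (negative) or "阳" (positive) plus "性"
        if ("阴" in text or "阳" in text) and "性" in text:
            isYin = text
    return qgtime, isYin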
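
    getText returns the matched lines as raw strings. If structured values are more convenient, the matching step could be sketched as below. parse_fields, TIME_RE, and RESULT_RE are names I made up, the input strings are invented samples rather than real OCR output, and unlike getText this version keeps the first match instead of the last.

```python
import re

TIME_RE = re.compile(r"(\d+)\s*(小时|天)")  # e.g. "48小时" (hours) or "3天" (days)
RESULT_RE = re.compile(r"[阴阳]性")          # "阴性" = negative, "阳性" = positive


def parse_fields(texts):
    """Return (hours_since_test, result) from a list of recognized text lines."""
    hours = None
    result = None
    for text in texts:
        time_match = TIME_RE.search(text)
        if time_match and hours is None:
            value = int(time_match.group(1))
            # Normalize days to hours so the two units compare directly
            hours = value * 24 if time_match.group(2) == "天" else value
        result_match = RESULT_RE.search(text)
        if result_match and result is None:
            result = "negative" if result_match.group(0) == "阴性" else "positive"
    return hours, result


# Invented sample lines standing in for OCR output
print(parse_fields(["核酸检测", "48小时", "阴性"]))  # (48, 'negative')
```

    With a flat OCR result, the input list could be built as [line[1][0] for line in res].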

Upload file interface

    Users upload an image file to a Flask endpoint, which runs the recognition. The upload endpoint needs a route and a simple cross-origin (CORS) setting, which is convenient for debugging, and it also checks the file extension to make sure only permitted file types are uploaded.

# Check whether the file extension is allowed
def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1] in ALLOWED_EXTENSIONS

@app.route('/detect', methods=['POST'], strict_slashes=False)
@cross_origin(supports_credentials=True)
def dataDectect():
    starttime = datetime.datetime.now()
    file_dir = app.config['UPLOAD_FOLDER']  # upload directory
    if not os.path.exists(file_dir):
        os.makedirs(file_dir)  # create the directory if it does not exist
    f = request.files['img']  # uploaded file; 'img' is the name of the form field
    if f and allowed_file(f.filename):  # only accept permitted file types
        fname = f.filename
        ext = fname.rsplit('.', 1)[1]  # file extension
        unix_time = int(time.time())
        new_filename = str(unix_time) + '.' + ext  # rename the file to a timestamp
        filePath = os.path.join(file_dir, new_filename)
        f.save(filePath)  # save the file to the upload directory

        img = cv2.imread(filePath)
        codeName = webchatQrDetect(img)  # QR/color detection (function not shown in this article)

        qrtime, isYin = getText(img)
        endtime = datetime.datetime.now()
        duringtime = endtime - starttime
        os.remove(filePath)
        # Response keys: "运行时间" = run time, "核酸时间" = nucleic acid test time,
        # "状态" = test result, "健康码" = health code; "上传成功" = upload succeeded
        return jsonify({"运行时间": str(round(duringtime.total_seconds(), 3)) + 's', "msg": "上传成功", "核酸时间": qrtime, "状态": isYin, "健康码": codeName})
    else:
        return jsonify({"msg": "上传失败"})  # "上传失败" = upload failed
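
    Two details of the upload handling are worth tightening: the extension check is case-sensitive, so a file such as photo.JPG would be rejected if only lowercase extensions are listed, and a filename based on a one-second timestamp can collide when two uploads arrive in the same second. A minimal sketch of an alternative follows; safe_upload_name is a hypothetical helper and the contents of ALLOWED_EXTENSIONS are assumed, since the article does not show them.

```python
import uuid

ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}  # assumed; not shown in the article


def safe_upload_name(filename):
    """Return a random name with a normalized extension, or None if not allowed."""
    if "." not in filename:
        return None
    ext = filename.rsplit(".", 1)[1].lower()  # compare extensions case-insensitively
    if ext not in ALLOWED_EXTENSIONS:
        return None
    # A random UUID avoids collisions between uploads in the same second
    return uuid.uuid4().hex + "." + ext


print(safe_upload_name("health_code.JPG"))  # random hex name ending in ".jpg"
print(safe_upload_name("notes.txt"))        # None
```

    Because the stored name never includes the client's original filename, this also sidesteps path tricks hidden in uploaded filenames.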

Results

    The interface can be tested by calling it from Postman.

Origin juejin.im/post/7142433372457926669