Python Programming: Extracting Table Data with Baidu OCR (Text Recognition)

Copyright notice: this is an original article by the blogger. Reposting is welcome; please credit the source: https://blog.csdn.net/mouday/article/details/85060902

Baidu OCR (text recognition) documentation:
https://ai.baidu.com/docs#/OCR-Python-SDK/top

Install the SDK

pip install baidu-aip

First create an application on the Baidu AI platform to obtain its AppID, API Key and Secret Key.
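Rather than hardcoding the three credentials into the script, they can be read from environment variables. This is a minimal sketch; load_baidu_credentials and the variable names BAIDU_APP_ID, BAIDU_API_KEY and BAIDU_SECRET_KEY are hypothetical choices, not something the SDK itself looks for.

```python
import os
from typing import Tuple

# Hypothetical variable names; the Baidu SDK does not read these itself
ENV_NAMES = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")


def load_baidu_credentials() -> Tuple[str, str, str]:
    """Return (APP_ID, API_KEY, SECRET_KEY) read from the environment."""
    missing = [name for name in ENV_NAMES if not os.environ.get(name)]
    if missing:
        # Fail early with a clear message instead of an authentication error later
        raise KeyError("missing environment variables: " + ", ".join(missing))
    app_id, api_key, secret_key = (os.environ[name] for name in ENV_NAMES)
    return app_id, api_key, secret_key
```

The values can then be passed straight to the client: APP_ID, API_KEY, SECRET_KEY = load_baidu_credentials(), followed by client = AipOcr(APP_ID, API_KEY, SECRET_KEY).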

The table image to recognize, names.png (the image itself was not preserved):

Code example

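from aip import AipOcr

# Your APPID, AK (API Key) and SK (Secret Key) from the application created above
APP_ID = 'your App ID'
API_KEY = 'your Api Key'
SECRET_KEY = 'your Secret Key'

client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

# Read the table image as raw bytes
with open("names.png", "rb") as f:
    image = f.read()

# General text recognition; the returned dict lists each recognized line under "words_result"
result = client.basicGeneral(image)
print(result)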
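basicGeneral() returns a plain dict, so the recognized strings have to be pulled out of words_result. The helper below is a minimal sketch of that step; extract_words is a hypothetical name, and it assumes that a failed call returns error_code and error_msg keys instead of words_result.

```python
from typing import Any, Dict, List


def extract_words(result: Dict[str, Any]) -> List[str]:
    """Return the recognized text lines from a basicGeneral() result dict."""
    # Assumption: an error response carries error_code / error_msg instead of words_result
    if "error_code" in result:
        raise RuntimeError(f"OCR failed: {result['error_code']} {result.get('error_msg', '')}")
    # Each item of words_result holds one recognized line under the "words" key
    return [item["words"] for item in result.get("words_result", [])]


# The first three entries of the recognition result shown below
sample = {"words_result": [{"words": "表格1:"}, {"words": "姓名"}, {"words": "年龄"}]}
print(extract_words(sample))  # ['表格1:', '姓名', '年龄']
```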

Recognition result:

{
    "log_id":3213553909522465362,
    "words_result_num":20,
    "words_result":[
        {
            "words":"表格1:"
        },
        {
            "words":"姓名"
        },
        {
            "words":"年龄"
        },
        {
            "words":"性别"
        },
        {
            "words":"李雷"
        },
        {
            "words":"20男"
        },
        {
            "words":"韩梅梅"
        },
        {
            "words":"23女"
        },
        {
            "words":"赵小三"
        },
        {
            "words":"25女"
        },
        {
            "words":"Table2."
        },
        {
            "words":"Name"
        },
        {
            "words":"ge"
        },
        {
            "words":"Gender"
        },
        {
            "words":"Tom"
        },
        {
            "words":"30 Male"
        },
        {
            "words":"Jack"
        },
        {
            "words":"33 Male"
        },
        {
            "words":"one"
        },
        {
            "words":"31Female"
        }
    ]
}

The result is not very satisfying: age and gender were merged into a single entry (for example "20男" and "30 Male").
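In this particular image every row follows the same pattern, a name line followed by a merged age-and-gender line, so the merged entries can be split afterwards with a regular expression. The sketch below assumes exactly that layout, with one title line and three column headers per table; split_age_gender, rows_from_words and the header_lines parameter are hypothetical names, and the two word lists are copied from the recognition result above.

```python
import re
from typing import List, Tuple

# A leading age followed by the gender text, with or without a space:
# matches "20男", "30 Male" and "31Female"
AGE_GENDER = re.compile(r"^\s*(\d+)\s*(\S.*?)\s*$")


def split_age_gender(cell: str) -> Tuple[str, str]:
    """Split one merged OCR entry into (age, gender)."""
    m = AGE_GENDER.match(cell)
    if m is None:
        raise ValueError(f"cannot split {cell!r}")
    return m.group(1), m.group(2)


def rows_from_words(words: List[str], header_lines: int = 4) -> List[Tuple[str, str, str]]:
    """Rebuild (name, age, gender) rows from one table's flat OCR lines.

    Assumes a title line plus three column headers, then alternating
    name / merged "age gender" lines, as in this example.
    """
    body = words[header_lines:]
    rows = []
    # Pair each name line with the merged line that follows it
    for name, merged in zip(body[0::2], body[1::2]):
        age, gender = split_age_gender(merged)
        rows.append((name, age, gender))
    return rows


table1 = ["表格1:", "姓名", "年龄", "性别", "李雷", "20男", "韩梅梅", "23女", "赵小三", "25女"]
table2 = ["Table2.", "Name", "ge", "Gender", "Tom", "30 Male", "Jack", "33 Male", "one", "31Female"]
for row in rows_from_words(table1) + rows_from_words(table2):
    print(row)
```

This only repairs the merged entries. OCR misreads such as "ge" for the Age header or "one" in the last row of Table 2 pass through unchanged, and an image with a different layout would need a different grouping rule.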
