How to quickly integrate Baidu Brain text recognition technology into a useful utility

I. General Overview

This article describes Cloud Cat OCR, a cloud-based OCR program I developed on top of Baidu AI: its main functions, its performance and evaluation, and a walkthrough of the core code. Because it merges several posts into one, it is rather long. I hope you will read it patiently, and that there is something in it for everyone.

This article is divided into the following parts:
The first part introduces the Cloud Cat OCR software, with the developer personally presenting its main features. Benchmarked against OCR software such as ABBYY, Cloud Cat OCR offers more comprehensive functions and is easier to use; above all, it is currently completely free for everyone. Because it is a trial version, there may be some bugs, so please bear with them when you use it. Download post for the Cloud Cat demo version:   https://ai.baidu.com/forum/topic/show/955975
The second part explains how Cloud Cat OCR is implemented on top of Baidu OCR. It also shows part of the software's core code, as a reference to help everyone build more creative products.
The third part covers the use of Cloud Cat OCR and an evaluation of its results. Because Cloud Cat OCR was developed around the end of 2017, it does not use Baidu's latest OCR interfaces. If Cloud Cat gets everyone's support, I will consider developing a new version that integrates more of Baidu's latest AI interfaces. I hope you will cheer me on.
The last part is an appendix: my walkthrough of code written against Baidu OCR using the latest Baidu handwriting recognition interface, packaged here for your reference.

Part One: The Cloud Cat OCR Software

I. Introduction to Cloud Cat OCR

   Cloud Cat OCR is a piece of software based on Baidu Cloud's OCR algorithms, developed by "the onslaught of a fox". It is written in C# and runs on the Windows platform. The interfaces it calls are mainly General Text Recognition, General Text Recognition (high precision), and Table Recognition.

II. Main functions currently implemented in Cloud Cat OCR

1. Batch text recognition of pictures: you can preview each picture, apply line wrapping and indentation to the recognition result, and control QPS concurrency (the QPS function is temporarily shelved because of timeout problems on Baidu Cloud);

2. Batch recognition of table pictures: the recognition result can be opened automatically, or the user can open the save directory directly;

3. PDF-to-picture conversion: on my laptop (i7 processor / 8 GB memory / 128 GB SSD), the PDF conversion module uses no more than 400 MB of memory and takes about 2 minutes to convert a PDF file of more than 500 pages. The folder of conversion results can be opened with one click;

4. Skin support: two skins are included;

5. The API Key and Secret Key can be configured;

6. Recognition can be stopped midway;

7. The same picture can be recognized again after the settings are changed;

8. Multiple languages are supported;

9. Other functions, such as recognition statistics, font size control, saving recognition results as RTF files from the right-click menu, selecting all and copying the recognition results, and so on.

III. Demo post link

http://ai.baidu.com/forum/topic/show/492371

IV. Cloud Cat OCR demo video link

https://v.qq.com/x/page/r0564n4a87e.html

I suggest watching at 1.2x or 1.5x speed, because I speak rather slowly.

Part Two: How Cloud Cat OCR Is Implemented on Baidu OCR

I. Overview

   Cloud Cat OCR is a piece of software that runs on the Windows platform and is built on Baidu AI. I developed it in C# in the Visual Studio 2017 integrated development environment, using the SDK package approach. During development you will need to refer to Baidu's technical documentation.

Baidu Cloud text recognition technical documentation:

https://cloud.baidu.com/doc/OCR/index.html

II. Preparation

First, download the latest Baidu text recognition SDK package.

The C# SDK package can be downloaded from the following address:

http://ai.baidu.com/sdk#ocr

After the download finishes, unzip it; the latest package is in the net45 folder.

Open Visual Studio 2017 and select New Project. Because I use a console project for this walkthrough, choose New Project > C# Console Application. Once the project has been created, add a reference in the project to the SDK package downloaded above.

III. Core code walkthrough

(A) Calling Baidu OCR to recognize the text in a picture; the result is returned as JSON

The code is as follows:

using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.IO;
using System.Drawing;
using System.Collections.Generic;
using System.Linq;

namespace myOCRDemo
{
    class Program
    {
        public static void GeneralBasicDemo()
        {
            // Set the API Key (AK) and Secret Key (SK)
            var API_KEY = "your API Key";
            var SECRET_KEY = "your Secret Key";
            // Create the client object
            var client = new Baidu.Aip.Ocr.Ocr(API_KEY, SECRET_KEY);
            client.Timeout = 60000;  // change the timeout
            var image = File.ReadAllBytes("path to your picture file");
            // Call General Text Recognition on a local picture. Network and other
            // exceptions may be thrown, so wrap the call in try/catch.
            var result = client.GeneralBasic(image);
            Console.WriteLine(result);
        }

        static void Main(string[] args)
        {
            GeneralBasicDemo();
            Console.Read();
        }
    }
}

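Note: in actual development, replace the API Key and Secret Key above with your own. For how to apply for and view these two keys, see my evaluation post:

http://ai.baidu.com/forum/topic/show/955989

Also remember to change the picture file path to the path of your own picture. Below is a sample recognition result:

The original picture:

(B) Parsing the JSON to turn the recognition result into more readable plain text

The code is as follows: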

using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.IO;
using System.Drawing;
using System.Collections.Generic;
using System.Linq;

namespace myOCRDemo
{
    class Program
    {
        public static void GeneralBasicDemo()
        {
            // Set the API Key (AK) and Secret Key (SK)
            var API_KEY = "your API Key";
            var SECRET_KEY = "your Secret Key";
            // Create the client object
            var client = new Baidu.Aip.Ocr.Ocr(API_KEY, SECRET_KEY);
            client.Timeout = 60000;  // change the timeout
            var image = File.ReadAllBytes(@"path to your picture");
            // Call General Text Recognition on a local picture. Network and other
            // exceptions may be thrown, so wrap the call in try/catch.
            var result = client.GeneralBasic(image);
            // Parse the JSON: words_result_num is the number of recognized lines,
            // words_result[i].words is the text of line i
            JObject jo = (JObject)JsonConvert.DeserializeObject(result.ToString());
            int num = (int)jo["words_result_num"];
            string[] words = new string[num];
            for (int i = 0; i < num; i++)
                words[i] = jo["words_result"][i]["words"].ToString();
            // Build the return value, one recognized line per output line
            string txtOCR = null;
            for (int i = 0; i < num; i++)
                txtOCR += words[i] + "\n";
            // Show the result
            Console.WriteLine(txtOCR);
        }

        static void Main(string[] args)
        {
            GeneralBasicDemo();
            Console.Read();
        }
    }
}

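The program output is as follows:

This is much closer to how people read. The code above is also the core foundation: you can build optimizations on top of it, such as automatic line wrapping, automatic indentation, automatically adjusting punctuation to the conventions of the language, and so on.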
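The optimizations mentioned above could start from something like the following. This is only a minimal sketch, not the code used in Cloud Cat OCR: the class name OcrTextFormatter, the rule "a line ending in sentence punctuation closes a paragraph", and the indent string are all assumptions made for illustration. It takes the recognized lines (the words array from the listing above) and merges them into indented paragraphs.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: not part of the Baidu SDK or of Cloud Cat OCR.
public static class OcrTextFormatter
{
    // Characters treated as "end of sentence" (an assumption of this sketch):
    // ideographic full stop, fullwidth ! and ?, and their ASCII counterparts.
    private static readonly char[] SentenceEnd =
        { '\u3002', '\uFF01', '\uFF1F', '.', '!', '?' };

    // Joins consecutive OCR lines into paragraphs. A line that ends with
    // sentence-ending punctuation closes the current paragraph.
    public static List<string> MergeLines(IEnumerable<string> lines)
    {
        var paragraphs = new List<string>();
        var current = new StringBuilder();
        foreach (string raw in lines)
        {
            string line = (raw ?? "").Trim();
            if (line.Length == 0) continue;
            // Put a space between two ASCII fragments (e.g. English words);
            // CJK text is joined directly.
            if (current.Length > 0 && current[current.Length - 1] < 128 && line[0] < 128)
                current.Append(' ');
            current.Append(line);
            if (Array.IndexOf(SentenceEnd, line[line.Length - 1]) >= 0)
            {
                paragraphs.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0) paragraphs.Add(current.ToString());
        return paragraphs;
    }

    // Prefixes every paragraph with the given indent and ends each with a newline.
    public static string Indent(IEnumerable<string> paragraphs, string indent)
    {
        var sb = new StringBuilder();
        foreach (string p in paragraphs)
            sb.Append(indent).Append(p).Append('\n');
        return sb.ToString();
    }
}

public static class FormatterDemo
{
    public static void Main()
    {
        string[] words = { "OCR splits long", "sentences.", "Second paragraph." };
        List<string> paragraphs = OcrTextFormatter.MergeLines(words);
        Console.Write(OcrTextFormatter.Indent(paragraphs, "    "));
    }
}
```

A punctuation-based rule is simple but will mis-merge headings and list items that do not end in punctuation; a real implementation would probably also look at line positions, which the with-position interfaces return.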

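(C) Table recognition

Programming against Baidu's table text recognition is somewhat more involved and takes two steps. The first step submits a table text recognition request and obtains a requestId. The second step uses the requestId to fetch the recognition result, which by default is an Excel file; the JSON result contains a download address.

In addition to these two steps, my program also includes code that automatically downloads the Excel file to the local computer, for your reference. Note that the program must wait between submitting the request and fetching the result, otherwise the download URL cannot be obtained. In my tests, a delay of 3 seconds or more works well; a delay under 3 seconds may fail.

The code is as follows: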

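        /// <summary>
        /// Table text recognition
        /// </summary>
        // Note: WebClient below needs "using System.Net;" in addition to the
        // using directives shown in the earlier listings.
        public static void myTableRecognitionRequestDemo()
        {
            // Set the API Key (AK) and Secret Key (SK)
            var API_KEY = "your API Key";
            var SECRET_KEY = "your Secret Key";
            // Create the client object
            var client = new Baidu.Aip.Ocr.Ocr(API_KEY, SECRET_KEY);
            client.Timeout = 60000;  // change the timeout
            var image = File.ReadAllBytes(@"F:\表格图片1.jpg"); // change this to the path of your table picture
            // Call table text recognition. Network and other exceptions may be
            // thrown, so wrap the call in try/catch.
            var result = client.TableRecognitionRequest(image);
            // Parse the JSON
            JObject jo = (JObject)JsonConvert.DeserializeObject(result.ToString());
            string requestId = jo["result"][0]["request_id"].ToString();
            Console.WriteLine("requestId obtained: " + requestId);
            // Wait 3 seconds; this line is required
            System.Threading.Thread.Sleep(3000);
            // Fetch the table recognition result.
            // Sometimes no link is returned and several attempts are needed.
            var resultExcel = client.TableRecognitionGetResult(requestId);
            Console.WriteLine("The table recognition result is:");
            Console.WriteLine(resultExcel);
            // Parse the JSON to get the link
            JObject joResult = (JObject)JsonConvert.DeserializeObject(resultExcel.ToString());
            string excelURL = joResult["result"]["result_data"].ToString();
            Console.WriteLine("The download address of the Excel file is:\n" + excelURL);
            // Automatically download the Excel file to the computer
            WebClient df = new WebClient();
            df.DownloadFile(excelURL, @"F:\识别结果.xls"); // change this to your download file path
            Console.WriteLine("Download complete");
        }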
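Because the listing above notes that the link is sometimes not returned on the first try, the single fixed delay could be replaced by a small polling loop. The following is a minimal sketch under stated assumptions: Polling and WaitForResult are hypothetical names, and the fetch delegate stands in for "call TableRecognitionGetResult and extract result_data", returning an empty string while the result is not ready. It does not call the Baidu SDK itself.

```csharp
using System;
using System.Threading;

// Hypothetical helper: not part of the Baidu SDK.
public static class Polling
{
    // Calls fetch() until it returns a non-empty string or maxAttempts is reached,
    // sleeping for `delay` before every attempt. Returns null if nothing was obtained.
    public static string WaitForResult(Func<string> fetch, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            Thread.Sleep(delay);
            string value = fetch();
            if (!string.IsNullOrEmpty(value)) return value;
        }
        return null;
    }
}

public static class PollingDemo
{
    public static void Main()
    {
        int calls = 0;
        // Stand-in for the real request: the result only becomes
        // available on the third call.
        Func<string> fakeFetch = () => ++calls < 3 ? "" : "https://example.invalid/result.xls";
        string url = Polling.WaitForResult(fakeFetch, 5, TimeSpan.FromMilliseconds(10));
        Console.WriteLine(url + " after " + calls + " calls");
    }
}
```

In the author's program, the delegate would wrap the TableRecognitionGetResult call and the JSON parsing, and the delay would be the 3 seconds found by testing; the caller should treat a null return as "give up and report an error".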

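The test picture used by the author:

Screenshot of the table text recognition result:

Closing note: the sample code in this article is all up to date and consistent with the code in the Baidu SDK documentation. Cloud Cat OCR itself was written at the end of 2017 and its code is somewhat dated, so I have not posted its source code directly.

Original post for the code part: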

http://ai.baidu.com/forum/topic/show/956037

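Part Three: Using Cloud Cat OCR and Evaluating Its Results

I. Overview

   I first came into contact with the Baidu Cloud service platform in 2017; here I also call it Baidu AI. Using the function interfaces provided by Baidu AI, I wrote my own OCR program, Cloud Cat OCR. Most of its code was finished before the end of 2017. It stayed shelved until now because of personal matters (the birth of my child, among others). I develop software in my spare time, so the project was interrupted for a little over a year, and only now do I have time to continue it.

Original post for the evaluation part:

http://ai.baidu.com/forum/topic/show/955989

II. Evaluation details

(A) Preparation

   Before using Cloud Cat OCR, you must first register an account on the Baidu Cloud official website. With an account, you then apply for an API Key and a Secret Key under the specific cloud service. These two keys are kept by each user personally and should not be disclosed to others. Baidu Cloud now formally charges for its services, each user has a limited number of free calls per day, and raising the quota costs money. Since these two keys are what identify a user's calls to the Baidu Cloud AI interfaces, keep them safe. The pictures below briefly illustrate the preparation:

(B) Using Cloud Cat OCR

Once you have a Baidu Cloud API Key and Secret Key, you can start using Cloud Cat OCR. The steps are as follows:

(C) What was evaluated

   First, the main Baidu AI interfaces that Cloud Cat OCR calls: General Text Recognition (with position), General Text Recognition (high precision, with position), and Table Text Recognition. These three kinds of recognition are covered in turn below.

1. Combined use of General Text Recognition (with position) and General Text Recognition (high precision, with position)

As shown in the picture above, the user can choose among several languages (including German, French, Spanish, and others) and then click Text Recognition. Because the high-precision interface provided by Baidu Cloud supports only Chinese and English, while the general interface also supports many languages other than Chinese and English, I use the two interfaces together in the software; for how exactly they are combined, see the code part. In general, high-precision recognition gives better results than the general interface, but it also takes longer.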
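The selection rule described above can be sketched as follows. This is not the code from Cloud Cat OCR, which is not reproduced in this article; InterfaceSelector and OcrInterface are hypothetical names, and the language codes ("CHN_ENG", "ENG", "GER", and so on) are assumed to follow the language_type values in Baidu's documentation, so check them against the current documentation before relying on them.

```csharp
using System;

// Hypothetical names for this sketch; not part of the Baidu SDK.
public enum OcrInterface { General, Accurate }

public static class InterfaceSelector
{
    // Returns the high-precision interface for Chinese/English and the general
    // interface for every other language, following the rule described in the text.
    public static OcrInterface Choose(string languageType)
    {
        string lang = (languageType ?? "").Trim().ToUpperInvariant();
        bool chineseOrEnglish = lang == "CHN_ENG" || lang == "ENG";
        return chineseOrEnglish ? OcrInterface.Accurate : OcrInterface.General;
    }
}

public static class InterfaceSelectorDemo
{
    public static void Main()
    {
        // Assumed language codes, used only for illustration.
        foreach (string lang in new[] { "CHN_ENG", "ENG", "GER", "FRE", "SPA" })
            Console.WriteLine(lang + " -> " + InterfaceSelector.Choose(lang));
    }
}
```

The caller would then invoke the matching SDK method for the chosen interface and pass the language only when the general interface is used.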

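The software can save the recognized text to a file on the local machine, as shown in the picture below:

The saved file is in RTF format and can be opened with WPS or Office Word. Next is a picture of the statistics for recognizing 20 pictures in one run:

As the picture above shows, Baidu Cloud's text recognition is reasonably fast: on average about 2-3 seconds per picture.

2. Table text recognition

The main steps of table text recognition are shown in the picture below:

The software automatically saves the recognition result as an Excel file and opens it, as shown:

As the picture above shows, table text recognition is somewhat slower than ordinary text recognition, taking about 5-6 seconds.

Evaluation summary: Baidu OCR recognizes printed text well, and compared with earlier OCR software it can be called a revolutionary step forward. Of course, it also has weak points. One is handwriting: I have not evaluated it formally, but the Baidu Cloud high-precision general text interface recognizes handwriting rather poorly. Another is QPS concurrency, which as I understand it can speed up OCR and is especially important when recognizing large numbers of pictures, where it can save a great deal of time. Unfortunately, Baidu Cloud does not seem to handle concurrency very well, and a program cannot count on QPS concurrency being supported. I hope Baidu improves this later.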
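For readers who want to experiment with concurrency on the client side, a bounded-concurrency batch could look like the sketch below. It is an illustration only: ThrottledBatch is a hypothetical helper, the fake OCR delegate stands in for a real SDK call, and limiting the number of calls in flight is related to, but not the same as, a queries-per-second limit. Whether the server accepts the resulting request rate depends on the QPS quota of the account.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: not part of the Baidu SDK or of Cloud Cat OCR.
public static class ThrottledBatch
{
    // Runs `work` for every item with at most maxConcurrency calls in flight.
    // Results are returned in the same order as the input items.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items, Func<TItem, TResult> work, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            List<Task<TResult>> tasks = items.Select(async item =>
            {
                await gate.WaitAsync();            // wait for a free slot
                try { return await Task.Run(() => work(item)); }
                finally { gate.Release(); }        // free the slot even on failure
            }).ToList();
            return await Task.WhenAll(tasks);
        }
    }
}

public static class ThrottledBatchDemo
{
    public static void Main()
    {
        string[] files = { "a.jpg", "b.jpg", "c.jpg", "d.jpg" };
        // Stand-in for an OCR call: pretend each picture takes 100 ms.
        Func<string, string> fakeOcr = f => { Thread.Sleep(100); return "text of " + f; };
        string[] results = ThrottledBatch.RunAsync(files, fakeOcr, 2).GetAwaiter().GetResult();
        foreach (string r in results) Console.WriteLine(r);
    }
}
```

If any call throws, the exception surfaces when the combined task is awaited, so a batch tool would normally catch exceptions inside the delegate and record a per-picture error instead.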

 

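Appendix:

Handwriting recognition in C#

I. Overview

    I use C# to call the Baidu API interface for handwriting recognition, with reference to Baidu's product documentation.

Documentation: https://cloud.baidu.com/doc/OCR/index.html

II. Code and explanation

Most of my source code comes from Baidu's product documentation, but I ran into some trouble along the way. One example is the text encoding: Baidu's sample code uses the Default encoding, which produced garbled output on my machine. After some research I changed the encoding to UTF8, and that solved the problem.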
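The effect can be reproduced without any network call. The sketch below is a minimal illustration, assuming the response bytes are UTF-8 (which is what the fix suggests); ISO-8859-1 is used here only as a stand-in for a system default encoding that is not UTF-8, and the JSON string is made up for the demonstration.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Decodes bytes with the given encoding, so that a right and a wrong
    // choice can be compared side by side.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Bytes as a server would send them for UTF-8 encoded JSON
        // (sample text invented for this demo).
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("{\"words\":\"手写\"}");

        // Wrong: a single-byte encoding turns each Chinese character into
        // several unrelated characters.
        Console.WriteLine(Decode(utf8Bytes, Encoding.GetEncoding("iso-8859-1")));

        // Right: decode with the encoding the bytes were written in.
        Console.WriteLine(Decode(utf8Bytes, Encoding.UTF8));
    }
}
```

Encoding.Default depends on the machine and runtime, which is why the same sample code can work on one computer and show garbled text on another.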

The author's complete source code is as follows:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Net.Http;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.IO;
using System.Drawing;
using System.Web;
using System.Net;

namespace myHandwrite
{
    public static class FileUtils
    {
        /// <summary>
        /// Converts a file to a base64 string
        /// </summary>
        /// <param name="fileName">Path of the file to read</param>
        /// <returns>The file content encoded as base64</returns>
        public static String getFileBase64(String fileName)
        {
            FileStream filestream = new FileStream(fileName, FileMode.Open);
            byte[] arr = new byte[filestream.Length];
            filestream.Read(arr, 0, (int)filestream.Length);
            string baser64 = Convert.ToBase64String(arr);
            filestream.Close();
            return baser64;
        }
    }

    class Program
    {
        // The access_token obtained from getAccessToken() should be cached
        // according to its expires_in time.
        // Sample of a returned token
        public static String TOKEN = "24.adda70c11b9786206253ddb70affdc46.2592000.1493524354.282335-1234567";

        // API Key of the application created for this service in Baidu Cloud;
        // it is a good idea to select several services when creating the application
        private static String clientId = "change this to your API Key";
        // Secret Key of the application created for this service in Baidu Cloud
        private static String clientSecret = "change this to your Secret Key";

        /// <summary>
        /// Gets the access token
        /// </summary>
        /// <returns>The access_token string</returns>
        public static String getAccessToken()
        {
            String authHost = "https://aip.baidubce.com/oauth/2.0/token";
            HttpClient client = new HttpClient();
            List<KeyValuePair<string, string>> paraList = new List<KeyValuePair<string, string>>();
            paraList.Add(new KeyValuePair<string, string>("grant_type", "client_credentials"));
            paraList.Add(new KeyValuePair<string, string>("client_id", clientId));
            paraList.Add(new KeyValuePair<string, string>("client_secret", clientSecret));

            HttpResponseMessage response = client.PostAsync(authHost, new FormUrlEncodedContent(paraList)).Result;
            String result = response.Content.ReadAsStringAsync().Result;
            //Console.WriteLine(result);
            // Code added by the author: extract access_token from the JSON response
            JObject jo = (JObject)JsonConvert.DeserializeObject(result.ToString());
            string myToken = jo["access_token"].ToString();
            Console.WriteLine("Token obtained: " + myToken);
            return myToken;
        }

        /// <summary>
        /// Handwriting recognition
        /// </summary>
        /// <param name="token">Access token obtained from getAccessToken()</param>
        /// <param name="filename">Path of the picture to recognize</param>
        /// <returns>The recognized text, one line per recognized line</returns>
        public static string myHandwriting(string token,string filename)
        {
            //string token = "#####token obtained from the authentication interface#####";
            // Base64 encoding of the picture
            string strbaser64 = FileUtils.getFileBase64(filename);
            string host = "https://aip.baidubce.com/rest/2.0/ocr/v1/handwriting?access_token=" + token;
            Encoding encoding = Encoding.Default;
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(host);
            request.Method = "post";
            request.ContentType = "application/x-www-form-urlencoded";
            request.KeepAlive = true;
            // Some request parameters are added here
            String str = "recognize_granularity=big&image=" + HttpUtility.UrlEncode(strbaser64);
            byte[] buffer = encoding.GetBytes(str);
            request.ContentLength = buffer.Length;
            request.GetRequestStream().Write(buffer, 0, buffer.Length);
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();
            // The output was garbled with the default encoding; after testing,
            // the response has to be read as UTF8
            StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
            string result = reader.ReadToEnd();
            Console.WriteLine("Handwriting recognition:");
            //Console.WriteLine(result);
            // Parse the JSON
            JObject jo = (JObject)JsonConvert.DeserializeObject(result.ToString());
            int num = (int)jo["words_result_num"];
            string[] words = new string[num];
            for (int i = 0; i < num; i++)
                words[i] = jo["words_result"][i]["words"].ToString();
            // Build the return value
            string txtOCR = null;
            for (int i = 0; i < num; i++)
                txtOCR += words[i] + "\n";
            // Show the result
            Console.WriteLine(txtOCR);
            return txtOCR;
        }

        static void Main(string[] args)
        {
            // Change this to the path of your picture
            string filename = @"F:\手写体5.jpg";
            string token = getAccessToken();
            myHandwriting(token,filename);
            Console.Read();
        }
    }
}

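Note: in the code above, replace the API Key and Secret Key with your own, and change the picture path. If the output is garbled, you also need to change the encoding.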
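The comment in the listing recommends caching the access token according to expires_in, but the program fetches a new token on every run. A minimal caching sketch is shown below, under stated assumptions: TokenCache is a hypothetical class, the fetch delegate stands in for a request to the token endpoint that returns the token together with its expires_in value in seconds, and the 60-second safety margin is an arbitrary choice. The class is not thread-safe.

```csharp
using System;

// Hypothetical helper: not part of the Baidu SDK. Not thread-safe.
public sealed class TokenCache
{
    private readonly Func<Tuple<string, int>> _fetch;   // returns (token, expires_in in seconds)
    private readonly Func<DateTime> _utcNow;             // injectable clock, useful for testing
    private string _token;
    private DateTime _refreshAtUtc;

    public TokenCache(Func<Tuple<string, int>> fetch, Func<DateTime> utcNow = null)
    {
        _fetch = fetch;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    // Returns the cached token, fetching a new one only when none is cached
    // or the cached one is within 60 seconds of expiring.
    public string GetToken()
    {
        if (_token == null || _utcNow() >= _refreshAtUtc)
        {
            Tuple<string, int> fresh = _fetch();
            _token = fresh.Item1;
            _refreshAtUtc = _utcNow().AddSeconds(fresh.Item2 - 60);
        }
        return _token;
    }
}

public static class TokenCacheDemo
{
    public static void Main()
    {
        int requests = 0;
        // Stand-in for a real token request; the values are made up.
        var cache = new TokenCache(() => Tuple.Create("token-" + (++requests), 2592000));
        Console.WriteLine(cache.GetToken());
        Console.WriteLine(cache.GetToken());   // served from the cache
        Console.WriteLine("token requests made: " + requests);
    }
}
```

To use it with the program above, getAccessToken() would need to return expires_in from the JSON response alongside access_token; persisting the token between runs (for example in a file) is a further step not shown here.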

The recognition result of the handwriting program is as follows:

The picture file used by the program:

Author: kohakuarc


Origin www.cnblogs.com/AIBOOM/p/12020101.html