Installing, Configuring, and Testing Tesseract-OCR (4.0.0-beta.3) for Image Text Recognition

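Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/dmt742055597/article/details/81181876
  • Install dependencies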

yum install -y autoconf automake libtool libjpeg libpng libtiff zlib libjpeg-devel libpng-devel libtiff-devel zlib-devel
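Before building, it can help to confirm that the build tools are actually on the PATH. The sketch below is only an illustration: `missing_tools` and `check_build_tools` are hypothetical helper names, and the tool list is an assumption (it adds `make`, `g++`, and `pkg-config`, which the yum line above does not install explicitly but which a source build normally needs).

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed tool list for this build; adjust as needed.
check_build_tools() {
    missing="$(missing_tools autoconf automake libtoolize make g++ pkg-config)"
    if [ -n "$missing" ]; then
        # $missing is left unquoted on purpose so the names print on one line.
        echo "Missing build tools:" $missing
        return 1
    fi
    echo "All build tools found"
}
```

Calling `check_build_tools` lists whatever is missing. On CentOS the missing pieces are typically provided by the `gcc-c++`, `make`, and `pkgconfig` packages.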

  • Install Leptonica

Install the latest version (1.76.0). Download page: http://www.leptonica.org/download.html

Download it directly:

cd /usr/local/src

wget http://www.leptonica.org/source/leptonica-1.76.0.tar.gz

Extract:

tar -zxvf leptonica-1.76.0.tar.gz

Build and install:

cd leptonica-1.76.0

./configure

make

make install

ldconfig

  • Install Tesseract-OCR

Install the latest version (4.0.0-beta.3). Download page: https://github.com/tesseract-ocr/tesseract/releases

Download it directly:

wget -O tesseract-4.0.0-beta.3.tar.gz https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.3.tar.gz

Extract:

tar -zxvf tesseract-4.0.0-beta.3.tar.gz

Build and install:

cd tesseract-4.0.0-beta.3

./configure

This fails with:

Missing autoconf-archive. Check the build requirements

The autoconf-archive package is missing. Install it:

yum install autoconf-archive

Run: ./autogen.sh

With that error resolved, run: ./configure

This fails with:

error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package

Solution:

Reference: https://blog.csdn.net/xjmxym/article/details/79040514

After following the steps in that document, run:

./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib

make && make install
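A common cause of the Leptonica version error is that pkg-config cannot find the lept.pc file that a source install of Leptonica places under /usr/local/lib/pkgconfig. The following is a minimal sketch under that assumption, and not necessarily what the referenced document does. `version_ge` and `check_leptonica` are hypothetical helper names, and `sort -V` assumes GNU coreutils.

```shell
# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: make /usr/local visible to pkg-config, then check
# that the Leptonica version it reports is at least 1.74.
check_leptonica() {
    # The export persists in the current shell, so a later ./configure sees it.
    export PKG_CONFIG_PATH="/usr/local/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
    found="$(pkg-config --modversion lept 2>/dev/null)" || {
        echo "lept.pc not found on PKG_CONFIG_PATH"
        return 1
    }
    if version_ge "$found" "1.74"; then
        echo "Leptonica $found is new enough"
    else
        echo "Leptonica $found is too old"
        return 1
    fi
}
```

If `check_leptonica` succeeds in the same shell session, rerunning `./configure` should be able to pick up the Leptonica installed under /usr/local.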

 

If you run into the following problem:

./configure: line 4250: syntax error near unexpected token `-mavx,'

./configure: line 4250: `AX_CHECK_COMPILE_FLAG(-mavx, avx=true, avx=false)'

Solution:

Reference:

https://github.com/tesseract-ocr/tesseract/issues/777#issuecomment-288116640
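This kind of syntax error usually means the configure script was generated while autoconf-archive was missing, so the AX_CHECK_COMPILE_FLAG macro was copied into the script verbatim instead of being expanded. A minimal sketch, assuming that is the cause here: the helpers below (hypothetical names) detect leftover AX_ macros and regenerate the script, which only helps once autoconf-archive is installed.

```shell
# Hypothetical helper: succeed if the script at $1 still contains raw
# AX_* macro calls at the start of a line.
has_unexpanded_ax_macros() {
    grep -q '^[[:space:]]*AX_[A-Z0-9_]*(' "$1"
}

# Regenerate ./configure in the current source directory if it is stale.
regenerate_if_needed() {
    if has_unexpanded_ax_macros ./configure; then
        echo "configure contains unexpanded AX_ macros; rerunning autogen.sh"
        ./autogen.sh
    else
        echo "configure looks fine"
    fi
}
```

Run `regenerate_if_needed` from inside the tesseract-4.0.0-beta.3 directory, then run `./configure` again.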

  • List the languages Tesseract supports

tesseract --list-langs

This fails with:

Error opening data file /usr/local/share/tessdata/eng.traineddata

Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.

Failed loading language 'eng'

Tesseract couldn't load any languages!

Solution:

Download the complete tessdata_fast set from GitHub and upload it to the /usr/local/share/ directory, rename tessdata_fast to tessdata, then run:

/usr/local/bin/tesseract /usr/local/apache/htdocs/uploads/images/test.jpg /usr/local/apache/htdocs/uploads/images/test -l chi_sim

Or:

tesseract /usr/local/apache/htdocs/uploads/images/test.jpg /usr/local/apache/htdocs/uploads/images/test -l chi_sim

Either command generates the file test.txt.
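The download-and-rename fix for the missing eng.traineddata file can also be scripted. The sketch below is an illustration only: `install_tessdata_fast` and `has_languages` are hypothetical helper names, and it assumes `git` is installed and that cloning the tessdata_fast repository is acceptable in place of a manual download and upload.

```shell
# Hypothetical helper: fetch the tessdata_fast models and install them as
# $1/tessdata (default parent directory: /usr/local/share).
install_tessdata_fast() {
    share_dir="${1:-/usr/local/share}"
    git clone --depth 1 https://github.com/tesseract-ocr/tessdata_fast.git \
        "$share_dir/tessdata_fast" || return 1
    # Keep any existing directory as a backup rather than overwriting it.
    if [ -d "$share_dir/tessdata" ]; then
        mv "$share_dir/tessdata" "$share_dir/tessdata.bak" || return 1
    fi
    mv "$share_dir/tessdata_fast" "$share_dir/tessdata"
}

# Succeed if directory $1 holds a model file for every language named after it.
has_languages() {
    dir="$1"
    shift
    for lang in "$@"; do
        [ -f "$dir/$lang.traineddata" ] || { echo "missing: $lang"; return 1; }
    done
}
```

After `install_tessdata_fast`, a call such as `has_languages /usr/local/share/tessdata eng chi_sim` confirms that the models used in this article are present, and `tesseract --list-langs` should then list them.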

The following messages can be ignored:

Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica

Warning. Invalid resolution 0 dpi. Using 70 instead.

Estimating resolution as 251
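To run the same command over several images, a small loop is enough. This is a sketch under assumptions, not part of the original setup: `output_base` and `ocr_directory` are hypothetical helper names, and the loop only handles files ending in .jpg.

```shell
# Hypothetical helper: derive tesseract's output base from an image path
# by stripping the file extension (tesseract appends .txt itself).
output_base() {
    printf '%s\n' "${1%.*}"
}

# Run OCR on every .jpg in directory $1 using language $2 (default: chi_sim).
ocr_directory() {
    dir="$1"
    lang="${2:-chi_sim}"
    for img in "$dir"/*.jpg; do
        [ -f "$img" ] || continue   # skip the literal pattern when nothing matches
        tesseract "$img" "$(output_base "$img")" -l "$lang"
    done
}
```

For example, `ocr_directory /usr/local/apache/htdocs/uploads/images` would write one .txt file next to each image in that directory.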

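  • Summary

Across the whole installation and configuration process, the hardest part was not the installation itself but working out how to solve the problems that came up, where to look for answers, and what to do when no answer could be found.

Step 4 really did trouble me for a whole day. Baidu, Google, and the GitHub issues all failed to turn up a solution, and just as I was about to give up, I decided to make one last attempt.

I replaced the tessdata directory under /usr/local/share with the entire tessdata_fast folder downloaded from GitHub, renamed it to tessdata, and to my surprise it worked. I was moved to tears.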

 

   ************************As long as your thinking does not slip, there are always more solutions than problems************************
