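Text recognition on the Mac (Tesseract-OCR for Mac)

0. Introduction

Tesseract is an open-source OCR engine that can recognize more than 100 languages (Chinese, English, Korean, Japanese, German, French, and so on). Its recognition of handwriting, however, is poor.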

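1. Installation

//Install tesseract together with every language. The language packs are large and take a long time to install, so this is not recommended; pick languages as needed instead
brew install --all-languages tesseract
//Install tesseract together with the training tools and all languages
brew install --all-languages --with-training-tools tesseract
//Install tesseract only, without the training tools
brew install tesseract

These options belong to the Homebrew formula of the Tesseract 3.04 era used throughout this article; newer Homebrew releases may not accept them.

Reference: http://khalsa.guru/posts/16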
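
After installation, a quick check can confirm that the binary is on the PATH and show which language data it can see. The sketch below relies only on the --version and --list-langs options of the tesseract command; the function name check_tesseract is purely illustrative.

```shell
#!/usr/bin/env bash
# Report whether tesseract is installed and, if so, its version and languages.
check_tesseract() {
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found on PATH"
    return 1
  fi
  # Some releases write this information to stderr, so merge the streams.
  tesseract --version 2>&1
  tesseract --list-langs 2>&1
}

check_tesseract || echo "install it first, e.g. with: brew install tesseract"
```

If chi_sim does not appear in the language list, the language data from the next section still has to be copied into place.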

2. Download the language data

Download from: https://github.com/tesseract-ocr/tessdata

Pick the language data you need. Here we want Simplified Chinese, so the file is chi_sim.traineddata.
Copy the file into the /usr/local/Cellar/tesseract/3.04.01_2/share/tessdata directory.
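
The copy step can be wrapped in a small helper that checks its inputs first. This is a minimal sketch: install_traineddata is a hypothetical name, and the tessdata path in the example comment is the one quoted in the text, whose version component will differ between installations.

```shell
#!/usr/bin/env bash
# Copy one .traineddata file into a tessdata directory.
# Usage: install_traineddata FILE TESSDATA_DIR
install_traineddata() {
  local src="$1" dest="$2"
  case "$src" in
    *.traineddata) ;;
    *) echo "not a .traineddata file: $src" >&2; return 1 ;;
  esac
  [ -f "$src" ] || { echo "missing file: $src" >&2; return 1; }
  [ -d "$dest" ] || { echo "missing directory: $dest" >&2; return 1; }
  cp "$src" "$dest/"
}

# Example (path taken from the text; adjust the version to your install):
# install_traineddata chi_sim.traineddata /usr/local/Cellar/tesseract/3.04.01_2/share/tessdata
```

As an alternative to copying, the --tessdata-dir PATH option of the tesseract command can point at a directory of your own.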

The data file names and their languages are listed below.

Name	Language
afr	Afrikaans
amh	Amharic
ara	Arabic
asm	Assamese
aze	Azerbaijani
aze_cyrl	Azerbaijani - Cyrillic
bel	Belarusian
ben	Bengali
bod	Tibetan
bos	Bosnian
bul	Bulgarian
cat	Catalan; Valencian
ceb	Cebuano
ces	Czech
chi_sim	Chinese - Simplified
chi_tra	Chinese - Traditional
chr	Cherokee
cym	Welsh
dan	Danish
dan_frak	Danish - Fraktur
deu	German
deu_frak	German - Fraktur
dzo	Dzongkha
ell	Greek, Modern (1453-)
eng	English
enm	English, Middle (1100-1500)
epo	Esperanto
equ	Math / equation detection module
est	Estonian
eus	Basque
fas	Persian
fin	Finnish
fra	French
frk	Frankish
frm	French, Middle (ca.1400-1600)
gle	Irish
glg	Galician
grc	Greek, Ancient (to 1453)
guj	Gujarati
hat	Haitian; Haitian Creole
heb	Hebrew
hin	Hindi
hrv	Croatian
hun	Hungarian
iku	Inuktitut
ind	Indonesian
isl	Icelandic
ita	Italian
ita_old	Italian - Old
jav	Javanese
jpn	Japanese
kan	Kannada
kat	Georgian
kat_old	Georgian - Old
kaz	Kazakh
khm	Central Khmer
kir	Kirghiz; Kyrgyz
kor	Korean
kur	Kurdish
lao	Lao
lat	Latin
lav	Latvian
lit	Lithuanian
mal	Malayalam
mar	Marathi
mkd	Macedonian
mlt	Maltese
msa	Malay
mya	Burmese
nep	Nepali
nld	Dutch; Flemish
nor	Norwegian
ori	Oriya
osd	Orientation and script detection module
pan	Panjabi; Punjabi
pol	Polish
por	Portuguese
pus	Pushto; Pashto
ron	Romanian; Moldavian; Moldovan
rus	Russian
san	Sanskrit
sin	Sinhala; Sinhalese
slk	Slovak
slk_frak	Slovak - Fraktur
slv	Slovenian
spa	Spanish; Castilian
spa_old	Spanish; Castilian - Old
sqi	Albanian
srp	Serbian
srp_latn	Serbian - Latin
swa	Swahili
swe	Swedish
syr	Syriac
tam	Tamil
tel	Telugu
tgk	Tajik
tgl	Tagalog
tha	Thai
tir	Tigrinya
tur	Turkish
uig	Uighur; Uyghur
ukr	Ukrainian
urd	Urdu
uzb	Uzbek
uzb_cyrl	Uzbek - Cyrillic
vie	Vietnamese
yid	Yiddish

3. Using Tesseract

In a terminal, run: tesseract --help

Usage:
  tesseract --help | --help-psm | --version
  tesseract --list-langs [--tessdata-dir PATH]
  tesseract --print-parameters [options...] [configfile...]
  tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  -psm NUM              Specify page segmentation mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.

Single options:
  -h, --help            Show this help message.
  --help-psm            Show page segmentation modes.
  -v, --version         Show version information.
  --list-langs          List available languages for tesseract engine.
  --print-parameters    Print tesseract parameters to stdout.

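General usage:

//Uses the eng language data by default. imgName is the path to the image; result is the outputbase, and the recognized text is written to result.txt
tesseract imgName result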
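
When many images need processing, the outputbase can be derived from each image name. The following is a rough sketch under the assumption that all images are PNG files in one directory; outputbase_for and ocr_directory are hypothetical helper names, not part of Tesseract.

```shell
#!/usr/bin/env bash
# Derive the outputbase for an image by dropping directory and extension.
outputbase_for() {
  local name
  name="$(basename "$1")"
  echo "${name%.*}"
}

# OCR every PNG in a directory; tesseract appends .txt to the outputbase.
ocr_directory() {
  local dir="$1" img
  for img in "$dir"/*.png; do
    [ -e "$img" ] || continue          # the glob matched no PNG files
    tesseract "$img" "$dir/$(outputbase_for "$img")"
  done
}

# Example: ocr_directory ./scans   (writes ./scans/<name>.txt per image)
```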

Specifying a language:

//Use Simplified Chinese
tesseract -l chi_sim imgName result

//List the language data available locally
tesseract --list-langs

Specifying multiple languages:

//Join multiple languages with a + sign
tesseract -l chi_sim+eng imgName result
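
If the language list comes from a script rather than being typed by hand, the LANG[+LANG] argument can be assembled from separate names. A small sketch; join_langs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Join language names with "+" to form the argument for -l.
join_langs() {
  local IFS='+'      # "$*" joins the arguments with the first character of IFS
  echo "$*"
}

# Example: tesseract -l "$(join_langs chi_sim eng)" imgName result
```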

One option deserves special attention: psm, the page segmentation mode. (In the 3.x releases used here the flag is written -psm; Tesseract 4 and later write it --psm.)

//Run this command to list the psm values
tesseract --help-psm

  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.


Choose the psm value to suit the image. This matters: an unsuitable value makes recognition fail.
For example:

1234.png

Commands used:

//Command without a psm value
tesseract 1234.png 1234 -l chi_sim

Output:
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Info in fopenReadFromMemory: work-around: writing to a temp file
Empty page!!
Empty page!!

//Command with the psm value set
tesseract 1234.png 1234 -l chi_sim -psm 6

Recognized successfully:
一二三四
一二三四
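
When it is unclear which mode suits an image, several can be tried in one go. The sketch below assumes the 3.x -psm spelling used in this article and uses stdout as the outputbase so that the text is printed rather than written to a file; compare_psm is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Run the given page segmentation modes in turn and show what each recognizes.
# Usage: compare_psm IMAGE LANG MODE [MODE...]
compare_psm() {
  local image="$1" lang="$2" psm
  shift 2
  for psm in "$@"; do
    echo "== psm $psm =="
    # Discard stderr, where tesseract writes its banner and warnings.
    tesseract "$image" stdout -l "$lang" -psm "$psm" 2>/dev/null
  done
}

# Example: compare_psm 1234.png chi_sim 3 6 7
```

A mode that prints nothing under its heading failed on that image, as the default mode did in the example above.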

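4. Language training

Prepare in advance:
1. The training tools. (Running brew install --with-training-tools tesseract when installing tesseract installs them at the same time.)
2. The jTessBoxEditor tool.
3. Training material.

The material prepared here is:

hui.png

yi.png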

Run the commands:

tesseract hui.png hui -l chi_sim -psm 10
Recognition result: 瞧

tesseract yi.png yi -l chi_sim -psm 10
Recognition result: =

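Clearly the bundled chi_sim data does not recognize the two characters 隳 and 易 well. To recognize them, we need to train on them.

1. Merge the material (combine several images into one)
Open jTessBoxEditor and choose Tools -> Merge TIFF... from the menu bar, select the images to merge, and save the result as huiyi.tif.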

2. Generate the box file

//Command
tesseract huiyi.tif huiyi -l chi_sim -psm 10 batch.nochop makebox

This creates a box file named huiyi.box.

Open it in a text editor or Xcode:

瞧 31 37 112 119 0
= 51 86 93 106 1

Change it to:

隳 31 37 112 119 0
易 51 86 93 106 1

Save the file.
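
Each line of a box file holds the character, then the left, bottom, right and top coordinates of its box, then the page number, so only the first field has to change. For larger files the correction can be scripted; the following is a minimal sketch with a hypothetical helper name, relabel_box, that rewrites the label on one line and leaves the coordinates alone.

```shell
#!/usr/bin/env bash
# Replace the character label on one line of a box file, keeping the coordinates.
# Usage: relabel_box FILE LINE_NUMBER NEW_CHARACTER
relabel_box() {
  local file="$1" line="$2" char="$3" tmp
  tmp="$(mktemp)" || return 1
  # Assigning to $1 swaps the label; the fields are re-joined with single spaces.
  awk -v n="$line" -v c="$char" 'NR == n { $1 = c } { print }' "$file" > "$tmp" &&
    mv "$tmp" "$file"
}

# Example matching the manual edit:
# relabel_box huiyi.box 1 隳
# relabel_box huiyi.box 2 易
```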

3. Generate the .tr file

//Command
tesseract huiyi.tif huiyi -psm 10 nobatch box.train

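4. Generate the unicharset file

//Command
unicharset_extractor huiyi.box

Note that unicharset_extractor is one of the commands bundled with the training tools. If the shell reports that the command was not found, the training tools are not installed.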
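
All of the training commands used in this walkthrough can be checked before starting. A short sketch; missing_tools is a hypothetical helper name, and the command list is the set of tools that appears in these steps.

```shell
#!/usr/bin/env bash
# Print every command from the argument list that is not on the PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The training commands used in this walkthrough.
missing="$(missing_tools unicharset_extractor shapeclustering mftraining cntraining combine_tessdata)"
if [ -n "$missing" ]; then
  echo "training tools not found:" $missing
else
  echo "all training tools are installed"
fi
```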

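5. Create the font_properties file
This is the font properties file. Tesseract-OCR 3.01 and later require a font_properties file to be created before training. Each line has the following format:

fontname italic bold fixed serif fraktur

//Meaning
font name, italic, bold, fixed width, serif, Fraktur

Every value except the font name is a boolean, 0 or 1.

Here the content of font_properties is:

font 0 0 0 0 0

Run the command:

echo 'font 0 0 0 0 0' > font_properties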
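
Because the five flags must each be 0 or 1, a line can be validated before it is written. This is only a sketch; font_properties_line is a hypothetical helper name, and it prints the same line as the echo command when given the same values.

```shell
#!/usr/bin/env bash
# Print one font_properties line: <fontname> <italic> <bold> <fixed> <serif> <fraktur>
font_properties_line() {
  if [ "$#" -ne 6 ]; then
    echo "usage: font_properties_line NAME ITALIC BOLD FIXED SERIF FRAKTUR" >&2
    return 1
  fi
  local name="$1" flag
  shift
  for flag in "$@"; do
    case "$flag" in
      0|1) ;;
      *) echo "flag must be 0 or 1, got: $flag" >&2; return 1 ;;
    esac
  done
  echo "$name $*"
}

# Example: font_properties_line font 0 0 0 0 0 > font_properties
```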

6. Training

Run the command:

shapeclustering -F font_properties -U unicharset huiyi.tr

This generates a shapetable file; rename it to huiyi.shapetable.

Run the command:

mftraining -F font_properties -U unicharset -O huiyi.unicharset huiyi.tr

This generates huiyi.unicharset, inttemp and pffmtable; rename inttemp and pffmtable to huiyi.inttemp and huiyi.pffmtable.

Run the command:

cntraining huiyi.tr

This generates a normproto file; rename it to huiyi.normproto.

7. Build the traineddata file

Run the command:

combine_tessdata huiyi.

//Output
Combining tessdata files
TessdataManager combined tesseract data files.
Offset for type  0 (huiyi.config                ) is -1
Offset for type  1 (huiyi.unicharset            ) is 140
Offset for type  2 (huiyi.unicharambigs         ) is -1
Offset for type  3 (huiyi.inttemp               ) is 406
Offset for type  4 (huiyi.pffmtable             ) is 118222
Offset for type  5 (huiyi.normproto             ) is 118282
Offset for type  6 (huiyi.punc-dawg             ) is -1
Offset for type  7 (huiyi.word-dawg             ) is -1
Offset for type  8 (huiyi.number-dawg           ) is -1
Offset for type  9 (huiyi.freq-dawg             ) is -1
Offset for type 10 (huiyi.fixed-length-dawgs    ) is -1
Offset for type 11 (huiyi.cube-unicharset       ) is -1
Offset for type 12 (huiyi.cube-word-dawg        ) is -1
Offset for type 13 (huiyi.shapetable            ) is 118708
Offset for type 14 (huiyi.bigram-dawg           ) is -1
Offset for type 15 (huiyi.unambig-dawg          ) is -1
Offset for type 16 (huiyi.params-model          ) is -1
Output huiyi.traineddata created successfully.

Move huiyi.traineddata into the /usr/local/Cellar/tesseract/3.04.01_2/share/tessdata/ directory.

Run the command:

cp huiyi.traineddata /usr/local/Cellar/tesseract/3.04.01_2/share/tessdata/

8. Verification
Run recognition on the sample images again, this time with the new huiyi language data.

Recognition succeeds.
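
The four renames in the training steps can also be done in one pass before combine_tessdata runs. The sketch below is a variation on the step-by-step renames, assuming none of the files has been renamed yet and that it runs inside the training directory; prefix_training_files is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Give the files produced by shapeclustering, mftraining and cntraining
# the language prefix that combine_tessdata looks for.
# Usage: prefix_training_files LANG   (run inside the training directory)
prefix_training_files() {
  local lang="$1" name
  for name in shapetable inttemp pffmtable normproto; do
    if [ -f "$name" ]; then
      mv "$name" "$lang.$name"
    else
      echo "missing training output: $name" >&2
      return 1
    fi
  done
}

# Example for this walkthrough:
# prefix_training_files huiyi && combine_tessdata huiyi.
```

Once huiyi.traineddata is in the tessdata directory, the new language is selected by its file name, that is, with -l huiyi.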

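Closing note: I have not written for a long time. This post was finished a while ago but never published; things have been busy since the New Year and I had no time to write. I wish all my colleagues a bright future in the new year. I will see whether I can find time soon to port Tesseract to iOS. I tried it before and the results were not good, mainly because recognition was slow and I had no good grayscale algorithm for preprocessing the images.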

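Author: 隳易
Link: http://www.jianshu.com/p/016e55c25521
Source: 简书 (Jianshu)
Copyright belongs to the author. For commercial reuse, contact the author for authorization; for non-commercial reuse, credit the source.

Reposted from: https://blog.csdn.net/u010670689/article/details/78374623/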
