Android development: OCR through Tesseract third-party library

I. Introduction

        What is OCR ? OCR (Optical Character Recognition, Optical Character Recognition) means that an electronic device (such as a scanner or a digital camera) checks characters printed on paper, determines its shape by detecting dark and bright patterns, and then uses character recognition to translate the shape into a computer text process. Simply put, OCR is a technology that optically converts the text in a paper document into a black and white dot matrix image, and then converts the text in the image into a text format through recognition software for further processing by word processing software. Edit processing.

        What is Tesseract ? Tesseract was originally developed at Hewlett-Packard Laboratories Bristol UK and at Hewlett-Packard Co, Greeley Colorado USA between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. In 2005 Tesseract was open sourced by HP. From 2006 until November 2018 it was developed by Google. Tesseract was originally developed between 1985 and 1994 at HP Laboratories in Bristol, UK, and at HP in Greeley, Colorado, USA, 1996 Some changes were made in 1998 for porting to Windows, and some c++ization in 1998. In 2005, Tesseract was open sourced by HP. From 2006 to November 2018, it was developed by Google. Simply put, Tesseract is the specific implementation of the "recognition software" mentioned above by OCR.

        The recognition object (input) of OCR is a picture, and the recognition result (output) is computer text. There are mainly two ways to obtain pictures on the Android mobile phone side, one is to select one from the photo album, and the other is to directly take pictures. Therefore, this article will implement the simplest OCR idea: first obtain a picture from the mobile phone, then input it into the Tesseract library, and finally output the recognition result through the library. Because it is just learning how to use the library, the blogger ignores other auxiliary functions, such as photo recognition.

2. Android implements OCR through Tesseract

1. Add the following dependencies to the build.gradle file of the Module

implementation 'com.rmtheis:tess-two:9.1.0'

2. Add the following permissions to the AndroidManifest.xml file

    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />

3. Apply for permissions in MainActivity

        It is recommended to execute the following checkPermission method in the onCreate method

    // 检查应用所需的权限,如不满足则发出权限请求
    private void checkPermission() {
        if (ContextCompat.checkSelfPermission(getApplicationContext(),
                Manifest.permission.READ_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(MainActivity.this,
                    new String[]{Manifest.permission.READ_EXTERNAL_STORAGE}, 120);
        }
        if (ContextCompat.checkSelfPermission(getApplicationContext(),
                Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(MainActivity.this,
                    new String[]{Manifest.permission.WRITE_EXTERNAL_STORAGE}, 121);
        }
    }

4. Read a piece of reading from assets

        Both for display and for identification

    // 从assets中读取一张Bitmap类型的图片
    private Bitmap getBitmapFromAssets(Context context, String filename) {
        Bitmap bitmap = null;
        AssetManager assetManager = context.getAssets();
        try {
            InputStream is = assetManager.open(filename);
            bitmap = BitmapFactory.decodeStream(is);
            is.close();
            Log.i(TAG, "图片读取成功。");
            Toast.makeText(getApplicationContext(), "图片读取成功。", Toast.LENGTH_SHORT).show();
        } catch (IOException e) {
            Log.i(TAG, "图片读取失败。");
            Toast.makeText(getApplicationContext(), "图片读取失败。", Toast.LENGTH_SHORT).show();
            e.printStackTrace();
        }
        return bitmap;
    }

5. Prepare for OCR

        Go to https://github.com/tesseract-ocr/ to download the .traineddata language pack, then put it in the project assets directory, and copy it to the Android file system through the following code

    // 为Tesserect复制(从assets中复制过去)所需的数据
    private void prepareTess() {
        try{
            // 先创建必须的目录
            File dir = getExternalFilesDir(TESS_DATA);
            if(!dir.exists()){
                if (!dir.mkdir()) {
                    Toast.makeText(getApplicationContext(), "目录" + dir.getPath() + "没有创建成功", Toast.LENGTH_SHORT).show();
                }
            }
            // 从assets中复制必须的数据
            String pathToDataFile = dir + "/" + DATA_FILENAME;
            if (!(new File(pathToDataFile)).exists()) {
                InputStream in = getAssets().open(DATA_FILENAME);
                OutputStream out = new FileOutputStream(pathToDataFile);
                byte[] buff = new byte[1024];
                int len;
                while ((len = in.read(buff)) > 0) {
                    out.write(buff, 0, len);
                }
                in.close();
                out.close();
            }
        } catch (Exception e) {
            Log.e(TAG, e.getMessage());
        }
    }

6. Call the following method after clicking the button to perform OCR recognition

    // OCR识别的主程序
    private void mainProgram() {
        // 从assets中获取一张Bitmap图片
        Bitmap bitmap = getBitmapFromAssets(MainActivity.this, TARGET_FILENAME);
        // 同时显示在界面
        main_iv_image.setImageBitmap(bitmap);
        if (bitmap != null) {
            // 准备工作:创建路径和Tesserect的数据
            prepareTess();
            // 初始化Tesserect
            TessBaseAPI tessBaseAPI = new TessBaseAPI();
            String dataPath = getExternalFilesDir("/").getPath() + "/";
            tessBaseAPI.init(dataPath, "eng");
            // 识别并显示结果
            String result = getOCRResult(tessBaseAPI, bitmap);
            main_tv_result.setText(result);
        }
    }

    // 进行OCR并返回识别结果
    private String getOCRResult(TessBaseAPI tessBaseAPI, Bitmap bitmap) {
        tessBaseAPI.setImage(bitmap);
        String result = "-";
        try{
            result = tessBaseAPI.getUTF8Text();
        }catch (Exception e){
            Log.e(TAG, e.getMessage());
        }
        tessBaseAPI.end();
        return result;
    }

7. Compile and run, the effect is as shown in the figure

        I personally feel that the recognition rate is not very accurate, so-so.

8. Paste the source code

        MainActivity.java

import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

import android.Manifest;
import android.content.Context;
import android.content.pm.PackageManager;
import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Bundle;
import android.util.Log;
import android.view.View;
import android.widget.Button;
import android.widget.ImageView;
import android.widget.TextView;
import android.widget.Toast;

import com.googlecode.tesseract.android.TessBaseAPI;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class MainActivity extends AppCompatActivity {

    public static final String TESS_DATA = "/tessdata";
    private static final String TARGET_FILENAME = "vin_demo.png";
    private static final String DATA_FILENAME = "eng.traineddata";
    private static final String TAG = MainActivity.class.getSimpleName();

    private Button main_bt_recognize;
    private TextView main_tv_result;
    private ImageView main_iv_image;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // 设置布局文件
        setContentView(R.layout.activity_main);
        // 检查并请求应用所需权限
        checkPermission();
        // 获取控件对象
        initView();
        // 设置控件的监听器
        setListener();
    }

    private void setListener() {
        // 设置识别按钮的监听器
        main_bt_recognize.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                // 识别之前需要再次检查一遍权限
                checkPermission();
                // 点击后的主程序
                mainProgram();
            }
        });
    }

    // 获得界面需要交互的控件
    private void initView() {
        main_bt_recognize = findViewById(R.id.main_bt_recognize);
        main_tv_result = findViewById(R.id.main_tv_result);
        main_iv_image = findViewById(R.id.main_iv_image);
    }

    // 检查应用所需的权限,如不满足则发出权限请求
    private void checkPermission() {
        if (ContextCompat.checkSelfPermission(getApplicationContext(),
                Manifest.permission.READ_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(MainActivity.this,
                    new String[]{Manifest.permission.READ_EXTERNAL_STORAGE}, 120);
        }
        if (ContextCompat.checkSelfPermission(getApplicationContext(),
                Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(MainActivity.this,
                    new String[]{Manifest.permission.WRITE_EXTERNAL_STORAGE}, 121);
        }
    }

    // OCR识别的主程序
    private void mainProgram() {
        // 从assets中获取一张Bitmap图片
        Bitmap bitmap = getBitmapFromAssets(MainActivity.this, TARGET_FILENAME);
        // 同时显示在界面
        main_iv_image.setImageBitmap(bitmap);
        if (bitmap != null) {
            // 准备工作:创建路径和Tesserect的数据
            prepareTess();
            // 初始化Tesserect
            TessBaseAPI tessBaseAPI = new TessBaseAPI();
            String dataPath = getExternalFilesDir("/").getPath() + "/";
            tessBaseAPI.init(dataPath, "eng");
            // 识别并显示结果
            String result = getOCRResult(tessBaseAPI, bitmap);
            main_tv_result.setText(result);
        }
    }

    // 从assets中读取一张Bitmap类型的图片
    private Bitmap getBitmapFromAssets(Context context, String filename) {
        Bitmap bitmap = null;
        AssetManager assetManager = context.getAssets();
        try {
            InputStream is = assetManager.open(filename);
            bitmap = BitmapFactory.decodeStream(is);
            is.close();
            Log.i(TAG, "图片读取成功。");
            Toast.makeText(getApplicationContext(), "图片读取成功。", Toast.LENGTH_SHORT).show();
        } catch (IOException e) {
            Log.i(TAG, "图片读取失败。");
            Toast.makeText(getApplicationContext(), "图片读取失败。", Toast.LENGTH_SHORT).show();
            e.printStackTrace();
        }
        return bitmap;
    }

    // 为Tesserect复制(从assets中复制过去)所需的数据
    private void prepareTess() {
        try{
            // 先创建必须的目录
            File dir = getExternalFilesDir(TESS_DATA);
            if(!dir.exists()){
                if (!dir.mkdir()) {
                    Toast.makeText(getApplicationContext(), "目录" + dir.getPath() + "没有创建成功", Toast.LENGTH_SHORT).show();
                }
            }
            // 从assets中复制必须的数据
            String pathToDataFile = dir + "/" + DATA_FILENAME;
            if (!(new File(pathToDataFile)).exists()) {
                InputStream in = getAssets().open(DATA_FILENAME);
                OutputStream out = new FileOutputStream(pathToDataFile);
                byte[] buff = new byte[1024];
                int len;
                while ((len = in.read(buff)) > 0) {
                    out.write(buff, 0, len);
                }
                in.close();
                out.close();
            }
        } catch (Exception e) {
            Log.e(TAG, e.getMessage());
        }
    }

    // 进行OCR并返回识别结果
    private String getOCRResult(TessBaseAPI tessBaseAPI, Bitmap bitmap) {
        tessBaseAPI.setImage(bitmap);
        String result = "-";
        try{
            result = tessBaseAPI.getUTF8Text();
        }catch (Exception e){
            Log.e(TAG, e.getMessage());
        }
        tessBaseAPI.end();
        return result;
    }
}

        layout_main.xml

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <ScrollView
        android:layout_width="match_parent"
        android:layout_height="match_parent">
        <LinearLayout
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:orientation="vertical" >

            <ImageView
                android:id="@+id/main_iv_image"
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                android:layout_gravity="center_horizontal"
                android:layout_marginLeft="5dp"
                android:layout_marginRight="5dp"/>

            <Button
                android:id="@+id/main_bt_recognize"
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                android:layout_marginLeft="5dp"
                android:layout_marginRight="5dp"
                android:layout_gravity="center_horizontal"
                android:text="读取一张图片并识别" />

            <TextView
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                android:layout_marginLeft="5dp"
                android:layout_marginRight="5dp"
                android:layout_gravity="center_horizontal"
                android:text="识别结果:" />

            <TextView
                android:id="@+id/main_tv_result"
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                android:layout_marginLeft="5dp"
                android:layout_marginRight="5dp"
                android:layout_gravity="center_horizontal" />
        </LinearLayout>
    </ScrollView>
</LinearLayout>

        AndroidManifest.xml

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.cs.ocrdemo4csdn">

    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:roundIcon="@mipmap/ic_launcher_round"
        android:supportsRtl="true"
        android:theme="@style/Theme.OCRDemo4CSDN">

        <activity
            android:name=".MainActivity"
            android:exported="true">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>
</manifest>

        build.gradle(Module)

plugins {
    id 'com.android.application'
}

android {
    compileSdk 34

    defaultConfig {
        applicationId "com.cs.ocrdemo4csdn"
        minSdk 21
        targetSdk 34
        versionCode 1
        versionName "1.0"

        testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
        }
    }
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
}

dependencies {

    implementation 'androidx.appcompat:appcompat:1.2.0'
    implementation 'com.google.android.material:material:1.3.0'
    implementation 'androidx.constraintlayout:constraintlayout:2.0.4'
    testImplementation 'junit:junit:4.+'
    androidTestImplementation 'androidx.test.ext:junit:1.1.2'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.3.0'

    implementation 'com.rmtheis:tess-two:9.1.0'
}

        vin_demo.png

Just download pictures from the Internet (ensure that there are optical characters).

        eng.traineddata

Download it from https://github.com/tesseract-ocr/ , if not, download it from https://github.com/raykibul/Android-OCR-Testing/tree/main .

3. References

        1. Optical character recognition

        2、GitHub - tesseract-ocr/tesseract

        3、GitHub - raykibul/Android-OCR-Testing

Four. Conclusion

        1. I have been tinkering with other people's CSDN blogs for more than a day, but I have not been able to get through, and I have been reporting errors, such as "Could not initialize Tesseract API with language=eng", "getUTF8Text causes android tesseract to crash" and other problems. Later, I read the relatively new code on GitHub (if you refer to my blog and still can’t get through the code, I suggest you look at this code), and then it worked. It is not yet known why.

        2. I think the recognition results of the Tesseract library are not very accurate, especially the recognition rate of the results from photos is very low. From the development history of this library, it seems that no one maintains it anymore (the last The maintainer is Google, and it's stuck in 2018), so, for now (this year is 2023), I feel that it is already a bit of an obsolete technology. You can seek some relatively new technical solutions. After all, the large models are now so awesome, and OCR should be better.

Guess you like

Origin blog.csdn.net/qq_36158230/article/details/131901162
Recommended