HarmonyOS Learning Path: AI Capability Development (Document Detection and Correction)

Basic Concepts

Document correction assists and enhances the process of re-photographing documents. It consists of two sub-functions:

  • Document detection: Automatically identifies the document in a picture and returns the position of the document in the original picture. "Document" here refers broadly to rectangular objects such as books, photos, and picture frames.
  • Document correction: Corrects the shooting angle of the document based on its position in the original picture, so that the document appears as if it had been photographed head-on.

Operation Mechanism

  • Document detection

    Call the document detection interface to identify the document in the picture and return the position of the document in the original picture.

    Figure 1  A picture containing a document

As shown by the red dots in the figure above, the document detection interface returns the coordinates of the four vertices of the photographed document relative to the upper left corner of the image. The document detection result is as follows:

{
  "resultCode":0,
  "doc":
     "{
       \"bottom_left\":{\"x\":17,\"y\":440},
       \"bottom_right\":{\"x\":589,\"y\":760},
       \"top_left\":{\"x\":256,\"y\":13},
       \"top_right\":{\"x\":829,\"y\":332}
    }"
}

This JSON holds the coordinates (unit: pixels) of the four corners of the photographed document, measured from the upper left vertex of the original image. resultCode is the return code.
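As a minimal sketch of how the four corners could be read out of such a result, the following uses a regular expression instead of a JSON library. DocCornerParser is a hypothetical helper, not part of the HarmonyOS SDK, and it assumes the doc value has already been extracted from the outer JSON and unescaped into a plain string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an SDK class) for reading corner coordinates. */
final class DocCornerParser {
    // Matches entries such as "top_left":{"x":256,"y":13} in the unescaped doc string.
    private static final Pattern CORNER = Pattern.compile(
            "\"(\\w+)\"\\s*:\\s*\\{\\s*\"x\"\\s*:\\s*(-?\\d+)\\s*,\\s*\"y\"\\s*:\\s*(-?\\d+)\\s*\\}");

    /** Returns corner name -> {x, y} in pixels, in the order the corners appear. */
    static Map<String, int[]> parse(String doc) {
        Map<String, int[]> corners = new LinkedHashMap<>();
        Matcher m = CORNER.matcher(doc);
        while (m.find()) {
            corners.put(m.group(1), new int[] {
                    Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))});
        }
        return corners;
    }

    public static void main(String[] args) {
        // The sample detection result shown above, after unescaping.
        String doc = "{\"bottom_left\":{\"x\":17,\"y\":440},"
                + "\"bottom_right\":{\"x\":589,\"y\":760},"
                + "\"top_left\":{\"x\":256,\"y\":13},"
                + "\"top_right\":{\"x\":829,\"y\":332}}";
        parse(doc).forEach((name, p) ->
                System.out.println(name + " = (" + p[0] + ", " + p[1] + ")"));
    }
}
```

In application code the DocCoordinates object returned by the detection interface is the natural source of these values; the parser is only useful when the result is available as text, for example in a log.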

  • Document correction

    Correct the shooting angle of the document based on its position in the original picture (the area to be corrected can be customized).

    Figure 2  Correction area in the picture

Correcting the dark blue rectangular area in the figure above (the document area returned by the document detection interface) produces the following result:

Figure 3  The corrected document image
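The engine's correction algorithm is not described here. As a rough mental model only, this kind of correction is commonly expressed as a perspective (homography) transform that maps the four detected corners onto the corners of an upright rectangle. The sketch below solves for such a transform with plain Gauss-Jordan elimination. PerspectiveSketch and the 600 x 450 output size are assumptions made for illustration.

```java
/** Illustrative perspective-mapping sketch; not the engine's actual algorithm. */
final class PerspectiveSketch {
    /**
     * Solves for the 3x3 homography (row-major, h[8] = 1) that maps each
     * src[i] = {x, y} onto dst[i] = {u, v}. Exactly four point pairs are expected.
     */
    static double[] homography(double[][] src, double[][] dst) {
        double[][] a = new double[8][9]; // augmented matrix [A | b]
        for (int i = 0; i < 4; i++) {
            double x = src[i][0], y = src[i][1], u = dst[i][0], v = dst[i][1];
            a[2 * i] = new double[] {x, y, 1, 0, 0, 0, -u * x, -u * y, u};
            a[2 * i + 1] = new double[] {0, 0, 0, x, y, 1, -v * x, -v * y, v};
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < 8; col++) {
            int pivot = col;
            for (int r = col + 1; r < 8; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new IllegalArgumentException("Degenerate corner configuration");
            }
            for (int r = 0; r < 8; r++) {
                if (r == col) {
                    continue;
                }
                double f = a[r][col] / a[col][col];
                for (int c = col; c < 9; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        double[] h = new double[9];
        for (int i = 0; i < 8; i++) {
            h[i] = a[i][8] / a[i][i];
        }
        h[8] = 1.0;
        return h;
    }

    /** Applies h to the point (x, y) and returns {u, v}. */
    static double[] apply(double[] h, double x, double y) {
        double w = h[6] * x + h[7] * y + h[8];
        return new double[] {
            (h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w};
    }

    public static void main(String[] args) {
        // Corners from the sample result: top_left, top_right, bottom_right, bottom_left.
        double[][] src = {{256, 13}, {829, 332}, {589, 760}, {17, 440}};
        // Hypothetical 600 x 450 output rectangle, chosen only for illustration.
        double[][] dst = {{0, 0}, {600, 0}, {600, 450}, {0, 450}};
        double[] h = homography(src, dst);
        double[] p = apply(h, 829, 332);
        System.out.printf("top_right maps to (%.1f, %.1f)%n", p[0], p[1]);
    }
}
```

Producing an actual corrected image would additionally require resampling every output pixel through the inverse mapping, which docRefine() handles internally.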

Constraints and Restrictions

  • The supported input image formats are JPEG, JPG, and PNG. The output image is always in JPEG format.
  • When shooting, place the document on a surface whose color contrasts with the document, and let the document fill as much of the screen as possible while keeping its edges within the frame. This gives the best results.
  • The height and width of the input image must each be at least 100 pixels and at most 10000 pixels.
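These constraints can be checked before an image is handed to the engine, which avoids a round trip that would end in an invalid-parameter result code. The following is a minimal sketch: DocInputCheck is a hypothetical helper, it treats the size bounds as inclusive, and it uses the file extension as a stand-in for the real image format, which is an assumption.

```java
import java.util.Locale;

/** Hypothetical pre-check for the documented input constraints (not an SDK class). */
final class DocInputCheck {
    static final int MIN_SIDE = 100;
    static final int MAX_SIDE = 10000;

    /** True when the file extension is one of the documented input formats. */
    static boolean isSupportedFormat(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".jpg") || lower.endsWith(".jpeg") || lower.endsWith(".png");
    }

    /** True when both sides fall within the documented 100..10000 pixel range. */
    static boolean isSupportedSize(int width, int height) {
        return width >= MIN_SIDE && width <= MAX_SIDE
                && height >= MIN_SIDE && height <= MAX_SIDE;
    }

    public static void main(String[] args) {
        System.out.println(isSupportedFormat("letter.PNG")); // prints true
        System.out.println(isSupportedSize(80, 1200));       // prints false
    }
}
```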

Document detection and correction development

Scenarios

  • Re-photograph old paper documents, such as letters, to produce electronic copies. Correction improves the quality of the re-photographed result.
  • Photograph works at a calligraphy or painting exhibition, and use correction to straighten the captured works.

Interface Description

Document correction provides three function interfaces: setVisionConfiguration(), docDetect() and docRefine().

  • setVisionConfiguration() is a member of the IDocRefine interface. The DocRefineConfiguration passed in selects the type of document correction to invoke.
void setVisionConfiguration(DocRefineConfiguration docRefineConfiguration);
  • The following table lists the common settings of DocRefineConfiguration:

    Interface         Parameter  Type  Description
    setProcessMode()  mode       int   Process mode:
                                       VisionConfiguration.MODE_IN (same-process call)
                                       VisionConfiguration.MODE_OUT (cross-process call)
                                       The default value is VisionConfiguration.MODE_OUT.

  • Call the docDetect() method of IDocRefine to obtain the detection result.
int docDetect(VisionImage image, DocCoordinates result, VisionCallback<DocCoordinates> visionCallBack);

Where:

image is the input image in which the document is to be detected.

If visionCallBack is null, the call is synchronous: the method returns the result code, and the detection result is returned in result.

If visionCallBack is a valid callback, the call is asynchronous: when the method returns, the value in result is not valid, and the actual detection result is delivered through the callback.

A successful synchronous call returns result code 0. An asynchronous call whose request was sent successfully returns result code 700.
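Because the success value differs between the two modes (0 for synchronous, 700 for asynchronous), the check on the return value has to know which mode was used. A minimal sketch of that check follows. CallOutcome is a hypothetical helper and is not part of the SDK.

```java
/** Hypothetical helper for interpreting docDetect()/docRefine() return codes. */
final class CallOutcome {
    static final int SUCCESS = 0;          // synchronous call succeeded
    static final int ASYNC_ACCEPTED = 700; // asynchronous request was sent

    /**
     * @param resultCode  value returned by the call
     * @param hasCallback whether a non-null callback was passed (asynchronous mode)
     * @return true when the call can be treated as accepted for its mode
     */
    static boolean isAccepted(int resultCode, boolean hasCallback) {
        return hasCallback ? resultCode == ASYNC_ACCEPTED : resultCode == SUCCESS;
    }

    public static void main(String[] args) {
        System.out.println(isAccepted(0, false));  // prints true
        System.out.println(isAccepted(700, true)); // prints true
        System.out.println(isAccepted(0, true));   // prints false
    }
}
```

In asynchronous mode a true value only means the request was sent. The detection outcome itself still arrives later through onResult() or onError().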

  • Call the docRefine() method of IDocRefine to obtain the correction result.
int docRefine(VisionImage image, DocCoordinates coordinates, ImageResult result,
    VisionCallback<ImageResult> visionCallBack);

Where:

image is the input image to be corrected.

If visionCallBack is null, the call is synchronous: the method returns the result code, and the correction result is returned in result.

If visionCallBack is a valid callback, the call is asynchronous: when the method returns, the value in result is not valid, and the actual correction result is delivered through the callback.

A successful synchronous call returns result code 0. An asynchronous call whose request was sent successfully returns result code 700.

Development Steps

When using document correction, first add the relevant classes to the project.

import ohos.ai.cv.common.ConnectionCallback;
import ohos.ai.cv.common.VisionCallback;
import ohos.ai.cv.common.VisionConfiguration;
import ohos.ai.cv.common.VisionImage;
import ohos.ai.cv.common.VisionManager;
import ohos.ai.cv.common.ImageResult;
import ohos.ai.cv.docrefine.DocCoordinates;
import ohos.ai.cv.docrefine.DocRefineConfiguration;
import ohos.ai.cv.docrefine.IDocRefine;
import ohos.app.Context;
import ohos.media.image.PixelMap;

Define a ConnectionCallback to handle the outcome of connecting to the capability engine, whether the connection succeeds or fails.

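ConnectionCallback connectionCallback = new ConnectionCallback() {
    @Override
    public void onServiceConnect() {
        // Define what to do after the capability engine is connected successfully.
    }

    @Override
    public void onServiceDisconnect() {
        // Define what to do after connecting to the capability engine fails.
    }
};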

Call VisionManager.init() with the project's context and the connectionCallback defined above to establish a connection with the capability engine. The context must be an instance of ohos.aafwk.ability.Ability or ohos.aafwk.ability.AbilitySlice, or of a subclass of either.

int result = VisionManager.init(context, connectionCallback);

After the onServiceConnect() callback reports a successful connection, instantiate the IDocRefine interface, passing the project's context as the input parameter. The context must be an instance of ohos.aafwk.ability.Ability or ohos.aafwk.ability.AbilitySlice, or of a subclass of either.

IDocRefine docRefine = VisionManager.getDocRefine(context);

Instantiate the VisionImage object image, and pass in the image pixelMap to be corrected.

VisionImage image = VisionImage.fromPixelMap(pixelMap);

Instantiate the DocCoordinates object docCoordinates.

DocCoordinates docCoordinates = new DocCoordinates();

Note

In synchronous mode, this object stores the document position returned by the detection interface docDetect().

(Optional) Define a VisionCallback<DocCoordinates> callback.

VisionCallback<DocCoordinates> callback = new VisionCallback<DocCoordinates>() {
    @Override
    public void onResult(DocCoordinates docCoordinates) {
        // Process the successfully obtained result.
    }
    @Override
    public void onError(int i) {
        // Handle the error code.
    }
    @Override
    public void onProcessing(float v) {
        // Report the processing progress.
    }
};

Note

In asynchronous mode, the onResult() method of this class receives the detection result docCoordinates (which contains the detected document coordinates), and the onError() method handles the error code. The onProcessing() method is intended to report processing progress, but this function is not currently implemented.

The difference between synchronous and asynchronous mode is whether the last parameter of docDetect(), visionCallBack, is null. If it is not null, the call is asynchronous. In that case the docCoordinates object passed in is ignored, and all results of the call are obtained from the callback.

Instantiate the ImageResult object imageResult.

ImageResult imageResult = new ImageResult();

Note

In synchronous mode, this object stores the corrected image returned by the docRefine() method.

(Optional) Define a VisionCallback<ImageResult> callback.

VisionCallback<ImageResult> callback = new VisionCallback<ImageResult>() {
    @Override
    public void onResult(ImageResult imageResult) {
        // Process the successfully obtained result.
    }
    @Override
    public void onError(int i) {
        // Handle the error code.
    }
    @Override
    public void onProcessing(float v) {
        // Report the processing progress.
    }
};

Note

  • In asynchronous mode, the onResult() method of this class receives the correction result imageResult (which contains the corrected picture), and the onError() method handles the error code. The onProcessing() method is intended to report processing progress, but this function is not currently implemented.
  • The difference between synchronous and asynchronous mode is whether the last parameter of docRefine(), visionCallBack, is null. If it is not null, the call is asynchronous. In that case the imageResult object passed in is ignored, and the results of the call are obtained from the callback.
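When code that starts an asynchronous call later needs the result on a worker thread, the callback can be bridged to a blocking wait with a latch. The sketch below shows the pattern in isolation. ResultCallback is a local stand-in that only mirrors the three methods of the callbacks shown above, so that the example compiles without the SDK, and BlockingCallback is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Local stand-in mirroring the three callback methods shown above; not the SDK type. */
interface ResultCallback<T> {
    void onResult(T result);
    void onError(int code);
    void onProcessing(float progress);
}

/** Collects one asynchronous outcome so that a caller can wait for it with a timeout. */
final class BlockingCallback<T> implements ResultCallback<T> {
    private final CountDownLatch done = new CountDownLatch(1);
    private final AtomicReference<T> result = new AtomicReference<>();
    private final AtomicReference<Integer> errorCode = new AtomicReference<>();

    @Override
    public void onResult(T value) {
        result.set(value);
        done.countDown();
    }

    @Override
    public void onError(int code) {
        errorCode.set(code);
        done.countDown();
    }

    @Override
    public void onProcessing(float progress) {
        // Progress is ignored in this sketch.
    }

    /** Waits for a result; throws if an error code arrives or the timeout elapses. */
    T await(long timeout, TimeUnit unit) {
        try {
            if (!done.await(timeout, unit)) {
                throw new IllegalStateException("Timed out waiting for callback");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        if (errorCode.get() != null) {
            throw new IllegalStateException("Callback reported error code " + errorCode.get());
        }
        return result.get();
    }

    public static void main(String[] args) {
        BlockingCallback<String> cb = new BlockingCallback<>();
        // Simulates the engine delivering a result on another thread.
        new Thread(() -> cb.onResult("corrected-image")).start();
        System.out.println(cb.await(1, TimeUnit.SECONDS));
    }
}
```

A blocking wait like this belongs on a background thread. On a UI thread it is better to react inside onResult() and onError() directly.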

Configure the correction parameters through DocRefineConfiguration, such as the process call mode (the same-process mode, MODE_IN, is recommended). In cross-process mode (MODE_OUT), the caller and the capability engine run in different processes. In same-process mode (MODE_IN), the capability engine is instantiated in the caller's process, and the caller invokes the document correction capability in the engine through reflection. The following example uses the same-process mode:

DocRefineConfiguration.Builder builder = new DocRefineConfiguration.Builder();
builder.setProcessMode(VisionConfiguration.MODE_IN);
DocRefineConfiguration configuration = builder.build();
docRefine.setVisionConfiguration(configuration);

(Optional) Call the prepare() method of IDocRefine.

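result = docRefine.prepare();

Note

If the returned result is not 0, preparation of the document correction capability has failed. Handle the error and do not perform the subsequent steps. The docDetect() and docRefine() methods call prepare() first to start the engine; if the engine is already started, it is not started again.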

Call the docDetect() method of IDocRefine:

result = docRefine.docDetect(image, docCoordinates, null); // Synchronous

or

result = docRefine.docDetect(image, null, callback); // Asynchronous

Note

  • In synchronous mode, the method returns the result code as soon as the call completes.
  • In asynchronous mode, the method returns result code 700 when the request is sent successfully. Any other value means that the asynchronous request failed; handle the error first, because the callback will not be invoked.
  • If the asynchronous request is sent successfully, the corresponding callback method is invoked automatically once detection is complete.
    • If onResult() is invoked, detection succeeded. This is equivalent to result code 0 in synchronous mode.
    • If onError() is invoked, detection failed, and the specific result code is passed as the parameter of onError().

The result codes are defined in the following table:

Result code   Description
0             Success
-1            Unknown error
-2            Unsupported function or interface
-3            Memory allocation failed or object creation failed
-4            Failed to load a required library
-10           Engine switch is off
101           Failure
102           Timeout
200           The input parameter is invalid (the image size is wrong)
201           The input parameter is invalid (empty)
210           The input parameters are valid
500           Service binding exception
521           Service binding disconnected abnormally
522           Service connected
600           Model file exception
601           Model file does not exist
602           Model loading failed
700           Asynchronous call request sent successfully
1001          Neural network processing unit error
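For logging and error handling it can help to turn these numeric codes into readable text. The following minimal sketch keeps the table in a lookup map. ResultCodes is a hypothetical helper, and the descriptions are taken from the table above rather than from any SDK constant.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup of result-code descriptions taken from the table above. */
final class ResultCodes {
    private static final Map<Integer, String> DESCRIPTIONS = new HashMap<>();

    static {
        DESCRIPTIONS.put(0, "Success");
        DESCRIPTIONS.put(-1, "Unknown error");
        DESCRIPTIONS.put(-2, "Unsupported function or interface");
        DESCRIPTIONS.put(-3, "Memory allocation failed or object creation failed");
        DESCRIPTIONS.put(-4, "Failed to load a required library");
        DESCRIPTIONS.put(-10, "Engine switch is off");
        DESCRIPTIONS.put(101, "Failure");
        DESCRIPTIONS.put(102, "Timeout");
        DESCRIPTIONS.put(200, "The input parameter is invalid (the image size is wrong)");
        DESCRIPTIONS.put(201, "The input parameter is invalid (empty)");
        DESCRIPTIONS.put(210, "The input parameters are valid");
        DESCRIPTIONS.put(500, "Service binding exception");
        DESCRIPTIONS.put(521, "Service binding disconnected abnormally");
        DESCRIPTIONS.put(522, "Service connected");
        DESCRIPTIONS.put(600, "Model file exception");
        DESCRIPTIONS.put(601, "Model file does not exist");
        DESCRIPTIONS.put(602, "Model loading failed");
        DESCRIPTIONS.put(700, "Asynchronous call request sent successfully");
        DESCRIPTIONS.put(1001, "Neural network processing unit error");
    }

    /** Returns the documented description, or a fallback for codes not in the table. */
    static String describe(int code) {
        return DESCRIPTIONS.getOrDefault(code, "Unrecognized result code " + code);
    }

    public static void main(String[] args) {
        System.out.println(describe(201)); // prints The input parameter is invalid (empty)
        System.out.println(describe(999)); // prints Unrecognized result code 999
    }
}
```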

Call the docRefine() method of IDocRefine:

result = docRefine.docRefine(image, docCoordinates, imageResult, null); // Synchronous

or

result = docRefine.docRefine(image, docCoordinates, null, callback); // Asynchronous

Note

  • In synchronous mode, the method returns the result code as soon as the call completes.
  • In asynchronous mode, the method returns result code 700 when the request is sent successfully. Any other value means that the asynchronous request failed; handle the error first, because the callback will not be invoked.
  • If the asynchronous request is sent successfully, the corresponding callback method is invoked automatically once correction is complete.
    • If onResult() is invoked, correction succeeded. This is equivalent to result code 0 in synchronous mode.
    • If onError() is invoked, correction failed, and the specific result code is passed as the parameter of onError().

The result codes are defined in the following table:

Result code   Description
0             Success
-1            Unknown error
-2            Unsupported function or interface
-3            Memory allocation failed or object creation failed
-4            Failed to load a required library
-10           Engine switch is off
101           Failure
102           Timeout
200           The input parameter is invalid (the image size is wrong)
201           The input parameter is invalid (empty)
210           The input parameters are valid
500           Service binding exception
521           Service binding disconnected abnormally
522           Service connected
600           Model file exception
601           Model file does not exist
602           Model loading failed
700           Asynchronous call request sent successfully
1001          Neural network processing unit error

Call the release() method of IDocRefine to release resources. Call the release() method of pixelMap to release the image memory.

result = docRefine.release();
if (pixelMap != null) {
    pixelMap.release();
    pixelMap = null;
}

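Note

When the document correction capability is no longer needed, call the release() method to release resources.

Call the VisionManager.destroy() method to disconnect from the capability engine.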

VisionManager.destroy();

Origin blog.csdn.net/weixin_47094733/article/details/131370355