AI in Practice | Building a Handheld Scanner with Tencent Cloud Intelligent Text Image Enhancement

In daily life and work, photos of documents are limited by the photographer's skill and the shooting conditions, so they often suffer from uneven lighting, tilted angles, or blurred text. Such low-quality text images are hard to archive and study later, and they also reduce the accuracy of optical character recognition (OCR). To address these problems, we surveyed related products in the industry and found that Tencent Cloud AI's text image enhancement capability is well suited to building a handheld scanner.

Specifically, the service uses computer vision to process images of text. Its capabilities include edge trimming enhancement, bend correction, shadow removal, and moiré removal, which improve the quality of document pictures and the clarity of their text. It is also easy to use: the user uploads the text image to be enhanced, the image is processed automatically, and the enhanced image can then be downloaded.

Next, I will describe in detail how the handheld scanner is implemented.

1. Preparation work

To use Tencent Cloud's text image enhancement capability, I made the following preparations.

1.1. Enable the text image enhancement service

Before using Tencent Cloud text image enhancement, activate the service on the Tencent Cloud official website.

After the service is activated, Tencent Cloud AI Text Recognition provides a free resource package that includes 1,000 free calls to text image enhancement. You can check the resource package usage on the resource package management page.

In practice, I found that postpaid billing can be enabled on the settings page. With postpaid billing enabled, API calls do not fail when the resource package runs out. Note that the postpaid setting can only be changed once a month.

 

1.2. Console monitoring information

I also found that usage information for all text recognition services can be viewed in the console. As the figure below shows, it includes the current month's call count, billing, number of successful calls, success rate, and so on.

 

2. Operation process

The following steps use the image enhancement function of Tencent Cloud AI Text Recognition to build a handheld scanner.

  • Obtain a personal key

  • View the Image Enhancement API documentation

  • Use the image enhancement function of Tencent Cloud AI text recognition to create a handheld scanner

2.1. Obtain a personal key

On the API key management page of Tencent Cloud Access Management (https://console.cloud.tencent.com/cam/capi), create a new personal key, then copy the generated SecretId and SecretKey.
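Because the key pair must be kept confidential, one option is to read it from environment variables instead of pasting it into source code. The following is only a minimal sketch of that idea: the variable names TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY and the helper credentialsFromEnv are assumptions for illustration, not something this article or the console prescribes.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// credentialsFromEnv reads the key pair through the supplied lookup function
// (normally os.Getenv). The variable names are an assumption for this sketch.
func credentialsFromEnv(getenv func(string) string) (secretID, secretKey string, err error) {
	secretID = getenv("TENCENTCLOUD_SECRET_ID")
	secretKey = getenv("TENCENTCLOUD_SECRET_KEY")
	if secretID == "" || secretKey == "" {
		return "", "", errors.New("TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY must both be set")
	}
	return secretID, secretKey, nil
}

func main() {
	id, _, err := credentialsFromEnv(os.Getenv)
	if err != nil {
		fmt.Println("credentials not configured:", err)
		return
	}
	// Report only the length of the ID; never log the key itself.
	fmt.Printf("loaded SecretId (%d characters)\n", len(id))
}
```

The two returned strings can then be passed to common.NewCredential in place of the hard-coded placeholders used later in this article.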

 

 

2.2. Image enhancement API description

In API Explorer, select text image enhancement, fill in the input parameters, and choose the language you need. API Explorer then generates the API call code in that language.
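To make the input parameters more concrete, here is a rough sketch of how the three fields used later in this article (ImageBase64, ReturnImage, TaskType) could be assembled into a JSON parameter object. The struct imageEnhancementParams and the helper buildParams are hypothetical, and the JSON key names are assumed to match the SDK field names; check them against what API Explorer shows.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// imageEnhancementParams mirrors the three request fields set in this article.
// The JSON key names are an assumption based on the SDK field names.
type imageEnhancementParams struct {
	ImageBase64 string `json:"ImageBase64"`
	ReturnImage string `json:"ReturnImage"`
	TaskType    int64  `json:"TaskType"`
}

// buildParams encodes the raw image bytes as Base64 and returns the JSON text.
func buildParams(image []byte, taskType int64) (string, error) {
	p := imageEnhancementParams{
		ImageBase64: base64.StdEncoding.EncodeToString(image),
		ReturnImage: "preprocess",
		TaskType:    taskType,
	}
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// "hello" stands in for real image bytes to keep the output short.
	body, err := buildParams([]byte("hello"), 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
	// prints {"ImageBase64":"aGVsbG8=","ReturnImage":"preprocess","TaskType":1}
}
```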

 

2.3. Use the image enhancement function of Tencent Cloud AI text recognition to create a handheld scanner

The implementation of the handheld scanner is divided into the following steps:

  • Install the SDK dependencies

  • Call the image enhancement interface

  • Experience the effect of the handheld scanner

2.3.1 Install the SDK dependencies

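#	Install the common base package
go get -v -u github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common
#	Install the package for the product you use (OCR in this article)
go get -v -u github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/ocr/v20181119
#	Or download the packages for all Tencent Cloud products at once
go get -v -u github.com/tencentcloud/tencentcloud-sdk-go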

2.3.2 Call the image enhancement interface

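package imageenhancement

import (
	"encoding/base64"
	"fmt"
	"io/ioutil"
	"os"

	"github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common"
	"github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common/errors"
	"github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common/profile"
	ocr "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/ocr/v20181119"
)

// MainImageEnhancement is the entry point; imagesPath is the path of the image to enhance.
func MainImageEnhancement(imagesPath string) {
	// Create a credential object from the Tencent Cloud account's SecretId and SecretKey.
	// Keep the key pair confidential. Keys can be obtained at
	// https://console.cloud.tencent.com/cam/capi
	credential := common.NewCredential(
		// Fill in the Tencent Cloud account key pair here.
		"SecretId",
		"SecretKey",
	)
	// Create a client profile. This is optional and can be skipped if there are no special requirements.
	cpf := profile.NewClientProfile()
	cpf.HttpProfile.Endpoint = "ocr.tencentcloudapi.com"
	// Create the client for the product to call; the client profile is optional.
	client, err := ocr.NewClient(credential, "ap-guangzhou", cpf)
	if err != nil {
		fmt.Printf("create client: %s\n", err)
		return
	}
	// Read the image and encode it as Base64.
	toBase64Str, err := imageToBase64(imagesPath)
	if err != nil {
		fmt.Printf("read image: %s\n", err)
		return
	}
	// imageType208 is one of the helpers omitted below; presumably it follows
	// the same pattern as imageType1 with a different TaskType.
	respImages, err := imageType208(*client, toBase64Str)
	if err != nil {
		fmt.Printf("enhance image: %s\n", err)
		return
	}
	if respImages == nil {
		fmt.Println("enhance image: response contains no image")
		return
	}
	if err := writeFile("test.jpg", *respImages); err != nil {
		fmt.Printf("write image: %s\n", err)
	}
}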

// imageType1 applies edge trimming enhancement.
func imageType1(client ocr.Client, toBase64Str string) (*string, error) {
	// Create a request object; every API has its own request type.
	request := ocr.NewImageEnhancementRequest()
	request.ImageBase64 = common.StringPtr(toBase64Str)
	request.ReturnImage = common.StringPtr("preprocess")
	request.TaskType = common.Int64Ptr(1)

	// The response is an ImageEnhancementResponse that corresponds to the request.
	response, err := client.ImageEnhancement(request)
	if _, ok := err.(*errors.TencentCloudSDKError); ok {
		fmt.Printf("An API error has returned: %s\n", err)
		return nil, err
	}
	if err != nil {
		return nil, err
	}
	return response.Response.Image, nil
}

// imageType2 applies bend correction.
func imageType2(client ocr.Client, toBase64Str string) (*string, error) {
	// Create a request object; every API has its own request type.
	request := ocr.NewImageEnhancementRequest()
	request.ImageBase64 = common.StringPtr(toBase64Str)
	request.ReturnImage = common.StringPtr("preprocess")
	request.TaskType = common.Int64Ptr(2)

	// The response is an ImageEnhancementResponse that corresponds to the request.
	response, err := client.ImageEnhancement(request)
	if _, ok := err.(*errors.TencentCloudSDKError); ok {
		fmt.Printf("An API error has returned: %s\n", err)
		return nil, err
	}
	if err != nil {
		return nil, err
	}
	return response.Response.Image, nil
}

// imageType202 applies black-and-white mode.
func imageType202(client ocr.Client, toBase64Str string) (*string, error) {
	// Create a request object; every API has its own request type.
	request := ocr.NewImageEnhancementRequest()
	request.ImageBase64 = common.StringPtr(toBase64Str)
	request.ReturnImage = common.StringPtr("preprocess")
	request.TaskType = common.Int64Ptr(202)

	// The response is an ImageEnhancementResponse that corresponds to the request.
	response, err := client.ImageEnhancement(request)
	if _, ok := err.(*errors.TencentCloudSDKError); ok {
		fmt.Printf("An API error has returned: %s\n", err)
		return nil, err
	}
	if err != nil {
		return nil, err
	}
	return response.Response.Image, nil
}

... // The repeated functions are omitted here; other modes, or any combination of modes, can be added in the same way

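// writeFile decodes a Base64 string and writes the resulting image to path.
func writeFile(path string, s string) error {
	// Decode the Base64 string.
	dist, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return err
	}
	// Create the file, truncating any existing content.
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(dist)
	return err
}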

// imageToBase64 reads an image file and returns its content as a Base64 string.
func imageToBase64(filePath string) (string, error) {
	srcByte, err := ioutil.ReadFile(filePath)
	if err != nil {
		return "", err
	}
	res := base64.StdEncoding.EncodeToString(srcByte)
	return res, nil
}
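The code above always saves the result as test.jpg, although the article does not state which image format the interface returns. As a hedged sketch, the file extension could instead be chosen from the decoded bytes using the standard library's http.DetectContentType. The helper extensionFor is hypothetical and only handles JPEG and PNG.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// extensionFor decodes a Base64 image and guesses a file extension
// from the leading bytes of the decoded data.
func extensionFor(imageBase64 string) (string, error) {
	data, err := base64.StdEncoding.DecodeString(imageBase64)
	if err != nil {
		return "", fmt.Errorf("decode image: %w", err)
	}
	contentType := http.DetectContentType(data)
	switch contentType {
	case "image/jpeg":
		return ".jpg", nil
	case "image/png":
		return ".png", nil
	default:
		return "", fmt.Errorf("unexpected content type %q", contentType)
	}
}

func main() {
	// The PNG file signature, encoded as Base64, stands in for an API response.
	sample := base64.StdEncoding.EncodeToString([]byte("\x89PNG\r\n\x1a\n"))
	ext, err := extensionFor(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("enhanced" + ext) // prints "enhanced.png"
}
```

The returned extension could be appended to the output name before calling writeFile, so that the file name matches the actual image data.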

2.3.3 Experience the effect of the handheld scanner

1) Angle correction

  • Original image:

  • Image after edge trimming and enhancement:

As the pictures above show, angle correction highlights the text content and improves the quality of the text image.

2) Bend correction

  • Original image:

  • Corrected image:

As the pictures above show, the text is clearer after bend correction, which improves the quality of the text image.

3) Moiré removal

  • Original image:

  • Image after moiré removal:

After moiré removal, the interference from the moiré pattern is eliminated and the text is much clearer, which improves the quality of the text image.

4) Shadow removal

  • Original image:

  • Image after shadow removal:

Shadow removal counteracts the effect of environmental factors on the text image and improves its quality.

2.3.4 Summary

Many factors affect the quality and clarity of text images. Uneven outdoor lighting concentrates the image's gray levels in a narrow range. An image captured by a camera goes through analog-to-digital conversion and picks up noise during transmission, so its quality inevitably drops. Noise makes the details of the image hard to see; in severe cases the image is blurred and even the outlines of the characters are hard to make out. The image therefore has to be enhanced before it can be analyzed and processed. The handheld scanner built with Tencent Cloud AI's text image enhancement solves most of these clarity problems and improves the quality of text images.

Origin blog.csdn.net/tencentAI/article/details/128416650