Application of GPU pooling in AI OCR scenarios

1. History and Concept of AI OCR 

OCR (Optical Character Recognition) refers to using optical methods to convert the text in a paper document into a black-and-white dot-matrix image, determining the shape of each character by detecting patterns of dark and light, and then using character recognition methods to translate those shapes into computer-readable text.

Since AlexNet won the ImageNet competition in 2012, deep learning methods have significantly outperformed traditional algorithms on image and video tasks. Techniques from CV (computer vision) and NLP (natural language processing), such as convolutional neural networks and long short-term memory networks, have since expanded into the OCR field. In an AI OCR system, the artificial neural network serves mainly as a feature extractor and classifier: the input is a character image and the output is the recognition result. The recognition rate is very high, and there is no need to spend a long time hand-designing character features.

OCR processing is divided into three major steps: image preprocessing, text detection (Detection), and text recognition (Recognition).

Image preprocessing applies corrective operations to the original image to reduce the difficulty of subsequent detection and recognition, for example adjusting image contrast, rotating and aligning the page, cropping, and fading out interfering information such as creases and ink dots. Most existing deep learning recognition algorithms consist of image correction, feature extraction, sequence prediction, and other modules, as shown in the figure:
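
As a rough illustration of this preprocessing stage, the sketch below uses OpenCV to stretch contrast, deskew, and denoise a scanned page. The CLAHE settings, the minAreaRect-based deskew, and the denoising strength are illustrative assumptions rather than a prescribed pipeline:

```python
import cv2
import numpy as np

def preprocess(path: str) -> np.ndarray:
    """Basic preprocessing: grayscale, contrast stretch, deskew, denoise."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Improve contrast with adaptive histogram equalization.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    img = clahe.apply(img)

    # Estimate the dominant skew angle from the dark (text) pixels
    # and rotate the page back to horizontal.
    binary = cv2.threshold(img, 0, 255,
                           cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle < -45:        # normalize the angle convention across OpenCV versions
        angle += 90
    elif angle > 45:
        angle -= 90
    h, w = img.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    img = cv2.warpAffine(img, m, (w, h), flags=cv2.INTER_CUBIC,
                         borderMode=cv2.BORDER_REPLICATE)

    # Light denoising to fade ink dots and crease shadows.
    return cv2.fastNlMeansDenoising(img, h=10)
```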

For text detection, CTPN is a text detection algorithm proposed at ECCV 2016. It is one of the most widely used and influential open-source text detection models and can detect horizontal or slightly slanted text lines. CTPN combines CNN and LSTM networks to effectively detect horizontally distributed text in complex scenes. The CTPN model consists mainly of three parts: a convolutional layer, a Bi-LSTM layer, and a fully connected layer. Its structure is shown in the figure below:
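
To make this three-part structure concrete, here is a minimal PyTorch sketch of a CTPN-style detector. The torchvision VGG16 backbone, hidden size, and anchor count are simplifying assumptions for illustration; the published implementation differs in detail:

```python
import torch
import torch.nn as nn
import torchvision

class CTPNSketch(nn.Module):
    """Simplified CTPN: CNN backbone -> Bi-LSTM over width -> FC prediction heads."""

    def __init__(self, hidden=128, anchors=10):
        super().__init__()
        # Convolution layer: VGG16 features (requires a recent torchvision).
        vgg = torchvision.models.vgg16(weights=None)
        self.backbone = nn.Sequential(*list(vgg.features)[:-1])   # 512 channels out
        # Bi-LSTM layer: horizontal context along the feature-map width.
        self.rnn = nn.LSTM(512, hidden, bidirectional=True, batch_first=True)
        # Fully connected layer: per-position anchor scores and box offsets.
        self.fc = nn.Linear(2 * hidden, 256)
        self.cls = nn.Linear(256, anchors * 2)    # text / non-text score per anchor
        self.reg = nn.Linear(256, anchors * 2)    # vertical center and height offsets

    def forward(self, x):
        f = self.backbone(x)                       # (B, 512, H, W)
        b, c, h, w = f.shape
        seq = f.permute(0, 2, 3, 1).reshape(b * h, w, c)
        seq, _ = self.rnn(seq)                     # recurrence over each feature row
        out = torch.relu(self.fc(seq))
        cls = self.cls(out).reshape(b, h, w, -1)
        reg = self.reg(out).reshape(b, h, w, -1)
        return cls, reg

scores, offsets = CTPNSketch()(torch.randn(1, 3, 224, 224))
```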

For text recognition, the CRNN (Convolutional Recurrent Neural Network) approach first uses deep convolution to extract basic image features, then feeds them into a Bi-LSTM recurrent network (a bidirectional long short-term memory network that can absorb contextual semantic information) for sequence modeling; this step exploits the left and right context of the text sequence and noticeably improves accuracy. Finally, a CTC loss function is introduced to achieve end-to-end recognition of variable-length sequences, solving the problem of characters not being aligned during training. The CRNN network consists of three parts, from bottom to top: the convolutional layers, the recurrent layers, and the transcription layer. Its structure is shown in the figure below:
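
The same idea can be sketched in PyTorch: convolutional layers that collapse the image height, a Bi-LSTM over the width (time) axis, and a transcription layer trained with CTC loss. The channel widths and pooling layout below are illustrative assumptions rather than the reference CRNN configuration:

```python
import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    """Simplified CRNN: conv features -> Bi-LSTM -> per-timestep class logits."""

    def __init__(self, num_classes, hidden=256):
        super().__init__()
        # Convolutional layers: collapse height, keep width as the time axis.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),              # shrink height only
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),           # height -> 1
        )
        # Recurrent layers: bidirectional LSTM over the width dimension.
        self.rnn = nn.LSTM(256, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        # Transcription layer: project to character classes (+1 for the CTC blank).
        self.fc = nn.Linear(2 * hidden, num_classes + 1)

    def forward(self, x):                   # x: (B, 1, H, W) grayscale line image
        f = self.cnn(x).squeeze(2)          # (B, 256, W')
        seq, _ = self.rnn(f.permute(0, 2, 1))
        return self.fc(seq)                 # (B, W', num_classes + 1)

model = CRNNSketch(num_classes=36)                     # e.g. digits plus letters
logits = model(torch.randn(2, 1, 32, 160))             # two sample line images
log_probs = logits.log_softmax(-1).permute(1, 0, 2)    # (T, B, C) layout for CTC
targets = torch.randint(1, 37, (2, 10))                # dummy label sequences
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           torch.full((2,), logits.shape[1]),
                           torch.full((2,), 10))
```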

2. AI OCR helps enterprises reduce costs and increase efficiency

In traditional financial reimbursement scenarios, corporate employees making routine travel reimbursements must manually enter information such as train tickets, accommodation invoices, and agent details into the system. An accountant then verifies, based on that information, whether the expense matches the reimbursement standard for the employee's rank. With manual entry, employees had to repeatedly check the accuracy and completeness of the information, and reviewers also spent a great deal of time on manual proofreading, which greatly reduced work efficiency.

Today, as enterprises enter a new stage of development, using artificial intelligence and related technologies to improve efficiency and reduce costs has become a strategic direction of enterprise digital transformation.

Many companies have begun to migrate scenarios such as bank document processing and financial invoice reimbursement from manual handling to AI OCR systems. Users upload image files to the AI OCR system through a front-end system. The AI OCR system uses deep learning models to detect features in the unstructured images, identify the document type, extract the text, and produce structured data, which the intelligent review system then checks for duplicates and authenticity; the resulting data is finally sent back to the front-end system to automatically fill in the form. An AI OCR system greatly improves entry accuracy, reduces manual errors in the process, and substantially improves the efficiency of financial reimbursement workflows.
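
For illustration, the "recognize then structure" step of such a pipeline might look like the sketch below. The field names, regular expressions, and the commented-out detect_text_lines / recognize_line helpers are hypothetical placeholders for the detection and recognition models described above, not part of any specific product:

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InvoiceRecord:
    invoice_no: Optional[str]
    date: Optional[str]
    amount: Optional[float]

def structure_invoice(lines: List[str]) -> InvoiceRecord:
    """Turn recognized text lines into a structured record for the review system."""
    text = " ".join(lines)
    no = re.search(r"(?:Invoice|No\.?)\s*[:#]?\s*([A-Z0-9-]{6,})", text)
    date = re.search(r"(\d{4}[-/]\d{1,2}[-/]\d{1,2})", text)
    amount = re.search(r"(?:Total|Amount)\s*:?\s*([\d,]+\.\d{2})", text)
    return InvoiceRecord(
        invoice_no=no.group(1) if no else None,
        date=date.group(1) if date else None,
        amount=float(amount.group(1).replace(",", "")) if amount else None,
    )

# lines = [recognize_line(c) for c in detect_text_lines(image)]   # hypothetical model calls
print(structure_invoice(["Invoice No: AB-2023-001234",
                         "Date 2023-06-15",
                         "Total: 1,280.00"]))
```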

The AI OCR system relies on a large number of deep learning models. The GPU, as an important engine of AI computing power, uses a parallel computing architecture to greatly improve recognition accuracy and speed, helping enterprises automate processes, save personnel costs, and process data efficiently.

3. AI OCR application pain points 

As AI OCR is applied more widely and demand keeps growing, large amounts of computing power are required. However, most GPU computing resources today are allocated to individual projects, leading to significant waste and a range of operation and maintenance problems:

  • GPU computing resources are allocated to physical machines or to a single business system, with coarse allocation granularity and low utilization;

  • GPU computing power is allocated inflexibly, and resources cannot be effectively shared or safely isolated;

  • Without a unified GPU resource management platform, the platform team cannot track GPU utilization and task status in a timely, ongoing manner;

  • The overall operating cost of GPU resources over their life cycle, including rack space and electricity, is very high;

  • The hardware procurement cycle is long and cannot respond promptly to innovative business needs; newly purchased GPU resources must be installed, deployed, security-hardened, and regularly upgraded according to system requirements, adding heavily to the platform team's workload.

4. GPU pooling helps the efficient application of AI OCR technology

Trend Technology is committed to providing users with world-leading AI computing resource pooling solutions and to extending GPU resource pooling capabilities across the entire data center.

OrionX uses software to define AI computing power, replacing the original architecture in which AI applications call physical GPUs directly. It adds a software layer that decouples AI applications from physical GPUs and builds a GPU resource pool so that GPU resources can be managed, maintained, and deployed in a unified way. The size of the resource pool can be set according to management needs: for example, all physical GPUs in the data center can be included in one pool, or a single GPU server can serve as a pool. This architecture realizes GPU resource pooling, allowing users to use GPU resources efficiently, intelligently, and flexibly, thereby reducing costs and increasing efficiency.
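
The toy sketch below (written purely for illustration; it is not the OrionX API) shows the idea behind such a pool: virtual GPUs of arbitrary size are carved out of whichever physical GPU in the data center has spare capacity and are returned to the pool when released:

```python
class PhysicalGPU:
    """One physical card in the data center, tracked by the pool."""
    def __init__(self, node: str, name: str, total_mem_mb: int):
        self.node = node
        self.name = name
        self.free_mem_mb = total_mem_mb
        self.free_compute_pct = 100

class GPUPool:
    """Toy data-center-wide pool: hand out fractions of any GPU on any node."""
    def __init__(self, gpus):
        self.gpus = gpus

    def allocate(self, mem_mb: int, compute_pct: int) -> dict:
        """Grant a virtual GPU backed by whichever physical GPU has room."""
        for gpu in self.gpus:
            if gpu.free_mem_mb >= mem_mb and gpu.free_compute_pct >= compute_pct:
                gpu.free_mem_mb -= mem_mb
                gpu.free_compute_pct -= compute_pct
                return {"backend": (gpu.node, gpu.name),
                        "mem_mb": mem_mb, "compute_pct": compute_pct}
        raise RuntimeError("resource pool exhausted")

    def release(self, grant: dict) -> None:
        """Return a virtual GPU's share to the pool once the application stops."""
        for gpu in self.gpus:
            if (gpu.node, gpu.name) == grant["backend"]:
                gpu.free_mem_mb += grant["mem_mb"]
                gpu.free_compute_pct += grant["compute_pct"]

pool = GPUPool([PhysicalGPU("node-1", "gpu-0", 24_000),
                PhysicalGPU("node-2", "gpu-0", 24_000)])
vgpu = pool.allocate(mem_mb=4_000, compute_pct=25)   # request from a CPU-only node
print(vgpu)                                          # backed by a card on node-1
pool.release(vgpu)                                   # capacity flows back to the pool
```

In OrionX itself, as described above and in the sections that follow, this brokering happens beneath the AI application, so the application code does not need to change.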

OrionX AI computing resource pooling software architecture diagram

OrionX also supports remote invocation: virtual machines or containers can run on a server that has no physical GPU and transparently use GPU resources on other servers over the network, without modifying the code of the AI application in the virtual machine or container. It is this capability that lets OrionX build a data-center-level GPU resource pool and decouple AI applications from physical GPU resources, so that an AI application on a CPU-only server that could not otherwise train can still be quickly assigned a GPU card elsewhere in the pool to complete its training task.

5. OrionX innovation points and benefits

1. Change the way GPU computing resources are used

Traditionally, GPU resources are allocated in whole-card units. With software-defined computing power, resources are provisioned with 1% of a card's computing power and 1 MB of GPU memory as the basic units, so GPUs are allocated on demand and overall utilization is significantly improved.

2. GPU computing resource pooling

With support for cross-node GPU calls, AI applications can be deployed anywhere in the data center, regardless of whether the node has a GPU. The scope of GPU resource supply extends from a single node to the entire network-interconnected data center, optimizing the management model and simplifying operation and maintenance.

3. Cloudification of GPU resources

GPU resources in the data center are called on demand, scaled dynamically, and released after use. An AI application can request a GPU of any size according to its load, and can even aggregate GPUs from multiple physical nodes; after a container or virtual machine is created, the number and size of its virtual GPUs can still be adjusted; and when the AI application stops, its GPU resources are released immediately back into the pool, enabling efficient circulation and full utilization of resources.

6. Expected benefits of OrionX

1. Improve AI scene performance

By pooling GPUs, users can share the GPUs of all servers in the data center, greatly improving resource utilization and reducing GPU server procurement costs and rack footprint. AI business personnel no longer need to worry about the underlying resource status and can focus on higher-value business work, making application development simpler and more convenient.

2. Improve AI application support capabilities

Through GPU resource partitioning and on-demand allocation, AI inference workloads can run multiple models in parallel, business efficiency improves significantly, and the same amount of AI computing power can support several times the business volume as demand scales elastically.

3. Accelerate project cycle

After GPU resources are pooled, GPU computing power and memory can be allocated and reclaimed dynamically within seconds, greatly improving the efficiency of GPU resource allocation. At the same time, AI application code does not need to be modified, which effectively shortens project launch time.

4. Optimize the use of GPU computing resources

With software-defined computing power, GPUs are no longer tied up as whole cards: resources are provisioned at a granularity of 1% of computing power and 1 MB of GPU memory, allocated on demand, and overall utilization improves significantly.

5. Improve operation and maintenance efficiency

OrionX GPU resource pooling provides a unified management UI. Through the management console, operation and maintenance personnel can quickly and visually see the allocation and utilization of all GPU servers and GPU resources, and regular GPU resource pool operation reports can be generated. Pooled resources, a unified management process, platform-level operation, and visual O&M substantially improve management efficiency.

6. Energy saving and emission reduction

Thanks to the efficient scheduling of the OrionX engine, the number of AI applications supported can be greatly increased, thereby reducing GPU server procurement costs and the corresponding server and data center energy consumption, lowering the overall operating cost of business systems, improving investment efficiency, and contributing to national dual-carbon emission reduction goals.

Building an AI computing resource pool can better support the rapid growth of innovative business systems in artificial intelligence scenarios as enterprise operations are digitally upgraded. This is reflected in higher infrastructure utilization, lower equipment operation and personnel O&M costs, less duplicated construction, better resource allocation, and stronger service capabilities, all of which can effectively accelerate customers' pace of innovation in the field of artificial intelligence.
