Prompt Paradigm in Industry Practice: Cross-Modal Document Information Extraction with Paddle UIE-X and Intel OpenVINO

The Prompt paradigm has attracted a great deal of attention recently, and its ideas already have successful applications in industry. The Institute of Software of the Chinese Academy of Sciences and Baidu jointly proposed UIE (Universal Information Extraction), a technique that unifies many information extraction tasks under one model. To date, three models have been released in the UIE series: UIE, UIE-X, and UIE-senta. Built on the Prompt idea, the UIE series has become a first choice in industry for information extraction, sentiment analysis, and similar tasks, thanks to its strong zero-shot and few-shot capabilities and its unified multi-task modeling.

 
Basic information table for the UIE series models (UIE, UIE-X, UIE-senta)

This industry practice example uses UIE-X and OpenVINO to extract information from medical documents. It provides a complete solution for optimizing and deploying UIE-X models on the Intel x86 platform, lowers the barrier to industrial adoption, and can be transferred to information extraction scenarios in other industries such as finance.

Project link

https://aistudio.baidu.com/aistudio/projectdetail/6335929?contributionType=1

Scenario challenges

  • Documents come in many types and formats. Effectively combining text, image, and layout information in one model is a major challenge;
  • Traditional extraction schemes based on sequence labeling rely on large amounts of annotated domain data, which is extremely costly;
  • A single business often has several extraction requirements, such as entities and relations, and modeling and training each one separately is expensive.

Model Selection

Beyond plain text, many enterprise scenarios require extracting and processing information from cross-modal documents. The medical field, for example, produces large numbers of examination reports, medical records, invoices, and medical images such as CT scans. To meet this need, PaddleNLP trained and open-sourced UIE-X on a large set of information extraction annotations. UIE-X builds on the Wenxin ERNIE-Layout cross-modal, layout-enhanced pre-trained model and integrates leading PaddleOCR capabilities such as PP-OCR and PP-Structure layout analysis. It is presented as the first open-domain, multilingual information extraction model that handles both plain text and documents.

This case is a hands-on application of UIE-X in the medical field. With a small amount of annotation plus model fine-tuning, the model gains end-to-end document information extraction for a customized scenario. To implement intelligent document information extraction, we follow three steps: define the schema, define the Taskflow, and specify the documents to extract from:

  • Step 1: based on the Prompt paradigm, define the information extraction task and the information to be extracted;
  • Step 2: define the Taskflow, including loading the custom model. The task_path parameter specifies the path of the model weights, which must contain the trained weight file model_state.pdparams;
  • Step 3: specify doc_path, the path of the document to extract information from, and run the extraction.
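
The three steps above can be sketched with PaddleNLP's Taskflow roughly as follows. This is a minimal sketch, not the project's exact code: the schema fields, the checkpoint directory, and the document path are hypothetical placeholders, and it assumes an installed PaddleNLP version that provides the information_extraction task with the uie-x-base model.

```python
from typing import Any, Dict, List

# Step 1: define what to extract (Prompt-style schema).
# These field names are hypothetical examples for a medical document.
SCHEMA: List[str] = ["Patient name", "Date of examination", "Diagnosis"]


def build_taskflow_kwargs(schema: List[str], task_path: str) -> Dict[str, Any]:
    """Collect the Taskflow arguments for a fine-tuned UIE-X model."""
    return {
        "schema": schema,
        "model": "uie-x-base",
        # Directory that holds the fine-tuned weights model_state.pdparams.
        "task_path": task_path,
    }


def extract(doc_path: str, schema: List[str], task_path: str) -> Any:
    """Steps 2 and 3: build the Taskflow and run it on one document."""
    # Imported lazily so the helper above works without PaddleNLP installed.
    from paddlenlp import Taskflow

    ie = Taskflow("information_extraction",
                  **build_taskflow_kwargs(schema, task_path))
    # doc_path points at the document image to extract information from.
    return ie({"doc": doc_path})
```

A call might look like extract("./medical_report.jpg", SCHEMA, "./checkpoint/model_best"), where both paths are assumptions for illustration.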

PaddlePaddle's large, curated model library significantly reduces the difficulty of model selection, cuts time costs, and enables rapid iteration. At the deployment stage, the Intel OpenVINO™ toolkit is used to deploy the model, making full use of network execution performance on general-purpose x86 platforms, improving the inference performance of the solution, and optimizing its overall cost.

Tuning strategy

  • Using the performance hints that OpenVINO's AUTO device provides, multi-threading is configured according to the usage scenario to raise inference throughput or reduce latency.
  • Dynamic input shapes are supported on Intel CPU and GPU, which improves inference performance during information extraction and optimizes the overall cost of the solution while still meeting inference latency requirements.
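
As an illustration of these two strategies, the sketch below maps a usage scenario to an OpenVINO performance hint and optionally reshapes the model inputs to dynamic dimensions before compiling. It is a sketch under assumptions: the scenario names, the model path, and the input name in the comment are hypothetical, and it assumes an OpenVINO release in which the Python API is available as `import openvino as ov`.

```python
from typing import Dict, Optional

# Map a usage scenario (hypothetical names) to an OpenVINO performance hint.
_HINTS = {"realtime": "LATENCY", "batch": "THROUGHPUT"}


def make_config(scenario: str) -> Dict[str, str]:
    """Return the compile-time properties for a usage scenario."""
    if scenario not in _HINTS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return {"PERFORMANCE_HINT": _HINTS[scenario]}


def compile_for_scenario(model_path: str, scenario: str,
                         dynamic_shapes: Optional[dict] = None,
                         device: str = "AUTO"):
    """Read a model, optionally make its inputs dynamic, and compile it."""
    # Imported lazily so make_config works without OpenVINO installed.
    import openvino as ov

    core = ov.Core()
    model = core.read_model(model_path)
    if dynamic_shapes:
        # For example {"input_ids": [-1, -1]}; -1 marks a dynamic dimension.
        # The input name here is an assumption, check model.inputs first.
        model.reshape(dynamic_shapes)
    return core.compile_model(model, device, make_config(scenario))
```

With the LATENCY hint the runtime favors the response time of a single request, while THROUGHPUT favors the total number of requests processed; which one fits depends on whether documents arrive one at a time or in bulk.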

Model deployment

The final deployment environment for this project is a device based on the Intel x86 hardware platform. For ease of development, this example uses Python. Given images of medical documents and a schema that defines what to extract, the Taskflow framework performs intelligent information extraction based on UIE-X.

The solution supports prompts/schemas in Chinese and English as well as cross-lingual extraction, and it also accepts custom OCR results: pass the OCR bounding box information through the layout parameter to improve extraction quality. Paddle AI Studio also provides complete usage examples and development instructions; you can follow this tutorial to get started quickly, then develop and integrate the solution into real projects.
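
To make the custom OCR input more concrete, here is a small sketch that converts OCR output into a list of (bounding box, text) pairs. Both formats are assumptions for illustration: the input is taken to be four-point polygons paired with recognized text, and the output is taken to be [x1, y1, x2, y2] rectangles paired with text, so check the PaddleNLP documentation for the exact layout format your version expects.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
LayoutItem = Tuple[List[float], str]


def polygon_to_rect(polygon: Sequence[Point]) -> List[float]:
    """Convert a polygon to its axis-aligned box [x1, y1, x2, y2]."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return [min(xs), min(ys), max(xs), max(ys)]


def to_layout(ocr_lines: Sequence[Tuple[Sequence[Point], str]]) -> List[LayoutItem]:
    """Turn (polygon, text) OCR lines into (rectangle, text) layout items."""
    return [(polygon_to_rect(polygon), text) for polygon, text in ocr_lines]


# Hypothetical OCR line: one text region given by its four corner points.
ocr_lines = [([[10, 20], [110, 22], [110, 60], [10, 58]], "Name")]
layout = to_layout(ocr_lines)
```

The resulting list could then be passed alongside the document, for example as ie({"doc": doc_path, "layout": layout}), where ie is a Taskflow instance and doc_path is the document image path.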


Deployment demo scheme for medical document information extraction

To make the example tutorial easier to apply, OpenVINO evangelist Dr. Wu Zhuo will give an in-depth walkthrough of the entire development process, from data preparation and solution design to model optimization and deployment, at 19:00 on Wednesday, June 14, and guide everyone through the coding step by step.

Origin blog.csdn.net/PaddlePaddle/article/details/131208089