VALSE 2023, a major computer vision conference, concludes; Hehe Information shares cutting-edge progress in intelligent document processing technology

Recently, the 2023 Vision And Learning SEminar (VALSE) came to a successful conclusion. The conference was sponsored by the Chinese Society for Artificial Intelligence and the Chinese Society of Image and Graphics, and organized by Jiangnan University and the Management Committee of Wuxi National High-tech Industrial Development Zone. More than 5,000 experts and scholars, teachers and students from well-known universities, and R&D personnel from technology companies such as OPPO, Huawei, Baidu, and Hehe Information gathered at the conference to explore the development and application of cutting-edge technologies in computer vision, image processing, pattern recognition, and machine learning.

Conference site

 

VALSE is a high-level academic seminar for young Chinese scholars at home and abroad in the fields of computer vision and machine learning. This year's conference presented 3 keynote reports, 4 invited reports, 12 Annual Progress Reports (APR), 4 tutorials, and 20 workshops, the largest number of reports in the conference's history.

This year, VALSE kept the main program format of previous conferences. The reports and presentations covered most of the hot research directions in computer vision, image processing, pattern recognition and machine learning, and discussed frontier progress in these fields in China and abroad. The conference invited Professor Gao Wen, an academician of the Chinese Academy of Engineering, Professor Jiao Licheng, a foreign academician of the European Academy of Sciences, and researcher Chen Xilin, director of the Institute of Computing Technology, Chinese Academy of Sciences, to give keynote speeches.

The Workshop sessions of this conference focused on research, development and application hotspots such as visual knowledge and multiple knowledge expression, and language recognition and understanding. Intelligent document processing is an important industrial application of computer vision, and it still faces many challenges at this stage. As a representative of the intelligent document processing field, Hehe Information attended the conference and shared its research, development and practical results in intelligent document processing technology.

In the VALSE Workshop session, a representative of Hehe Information gave a talk titled "Application and Practice of Intelligent Document Image Processing Technology"

 

Technical staff from Hehe Information noted that, as the applications of OCR technology continue to expand, complex and varied layouts and diverse text content pose new challenges for recognizing and restoring documents. "Layout analysis and document restoration" technology is important for improving the visual quality of electronic document images and the accuracy of information extraction, and it is one of the company's key research directions.

Documents usually contain a lot of non-text content such as pictures and tables. In addition to body text, the layout of an ordinary paper or manuscript often includes headers, footers, tables, QR codes and other elements. Once a document image is input into the system, the machine analyzes and recognizes the text and the layout elements, and associates lines of text with one another to obtain the correct reading order and paragraph relationships. This is layout analysis.
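To make the "associate several lines of text" step concrete, here is a minimal Python sketch. It is not Hehe Information's method: the `TextLine` type, the `group_into_paragraphs` function and the `gap_ratio` threshold are all hypothetical names and values chosen for illustration, and the sketch assumes a single-column page whose lines have already been recognized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextLine:
    """One recognized line of text with its position in pixels (hypothetical type)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    height: float  # line height


def group_into_paragraphs(lines: List[TextLine], gap_ratio: float = 0.6) -> List[str]:
    """Sort lines top to bottom, then merge neighbours separated by a small gap.

    gap_ratio is an assumed heuristic: a vertical gap larger than
    gap_ratio * (previous line height) starts a new paragraph.
    """
    ordered = sorted(lines, key=lambda line: (line.y, line.x))
    paragraphs: List[List[TextLine]] = []
    for line in ordered:
        if paragraphs:
            prev = paragraphs[-1][-1]
            gap = line.y - (prev.y + prev.height)
            if gap <= gap_ratio * prev.height:
                paragraphs[-1].append(line)
                continue
        paragraphs.append([line])
    return [" ".join(l.text for l in para) for para in paragraphs]


# Lines arrive in arbitrary order, as they might from an OCR engine.
sample = [
    TextLine("is an important step.", 50, 130, 20),
    TextLine("Layout analysis", 50, 100, 20),
    TextLine("Tables need extra care.", 50, 200, 20),
]
paragraphs = group_into_paragraphs(sample)
print(paragraphs)
```

In this sample the first two lines sit 10 pixels apart and are merged, while the third line follows a 50-pixel gap and becomes its own paragraph. Real systems would also need to handle multiple columns, indentation and font changes.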

The task objectives of layout analysis are divided into two categories: physical layout analysis and logical layout analysis. The former mainly solves the problem of region segmentation, while the latter focuses on the logical relationship between regions or the reading order. How to accurately identify various elements and return them to their proper positions during the digitization of documents is one of the difficulties of this technology.
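The split between the two categories can be illustrated with a small Python sketch. The `Region` type stands in for the output of physical layout analysis (already segmented regions), and `reading_order` is a toy version of logical layout analysis. Both names are hypothetical, and the ordering rule assumes a simple two-column page with a full-width header and footer, which is far simpler than what a production system has to handle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A segmented page region: the assumed output of physical layout analysis."""
    kind: str   # e.g. "header", "text", "table", "footer"
    x0: float   # left
    y0: float   # top
    x1: float   # right
    y1: float   # bottom


def reading_order(regions: List[Region], page_width: float) -> List[Region]:
    """Toy logical analysis: header, left column, right column, then footer."""
    mid = page_width / 2
    band_of = {"header": 0, "footer": 2}  # every other kind is body content (band 1)

    def key(region: Region):
        band = band_of.get(region.kind, 1)
        centre_x = (region.x0 + region.x1) / 2
        # Only body regions are split into columns; header and footer span the page.
        column = (0 if centre_x < mid else 1) if band == 1 else 0
        return (band, column, region.y0, region.x0)

    return sorted(regions, key=key)


# Regions in arbitrary order, as a segmentation model might emit them.
page = [
    Region("footer", 0, 950, 800, 1000),
    Region("text", 420, 100, 780, 400),
    Region("table", 20, 500, 380, 800),
    Region("header", 0, 0, 800, 50),
    Region("text", 20, 100, 380, 450),
]
ordered = reading_order(page, page_width=800)
order_summary = [(r.kind, r.x0) for r in ordered]
print(order_summary)
```

Here the header comes first, the left column (text, then table) is read before the right-column text, and the footer comes last. Keeping segmentation and ordering as separate stages mirrors the physical/logical distinction described above.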

If the accuracy of layout analysis cannot be improved, missing words and misplaced content may appear when paper documents are photographed or scanned into electronic documents, and "editable" requirements such as converting pictures to Word or Excel cannot be satisfied.

"Layout analysis and document restoration technology can help machines 'understand' the document structure and give electronic documents a 'what you see is what you get' result." During the sharing session, Hehe Information technical staff explained that, when converting the format of a document image, the system first detects and recognizes the characters and their coordinates, along with layout elements such as paragraphs, seals, and tables. With that information, the machine can "understand" the composition of the document and better "restore" the image into an editable Word or Excel file.

Hehe Information "Intelligent Document Processing - Document Restoration System Architecture" display

 

It is understood that Hehe Information's layout analysis technology addresses both layout segmentation and the logical relationships between regions. It divides the document image into content regions of different types (text, graphics, formulas, tables, etc.) and analyzes the relationships between those regions. This allows the machine to determine text position, font, size, and typesetting in the document more accurately, and to extract information accurately from image documents with complex layouts.

Hehe Information's "Layout Analysis and Restoration" technical processing effect display

 

In the future, Hehe Information will continue to provide innovative digital and intelligent services for global enterprises and individual users, help improve the efficiency of personal document processing, and accelerate the process of digitizing documents throughout the entire life cycle of enterprises.

Origin blog.csdn.net/INTSIG/article/details/131246153