CSIG Young Scientists Conference successfully held; Hehe Information proposes a new fusion research paradigm for intelligent document processing

Recently, the 19th CSIG Young Scientists Conference (referred to as the "Conference") was held in Guangzhou. The Conference was sponsored by the China Society of Image and Graphics (CSIG) and organized by Pazhou Laboratory, South China University of Technology, Sun Yat-sen University, and the CSIG Youth Working Committee. Oriented toward international academic frontiers and national strategic needs, the Conference is committed to supporting outstanding young scholars in image and graphics, providing a platform for academic exchange and discussion, promoting cooperation among scholars, and encouraging "industry-university-research" collaboration in the field.

At present, artificial intelligence technology represented by large models is reshaping a wide range of industries. Compared with general-purpose large models, vertical-domain large models focus on the data and knowledge of specific scenarios, and they are more accurate and efficient on complex, specialized problems. The role and prospects of large models in vertical fields have therefore become a research focus across academia and industry. Under the guidance of the sponsor, the China Society of Image and Graphics, Hehe Information jointly hosted the "Vertical Field Large Model Forum" (referred to as the "Forum") together with Pazhou Laboratory, South China University of Technology, Sun Yat-sen University, and the CSIG Youth Working Committee. The Forum brought together technical experts from universities and from representative companies in the office, medical, industrial, and other sectors to share cutting-edge results and practical experience, and to look for new ways of applying large models in vertical fields.

Document processing is an important research direction for large models in vertical fields. Low image quality, complex layouts, and diverse text fonts and colors all make it hard to substantially improve the intelligent analysis and understanding of document images. In September 2023, following GPT-4, OpenAI released GPT-4V (V stands for "Vision"), a multimodal model with visual capabilities. Its strong document image understanding has attracted widespread attention in the industry.

At the Forum, Dr. Ding Kai, senior engineer and deputy general manager of Hehe Information's Intelligent Technology Platform Division, analyzed how GPT-4V actually performs on document processing tasks and shared the company's research in intelligent document processing.

Ding Kai said that GPT-4V performs impressively on scene text recognition, text in varied forms and languages, handwriting recognition, formula recognition, geometric figure recognition, and table understanding, and that its results on complex chart analysis, document information extraction, and reasoning are also outstanding. At the same time, GPT-4V still has weaknesses in Chinese text, handwritten formulas, scene text recognition, and table recognition. On documents with complex layouts, such as multiple columns and tables, large models still lag well behind current state-of-the-art (SOTA) methods.

"In the field of intelligent document processing, large models can recognize and understand far more types of document elements than traditional algorithms. This greatly expands what AI technology can do in document analysis and recognition, and it makes an end-to-end document recognition and understanding process possible. The shortcoming is that the OCR accuracy of current large models falls well short of the best specialized models in the field, and long documents still rely on external document parsing engines." Ding Kai believes that technology companies can do the "perception" layer well so that large models can do better at "cognition", and that this fusion research paradigm is of positive significance for intelligent document processing.
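The "perception" and "cognition" split described here can be illustrated with a minimal sketch. It assumes a dedicated parsing step has already produced structured layout elements and that the language model is simply a text-in, text-out function. The `Element` schema, `build_prompt`, `answer`, and `echo_llm` are hypothetical names chosen for illustration and are not taken from Hehe Information's products or from any real library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Element:
    """One layout element from the perception stage (hypothetical schema)."""
    kind: str   # e.g. "title", "paragraph", "table"
    text: str   # recognized text content
    order: int  # reading-order index assigned by the parser


def build_prompt(elements: List[Element], question: str) -> str:
    """Serialize parsed elements in reading order and append the question."""
    ordered = sorted(elements, key=lambda e: e.order)
    lines = [f"[{e.kind}] {e.text}" for e in ordered]
    return "Document:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


def answer(elements: List[Element], question: str,
           llm: Callable[[str], str]) -> str:
    """Cognition stage: pass the structured text to any text-in, text-out model."""
    return llm(build_prompt(elements, question))


# Stand-in for real parser output; an actual system would run OCR and
# layout analysis on the document image to produce these elements.
parsed = [
    Element("paragraph", "Total due: 120 USD", order=1),
    Element("title", "Invoice 2023-09", order=0),
]


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call: reports how many lines it received."""
    return f"received {len(prompt.splitlines())} lines"


prompt = build_prompt(parsed, "What is the total?")
reply = answer(parsed, "What is the total?", echo_llm)
```

In this arrangement the model never sees pixels. Recognition accuracy and reading order are the parser's responsibility, which is one way to read the division of labor Ding Kai describes.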

Currently, the joint laboratory on document image analysis, recognition, and understanding run by Hehe Information and South China University of Technology has carried out in-depth research on key directions in large-model document processing, including a pixel-level unified OCR model and a large unified OCR model. The results have been extensively validated on text removal, text segmentation, and tampered text detection tasks. In addition, by designing document recognition and analysis together with LLM (large language model) applications, the laboratory takes full advantage of sequence prediction to better handle the diverse tasks in document image processing. Combining these methods with LLMs enables a higher level of document understanding and analysis and opens up more possibilities for the field.

Hehe Information is an artificial intelligence and big data technology company committed to providing innovative digital and intelligent services. Using artificial intelligence technologies such as natural language processing (NLP), computer vision (CV), and deep learning, Hehe Information's intelligent document processing system covers the full workflow of document import, image processing, text detection and recognition, information extraction, data verification, and semantic retrieval and summarization. Related products and solutions are used in many industries around the world, including finance, logistics, and manufacturing.

Origin blog.csdn.net/INTSIG/article/details/135414005