In this college student competition, hundreds of teams worked with Hehe Information to tackle hard problems with AI

0 School-enterprise cooperation to overcome problems

Recently, the final of the China University Student Service Outsourcing Innovation and Entrepreneurship Competition concluded at Jiangnan University. It is the only national innovation and entrepreneurship event in the service outsourcing industry. The competition follows the modern service economy and the themes of innovation, entrepreneurship, and wealth creation, and serves as a platform for students to demonstrate their abilities, with an emphasis on application orientation and industry-university interaction. It guides the public and young students to pay attention to the modern service industry, draws enterprises' attention to university students, and promotes reforms in college education to meet the needs of emerging industries. It has gradually become a first-class domestic, internationally influential showcase for youth innovation and entrepreneurship in the service outsourcing industry.

The competition topics focus on practical problems that enterprises face in technology and management, and are closely tied to industry. Intelligent character recognition is one of the technologies the competition focuses on.

With the development of information technology and the continuous expansion of application scenarios, people need to process and use large amounts of document information. Traditional manual processing is inefficient and cannot meet the needs of modern life and work, which makes intelligent analysis and processing of document images an important and challenging research problem. Intelligent document recognition builds on artificial intelligence and machine learning to automatically identify the information in a document, such as text, images, tables, and barcodes, and then sort, archive, summarize, and extract it.

Intelligent document image analysis and processing is widely used in daily life, for example in automatic analysis and processing of bank bills, automatic recognition of express waybills, analysis and recognition of textbooks, analysis and understanding of ancient manuscripts, digital archives, and digital libraries, greatly improving the speed of information retrieval, processing, and dissemination. In short, the emergence and development of document image analysis and recognition technology has made people's lives much more convenient and has strongly promoted society's move toward intelligence, digitization, and informatization.

Hehe Information has more than ten years of in-depth experience in intelligent text recognition. Drawing on its understanding of the industry and on students' needs, it designed competition topics such as functional innovation and commercial promotion for Scanning Almighty King.

These competition topics attracted nearly 300 teams from more than 70 colleges and universities across the country, and many excellent works emerged.

Let's take a look at the "wonderful ideas" of young students!

1 Beijing Forestry University: Document Format Conversion

Beijing Forestry University's "do your best" team put forward a design for document format conversion.

In the digital age, more and more people need to digitize handwritten text. Schools and many professional fields in particular need to work and study with both paper and digital documents. For example:

  • Converting college students' study notes: handwritten notes, memos, and similar text are converted into electronic text for easier management and retrieval;
  • Specialized professional settings: medical and law students, for example, handle many handwritten medical records and legal documents that need to be converted into electronic text for better management and sharing;
  • Personal life: more and more people need to convert handwritten letters, greeting cards, and the like into electronic text for preservation and sharing;
  • Education: students take handwritten notes and write out answers by hand, and teachers need to review and archive students' handwritten test papers.

Figure (Beijing Forestry University): Expectation level for the "handwriting to Word" function among three groups of students whose note-taking frequency is "often", "occasionally", and "never".

The "do your best" team proposed a technical solution. First, collect a large number of handwritten text images and preprocess them (for example, adjusting size, contrast, and brightness) for subsequent training and recognition. Then design a suitable deep learning model, hold out part of the handwritten text images for testing and validation, and optimize and adjust the model according to the test results to improve its recognition accuracy and robustness.

However, recognizing handwritten characters is far harder than recognizing handwritten digits, so the design and optimization of the neural network architecture and the quality of the data set pose great challenges. Even so, the team's ideas are a valuable source of inspiration for improving product functions.
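
The preprocessing step mentioned above (resizing and contrast adjustment before training) might look roughly like the sketch below. It is only an illustration in plain Python on a tiny grayscale image stored as nested lists; the function names, the nearest-neighbour resizing, and the linear contrast stretch are assumptions for this example, not the team's actual implementation.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of pixel values in 0..255


def resize_nearest(img: Image, new_h: int, new_w: int) -> Image:
    """Resize an image with nearest-neighbour sampling."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def stretch_contrast(img: Image) -> Image:
    """Rescale linearly so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: there is nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def preprocess(img: Image, size: int = 4) -> List[float]:
    """Resize to size x size, stretch contrast, and flatten to floats in [0, 1]."""
    fixed = stretch_contrast(resize_nearest(img, size, size))
    return [p / 255 for row in fixed for p in row]


# A hypothetical low-contrast 2x4 "scan" with a darker left half.
sample = [
    [100, 100, 150, 150],
    [100, 100, 150, 150],
]
features = preprocess(sample, size=2)
print(features)
```

In a real pipeline the flattened values would be fed to the recognition model, and the same preprocessing would be applied to the held-out test images.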

The "do your best" team also designed several other interesting features:

  • CAD and PDF mutual conversion function

    Aimed at design-oriented majors in engineering, science, and art: view-only PDFs from engineering design can be converted to CAD format for re-editing. Converting CAD drawings to PDF makes them convenient to save and archive, and easy to manage and consult. Both CAD and PDF can serve as carriers for digital files, making file transfer, sharing, and backup more convenient.

  • Video scanning to extract page frames and convert them to images

    Aimed at video content recognition for college students: extract the page frames from a video and convert them to PDF and high-definition images, and recognize and extract the text of PPT slides in learning videos, online class recordings, and screen recordings. This makes it convenient for students and teachers to consult the material and to produce study documents or reports from videos.
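
One simple way to pick "page frames" out of a lecture video is to keep a frame only when it differs enough from the last frame that was kept. The sketch below illustrates that idea on tiny synthetic frames; the threshold value and the mean-absolute-difference measure are assumptions chosen for the example, and the source does not say which method the team had in mind.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale frame, pixel values in 0..255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_page_frames(frames: List[Frame], threshold: float = 20.0) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame."""
    kept: List[int] = []
    for i, frame in enumerate(frames):
        if not kept or mean_abs_diff(frame, frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


# Hypothetical video: slide A shown three times (with slight noise), then slide B.
slide_a = [10] * 16
slide_a_noisy = [12] * 16
slide_b = [200] * 16
video = [slide_a, slide_a_noisy, slide_a, slide_b, slide_b]

page_indices = select_page_frames(video)
print(page_indices)
```

The selected frames could then be passed to text recognition or exported as images or PDF pages.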

These feature designs are grounded in the actual needs of college students and offer practical reference value for extending existing products.

2 Zhejiang University of Traditional Chinese Medicine: Personalized Question Bank

The "Former Infinity Rabbit" team from Zhejiang University of Traditional Chinese Medicine used a document survey to analyze pain points in the current learning process: students need to prepare thoroughly before, during, and after class, and invest a lot of time in learning. Many difficulties arise along the way, including difficulty previewing new words before class, slow note-taking in class, heavy after-class review and homework, being unable to share materials in time, answers that are too easy to memorize when redoing completed test papers, and scattered questions during end-of-term review.

Figure (Zhejiang University of Traditional Chinese Medicine): After-class review is the scenario students consider most important.

The "Former Infinity Rabbit" team summarized six key scenarios to guide the functional design of the product.

Figure (Zhejiang University of Traditional Chinese Medicine): Six demands.

Some scenarios already have solutions. For example, the new-word explanation function of Scanning Almighty King can explain unfamiliar words. First, photograph and upload the textbook page to be previewed; then tap any proper noun or English word you do not understand to see an explanation. Several words can be tapped at once, and the explanations appear in the blank space to the right of the image. This speeds up previewing, frees time for previewing other textbooks, and makes class time more efficient.

Another example is the text-to-handwriting function, which converts computer fonts into handwritten fonts after scanning with Scanning Almighty King. The conversion can follow a handwriting template uploaded by the user or one selected from the templates in Scanning Almighty King. The handwriting background can also be selected, with options such as homework ruled lines and grids, making the handwriting look more authentic.

The "Former Infinity Rabbit" team also optimized and extended the existing functions of Scanning Almighty King. Take the test paper erasing function as an example: after a photo containing several questions is scanned, each question is separated automatically, and the existing erasing function removes the handwriting on it. The proposed function for shuffling questions and generating a question bank then merges the scattered questions and presents them in random order, so that users cannot simply memorize the answers in sequence. Scanned questions are saved in the question bank, where users can review them whenever they want to reinforce their memory.
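
The idea of collecting separated questions into a bank and replaying them in random order can be sketched in a few lines. The `Question` and `QuestionBank` classes below are hypothetical names invented for this illustration; the question splitting and handwriting erasing are assumed to have happened already, and nothing here reflects how the product is actually built.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Question:
    text: str
    source: str = ""  # e.g. the test paper the question was scanned from


@dataclass
class QuestionBank:
    questions: List[Question] = field(default_factory=list)

    def add_scanned_page(self, question_texts: List[str], source: str = "") -> None:
        """Store the questions that were separated from one scanned page."""
        for text in question_texts:
            self.questions.append(Question(text.strip(), source))

    def shuffled(self, seed: Optional[int] = None) -> List[Question]:
        """Return the questions in random order without changing the bank."""
        rng = random.Random(seed)
        order = self.questions[:]
        rng.shuffle(order)
        return order


bank = QuestionBank()
bank.add_scanned_page(
    ["1. 2 + 2 = ?", " 2. Capital of France? ", "3. Define OCR."],
    source="midterm",
)
review_order = bank.shuffled(seed=7)
for q in review_order:
    print(q.text)
```

Passing a seed makes a review session reproducible; omitting it gives a fresh order each time.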

Building on this is the design of a personalized question bank. The design is based entirely on combining and extending existing functions, yet it lets users manage and study from personalized question banks conveniently, greatly improving user experience and learning outcomes.

Figure (Zhejiang University of Traditional Chinese Medicine): Personalized question bank design.

In addition, the "Former Infinity Rabbit" team designed simple social functions for Scanning Almighty King, such as adding friends, chatting, forwarding, and transferring files. The pages shown in abnormal situations use illustrations based on the brand image, which adds interest, promotes the brand, and gives the technology a warmer feel.

3 Central South University of Forestry and Technology: Interactive Scene Mining

The "Zhejiang core" team from Central South University of Forestry and Technology divided the existing toolbox functions of Scanning Almighty King into four categories (scanning services, format conversion, document editing, and others) and carried out a detailed, in-depth analysis and extension of each function. The team also randomly surveyed 1,000 college students to analyze how they use the various functions of Scanning Almighty King and how they rate the product.

Figure (Central South University of Forestry and Technology): Product module division.

Figure (Central South University of Forestry and Technology): Survey of how users use the functions.

Taking PPT capture as an example, the "Zhejiang core" team first compared actual needs with the pain points of the traditional approach:

No. | Usage scenario | Pain point of the traditional approach
1 | Photographing only the PPT area | The photo cannot be limited to the slide; surrounding areas are captured as well
2 | Shooting from an off-center position, not facing the PPT | The captured slide image is distorted and hard to correct afterwards
3 | Shooting multiple slides in succession | Multiple slides cannot be merged into one file automatically
4 | Extracting text from slides | Text cannot be recognized and extracted automatically

The team then found a solution within the product: the "Shoot PPT" (拍PPT) function of Scanning Almighty King, which automatically captures the slide and filters out non-slide content. After shooting, it automatically corrects the image and produces a front-facing view of the slide. It also supports continuous shooting, after which the PDF preview and sharing function combines all the slide photos into a single PDF document.
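
Correcting a slide photographed at an angle is commonly modelled as a perspective (homography) transform that maps the four detected slide corners onto a rectangle. The source does not describe how the product performs its correction, so the sketch below is only a generic illustration of that technique; the corner coordinates are made up, and the small linear solver is written out so the example has no dependencies.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: List[Point], dst: List[Point]) -> List[float]:
    """Return h0..h7 of the perspective transform mapping 4 src points to dst."""
    rows: List[List[float]] = []
    rhs: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs)


def warp_point(h: List[float], p: Point) -> Point:
    """Apply the perspective transform to a single point."""
    x, y = p
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical slide corners in the photo (clockwise from top-left) and the
# corners of the 4x3 rectangle they should map to.
photo_corners = [(1.0, 1.0), (5.0, 0.5), (5.5, 4.0), (0.8, 4.5)]
flat_corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

h = homography(photo_corners, flat_corners)
for corner in photo_corners:
    print(corner, "->", warp_point(h, corner))
```

A full correction would apply the same transform to every pixel of the photo; here only the corners are mapped to keep the example short.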

Figure (Central South University of Forestry and Technology): Scanning Almighty King solution.

Another example is table recognition. Recognizing and processing tables is a challenging task in intelligent document processing, specifically because of:

  • Diverse table structures: tables can have many different structures, including merged cells, multi-level headers, and cells that span rows and columns, which complicates identifying and parsing them. Different table structures may require different processing methods;
  • Inconsistent document quality: the quality of scanned documents or images varies, and problems such as blurring, noise, skew, and shadows affect the accuracy of table recognition;
  • Variety of fonts and typography: the variety of fonts, font sizes, and colors in tables makes text recognition more challenging. Different layouts may lead to recognition errors, especially when the layout affects the structure of the table;
  • Merged cells and cells spanning rows and columns: these can make data extraction and analysis difficult. Restoring this information correctly so that the table structure stays accurate is a challenge;
  • Language diversity: the text in a table may be in different languages, and several languages may even appear in the same document, which increases the complexity of recognizing table data;
  • Ambiguity and context: in some cases the data in a table is ambiguous and can only be understood correctly with contextual information. Lack of context can lead to parsing errors;
  • Large-scale data sets and training difficulty: good table recognition performance usually requires a large amount of labeled training data, but well-labeled table data sets are expensive and time-consuming to build. Labeling complex table structures may also require specialized domain knowledge.
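
To make the merged-cell challenge concrete, the sketch below shows one possible way to represent recognized cells with row and column spans and to expand them into a dense grid that could be exported to a spreadsheet. The `Cell` structure and the choice to repeat a merged cell's text in every position it covers are assumptions made for illustration, not a description of any product's internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    row: int           # top row index of the cell
    col: int           # left column index of the cell
    text: str
    row_span: int = 1  # number of rows the cell covers
    col_span: int = 1  # number of columns the cell covers


def expand_to_grid(cells: List[Cell], n_rows: int, n_cols: int) -> List[List[Optional[str]]]:
    """Fill a dense grid, repeating each merged cell's text over its span."""
    grid: List[List[Optional[str]]] = [[None] * n_cols for _ in range(n_rows)]
    for cell in cells:
        for r in range(cell.row, cell.row + cell.row_span):
            for c in range(cell.col, cell.col + cell.col_span):
                if grid[r][c] is not None:
                    raise ValueError(f"cells overlap at row {r}, column {c}")
                grid[r][c] = cell.text
    return grid


# Hypothetical recognition result: a two-level header with merged cells.
cells = [
    Cell(0, 0, "Name", row_span=2),
    Cell(0, 1, "Score", col_span=2),
    Cell(1, 1, "Math"),
    Cell(1, 2, "English"),
    Cell(2, 0, "Li"),
    Cell(2, 1, "90"),
    Cell(2, 2, "85"),
]
grid = expand_to_grid(cells, n_rows=3, n_cols=3)
for row in grid:
    print(row)
```

The overlap check catches one class of recognition error early: two detected cells that claim the same grid position.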

Table processing is a common requirement in daily work:

No. | Usage scenario | Scenario description | Target users
1 | Writing a thesis | While writing a thesis, relevant data found online is presented as charts and needs to be summarized into a table | Researchers
2 | Daily office work | Chart information on paper needs to be turned into an electronic spreadsheet | Student cadres
3 | Writing a data analysis report | The collected data is presented as images, so it cannot be organized or edited to look for patterns in the data | Statisticians

Likewise, the "Zhejiang core" team offered a solution built on Scanning Almighty King. With the table recognition function, import or take a picture and convert it into an Excel table with one tap. To export multiple sheets, select "Add Page" in the Excel export interface, import another picture, tap it, and finally tap "Export to Excel"; the data from all the tables is then collected automatically into a single Excel file.

In addition, the "Zhejiang core" team provided a rich analysis of other interaction scenarios, such as taking ID photos, text conversion, and adding watermarks, which rounds out the product's usage plan.

4 Chongqing University of Posts and Telecommunications: Large Models Empower Smart Documents

The "Fourier transform" team from Chongqing University of Posts and Telecommunications drew on more specific technologies and proposed ideas for a variety of functional scenarios.

Taking study and research as an example, the "Fourier transform" team first analyzed the idea of mind map recognition. Mind maps are simple but highly efficient and can be applied to study, life, and work alike. They break large bodies of content into parts, expose subordinate relationships, reduce the word count, and aid understanding and memory. A bracket mind map, in particular, splits a whole into its parts for analysis, revealing the relationship between the whole and the parts and giving a clearer picture of how the whole is composed.

Figure (Chongqing University of Posts and Telecommunications): Bracket recognition.

The "Fourier transform" team pointed out that there are currently two ways to make a bracket mind map:

  • Making an electronic map with software: it is easy to edit and share, but it depends heavily on the device, and input is only convenient when a keyboard is connected;
  • Hand-drawing on paper: it deepens memory while the author works through the internal logic of the knowledge, but it has poor editability (typos cannot be erased directly), poor portability, and poor interactivity, and is hard to beautify.

Methods already exist for recognizing electronic maps and for hand-drawing maps electronically, but there is no method for digitizing hand-drawn bracket mind maps. Implementing this function still involves many challenges, such as recognition accuracy, regeneration restrictions, and sharing security.

Combining traditional image processing, counting, bracket recognition, text recognition, hierarchical logic generation, and other techniques, the "Fourier transform" team designed a mind map recognition process whose overall structure is clear and feasible.

Figure (Chongqing University of Posts and Telecommunications): Mind map recognition process.

As part of this process, the "Fourier transform" team also independently designed a hierarchical logic generation algorithm based on boundary information, which aggregates the set of text blocks and the set of left braces.

Figure (Chongqing University of Posts and Telecommunications): A hierarchical logic generation algorithm based on boundary information.
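
The source gives no details of the team's algorithm, so the following is only a guess at how boundary information could be used to turn text blocks and left braces into a hierarchy. It assumes a left-to-right bracket map and two simple rules invented for this sketch: a brace's parent is the nearest text block to its left whose vertical center falls inside the brace's span, and each text block belongs to the nearest brace on its left that spans it vertically. Text labels are assumed to be unique.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Box:
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge

    @property
    def cy(self) -> float:
        """Vertical center of the box."""
        return (self.y0 + self.y1) / 2


@dataclass(frozen=True)
class TextBlock:
    text: str
    box: Box


def build_hierarchy(texts: List[TextBlock], braces: List[Box]) -> Dict[str, List[str]]:
    """Map each parent text to its child texts, ordered top to bottom."""
    # Pass 1: for each brace, find the parent text block on its left.
    parent_of_brace: List[Optional[TextBlock]] = []
    for brace in braces:
        left = [
            t for t in texts
            if t.box.x1 <= brace.x0 and brace.y0 <= t.box.cy <= brace.y1
        ]
        parent_of_brace.append(max(left, key=lambda t: t.box.x1, default=None))

    # Pass 2: attach each text block to the nearest brace on its left
    # whose vertical span contains the block's center.
    tree: Dict[str, List[str]] = {}
    for t in sorted(texts, key=lambda t: t.box.cy):
        owners = [
            i for i, b in enumerate(braces)
            if b.x1 <= t.box.x0 and b.y0 <= t.box.cy <= b.y1
        ]
        if not owners:
            continue  # a root node has no brace on its left
        nearest = max(owners, key=lambda i: braces[i].x1)
        parent = parent_of_brace[nearest]
        if parent is not None:
            tree.setdefault(parent.text, []).append(t.text)
    return tree


# Hypothetical detection result for a small two-level bracket map.
texts = [
    TextBlock("Animals", Box(0, 40, 30, 60)),
    TextBlock("Mammals", Box(45, 20, 80, 30)),
    TextBlock("Birds", Box(45, 70, 80, 80)),
    TextBlock("Cat", Box(95, 12, 120, 18)),
    TextBlock("Dog", Box(95, 32, 120, 38)),
]
braces = [Box(35, 0, 40, 100), Box(85, 10, 90, 40)]

tree = build_hierarchy(texts, braces)
print(tree)
```

Real hand-drawn maps would need tolerance for skewed braces and overlapping boxes, which this sketch does not attempt.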

The "Fourier transform" team also designed ways to extend the business:

On top of the corresponding mind map software or software interface, a collaborative editing function is added. First, teachers can conveniently view students' mind maps in real time, improving the efficiency of smart classrooms; second, multiple people can view high-quality mind maps at the same time, improving the efficiency of sharing and learning. Building on the scanned results, the content of the map can also be analyzed and a multimedia retrieval and recommendation system established, for students who want to learn all the knowledge a mind map represents while studying with it: first by retrieving relevant learning videos, second by retrieving relevant teaching materials, and third by supplementing related knowledge.

In short, the team presented a detailed and feasible design that runs from technology to business.

In the tide of the information age, large language models are leading a new era of artificial intelligence with remarkable speed and creativity. They can not only understand and analyze human language but also generate high-quality, creative text. From writing assistants to content creation, from automated customer service to medical diagnostics, they are driving innovation across industries. These models learn from massive amounts of data, and their performance continues to improve. They can draw on multiple domains of knowledge to generate new ideas and solutions.

Seizing on this trend, the "Fourier transform" team designed a concept for an intelligent scan-and-answer AI based on a large language model.

Figure (Chongqing University of Posts and Telecommunications): Intelligent scan-and-answer AI based on a large language model.

The overall process is:

  1. The user scans a question: the user submits the question to be answered by taking a picture or typing it in.

  2. Knowledge base matching: the AI system retrieves information from a pre-built knowledge base and finds the original text of the knowledge points related to the question.

  3. Prompt design: the system uses the information related to the question to build several rich prompts, which serve as the initial text fed to the large language model.

  4. Large language model input: a powerful large language model, such as GPT-4 or Wenxin Yiyan, takes the rich prompts as input and generates answers with more context and semantic depth.

    Next, define two output modes for the AI system:

  5. Xueba (top student) version: gives the answer directly; the answer is generated by the large language model from the question and the related information.

  6. Tutor version: gives the source and an explanation of the knowledge points involved in the question, helping users better understand the background of the problem and the related knowledge.
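
The steps above can be sketched as a small retrieve-then-prompt pipeline. Everything in this example is an assumption made for illustration: the tiny in-memory knowledge base, the word-overlap retrieval, the prompt wording, and the `echo_llm` stand-in, which replaces a real call to a model such as GPT-4 or Wenxin Yiyan so the example runs on its own.

```python
from typing import Callable, Dict, List

# Hypothetical pre-built knowledge base: topic keywords -> original text.
KNOWLEDGE_BASE: Dict[str, str] = {
    "pythagorean theorem": "In a right triangle, a^2 + b^2 = c^2, where c is the hypotenuse.",
    "photosynthesis": "Plants convert light, water, and carbon dioxide into glucose and oxygen.",
}


def retrieve(question: str, top_k: int = 1) -> List[str]:
    """Rank knowledge-base entries by how many topic words appear in the question."""
    words = set(question.lower().replace("?", " ").split())
    scored = [
        (len(words & set(topic.split())), passage)
        for topic, passage in KNOWLEDGE_BASE.items()
    ]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


def build_prompt(question: str, passages: List[str], mode: str) -> str:
    """Combine the retrieved material, the question, and a mode-specific task."""
    if mode == "xueba":
        task = "Give the final answer directly."
    elif mode == "tutor":
        task = "Explain the knowledge points involved and cite the reference material."
    else:
        raise ValueError(f"unknown mode: {mode}")
    context = "\n".join(f"- {p}" for p in passages) or "- (no matching material)"
    return f"Reference material:\n{context}\n\nQuestion: {question}\n\nTask: {task}"


def answer(question: str, mode: str, llm: Callable[[str], str]) -> str:
    """Run retrieval, prompt building, and the model call in sequence."""
    return llm(build_prompt(question, retrieve(question), mode))


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only reports the prompt length."""
    return f"[model output for a {len(prompt)}-character prompt]"


question = "What is the Pythagorean theorem?"
print(build_prompt(question, retrieve(question), "tutor"))
print(answer(question, "xueba", echo_llm))
```

Keeping the model behind a plain callable means the two output modes differ only in the prompt, so switching models does not change the rest of the pipeline.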

The intelligent scan-and-answer AI combines information retrieval, the generative ability of large language models, and customized answer output. It can give users more comprehensive answers, not only answering the question but also providing the relevant background knowledge and explanations. This helps users learn more efficiently, deepens their understanding, and gives them a convenient tool for self-study.

5 Summary

With the development of information technology and the continuous expansion of application scenarios, people need to process and use large amounts of document information, and traditional manual processing is too inefficient to meet the needs of modern life and work. Intelligent analysis and processing of document images has therefore become an important and challenging research problem. The ideas of the participating teams show that intelligent document processing based on artificial intelligence and machine learning can automatically identify the information in documents, such as text, images, and tables, and then classify, archive, summarize, and extract it, greatly improving the speed of information retrieval, processing, and dissemination. These applications are not limited to students' needs; they extend to many other fields, such as automatic recognition of express waybills in logistics and automatic analysis and processing of bank notes in finance, and have broad prospects.

In practical intelligent document processing, Hehe Information's product Scanning Almighty King has shown powerful functions, for example:

  • Office document processing: whether in the office or working from home, users can use Scanning Almighty King to scan and process all kinds of documents, including forms, charts, and handwritten notes. Whatever the lighting conditions and however complex the background, its AI smart scanning engine optimizes the image to deliver high-definition, high-quality scans.
  • Educational material processing: teachers and students can use Scanning Almighty King to scan, share, and save teaching materials, test papers, notes, and other educational materials. With distance education becoming more and more common, it can easily convert paper materials into digital form for sharing in teaching and for storage.
  • Business document processing: in business scenarios, Scanning Almighty King can process business documents such as invoices, contracts, and orders. Its intelligent high-definition filter makes the text and graphics in documents clear enough to be identified and extracted for a range of business needs.

Scanning Almighty King integrates a variety of advanced intelligent document processing technologies, such as bending correction, anti-reflection, and moiré removal, and offers highly accurate recognition. Its multilingual recognition is not limited to a few mainstream languages but covers many languages around the world, so it can serve users worldwide and recognize and process their documents accurately whatever language they use. This also makes document processing smoother: users do not need to configure complicated settings or select a language manually, because Scanning Almighty King identifies the language of a document automatically and processes it accordingly.

In short, as an office product for efficient document processing, Scanning Almighty King greatly improves the user experience and meets the needs of office work around the world, which is why it is widely used and well regarded worldwide.

Origin blog.csdn.net/FRIGIDWINTER/article/details/132244191