Where traditional heritage meets technology: the digitization and protection of ancient Yi script

  Ancient Yi script is the traditional script of the Yi people in China, with a long history and great cultural value. However, recognizing its characters has been a challenging task because of the complex shapes of the glyphs and the absence of a standardized character set. This article introduces Hehe Information's character recognition technology as applied to ancient Yi script, which aims to improve the accuracy and efficiency of automatic recognition and to promote the research and protection of ancient Yi script.

1. Protection of ancient Yi script and ancient Yi script books

  Ancient Yi script is a pictographic writing system in which most glyphs depict the appearance or a partial feature of a physical object. Glyphs are built from horizontal, vertical, dot and folded strokes and various simple geometric shapes, which gives them a certain degree of legibility. Because of long-standing language barriers and geographical isolation in the Yi area, the script developed differently from region to region. Ancient Yi script is therefore not a unified writing system but a family of related yet distinct writing traditions.
  Ancient Yi characters have an important status and significance in Yi society. They are an important way for the Yi people to understand their own history, culture and social structure, and an important carrier for inheriting and promoting traditional Yi culture. Ancient Yi texts record many aspects of Yi life, including daily life, religious beliefs, the natural environment and historical events, and so provide precious clues for understanding Yi society.
  However, automatic recognition of ancient Yi script has been a challenging task due to its complex shape and lack of a standardized character set. With the rapid development of computer vision and machine learning technology, researchers have begun to explore ancient Yi character recognition technology.

Background on the protection of ancient Yi books

  1. Protect cultural heritage: As a unique cultural heritage of the Yi people, ancient Yi script has important historical and cultural value. Protecting ancient Yi books is part of protecting and inheriting the traditional culture of the Yi people, and helps preserve and promote the unique cultural characteristics of the Yi people.
  2. Academic research value: Ancient Yi books are important materials for understanding the history, culture and social structure of the Yi people. By studying them, we can gain an in-depth understanding of the development and evolution of Yi society, religious beliefs, daily life and other aspects. They are of great academic value to the study of Yi folk customs, history, linguistics and other subjects.
  3. Promote national pride: Protecting ancient Yi books can help the Yi people build cultural confidence and pride. Inheriting and carrying forward traditional Yi culture is of great significance to maintaining national unity and to the development of Yi society. Protecting ancient Yi books allows the Yi people to better understand and identify with their own cultural traditions, and supports the development and inheritance of Yi culture.

2. Key challenges in recognizing ancient Yi characters

  1. Lack of standardized glyph and word coding: Ancient Yi glyphs and their coding vary widely and differ by region, and there is no unified standard dictionary or coding specification. This makes ancient Yi script harder to identify and interpret: clarifying the glyphs, meanings and grammatical rules requires in-depth study and comparison of multiple ancient Yi documents.

  2. The glyphs are complex and diverse: Ancient Yi glyphs are formed from pictograms and simplified geometric lines, and they are complex and diverse. Depending on variant and context, the same glyph may represent different meanings, and glyphs with similar shapes may have different meanings. Correctly identifying and interpreting these glyphs requires careful study and discernment.
    [Images: the original picture and what it looks like after scanning]

  3. Document quality and preservation status: Ancient Yi documents mostly survive as manuscripts and ancient books. After long periods of preservation and transmission, they may be damaged or blurred, or have missing characters. This makes the documents harder to identify and interpret, and calls for document restoration and digital processing, combined with other clues for auxiliary analysis and verification.

  4. Lack of annotation and linguistic knowledge: Identifying ancient Yi script requires knowledge of the Yi language and of linguistics. The words, grammatical structures and context of ancient Yi texts need to be compared with and analyzed against the Yi language. Only with some understanding of the phonetic, semantic and other characteristics of the Yi language can the meaning of ancient Yi texts be understood accurately.

  5. Lack of professional talent and research resources: Research on and identification of ancient Yi script require specialized knowledge and skills, and specialists in this area are relatively scarce. Access to research resources and documentary materials related to ancient Yi is also limited, which adds to the difficulty of identification and research.

  Overcoming these difficulties requires several things: establishing a standardized glyph and word encoding system through in-depth research on ancient Yi script and the Yi language; strengthening the preservation and digital processing of ancient Yi documents; training more ancient Yi researchers; strengthening academic research and interdisciplinary cooperation; and providing more research resources and tools to support the identification and interpretation of ancient Yi script.

3. A breakthrough way to improve image quality and restore details

3.1 Hehe Information

  HEIC OCR technology is an advanced technology used to automatically identify and extract text content from images in HEIC format. HEIC is an efficient image compression format commonly used for photos taken on iPhone and other Apple devices.
  Optical character recognition (OCR) is a computer vision technology that uses image processing and pattern recognition algorithms to locate text areas in images and convert them into editable, searchable text data. Hehe Information's text recognition technology applies OCR to HEIC images to recognize and extract their text content accurately.

3.2 Hehe Information’s text recognition technology

  As globalization deepens, multilingual recognition has become a key requirement for intelligent document processing systems, and it brings considerable challenges. These come not only from the distinct character sets, writing rules and grammatical structures of different languages, but also from the variety of complex text forms and layouts.
  For example, Arabic is written from right to left, and the same letter takes different shapes depending on its position in the word, which traditional text recognition methods often struggle to handle. Likewise, the difference between Traditional and Simplified Chinese requires a recognizer to handle both forms. Languages such as Thai and Hindi have relatively complex writing systems in which one character may appear above or below another, adding further difficulty.
  Text recognition technology, also known as Optical Character Recognition (OCR), converts image files or scanned documents into editable and searchable text data. It can identify printed characters, analyze the content of the image, and output the recognized characters in an encoding such as ASCII or Unicode.
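  To make the idea of "outputting an encoding" concrete, here is a minimal Python sketch that prints the Unicode code point and UTF-8 bytes of a few characters. The helper name describe is an assumption made for illustration. The third sample is the first character of the Unicode Yi Syllables block, which covers the modern standardized Yi syllabary; the regional variants of ancient Yi script discussed in this article are a separate matter, in line with the lack of a standardized character set noted earlier.

```python
def describe(ch: str) -> str:
    """Return the Unicode code point and UTF-8 bytes of one character."""
    code_point = ord(ch)
    utf8_hex = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    return f"U+{code_point:04X} -> {utf8_hex}"


# A Latin letter, a Chinese character, and the first code point of the
# Unicode Yi Syllables block (modern standardized Yi, not ancient Yi).
for sample in ["A", "\u4e2d", "\ua000"]:
    print(describe(sample))
```

  An OCR system ultimately emits code points like these, which is why a script without an agreed character set is harder to digitize: there is no fixed target for the recognizer to map each glyph to.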

Text recognition technology mainly includes the following steps:

  1. Preprocessing: The image is cleaned up with operations such as noise removal, grayscale conversion, binarization and background removal, so that the text can be recognized more easily.

  2. Text segmentation: The preprocessed image is split into words or characters for the next stage of recognition.

  3. Feature extraction: Features are extracted from the segmented text, such as its shape, size and slant.

  4. Text recognition: The text content is identified from the extracted features using machine learning or deep learning.

  5. Post-processing: The recognized text is corrected and refined to improve recognition accuracy.
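  The first two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hehe Information's implementation: the page is a tiny hand-made grayscale grid, binarization uses Otsu's method (a common classical choice that the article does not name), and segmentation uses a simple column projection. The names otsu_threshold, binarize and segment_columns are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def otsu_threshold(gray: Image) -> int:
    """Pick the threshold that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue  # no pixels at or below t yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        variance = w_dark * w_light * (mean_dark - mean_light) ** 2
        if variance > best_var:
            best_var, best_t = variance, t
    return best_t


def binarize(gray: Image, threshold: int) -> Image:
    """Mark dark pixels (ink) as 1 and light pixels (paper) as 0."""
    return [[1 if v <= threshold else 0 for v in row] for row in gray]


def segment_columns(binary: Image) -> List[Tuple[int, int]]:
    """Split the page into [start, end) column spans that contain ink."""
    width = len(binary[0])
    profile = [sum(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x
        elif ink == 0 and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


# Toy page: 30 = ink, 220 = paper (assumed values for illustration).
P, I = 220, 30
page = [
    [P, I, I, P, P, I, P],
    [P, I, P, P, P, I, P],
    [P, I, I, P, P, I, P],
]
threshold = otsu_threshold(page)
print(threshold, segment_columns(binarize(page, threshold)))
```

  On this toy page the sketch separates two ink regions, one two columns wide and one a single column wide. Real manuscripts with touching strokes, stains or vertical layouts need far more robust methods, which is where the deep learning models described below come in.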

  Intelligent text recognition is one of Hehe Information's core technologies. It consists of three core modules: intelligent image processing, deep-learning-based text recognition for complex scenes, and natural language processing (NLP). Intelligent image processing accurately corrects document images affected by curved surfaces, shadows and moiré patterns, creating good conditions for subsequent text extraction and recognition. Complex-scene text recognition adapts to multi-language, multi-format, multi-style and other complex scenes to extract text, and is combined with NLP technology for semantic understanding of the recognized results.
  In the past three years, Hehe Information's intelligent text recognition technology has won 15 championships in international artificial intelligence competitions such as ICDAR and ICPR. Its academic results have been published at top conferences such as CVPR, AAAI and ACL, and related projects have won the Second Prize of the Science and Technology Progress Award of the China Society of Image and Graphics (CSIG).
  Hehe Information has also built up experience in ancient character recognition. At the World Artificial Intelligence Conference in 2021 and 2022, it demonstrated the application of intelligent text recognition to oracle bone inscriptions and Western Zhou Zhongding inscriptions (bronze inscriptions), attracting coverage from hundreds of mainstream media outlets, including CCTV, People's Daily and Xinhua News Agency.
  Although research on ancient Yi script recognition is still in its infancy, introducing advanced AI technology to build a unified database will be of great help in improving the continuity of ancient Yi script research and reducing tedious retrieval work. Research on the digitization of ancient Yi script is currently scarce, and this project aims to fill research gaps at home and abroad.
  Hehe Information's earlier research on oracle bone inscriptions and bronze inscriptions made the move to ancient Yi script a natural one: both oracle bone inscriptions and ancient Yi inscriptions trace their origins to writing on bone, from which later scripts developed, and there are similarities in recognizing oracle bone inscriptions, bronze inscriptions, small seal script, clerical script, regular script and so on. The "Guizhou Ancient Yi Script Image Recognition and Digital Proofreading Project", jointly launched with Shanghai University, has also become an important milestone for Hehe Information's intelligent text recognition technology in enabling the protection of minority languages and the inheritance of ancient culture.

  1. Efficient recognition: Compared with traditional image formats, the HEIC format compresses image files more efficiently, and text recognition technology can quickly and accurately extract text from HEIC images.

  2. Accuracy and reliability: Hehe Information's text recognition technology uses advanced OCR algorithms and trained models that can identify text in various fonts, sizes and arrangements in HEIC images and deliver high-quality recognition results.

  3. Multi-language support: Hehe Information's text recognition technology supports multiple languages, including common Latin-alphabet text, Chinese, Japanese and Korean, and can serve text recognition applications in different languages.

  4. Data extraction and application: With Hehe Information's text recognition technology, the text in a HEIC image can be converted into editable text data that users can copy, paste and edit, which also makes automated processing and text analysis more convenient.

  Hehe Information's text recognition technology has wide applications in many fields, including document processing, translation tools, image search, automated data processing, etc. It can improve work efficiency, reduce manual input and transcription errors, and provide a convenient and reliable solution for the utilization and management of digital information.

3.3 Intelligent HD filter technology

  Scanner Almighty has officially launched its "Smart HD Filter". In use, a single tap on the shutter button produces an image as clear and flat as the original printed document. Unlike with traditional scanning software, users of the "Smart HD Filter" do not need to think about shooting angle, light source or background. The function intelligently detects problems in the image, automatically decides how to optimize it, and removes interference such as blur, darkness and fingers in one pass, so that 90% of scanning problems in daily life and work can reportedly be solved with one click.

  The "Smart HD Filter" relies on the intelligent scanning engine AI-Scan. Working across three dimensions (image processing, text recognition and layout restoration) and moving from perception and cognition to decision-making, the engine uses AI to run an automatic "health check" on image quality, pinpoint problems and match them with the corresponding optimization. This makes image processing more intelligent, text recognition more accurate, and layout restoration "what you shoot is what you get".

  Scanner Almighty, an application built on deep learning, provides a powerful intelligent document processing platform. The following sections look at how it uses deep learning and AI for intelligent document processing.

[Images: Scanner Almighty Smart HD Filter processing and recognition results, showing the original image and the result after recognition]

3.3.1 Intelligent scanning engine AI-Scan and Scanner Almighty

  The intelligent scanning engine AI-Scan underpins many of Scanner Almighty's cutting-edge features. The engine has two main parts, image perception and scenario-based optimization decisions:

  1. Image perception (general image processing): In this stage, the application uses deep learning models to identify and understand the content of the image, perceiving features such as lighting, shadows, colors and tilt angle.
    For example, it can remove fingers that occlude the page, adjust brightness and contrast for images that are too dark or too bright, and automatically deskew tilted documents.

  2. Scenario-based decision-making (scenario-based image processing): Based on the results of image perception, the engine makes both general and scenario-specific judgments and decides how to optimize the document image. It takes into account the services the user is likely to need, for example choosing test-paper processing when it sees a test paper, and optimizes further if the earlier processing was not good enough.
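  The perceive-then-decide flow described above can be illustrated with a deliberately simplified Python sketch. It is not how AI-Scan works internally: the article says the engine uses deep learning models, whereas this sketch swaps in two hand-computed statistics and fixed thresholds purely to show the shape of the logic. The function names, threshold values and step names are all assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def perceive(gray: Image) -> Dict[str, float]:
    """Perception stage: measure simple properties of the image."""
    pixels = [value for row in gray for value in row]
    return {"brightness": mean(pixels), "contrast": pstdev(pixels)}


def decide(features: Dict[str, float],
           dark: float = 80, bright: float = 200,
           low_contrast: float = 30) -> List[str]:
    """Decision stage: map measurements to optimization steps.

    The thresholds are hypothetical values chosen for this example.
    """
    steps = []
    if features["brightness"] < dark:
        steps.append("brighten")
    elif features["brightness"] > bright:
        steps.append("darken")
    if features["contrast"] < low_contrast:
        steps.append("stretch_contrast")
    return steps


dark_page = [[20, 30], [25, 35]]     # underexposed, little contrast
good_page = [[30, 220], [220, 30]]   # clear ink on light paper
print(decide(perceive(dark_page)))
print(decide(perceive(good_page)))
```

  Keeping perception and decision separate means new checks (tilt, finger occlusion, document type) can be added to the first stage without rewriting the rules in the second, which matches the two-part structure described for the engine.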

3.3.2 Scenario application

  In the practical application of intelligent document processing, Scanner Almighty shows powerful functions:

  1. Office document processing: Whether in the office or working from home, users can use Scanner Almighty to scan and process many types of documents, including files, tables, charts and handwritten notes. Regardless of lighting conditions and background complexity, Scanner Almighty can optimize images through the AI smart scanning engine to deliver high-definition, high-quality scans.

  2. Educational data processing: Teachers and students can use Scanner Almighty to scan, share and save educational materials such as textbooks, test papers and notes. With distance education becoming more and more popular, Scanner Almighty can conveniently convert paper materials into digital format, which makes teaching materials easier to share and store.

  3. Business document processing: In business scenarios, Scanner Almighty can be used to process business documents such as invoices, contracts and orders. Its intelligent high-definition filter function can clearly identify and extract text and graphic information in documents to meet various business needs.

4. The significance of identifying ancient Yi characters

  The significance of identifying ancient Yi characters is mainly reflected in the following aspects:

  • Protect and inherit the cultural heritage of the Yi people: Ancient Yi script is one of the traditional writing systems of the Yi people and an important part of the Yi people's culture. Through the identification and research of ancient Yi characters, the rich and diverse cultural heritage of the Yi people can be protected and inherited, and ancient Yi documents can be excavated and preserved so that future generations can understand and inherit the cultural connotations of the Yi people's language, history, religion, traditional knowledge, etc.
  • Yi language research and language protection: Ancient Yi script is one of the important expressions of Yi language. Ancient Yi script contains the vocabulary, grammar, expressions and other information of Yi language. Through the identification and interpretation of ancient Yi characters, we can deepen the research and understanding of the Yi language, help linguists study the origin, development and evolution rules of the Yi language, and provide an important research basis for the protection and revitalization of the Yi language.
  • Document research and academic exploration: Ancient Yi documents contain rich historical, cultural, geographical and social information. The identification and research of ancient Yi documents can help reveal the changes in Yi society, traditional customs and knowledge systems. Ancient Yi documents may also involve contacts and exchanges with other ethnic groups and regions, providing important resources and clues for interdisciplinary academic research.
  • Education and cultural exchange: Through the identification and digital processing of ancient Yi characters, ancient Yi teaching materials and resources can be provided to educational institutions to help Yi students learn and inherit their own culture. In addition, research results on ancient Yi script can foster exchange and cooperation around Yi culture and promote mutual understanding between different cultures.

  In short, the significance of identifying ancient Yi characters lies in protecting and inheriting the cultural heritage of the Yi people, promoting the research and protection of the Yi language, enriching academic research, and promoting education and cultural exchange. By knowing and understanding ancient Yi script, we can better understand and inherit the unique culture of the Yi people and contribute to the coexistence and development of diverse cultures.

5. Summary

  This article introduces the text recognition technology of Hehe Information, which can extract the text content of ancient Yi documents from HEIC images, helping to protect and inherit the language and cultural heritage of ancient Yi. Through digital processing and storage, ancient Yi documents can be preserved, disseminated and studied, avoiding the loss and degradation of the originals.
  The technology provides an efficient and accurate tool for the study of ancient Yi script. The recognized texts can be used in disciplines such as linguistics, history and sociology, helping to reveal the changes in Yi society, the characteristics and evolution of the Yi language, and the connotation and extension of Yi culture.
  It also makes the teaching and popularization of ancient Yi script more convenient. Through recognition and digital processing, ancient Yi text can be converted into editable text data, which facilitates the production of teaching materials, learning materials and tools, and promotes the learning and inheritance of ancient Yi script among Yi students and communities.


Origin blog.csdn.net/m0_75058342/article/details/132909526