Intelligent text recognition technology - AI empowers the protection of ancient Yi texts

Preface

Artificial intelligence has great potential in the protection of ancient Yi books. Through digital, automated and intelligent means, the cultural heritage recorded in ancient Yi script can be better protected and passed on, and the inheritance and development of Yi culture can be promoted.

1. What is ancient Yi script?

1.1 Background of ancient Yi script

Ancient Yi script is an ancient writing system used by the Yi people, an ethnic minority in China mainly distributed in Yunnan, Sichuan, Guizhou and other places. The ancient Yi script originated around the 13th century BC. It is the product of the long-term accumulation and development of the Yi people and has a long history and unique cultural connotation.

Ancient Yi script is used across a wide range of material, including literature, mythology, poetry, ballads and genealogy. The Yi people use ancient Yi characters to record rich historical, cultural, religious and social information. Ancient Yi script is not only an important communication tool for the Yi people, but also an important medium through which they pass on their culture, spread ideas, and express emotions.

However, due to historical and social changes, the use of ancient Yi script has gradually declined. Ancient Yi script now faces the challenge of protection and inheritance. Relevant institutions and scholars are committed to researching and protecting ancient Yi script to ensure that this precious cultural heritage will not be lost.

1.2 Background on the protection of ancient Yi books

Currently, many technology companies around the world, including Google, are using digital technologies such as AI and OCR to protect ancient books. Domestic institutions such as Longquan Temple have also developed an AI technology called "Buddha Native", which uses a character recognition engine based on deep learning and has successfully digitized the Tripitaka version of the "Sixty Avatamsaka".

The emergence of these projects and technologies provides new possibilities for the protection and digitization of ancient books. Cooperation and sharing between the field of artificial intelligence and the protection of ancient Yi script should be promoted, so that more artificial intelligence experts and scholars take part in this work. Through interdisciplinary and cross-field cooperation, the role of artificial intelligence in the protection of ancient Yi texts can be fully utilized and the effectiveness of protection can be improved.

However, digital technology still faces some challenges in the protection of ancient books. The complexity of ancient books, the fragility of their paper, and the peculiarities of their scripts require continuous effort to address. At the same time, more attention must be paid to the storage, backup and security of digitized ancient books to ensure the long-term protection and inheritance of this precious cultural heritage.

2. Important and difficult points in identifying ancient Yi characters

2.1 Original ancient Yi texts are difficult to obtain

First, Yi priests (Bimo) are usually reluctant to sell ancestral books. For them, selling books is considered shameful, as these books carry the wisdom and cultural heritage of their ancestors. They prefer to pass the books on to suitable heirs rather than sell them to outside researchers.

Secondly, some Yi priests will request that their scriptures be cremated with them when they die. This means that these books may be destroyed, making it more difficult to obtain the original ancient Yi texts.

In addition, researchers of ancient Yi need to stay in the local area for a long time and establish good relations with the Yi community. It takes time and patience to gain the trust and support of local people. Only by establishing a close relationship with the inheritors of ancient Yi script is it possible to obtain their authorization and permission, and then obtain the original ancient Yi texts.

After an ancient book has been obtained, pages that are incomplete or stuck together must be carefully separated and then pasted onto larger sheets of paper for easier inspection and reading. Pieces of paper that have become brittle with age need to be rejoined.

2.2 The translation process of ancient Yi is cumbersome

The ancient Yi translation process is cumbersome for three main reasons:

  1. Protection and research of the script are difficult: ancient Yi script is an ancient writing system that has not yet been digitized, and no Unicode block has been reserved for it. During translation, Yi transcribers must copy the Yi characters by hand and match the international codes with the Yi characters.
  2. Yi native speakers must take part: if the translator's native language is not Yi, they need the help of Yi native speakers for transliteration. This collaboration takes time and effort, making translation more difficult.
  3. Translation takes several passes: first, the translator produces a literal, word-for-word rendering, converting the ancient Yi characters into Chinese characters. Then they translate the meaning of the ancient Yi text into fluent Chinese. These repeated conversions make translation more complex and time-consuming.

This four-line Yi-Chinese translation method (Yi characters, international codes, literal Chinese, and fluent Chinese) not only retains the original appearance of the ancient book, but also makes the translated content easy to understand. Although the method is difficult to digitize, it has made an important contribution to the protection of ancient books and the inheritance of Yi culture.
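To make the four-line layout concrete, the following is a minimal sketch of how one translated line might be stored and rendered. The `GlossedLine` class, its field names, and the placeholder values are assumptions made for illustration; they are not taken from any published tool or dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GlossedLine:
    """One line of text in a four-line Yi-Chinese layout (hypothetical structure)."""
    glyph_ids: List[str]   # placeholder IDs standing in for hand-copied Yi characters
    codes: List[str]       # one international code per character
    literal: List[str]     # word-for-word Chinese rendering, one item per character
    free_translation: str  # fluent Chinese rendering of the whole line

    def validate(self) -> None:
        # The first three lines must align character by character.
        n = len(self.glyph_ids)
        if len(self.codes) != n or len(self.literal) != n:
            raise ValueError("glyphs, codes and literal glosses must have equal length")

    def render(self) -> str:
        # Produce the four text lines in order: glyphs, codes, literal, free.
        self.validate()
        return "\n".join([
            " ".join(self.glyph_ids),
            " ".join(self.codes),
            " ".join(self.literal),
            self.free_translation,
        ])


# Placeholder data only; these are not real Yi characters or translations.
example = GlossedLine(
    glyph_ids=["G001", "G002"],
    codes=["code-1", "code-2"],
    literal=["字一", "字二"],
    free_translation="(fluent Chinese sentence)",
)
print(example.render())
```

Keeping the first three lines as parallel lists makes the character-by-character alignment checkable by a program, which is one of the things a manual four-line layout cannot guarantee.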

In the past, the translation of ancient books usually ended at this step. A faster translator might be able to publish a translation in just one or two years, while a slower translator might take several years. This depends on the length of the original.

If a database and translation system for ancient Yi characters can be effectively established, it will be possible to efficiently identify ancient Yi characters.

3. Hehe Information Intelligent Text Recognition Technology

Over the past ten years, Hehe Information has carried out extensive research on complex layout recognition and structured intelligent understanding of images, with intelligent text recognition technology at its core. The academic results have been published at top conferences such as CVPR, AAAI, and ACL and have achieved excellent results in applications, providing technical support for ancient Yi research.

Intelligent text recognition technology is one of the core technologies of Hehe Information. It consists of three core modules: intelligent image processing, complex scene text recognition based on deep learning, and natural language processing (NLP). Intelligent image processing corrects document images affected by curved surfaces, shadows, and moiré patterns, creating good conditions for subsequent text extraction and recognition. Complex scene text recognition extracts text in multi-language, multi-format, multi-style and other complex scenes, and is combined with NLP technology to interpret the meaning of the recognized results.
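The way the three modules hand work to one another can be sketched as a simple staged pipeline. Everything below is an illustrative assumption: the stage functions are stand-ins (a contrast stretch, a dummy line detector, and a note-taking step), not Hehe Information's actual algorithms or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Image = List[List[int]]  # grayscale pixels in the range 0-255


@dataclass
class RecognitionResult:
    image: Image
    text_lines: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


Stage = Callable[[RecognitionResult], RecognitionResult]


def run_pipeline(image: Image, stages: List[Stage]) -> RecognitionResult:
    """Pass the result through each stage in order."""
    result = RecognitionResult(image=image)
    for stage in stages:
        result = stage(result)
    return result


def enhance_image(result: RecognitionResult) -> RecognitionResult:
    # Stand-in for image processing: stretch contrast to the full 0-255 range.
    flat = [p for row in result.image for p in row]
    if not flat:
        return result
    lo, hi = min(flat), max(flat)
    if hi > lo:
        result.image = [[(p - lo) * 255 // (hi - lo) for p in row]
                        for row in result.image]
    return result


def recognize_text(result: RecognitionResult) -> RecognitionResult:
    # Stand-in for a deep-learning recognizer: rows containing dark pixels
    # are treated as text lines and given a placeholder label.
    result.text_lines = [f"<line {i}>" for i, row in enumerate(result.image)
                         if row and min(row) < 128]
    return result


def understand_text(result: RecognitionResult) -> RecognitionResult:
    # Stand-in for the NLP step: only records how much text was received.
    result.notes = [f"{len(result.text_lines)} line(s) passed to NLP"]
    return result


result = run_pipeline([[100, 150], [200, 200]],
                      [enhance_image, recognize_text, understand_text])
print(result.text_lines, result.notes)
```

The point of the sketch is the ordering: recognition runs on the cleaned image, and language understanding runs only on what recognition produced.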

As one of the oldest writing systems in the world, ancient Yi script is a mysterious and dazzling mark on the map of Chinese civilization. Hehe Information teamed up with teams from Shanghai University and South China University of Technology to carry out unified coding for the existing ancient Yi characters in Southwest Yi and Yunnan and Guizhou areas, and recently released the industry's first basic ancient Yi coding database (referred to as the "database").

It is reported that the database contains basic codes for thousands of ancient Yi characters. Through API data interfaces and other forms, the database is expected to help university researchers, cultural workers, hobbyists and others quickly look up the pronunciation, Chinese meaning, and usage of ancient Yi characters. Like a "big dictionary", it lowers the threshold for reading ancient Yi books and documents and uses digital means to help protect and renew traditional culture.
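A dictionary-style lookup of this kind can be sketched in a few lines. The example below is a minimal in-memory illustration: the `YiEntry` fields, the `AY-0001` code format, and the sample values are all hypothetical placeholders, since the real database schema and API are not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class YiEntry:
    code: str           # hypothetical unified code for one character
    pronunciation: str  # placeholder pronunciation field
    meaning_zh: str     # Chinese meaning
    usage: str          # usage note


class YiDictionary:
    """Tiny in-memory stand-in for a coded character database."""

    def __init__(self) -> None:
        self._by_code: Dict[str, YiEntry] = {}

    @staticmethod
    def _key(code: str) -> str:
        # Normalize so that lookups ignore case and surrounding spaces.
        return code.strip().upper()

    def add(self, entry: YiEntry) -> None:
        key = self._key(entry.code)
        if key in self._by_code:
            raise ValueError(f"duplicate code: {entry.code}")
        self._by_code[key] = entry

    def lookup(self, code: str) -> Optional[YiEntry]:
        return self._by_code.get(self._key(code))

    def search_meaning(self, keyword: str) -> List[YiEntry]:
        return [e for e in self._by_code.values() if keyword in e.meaning_zh]


# Placeholder entries only; these are not real database contents.
dictionary = YiDictionary()
dictionary.add(YiEntry("AY-0001", "pron-1", "示例释义一", "usage note 1"))
dictionary.add(YiEntry("AY-0002", "pron-2", "示例释义二", "usage note 2"))

entry = dictionary.lookup("ay-0001")
print(entry.meaning_zh if entry else "not found")
```

Keying every entry on a single unified code is what lets one character be found the same way regardless of which regional variant or manuscript it came from.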

Studying the collection of ancient Yi characters will help to understand ancient books that have not been translated into Chinese and whose wording has not yet been standardized, and will play a deeper and more thorough role in the protection of traditional culture. At the same time, by establishing an ancient Yi database, it will fill the current research gaps at home and abroad. Hehe Information and South China University of Technology jointly established a joint laboratory for document image analysis, recognition and understanding, and collaborated with the School of Social Sciences of Shanghai University to jointly solve academic and technical difficulties in database construction.

In addition, Hehe Information’s Scanner Almighty King also launched “Smart HD Filters”. This function is based on AI technology and an intelligent scanning engine, which can automatically detect problems in the image and determine how to optimize the image, achieving one-click processing of interference factors such as blur, shadows, fingers, and screen lines. Users do not need to think about shooting angles, light sources, and backgrounds. Just click any shooting button such as single shot, multi-shot, scan, etc. to get a clear and flat picture as clear as the original printed document.

[Figure: left, the original picture; right, the ancient Yi book after processing with the intelligent high-definition filter function.]

These efforts have opened up new paths for the research and inheritance of ancient Yi script. In the future, as the technology continues to advance, more efficient and accurate recognition and translation of ancient Yi characters should become possible, making a greater contribution to the protection and inheritance of ancient Yi script.

4. The significance of identifying ancient Yi characters

The significance of the identification of ancient Yi scripts is to protect and inherit cultural heritage, promote language and cultural research, protect and promote cultural diversity, and provide learning and educational resources. Through the application of digital technology, we can better understand and inherit Yi culture and promote the diverse development and exchange of culture.

During the World Artificial Intelligence Conference in the past two years, the oracle bone inscription recognition and Western Zhou Dynasty bell and ding inscription recognition projects displayed by Hehe Information became "popular dark horses" at the event. The technologies behind them, such as "bend correction" and "complex scene text recognition", have also been applied to products such as Scanner Almighty King to optimize image processing and improve text recognition accuracy, meeting the more diverse needs of more user groups.

For example, the "handwriting erasure" function uses intelligent text recognition technology to divide the image to be processed into "erased areas" containing handwriting and "non-erased areas" such as printed questions, and handles complex conditions such as noise, shadows, and background clutter. Combined with filter technologies such as edge correction and image enhancement, it erases handwritten notes on test papers and homework and presents users with clear, clean page images. The function is very popular among parents and students.
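The final erasing step can be illustrated with a mask over the image. This is only a sketch of the idea under simple assumptions: the image is a grid of grayscale values, and the mask marking "erased areas" is supplied by hand here, whereas in a real product it would come from a recognition model. The function name `erase_regions` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)
Mask = List[List[bool]]  # True marks a pixel inside an "erased area"


def erase_regions(image: Image, erase_mask: Mask, background: int = 255) -> Image:
    """Return a copy of the image with masked pixels replaced by the background."""
    if len(image) != len(erase_mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(image, erase_mask)
    ):
        raise ValueError("image and mask must have the same shape")
    return [
        [background if erase else pixel for pixel, erase in zip(row, mask_row)]
        for row, mask_row in zip(image, erase_mask)
    ]


# Hand-made example: the last pixel of row 0 and the first pixel of row 1
# are treated as handwriting; everything else is printed content.
page = [
    [0, 255, 90],
    [30, 0, 255],
]
handwriting = [
    [False, False, True],
    [True, False, False],
]
print(erase_regions(page, handwriting))
```

Pixels outside the mask are copied unchanged, which is what keeps the printed questions intact while the handwriting is removed.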

5. Summary

Hehe Information's early research on oracle bone inscriptions and bronze inscriptions made the recognition of ancient Yi script a natural next step.

The "Guizhou Ancient Yi Image Recognition and Digital Proofreading Project", a school-enterprise cooperation jointly launched by Hehe Information and Shanghai University, will fill current research gaps at home and abroad. It will also be an important milestone in using Hehe Information's intelligent text recognition technology to empower the protection and inheritance of the ancient culture of minority languages.

In the future, Hehe Information will also focus on natural language processing, continue to improve AI's ability to "read" ancient texts, and achieve deeper levels of understanding, so as to improve the efficiency of academic research, lower the threshold for understanding ancient texts, reach a wider range of social groups in fields such as cultural tourism and cultural creativity, and give traditional culture new vitality.

Origin blog.csdn.net/m0_63947499/article/details/133418047