Vertical large models are getting better and better: decoding "Midu Wenxiu", China's first large model for intelligent proofreading

A single flower does not make spring; only when a hundred flowers bloom together is the garden full of spring.

The rise of ChatGPT has set off a global race in large models. After an initial chaotic period, the field now shows two clear routes. One is the track of general-purpose foundation models, where giants led by cloud service providers compete. The other is vertical industry models, built on top of general-purpose foundation models by industry "veterans" who have worked in their sectors for many years.

At the WAIC 2023 World Artificial Intelligence Conference, we came across a vendor named "Midu". It not only had a dedicated booth comparable to those of the major leading vendors, but also hosted the "Language Intelligence and Content Generation Summit Forum". What gives Midu the confidence to be so high-profile?

The "secret" of Midu

According to Midu's website, the company was founded in 2009. It is a language intelligence technology company built around artificial intelligence, focusing on multimodal, multilingual intelligence technology. It provides intelligent application software for a wide range of government and enterprise office scenarios and is committed to offering full-spectrum intelligent application solutions for building digital government, digital marketing, digital media, and digital cities.

Midu uses advanced cross-modal retrieval (CMR), multilingual proofreading (MLC), computer vision (CV), natural language processing (NLP), content generation (AIGC), knowledge graph (KG), and other artificial intelligence technologies to provide enterprises and government agencies with application software such as intelligent proofreading, intelligent generation, and intelligent retrieval, helping them transform and upgrade their office scenarios digitally and intelligently.

To date, Midu has served more than 20,000 government customers and more than 10,000 well-known and large enterprises.

Midu Wenxiu: China's first large model in the field of intelligent proofreading

Intelligent proofreading, a field that sounds relatively niche, is one of Midu's core areas, and Midu aims to take it as far as it can go. At this WAIC, Midu released "Midu Wenxiu", China's first large model in the field of intelligent proofreading.

The name "Midu Wenxiu" draws on the fact that Ouyang Xiu once served as a collator in the imperial library, and on the idea that proofreading also means revising text. Midu Wenxiu is built on a large language model (LLM) and is trained on high-quality data to learn a variety of specialized sub-tasks, which substantially improves its Chinese proofreading and polishing capabilities. On the public test corpora evaluated so far, Midu Wenxiu has achieved state-of-the-art (SOTA) results across the board, meaning its performance on the specified tasks is currently the best in the industry. It not only helps professional users improve proofreading quality, speed up proofreading, and reduce error rates, but also brings a revolutionary change in working practices and efficiency to professional fields such as news publishing, media manuscripts, and government documents, providing intelligent support for high-quality language and text work in the new era.

"On Chinese spelling correction and grammatical error correction tasks, Midu Wenxiu performs significantly better than the general-purpose large model ChatGPT, with improvements of roughly 20% to 30%," said Liu Yidong, CTO of Midu.

Innovation does not happen overnight

It is understood that Midu Wenxiu was three years in the making, a case of "spending three years sharpening a single sword".

As early as 2020, Midu launched an intelligent text-checking service, adopting deep learning models as its technical strategy in an effort to solve basic proofreading problems such as typos automatically.

In 2021, Midu established a multilingual proofreading and testing laboratory and gradually built out its proofreading capability system, focusing on three core capabilities: "text and punctuation errors", "knowledge errors", and "content-orientation risk identification".

At WAIC 2022, Midu released the Midu Proofreading Pass AI-Box, which supports on-premises deployment. It was also the first NLP-based intelligent text proofreading solution to pass Huawei Ascend AI ecosystem certification.

In 2023, Midu's work in intelligent proofreading has accelerated significantly. At the beginning of the year, the Midu Smart Proofreading System was selected as a "Science and Technology Innovation Achievement" in the National Press and Publication Administration's 2022 Publishing Industry Science and Technology and Demonstration Innovation Project. Midu's proofreading also added support for a total of 12 minority languages: Mongolian, Tibetan, Uyghur, Korean, Zhuang, Kazakh, Dai, Uzbek, Kirgiz, Russian, Yi, and Lisu. In June, Midu officially launched a Chinese polishing service, focused on wording and expression problems such as improper word choice and mixed sentence structures.

Then, at this WAIC, Midu launched the all-new Midu Wenxiu, bringing the new working paradigm of the large-model era to proofreading. Besides setting new best results on various types of proofreading tasks, it also strengthens an area that was weak in the past: distinguishing the subtle semantic differences between easily confused words. At the same time, while respecting the original meaning, it does a better job of correcting problems such as mixed sentence structures and muddled logic, making sentences read more fluently and thereby polishing them. The release of Midu Wenxiu can be seen as Midu actively applying cutting-edge large-model technology to vertical office scenarios.

According to Zhang Xiaojuan, general manager of Midu's Intelligent Proofreading Division, Midu Wenxiu's innovations are concentrated in two areas.

First, it introduces a multi-task learning strategy to improve proofreading ability: several sub-tasks closely related to proofreading are designed for the model to learn in a self-supervised way, so that practicing these related tasks makes its proofreading more intelligent.

Second, the quality of the training data has been greatly improved. Automated methods are used to assess the quality of large-scale data, addressing the fact that proofreading tasks are especially sensitive to noisy data. In addition, Midu Wenxiu achieves full coverage of the general standard Chinese characters and has a more complete professional vocabulary, so more Chinese characters can be fed into the model for learning.

Looking ahead, innovation will not stop

While serving institutional customers in news publishing, media, and government, Midu found that users have strong demand for consistency checking, domain-knowledge proofreading, and layout proofreading, needs that current technology still cannot meet with high quality. In response, the Midu algorithm team has been stepping up research and development, hoping to use large language models to better meet the proofreading needs of users at different levels and to further advance proofreading capabilities.

Meanwhile, to serve government agencies that must proofread within internal network environments, Midu Wenxiu plans two things. First, model compression, to reduce hardware costs while keeping the loss in quality limited. Second, a plug-in local learning service that enables incremental learning on non-public data within a computing environment the user trusts, improving proofreading results.

As large models enter more and more vertical industries and niche scenarios, their effect on productivity across society is becoming increasingly evident. Vertical-domain large models, represented by Midu Wenxiu, have a promising future.


Source: blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/131665348