NLP: Chinese Academy of Sciences NLP Corpus

        The Chinese Academy of Sciences NLP Corpus is a large-scale Chinese natural language processing corpus developed by the Natural Language Processing and Social Humanities Computing Laboratory of the Chinese Academy of Sciences (CASIA-NLP). It covers many types of text, including news, forum posts, microblogs, encyclopedia entries, and novels. The news text is a subset of the Chinese News Corpus (CNC) and is the most commonly used part of the corpus.

        The corpus contains more than 1 billion words of Chinese text and can be used for a variety of natural language processing tasks, such as word segmentation, part-of-speech tagging, named entity recognition, and syntactic analysis. It is also provided in several data formats and supports custom text queries and statistical analysis.
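
        As a rough illustration of the kind of query and statistical analysis mentioned above, the sketch below counts token frequencies and filters lines by keyword. It assumes the text has already been word-segmented, with tokens separated by spaces. The sample sentences and the function names token_frequencies and lines_containing are made up for this example; they are not part of the corpus or of any tool shipped with it.

```python
from collections import Counter


def token_frequencies(lines):
    """Count tokens in pre-segmented text (tokens separated by whitespace)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def lines_containing(lines, keyword):
    """Return the lines whose token list contains the keyword (a simple query)."""
    return [line for line in lines if keyword in line.split()]


# Hypothetical, hand-made sample; not taken from the actual corpus.
sample = [
    "我 喜欢 自然 语言 处理",
    "自然 语言 处理 很 有趣",
]

freq = token_frequencies(sample)
print(freq.most_common(3))              # the three most frequent tokens
print(lines_containing(sample, "有趣"))  # lines that contain the token 有趣
```

        Real corpus files would first need to be read in their actual format, and raw text would need a word segmenter before counting.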

        The corpus is an important resource for research on and application of Chinese natural language processing, and it has been widely used in academic research, commercial applications, and other fields.

Origin blog.csdn.net/SYC20110120/article/details/132722058