What is Natural Language Processing Technology

Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interaction between computers and human (natural) language. It studies the theories and methods that enable effective communication between humans and computers in natural language, drawing on linguistics, computer science, and mathematics. Because it deals with natural language, the language people use every day, it is closely related to linguistics, but with an important difference: natural language processing is not the general study of natural language but the development of computer systems, especially software systems, that can communicate effectively in natural language. Hence it is part of computer science.

Natural language processing technology is a general term for all technologies related to computer processing of natural language. Its purpose is to enable computers to understand and accept instructions that humans enter in natural language, and to translate from one language to another. Research on natural language processing technology enriches the field of computer knowledge processing and promotes the development of artificial intelligence technology.

The Dakuai NLP module is a component of the Dakuai big data integration platform. Users can call this component to perform natural language processing tasks such as article summarization and semantic discrimination, and to improve the accuracy and effectiveness of content retrieval.

Natural language processing is now studied as a core subject not only of artificial intelligence but also of a new generation of computers. In the knowledge industry, expert systems, databases, knowledge bases, computer-aided design (CAD) systems, computer-aided instruction (CAI) systems, computer-aided decision-making systems, office automation management systems, intelligent robots, and so on all require natural language processing. Natural language understanding systems with text comprehension ability can be used for automatic machine translation, information retrieval, automatic indexing, automatic summarization, automatic writing of stories and novels, and similar tasks, all of which can be handled with our tool class DKNLPBase.

Standard word segmentation

Method signature: List<Term> StandardTokenizer.segment(String txt);

Returns: a list of segmented terms.

Signature parameter description: txt: the statement to be segmented.

Example: the following example verifies that the term at index 5 of a sentence is AlphaGo (阿尔法狗).

    public void testSegment() throws Exception
    {
        // Input: "goods and services"
        String text = "商品和服务";
        List<Term> termList = DKNLPBase.segment(text);
        assertEquals("商品", termList.get(0).word);
        assertEquals("和", termList.get(1).word);
        assertEquals("服务", termList.get(2).word);

        // Input: Ke Jie commentates on the ending of "Lee Sedol VS AlphaGo, game two"
        text = "柯洁解说“李世石VS阿尔法狗第二局”结局";
        termList = DKNLPBase.segment(text);
        assertEquals("阿尔法狗", termList.get(5).word); // recognizes "阿尔法狗" (AlphaGo) as a single term
    }
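To make the idea of segmentation concrete, here is a minimal, self-contained sketch of forward maximum matching, a classic dictionary-based technique. It is an illustration only: the class `MaxMatchSegmenter`, its `segment` method, and the tiny dictionary are hypothetical, and nothing here is claimed to be how `DKNLPBase.segment` works internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy forward-maximum-matching segmenter (hypothetical; not the DKNLPBase implementation).
class MaxMatchSegmenter {

    // At each position, take the longest dictionary entry that starts there.
    // Characters not covered by the dictionary become single-character tokens.
    static List<String> segment(String text, Set<String> dictionary) {
        int maxLen = 1;
        for (String word : dictionary) {
            maxLen = Math.max(maxLen, word.length());
        }
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            // Shrink the candidate until it is a dictionary word or one character long.
            while (end > i + 1 && !dictionary.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("商品", "服务", "和");
        System.out.println(segment("商品和服务", dictionary)); // [商品, 和, 服务]
    }
}
```

A pure dictionary approach like this cannot recognize words missing from its dictionary, which is why recognizing a new name such as 阿尔法狗 is worth testing for in a real segmenter.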

 

Keyword extraction

Method signature: List<String> extractKeyword(String txt, int keySum);

Returns: a list of keywords.

Signature parameter description: txt: the statement to extract keywords from; keySum: the number of keywords to extract.

Example: given a paragraph, extract one keyword, which is expected to be 程序员 ("programmer").

 

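    public void testExtractKeyword() throws Exception
    {
        // Input (in English): "A programmer is a professional engaged in program
        // development and maintenance. Programmers are generally divided into program
        // designers and program coders, but the boundary between the two is not very
        // clear, especially in China. Software practitioners fall into four categories:
        // junior programmers, senior programmers, system analysts, and project managers."
        String content = "程序员(英文Programmer)是从事程序开发、维护的专业人员。" +
                "一般将程序员分为程序设计人员和程序编码人员," +
                "但两者的界限并不非常清楚,特别是在中国。" +
                "软件从业人员分为初级程序员、高级程序员、系统" +
                "分析员和项目经理四大类。";
        List<String> keyword = DKNLPBase.extractKeyword(content, 1);
        assertEquals(1, keyword.size());
        assertEquals("程序员", keyword.get(0)); // "programmer"
    }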
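The ranking method behind `extractKeyword` is not described here, so the following is only a rough sketch of the general idea, using plain term frequency over already-segmented tokens. The class `FrequencyKeywordExtractor`, its method, and the stop-word set are assumptions made for illustration and are not part of DKNLPBase.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy frequency-based keyword extractor (hypothetical; not the DKNLPBase implementation).
class FrequencyKeywordExtractor {

    // Returns up to keySum tokens ordered by how often they occur, ignoring stop words.
    static List<String> extractKeyword(List<String> tokens, Set<String> stopWords, int keySum) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Highest count first; ties are broken by string order so the result is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        int limit = Math.min(Math.max(keySum, 0), ranked.size());
        return new ArrayList<>(ranked.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("程序员", "是", "从事", "程序", "开发", "的", "专业",
                "人员", "程序员", "分为", "初级", "程序员");
        System.out.println(extractKeyword(tokens, Set.of("是", "的"), 1)); // [程序员]
    }
}
```

Raw frequency tends to favor common words, so production systems usually add weighting or graph-based ranking on top of a step like this.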

 

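Phrase extraction

Method signature: List<String> extractPhrase(String txt, int phSum);

Returns: a list of phrases.

Signature parameter description: txt: the statement to extract phrases from; phSum: the number of phrases to extract.

Example: given a passage of text, extract the five phrases that best represent it; the first phrase is 算法工程师 ("algorithm engineer").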
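As a rough illustration of what phrase extraction involves, the sketch below counts adjacent token pairs (bigrams) in already-segmented text and returns the most frequent ones. This is a deliberately simple stand-in: the class `BigramPhraseExtractor`, its parameters, and the sample tokens are hypothetical, and the actual scoring used by `extractPhrase` is not described in this article.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy bigram-frequency phrase extractor (hypothetical; not the DKNLPBase implementation).
class BigramPhraseExtractor {

    // Joins each pair of neighboring tokens with the separator and returns
    // up to phSum of the most frequent pairs.
    static List<String> extractPhrase(List<String> tokens, String separator, int phSum) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String phrase = tokens.get(i) + separator + tokens.get(i + 1);
            counts.merge(phrase, 1, Integer::sum);
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Highest count first; ties are broken by string order so the result is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        int limit = Math.min(Math.max(phSum, 0), ranked.size());
        return new ArrayList<>(ranked.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("算法", "工程师", "设计", "算法", "工程师");
        // Chinese phrases are joined without a separator.
        System.out.println(extractPhrase(tokens, "", 1)); // [算法工程师]
    }
}
```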

 

 

 

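Entering the twenty-first century, we have arrived in an era of massive information whose main hallmark is the Internet, and most of this information is expressed in natural language. On the one hand, this mass of information gives computers more "material" for learning human language; on the other hand, it gives natural language processing a much broader stage for application. For example, as an important application of natural language processing, search engines have gradually become an important tool for obtaining information, giving rise to search giants such as Baidu and Google. Machine translation has also moved from the laboratory into ordinary households, with companies such as Google and Baidu offering machine translation and translation-assistance tools based on massive web data. Chinese input methods built on natural language processing (such as those from Sogou, Microsoft, and Google) have become essential tools for computer users, and computers and mobile phones with speech recognition are becoming widespread, helping users work and study more effectively. In short, with the spread of the Internet and the emergence of massive information, natural language processing is playing an increasingly important role in people's daily lives.

However, we also face a stark fact: how to make effective use of massive information has become a global bottleneck constraining the development of information technology. Natural language processing has inevitably become a new strategic high ground in the medium- and long-term development of information science and technology. At the same time, people have gradually realized that statistical methods alone can no longer learn linguistic knowledge from massive data quickly and effectively. Only by fully exploiting the respective strengths of rule-based rationalist methods and statistics-based empiricist methods, with each complementing the other, can natural language processing be done better and faster.

As an emerging discipline less than a century old, natural language processing is developing by leaps and bounds. Its history has not been smooth sailing; it has had both troughs and peaks. Today we face new challenges and opportunities. For example, web search engines still largely rely on keyword matching and lack deep natural language processing and understanding. Speech recognition, character recognition, question answering, machine translation, and similar technologies have so far reached only a very basic level. The road ahead is long, but as a highly interdisciplinary emerging discipline, natural language processing, whether in exploring the essence of nature or in practical application, is bound to bring welcome surprises and exceptionally rapid development in the future.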
