Quickly understand and master Chinese natural language processing

What is NLP

    In computing, NLP (Natural Language Processing) is the study of how to make computers understand human language. This covers both understanding the meaning of natural language text and expressing a given intent or idea in natural language text. The technology is often regarded as one of the highest goals of artificial intelligence: on this view, a machine can be considered truly intelligent only once it can understand natural language. Chinese is particularly complex to process because its words are composed of a large and varied set of Chinese characters. Because the field deals with natural language, the language people use every day, it is closely related to linguistics, but with an important difference: NLP is not the general study of natural language but the development of computer systems, especially software systems, that can communicate effectively in natural language. It is therefore part of computer science, sitting at the intersection of computer science, artificial intelligence, and linguistics, and is concerned with the interaction between computers and human (natural) language.

    Dakuai has focused on natural language processing for many years, and its NLP technology and services have developed rapidly. Tasks such as automatic translation, intelligence retrieval, automatic indexing, automatic summarization, and automatic writing of stories and novels can be handled with our tool class DKNLPBase. NLP is no longer merely a concept; it has gradually been applied successfully across Dakuai's various fields.

 

Why you need NLP

    For example, in daily life we often come across rare characters that we do not know how to read. We usually turn to a search engine and type a description, such as asking how to read the character made up of four copies of a simpler character. The search results show the character that matches the description, with its pinyin and annotations beside it, rather than only literal matches for the words in the query.

 

This is NLP at work. With this technology, people do not have to spend great effort learning difficult computer languages. They can use the computer in the language they are most accustomed to, and the computer works out the meaning behind their words.

What can NLP be used for?

The Dakuai NLP module is a component of the Dakuai big data integration platform. By referencing this component, users can perform natural language processing tasks such as article summarization and semantic discrimination, and improve the accuracy and effectiveness of content retrieval.
Natural language processing is now studied not only as a core subject of artificial intelligence but also as a core subject of the new generation of computers. From the perspective of the knowledge industry, expert systems, databases, knowledge bases, computer-aided design (CAD) systems, computer-aided instruction (CAI) systems, computer-aided decision-making systems, office automation management systems, intelligent robots, and so on all require natural language processing. A natural language understanding system that can understand text can be used for automatic machine translation, intelligence retrieval, automatic indexing, automatic summarization, automatic writing of stories and novels, and similar tasks, all of which can be handled with our tool class DKNLPBase.
Standard Tokenizer
Method signature: List<Term> StandardTokenizer.segment(String txt);
Returns: a list of terms (the segmented words).
Parameters: txt: the sentence to segment.
Example: The following example checks that the term at index 5 of the second sentence is AlphaGo.
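// Assumes java.util.List, JUnit's assertEquals, and the library's Term and DKNLPBase classes are imported.
public void testSegment() throws Exception
{
    String text = "Goods and Services";
    List<Term> termList = DKNLPBase.segment(text);
    // Each term exposes its surface form through the public field "word".
    assertEquals("Goods", termList.get(0).word);
    assertEquals("and", termList.get(1).word);
    assertEquals("Services", termList.get(2).word);

    // Inner quotation marks must be escaped inside a Java string literal.
    text = "Ke Jie explains the ending of \"Lee Sedol VS AlphaGo in the second round\"";
    termList = DKNLPBase.segment(text);
    assertEquals("AlphaGo", termList.get(5).word); // Can identify "AlphaGo"
}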
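To give a feel for what a tokenizer has to do with text that has no spaces between words, here is a minimal sketch of dictionary-based forward maximum matching. It is not DKNLPBase's algorithm, which this article does not describe. The class name MaxMatchSegmenter, the tiny dictionary, and the space-free English input (standing in for unsegmented Chinese text) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a toy forward-maximum-matching segmenter, NOT DKNLPBase's algorithm.
class MaxMatchSegmenter {

    // At each position, take the longest dictionary word that starts there;
    // fall back to a single character when no dictionary word matches.
    static List<String> segment(String text, Set<String> dictionary) {
        int maxWordLength = 1;
        for (String word : dictionary) {
            maxWordLength = Math.max(maxWordLength, word.length());
        }
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is a dictionary word or one character long.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary; a real one holds many thousands of words.
        Set<String> dictionary = new HashSet<>(Arrays.asList("goods", "and", "services"));
        System.out.println(segment("goodsandservices", dictionary)); // prints [goods, and, services]
    }
}
```

Greedy matching like this is simple but can pick the wrong split when dictionary words overlap, which is one reason production tokenizers rely on statistical models instead.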
Keyword extraction
Method signature: List<String> extractKeyword(String txt, int keySum);
Returns: a list of keywords.
Parameters: txt: the sentence to extract keywords from; keySum: the number of keywords to extract.
Example: Given a passage, extract one keyword, which is "Programmer".
public void testExtractKeyword() throws Exception
{
    // Each fragment ends with a space so the concatenated sentences do not run together.
    String content = "Programmers (English Programmers) are professionals engaged in program development and maintenance. " +
            "Generally, programmers are divided into program designers and program coders. " +
            "But the boundary between the two is not very clear, especially in China. " +
            "Software practitioners are divided into four categories: junior programmers, senior programmers, system " +
            "analysts and project managers.";
    // Ask for exactly one keyword.
    List<String> keyword = DKNLPBase.extractKeyword(content, 1);
    assertEquals(1, keyword.size());
    assertEquals("Programmer", keyword.get(0));
}
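As a rough illustration of how a keyword extractor can rank words, the sketch below simply counts how often each word occurs and returns the most frequent ones. It is not how DKNLPBase works internally, which this article does not specify. The class FrequencyKeywords, the extra stopWords parameter, and the crude English tokenization are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative only: ranks words by raw frequency. This is NOT DKNLPBase's algorithm.
class FrequencyKeywords {

    // Hypothetical helper shaped like extractKeyword(txt, keySum), with an extra
    // stop-word set so that function words such as "and" are ignored.
    static List<String> extractKeyword(String txt, int keySum, Set<String> stopWords) {
        Map<String, Integer> counts = new HashMap<>();
        // Crude tokenization for English text: lower-case, then split on non-letters.
        for (String token : txt.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<String> words = new ArrayList<>(counts.keySet());
        // Most frequent first; ties are broken alphabetically so the result is deterministic.
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        int limit = Math.max(0, Math.min(keySum, words.size()));
        return new ArrayList<>(words.subList(0, limit));
    }

    public static void main(String[] args) {
        String content = "Programmers are professionals engaged in program development and maintenance. "
                + "Software practitioners are divided into junior programmers, senior programmers, "
                + "system analysts and project managers.";
        Set<String> stopWords = new HashSet<>(Arrays.asList("are", "in", "and", "into"));
        System.out.println(extractKeyword(content, 1, stopWords)); // prints [programmers]
    }
}
```

Raw frequency favors common words, so practical extractors usually weight terms with something like TF-IDF or a graph-based ranking instead.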
Phrase extraction
Method signature: List<String> extractPhrase(String txt, int phSum);
Returns: a list of phrases.
Parameters: txt: the text to extract phrases from; phSum: the number of phrases to extract.
Example: Given a piece of text, extract five phrases that represent the article; the first phrase is "algorithm engineer".
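The original example code for this method is not included here. As a stand-in, the following minimal sketch shows one simple way to pick out phrases: count adjacent word pairs within each sentence and return the most frequent pairs. This is an assumption for illustration only and not DKNLPBase's actual method. The class BigramPhrases and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: ranks two-word phrases by frequency. This is NOT DKNLPBase's algorithm.
class BigramPhrases {

    // Hypothetical helper shaped like extractPhrase(txt, phSum).
    static List<String> extractPhrase(String txt, int phSum) {
        Map<String, Integer> counts = new HashMap<>();
        // Split into sentences first so that a phrase never spans a sentence boundary.
        for (String sentence : txt.toLowerCase(Locale.ROOT).split("[.!?]+")) {
            String[] words = sentence.trim().split("[^a-z]+");
            for (int i = 0; i + 1 < words.length; i++) {
                // split() can yield a leading empty string; skip pairs that contain it.
                if (words[i].isEmpty() || words[i + 1].isEmpty()) {
                    continue;
                }
                counts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
            }
        }
        List<String> phrases = new ArrayList<>(counts.keySet());
        // Most frequent first; ties are broken alphabetically so the result is deterministic.
        phrases.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        int limit = Math.max(0, Math.min(phSum, phrases.size()));
        return new ArrayList<>(phrases.subList(0, limit));
    }

    public static void main(String[] args) {
        String text = "An algorithm engineer designs ranking models. "
                + "Every algorithm engineer tunes ranking models. "
                + "The algorithm engineer ships code.";
        System.out.println(extractPhrase(text, 2)); // prints [algorithm engineer, ranking models]
    }
}
```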

 

    NLP has made good progress in recent years, but many problems remain to be solved, and Dakuai is actively working on them. It is precisely such challenging problems that attract more talented people to devote themselves to the field and push it forward.

