NLP task summary

Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai


One: lexical analysis

  • Word segmentation (Word Segmentation / Tokenization, WS): when processing text, the text is first split into words, usually with the help of a commonly used lexicon.
  • New word discovery (New Words Identification, NWI): this is easy to understand, because new words constantly emerge on the internet, such as the once-popular slang 'God horse'.
  • Morphological analysis (Morphological Analysis, MA): analyzes the morphology of words, including stems (Stems), roots (Roots), affixes (Prefixes and Suffixes), etc.
  • Part-of-speech tagging (Part-of-speech Tagging, POS): determines the part of speech of each word in the text. Parts of speech include verbs (Verb), nouns (Noun), pronouns (Pronoun), and the like.
  • Spelling correction (Spelling Correction, SP): as the name suggests, finds misspelled words and corrects them.
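
As an illustration of lexicon-based word segmentation, here is a minimal sketch of forward maximum matching, one classic approach among several. The function name `forward_max_match` and the toy `LEXICON` are assumptions made up for this example; real segmenters use much larger lexicons and usually statistical or neural models.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Segment text by greedily taking the longest lexicon entry at each position."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon, for illustration only.
LEXICON = {"自然", "语言", "自然语言", "处理", "自然语言处理", "很", "有趣"}

print(forward_max_match("自然语言处理很有趣", LEXICON))
# prints ['自然语言处理', '很', '有趣']
```

Because the match is greedy, the result depends entirely on the lexicon: with only the shorter entries available, the same sentence would be split into smaller words.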

Two: syntactic analysis

  • Language modeling (Language Modeling, LM): language models are applied very widely, and the author's NLP study notes on language models give a detailed introduction. Many current models are built on top of LMs.
  • Chunking (Chunking): marks the phrase chunks in a sentence, e.g. noun phrases (NP), verb phrases (VP), etc.
  • Supertagging (Super Tagging): assigns a supertag to each word in a sentence; a supertag is a tree structure in the syntax tree associated with that word.
  • Constituency parsing (Constituency Parsing, CP): analyzes the constituents of a sentence and produces a syntax tree made up of terminal and non-terminal symbols.
  • Dependency parsing (Dependency Parsing, DP): analyzes the dependency relations between the words of a sentence and produces a dependency tree made up of word-to-word dependencies.
  • Language identification (Language Identification): given a piece of text, determines which language it is written in.
  • Sentence boundary detection (Sentence Boundary Detection): adds sentence boundaries to text that has no explicit boundaries.
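
To make the language model entry more concrete, the following is a minimal sketch of a bigram model with add-one smoothing. It is only one simple way to build an LM; the function names and the tiny `CORPUS` are hypothetical and chosen for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def sentence_prob(tokens, unigrams, bigrams):
    """Probability of a whole sentence as a product of bigram probabilities."""
    padded = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for prev, word in zip(padded, padded[1:]):
        prob *= bigram_prob(prev, word, unigrams, bigrams)
    return prob


# Hypothetical toy corpus, already tokenized.
CORPUS = [
    ["i", "like", "nlp"],
    ["i", "like", "parsing"],
    ["you", "like", "nlp"],
]

unigrams, bigrams = train_bigram(CORPUS)
print(sentence_prob(["i", "like", "nlp"], unigrams, bigrams))
print(sentence_prob(["nlp", "like", "i"], unigrams, bigrams))
```

The word order seen in the corpus should receive a higher probability than the scrambled one, which is the basic property downstream tasks rely on.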

Three: semantic analysis

  • Word / sentence / paragraph vectorization (Word / Sentence / Paragraph Vector): this refers to word2vec, sentence2vec, paragraph2vec, and even doc2vec.
  • Word sense disambiguation (Word Sense Disambiguation, WSD): for an ambiguous word, determines its exact meaning in context.
  • Semantic role labeling (Semantic Role Labeling): labels the semantic roles in a sentence; semantic roles include agent, patient, effect, and so on.
  • Abstract meaning representation parsing (Abstract Meaning Representation Parsing): AMR is an abstract semantic representation, and an AMR parser parses a sentence into its AMR structure.
  • First-order predicate calculus (First Order Predicate Calculus)
  • Frame semantic parsing (Frame Semantic Parsing)
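
The idea behind word vectors can be sketched with plain co-occurrence counts and cosine similarity. Note that this is a count-based stand-in and not word2vec itself, which learns dense vectors with a neural objective; the function names and the toy `CORPUS` are assumptions for this example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus.
CORPUS = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
    "stocks fell on the market".split(),
]

vectors = cooccurrence_vectors(CORPUS)
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["stocks"]))
```

Words that appear in similar contexts ("cat" and "dog") end up with more similar vectors than unrelated ones ("cat" and "stocks").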

Four: information extraction

  • Named entity recognition (Named Entity Recognition, NER): identifies named entities in text. Entities generally include person names (PER), place names (LOC), organization names (ORG), times, dates, currency amounts, percentages, and the like. There are also more specialized domain entities. The article at https://arxiv.org/abs/1812.09449 reviews current deep learning methods for NER.
  • Relation extraction (Relationship Extraction): determines the type of relation between two entities in the text.
  • Terminology extraction (Terminology / Glossary Extraction): identifies terms in the text that meet given requirements.
  • Event extraction (Event Extraction): extracts structured events from unstructured text.
  • Entity disambiguation (Entity Disambiguation, ED): also known as sense disambiguation, it resolves the ambiguity that arises when different entities share the same name. In real language contexts, one entity name often corresponds to several named entity objects.
  • Entity alignment (Entity Alignment, EA): also known as entity matching (Entity Matching), it means that for each entity in knowledge bases built from heterogeneous data sources, the entities that refer to the same real-world object are found.
  • Coreference resolution (Coreference Resolution): determines which different descriptions refer to the same entity, including pronoun resolution and noun resolution.
  • Sentiment analysis (Sentiment Analysis): identifies the subjective sentiment expressed in a text. For example, 'I really like this movie' is a positive evaluation, while 'I hate this film' is a negative one.
  • Intent detection (Intent Detection): an important module of a dialogue system that analyzes a given user utterance and identifies the user's intent.
  • Slot filling (Slot Filling): an important module of a dialogue system that extracts from the conversation the useful information related to the user's intent.
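
The two sentiment examples above can be reproduced with a very small lexicon-based classifier. This is a rough sketch under stated assumptions: the word lists and the function name `classify_sentiment` are invented for illustration, and practical systems typically use trained models instead of hand-written lists.

```python
import re

# Hypothetical, hand-picked word lists for illustration only.
POSITIVE = {"like", "love", "great", "good", "enjoy"}
NEGATIVE = {"hate", "bad", "awful", "boring", "dislike"}
NEGATIONS = {"not", "never", "no"}


def classify_sentiment(text):
    """Return 'positive', 'negative', or 'neutral' based on lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS or tok.endswith("n't"):
            # A negation flips the polarity of the next sentiment word.
            negate = True
            continue
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentiment("I really like this movie"))  # prints positive
print(classify_sentiment("I hate this film"))          # prints negative
```

A lexicon approach like this is easy to inspect but misses sarcasm, domain-specific vocabulary, and longer-range negation.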

Five: high-level tasks

  • Machine translation (Machine Translation, MT): conversion between two languages. Many deep learning models, such as sequence-to-sequence models, Transformer, and BERT, have been applied to machine translation.
  • Text summarization (Text Summarization / Simplification): extracts the gist of a long text.
  • Question answering system (Question-Answering System, QAS): for a question posed by the user, the system gives an appropriate answer.
  • Dialogue system (Dialogue System, DS): lets users interact through conversation; the system captures the user's intent from the dialogue, then analyzes and acts on it.
  • Reading comprehension (Reading Comprehension, RC): after the machine has read an article, it is given questions related to the article and must answer them.
  • Automatic essay grading (Automatic Essay Grading, AEG): given an essay, scores or grades its quality.
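
For text summarization, a minimal extractive sketch is to score each sentence by the average frequency of its content words and keep the top ones. This is only one simple baseline, not how neural summarizers work; the `STOPWORDS` list, the function name `summarize`, and the sample `TEXT` are assumptions for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "was", "in", "on", "between"}


def content_words(sentence):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=1):
    """Keep the sentences whose content words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        toks = content_words(sentence)
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Restore the original order of the selected sentences.
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


TEXT = (
    "Machine translation converts text between languages. "
    "Neural models improved machine translation quality. "
    "The weather was nice yesterday."
)

print(summarize(TEXT, num_sentences=1))
# prints Machine translation converts text between languages.
```

The off-topic sentence about the weather scores lowest because none of its words recur elsewhere in the text.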

Source: https://www.jianshu.com/p/d80b065bdcf0

Origin blog.csdn.net/CoderPai/article/details/105050924