Where does the difficulty in natural language understanding lie?

Original: Where does the difficulty in natural language understanding lie? (Zhihu)

1. Essence and key

The essence of natural language understanding is structure prediction; the key is the semantic representation ability of language units.

1.1 The essence of natural language understanding is structure prediction

        Natural language text is typically unstructured data: a sequence of linguistic symbols (such as Chinese characters). To understand what natural language expresses, a system must predict the semantic structure behind the unstructured text. Accordingly, many natural language understanding tasks, including but not limited to Chinese word segmentation, part-of-speech tagging, named entity recognition, coreference resolution, syntactic parsing, and semantic role labeling, all predict some specific semantic structure behind a text sequence. For example, Chinese word segmentation inserts spaces or other markers into sentences that are written without separators, marking the boundary of each word; this is equivalent to attaching structural semantic information to the text sequence.
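For concreteness, here is a minimal sketch of Chinese word segmentation using the open-source jieba library; the sentence and its segmentation are purely illustrative, and any segmenter with a similar interface would serve:

```python
# A minimal sketch: recovering word boundaries from an unsegmented
# Chinese sentence with jieba. The sentence is illustrative only.
import jieba

sentence = "自然语言理解的本质是结构预测"  # "The essence of NLU is structure prediction"

# lcut returns the segmented words as a list; joining with "/" makes
# the predicted word boundaries visible.
words = jieba.lcut(sentence)
print("/".join(words))
```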

1.2 The key to natural language understanding is semantic representation

        To achieve a complete understanding of text, a more complete semantic structure representation space must be established, and this richer semantic representation in turn becomes the basis for the structure prediction performed by the NLP tasks above.

Feature engineering: constructing features amounts to constructing a semantic representation space. Only when the representational capacity of this space is good enough, close enough to human understanding, can a model faithfully represent and interpret the meaning that humans express through language.

  • In the era of statistical learning, symbol-based representation schemes were generally adopted, treating each word as an independent symbol. For example, the Bag-of-Words (BOW) model is the most common text representation scheme; it ignores the order of words in the text and is widely used in text classification, information retrieval, and other tasks. The N-gram language model is also based on symbolic representation; compared with BOW, it accounts for the order in which words appear in a sentence, and it has been widely used in tasks such as machine translation, text generation, and information retrieval. (A sketch of both follows the next paragraph.)

Disadvantages: symbolic representations are too coarse. They ignore the internal semantics of words and (in BOW) their order, and cannot capture the rich semantic information behind the language symbols; they also suffer from data sparsity.
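As a sketch of both symbolic schemes, the snippet below builds BOW and bigram count vectors with scikit-learn. The toy corpus is illustrative, and a full N-gram language model would additionally estimate next-word probabilities rather than only counting features:

```python
# Symbol-based representations: Bag-of-Words and bigram counts.
# Toy corpus for illustration only.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Bag-of-Words: each document becomes a count vector over the vocabulary;
# word order is discarded entirely.
bow = CountVectorizer()
print(bow.fit_transform(corpus).toarray())
print(bow.get_feature_names_out())

# Bigrams (N-gram features, N=2): adjacent word order is preserved,
# at the cost of a larger and sparser feature space.
bigram = CountVectorizer(ngram_range=(2, 2))
print(bigram.fit_transform(corpus).toarray())
print(bigram.get_feature_names_out())
```

Even on this two-sentence corpus the bigram vocabulary outgrows the unigram one; this combinatorial growth is one face of the data sparsity problem noted above.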

  • In the era of deep learning, distributed representation (embedding) schemes are generally adopted: each language unit (including but not limited to characters, words, phrases, sentences, and documents) is represented by a low-dimensional dense vector that encodes its semantics. Distributed representation is a key technology of deep learning and neural networks, and the scheme is inspired by the neural mechanisms of the human brain (a sketch follows the next paragraph).

Disadvantages: although distributed representations are more expressive and have more degrees of freedom, at present they can only be learned from data under specific tasks, so they only establish semantic representations that satisfy specific needs. On the one hand they lack interpretability and robustness; on the other hand they lack generality and transferability. They remain a very long way from the semantic representation ability exhibited by the human brain.
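As a sketch of the distributed scheme, the snippet below trains word embeddings with gensim's Word2Vec; the corpus and hyperparameters are toy values for illustration only:

```python
# Distributed representations: each word mapped to a dense vector.
# Toy corpus and hyperparameters; real models train on large corpora.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["the", "cat", "chased", "the", "dog"],
]

model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, epochs=50, seed=42)

print(model.wv["cat"])                   # a 16-dimensional dense vector
print(model.wv.similarity("cat", "dog")) # cosine similarity in embedding space
```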

2. Characteristics of natural language 

  • Creativity
  • Recursion
  • Ambiguity
  • Subjectivity
  • Sociality

3. Where does the difficulty of natural language understanding lie

  • Constructing a structured semantic representation space

        For computers to understand human language, a structured semantic representation space must be constructed. Only when the representational capacity of this space is comparable to that of the human mind can it faithfully represent and interpret the meaning humans express through language. At the same time, this semantic representation space also needs to be corrected against the objective world, eliminating the biases and defects of human cognition, so that artificial intelligence can better serve human society.

        Among current semantic representation schemes, symbolic representations are too coarse to capture the rich semantic information behind language symbols, while distributed representations, despite their greater expressive power and degrees of freedom, can currently only be learned from data under specific tasks and therefore only yield representations that satisfy specific needs. On the one hand they lack interpretability and robustness; on the other hand they lack generality and transferability. Both remain a very long way from the semantic representation ability exhibited by the human brain.

        In the future, more powerful structured semantic representation spaces need to be explored. For example, can distributed representations be combined with symbolic representations, retaining the generalization ability of distributed representations while gaining the abstraction afforded by modular, hierarchical symbolic structure? Perhaps this is one of the breakthrough points for the next round of revolutionary progress in natural language understanding.
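As a toy illustration of this direction (not any established method), the sketch below blends embedding similarity with an explicit symbolic taxonomy; the vectors, hypernym table, and blend weight are all hypothetical:

```python
# A hypothetical blend of distributed and symbolic representations:
# similarity = weighted mix of vector cosine and a shared-hypernym signal.
import numpy as np

embeddings = {                 # distributed side: toy dense vectors
    "cat": np.array([0.9, 0.1, 0.3]),
    "dog": np.array([0.8, 0.2, 0.3]),
    "car": np.array([0.1, 0.9, 0.5]),
}
hypernyms = {"cat": "animal", "dog": "animal", "car": "vehicle"}  # symbolic side

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity(w1, w2, alpha=0.7):
    # alpha weights generalization (vectors) against abstraction (symbols).
    dist_sim = cosine(embeddings[w1], embeddings[w2])
    sym_sim = 1.0 if hypernyms[w1] == hypernyms[w2] else 0.0
    return alpha * dist_sim + (1 - alpha) * sym_sim

print(similarity("cat", "dog"))  # boosted by the shared hypernym "animal"
print(similarity("cat", "car"))  # no symbolic boost
```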

  • Understanding complex multimodal contexts

        Humans do not use language in isolation; language use must be considered in its complex context. Take the pervasive ambiguity of language as an example: ambiguous language units always require external contextual information for disambiguation. The sense of a polysemous character needs at least the word it occurs in to be resolved; the ambiguity of a word needs at least the sentence it occurs in; and the meaning of a sentence must be placed at least in its discourse or dialogue context, and may even require complex world knowledge to be understood.
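As a small illustration of sentence-level, context-driven disambiguation, the sketch below applies NLTK's simplified Lesk algorithm to the word "bank" in two sentences (it assumes the wordnet and punkt NLTK data packages have already been downloaded):

```python
# Word-sense disambiguation with NLTK's simplified Lesk algorithm.
# Requires: nltk.download("wordnet"); nltk.download("punkt")
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

# The same word "bank" resolves to different WordNet senses
# depending on the sentence that surrounds it.
for sentence in [
    "I deposited cash at the bank before it closed",
    "We had a picnic on the bank of the river",
]:
    sense = lesk(word_tokenize(sentence), "bank")
    print(sense, "-", sense.definition() if sense else "no sense found")
```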


Origin: blog.csdn.net/qq_27586341/article/details/123519687