Introduction

Basic Information

50 credit hours, 3 credits
Zong Chengqing, Zhang Jiajun
Assignment: method practice + technical report (group or individual)

Problem statement

Analyzing the relationships between people and events is of great significance
Large volumes of complex data are difficult to process manually
Let the computer understand natural language text automatically or semi-automatically
Natural language processing: enable computers to automatically process, mine, and make effective use of massive amounts of language text, in order to meet the varied needs of different users and provide personalized services.

Basic concepts

Linguistics:
- The scientific study of language
- The discipline that studies the nature, structure, and laws of development of language
- Speech and text are the two basic attributes of language
Computational Linguistics (CL):
- The discipline of analyzing, understanding, and generating natural language by building formal computational models
- Interdisciplinary
- Places more emphasis on basic theories and methods than natural language processing does
- Concerned with language modeling, mathematical models, and methods
- Distinguishing focus 1/3: language modeling and computation
NLU (Natural Language Understanding):
- The discipline that studies natural language processing methods and implementation techniques that mimic human language cognition
- Interdisciplinary (including cognitive science)
- Concerned with the thought processes behind language
- The standard of "understanding": how to judge the intelligence of a computer?
  - Act, react, interact
  - How does it compare with a conscious individual (a person)? The Turing test
- Distinguishing focus 2/3: language cognition
NLP (Natural Language Processing):
- The discipline of using computer technology to process language text
- Recognition, classification, extraction, conversion, and generation of lexical, syntactic, semantic, and pragmatic information
- Distinguishing focus 3/3: implementation of language engineering systems
A unified view of the three: Human Language Technology (HLT)
- NLP -> CL -> NLU
Language types:
- Inflectional (fusional) languages: grammatical relations are expressed through morphological changes in words (English)
- Agglutinative languages: dedicated affixes attached to the word mark grammatical functions, and the root or stem and these affixes are only loosely bound (Japanese)
- Isolating languages: little morphological change; grammatical relations are expressed by word order and function words (Chinese)
Chinese Information Processing: natural language processing technology for the Chinese language

The emergence and development of disciplines

Early: rationalism, symbolic logic (rules, dictionary + algorithm)
Middle: empiricism, statistical learning (corpus, features + model)
Recent: connectionism, neural networks (corpus + model)

Research content

Machine translation
- Automatically translate from one language into another
Information retrieval
- Use computer systems to find relevant information that meets user needs in a large collection of documents
Automatic summarization
- Automatically extract the main content, or selected information, of the original document to form a summary or abstract
Question answering system
- The system understands a person's question, uses automatic reasoning to find the answer in knowledge resources, and responds accordingly
- Can be combined with voice technology to form a man-machine dialogue system
- Community Q & A
Information filtering
- Automatically identify and filter document information that meets certain conditions
Information extraction
- Extract information of interest to users from specified documents or massive texts
- Entity relationship extraction
- Social network
Document classification
- Automatic document classification or information classification
- A large number of documents are automatically classified according to certain classification criteria (theme, content)
- Sentiment classification
Text editing and automatic proofreading
- Automatically check, proofread, and lay out text with respect to spelling, wording, and even grammar and document format
- Relatively difficult
Language teaching
Text recognition
Speech Recognition
- Automatically convert input voice signals into written text
Text-to-speech conversion (speech synthesis)
- Automatically convert written text into the corresponding speech
Speaker recognition
- Determine or verify the identity of a speaker based on characteristics of their speech
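
As an illustration of the information retrieval task listed above, here is a minimal sketch that ranks a few documents against a query. The TF-IDF style scoring, the smoothed `log((n + 1) / (df + 1))` weight, the function names, and the three sample documents are all assumptions made for this example; the lecture does not prescribe a particular retrieval model.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real systems need proper tokenization.
    return text.lower().split()

def rank(query, docs):
    """Rank document indices by a simple TF-IDF overlap score with the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        # Terms that occur in every document get weight log(1) = 0.
        score = sum(
            tf[t] * math.log((n + 1) / (df[t] + 1))
            for t in tokenize(query)
            if t in tf
        )
        scores.append((score, i))
    # Highest score first; ties keep the original document order.
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]

# Hypothetical mini-collection.
docs = [
    "machine translation converts text between languages",
    "speech recognition converts voice signals into text",
    "information extraction finds entities and relations in text",
]
ranking = rank("speech voice", docs)
print(ranking)
```

With these sample documents, the second document (index 1) is ranked first because it is the only one containing the query terms.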

Problems and challenges

Morphology: how words are formed from their basic meaningful units, morphemes
- Inflectional changes and word recognition
- Chinese word segmentation
- Morphemes: root, prefix, suffix, ending
Syntax: the relationships between the structural components of a sentence, and the rules governing the sequences that form a sentence
Semantics: how to derive the meaning of a sentence from the meanings of its words and the roles those words play in the syntactic structure
Pragmatics: how different contexts, and the setting in which language is used, affect the understanding of an utterance
- Context reflected in language structure
- Meaning not covered by semantics
Pervasive ambiguity:
- Lexical ambiguity: morphological changes, Chinese word segmentation
- Part-of-speech ambiguity
- Structural ambiguity
- Semantic ambiguity
- Phonetic ambiguity: polyphonic characters, prosody and intonation, etc.
A large number of unknown language phenomena:
- New words, names, place names, terms
- New meaning
- New usage and new sentence patterns
Challenges:
- Pervasive uncertainty
- Unpredictability of unknown language phenomena
- Data is never sufficient
- Complexity of knowledge representation
- Non-equivalence of mapping units in machine translation
Language understanding in the human brain is a complex thought process
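
The Chinese word segmentation ambiguity mentioned above can be made concrete with a small sketch. It compares forward and backward maximum matching, two classic dictionary-based segmentation heuristics; the lecture notes do not name these algorithms, and the tiny dictionary and the example sentence are assumptions chosen for illustration.

```python
def forward_max_match(text, vocab, max_len=3):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if piece in vocab or size == 1:
                words.append(piece)
                i += size
                break
    return words

def backward_max_match(text, vocab, max_len=3):
    """The same idea, scanning from the end of the sentence toward the start."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if piece in vocab or size == 1:
                words.insert(0, piece)
                j -= size
                break
    return words

# Hypothetical mini-dictionary: "research", "graduate student", "life", "origin".
vocab = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"  # "study the origin of life"

fmm = forward_max_match(sentence, vocab)
bmm = backward_max_match(sentence, vocab)
print(fmm)  # ['研究生', '命', '起源']
print(bmm)  # ['研究', '生命', '起源']
```

The two scans disagree on the same string: the forward pass greedily takes 研究生 ("graduate student") and strands 命, while the backward pass recovers the intended reading. Neither heuristic is reliable in general, which is why segmentation counts as a source of lexical ambiguity.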

Basic methods and technical status

Basic methods:
- Rationalist approach: rule-based
- Empiricist approach: data-driven
- Connectionist approach: data-driven, neural networks
Rationalism: by studying representative sentences or language phenomena, gain an understanding of human language ability, summarize the laws of language use, and then analyze and infer the expected result for a test sample
- Build a symbol processing system based on rule-based analysis
- Knowledge base + inference system
- Theoretical basis: Chomsky's theory of grammar
- Rule-based methods: work well on content with a regular structure, but struggle with irregular content
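
To make the rule-based idea concrete, the following sketch treats a handful of hand-written grammar rules as the "knowledge base" and a CYK recognizer as the "inference system". The toy grammar, the lexicon, and the function name `accepts` are hypothetical and only serve to show the behavior described above: regular input is accepted, while irregular word order or an unknown word is rejected.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form: binary rules plus a lexicon.
BINARY_RULES = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICON = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}

def accepts(sentence):
    """CYK recognition: True if the toy grammar derives the sentence as an S."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the categories that can span words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICON.get(word, ()))
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span - 1
            for split in range(start, end):
                pairs = product(table[start][split], table[split + 1][end])
                for left, right in pairs:
                    parent = BINARY_RULES.get((left, right))
                    if parent:
                        table[start][end].add(parent)
    return "S" in table[0][n - 1]

print(accepts("the dog chased the cat"))  # regular structure covered by the rules
print(accepts("dog the chased cat the"))  # irregular word order
print(accepts("the dog chased the fox"))  # "fox" is not in the lexicon
```

Only the first sentence is accepted. Covering the other two would require someone to write more rules or lexicon entries by hand, which is the practical limit of a purely rule-based system.
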
Empiricism: using large amounts of real language data and with human help (annotation and feature selection), statistically discover the laws of language use and their probabilities, and use these to compute the likely result for a test sample
- The statistical unit is a discrete event
- Build a computational model from large-scale real data
- Corpus + statistical model
- Theoretical basis: statistics, information theory, machine learning
- Bayes' formula
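
One way to see "corpus + statistical model" and Bayes' formula working together is a tiny naive Bayes text classifier. By Bayes' formula, P(c | d) is proportional to P(c) * P(d | c), and the naive assumption factors P(d | c) into a product of per-word probabilities estimated by counting discrete events in an annotated corpus. The choice of naive Bayes, the four-sentence corpus, and the add-one smoothing are assumptions for this sketch, not something the lecture specifies.

```python
import math
from collections import Counter

# Hypothetical annotated corpus; the labels are the human contribution.
TRAIN = [
    ("good great film", "pos"),
    ("great acting good plot", "pos"),
    ("bad boring film", "neg"),
    ("boring plot bad acting", "neg"),
]

def train(examples):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.split():
            counts[word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    """Return the class c that maximizes log P(c) + sum of log P(w | c)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are skipped
                # Add-one smoothing avoids zero probability for unseen pairs.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(predict("good plot", *model))    # pos
print(predict("boring film", *model))  # neg
```

The model contains no linguistic rules at all; everything it knows comes from counts over the labeled examples, so its quality depends directly on how much annotated data is available.
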
Connectionism: build a model from large-scale real language data, statistically discover the laws of language use and their probabilities, and use these to compute the likely result for a test sample
- The statistical unit is a representation in continuous real-valued space (a vector)
- Build a computational model from large-scale real data
- Corpus + neural network + statistical model
- Theoretical basis: statistics, deep learning
- Vectorized representations, neural network models optimized for a target objective, RNNs, attention mechanisms
- Data-driven methods: require no deep analysis, or even basic knowledge, and depend on the amount of data; however, obtaining enough data is itself a difficult problem, and these methods struggle with complex sentences, unfamiliar vocabulary, reference resolution, and translation consistency, and they lack interpretability
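
The vectorized representation and the attention mechanism listed above can be sketched in a few lines of plain Python. This is scaled dot-product attention for a single query, written without any deep learning library. The two-dimensional word vectors are invented by hand for illustration; in a real model they would be learned from a corpus.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # The output is the weighted average of the value vectors.
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

# Hypothetical hand-made word vectors (not learned).
vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
keys = list(vectors.values())

weights, output = attend(vectors["cat"], keys, keys)
print(weights)  # highest weight on "cat", then "dog", lowest on "car"
print(output)
```

Because words are points in a continuous space, similarity becomes a matter of degree: "dog" receives almost as much weight as "cat", which a model based on discrete symbols cannot express directly.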


UCAS-AI Academy-Special Course on Natural Language Processing-Lecture 1-Course Notes
