Natural Language Processing Series: NLP Concepts and Terminology Explained (2)

(1) The necessity of language analysis:

Suppose your company releases a brand-new mobile phone. The launch generates media reports and user feedback from many sources. Faced with this data, you may want to know

which features of the phone people pay attention to,
what people think of the phone,
and which users have expressed a willingness to buy.
Given the volume of data, analyzing it by hand is clearly impractical. This is where language analysis comes in handy.
Having machines perform these analytical tasks instead of humans is exactly what language analysis does.

(2) Common operations of language analysis:

(1) Word segmentation

Chinese word segmentation (Word Segmentation, WS) is the task of splitting a sequence of Chinese characters into a sequence of words. In Chinese, the word is the most basic unit that carries meaning, so word segmentation underlies many Chinese natural language processing tasks such as information retrieval, text classification, and sentiment analysis.

For example, take the sentence

国务院总理李克强调研上海外高桥时提出，支持上海积极探索新机制。
(Premier Li Keqiang of the State Council, while investigating Shanghai Waigaoqiao, proposed supporting Shanghai in actively exploring new mechanisms.)

The correct word segmentation is

State Council / Premier / Li Keqiang / investigate / Shanghai / Waigaoqiao / when / propose / , / support / Shanghai / active / explore / new / mechanism / .

Suppose instead that the word segmentation system outputs

State Council / Premier / Li Ke / emphasize / research / Shanghai…

In the Chinese text the name 李克强 (Li Keqiang) is immediately followed by 调研 (investigate), and the two characters that straddle the boundary, 强调, form the common word "emphasize", so this wrong segmentation is quite plausible. If it occurs, a search engine will have difficulty retrieving this document for queries about Li Keqiang.

Disambiguation is the main difficulty in word segmentation tasks.
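The ambiguity can be reproduced with a very small dictionary-based segmenter. The sketch below uses forward maximum matching, a simple greedy technique chosen here purely for illustration (it is not claimed to be what LTP does). The toy vocabularies, including the entry 李克, are assumptions made up for this example.

```python
def forward_max_match(text, vocab, max_len=3):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

# First clause of the example sentence, without punctuation.
sentence = "国务院总理李克强调研上海外高桥"

# Toy vocabulary (hypothetical); note that it contains both 强调 and 调研.
base_vocab = {"国务院", "总理", "强调", "调研", "上海", "外高桥"}

# With the full name in the dictionary the greedy match succeeds.
good = forward_max_match(sentence, base_vocab | {"李克强"})
# With only 李克 in the dictionary, 强调 is matched next and 研 is left over.
bad = forward_max_match(sentence, base_vocab | {"李克"})

print(" / ".join(good))
print(" / ".join(bad))
```

The only difference between the two runs is one dictionary entry, which is why real segmenters rely on context and statistics rather than on dictionary lookup alone.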

(2) Part-of-speech tagging

Part-of-speech Tagging (POS) is the task of assigning a part-of-speech category to each word in a sentence. The categories may be nouns, verbs, adjectives, or others. In the example further below, v stands for verb, n for noun, c for conjunction, d for adverb, and wp for punctuation mark; the example also uses a (adjective), nh (person name), ni (organization name), and ns (place name).

Different part-of-speech tagged corpora follow different specifications. The Language Cloud (LTP) of Harbin Institute of Technology is used here as the example:

State Council/ni Premier/n Li Keqiang/nh investigate/v Shanghai/ns Waigaoqiao/ns when/n propose/v ,/wp support/v Shanghai/ns active/a explore/v new/a mechanism/n ./wp
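Once every word carries a tag, downstream code can select words by category. The following is a minimal sketch that stores the tagged example as (word, tag) pairs, using word-by-word English glosses, and groups the words by tag; the helper name group_by_tag is made up for illustration.

```python
from collections import defaultdict

# (word, tag) pairs for the tagged example sentence (English glosses).
tagged = [
    ("State Council", "ni"), ("Premier", "n"), ("Li Keqiang", "nh"),
    ("investigate", "v"), ("Shanghai", "ns"), ("Waigaoqiao", "ns"),
    ("when", "n"), ("propose", "v"), (",", "wp"), ("support", "v"),
    ("Shanghai", "ns"), ("active", "a"), ("explore", "v"), ("new", "a"),
    ("mechanism", "n"), (".", "wp"),
]

def group_by_tag(pairs):
    """Collect the words of a tagged sentence under each tag, in order."""
    groups = defaultdict(list)
    for word, tag in pairs:
        groups[tag].append(word)
    return dict(groups)

groups = group_by_tag(tagged)
print(groups["v"])   # all verbs in sentence order
print(groups["ns"])  # all words tagged ns
```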

Part-of-speech tag set: LTP uses the 863 part-of-speech tag set, and the meaning of each tag is as follows:

(3) Named Entity Recognition

Named Entity Recognition (NER) is the task of locating and classifying entities such as person names, place names, and organization names in the word sequence of a sentence.

As in the previous example, the result of named entity recognition is:

Premier Li Keqiang (person name) of the State Council (organization name), when investigating Shanghai Waigaoqiao (place name), proposed to support Shanghai (place name) in actively exploring new mechanisms.

Named entity recognition plays an important role in mining entities in text and then analyzing them.

The entity types to be recognized are generally task-specific. LTP recognizes the three most basic entity types: person names, place names, and organization names.

Users can readily extend these to other entity types such as brand names and software names.
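NER output is commonly delivered as one tag per word, which then has to be merged into entity spans. The sketch below assumes a B/I/E/S/O position scheme with the labels Nh (person), Ni (organization), and Ns (place); both the scheme and the hand-written tags are assumptions for illustration, not verified LTP output.

```python
def decode_entities(tokens, tags):
    """Merge per-token position tags into (entity text, label) pairs.

    B = begin, I = inside, E = end, S = single-token entity, O = outside.
    """
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = []
            continue
        position, label = tag.split("-", 1)
        if position in ("B", "S"):
            current = [token]          # start a new entity
        else:
            current.append(token)      # "I" or "E" extends the open entity
        if position in ("E", "S"):
            entities.append((" ".join(current), label))
            current = []
    return entities

# English glosses of the example sentence with hand-written tags.
tokens = ["State Council", "Premier", "Li Keqiang", "investigate",
          "Shanghai", "Waigaoqiao", "when", "propose", ",", "support",
          "Shanghai", "active", "explore", "new", "mechanism", "."]
tags = ["S-Ni", "O", "S-Nh", "O", "B-Ns", "E-Ns", "O", "O", "O", "O",
        "S-Ns", "O", "O", "O", "O", "O"]

entities = decode_entities(tokens, tags)
print(entities)
```

Extending the recognizer to brand or software names would, under this scheme, only add new labels; the span-decoding step stays the same.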

(4) Dependency parsing

Dependency Parsing (DP) reveals the syntactic structure of a language unit by analyzing the dependency relations between its components.

Intuitively, dependency parsing identifies grammatical components such as the subject, predicate, and object, and the attributive, adverbial, and complement, and analyzes the relations between them. For the same example, the analysis result is:

 

From the analysis results we can see that the core predicate of the sentence is "proposed", its subject is "Li Keqiang", its object is "support Shanghai...", and "when investigating..." is the (temporal) adverbial of "proposed". The modifier of "Li Keqiang" is "Premier of the State Council", and the object of "support" is "explore new mechanisms". With these results it is easy to see that the one who "proposed" is "Li Keqiang", not "Shanghai" or "Waigaoqiao", even though those are also nouns and sit closer to "proposed" in the sentence.
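The relations just described can be written down as (head, relation, dependent) arcs and queried directly. This is a minimal sketch: the arcs are hand-written from the description above at phrase level, and the relation names are plain-English stand-ins rather than the parser's actual label set.

```python
# Hand-written arcs restating the analysis in the text (not parser output).
ROOT = "<root>"
arcs = [
    (ROOT, "core", "proposed"),
    ("proposed", "subject", "Li Keqiang"),
    ("proposed", "object", "support"),
    ("proposed", "adverbial", "when investigating"),
    ("Li Keqiang", "attribute", "Premier of the State Council"),
    ("support", "object", "explore new mechanisms"),
]

def dependents(head, relation):
    """Return every dependent attached to `head` by `relation`."""
    return [dep for h, rel, dep in arcs if h == head and rel == relation]

core = dependents(ROOT, "core")[0]
# "Who proposed?" is answered by following the subject arc of the core
# predicate, not by picking the nearest noun.
print(core, dependents(core, "subject"))
```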

The dependency relation labels (15 types in total) and their meanings are as follows:

 

(5) Semantic role labeling

Semantic Role Labeling (SRL) is a shallow semantic analysis technique that labels certain phrases in a sentence as arguments (semantic roles) of a given predicate, such as agent, patient, time, and location. It benefits applications such as question answering, information extraction, and machine translation. For the same example, the result of semantic role labeling is:

 

The sentence contains three predicates: "propose", "investigate", and "explore". Taking "explore" as an example, "actively" is its manner (generally labeled ADV), and "new mechanisms" is its patient (generally labeled A1).
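One convenient way to hold such a result is a predicate index plus labeled token spans. The sketch below encodes only the roles of "explore" stated above; the data layout and the helper role_texts are assumptions for illustration, and the tokens are English glosses of the example sentence.

```python
tokens = ["State Council", "Premier", "Li Keqiang", "investigate",
          "Shanghai", "Waigaoqiao", "when", "propose", ",", "support",
          "Shanghai", "actively", "explore", "new", "mechanism", "."]

# One frame per predicate: the predicate's token index and its arguments
# as (label, first token index, last token index), both ends inclusive.
explore_frame = {
    "predicate": 12,
    "roles": [("ADV", 11, 11), ("A1", 13, 14)],
}

def role_texts(tokens, frame):
    """Map each role label of a frame to the text of its token span."""
    return {
        label: " ".join(tokens[start:end + 1])
        for label, start, end in frame["roles"]
    }

print(tokens[explore_frame["predicate"]], role_texts(tokens, explore_frame))
```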

The core semantic roles are A0-5. A0 usually denotes the agent of the action and A1 usually denotes the entity affected by the action; A2-5 take different meanings depending on the predicate verb. The remaining 15 semantic roles are additional roles, such as LOC for location and TMP for time. The list of additional semantic roles is as follows:

(6) Semantic Dependency Analysis

Semantic Dependency Parsing (SDP) analyzes the semantic associations between the language units of a sentence and presents them as a dependency structure. The advantage of describing sentence semantics with semantic dependencies is that the vocabulary itself does not need to be abstracted; instead, each word is described through the semantic frame it bears, and the number of argument types is always far smaller than the number of words. The goal of semantic dependency analysis is to look past the surface syntactic structure of a sentence and obtain deep semantic information directly. For example, the following three sentences use different wording to express the same semantic information: Zhang San performed an eating action, and the thing eaten was an apple.

 

Semantic dependency analysis is not affected by syntactic structure. Language units with a direct semantic association are connected directly by a dependency arc, which is labeled with the corresponding semantic relation.

This is also an important difference between semantic dependency analysis and syntactic dependency analysis.

Comparing the syntactic dependency and semantic dependency analyses of the example above, we can see two significant differences. First, syntactic dependencies give more weight to the role of function words (such as prepositions) in sentence structure, while semantic dependencies tend to draw dependency arcs directly between semantically related content words, with function words serving only as auxiliary markers. Second, the relations labeled on the arcs are completely different. Semantic dependency relations derive from argument relations and can be used to answer questions such as "Where am I drinking soup?", "What am I drinking soup with?", "Who is drinking soup?", and "What am I drinking?". Syntactic dependencies cannot do this.
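The question-answering use can be sketched by mapping each question type to a semantic relation tag. Everything concrete below is an assumption for illustration: the sentence "I drink soup with a spoon in the canteen", its hand-written arcs, and the choice of the tags Agt, Pat, Tool, and Loc for them.

```python
# Hypothetical semantic dependency arcs as (head, relation tag, dependent).
arcs = [
    ("drink", "Agt", "I"),
    ("drink", "Pat", "soup"),
    ("drink", "Tool", "spoon"),
    ("drink", "Loc", "canteen"),
]

# Which relation tag answers which kind of question (illustrative mapping).
QUESTION_TO_TAG = {
    "who": "Agt",
    "what": "Pat",
    "with what": "Tool",
    "where": "Loc",
}

def answer(question, predicate="drink"):
    """Return the dependent that fills the role asked about, or None."""
    tag = QUESTION_TO_TAG[question]
    for head, relation, dependent in arcs:
        if head == predicate and relation == tag:
            return dependent
    return None

for question in QUESTION_TO_TAG:
    print(question, "->", answer(question))
```

A syntactic label such as "adverbial" would cover both the tool and the location here, which is why it cannot separate the "with what" and "where" questions on its own.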

Semantic dependency is also related to semantic role labeling. Semantic role labeling focuses only on the arguments of the main predicates of a sentence and on the relations between predicates and their arguments. Semantic dependency covers not only predicate-argument relations but also the semantic relations between predicates, between arguments, and within arguments, so it describes the semantic information of a sentence more completely.

Semantic dependencies fall into three categories: main semantic roles, each of which also has a nested form and a reverse form; event relations, which describe the relationship between two events; and semantic attachment markers, which mark the speaker's tone and other dependent information.

 

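Relation type | Tag | Description | Example
agent relation | Agt | Agent | I send her a bouquet of flowers (I <-- send)
experiencer relation | Exp | Experiencer | I run fast (run --> I)
affection relation | Aft | Affection | I miss my hometown (miss --> I)
possessor relation | Poss | Possessor | He has a good read (he <-- has)
patient relation | Pat | Patient | He hit Xiao Ming (hit --> Xiao Ming)
content relation | Cont | Content | He heard firecrackers (hear --> firecrackers)
product relation | Prod | Product | He wrote a novel (write --> novel)
origin relation | Orig | Origin | Our army captured four enemy tanks (capture --> tanks)
dative relation | Datv | Dative | He told me a secret (tell --> me)
comparison role | Comp | Comitative | His grades are better than mine (he --> me)
belongings role | Belg | Belongings | Lao Zhao has two daughters (Lao Zhao <-- has)
classification role | Clas | Classification | He is a middle school student (is --> middle school student)
basis role | Accd | According | This court pronounces judgment according to law (according to law <-- pronounce judgment)
reason role | Reas | Reason | He is worrying about his daughter's marriage (worry --> marriage)
intention role | Int | Intention | He worked hard for the gold medal (gold medal <-- work hard)
consequence role | Cons | Consequence | He ran until he was drenched in sweat (run --> drenched in sweat)
manner role | Mann | Manner | The ball slowly rolls into the empty goal (slowly <-- roll)
tool role | Tool | Tool | She cooks porridge in a clay pot (clay pot <-- cook porridge)
material role | Malt | Material | She cooks porridge with millet (millet <-- cook porridge)
time role | Time | Time | In the Tang dynasty there was a Li Bai (Tang dynasty <-- there was)
location role | Loc | Location | This house faces south (face --> south)
process role | Proc | Process | The train is crossing the Yangtze River Bridge (cross --> bridge)
direction role | Dir | Direction | The troops rush toward the south (rush --> south)
scope role | Sco | Scope | Products should be compared on quality (compare --> quality)
quantity role | Quan | Quantity | A year has 365 days (has --> days)
quantity phrase | Qp | Quantity-phrase | three books, 三本书 (三 --> 本)
frequency role | Freq | Frequency | He reads every day (every day <-- read)
sequence role | Seq | Sequence | He came first in the race (run --> first)
description role | Desc(Feat) | Description | He is fat (grow --> fat)
host role | Host | Host | housing area (housing <-- area)
name-modifier role | Nmod | Name-modifier | Gogol Street (Gogol <-- street)
time-modifier role | Tmod | Time-modifier | Monday morning (Monday <-- morning)
reverse role | r + main role | | the little girl playing basketball (play basketball <-- girl)
nested role | d + main role | | Grandpa saw his grandson running (see --> run)
coordination relation | eCoo | event Coordination | I like singing and dancing (singing --> dancing)
selection relation | eSelt | event Selection | Will you have tea or coffee (tea --> coffee)
equivalence relation | eEqu | event Equivalent | The three of them walk together (they --> three people)
precedent relation | ePrec | event Precedent | 首先, 先 (first of all, first)
successor relation | eSucc | event Successor | 随后, 然后 (afterwards, then)
progression relation | eProg | event Progression | 况且, 并且 (moreover, and also)
adversative relation | eAdvt | event Adversative | 却, 然而 (yet, however)
cause relation | eCau | event Cause | 因为, 既然 (because, since)
result relation | eResu | event Result | 因此, 以致 (therefore, with the result that)
inference relation | eInf | event Inference | 才, 则 (only then, then)
condition relation | eCond | event Condition | 只要, 除非 (as long as, unless)
supposition relation | eSupp | event Supposition | 如果, 要是 (if, supposing)
concession relation | eConc | event Concession | 纵使, 哪怕 (even if, even though)
method relation | eMetd | event Method |
purpose relation | ePurp | event Purpose | 为了, 以便 (in order to, so that)
abandonment relation | eAban | event Abandonment | 与其, 也不 (rather than, nor)
preference relation | ePref | event Preference | 不如, 宁愿 (had better, would rather)
summary relation | eSum | event Summary | 总而言之 (in short)
recount relation | eRect | event Recount | 例如, 比方说 (for example, for instance)
conjunction marker | mConj | Recount Marker | 和, 或 (and, or)
auxiliary (的) marker | mAux | Auxiliary | 的, 地, 得 (structural particles)
preposition marker | mPrep | Preposition | 把, 被 (disposal marker, passive marker)
tone marker | mTone | Tone | 吗, 呢 (question particles)
time marker | mTime | Time | 才, 曾经 (only just, once)
range marker | mRang | Range | 都, 到处 (all, everywhere)
degree marker | mDegr | Degree | 很, 稍微 (very, slightly)
frequency marker | mFreq | Frequency Marker | 再, 常常 (again, often)
direction marker | mDir | Direction Marker | 上去, 下来 (go up, come down)
parenthesis marker | mPars | Parenthesis Marker | 总的来说, 众所周知 (generally speaking, as everyone knows)
negation marker | mNeg | Negation Marker | 不, 没, 未 (not, have not, not yet)
modal marker | mMod | Modal Marker | 幸亏, 会, 能 (fortunately, will, can)
punctuation marker | mPunc | Punctuation Marker | ,。!
repetition marker | mPept | Repetition Marker | walk and walk, 走啊走 (走 --> 走)
majority marker | mMaj | Majority Marker | 们, 等 (plural suffix, "and so on")
grammaticalized content word marker | mVain | Vain Marker |
separation marker | mSepa | Separation Marker | had a meal, 吃了个饭 (吃 --> 饭); took a bath, 洗了个澡 (洗 --> 澡)
root node | Root | Root | core node of the whole sentence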
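The tag names in the table follow a prefix convention that a program can exploit: main roles stand alone, "r" and "d" prefix a main role, "e" prefixes event relations, and "m" prefixes markers. The sketch below sorts a tag into these groups. It assumes that reverse and nested tags are written by directly joining the prefix and the role (for example rPat), and it lists only a subset of the main roles.

```python
# A subset of the main semantic role tags from the table.
MAIN_ROLES = {
    "Exp", "Aft", "Poss", "Pat", "Prod", "Orig", "Datv", "Comp", "Belg",
    "Accd", "Int", "Cons", "Mann", "Tool", "Malt", "Time", "Loc", "Proc",
    "Dir", "Sco", "Quan", "Qp", "Freq", "Seq", "Host", "Nmod", "Tmod",
}

def classify(tag):
    """Return (category, base tag) for a semantic dependency tag."""
    if tag == "Root":
        return ("root", tag)
    if tag in MAIN_ROLES:
        return ("main", tag)
    # Assumed spelling: "r"/"d" joined directly to a main role, e.g. rPat.
    if tag[:1] in ("r", "d") and tag[1:] in MAIN_ROLES:
        kind = "reverse" if tag[0] == "r" else "nested"
        return (kind, tag[1:])
    if tag[:1] == "e" and tag[1:2].isupper():
        return ("event", tag)
    if tag[:1] == "m" and tag[1:2].isupper():
        return ("marker", tag)
    return ("unknown", tag)

for tag in ["Pat", "rPat", "dExp", "eCoo", "mNeg", "Root"]:
    print(tag, classify(tag))
```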

 

The material above was compiled from the Language Cloud of Harbin Institute of Technology.

 

20180503, at Qiushi Garden
