NLP (Natural Language Processing)

Components of NLP
NLU (Natural Language Understanding)
Mapping the given natural language input into useful representations.
Analyzing different aspects of the language.

NLG (Natural Language Generation)
Text planning - retrieving the relevant content from the knowledge base.
Sentence planning - choosing the required words, forming meaningful phrases, and setting the tone of the sentence.
Text realization - mapping the sentence plan into sentence structure.


NLP terminology
Phonology - the study of how sounds are organized systematically.
Morphology - the study of how words are constructed from primitive meaningful units.
Morpheme - the primitive unit of meaning in a language.
Syntax - the arrangement of words to form a sentence. It also involves determining the structural role of words in sentences and phrases.
Semantics - the meaning of words and how words combine into meaningful phrases and sentences.
Pragmatics - the use and understanding of sentences in different situations, and how those situations affect their interpretation.
Discourse - how the immediately preceding sentence can affect the interpretation of the next sentence.
World knowledge - general knowledge about the world.

Steps in NLP
Lexical analysis
It involves identifying and analyzing the structure of words. The lexicon of a language is its collection of words and phrases. Lexical analysis divides the whole block of text into paragraphs, sentences, and words.
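
As a rough illustration of that last sentence, the splitting can be sketched with the standard re module. The function names and the splitting rules are assumptions chosen for brevity; a real tokenizer handles abbreviations, numbers, and punctuation much more carefully.

```python
import re


def split_paragraphs(text):
    """Paragraphs are separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def split_words(sentence):
    """Words are runs of letters, digits or apostrophes; punctuation is dropped."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


def lexical_analysis(text):
    """Return a nested list: paragraphs -> sentences -> words."""
    return [[split_words(s) for s in split_sentences(p)]
            for p in split_paragraphs(text)]


if __name__ == "__main__":
    sample = ("The boy goes to the school. The school is big.\n\n"
              "A puck is a disk made of rubber.")
    for paragraph in lexical_analysis(sample):
        print(paragraph)
```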
Syntactic analysis (parsing)
It analyzes the words in a sentence for grammar and arranges them in a way that shows the relationships among the words. A sentence such as "The school goes to boy" is rejected by an English syntactic analyzer.
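
To make the idea of rejecting a sentence concrete, here is a minimal sketch of a recursive-descent check over a hand-written toy grammar (S -> NP VP, NP -> DT NN, VP -> VB PP, PP -> IN NP). The lexicon, the grammar, and the function names are all assumptions for illustration; in this toy grammar the example sentence fails because "boy" has no determiner, and a real analyzer may apply quite different rules.

```python
# Hypothetical mini-lexicon mapping words to part-of-speech tags.
LEXICON = {"the": "DT", "a": "DT", "boy": "NN", "school": "NN",
           "goes": "VB", "to": "IN"}


def tag(words):
    """Look up each word's tag; unknown words raise KeyError."""
    return [LEXICON[w.lower()] for w in words]


def parse_np(tags, i):
    """NP -> DT NN. Return the index just after the NP, or None on failure."""
    if tags[i:i + 2] == ["DT", "NN"]:
        return i + 2
    return None


def parse_pp(tags, i):
    """PP -> IN NP."""
    if i < len(tags) and tags[i] == "IN":
        return parse_np(tags, i + 1)
    return None


def parse_vp(tags, i):
    """VP -> VB PP."""
    if i < len(tags) and tags[i] == "VB":
        return parse_pp(tags, i + 1)
    return None


def is_grammatical(sentence):
    """S -> NP VP, and the parse must consume every word."""
    tags = tag(sentence.rstrip(".").split())
    i = parse_np(tags, 0)
    if i is None:
        return False
    return parse_vp(tags, i) == len(tags)


if __name__ == "__main__":
    for s in ["The boy goes to the school", "The school goes to boy"]:
        print(s, "->", "accepted" if is_grammatical(s) else "rejected")
```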
Semantic analysis
It draws the exact meaning, or dictionary meaning, from the text. The text is checked for meaningfulness. This is done by mapping syntactic structures to objects in the task domain. The semantic analyzer disregards phrases such as "hot ice cream".
Discourse integration
The meaning of any sentence depends on the meaning of the sentence just before it. In addition, it also influences the meaning of the sentence that immediately follows.
Pragmatic analysis
During this step, what was said is reinterpreted according to what it actually meant. It involves deriving those aspects of language that require real-world knowledge.

Chunking
import nltk

# A POS-tagged sentence: a list of (word, tag) pairs
sentence = [("A", "DT"), ("Clever", "JJ"), ("Fox", "NN"), ("was", "VBP"),
            ("jumping", "VBP"), ("over", "IN"), ("the", "DT"), ("wall", "NN")]
# NP chunk: an optional determiner, any number of adjectives, then a noun
grammar = "NP:{<DT>?<JJ>*<NN>}"

parser_chunking = nltk.RegexpParser(grammar)  # define the chunk parser
parser_chunking.parse(sentence).draw()  # parse the sentence and draw the tree (opens a GUI window)
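
To see which words the NP rule groups without opening the tree window, the same pattern (an optional DT, any number of JJ, then one NN) can be sketched in plain Python. The np_chunks function is a hypothetical helper written for this note, not part of NLTK, and it covers only this one rule.

```python
def np_chunks(tagged):
    """Group tokens matching <DT>?<JJ>*<NN> into chunks, scanning left to right."""
    chunks = []
    i = 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":                          # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":   # any number of adjectives
            j += 1
        if j < len(tagged) and tagged[j][1] == "NN":      # required noun
            chunks.append([word for word, _ in tagged[i:j + 1]])
            i = j + 1
        else:
            i += 1                                        # no chunk starts here
    return chunks


if __name__ == "__main__":
    sentence = [("A", "DT"), ("Clever", "JJ"), ("Fox", "NN"), ("was", "VBP"),
                ("jumping", "VBP"), ("over", "IN"), ("the", "DT"), ("wall", "NN")]
    print(np_chunks(sentence))
```

For the sentence above this yields two chunks, "A Clever Fox" and "the wall"; the verbs and the preposition stay outside any chunk.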





Predicting the category of a given sentence
from sklearn.datasets import fetch_20newsgroups
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer

# Define the category map
category_map = {'talk.religion.misc': 'Religion', 'rec.autos': 'Autos', 'rec.sport.hockey': 'Hockey',
                'sci.electronics': 'Electronics', 'sci.space': 'Space'}

# Create the training set
training_data = fetch_20newsgroups(subset='train', categories=list(category_map.keys()), shuffle=True, random_state=5)

# Create the count vectorizer and extract the term counts
vectorizer_count = CountVectorizer () 
train_tc = vectorizer_count.fit_transform(training_data.data)
print("\nDimensions of training data:", train_tc.shape)

# Create the tf-idf transformer
tfidf = TfidfTransformer()
train_tfidf = tfidf.fit_transform(train_tc)

# Create the test data
input_data = [
   'Discovery was a space shuttle',
   'Hindu, Christian, Sikh all are religions',
   'We must have to drive safely',
   'Puck is a disk made of rubber',
   'Television, Microwave, Refrigrated all uses electricity'
] 

classifier = MultinomialNB().fit(train_tfidf, training_data.target)  # train a multinomial naive Bayes classifier
input_tc = vectorizer_count.transform(input_data)  # convert the input data to term counts
input_tfidf = tfidf.transform(input_tc)  # convert the term counts to tf-idf weights
predictions = classifier.predict(input_tfidf)

for sent, category in zip(input_data, predictions):
    print('\nInput Data:', sent, '\nCategory:',
          category_map[training_data.target_names[category]])

Result
Dimensions of training data: (2755, 39297)

Input Data: Discovery was a space shuttle 
 Category: Space

Input Data: Hindu, Christian, Sikh all are religions 
 Category: Religion

Input Data: We must have to drive safely 
 Category: Autos

Input Data: Puck is a disk made of rubber 
 Category: Hockey

Input Data: Television, Microwave, Refrigrated all uses electricity 
 Category: Electronics
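
The term-count and tf-idf steps used by the classifier above can be sketched by hand to show what the numbers mean. This is a simplified illustration, not scikit-learn's code: it splits on whitespace (CountVectorizer uses its own token pattern), and it assumes the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which is how TfidfTransformer behaves with its default settings as far as I know. The function names are made up.

```python
import math
from collections import Counter


def term_counts(docs):
    """Build a sorted vocabulary and one row of raw term counts per document."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    rows = []
    for d in docs:
        counts = Counter(d.lower().split())
        rows.append([counts[w] for w in vocab])
    return vocab, rows


def tfidf_weights(rows):
    """Weight counts by smoothed idf, then scale each row to unit L2 length."""
    n_docs = len(rows)
    n_terms = len(rows[0])
    # Document frequency: in how many documents does each term appear?
    df = [sum(1 for r in rows if r[t] > 0) for t in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted_rows = []
    for r in rows:
        weighted = [tf * w for tf, w in zip(r, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        weighted_rows.append([x / norm for x in weighted])
    return weighted_rows


if __name__ == "__main__":
    docs = ["space shuttle launch", "hockey puck", "space station"]
    vocab, rows = term_counts(docs)
    for doc, row in zip(docs, tfidf_weights(rows)):
        print(doc, [(w, round(x, 3)) for w, x in zip(vocab, row) if x > 0])
```

A word that appears in many documents (here "space") gets a lower weight than a word that appears in only one (here "shuttle"), which is what lets the classifier focus on the distinctive terms of each category.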



Recognizing spoken words
import speech_recognition as sr

recording = sr.Recognizer()

with sr.Microphone() as source:
    recording.adjust_for_ambient_noise(source)
    print("please say something")
    audio = recording.listen(source)

try:
    print("you said:\n" + recording.recognize_google(audio))
except Exception as e:
    print(e)

 


Origin www.cnblogs.com/hichens/p/11229638.html