Using Lucene.Net for Sentiment Analysis (Opinion Mining)

Abstract: Using Lucene.Net for sentiment analysis (Opinion Mining)


Suppose we want to classify online comments as positive or negative.

The simplest approach is Bayesian filtering; the process is similar to classifying spam.

We need two training lexicons: one of negative terms and one of positive terms.

After training on these lexicons, we can judge whether an opinion is positive or negative.

On building the lexicons, see the O'Reilly book Bad Data Handbook, which notes that lexicons can be separated by domain,

since movie reviews, for example, may be worded differently from comments elsewhere.
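To make the idea concrete, here is a minimal sketch of a Bayesian sentiment filter in plain Python. It is not the Lucene.Net implementation: the class name `NaiveBayesSentiment`, the whitespace tokenizer, the add-one smoothing, and the four toy training comments are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

class NaiveBayesSentiment:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        # One word-count table per class acts as that class's lexicon.
        self.word_counts = {"pos": Counter(), "neg": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        # Assumes both classes have seen at least one training comment.
        vocab = set(self.word_counts["pos"]) | set(self.word_counts["neg"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

clf = NaiveBayesSentiment()
clf.train("great movie loved it", "pos")
clf.train("wonderful acting great story", "pos")
clf.train("terrible movie hated it", "neg")
clf.train("boring story awful acting", "neg")
print(clf.classify("great story"))
```

In a real system the word counts would come from the positive and negative lexicons rather than from an in-memory table.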

See the previously published article:

Lucene.Net Spam Filter

Building the lexicon:

For the code that builds the lexicon, see Lucene.Net indexing.

  1. A sufficient number of comments
  2. Balanced classes. Reviews tend toward the extremes; for example, 5-star reviews usually far outnumber 1-star reviews, so the number of reviews in the pos and neg classes must be capped to keep the two balanced
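The balancing step can be sketched as a simple downsampling of both classes to the size of the smaller one. The function name `balance_classes`, the fixed seed, and the generated sample reviews are hypothetical; other strategies, such as weighting classes, are equally possible.

```python
import random

def balance_classes(pos_reviews, neg_reviews, seed=0):
    # Cap both classes at the size of the smaller one,
    # sampling without replacement so no review is duplicated.
    rng = random.Random(seed)
    limit = min(len(pos_reviews), len(neg_reviews))
    return rng.sample(pos_reviews, limit), rng.sample(neg_reviews, limit)

# Made-up data: many more 5-star reviews than 1-star reviews.
pos = [f"five-star review {i}" for i in range(100)]
neg = [f"one-star review {i}" for i in range(20)]

pos_balanced, neg_balanced = balance_classes(pos, neg)
print(len(pos_balanced), len(neg_balanced))  # 20 20
```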

Select the appropriate classification algorithm: 

  1. Bayesian filtering
  2. Maximum Entropy

In Bad Data Handbook, the author considers Maximum Entropy to perform considerably better than Bayesian filtering
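For comparison, here is a rough sketch of a two-class Maximum Entropy classifier, which for binary labels is equivalent to logistic regression. It is trained with plain stochastic gradient ascent. The function names, the word-presence features, the learning rate, the epoch count, and the toy data are all assumptions for illustration, and this is not the trainer used in the book.

```python
import math

def features(text):
    # Hypothetical feature extractor: the set of words present.
    return set(text.lower().split())

def train_maxent(examples, epochs=200, learning_rate=0.5):
    weights = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = features(text)
            z = bias + sum(weights.get(f, 0.0) for f in feats)
            p_pos = 1.0 / (1.0 + math.exp(-z))
            # Gradient of the log-likelihood: observed minus predicted.
            error = (1.0 if label == "pos" else 0.0) - p_pos
            bias += learning_rate * error
            for f in feats:
                weights[f] = weights.get(f, 0.0) + learning_rate * error
    return weights, bias

def prob_pos(text, weights, bias):
    # Probability that the text belongs to the "pos" class.
    z = bias + sum(weights.get(f, 0.0) for f in features(text))
    return 1.0 / (1.0 + math.exp(-z))

examples = [
    ("great movie loved it", "pos"),
    ("wonderful acting great story", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring story awful acting", "neg"),
]
weights, bias = train_maxent(examples)
print(prob_pos("great wonderful", weights, bias) > 0.5)
```

Unlike the Bayesian filter, this model does not assume that words are independent: words shared by both classes, such as "movie", end up with weights near zero.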

Training a classifier: 

See nltk-trainer.

Python offers a variety of natural language analysis libraries, such as NLTK. On the .NET platform,

SharpNLP is one option to start with; the Stanford Parser is another

References:

Using text mining to analyze public opinion

Opinion Mining explanation

Bad Data Handbook

Maximum Entropy Learning

Original: Big Box  Lucene.Net used in sentiment analysis (Opinion Mining)



Origin www.cnblogs.com/chinatrump/p/11516370.html