Introduction to Machine Learning (6): Classification Algorithm - Naive Bayes


4. Naive Bayes Algorithm
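
For reference, the standard form of Bayes' theorem for a class C and features F_1, ..., F_n, together with the "naive" conditional independence assumption, can be written as below. The symbols C and F_i are notation chosen here for illustration.

```latex
P(C \mid F_1, \dots, F_n) = \frac{P(F_1, \dots, F_n \mid C)\, P(C)}{P(F_1, \dots, F_n)}

% Naive assumption: features are conditionally independent given the class
P(F_1, \dots, F_n \mid C) = \prod_{i=1}^{n} P(F_i \mid C)
```

The denominator is the same for every class, so in practice the predicted class is the one that maximizes the numerator.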


Application of Naive Bayes (including the Laplace smoothing coefficient): a text classification case
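
As a minimal sketch of Laplace (additive) smoothing: the usual estimate is P(F | C) = (N_i + alpha) / (N + alpha * m), where N_i is the count of word F in documents of class C, N is the total word count in class C, m is the number of distinct feature words, and alpha is the smoothing coefficient (commonly 1). The function name and the counts below are hypothetical, chosen only to show why a word that never appears in a class no longer gets probability zero.

```python
def smoothed_likelihood(word_count, total_count, vocab_size, alpha=1.0):
    """Estimate P(word | class) with additive (Laplace) smoothing."""
    return (word_count + alpha) / (total_count + alpha * vocab_size)


# Hypothetical counts, invented for illustration:
# class C contains 8 words in total, and the vocabulary has 4 distinct words.
p_unsmoothed = smoothed_likelihood(word_count=0, total_count=8, vocab_size=4, alpha=0.0)
p_unseen = smoothed_likelihood(word_count=0, total_count=8, vocab_size=4)
p_seen = smoothed_likelihood(word_count=3, total_count=8, vocab_size=4)

print(p_unsmoothed)  # 0.0: without smoothing, one unseen word zeroes the whole product
print(p_unseen)      # (0 + 1) / (8 + 4) = 1/12
print(p_seen)        # (3 + 1) / (8 + 4) = 1/3
```

Without smoothing, a single unseen word would force the product of likelihoods for that class to zero, regardless of the other words in the document.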

Case: classifying news text into 20 categories

**Process:** Obtain the data (no preprocessing needed; the English dataset provided by sklearn is already processed)
          Split the dataset
          Feature engineering (the samples are articles, so apply text feature extraction)
          Naive Bayes estimator workflow
          Model evaluation

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def nb_news():
    # Get the dataset
    news = fetch_20newsgroups(subset='all')
    # Split the dataset into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(news.data, news.target)
    # Feature engineering: TF-IDF text feature extraction
    transfer = TfidfVectorizer()
    x_train = transfer.fit_transform(x_train)
    # Call transform() only (not fit()), so the test set is converted
    # with what was fitted on the training set
    x_test = transfer.transform(x_test)
    # Naive Bayes estimator workflow
    estimator = MultinomialNB()
    estimator.fit(x_train, y_train)
    # Model evaluation
    # Method 1: compare the true values with the predicted values directly
    y_predict = estimator.predict(x_test)
    print('y_predict:\n', y_predict)
    print('Direct comparison of true and predicted values:\n', y_test == y_predict)
    # Method 2: compute the accuracy
    score = estimator.score(x_test, y_test)
    print('Accuracy:\n', score)


if __name__ == '__main__':
    nb_news()
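
The same workflow can be tried on a tiny corpus without downloading the 20 Newsgroups data. The sketch below is only an illustration: the documents and labels are invented, and `alpha=1.0` (the sklearn default for `MultinomialNB`) is passed explicitly to show where the Laplace smoothing coefficient is set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus, invented for illustration (not from 20 Newsgroups).
train_docs = [
    "the team won the football match",
    "a great goal in the final match",
    "the new graphics card renders faster",
    "install the driver for the graphics card",
]
train_labels = ["sport", "sport", "tech", "tech"]

# Fit the TF-IDF vocabulary and weights on the training documents only.
transfer = TfidfVectorizer()
x_train = transfer.fit_transform(train_docs)

# alpha is the additive smoothing coefficient; alpha=1.0 is Laplace smoothing.
estimator = MultinomialNB(alpha=1.0)
estimator.fit(x_train, train_labels)

# Reuse the fitted vectorizer for new text; words unseen in training are ignored.
x_new = transfer.transform(["the match ended with a late goal"])
prediction = estimator.predict(x_new)[0]
print(prediction)
```

Here the words "match" and "goal" occur only in the sport documents, so the smoothed likelihoods favor that class.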


Summary: Naive Bayes

     Advantages: relatively insensitive to missing data; the algorithm is simple; it is commonly used for text classification, where its accuracy is high; and it is fast.
     Disadvantages: it assumes that sample attributes are mutually independent, so performance degrades when the features are correlated.



Origin blog.csdn.net/qq_45234219/article/details/114931618