"Machine Learning Core Algorithm" Classification Algorithm - Naive Bayes MultinomialNB

"Author's homepage": Shibie Sanri wyx
"Author's profile": CSDN top100, Alibaba Cloud blog expert, Huawei Cloud Sharing expert, high-quality creator in the field of network security
"Recommended column": "Python Beginner to Master" for beginners with zero basics


The Naive Bayes model (NBM for short) is a classification method based on "Bayes' theorem" and the "feature conditional independence assumption".

"Bayes' theorem": also called Bayes' formula, it describes the relationship between two "conditional probabilities". Intuitively, if you repeatedly see a person doing good deeds, you can update your belief that this person is probably a good person.

"Feature conditional independence assumption": to avoid the exponential growth in the number of "parameters", Naive Bayes assumes, on top of Bayes' theorem, that the features are conditionally "independent" of each other given the class.


1. Naive Bayes API

The "multinomial" Naive Bayes classifier is used for classification with "discrete" features, such as word counts for text classification; it nominally requires integer feature counts.

sklearn.naive_bayes.MultinomialNB()

parameter

  • alpha : (optional, float) additive smoothing parameter, default 1.0
  • force_alpha : (optional, bool) default False; if False and alpha is less than 1e-10, alpha is set to 1e-10; if True, alpha is left unchanged. This prevents alpha values too close to 0 from causing numerical errors.
  • fit_prior : (optional, bool) whether to learn class prior probabilities from the data; default True. If False, a uniform prior is used.
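As a minimal sketch of these parameters (passing only alpha and fit_prior, which exist in all recent sklearn versions; force_alpha was only added in newer releases):

```python
from sklearn.naive_bayes import MultinomialNB

# Instantiate with the documented defaults made explicit
clf = MultinomialNB(alpha=1.0, fit_prior=True)
params = clf.get_params()
print(params['alpha'], params['fit_prior'])  # 1.0 True
```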

function

  • MultinomialNB.fit(x_train, y_train): trains on training-set features and training-set targets
  • MultinomialNB.predict(x_test): receives test-set features and returns the predicted class labels
  • MultinomialNB.score(x_test, y_test): receives test-set features and test-set targets, and returns the accuracy
  • MultinomialNB.get_params(): gets the estimator's parameters (such as alpha and fit_prior)
  • MultinomialNB.set_params(): sets parameters
  • MultinomialNB.partial_fit(): incremental training, used when the data is too large to load into memory at once
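A small sketch of partial_fit for incremental training (the tiny count matrices are invented for illustration). Note that the first call must declare every class that will ever appear, via the classes argument, since later batches may not contain all of them:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()
# Two mini-batches of (invented) integer count features
X1, y1 = np.array([[2, 1], [0, 3]]), [0, 1]
X2, y2 = np.array([[3, 0], [1, 4]]), [0, 1]
clf.partial_fit(X1, y1, classes=[0, 1])  # first call: declare all classes
clf.partial_fit(X2, y2)                  # later batches: classes omitted
print(clf.predict([[4, 0]]))             # count vector dominated by feature 0
```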

2. Practical application of Naive Bayes algorithm

2.1. Obtain data set

Here we use the "Iris" dataset that ships with sklearn.

from sklearn import datasets

# 1. Load the dataset
iris = datasets.load_iris()
print(iris.data)

Output:

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 ......
 [5.9 3.  5.1 1.8]]

2.2. Divide the data set

Next, we "split" the dataset: pass in the feature values and target values, and it is divided according to the default proportion (25% test set, 75% training set).

from sklearn import datasets
from sklearn import model_selection

# 1. Load the dataset
iris = datasets.load_iris()
# 2. Split the dataset
x_train, x_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target)
print('Training set features:', len(x_train))
print('Test set features:', len(x_test))
print('Training set targets:', len(y_train))
print('Test set targets:', len(y_test))

Output:

Training set features: 112
Test set features: 38
Training set targets: 112
Test set targets: 38

From the results we can see that the training set contains 112 samples and the test set 38, which matches expectations.
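For reproducible experiments, the split ratio and random seed can also be set explicitly (test_size and random_state are standard train_test_split parameters; the seed value 22 here is an arbitrary choice):

```python
from sklearn import datasets, model_selection

iris = datasets.load_iris()
# test_size=0.25 mirrors the default 25% / 75% split; random_state fixes the shuffle
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=22)
print(len(x_train), len(x_test))  # 112 38
```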


2.3. Feature normalization

Next, we "normalize" the feature values. Note that the training set and test set must be transformed in exactly the same way: fit the scaler on the training set, then only apply (transform) it to the test set.

from sklearn import datasets
from sklearn import model_selection
from sklearn import preprocessing

# 1. Load the dataset
iris = datasets.load_iris()
# 2. Split the dataset
x_train, x_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target)
# 3. Feature normalization: fit on the training set, apply the same transform to the test set
mm = preprocessing.MinMaxScaler()
x_train = mm.fit_transform(x_train)
x_test = mm.transform(x_test)
print(x_train)
print(x_test)

Output:

[[0.8        0.5        0.87719298 0.70833333]
 [0.42857143 0.5        0.66666667 0.70833333]
 ......

From the results we can see that the feature values have been rescaled accordingly.
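As a sanity check (the toy matrix is invented for the example), MinMaxScaler applies (x - min) / (max - min) column by column, which is why every scaled value falls in [0, 1]:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_scaled = MinMaxScaler().fit_transform(X)
# Recompute the transform by hand: (x - column min) / (column max - column min)
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(np.allclose(X_scaled, manual))  # True
```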


2.4. Bayesian algorithm processing and evaluation

Next, instantiate the Bayes estimator and pass in the training-set feature values and target values for training.

from sklearn import datasets
from sklearn import model_selection
from sklearn import preprocessing
from sklearn import naive_bayes

# 1. Load the dataset
iris = datasets.load_iris()
# 2. Split the dataset
x_train, x_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target)
# 3. Feature normalization: fit on the training set, apply the same transform to the test set
mm = preprocessing.MinMaxScaler()
x_train = mm.fit_transform(x_train)
x_test = mm.transform(x_test)
# 4. Naive Bayes processing
estimator = naive_bayes.MultinomialNB()
estimator.fit(x_train, y_train)

# 5. Model evaluation
y_predict = estimator.predict(x_test)
print('Comparison of predicted and true values:', y_predict == y_test)
score = estimator.score(x_test, y_test)
print('Accuracy:', score)

Output:

Comparison of predicted and true values: [ True False  True False  True False  True  True  True  True False  True
 False False False False False  True False  True False  True  True  True
  True  True  True  True  True False False False  True  True  True  True
  True False]
Accuracy: 0.6052631578947368
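The roughly 0.60 accuracy is not surprising: as section 1 notes, MultinomialNB is designed for discrete count features, while the iris features are continuous measurements. As a hedged aside (GaussianNB is sklearn's Naive Bayes variant for continuous features; the seed 22 is arbitrary), switching the estimator typically scores far higher on this dataset:

```python
from sklearn import datasets, model_selection
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, random_state=22)
estimator = GaussianNB()  # models each feature as a per-class Gaussian
estimator.fit(x_train, y_train)
print(estimator.score(x_test, y_test))
```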

3. Frequently Asked Questions

The training data passed to MultinomialNB cannot contain "negative" values, otherwise an error is raised: Negative values in data passed to MultinomialNB.

For example, feature "standardization" (zero-mean scaling, e.g. StandardScaler) produces negative values and will trigger this error; use min-max "normalization" instead.
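A minimal demonstration of the two scalers (toy data invented for the example): StandardScaler centers each column at zero, so negative values appear and MultinomialNB refuses the data, while MinMaxScaler keeps everything in [0, 1]:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = [0, 0, 1, 1]

X_std = StandardScaler().fit_transform(X)  # centered at 0 -> contains negatives
try:
    MultinomialNB().fit(X_std, y)
except ValueError as e:
    print('Error:', e)  # Negative values in data passed to MultinomialNB ...

X_mm = MinMaxScaler().fit_transform(X)     # rescaled into [0, 1] -> no negatives
MultinomialNB().fit(X_mm, y)
print('MinMaxScaler-transformed data fits without error')
```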



Origin blog.csdn.net/wangyuxiang946/article/details/132865473