Using basic machine learning algorithms to classify mobile phone data: Naive Bayes

1 Algorithm ideas

The theory behind Naive Bayes is not covered in detail here; the emphasis is on how to train a reliable Naive Bayes classifier from existing data.

First, let's take a look at the CSV data table that was preprocessed last time. Each row of the table holds the data for one mobile phone model, each column is an attribute of the phone, and the last column is the label, which takes only two values: high and low.

According to Bayes' formula:

$$P(C_i \mid \textbf{x}) = \frac{p(\textbf{x} \mid C_i) \, P(C_i)}{p(\textbf{x})}$$
Our goal is for the classifier to find the most likely category $C_i$ given the input data $\textbf{x} = (x_1, x_2, ..., x_n)$. Because the denominator $p(\textbf{x})$ is the same for every category, we only need to compare the values of the numerators.
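
To illustrate why the denominator can be ignored, here is a minimal sketch. The numerator values are made up for illustration; only the labels high and low come from the data described above.

```python
# Hypothetical numerators p(x | C_i) * P(C_i) for one phone; the numbers are made up.
numerators = {"high": 0.012, "low": 0.003}

# p(x) is the sum of the numerators over all categories (law of total probability).
evidence = sum(numerators.values())
posteriors = {label: value / evidence for label, value in numerators.items()}

# Dividing every numerator by the same positive p(x) does not change
# which category has the largest value.
best_by_numerator = max(numerators, key=numerators.get)
best_by_posterior = max(posteriors, key=posteriors.get)
print(best_by_numerator, best_by_posterior)
```

Both selections pick the same category, so the classifier can skip computing $p(\textbf{x})$ altogether.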

In the numerator:

  • $P(C_i)$ is the probability of the category: divide the number of rows in that category by the total number of rows.
  • $p(\textbf{x} \mid C_i)$ is the probability that $\textbf{x} = (x_1, x_2, ..., x_n)$ appears within category $C_i$. Because this is the naive Bayes algorithm, all attributes are assumed to be independent of one another within a category, so: $p(\textbf{x} \mid C_i) = p(x_1 \mid C_i) \cdot ... \cdot p(x_n \mid C_i)$
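
The two terms of the numerator can be sketched as follows. This is only an illustration: the function names `class_priors` and `naive_likelihood`, the label list, and the per-attribute probabilities are all hypothetical.

```python
from collections import Counter
from math import prod

# Made-up labels for six phones; only the "high"/"low" label names come from the text.
labels = ["high", "high", "low", "low", "low", "high"]

def class_priors(labels):
    """P(C_i) = (number of rows with label C_i) / (total number of rows)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def naive_likelihood(per_attribute_probs):
    """p(x | C_i) as the product of p(x_1 | C_i), ..., p(x_n | C_i)."""
    return prod(per_attribute_probs)

priors = class_priors(labels)

# Hypothetical values of p(x_1 | high), p(x_2 | high), p(x_3 | high).
likelihood_high = naive_likelihood([0.5, 0.4, 0.25])

# The numerator of Bayes' formula for the category "high".
numerator_high = likelihood_high * priors["high"]
print(priors, numerator_high)
```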

In the data table above, the data is continuous rather than discrete. How do we obtain the value of $p(x \mid C_i)$?

Here, the range of each attribute can be divided into k intervals, which converts the continuous data into discrete data. Each $x_i$ then takes one of k possible values, and the probability of each value in the training set can be calculated one by one.
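
A minimal sketch of this discretization, assuming equal-width intervals between the smallest and largest observed value (the text does not say how the intervals are chosen). The helper names `to_bin` and `conditional_table` and the battery capacities are made up for illustration.

```python
from collections import Counter, defaultdict

def to_bin(value, low, high, k):
    """Map a continuous value to an interval index 0..k-1 (equal-width intervals)."""
    if high == low:
        return 0
    index = int((value - low) / (high - low) * k)
    # Clamp so that value == high falls into the last interval.
    return min(max(index, 0), k - 1)

def conditional_table(values, labels, k):
    """Estimate p(x = interval | C_i) by counting within each category."""
    low, high = min(values), max(values)
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[label][to_bin(value, low, high, k)] += 1
    return {
        label: {b: bin_counts[b] / sum(bin_counts.values()) for b in range(k)}
        for label, bin_counts in counts.items()
    }

# Made-up battery capacities (mAh) and labels, for illustration only.
capacity = [3000, 3200, 4000, 4500, 5000, 4800]
labels = ["low", "low", "low", "high", "high", "high"]
table = conditional_table(capacity, labels, k=2)
print(table)
```

Note that an interval with no training rows in a category gets probability 0 under plain counting, which forces the whole product for that category to 0.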

At this point, it is worth distinguishing the concepts used in Naive Bayes:

  • Category: the class (label) that a row of data belongs to; this is the target we want to predict
  • Attribute: each row of data contains several attributes, and each attribute is assumed to affect the category independently
  • Attribute value: the value that an attribute takes

2 Code implementation
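
As one possible illustration of the ideas in section 1, here is a minimal end-to-end sketch. It is not the original code of this article: the function names `discretize`, `fit` and `predict`, the two attributes (battery capacity and RAM), and all data values are hypothetical. It also assumes equal-width intervals and adds add-one (Laplace) smoothing, which the text above does not mention, so that an empty interval does not force a probability of zero.

```python
from collections import Counter

def discretize(value, value_range, k):
    """Map a continuous value to an interval index 0..k-1 (equal-width intervals)."""
    low, high = value_range
    if high == low:
        return 0
    return min(max(int((value - low) / (high - low) * k), 0), k - 1)

def fit(rows, labels, k):
    """Learn attribute ranges, category priors and per-attribute interval counts."""
    n_attrs = len(rows[0])
    ranges = [(min(r[j] for r in rows), max(r[j] for r in rows)) for j in range(n_attrs)]
    class_sizes = Counter(labels)
    priors = {c: n / len(labels) for c, n in class_sizes.items()}
    counts = {c: [Counter() for _ in range(n_attrs)] for c in class_sizes}
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            counts[label][j][discretize(value, ranges[j], k)] += 1
    return {"k": k, "ranges": ranges, "priors": priors,
            "counts": counts, "class_sizes": class_sizes}

def predict(model, row):
    """Return the category with the largest numerator, plus all numerators."""
    k = model["k"]
    scores = {}
    for c, prior in model["priors"].items():
        score = prior
        for j, value in enumerate(row):
            b = discretize(value, model["ranges"][j], k)
            # Add-one smoothing (an assumption of this sketch, not from the text).
            score *= (model["counts"][c][j][b] + 1) / (model["class_sizes"][c] + k)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Made-up training data: [battery capacity in mAh, RAM in GB] per phone.
rows = [[3000, 2], [3200, 3], [4000, 4], [4500, 8], [5000, 12], [4800, 8]]
labels = ["low", "low", "low", "high", "high", "high"]

model = fit(rows, labels, k=2)
label_a, scores_a = predict(model, [4900, 8])
label_b, scores_b = predict(model, [3100, 2])
print(label_a, label_b)
```

For a real table with many attributes, multiplying many small probabilities can underflow; summing logarithms of the factors instead of multiplying them is a common way around that.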

Origin blog.csdn.net/weixin_44602409/article/details/109398214