I. Overview
1. Artificial Intelligence > Machine Learning > Deep Learning | Reinforcement Learning
2. Machine Learning (ML): a discipline that mines regularities from data using optimization-based methods (linear regression, logistic regression, decision trees, support vector machines, Bayesian models, etc.).
3. Machine learning: input ==> function ==> output. Given known input and output data, machine learning fits a function that maps the inputs to the outputs as closely as possible.
4. The essence of machine learning is statistical model training. Its main job is to train the model, which can also be called fitting the model; that is, fitting the data is the main work of machine learning, and one word sums it up: "guess".
5. After each calculation, a deviation is fed back, the algorithm model is adjusted according to that deviation, and a new value is output. The cycle repeats until the output is correct.
6. Hypothesis function: fed with data as "fuel", it generates output and keeps the learning process running.
7. Loss function: provides the driving force for learning. The deviation is obtained by comparing the hypothesis function's prediction with the actual value.
8. The basic mode of machine learning.
9. The optimization method adjusts the parameters of the hypothesis function based on the deviation so that it approximates and fits the data.
10. Supervised learning: can be understood as learning with reference answers; specifically, the data set contains the expected results.
12. Commonly used machine learning algorithms
1. Linear regression algorithm: the simplest machine learning algorithm, which uses a linear method to solve regression problems
2. Logistic regression classification algorithm: the "twin brother" of linear regression. Its core idea is still the linear method, but it can solve classification problems
3. KNN classification algorithm: does not rely on a mathematical or statistical model, only on "life experience". It solves classification problems by "finding the nearest neighbors"
4. Naive Bayes classification algorithm: the result is probabilistic rather than deterministic; it solves classification problems
5. Decision tree classification algorithm: classifies with logic similar to if-else
6. Support vector machine classification algorithm: maps linearly inseparable data points so that they become linearly separable, and then applies the simplest linear method
7. K-means clustering algorithm
8. Neural network classification algorithm
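The guess, feedback, adjust cycle described in this overview can be sketched in a few lines. This is only an illustration under choices the notes do not specify: a one-variable linear hypothesis, mean squared error as the loss, and plain gradient descent as the optimization method. The names `hypothesis`, `loss` and `train` are hypothetical.

```python
# Toy data produced by the "unknown" function y = 2x + 1.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.0 * x + 1.0 for x in xs]


def hypothesis(w, b, x):
    """Hypothesis function: the current guess for the output."""
    return w * x + b


def loss(w, b):
    """Loss function: mean squared deviation between guess and actual value."""
    return sum((hypothesis(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(steps=500, lr=0.1):
    """Optimization method: move w and b against the gradient of the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (hypothesis(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (hypothesis(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train()
print(round(w, 3), round(b, 3), round(loss(w, b), 6))
```

Each pass through the loop is one round of "guess": predict, measure the deviation, adjust the parameters, and try again.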
II. Environment
1. The three core libraries for machine learning
Support library NumPy: a library designed for scientific computing
Algorithm library Scikit-Learn: a machine learning algorithm library
Data processing library Pandas: many practical built-in functions such as sorting and statistics
2. NumPy
command line installation
pip install -U numpy
If pip downloads are too slow, append -i https://pypi.douban.com/simple to the end of the command
import
import numpy as np
use
3. Scikit-Learn
command line installation
pip install -U scikit-learn -i https://pypi.douban.com/simple
import
import sklearn
use
4. Pandas
command line installation
pip install -U pandas -i https://pypi.douban.com/simple
import
import pandas as pd
use
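After installation, a quick check can confirm that all three libraries are visible to the current Python interpreter. The sketch below uses only the standard library; `check_libraries` is a hypothetical helper name, and it reports whether each package can be found without importing it.

```python
import importlib.util

# Import names of the three libraries (scikit-learn is imported as "sklearn").
REQUIRED = ("numpy", "sklearn", "pandas")


def check_libraries(names=REQUIRED):
    """Return {import name: True/False} without actually importing anything."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_libraries()
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing, install it with pip'}")
```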
III. Linear Regression Algorithm
1. Uses a linear model to solve regression problems
2. Regression problems: fit historical continuous data to predict future continuous data
3. Learning from mistakes: deviation measurement + weight adjustment
4. Mathematical expression of the hypothesis function
5. Mathematical expression of the loss function
6. Mathematical expression of the optimization method
7. Linear regression algorithm information table
8. Three steps to solve a linear regression problem
9. Using the linear regression algorithm in Python
import matplotlib.pyplot as plt  # 2-D plotting
import numpy as np  # scientific computing library
from sklearn import linear_model  # machine learning algorithm library
# Generate the data set
x = np.linspace(-3, 3, 30)
y = 2 * x + 1
# Add random perturbation
x = x + np.random.rand(30)
y = y + np.random.rand(30)
# Reshape the data set: 1-D sequence ==> column matrix
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
# Train the linear regression model
model = linear_model.LinearRegression()
model.fit(x, y)
# Test input
x_ = [[1], [2]]
# Predicted output
y_ = model.predict(x_)
print(y_)
# Weight w and intercept b
w = model.coef_
b = model.intercept_
print(w, b)
# Plot the data set
y2 = w[0][0] * x + b[0]  # fitted line
plt.scatter(x, y)
plt.plot(x, y2)
plt.show()
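Items 4 to 6 of this section name three mathematical expressions without writing them out. One common formulation is given below as a sketch; it assumes a single input feature, mean squared error as the loss, and least squares as the optimization goal, none of which the notes state explicitly.

```latex
\begin{aligned}
\text{Hypothesis function:}\quad & \hat{y} = w x + b \\
\text{Loss function:}\quad & L(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left( w x_i + b - y_i \right)^2 \\
\text{Optimization:}\quad & (w^{*}, b^{*}) = \arg\min_{w,\, b} L(w, b)
\end{aligned}
```

Under these assumptions, `model.coef_` and `model.intercept_` in the code above play the roles of $w^{*}$ and $b^{*}$.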
IV. Logistic Regression Classification Algorithm
1. Classification problem: compared with a regression problem, its predicted value is discrete rather than continuous. Binary classification is the basis of multi-class classification
2. The Logistic function: its left end approaches 0 and its right end approaches 1
Through the Logistic function, continuous values can be mapped, through a transition, to discrete values, so it is a bridge connecting the continuous and the discrete
The mathematical expression of the Logistic function is as follows: $\text{Logistic}(z) = \frac{1}{1 + e^{-z}}$
Core ideas of using Logistic regression to solve classification problems:
First, use a linear equation to draw a straight line.
Second, "bend" the straight line with the Logistic function so that it fits the discretely distributed data points of the classification problem. This is equivalent to first mapping the classification problem to a regression problem through the Logistic function, and then using a linear model, which can solve regression problems, to solve the classification problem.
3. The idea of using the Logistic function to map continuous values to discrete values
4. The form of classification categories in machine learning
5. The hypothesis function of Logistic regression
6. The loss function of Logistic regression
7. Logistic regression classification algorithm information table
8. Logistic regression classification algorithm steps
9. Using the Logistic regression algorithm in Python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris  # iris classification data set
X, y = load_iris(return_X_y=True)  # load the iris data set
clf = LogisticRegression(max_iter=1000).fit(X, y)  # train the model
y_ = clf.predict(X)  # predict classes with the model
print(y_)  # classification results
print(clf.score(X, y))  # performance evaluation (on the training data)
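The two core ideas of this section, drawing a line and then "bending" it with the Logistic function, can be illustrated without any library. The sketch below assumes a single input feature and a decision threshold of 0.5; the weights passed to `classify` are made up for illustration and are not learned.

```python
import math


def logistic(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(x, w, b, threshold=0.5):
    """Draw the line z = w*x + b, bend it with the Logistic function, then threshold."""
    probability = logistic(w * x + b)
    return 1 if probability >= threshold else 0


# Far to the left the output approaches 0, far to the right it approaches 1.
for z in (-6, -1, 0, 1, 6):
    print(z, round(logistic(z), 4))

# Hypothetical weights for illustration only: the decision boundary sits at x = 2.
print([classify(x, w=1.5, b=-3.0) for x in (0, 1, 2, 3, 4)])
```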
V. KNN Classification Algorithm (K-Nearest Neighbors)
1. Algorithm principles
Like attracts like: the question of which pile a new sample should be assigned to becomes the question of which pile of samples has the most in common with it. The new sample is assigned to the pile it most resembles, that is, to that category.
Majority voting: based on the sample's value in each dimension, look at which kinds of points are adjacent to it. By majority vote, the new sample belongs to whichever class is in the majority among them
Proximity voting: taking the point to be classified as the center of a circle, find the points close to it; these form its "circle of friends". Only the points inside the circle vote on which class the point belongs to, rather than the entire sample set
2. Take the sample point to be classified as the center and find the K nearest points. The sample point is assigned to the category that accounts for the largest share of those K points.
3. How to determine the number of nearest neighbors K?
K is a parameter that must be tuned to the actual situation to obtain a better fit. It can be set using experimental methods such as cross-validation, combined with work experience. In general, K is between 3 and 10
4. How to determine the nearest neighbor?
The key is which method is used to measure "nearest". This is the primary problem that KNN and its derivative algorithms must solve; it is both a difficulty and an opportunity for innovation. The Minkowski distance can be used as the measure
5. KNN algorithm classification process
6. Minkowski distance: $d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{P} \right)^{1/P}$
When P=1 (Manhattan distance): $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
When P=2 (Euclidean distance): $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
7. KNN classification algorithm information table
8. KNN classification algorithm implementation steps
9. Using the KNN classification algorithm in Python
from sklearn.datasets import load_iris
# Import the KNN classification algorithm from Scikit-Learn's nearest-neighbor models
from sklearn.neighbors import KNeighborsClassifier
# Load the iris data set
X, y = load_iris(return_X_y=True)
# Train the model
clf = KNeighborsClassifier().fit(X, y)
# Predict classes with the model
y_ = clf.predict(X)
print(y_)  # predicted classes
print(clf.score(X, y))  # algorithm performance evaluation
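To make the "nearest K points, majority vote" procedure concrete, here is a from-scratch sketch in plain Python. It is not how Scikit-Learn implements KNN; the function names `minkowski` and `knn_predict` and the toy data are made up for illustration.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan distance, p=2 is Euclidean distance."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)


def knn_predict(samples, labels, query, k=3, p=2):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Rank training samples by distance to the query and keep the k closest.
    ranked = sorted(zip(samples, labels), key=lambda pair: minkowski(pair[0], query, p))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy data: two small groups of 2-D points.
samples = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(minkowski((0, 0), (3, 4), p=1))
print(minkowski((0, 0), (3, 4), p=2))
print(knn_predict(samples, labels, (1.1, 1.0)))
print(knn_predict(samples, labels, (4.9, 5.0)))
```

Only the K nearest points vote, which is the "circle of friends" idea from item 1.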
VI. Naive Bayes Classification Algorithm
1. The core of the Naive Bayes classification algorithm is Bayes' formula, and the core of Bayes' formula is conditional probability
2. Naive Bayes: applying "Bayes' formula" under the "naive" assumption
3. Probability and conditional probability
4. The essence of conditional probability is to quantify the correlation between X and Y
5. The difference between logic and correlation: logic is causality, whereas correlation is based on statistical data
6. The core idea of prediction with Bayes' formula fits in one phrase: "it looks more like"
7. Bayes' formula uses known experience to make judgments. Using "experience" to make a "judgment" raises two questions: where does the experience come from, and how is a judgment made from it? That one sentence actually contains two rounds of process.
8. Prior probability, posterior probability and likelihood function.
That is, the posterior probability is obtained by correcting the prior probability with the likelihood function.
If the probability of A occurring is the prior probability, then after some event B that affects the probability of A has occurred, the probability of A occurring is called the posterior probability
9. The posterior probability of the category and the likelihood of the feature
The posterior probability of the category and the likelihood of a certain feature are written $P(y \mid x)$ and $P(x_i \mid y)$ respectively
10. Mathematical analysis of the Naive Bayes classification algorithm
The "naive" assumption: features are independent of each other and do not affect each other. This assumption addresses the shortage and incompleteness of collected data: the more features x there are, the more prominent these two problems become, and the harder it is to count the probability of those features appearing together.
So the likelihood can be simplified as: $P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$
The posterior probability is proportional to the likelihood: $P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$
The Naive Bayes algorithm predicts with the posterior probability. Its core method is to predict the posterior probability through the likelihood, and the learning process is the process of continuously increasing the likelihood.
If an equality is used, the co-occurrence probability of the features still has to be counted: $P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$
The optimization method of Naive Bayes: $\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)$
11. Naive Bayes classification algorithm information table
12. Implementation steps of the Naive Bayes classification algorithm
13. Using the Naive Bayes classification algorithm in Python
from sklearn.datasets import load_iris
# Import the multinomial Naive Bayes classifier from Scikit-Learn's Naive Bayes models
from sklearn.naive_bayes import MultinomialNB
# Load the iris data set
X, y = load_iris(return_X_y=True)
# Train the model
clf = MultinomialNB().fit(X, y)
# Predict classes with the model
y_ = clf.predict(X)
print(y_)
print(clf.score(X, y))
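The "prior times per-feature likelihoods" idea behind the naive assumption can be worked through by hand on a tiny categorical data set. The sketch below is a simplified illustration, not the multinomial variant used above: it estimates probabilities by plain counting, applies no smoothing, and uses made-up data and hypothetical function names.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count what is needed for the prior P(y) and the likelihoods P(x_i | y)."""
    class_counts = Counter(labels)
    # feature_counts[(feature index, value, label)] = number of occurrences
    feature_counts = defaultdict(int)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts


def predict(row, class_counts, feature_counts):
    """Return the class with the largest prior * product of per-feature likelihoods."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(y)
        for i, value in enumerate(row):
            # "Naive" assumption: each feature contributes independently.
            score *= feature_counts[(i, value, label)] / count
        scores[label] = score
    return max(scores, key=scores.get), scores


# Made-up data: (weather, wind) -> whether an outdoor game was played.
rows = [("sunny", "weak"), ("sunny", "strong"), ("rainy", "weak"),
        ("rainy", "strong"), ("sunny", "weak"), ("rainy", "strong")]
labels = ["yes", "yes", "yes", "no", "yes", "no"]

class_counts, feature_counts = train_naive_bayes(rows, labels)
print(predict(("sunny", "weak"), class_counts, feature_counts))
print(predict(("rainy", "strong"), class_counts, feature_counts))
```

The scores are unnormalized posteriors: only their ranking matters, which is why the co-occurrence probability of the features never has to be counted. Without smoothing, a feature value never seen with a class drives that class's score to zero.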
VII. Decision Tree Classification Algorithm
1. From a programmer's point of view: if-else conditions matched layer by layer
2. How to choose the decision conditions that generate the branches is the core of the decision tree algorithm
3. The decision conditions of a decision tree are generated from the set of feature dimensions
4. What makes a good decision condition: ideally, once the condition is chosen, a single if-else divides the data set exactly into the positive class and the negative class. Failing that, the fewer impurities in the classification results the better; that is, the purer the classification results, the better.
5. Rules for measuring node purity
6. The pruning problem of decision trees
In reality, for reasons such as one-sided data collection or random perturbation, a data set may contain spurious correlations, and the decision tree algorithm will treat these actually invalid attribute dimensions as effective branching conditions. A decision tree model trained on such a data set over-learns: it picks up decision conditions that have no general significance, that is, it overfits, which reduces the classification effectiveness of the model.
Depending on when the pruning operation is triggered, pruning can be divided into two types: pre-pruning and post-pruning
Whether pre-pruning or post-pruning, pruning has two steps: the pruning judgment and the pruning operation. The actual pruning operation is performed only when pruning is judged necessary
7. Basic idea of the decision tree classification algorithm
Where do the decision conditions come from?
This problem is solved in two steps. The first step is the source. The data in the data set is organized by feature dimensions, which can be treated as a set, called the feature dimension set or attribute set. We want to discover the possible relationship between feature dimensions and categories, so the decision conditions come from this set.
Which feature dimension should be selected as the decision condition of the current if-else?
This requires comparison, and comparison requires a standard, so the concept of "purity" is introduced: the feature dimension with the best "purification" effect is selected as the decision condition.
When should the decision tree stop splitting nodes?
A core part of the decision tree classification algorithm is to select decision conditions from the feature set of the data one after another, that is, to complete the division into if-else branches.
How to measure the purity of the classification results under different feature conditions is the core issue of the decision tree classification algorithm.
8. Decision tree classification algorithm information table
9. Decision tree classification algorithm implementation steps
10. Using the decision tree classification algorithm in Python
from sklearn.datasets import load_iris
# Import the decision tree classifier from Scikit-Learn's decision tree models
from sklearn.tree import DecisionTreeClassifier
# Load the iris data set
X, y = load_iris(return_X_y=True)
# Train the model
clf = DecisionTreeClassifier().fit(X, y)
# Predict classes with the model
y_ = clf.predict(X)
print(y_)
print(clf.score(X, y))
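The notes say the feature with the best "purification" effect becomes the if-else condition but do not name a purity measure. The sketch below assumes Gini impurity, one commonly used measure, and shows how a single split could be chosen on made-up numeric data; `gini`, `split_impurity` and `best_split` are hypothetical names.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 means the node is perfectly pure."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, labels, feature, threshold):
    """Weighted impurity of the two branches of `if row[feature] <= threshold`."""
    left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
    right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
    if not left or not right:
        return float("inf")  # the condition does not split anything
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, labels):
    """Try every feature and every observed value as the if-else condition."""
    candidates = [
        (split_impurity(rows, labels, f, row[f]), f, row[f])
        for f in range(len(rows[0]))
        for row in rows
    ]
    return min(candidates)


# Made-up data: feature 0 separates the classes, feature 1 is noise.
rows = [(1.0, 7.0), (2.0, 3.0), (3.0, 8.0), (6.0, 2.0), (7.0, 9.0), (8.0, 4.0)]
labels = [0, 0, 0, 1, 1, 1]

# Prints (impurity, feature index, threshold) of the purest split.
print(best_split(rows, labels))
```

A full decision tree would apply `best_split` recursively to each branch until the nodes are pure or a stopping rule is met.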
VIII. Support Vector Machine Classification Algorithm
1. Margin: the distance between different classes. A linearly separable problem can be classified with a straight line inside the margin. A linearly inseparable problem requires high-dimensional mapping first.
2. Support vectors: the data points at the edge of the margin are called support vectors. They are very important for correct classification
3. High-dimensional mapping: data that is linearly inseparable in low dimensions can become separable after being mapped to a higher dimension
4. Kernel function: the function that performs the high-dimensional mapping in a support vector machine
5. Algorithm classification steps
6. Algorithm information table
7. Using the support vector machine classification algorithm in Python
from sklearn.datasets import load_iris
# Import the support vector machine algorithm from Scikit-Learn
from sklearn.svm import SVC
# Load the iris data set
X, y = load_iris(return_X_y=True)
# Train the model
clf = SVC().fit(X, y)  # the default kernel is the radial basis function (rbf); check it via clf.kernel
print(clf.predict(X))
print(clf.kernel)
print(clf.score(X, y))
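The high-dimensional mapping idea from item 3 can be seen on a tiny hand-made example. The sketch below is not an SVM: it does not find support vectors or maximize a margin. It only shows that points no single cut can separate in one dimension become separable by a straight line after a hypothetical mapping x -> (x, x^2).

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy any single threshold (either orientation) reaches in 1-D."""
    best = 0.0
    for t in values:
        for positive_side in (1, 0):
            predicted = [positive_side if v > t else 1 - positive_side for v in values]
            accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
            best = max(best, accuracy)
    return best


def lift(x):
    """Hypothetical high-dimensional mapping: 1-D point x -> 2-D point (x, x^2)."""
    return (x, x * x)


# Class 1 sits between the two halves of class 0, so no single cut separates them.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1, 1, 0, 0]

print(best_threshold_accuracy(xs, ys))

# After the mapping, the straight line x2 = 2.5 separates the classes.
lifted = [lift(x) for x in xs]
predicted = [1 if x2 < 2.5 else 0 for _, x2 in lifted]
print(predicted == ys)
```

In `SVC`, the kernel function plays the role of `lift`, without the mapped coordinates having to be computed explicitly.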
IX. K-means Clustering Algorithm
1. The most basic principle of clustering problems: find similarities
2. If samples have many similarities, they belong to the same class; if they have many differences, they do not.
3. Clusters: the clustering algorithm finally aggregates the sample data set into individual "classes". In the Chinese terminology of machine learning, these classes are called "clusters".
4. The clustering process can be regarded as the process of continuously finding the centroids of the clusters.
5. The number of clusters that clustering will eventually produce can be preset as K; that is, the data is grouped into K categories
6. Centroid: randomly select K points in the data set as centroids and cluster the data around them. The mean can then be used to adjust the centroids, so that the K randomly selected centroids eventually reach the desired goal.
7. Majority voting: the question the K-means algorithm votes on is "are we in the same cluster?". Every point is still waiting to be identified, and no sample point is available as a ready-made center point, so some points have to be selected as centroids
8. The K centroids that satisfy this "minimum" are the centroids we are looking for.
9. Algorithm information table and implementation steps
10. Using the K-means clustering algorithm in Python
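# Import the plotting library
import matplotlib.pyplot as plt
# Import the K-means clustering algorithm from Scikit-Learn's clustering models
from sklearn.cluster import KMeans
# Import the clustering data generator
from sklearn.datasets import make_blobs
# Generate clustering test data with sklearn's make_blobs; the data set has 1500 samples
n_samples = 1500
X, y = make_blobs(n_samples=n_samples)
# Run the clustering; n_clusters is set to 3, i.e. group the data into 3 clusters
y_pred = KMeans(n_clusters=3).fit_predict(X)
# Show the clustering result as a scatter plot
plt.scatter(X[:, 0], X[:, 1], c=y_pred)
plt.show()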
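The "pick K centroids, cluster around them, adjust them with the mean" loop can be sketched in plain Python. This is a minimal illustration, not Scikit-Learn's implementation: it assumes squared Euclidean distance, and the starting centroids are fixed to the first two points (instead of being chosen randomly) so that the run is reproducible.

```python
def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Alternate between assigning points and moving centroids until stable."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Made-up data: two visibly separate groups, K = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[1]])
print(centroids)
print(clusters)
```

Even though both starting centroids lie inside the same group, the mean update pulls one of them over to the other group within a few iterations.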
X. Artificial Neural Network (ANN)
1. The neural network algorithm has "three treasures": the neuron, the activation function and the backpropagation mechanism.
2. Neurons
3. Excitation transmission
4. Activation function
5. Backpropagation mechanism
6. The core working mechanism of a neuron is to decide, according to the stimulus, whether to activate. If it activates, it passes the stimulus forward; otherwise the stimulus stops here and does not affect the final output
7. Neural network structure
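The "activate or not" behaviour of a single neuron can be sketched as follows. This is a minimal illustration that assumes a simple step function as the activation function and uses hand-picked, hypothetical weights; it does not cover backpropagation, which in practice relies on differentiable activation functions.

```python
def step(z):
    """Activation function (a simple step): fire only if the stimulus is positive."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """Weigh the incoming stimuli, add a bias, and decide whether to activate."""
    stimulus = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(stimulus)


# Hypothetical weights that make the neuron behave like a logical AND gate.
weights, bias = (1.0, 1.0), -1.5
outputs = [neuron((a, b), weights, bias) for a in (0, 1) for b in (0, 1)]
print(outputs)
```

An output of 0 means the stimulus stops at this neuron; an output of 1 means it is passed on to the next layer.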