Preface: Depending on whether the target values in a machine-learning data set are discrete or continuous, the algorithms are divided into classification and regression.
sklearn tutorial https://www.jianshu.com/p/6ada34655862
Table of Contents
Classification algorithm
k-nearest neighbor algorithm
Algorithm idea: find the k samples in the data set that are most similar to a new sample; if most of those k samples belong to one category, the new sample is assigned to that category.
Euclidean distance, also known as the Euclidean metric, is the most common measure of the distance between two points and is defined in Euclidean space. For two points x1(x11, x12, …, x1n) and x2(x21, x22, …, x2n) in n-dimensional space, the Euclidean distance is d(x1, x2) = sqrt((x11 - x21)^2 + (x12 - x22)^2 + … + (x1n - x2n)^2).
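The distance formula above can be sketched directly in Python (the function name and sample points are illustrative, not from the original):

```python
import math

def euclidean_distance(p, q):
    # square root of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # -> 5.0
```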
Implementation:
https://www.cnblogs.com/xiaotan-code/p/6680438.html
from sklearn.neighbors import KNeighborsClassifier
# import the package
knn = KNeighborsClassifier()
# create a classifier object
knn.fit(X, y)
# train the model on feature matrix X and target vector y
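A minimal end-to-end sketch of the calls above; the Iris data, the split, and n_neighbors=5 are illustrative assumptions, not from the original:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)  # vote among the 5 nearest samples
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # mean accuracy on the held-out set
```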
Naive Bayes
Algorithm idea: https://blog.csdn.net/Growing_hacker/article/details/89790230
Implementation:
from sklearn.naive_bayes import MultinomialNB
# import the package
clf = MultinomialNB()
# instantiate the classifier
clf.fit(X, y)
# train the model on feature matrix X and target vector y
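MultinomialNB works on count features, so a toy word-count matrix makes a natural sketch (the data and the two topic labels are made up for illustration):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# toy word-count matrix: each row is a document, each column a word count
X = np.array([[2, 1, 0],
              [1, 2, 0],
              [0, 1, 3],
              [0, 0, 4]])
y = np.array([0, 0, 1, 1])  # 0 = topic A, 1 = topic B

clf = MultinomialNB()
clf.fit(X, y)
# a document heavy in the first word vs. one heavy in the third word
print(clf.predict([[3, 0, 0], [0, 1, 5]]))  # -> [0 1]
```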
Decision tree and random forest
Algorithm idea: use information entropy and information gain to find the splitting criteria
https://blog.csdn.net/Growing_hacker/article/details/89816012
Implementation:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
# import the packages
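A short sketch comparing the two classifiers; the Iris data, criterion="entropy" (to match the information-gain idea above), and the random seeds are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# single tree: splits on the feature with the largest information gain
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X_train, y_train)

# random forest: an ensemble of trees trained on bootstrapped samples
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print(tree.score(X_test, y_test), forest.score(X_test, y_test))
```

The forest usually generalizes at least as well as the single tree, at the cost of training many trees.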
Logistic regression
Algorithm idea: solves binary classification; the output of linear regression is used as the input of logistic regression, the weights are found with a maximum-likelihood loss function, and different thresholds produce different predictions
Implementation:
from sklearn.linear_model import LogisticRegression
# import the package
classifier = LogisticRegression(random_state=37)
# instantiate a classifier object
classifier.fit(X, y)
# train the regression classifier
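A runnable sketch of the calls above, including the thresholding idea; the breast-cancer data, max_iter, and the 0.3 threshold are illustrative assumptions, not from the original:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=37)

classifier = LogisticRegression(random_state=37, max_iter=5000)
classifier.fit(X_train, y_train)
print(classifier.score(X_test, y_test))  # accuracy at the default 0.5 threshold

# different thresholds produce different predictions:
# lowering the threshold labels more samples as positive
proba = classifier.predict_proba(X_test)[:, 1]
preds = (proba >= 0.3).astype(int)
```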
Regression algorithm
Linear regression
Algorithm idea: keep adjusting the weights to reduce the value of the loss function; solve for the weights with the normal equation or gradient descent
https://www.cnblogs.com/geo-will/p/10468253.html
Implementation:
from sklearn.linear_model import LinearRegression
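A minimal sketch completing the import above; the toy data following y = 2x + 1 is made up for illustration (LinearRegression solves the least-squares problem in closed form, the "normal equation" route mentioned above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# toy data that follows y = 2x + 1 exactly
X = np.array([[0], [1], [2], [3]])
y = np.array([1, 3, 5, 7])

reg = LinearRegression()
reg.fit(X, y)
print(reg.coef_, reg.intercept_)  # recovers the slope ~2 and intercept ~1
```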