Machine Learning Work Log

Key Points of 10 Machine Learning Algorithms, with Python Implementations

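Broadly speaking, there are three types of machine learning algorithms.

1. Supervised learning

How it works: the algorithm has a target (or outcome) variable, also called the dependent variable, which is predicted from a given set of predictors (independent variables). Using these variables, we generate a function that maps inputs to the desired outputs. Training continues until the model reaches the desired level of accuracy on the training data. Examples of supervised learning: regression, decision trees, random forests, k-nearest neighbors (KNN), and logistic regression.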

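2. Unsupervised learning

How it works: in this kind of algorithm there is no target or outcome variable to predict or estimate. It is used to cluster a population into groups, an approach widely used to segment customers into groups that receive different interventions. Examples of unsupervised learning: association rule algorithms and k-means.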

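3. Reinforcement learning

How it works: the algorithm trains a machine to make decisions. The machine is placed in an environment where it trains itself through repeated trial and error. It learns from past experience and tries to use the best knowledge it has gathered to make accurate business decisions. A standard formal model for reinforcement learning problems is the Markov decision process.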
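The trial-and-error loop described above can be sketched with tabular Q-learning on a tiny Markov decision process. Q-learning is one common reinforcement learning method chosen here for illustration; the source names only the Markov decision process. The five-state corridor, the reward of 1 at the goal, and every name and hyperparameter below are assumptions made for this sketch.

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Return (next_state, reward, done) for one move in the corridor."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table by epsilon-greedy trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _step in range(200):  # cap on episode length
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)  # explore, or break a tie at random
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit
            nxt, reward, done = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted best next value
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal states (1 means "step right")
policy = [0 if row[0] > row[1] else 1 for row in q_table[:-1]]
print('Greedy policy:', policy)
```

The agent starts with no knowledge, acts mostly greedily with occasional random exploration, and gradually learns that stepping right leads to the reward.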

Commonly used machine learning algorithms

1. Linear regression

Python code

# Import the library
# Import other libraries you need, such as pandas and numpy
from sklearn import linear_model

# Load the training and test datasets
# Identify the feature and response variable(s); values must be numeric NumPy arrays
# (the names on the right-hand side are placeholders for your own data)
x_train = input_variables_values_training_datasets
y_train = target_variables_values_training_datasets
x_test = input_variables_values_test_datasets

# Create the linear regression object
linear = linear_model.LinearRegression()

# Train the model on the training set and check the score (R^2)
linear.fit(x_train, y_train)
print('R^2 on training data:', linear.score(x_train, y_train))

# Equation coefficients and intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)

# Predict output
predicted = linear.predict(x_test)
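To make the template above concrete, here is a minimal end-to-end sketch. The synthetic data (a line y = 3x + 2 plus small Gaussian noise) and the test inputs are assumptions chosen for illustration; they are not part of the original post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, size=(100, 1))
y_train = 3.0 * x_train[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)
x_test = np.array([[0.0], [5.0], [10.0]])

# Fit the model and inspect what it learned
linear = LinearRegression()
linear.fit(x_train, y_train)

print('R^2 on training data:', linear.score(x_train, y_train))
print('Coefficient:', linear.coef_)      # should be close to 3
print('Intercept:', linear.intercept_)   # should be close to 2

# Predict for unseen inputs
predicted = linear.predict(x_test)
print('Predictions:', predicted)
```

Because the noise is small, the fitted coefficient and intercept should land close to the values used to generate the data.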

2. Logistic regression

Python code

# Import the library
from sklearn.linear_model import LogisticRegression

# Assumes you have X (predictors) and y (target) for the training set and x_test (predictors) for the test set
# Create the logistic regression object
model = LogisticRegression()

# Train the model on the training set and check the score (accuracy)
model.fit(X, y)
print('Training accuracy:', model.score(X, y))

# Equation coefficients and intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)

# Predict output
predicted = model.predict(x_test)

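Going further

There are several more things you can try to improve this model:

Add interaction terms
Remove features to simplify the model
Use regularization
Use a non-linear model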
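Two of the suggestions above, interaction terms and regularization, can be sketched together. The synthetic data, in which the class depends on the sign of x1 * x2, and the value C=0.5 are assumptions chosen so that the effect of the interaction term is easy to see.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption): the class depends on the sign of x1 * x2,
# which no linear boundary in x1 and x2 alone can capture
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Baseline: plain logistic regression on the raw features
plain = LogisticRegression().fit(X, y)

# Add the interaction term x1 * x2 and set the L2 regularization strength via C
# (smaller C means stronger regularization)
with_interaction = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(C=0.5),
).fit(X, y)

plain_score = plain.score(X, y)
interaction_score = with_interaction.score(X, y)
print('Plain model accuracy:     ', plain_score)
print('With interaction accuracy:', interaction_score)
```

On data like this the plain model performs close to chance, while the model with the interaction feature separates the classes well.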

3. Decision tree

Python code

# Import the library
# Import other libraries you need, such as pandas and numpy
from sklearn import tree

# Assumes you have X (predictors) and y (target) for the training set and x_test (predictors) for the test set

# Create the tree object for classification
# The split criterion can be 'gini' or 'entropy' (information gain); the default is 'gini'
model = tree.DecisionTreeClassifier(criterion='gini')

# For regression, use: model = tree.DecisionTreeRegressor()
# Train the model on the training set and check the score
model.fit(X, y)
print('Training score:', model.score(X, y))

# Predict output
predicted = model.predict(x_test)
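A fitted tree can be inspected as a set of if/else rules, which is one reason decision trees are popular. The sketch below assumes the iris dataset bundled with scikit-learn and a depth limit of 2; both are illustrative choices, not part of the original post.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

# A shallow tree (max_depth=2 is an illustrative choice) stays easy to read
model = DecisionTreeClassifier(criterion='gini', max_depth=2, random_state=0)
model.fit(X, y)
print('Training accuracy:', model.score(X, y))

# Print the learned if/else rules as plain text
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```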

4. Support vector machine (SVM)

Python code

# Import the library
from sklearn import svm

# Assumes you have X (predictors) and y (target) for the training set and x_test (predictors) for the test set
# Create the SVM classification object
# Many options are available; this is the simplest form for classification.
# See the library documentation for more detail.
model = svm.SVC()

# Train the model on the training set and check the score
model.fit(X, y)
print('Training accuracy:', model.score(X, y))

# Predict output
predicted = model.predict(x_test)
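One of the options mentioned in the template is the kernel. As a rough sketch, the example below compares a linear kernel with an RBF kernel on data that is not linearly separable. The two-moons dataset and the values C=1.0 and gamma=2.0 are assumptions picked for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: a dataset that is not linearly separable
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Same regularization strength, different kernels
linear_svm = SVC(kernel='linear', C=1.0).fit(X, y)
rbf_svm = SVC(kernel='rbf', C=1.0, gamma=2.0).fit(X, y)

linear_score = linear_svm.score(X, y)
rbf_score = rbf_svm.score(X, y)
print('Linear kernel accuracy:', linear_score)
print('RBF kernel accuracy:   ', rbf_score)
```

The RBF kernel can bend the decision boundary around the moons, which a straight line cannot do.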

5. Naive Bayes

Python code

# Import the library
from sklearn.naive_bayes import GaussianNB

# Assumes you have X (predictors) and y (target) for the training set and x_test (predictors) for the test set

# Create the Gaussian Naive Bayes object
# Other variants exist for other feature distributions, such as multinomial and Bernoulli Naive Bayes
model = GaussianNB()

# Train the model on the training set and check the score
model.fit(X, y)
print('Training accuracy:', model.score(X, y))

# Predict output
predicted = model.predict(x_test)

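6. KNN (k-nearest neighbors)

KNN is computationally expensive.
Variables should be normalized first; otherwise variables with larger ranges will bias the result.
Before using KNN, spend extra effort on preprocessing steps such as outlier removal and noise removal.

Python code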

# Import the library
from sklearn.neighbors import KNeighborsClassifier

# Assumes you have X (predictors) and y (target) for the training set and x_test (predictors) for the test set
# Create the KNeighbors classifier object
model = KNeighborsClassifier(n_neighbors=6)  # the default value of n_neighbors is 5

# Train the model on the training set and check the score
model.fit(X, y)
print('Training accuracy:', model.score(X, y))

# Predict output
predicted = model.predict(x_test)
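The advice above about normalizing variables before KNN can be checked with a small experiment. This sketch assumes the wine dataset bundled with scikit-learn, whose features span very different ranges, and uses `StandardScaler` (zero mean, unit variance) as one possible way to put the variables on a common scale.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The wine dataset has features on very different scales
X, y = load_wine(return_X_y=True)

raw_knn = KNeighborsClassifier(n_neighbors=5)
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 5-fold cross-validated accuracy with and without standardization
raw_score = cross_val_score(raw_knn, X, y, cv=5).mean()
scaled_score = cross_val_score(scaled_knn, X, y, cv=5).mean()
print('KNN without scaling:', raw_score)
print('KNN with scaling:   ', scaled_score)
```

Putting the scaler inside the pipeline means it is fitted on each training fold only, so the held-out fold does not leak into the scaling statistics.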

7. K-means

Python code

# Import the library
from sklearn.cluster import KMeans

# Assumes you have X (attributes) for the training set and x_test (attributes) for the test set
# Create the KMeans object
model = KMeans(n_clusters=3, random_state=0)

# Fit the model on the training set (no target variable is needed)
model.fit(X)

# Predict the cluster index for each test sample
predicted = model.predict(x_test)
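As a rough end-to-end illustration of the template above, the sketch below clusters synthetic points drawn around three centers. The centers, the spread, and the use of the adjusted Rand index to compare against the known groups are all assumptions made for this example; in a real unsupervised task the true groups would not be available.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated synthetic groups (centers are illustrative assumptions)
centers = [[0, 0], [5, 5], [0, 5]]
X, true_groups = make_blobs(n_samples=300, centers=centers,
                            cluster_std=0.5, random_state=0)

model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)

# The true groups are used only to check the result, never for fitting
ari = adjusted_rand_score(true_groups, labels)
print('Cluster centers:\n', model.cluster_centers_)
print('Adjusted Rand index:', ari)
```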

8. Random forest

Python code

# Import the library
from sklearn.ensemble import RandomForestClassifier

# Assumes you have X (predictors) and y (target) for the training set and x_test (predictors) for the test set
# Create the random forest object
model = RandomForestClassifier()

# Train the model on the training set and check the score
model.fit(X, y)
print('Training accuracy:', model.score(X, y))

# Predict output
predicted = model.predict(x_test)
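A fitted random forest also exposes an importance score per feature, which can help decide which predictors matter. This is a minimal sketch on the iris dataset bundled with scikit-learn; the dataset and the choice of 100 trees are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances, one per feature, summing to 1
importances = model.feature_importances_
for name, value in sorted(zip(iris.feature_names, importances),
                          key=lambda pair: pair[1], reverse=True):
    print(f'{name}: {value:.3f}')
```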

9. Dimensionality reduction (PCA, FA)

Python code

# Import the library
from sklearn import decomposition

# Assumes you have the training and test datasets as train and test
# Number of components to keep (2 is a placeholder; choose a value for your data)
k = 2

# Create the PCA object
# If n_components is not given, it defaults to min(n_samples, n_features)
pca = decomposition.PCA(n_components=k)

# For factor analysis
# fa = decomposition.FactorAnalysis()

# Reduce the dimension of the training dataset using PCA
train_reduced = pca.fit_transform(train)

# Reduce the dimension of the test dataset with the same fitted transform
test_reduced = pca.transform(test)
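When choosing the number of components, it helps to look at how much variance each component explains. The sketch below assumes the iris dataset bundled with scikit-learn and keeps two components; both choices are for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # 150 samples, 4 features

# Project the 4 original features onto the first 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

explained = pca.explained_variance_ratio_
print('Reduced shape:', X_reduced.shape)
print('Variance explained per component:', explained)
print('Total variance explained:', explained.sum())
```

If the total is already high, as it tends to be for this dataset, little information is lost by dropping the remaining components.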

10. Gradient Boosting and AdaBoost

Python code

# Import the library
from sklearn.ensemble import GradientBoostingClassifier

# Assumes you have X (predictors) and y (target) for the training set and x_test (predictors) for the test set
# Create the gradient boosting classifier object
model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)

# Train the model on the training set and check the score
model.fit(X, y)
print('Training accuracy:', model.score(X, y))

# Predict output
predicted = model.predict(x_test)
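The template above covers gradient boosting only. For AdaBoost, a comparable sketch with scikit-learn's `AdaBoostClassifier` might look like the following. The breast cancer dataset, the 75/25 split, and the choice of 100 estimators are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# By default AdaBoost boosts depth-1 decision trees ("stumps")
model = AdaBoostClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Evaluate on data the model has not seen during training
test_accuracy = model.score(X_test, y_test)
predicted = model.predict(X_test)
print('Test accuracy:', test_accuracy)
```

Scoring on a held-out split, rather than on the training data, gives a more honest estimate of how the boosted model generalizes.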
