机器学习 工作日记

版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/github_37856429/article/details/77337187

10种机器学习算法的要点以及Python实现

广义来说,有三种机器学习算法

1、监督式学习

  工作机制:这个算法由一个目标变量或结果变量(或因变量)组成。这些变量由已知的一系列预示变量(自变量)预测而来。利用这一系列变量,我们生成一个将输入值映射到期望输出值的函数。这个训练过程会一直持续,直到模型在训练数据上获得期望的精确度。监督式学习的例子有:回归、决策树、随机森林、K – 近邻算法、逻辑回归等。

2、非监督式学习

  工作机制:在这个算法中,没有任何目标变量或结果变量要预测或估计。这个算法用在不同的组内聚类分析。这种分析方式被广泛地用来细分客户,根据干预的方式分为不同的用户组。非监督式学习的例子有:关联算法和 K – 均值算法。

3、强化学习

  工作机制:这个算法训练机器进行决策。它是这样工作的:机器被放在一个能让它通过反复试错来训练自己的环境中。机器从过去的经验中进行学习,并且尝试利用了解最透彻的知识作出精确的商业判断。 强化学习的例子有马尔可夫决策过程。

常用的机器学习算法名单

1.线性回归

python代码
#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import linear_model

#Load Train and Test datasets
#Identify feature and response variable(s) and values must be numeric and numpy arrays
x_train=input_variables_values_training_datasets y_train=target_variables_values_training_datasets
x_test=input_variables_values_test_datasets

# Create linear regression object
linear = linear_model.LinearRegression()

# Train the model using the training sets and check score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)

#Equation coefficient and Intercept
print('Coefficient: n', linear.coef_)
print('Intercept: n', linear.intercept_)

#Predict Output
predicted= linear.predict(x_test)

2.逻辑回归

python代码
#Import Library
from sklearn.linear_model import LogisticRegression

# Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset
# Create logistic regression object
model= LogisticRegression()

# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)

#Equation coefficient and Intercept
print('Coefficient: n', model.coef_)
print('Intercept: n', model.intercept_)

#Predict Output
predicted= model.predict(x_test)

更进一步

尝试更多的方法来改进这个模型

  • 加入交互项
  • 精简模型特性
  • 使用正则化方法
  • 使用非线性模型

3.决策树

python代码
#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import tree

# Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_datase

# Create tree object 
model = tree.DecisionTreeClassifier(criterion='gini') # for classification, here you can change the algorithm as gini or entropy (information gain) by default it is gini 

# model = tree.DecisionTreeRegressor() for regression
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)

#Predict Output
predicted= model.predict(x_test)

4. SVM支持向量机

python代码
#Import Library
from sklearn import svm

#Assumed you have, X (predicor)and Y (target) for training data set and x_test(predictor) of test_datase
# Create SVM classification object
model = svm.svc() # there is various option associated with it, this is simple for classification. You can refer link, for mo# re detail.

# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)

#Predict Output
predicted= model.predict(x_test)

5.朴素贝叶斯

python代码
#Import Library
from sklearn.naive_bayes import GaussianNB

#Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset

# Create SVM classification object 
model = GaussianNB() # there is other distribution for multinomial classes like Bernoulli Naive Bayes, Refer link

# Train the model using the training sets and check score
model.fit(X, y)

#Predict Output
predicted= model.predict(x_test)
6.KNN(K最近邻算法)
  • KNN的计算成本很高
  • 变量应该先标准化(normalized),不然会被更高范围的变量偏倚
  • 在使用KNN之前,要在野值去除和噪音去除等前期处理多花工夫
python代码
#Import Library
from sklearn.neighbors import KNeighborsClassifier

#Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset
# Create KNeighbors classifier object model
KNeighborsClassifier(n_neighbors=6) # default value for n_neighbors is 5

# Train the model using the training sets and check score
model.fit(X, y)

#Predict Output
predicted= model.predict(x_test)

7.K均值算法

python算法
#Import Library
from sklearn.cluster import KMeans

#Assumed you have, X (attributes) for training data set and x_test(attributes) of test_dataset
# Create KNeighbors classifier object model
k_means = KMeans(n_clusters=3, random_state=0)

# Train the model using the training sets and check score
model.fit(X)

#Predict Output
predicted= model.predict(x_test) 

8.随机森林算法

python算法
#Import Library
from sklearn.ensemble import RandomForestClassifie

#Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset
# Create Random Forest object
model= RandomForestClassifier()

# Train the model using the training sets and check score
model.fit(X, y)

#Predict Output
predicted= model.predict(x_test)

9.降维算法(PCA ,FA)

python代码
#Import Library
from sklearn import decomposition

#Import Library
from sklearn import decomposition

#Assumed you have training and test data set as train and test
# Create PCA obeject
pca= decomposition.PCA(n_components=k) #default value of k =min(n_sample, n_features)

# For Factor analysis
#fa= decomposition.FactorAnalysis()

# Reduced the dimension of training dataset using PCA
train_reduced = pca.fit_transform(train)

#Reduced the dimension of test dataset
test_reduced = pca.transform(test)

10.Gradient Boost和Adaboost算法

python代码

#Import Library
from sklearn.ensemble import GradientBoostingClassifier

#Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset
# Create Gradient Boosting Classifier object
model= GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)

# Train the model using the training sets and check score
model.fit(X, y)

#Predict Output
predicted= model.predict(x_test

猜你喜欢

转载自blog.csdn.net/github_37856429/article/details/77337187