Data Analysis: Improving Algorithms Through Integration

1. Algorithm integration method¶

     1. Bagging: Obtain the optimal solution by voting for a given combination. For example, if you are sick, you go to n hospitals to see n doctors, and each doctor prescribes a prescription for you. In the final result, which prescription appears more often, it means that this prescription is more likely to be the most effective remedy. Solution, this is easy to understand. The bagging algorithm is this idea.

     2. Boosting: A method used to improve the accuracy of weak classification algorithms by constructing a series of prediction functions, and then combining them into a prediction function in a certain way.

     3. Voting: Two or more algorithm models are packaged with voting algorithms to calculate the average prediction of each sub-model

Second, the bagging algorithm

     1. Bagging decision tree

     Very effective in cases where the data has a lot of variance

     2. Random Forest

     It is to build a forest in a random way. There are many decision trees in the forest. There is no relationship between each decision tree in the random forest. After getting the forest, when a new input sample enters, let each decision tree in the forest make a judgment to see which class the sample should belong to, and then see which class is selected the most , it is predicted that the sample belongs to that class.

     3. Extreme Random Decision Trees

3. Lifting Algorithm

     1.AdaBoost

     The core idea is to train different classifiers (weak classifiers) for the same training set, and then combine these weak classifiers to form a stronger final classifier (strong classifier)

     2. Stochastic Gradient Ascent

     Gradient ascent is based on the idea that the best way to find the maximum value of a function is to follow the gradient direction of the function. The gradient operator always points in the direction of the fastest growing function value

4. Voting Algorithm

5. Test

In [22]:
from pandas import read_csv
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from matplotlib import pyplot
from sklearn.preprocessing import LabelEncoder

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

# 导入数据
iris =pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',header=None)
iris.columns=['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm','Species'] 

# 将数据分为输入数据和输出结果
arrary = iris.values
# print(arrary)
X =arrary[:,0:4]

le = LabelEncoder()
le.fit(iris['Species'])   
Y = le.transform(iris['Species']) # 对花的类别进行编号处理

num_folds = 10
seed = 7
kfold = KFold(n_splits=num_folds, random_state=seed)
models = {}

# 装袋决策树
cart = DecisionTreeClassifier()
num_tree = 100
models['BC'] = BaggingClassifier(base_estimator=cart, n_estimators=num_tree, random_state=seed)
# 随机森林
max_features = 3
models['BFC'] = RandomForestClassifier(n_estimators=num_tree, random_state=seed, max_features=max_features)
# 极端随机数
max_features = 3
models['ETC'] = ExtraTreesClassifier(n_estimators=num_tree, random_state=seed, max_features=max_features)
# AdaBoost
models['ABC'] = AdaBoostClassifier(n_estimators=num_tree, random_state=seed)
# 随机梯度上升
models['GBC'] = GradientBoostingClassifier(n_estimators=num_tree, random_state=seed)

# 投票算法
cart = DecisionTreeClassifier()
models2 = []
model_ld = LinearDiscriminantAnalysis()
models2.append(('ld', model_ld ))
# model_cart = DecisionTreeClassifier()
# models2.append(('cart', model_cart))
model_svc = SVC()
models2.append(('svm', model_svc))
models['VC'] = VotingClassifier(estimators=models2)

results = []
for name in models:
    result = cross_val_score(models[name], X, Y, cv=kfold)
    results.append(result)
    msg = '%s: %.6f (%.6f)' % (name, result.mean(), result.std())
    print(msg)

# 图表显示
fig = pyplot.figure()
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
pyplot.boxplot(results)
ax.set_xticklabels(models.keys())
pyplot.show()
BFC: 0.946667 (0.071802)
ABC: 0.913333 (0.143139)
GBC: 0.940000 (0.075719)
BC: 0.946667 (0.071802)
ETC: 0.940000 (0.075719)
VC: 0.946667 (0.077746)
In [ ]:
 

六、git与参考

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=325754898&siteId=291194637
Recommended