Corrections to "Classic Examples of Python Machine Learning", pp. 38-45, Section 2.9 "Assessing Quality Based on Car Features"

While re-running the code from "Classic Examples of Python Machine Learning" under Python 3.5, I kept running into warnings and errors.

Most of the warnings come from library updates. The book targets Python 2.x and older library versions, and some modules have since been merged. Update the module names when calling the corresponding functions; otherwise the console fills with red warnings.
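As an illustration of the renaming, here is a minimal sketch that imports the cross-validation helpers from the merged module, sklearn.model_selection. The fallback to the older sklearn.cross_validation name is an assumption for very old installations; the corrected code in this post does not need it.

```python
# Prefer the merged module; fall back to the pre-merge name only if needed.
try:
    from sklearn import model_selection
except ImportError:
    # Assumption: an old scikit-learn release that still ships cross_validation.
    from sklearn import cross_validation as model_selection

# cross_val_score is available under either module name.
print(model_selection.cross_val_score.__name__)
```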

The first error comes from the single-sample encoding test on page 40, where the book passes input_data[i] to transform directly:

input_data_encoded[i]=int(label_encoder[i].transform(input_data[i]))

The exception is:

raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape ()

Step-by-step debugging shows that input_data[i] is a single string, whereas the transform method expects a list-like argument, so the argument is wrapped in a list: [input_data[i]]
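The difference can be reproduced in isolation. The sketch below uses a standalone LabelEncoder fitted on a few illustrative labels (an assumption; the real encoders are fitted on the columns of the car data) and compares a bare string with the same string wrapped in a list.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels only; not read from the car data file.
encoder = LabelEncoder()
encoder.fit(['low', 'med', 'high', 'vhigh'])

# A bare string is a 0-dimensional input, which recent scikit-learn
# versions reject with a ValueError (older releases may accept it).
try:
    encoder.transform('high')
except ValueError as err:
    print('Rejected:', err)

# Wrapping the string in a list gives a 1D input with one element.
encoded = encoder.transform(['high'])
value = int(encoded[0])
print(encoded.shape, value)
```

Indexing the result with [0] before calling int() avoids converting a whole array to a scalar.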

With that fixed, the second error comes from this line:

output_class=classifier.predict(input_data_encoded)

The error is:

ValueError: Expected 2D array, got 1D array instead:
array=[ 3.  3.  0.  0.  2.  1.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

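This problem is similar to the first one: predict expects a 2D array of shape (n_samples, n_features), so input_data_encoded has to be reshaped from a 1D array into a 2D array with a single row.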

The value of input_data_encoded before reshaping is: [0 0 1 1 2 0]

The reshape code is:

input_data_encoded=input_data_encoded.reshape(1,6)

After reshaping it is: [[0 0 1 1 2 0]]
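The shape change can be checked with NumPy alone. This is a small sketch using the encoded values shown above; reshape(1, -1) is used here as an equivalent of reshape(1, 6) that lets NumPy infer the number of features.

```python
import numpy as np

sample = np.array([0, 0, 1, 1, 2, 0])
print(sample.shape)   # 1D: six values, no sample axis

# One row (a single sample), with the column count inferred from the data.
row = sample.reshape(1, -1)
print(row.shape)      # 2D: one sample with six features
print(row)
```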

Those are the sticking points in this section. The code with all corrections applied is as follows:

import numpy as np
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.model_selection import validation_curve
from sklearn.model_selection import learning_curve

#Display Chinese font when drawing
from pylab import mpl
mpl.rcParams ['font.sans-serif'] = ['SimHei']


input_file=r'D:\python\AI\2\car.data.txt'
x=[]

with open(input_file,'r') as f:
    for line in f:
        line=line.strip()
        if not line:
            continue  #skip blank lines
        x.append(line.split(','))

x=np.array(x)

# string to value
label_encoder=[]
x_encoded=np.empty(x.shape)

for i,item in enumerate(x[0]):
    label_encoder.append(preprocessing.LabelEncoder())
    label_encoder[-1].fit(x[:,i])
    x_encoded[:,i]=label_encoder[-1].transform(x[:,i])

x=x_encoded[:, :-1].astype(int)
y=x_encoded[:,-1].astype(int)

#train classifier
params={'n_estimators':200,'max_depth':8,'random_state':7}
classifier=RandomForestClassifier(**params)
classifier.fit(x,y)

#Cross-validation
accuracy=model_selection.cross_val_score(classifier,x,y,scoring='accuracy',cv=3)
print('Accuracy of the classifier: '+str(round(100*accuracy.mean(),2))+'%')

#Code test on a single data example
input_data=['high','high','3','4','small','high']
input_data_encoded=[-1]*len(input_data)


for i,item in enumerate(input_data):
    #transform needs a list; take element 0 of the one-element result
    input_data_encoded[i]=int(label_encoder[i].transform([item])[0])

input_data_encoded=np.array(input_data_encoded)
print('Array before reshaping:', input_data_encoded)

# predict and print the output of the data points
#reshape the array
input_data_encoded=input_data_encoded.reshape(1,6)
print('After the array is reshaped:', input_data_encoded)
output_class=classifier.predict(input_data_encoded)
print('Output class:',label_encoder[-1].inverse_transform(output_class)[0])

#Define the hyperparameters of the random forest classifier
#Test the effect of the number of estimators on the classifier
classifier = RandomForestClassifier(max_depth=4, random_state=7)
parameter_grid = np.linspace(25, 200, 8).astype(int)
#Pass param_name and param_range as keywords; newer scikit-learn releases reject them as positional arguments
train_scores, validation_scores = validation_curve(classifier, x, y,
        param_name='n_estimators', param_range=parameter_grid, cv=5)
print('\n##### Validation curve #####')
print('\nParam: n_estimators\nTraining scores:\n', train_scores)
print('\nParam: n_estimators\nValidation scores:\n', validation_scores)

#Plot
plt.figure()
plt.plot(parameter_grid, 100*np.average(train_scores, axis=1), color='black')
plt.title(u'Training curve')
plt.xlabel(u'Number of estimators')
plt.ylabel(u'Accuracy')
plt.show()

#Test the effect of the maximum depth parameter on the classifier
classifier = RandomForestClassifier(n_estimators=20, random_state=7)
parameter_grid = np.linspace(2, 10, 5).astype(int)
train_scores, validation_scores = validation_curve(classifier, x, y,
        param_name='max_depth', param_range=parameter_grid, cv=5)
print(u'\nParam: max_depth\nTraining scores:\n', train_scores)
print(u'\nParam: max_depth\nValidation scores:\n', validation_scores)

#Plot
plt.figure()
plt.plot(parameter_grid, 100*np.average(train_scores, axis=1), color='black')
plt.title(u'Validation curve')
plt.xlabel(u'Maximum depth of the tree')
plt.ylabel(u'Accuracy')
plt.show()


#generate learning curve
classifier = RandomForestClassifier(random_state=7)

parameter_grid = np.array([200, 500, 800, 1100])
train_sizes, train_scores, validation_scores = learning_curve(classifier,
        x, y, train_sizes=parameter_grid, cv=5)

print('\n##### Learning curve #####')
print('\nTraining scores:\n', train_scores)
print('\nValidation scores:\n', validation_scores)

#Plot
plt.figure()
plt.plot(parameter_grid, 100*np.average(train_scores, axis=1), color='black')
plt.title(u'Learning curve')
plt.xlabel(u'Number of training samples')
plt.ylabel(u'Accuracy')
plt.show()
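Each plot above averages the scores along axis=1. As a hedged illustration of why: validation_curve and learning_curve return one row per parameter value (or training size) and one column per cross-validation fold, so averaging across columns gives one accuracy per point on the x-axis. The scores below are made up for illustration and are not results from the car data.

```python
import numpy as np

# Made-up scores: 3 parameter values (rows) x 5 folds (columns).
train_scores = np.array([
    [0.80, 0.82, 0.81, 0.79, 0.83],
    [0.85, 0.86, 0.84, 0.85, 0.85],
    [0.90, 0.91, 0.89, 0.90, 0.90],
])

# Average over the folds to get one accuracy (in percent) per parameter value.
mean_accuracy = 100 * np.average(train_scores, axis=1)
print(train_scores.shape)
print(mean_accuracy.shape)
```

The same averaging applies to validation_scores, which can be plotted alongside the training curve for comparison.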
