Low-dimensional data

In practical applications we often encounter datasets whose features are inadequate (too few). To solve this problem, we need to expand the feature set of the dataset; a short toy sketch of both ideas follows the list below.

Two methods are generally used:

  • Interaction features (Interaction Features)
  • Polynomial features (Polynomial Features)
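
Before the worked example, here is a tiny illustrative sketch (my addition; X_toy is a made-up two-feature matrix): interaction features combine existing features, for instance by multiplying them, while polynomial features add powers of each feature.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy data: 3 samples, 2 features
X_toy = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
# Interaction feature: the elementwise product of the two columns
interaction = (X_toy[:, 0] * X_toy[:, 1]).reshape(-1, 1)
X_interact = np.hstack([X_toy, interaction])    # 2 original features + 1 interaction
# Polynomial features of degree 2: x1, x2, x1^2, x1*x2, x2^2
X_square = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_toy)
print(X_interact.shape, X_square.shape)         # (3, 3) (3, 5)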

1. Prepare the data set

# Import numpy
import numpy as np
# Import plotting tools
import matplotlib.pyplot as plt

# Import the neural network
from sklearn.neural_network import MLPRegressor
# Create a random number generator
rnd = np.random.RandomState(38)
x = rnd.uniform(-5, 5, size=50)
# Generate the target without noise
y_no_noise = (np.cos(6 * x) + x)
X = x.reshape(-1, 1)
# Add noise to the target
y = (y_no_noise + rnd.normal(size=len(x))) / 2

# Set the number of bin edges to 11
bins = np.linspace(-5, 5, 11)
# Bin the data
target_bin = np.digitize(X, bins=bins)
# Import the one-hot encoder
from sklearn.preprocessing import OneHotEncoder
onehot = OneHotEncoder(sparse=False, categories='auto')
onehot.fit(target_bin)
# Transform the data with the one-hot encoder
X_in_bin = onehot.transform(target_bin)
# Generate an evenly spaced sequence of points
line = np.linspace(-5, 5, 1000, endpoint=False).reshape(-1, 1)
# Express the new points with the one-hot encoding
new_line = onehot.transform(np.digitize(line, bins=bins))
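
An optional sanity check (my addition, not in the original): with 11 evenly spaced bin edges the samples fall into 10 bins, so the one-hot matrix should have 10 columns. Also note, as an assumption about newer library versions, that in scikit-learn 1.2 and later the sparse argument of OneHotEncoder was renamed to sparse_output.

# Optional check of the encoded shapes (expected results shown in the comments)
print(X_in_bin.shape)    # (50, 10) -- one column per bin
print(new_line.shape)    # (1000, 10)
# With scikit-learn >= 1.2, the encoder would be built as:
# onehot = OneHotEncoder(sparse_output=False, categories='auto')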

 

2. Add interaction features to the dataset

  Adding interaction features means adding interaction terms between the original features to the raw data, which increases the number of features.

############################# Add interaction features to the dataset #############################
# Manually generate two arrays
array_1 = [1, 2, 3, 4, 5]
array_2 = [6, 7, 8, 9, 0]
# Stack the two arrays with hstack
array_3 = np.hstack((array_1, array_2))
# Print the result
print('Result of stacking array 1 and array 2: {}'.format(array_3))
Result of stacking array 1 and array 2: [1 2 3 4 5 6 7 8 9 0]
# Stack the binned data with the raw data
X_stack = np.hstack([X, X_in_bin])
print(X_stack.shape)
(50, 11)
# Stack the plotting points in the same way
line_stack = np.hstack([line, new_line])
# Retrain the model
mlpr_interact = MLPRegressor().fit(X_stack, y)
# Plot the result
plt.plot(line, mlpr_interact.predict(line_stack), label='MLP for interaction')
plt.ylim(-4, 4)
for vline in bins:
    plt.plot([vline, vline], [-5, 5], ':', c='gray')
plt.plot(X, y, 'o', c='r')
plt.legend(loc='lower right')
# Show the plot
plt.show()
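
A note on what changes next (my interpretation of the example, not stated explicitly in the original): in the model above, the single raw x column is shared by all bins, so every bin gets the same slope and the fitted segments are parallel. The next step multiplies x by the one-hot matrix, which creates one slope feature per bin, so each bin can have its own slope. A minimal sketch of that idea on one hypothetical sample:

# Minimal sketch (hypothetical sample): suppose x = 2.3 falls into some bin k,
# so its one-hot row has a single 1 at position k.
x_val = 2.3
onehot_row = np.array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.])
# The interaction row is zero everywhere except x_val at position k,
# so only the slope feature of that bin is "switched on" for this sample.
print(x_val * onehot_row)    # [0.  0.  0.  0.  0.  0.  0.  2.3 0.  0. ]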

# Stack the data in a new way: the one-hot columns plus x times the one-hot columns
X_multi = np.hstack([X_in_bin, X * X_in_bin])
# Print the result
print(X_multi.shape)
print(X_multi[0])
(50, 20)
[ 0.         0.         0.         1.         0.         0.
  0.         0.         0.         0.        -0.        -0.
 -0.        -1.1522688 -0.        -0.        -0.        -0.
 -0.        -0.       ]
# Retrain the model
mlpr_multi = MLPRegressor().fit(X_multi, y)
line_multi = np.hstack([new_line, line * new_line])
# Plot the result
plt.plot(line, mlpr_multi.predict(line_multi), label='MLP Regressor')
for vline in bins:
    plt.plot([vline, vline], [-5, 5], ':', c='gray')
plt.plot(X, y, 'o', c='r')
plt.legend(loc='lower right')
# Show the plot
plt.show()

3. Add polynomial features to the dataset

############################# Add polynomial features to the dataset #############################
# Import the polynomial features tool
from sklearn.preprocessing import PolynomialFeatures
# Add polynomial features to the dataset
poly = PolynomialFeatures(degree=20, include_bias=False)
X_poly = poly.fit_transform(X)
# Print the result
print(X_poly.shape)
(50, 20)
# Print the result
print('Features of the first sample in the original dataset:\n{}'.format(X[0]))
print('\nFeatures of the first sample in the processed dataset:\n{}'.format(X_poly[0]))

# Print the result
print('Feature names after PolynomialFeatures processing:\n{}'.format(poly.get_feature_names()))
Features of the first sample in the original dataset:
[-1.1522688]

Features of the first sample in the processed dataset:
[ -1.1522688    1.3277234   -1.52989425   1.76284942  -2.0312764
   2.34057643  -2.6969732    3.10763809  -3.58083443   4.1260838
  -4.75435765   5.47829801  -6.3124719    7.27366446  -8.38121665
   9.65741449 -11.12793745  12.82237519 -14.77482293  17.02456756]
Feature names after PolynomialFeatures processing:
['x0', 'x0^2', 'x0^3', 'x0^4', 'x0^5', 'x0^6', 'x0^7', 'x0^8', 'x0^9', 'x0^10', 'x0^11', 'x0^12', 'x0^13', 'x0^14', 'x0^15', 'x0^16', 'x0^17', 'x0^18', 'x0^19', 'x0^20']
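
A version note (my addition): get_feature_names was deprecated in scikit-learn 1.0 in favour of get_feature_names_out and removed in later releases, so on a recent installation the last print statement would read:

# With scikit-learn >= 1.0, use get_feature_names_out() instead of get_feature_names():
print('Feature names after PolynomialFeatures processing:\n{}'.format(
    poly.get_feature_names_out()))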
# Import linear regression
from sklearn.linear_model import LinearRegression
# Train a linear regression model with the processed data
LNR_poly = LinearRegression().fit(X_poly, y)
line_poly = poly.transform(line)
# Plot the result
plt.plot(line, LNR_poly.predict(line_poly), label='Linear Regressor')
plt.xlim(np.min(X) - 0.5, np.max(X) + 0.5)
plt.ylim(np.min(y) - 0.5, np.max(y) + 0.5)
plt.plot(X, y, 'o', c='r')
plt.legend(loc='lower right')
# Show the plot
plt.show()
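
As a follow-up (not part of the original article), the same polynomial-features-plus-linear-regression workflow can also be written as a scikit-learn Pipeline, which bundles the feature expansion and the model into a single estimator; a minimal sketch reusing the X, y and line arrays defined in section 1:

# Chain the polynomial expansion and the linear model into one estimator
from sklearn.pipeline import make_pipeline
poly_model = make_pipeline(
    PolynomialFeatures(degree=20, include_bias=False),
    LinearRegression())
# Fit on the raw one-feature data; the pipeline applies the expansion internally
poly_model.fit(X, y)
# Predict on the raw line points and plot the fitted curve
plt.plot(line, poly_model.predict(line), label='Pipeline: poly + linear')
plt.plot(X, y, 'o', c='r')
plt.legend(loc='lower right')
plt.show()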

To sum up:

  Linear models perform well on high-dimensional datasets but only moderately on low-dimensional ones. By adding interaction features or polynomial features we can expand a low-dimensional dataset into a higher-dimensional one, which improves the accuracy of the linear model and, to some extent, solves the underfitting problem that linear models show on low-dimensional data.

 

Reference: "Python Machine Learning in Layman's Terms"

Source: www.cnblogs.com/weijiazheng/p/10958602.html