Fitting a neural network in R: prediction and visualization of results

Original link: http://tecdat.cn/?p=6691

 

Neural networks have long been among the most fascinating machine learning models, not only because of the elegant back-propagation algorithm, but also because of their complexity (consider deep learning with its many hidden layers) and because their structure is inspired by the brain.

Neural networks have not always been popular, partly because they were, and in some cases still are, computationally expensive, and partly because they did not seem to produce better results than simpler methods such as support vector machines (SVMs). However, neural networks have once again attracted attention and become popular.

In this article, we fit a neural network and a linear model and compare them.

Data set

The Boston data set is a collection of housing value data for the suburbs of Boston. Our goal is to predict the median value of owner-occupied homes (medv) using all the other available continuous variables.

First, we check whether any data points are missing; if any are, we need to fix the data set.

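data <- MASS::Boston  # the Boston data set ships with the MASS package
apply(data, 2, function(x) sum(is.na(x)))  # number of missing values per column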
 

Next, we randomly split the data into a training set and a test set, fit a linear regression model on the training set, and evaluate it on the test set.

index <- sample(1:nrow(data), round(0.75 * nrow(data)))
MSE.lm <- sum((pr.lm - test$medv)^2) / nrow(test)

The sample(x, size) function simply returns a vector of the specified size whose elements are randomly selected from the vector x. By default the sampling is done without replacement.
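The split and the linear baseline can be sketched end to end as follows. This is a minimal sketch rather than the article's exact code: it assumes the Boston data are loaded from the MASS package, and the seed value and the name train are assumptions chosen for illustration.

```r
library(MASS)  # provides the Boston data set

set.seed(500)  # assumed seed, only to make the random split repeatable
data <- Boston

# Draw 75% of the row indices at random (without replacement) for training
index <- sample(1:nrow(data), round(0.75 * nrow(data)))
train <- data[index, ]
test <- data[-index, ]

# Linear baseline: glm() with its default gaussian family is a least-squares fit
lm.fit <- glm(medv ~ ., data = train)
pr.lm <- predict(lm.fit, test)

# Mean squared error on the held-out rows
MSE.lm <- sum((pr.lm - test$medv)^2) / nrow(test)
print(MSE.lm)
```

The exact MSE depends on the random split, so a different seed will give a different value.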

Preparing to fit the neural network

Before fitting a neural network, some preparation is needed. Neural networks are not easy to train and tune.

As a first step, we address data preprocessing. The code below applies min-max scaling, which maps every variable to the interval [0, 1].
We therefore scale and split the data before continuing:

maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
# Min-max scaling: (x - min) / (max - min), column by column
scaled <- as.data.frame(scale(data, center = mins, scale = maxs - mins))
train_ <- scaled[index, ]
test_ <- scaled[-index, ]

Note that scale() returns a matrix, which needs to be converted to a data.frame.
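To see what the scaling step does, here is a small sketch on a made-up data frame. The data frame df and the name restored are hypothetical and only serve the illustration; the scale() call has the same form as the one above.

```r
# Toy data frame standing in for the real data (values are made up)
df <- data.frame(a = c(2, 4, 6, 10), b = c(-1, 0, 3, 1))

maxs <- apply(df, 2, max)
mins <- apply(df, 2, min)

# scale() subtracts `center` and divides by `scale`, column by column,
# so this computes (x - min) / (max - min)
scaled <- as.data.frame(scale(df, center = mins, scale = maxs - mins))
print(sapply(scaled, range))  # every column now spans 0 to 1

# Undo the scaling to get back to the original units
restored <- as.data.frame(
  Map(function(z, lo, hi) z * (hi - lo) + lo, scaled, mins, maxs)
)
print(restored)
```

The inverse transformation matters later, because predictions made on scaled data have to be converted back before they can be compared with the real values.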

Parameters

As far as I know, there is no fixed rule for how many layers and neurons to use, although there are several more or less accepted rules of thumb. Usually, if a hidden layer is needed at all, one is enough for the vast majority of applications. As for the number of neurons, it should be between the input layer size and the output layer size, typically about 2/3 of the input size.

 

  • The hidden parameter accepts a vector with the number of neurons in each hidden layer, and the linear.output parameter specifies whether we want regression (linear.output=TRUE) or classification (linear.output=FALSE).
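A fit using these two parameters might look like the sketch below. It is not the article's exact code: the seed, the two hidden layers with 5 and 3 neurons, and the helper names n and f are all assumptions made for illustration. The formula is built explicitly from the column names.

```r
library(MASS)       # Boston data set
library(neuralnet)  # neuralnet() fitting function

set.seed(500)  # assumed seed for a repeatable split and weight initialisation
data <- Boston
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
scaled <- as.data.frame(scale(data, center = mins, scale = maxs - mins))
index <- sample(1:nrow(data), round(0.75 * nrow(data)))
train_ <- scaled[index, ]

# Build the formula explicitly: medv ~ crim + zn + ... + lstat
n <- names(train_)
f <- as.formula(paste("medv ~", paste(n[n != "medv"], collapse = " + ")))

# hidden = c(5, 3): two hidden layers with 5 and 3 neurons (assumed sizes);
# linear.output = TRUE requests regression rather than classification
nn <- neuralnet(f, data = train_, hidden = c(5, 3), linear.output = TRUE)
```

Training starts from random weights, so both the run time and the fitted weights vary with the seed.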

The neuralnet package provides a convenient tool for plotting the model:

plot(nn)
 

This is a graphical representation of the model, with the weight shown on each connection:


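The black lines show the connections between the layers and the weight on each connection, while the blue lines show the bias term added at each step. The bias can be thought of as the intercept of a linear model.

Predicting medv using the neural network

Now we can try to predict the values for the test set and calculate the MSE. Because the network was trained on scaled data, its output is scaled as well and must be converted back to the original units before the comparison is meaningful.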

pr.nn <- compute(nn, test_[, 1:13])

Then we compare the two MSEs.
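The rescaling behind this comparison can be sketched as follows. Since no fitted network is available in a self-contained snippet, pred_scaled and actual_scaled are hypothetical stand-ins for the scaled network output (the net.result element returned by compute()) and the scaled test targets; the helper names unscale and mse are also assumptions.

```r
library(MASS)  # Boston data set

medv_min <- min(Boston$medv)
medv_range <- max(Boston$medv) - medv_min

# Map values from the [0, 1] scale back to the original units of medv
unscale <- function(z) z * medv_range + medv_min

# Mean squared error between actual and predicted values
mse <- function(actual, predicted) sum((actual - predicted)^2) / length(actual)

# Hypothetical stand-ins for scaled predictions and scaled targets on three rows
pred_scaled <- c(0.40, 0.25, 0.60)
actual_scaled <- c(0.42, 0.20, 0.65)

MSE.nn <- mse(unscale(actual_scaled), unscale(pred_scaled))
print(MSE.nn)
```

Computing both MSEs in the original units of medv keeps the neural network and the linear model directly comparable.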

Apparently, the network does a better job than the linear model at predicting medv. Once again, be careful, because this result depends on the train-test split performed above. After the visual comparison, we will run a quick cross-validation to be more confident in the results.
A first visual comparison of the performance of the network and the linear model on the test set is plotted below.

By visually inspecting the plot, we can see that the predictions made by the neural network are (in general) more concentrated around the line than those made by the linear model. A perfect alignment with the line would indicate an MSE of 0 and thus an ideal, perfect prediction.
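A plot of this kind can be drawn with base graphics. The sketch below shows only the linear model, so that it runs without a fitted network; the helper plot_fit, the seed, and the plot labels are assumptions, and the same helper could be called with the rescaled network predictions.

```r
library(MASS)  # Boston data set

set.seed(500)  # assumed seed for a repeatable split
index <- sample(1:nrow(Boston), round(0.75 * nrow(Boston)))
train <- Boston[index, ]
test <- Boston[-index, ]
pr.lm <- predict(glm(medv ~ ., data = train), test)

# Scatter of predicted against real values, with the ideal y = x line
plot_fit <- function(actual, predicted, main, col) {
  plot(actual, predicted, col = col, main = main, pch = 18, cex = 0.7,
       xlab = "Real medv", ylab = "Predicted medv")
  abline(0, 1, lwd = 2)  # points on this line are predicted perfectly
}

plot_fit(test$medv, pr.lm, main = "Real vs predicted (linear model)", col = "blue")
```

The closer the points lie to the diagonal, the smaller the prediction errors and therefore the MSE.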

The following draws a visual comparison might be more useful:


Cross-validation

Cross-validation is another very important step in building predictive models. Although there are different kinds of cross-validation, the basic idea is to repeat the following process a number of times: split the data into a training set and a test set, fit the model on the training set, and compute the prediction error on the test set.

Then, by calculating the average error, we can get a sense of how the model performs.

We will implement a fast cross-validation using a for loop for the neural network and the cv.glm() function from the boot package for the linear model.
As far as I know, R has no built-in function for cross-validating this kind of neural network; if you know of one, please let me know in the comments. The following gives the 10-fold cross-validated MSE of the linear model:

 
lm.fit <- glm(medv ~ ., data = data)
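The cross-validation call itself can be sketched as below, assuming the Boston data from MASS. The seed and the name cv.lm are assumptions; cv.glm() and its delta component come from the boot package.

```r
library(MASS)  # Boston data set
library(boot)  # cv.glm()

set.seed(200)  # assumed seed; the assignment of rows to folds is random
data <- Boston
lm.fit <- glm(medv ~ ., data = data)

# K = 10 requests 10-fold cross-validation; delta[1] is the raw
# cross-validated estimate of the prediction error (MSE by default)
cv.lm <- cv.glm(data, lm.fit, K = 10)$delta[1]
print(cv.lm)
```

cv.glm() needs a model fitted with glm(), which is why the linear model is fitted with glm() rather than lm().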

Note that I am splitting the data this way: 90% training set and 10% test set, chosen at random, 10 times. I also initialize a progress bar using the plyr library, because I want to keep an eye on the status of the process, since fitting the neural network may take a while.

After a while, the process is complete; we calculate the average MSE and plot the results as a box plot.
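The loop described here can be sketched as follows. To keep the snippet runnable with base R and MASS only, it makes two substitutions that are not the article's choices: a linear model stands in for the neural network, and the base R txtProgressBar replaces the plyr progress bar. The seed and the names k, train.cv, and test.cv are assumptions.

```r
library(MASS)  # Boston data set

set.seed(450)  # assumed seed for repeatable splits
data <- Boston
k <- 10
cv.error <- numeric(k)

# Base R progress bar, used here in place of the plyr one
pbar <- txtProgressBar(min = 0, max = k, style = 3)

for (i in 1:k) {
  # Fresh random 90% / 10% split on every iteration
  index <- sample(1:nrow(data), round(0.9 * nrow(data)))
  train.cv <- data[index, ]
  test.cv <- data[-index, ]

  # Stand-in model: a linear fit. A neuralnet() fit on scaled data,
  # followed by compute() and rescaling, would go here instead.
  fit <- glm(medv ~ ., data = train.cv)
  pred <- predict(fit, test.cv)

  cv.error[i] <- sum((test.cv$medv - pred)^2) / nrow(test.cv)
  setTxtProgressBar(pbar, i)
}
close(pbar)

print(mean(cv.error))
```

Because each iteration draws a new random split, this is repeated random sub-sampling rather than a partition into ten disjoint folds.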

 
mean(cv.error)
10.32697995
cv.error
17.640652805  6.310575067 15.769518577  5.730130820 10.520947119  6.121160840
 6.389967211  8.004786424 17.369282494  9.412778105

 

The above code outputs the boxplot:


The average MSE of the neural network (10.33) is lower than that of the linear model, although there appears to be some variation in the cross-validated MSEs. This may depend on how the data are split or on the random initialization of the network weights.

A final note on model interpretability

Neural networks resemble black boxes in many ways: explaining their results is much more difficult than explaining the results of a simpler model such as a linear model. Depending on the kind of application, you may therefore want to take this factor into account. Moreover, as seen above, extra care is needed when fitting a neural network, since small changes can lead to different results.
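The average and the box plot can be reproduced from the ten error values printed earlier. In this sketch the numbers are copied from that output; the plot labels and colours are assumptions.

```r
# Per-split test MSEs copied from the output listed earlier
cv.error <- c(17.640652805, 6.310575067, 15.769518577, 5.730130820,
              10.520947119, 6.121160840, 6.389967211, 8.004786424,
              17.369282494, 9.412778105)

print(mean(cv.error))   # the reported average, about 10.33
print(range(cv.error))  # smallest and largest MSE across the ten splits

# Horizontal box plot of the ten errors
boxplot(cv.error, xlab = "MSE CV", col = "cyan", border = "blue",
        names = "CV error (MSE)", main = "CV error (MSE) for NN",
        horizontal = TRUE)
```

The errors range from about 5.7 to about 17.6, which illustrates how strongly the result depends on the individual split.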

 

Thank you for reading this article. If you have any questions, please leave a comment below!

 

Have questions please contact us!

 

Big Data tribe  - Chinese professional third-party data service providers to provide customized one-stop data mining and statistical analysis consultancy services

Statistical analysis and data mining consulting services: y0.cn/teradat (Consulting Services, please contact the official website customer service )

Click here to send me a messageQQ:3025393450

QQ exchange group: 186 388 004 Big Data tribe

[Service] Scene  

Research; the company outsourcing; online and offline one training; data reptile collection; academic research; report writing; market research.

[Tribe] big data to provide customized one-stop data mining and statistical analysis consultancy

Welcome to elective our R language data analysis will be mining will know the course!

 


Origin www.cnblogs.com/tecdat/p/11522424.html