Original link: http://tecdat.cn/?p=6691
Neural networks have always been one of the most fascinating machine learning models, not only because of the elegant backpropagation algorithm, but also because of their complexity (think of deep learning with its many hidden layers) and their structure, which is inspired by the brain.
Neural networks have not always been popular, partly because they were, and in some cases still are, computationally expensive, and partly because they did not seem to produce better results than simpler methods such as support vector machines (SVMs). Nevertheless, neural networks have once again attracted attention and become popular.
In this article, we fit a neural network and a linear model and compare the two.
Data set
The Boston data set is a collection of data about housing values in the suburbs of Boston. Our goal is to predict the median value of owner-occupied homes (medv) using all the other available continuous variables.
First, we need to check whether any data points are missing; if so, we need to fix the data set.
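A minimal sketch of the missing-value check. It assumes the data come from the Boston data frame shipped with the MASS package; the names data and na_counts are chosen here for illustration.

```r
# Assumption: the Boston data set is taken from the MASS package.
library(MASS)
data <- Boston

# Count the missing values in each column.
# If every count is zero, there is nothing to fix.
na_counts <- sapply(data, function(x) sum(is.na(x)))
print(na_counts)
```

If any count were non-zero, the affected rows would have to be dropped or imputed before fitting.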
We then split the data at random into a training set and a test set, fit a linear regression model on the training set, and test it on the test set.
The sample(x, size) function simply returns a random sample of the specified size drawn from the vector x. By default, sampling is done without replacement.
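The split and the linear fit could look like the sketch below. The 75/25 split ratio, the seed, and the variable names are assumptions made for illustration, not values stated in the text.

```r
# Assumption: Boston comes from MASS; 75% of the rows go to the training set.
library(MASS)
data <- Boston

set.seed(500)  # assumed seed, only to make the split reproducible
# sample() draws the training row indices without replacement
index <- sample(1:nrow(data), round(0.75 * nrow(data)))
train <- data[index, ]
test  <- data[-index, ]

# Linear regression of medv on all other variables
# (glm() with the default gaussian family is equivalent to lm())
lm.fit <- glm(medv ~ ., data = train)
summary(lm.fit)

# Mean squared error on the held-out test set
pr.lm  <- predict(lm.fit, test)
MSE.lm <- sum((pr.lm - test$medv)^2) / nrow(test)
print(MSE.lm)
```

Using glm() rather than lm() is convenient because the fitted object can later be passed to cv.glm() for cross-validation.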
Preparing to fit the neural network
Before fitting a neural network, some preparation is needed. Neural networks are not easy to train and tune.
As a first step, we address data preprocessing.
We therefore scale and split the data before continuing:
Please note that scale() returns a matrix, which needs to be coerced to a data.frame.
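A sketch of the scaling step. Min-max scaling to the interval [0, 1] is an assumption here; the text only says that scale() is used. The names maxs, mins, scaled, train_ and test_ are illustrative.

```r
# Assumption: min-max scaling of every column to [0, 1].
library(MASS)
data <- Boston

maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)

# scale() subtracts `center` and divides by `scale`, column by column,
# and returns a matrix, so the result is coerced back to a data.frame.
scaled <- as.data.frame(scale(data, center = mins, scale = maxs - mins))

set.seed(500)  # assumed seed
index  <- sample(1:nrow(data), round(0.75 * nrow(data)))
train_ <- scaled[index, ]
test_  <- scaled[-index, ]
```

Keeping maxs and mins around makes it possible to map the predictions back to the original units later.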
Parameters
As far as I know, there is no fixed rule for how many layers and neurons to use, although several rules of thumb are more or less accepted. Usually one hidden layer, if any is needed at all, is enough for a large number of applications. As for the number of neurons, it should lie between the input layer size and the output layer size, typically around 2/3 of the input size.
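As a quick illustration of the 2/3 rule of thumb, the arithmetic for this data set is sketched below. It assumes the MASS version of Boston, which has 13 predictors and one response; the variable names are illustrative.

```r
# Assumption: Boston from MASS, 13 predictors plus the response medv.
library(MASS)

n_inputs  <- ncol(Boston) - 1  # every column except medv
n_outputs <- 1                 # a single regression output

# Rule of thumb: roughly 2/3 of the input size
suggested <- round(2 / 3 * n_inputs)
print(suggested)
```

This is only a starting point; the layer sizes still have to be tuned by trial and error.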
The hidden argument accepts a vector with the number of neurons in each hidden layer, and the linear.output argument specifies whether we want regression (linear.output=TRUE) or classification (linear.output=FALSE).
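A sketch of how the two arguments might be passed to neuralnet(). The layer sizes c(5, 3), the seed, the min-max scaling, and the larger stepmax are assumptions for illustration. The formula is built explicitly from the column names.

```r
# Assumptions: Boston from MASS, min-max scaling, two hidden layers (5 and 3).
library(MASS)
library(neuralnet)

data <- Boston
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
scaled <- as.data.frame(scale(data, center = mins, scale = maxs - mins))

set.seed(500)  # assumed seed
index  <- sample(1:nrow(data), round(0.75 * nrow(data)))
train_ <- scaled[index, ]

# Build the formula medv ~ crim + zn + ... from the column names
n <- names(train_)
f <- as.formula(paste("medv ~", paste(n[n != "medv"], collapse = " + ")))

# hidden: neurons per hidden layer; linear.output = TRUE for regression.
# stepmax is raised above its default to give the training room to converge.
nn <- neuralnet(f, data = train_, hidden = c(5, 3),
                linear.output = TRUE, stepmax = 1e6)
```

Setting linear.output = FALSE would instead apply the activation function to the output neuron, which is what a classification task needs.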
The neuralnet package provides a convenient tool for plotting the model:
This is a graphical representation of the model, with the weight on each connection:
The black lines show the connections between layers and the weight on each connection, while the blue lines show the bias term added at each step. The bias can be thought of as the intercept of a linear model.
Predicting medv using the neural network
Now we can try to predict the values for the test set and calculate the MSE.
Then we compare the two MSEs:
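The prediction and comparison could be sketched as follows. The split ratio, seed, layer sizes, and min-max scaling are assumptions, so the numbers printed will not necessarily match the author's. Because the network is trained on scaled data, its predictions are mapped back to the original medv units before the MSE is computed.

```r
# Assumptions: Boston from MASS, 75/25 split, min-max scaling, hidden = c(5, 3).
library(MASS)
library(neuralnet)

data <- Boston
set.seed(500)  # assumed seed
index <- sample(1:nrow(data), round(0.75 * nrow(data)))

# Linear model on the unscaled data
lm.fit <- glm(medv ~ ., data = data[index, ])
pr.lm  <- predict(lm.fit, data[-index, ])
MSE.lm <- mean((pr.lm - data$medv[-index])^2)

# Neural network on the scaled data
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
scaled <- as.data.frame(scale(data, center = mins, scale = maxs - mins))
train_ <- scaled[index, ]
test_  <- scaled[-index, ]
n <- names(train_)
f <- as.formula(paste("medv ~", paste(n[n != "medv"], collapse = " + ")))
nn <- neuralnet(f, data = train_, hidden = c(5, 3),
                linear.output = TRUE, stepmax = 1e6)

# Predict on the scaled test set, then undo the scaling of medv
pr.nn   <- predict(nn, test_[, n != "medv"])
range_y <- max(data$medv) - min(data$medv)
pr.nn_  <- pr.nn[, 1] * range_y + min(data$medv)
test.r  <- test_$medv * range_y + min(data$medv)
MSE.nn  <- mean((test.r - pr.nn_)^2)

print(c(linear = MSE.lm, network = MSE.nn))
```

Both errors are computed on the same test rows and in the same units, so they can be compared directly.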
Apparently, the network does a better job than the linear model at predicting medv. Once again, be careful, because this result depends on the particular train-test split used above. Below, after a visual plot, we run a quick cross-validation in order to be more confident about the results.
A first visual comparison of the performance of the network and the linear model on the test set is plotted below:
The following visual comparison may be more useful:
Cross-validation
Cross-validation is another very important step in building predictive models. Although there are different types of cross-validation, the basic idea is to repeat the train-test split, the model fit, and the evaluation on the test set a number of times.
Then, by calculating the average error, we can get a sense of how the model performs.
We implement a fast cross-validation using a for loop for the neural network and the cv.glm() function from the boot package for the linear model.
To my knowledge, R has no built-in function for cross-validating this kind of neural network; if you know of one, please let me know in the comments. Here is the 10-fold cross-validated MSE for the linear model:
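A minimal sketch of the linear-model cross-validation with cv.glm(). The seed and the variable names are assumptions; the data are assumed to be the Boston data frame from MASS.

```r
# Assumption: Boston from MASS; the seed is only for reproducibility.
library(MASS)
library(boot)

data <- Boston
set.seed(200)  # assumed seed

# cv.glm() needs a model fitted with glm()
lm.fit <- glm(medv ~ ., data = data)

# delta[1] is the raw K-fold cross-validation estimate of the prediction
# error, which is the mean squared error under the default cost function.
cv.lm <- cv.glm(data, lm.fit, K = 10)$delta[1]
print(cv.lm)
```

The second element of delta is a bias-adjusted version of the same estimate.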
Note that I split the data as follows: 90% goes to the training set and 10% to the test set, chosen at random, 10 times. I also initialize a progress bar with the plyr library, because I want to keep an eye on the status of the process, since fitting the neural networks may take a while.
After a while, the process completes; we then calculate the average MSE and plot the results as a boxplot.
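The loop described above can be sketched as follows. The seed, the layer sizes c(5, 3), the min-max scaling, and the plot labels are assumptions, so the average MSE obtained will not necessarily match the figure quoted in the text.

```r
# Assumptions: Boston from MASS, min-max scaling, hidden = c(5, 3).
library(MASS)
library(neuralnet)
library(plyr)

data <- Boston
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
scaled <- as.data.frame(scale(data, center = mins, scale = maxs - mins))
n <- names(scaled)
f <- as.formula(paste("medv ~", paste(n[n != "medv"], collapse = " + ")))
range_y <- max(data$medv) - min(data$medv)

set.seed(450)  # assumed seed
k <- 10
cv.error <- numeric(k)

# Text progress bar from plyr, advanced once per repetition
pbar <- create_progress_bar("text")
pbar$init(k)

for (i in 1:k) {
  # Random 90% / 10% split, redrawn on every repetition
  index    <- sample(1:nrow(data), round(0.9 * nrow(data)))
  train.cv <- scaled[index, ]
  test.cv  <- scaled[-index, ]

  nn <- neuralnet(f, data = train.cv, hidden = c(5, 3),
                  linear.output = TRUE, stepmax = 1e6)

  # Predict, then map predictions and targets back to the original units
  pr.nn     <- predict(nn, test.cv[, n != "medv"])[, 1] * range_y + min(data$medv)
  test.cv.r <- test.cv$medv * range_y + min(data$medv)

  cv.error[i] <- mean((test.cv.r - pr.nn)^2)
  pbar$step()
}
pbar$term()

# Average MSE over the repetitions, and its spread as a boxplot
print(mean(cv.error))
boxplot(cv.error, xlab = "MSE CV", col = "cyan", border = "blue",
        main = "CV error (MSE) for NN", horizontal = TRUE)
```

Strictly speaking, this is repeated random sub-sampling rather than a partition into 10 disjoint folds, which matches the splitting scheme described in the text.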
The above code outputs the boxplot:
The average MSE of the neural network (10.33) is lower than that of the linear model, although there appears to be some variation in the cross-validated MSEs. This may depend on how the data are split or on the random initialization of the network weights.
A final note on model interpretability
Neural networks resemble black boxes: explaining their results is much harder than explaining the results of a simpler model such as a linear model. Depending on the kind of application, you may therefore want to take this factor into account as well. In addition, as you have seen above, fitting a neural network requires extra care, because small changes can lead to different results.
Thank you for reading this article. If you have any questions, please leave a comment below!
Have questions please contact us!
Big Data tribe - Chinese professional third-party data service providers to provide customized one-stop data mining and statistical analysis consultancy services
Statistical analysis and data mining consulting services: y0.cn/teradat (Consulting Services, please contact the official website customer service )
[Service] Scene
Research; the company outsourcing; online and offline one training; data reptile collection; academic research; report writing; market research.
[Tribe] big data to provide customized one-stop data mining and statistical analysis consultancy
Welcome to elective our R language data analysis will be mining will know the course!