I do not know may not excel in, but alone will be able to excel Fun logistic regression? Some small partners really want someone to do something to this.
Today article, Xiaowen band we will go first to use Excel to implement a simple logistic regression model. Let the complex from the simple, step by step .
1, Base model
You should first try to achieve a logistic regression model Base, that is one-step update model. Here we use the iris data set. However, a total of three iris dataset, it FIG simple, using the first two data sets, the category flag is 0 setosa, another flag is 1, and selects each positive and negative samples 10. Excel data are as follows:
Then, set an initial estimate, we strive to make estimates between [-0.5, 0.5):
After each parameter such treatment, we get again copy and paste parameters, or else every move excel, parameters will follow refresh mode select Paste values only:
With samples and parameters, we can calculate the estimates, first review the estimates logistic regression (ie, predicted probability 1) is calculated:
In excel, to achieve similar such w * x, sumproduct required function, for a simple example:
The result is 3 * 2 * 2 + 3 + 4 + 5 * 4 * 5 = 54
Therefore, the calculation of the estimated value of the logistic regression, the following equations can be used in excel:
In this way, we will calculate the estimated value of each sample of the good:
The next step is to calculate a single sample loss, the following equation (a less negative sign):
In excel, using the following formula:
It can be seen here if joined a judgment, if the estimated value and the actual value is the same, the error is zero, if not this, what will happen:
So be sure to add the IF judge .
Well, this time, we have calculated the estimated value and the loss of a single sample of the good:
The total loss model is that the average value of a single sample loss.
The next task is to update the parameters by gradient descent. First, set a learning rate:
Here too the learning rate should not be set.
Then calculating the gradient, logistic regression, the gradient of each parameter is calculated as follows:
The above equation means is that, when the j-th parameters to be updated, for each sample i, we first calculate the difference between the estimated value and its actual value, multiplied by the sample i on the j-th feature value, each subsequent samples then calculated as the average value of the gradient of the j-th argument. We can split the average of two parts, one estimate value * characteristic part is the actual value * characteristic values, therefore, we have said before sumproduct function also sent handy gradient is calculated as follows:
Each parameter perform once we get the gradient of each parameter:
Next, we have to calculate the updated parameters, calculated as:
So, in excel, the same parameters are updated (in the figure below K11, the lock should be used $):
Next, should "post-update parameters" parameter corresponding to the line, copied to the "Parameter" the line. Direct Copy does not work, there will be the following questions:
We want to select Paste values only:
Have you ever noticed, but we paste the value in the past when the "gradient" and "post-update parameters" that line has changed, yes, because of changes in the parameters of the line, leading to the gradient, estimates, loss and so changed in this case the gradient is already the next round of the gradient.
and so! To achieve constantly updated, in fact, a crucial step is to "update parameter after" the line to copy only the value "parameters" that line, but we can not manually copy it, you want more convenient, then, is to turn it into a shortcuts, one-touch update! This time you record a macro function to come!
Then we can just set the shortcut command option + e + to keep the parameters can also be found, our loss is declining. Such a simple logistic regression process is realized!
A feature we now realize, it is also relatively simple, only to optimize the parameters through a single-step operation. Like a multi-step operation, regularization, early stop, draw the loss function, and so it has not been achieved.
Recommended reading:
How scored 10 algorithm engineers offer, not to be missed!
Python Data Analysis Learning Path personal summary
Python and algorithms Community
A good-looking point