## Logistic Regression on MNIST dataset

Objective: implement a logistic regression algorithm and get an overall view of how this learner performs on the MNIST data.

## MNIST dataset

Data type: a handwritten-digit dataset with ten categories, 0 through 9. Each image is a $28\times28\times1$ grayscale image.

Dataset size: a training set of 60,000 samples and a test set of 10,000 samples.

Since there are 10 classes, we can use OvR (one-vs-rest) for multi-class classification, or use ECOC encoding for an MvM (many-vs-many) scheme.
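As a quick illustration of the OvR idea, each binary classifier only needs labels that mark one digit as positive and everything else as negative. This is a minimal sketch (the helper name `ovr_labels` is my own, not from the original post):

```python
import numpy as np

def ovr_labels(y, positive_class):
    """Binary labels for one OvR classifier: 1 for the chosen digit, 0 for the other nine."""
    return (y == positive_class).astype(np.float64)

y = np.array([0, 3, 3, 7, 1])
print(ovr_labels(y, 3))  # [0. 1. 1. 0. 0.]
```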

## Logistic regression implementation

### Logistic regression model

Logistic regression can be viewed as a single neural-network unit with a sigmoid activation function.

For a sample $\boldsymbol{x}_i$, our model consists of two parts, a linear part and an activation function:

$$z_i = \boldsymbol{w}^{T}\boldsymbol{x}_i + b,\qquad \hat{y}_i = \sigma(z_i) = \frac{1}{1+e^{-z_i}}$$

If $\hat{y}_i \leq 0.5$, the sample is classified as a negative example; if $\hat{y}_i > 0.5$, as a positive example.

$\hat{y}$ can be interpreted as the probability that the sample belongs to the positive class.
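The model above (linear part, sigmoid activation, and 0.5 threshold) can be sketched in NumPy; the function names here are illustrative, not from the original post:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Linear part z = Xw + b followed by the sigmoid activation."""
    return sigmoid(X @ w + b)

def predict(X, w, b, threshold=0.5):
    """Label 1 (positive) when the probability exceeds the threshold, else 0."""
    return (predict_proba(X, w, b) > threshold).astype(int)
```

With zero weights every sample gets probability exactly 0.5 and is therefore classified as negative, matching the convention above.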

### Loss function

The loss function (negative log-likelihood) for a single sample $\boldsymbol{x}_i$:

$$\ell_i = -\left[y_i\ln\hat{y}_i + (1-y_i)\ln(1-\hat{y}_i)\right]$$

When $y_i = 0$, i.e. $\boldsymbol{x}_i$ belongs to the negative class, the log loss is $-\ln(1-\hat{y}_i)$;
when $y_i = 1$, i.e. $\boldsymbol{x}_i$ belongs to the positive class, the log loss is $-\ln\hat{y}_i$.

The loss function over all $m$ training samples is the average of the single-sample losses:

$$J(\boldsymbol{w}, b) = \frac{1}{m}\sum_{i=1}^{m}\ell_i = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\ln\hat{y}_i + (1-y_i)\ln(1-\hat{y}_i)\right]$$
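This average cross-entropy is straightforward in NumPy. The `eps` clipping below is my own addition for numerical stability (it avoids `log(0)`), not something from the original post:

```python
import numpy as np

def log_loss(y, y_hat, eps=1e-12):
    """Average cross-entropy: -(1/m) * sum(y*ln(y_hat) + (1-y)*ln(1-y_hat))."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # keep predictions away from exactly 0 or 1
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```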

Now we want the parameters $(\boldsymbol{w}^{*}, b^{*})$ to satisfy

$$(\boldsymbol{w}^{*}, b^{*}) = \underset{(\boldsymbol{w}, b)}{\arg\min}\, J(\boldsymbol{w}, b)$$

Objective: to find the optimal parameters $\boldsymbol{w}^{*}$ and $b^{*}$, we minimize the objective function $J$.

• Initialize $\boldsymbol{w}$ and $b$ to zero.
The log loss is a convex function, so gradient descent can find the optimal solution from any initialization.

• Compute $\hat{\boldsymbol{y}}$ with the current $\boldsymbol{w}$ and $b$
• Compute the loss function
• Compute the partial derivatives of the loss with respect to $\boldsymbol{w}$ and $b$
• Take a gradient-descent step to obtain new $\boldsymbol{w}$, $b$
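For reference, the partial derivatives in the step above have the standard closed form for the cross-entropy loss combined with the sigmoid activation:

$$\frac{\partial J}{\partial \boldsymbol{w}} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)\,\boldsymbol{x}_i,\qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)$$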

(Of course, this is not necessarily a fully converged solution; we still need to tune some hyperparameters.)

$\hat{\boldsymbol{y}}$ is the predicted probability of belonging to the positive class.
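The training loop described above can be sketched as follows. This is a minimal batch-gradient-descent version; the function name `train_logreg` and the default learning rate and iteration count are illustrative choices, not values from the original post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent for binary logistic regression.

    X: (m, n) feature matrix; y: (m,) labels in {0, 1}.
    """
    m, n = X.shape
    w = np.zeros(n)   # zero initialization is fine: the loss is convex
    b = 0.0
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)      # forward pass with current w, b
        grad = y_hat - y                # dJ/dz for the cross-entropy loss
        w -= lr * (X.T @ grad) / m      # dJ/dw = (1/m) X^T (y_hat - y)
        b -= lr * grad.mean()           # dJ/db = (1/m) sum(y_hat - y)
    return w, b
```

On a linearly separable toy problem this loop quickly pushes the decision boundary between the two classes.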

After loading the MNIST dataset, we apply the OvR idea: train 10 classifiers, one per digit, and use them together to classify.

Each classifier takes one category as the positive class and the other nine categories as negative.
Thus the ratio of positive to negative examples is $1:9$. Although the number of positive examples is far less than the number of negative examples, this does not produce a class-imbalance problem, because this ratio roughly matches the distribution from which the real data is generated.

See complete code here


Origin www.cnblogs.com/petewell/p/11585149.html