Objective: implement the logistic regression algorithm and examine how the learner performs on the MNIST data.
MNIST dataset
Data type: a dataset of handwritten digits in ten classes, 0 to 9. Each image is a $28 \times 28 \times 1$ grayscale image.
Dataset size: a training set of 60,000 samples and a test set of 10,000 samples.
Since there are 10 classes, we can use a one-vs-rest (OvR) multi-class strategy, or a many-vs-many (MvM) strategy with ECOC encoding.
Logistic regression implementation
Logistic regression model
Logistic regression can be regarded as a single neural network unit with a sigmoid activation function.
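For a sample $\boldsymbol{x}_i$, our model consists of two parts, the linear part and the activation function:
$$z_i = \boldsymbol{w}^{T} \boldsymbol{x}_i + b$$
$$\hat{y}_i = \sigma(z_i) = \frac{1}{1 + e^{-z_i}}$$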
If $\hat{y}_i \leq 0.5$, the sample is classified as negative; if $\hat{y}_i > 0.5$, it is classified as positive.
$\hat{y}$ can be interpreted as the probability that the sample belongs to the positive class.
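As a minimal sketch of the model described above, the linear part and the sigmoid can be written in a few lines of NumPy. The helper names `sigmoid` and `forward` and the toy numbers are assumptions made for illustration; samples are stored as columns, matching the `X_train.shape[0]` convention used later.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, X):
    """Return y_hat of shape (1, m) for X of shape (n_features, m)."""
    z = np.dot(w.T, X) + b   # linear part
    return sigmoid(z)        # activation

# Toy input (made-up numbers): 2 features, 3 samples stored as columns.
X = np.array([[1.0, -2.0, 0.0],
              [0.5,  1.0, 0.0]])
w = np.array([[2.0], [-1.0]])
b = 0.0

y_hat = forward(w, b, X)
# Decision rule from the text: positive only when y_hat > 0.5.
labels = (y_hat > 0.5).astype(int)
```

The third sample has $z = 0$, so $\hat{y} = 0.5$ and it falls on the negative side of the rule.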
Loss function
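Loss function for a single sample $\boldsymbol{x}_i$ (negative log-likelihood):
$$L(\hat{y}_i, y_i) = -\left[ y_i \ln \hat{y}_i + (1 - y_i) \ln (1 - \hat{y}_i) \right]$$
When $y_i = 0$, i.e. $\boldsymbol{x}_i$ belongs to the negative class, the loss is $-\ln (1 - \hat{y}_i)$;
when $y_i = 1$, i.e. $\boldsymbol{x}_i$ belongs to the positive class, the loss is $-\ln \hat{y}_i$.
The loss function over all $m$ training samples:
$$J(\boldsymbol{w}, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}_i, y_i)$$
We now want the parameters $(\boldsymbol{w}^{*}, b^{*})$ to satisfy
$$(\boldsymbol{w}^{*}, b^{*}) = \arg\min_{\boldsymbol{w}, b} J(\boldsymbol{w}, b)$$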
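To make the loss concrete, here is a small sketch that evaluates the average log loss for two sets of predictions. The function name `compute_cost` and the clipping constant `eps` are assumptions of this sketch (clipping only guards against `log(0)`); they are not taken from the original code.

```python
import numpy as np

def compute_cost(y_hat, y, eps=1e-12):
    """Average log loss for predictions y_hat and labels y, both of shape (1, m)."""
    # eps is an assumption of this sketch: it keeps log() away from 0.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return float(np.mean(losses))

y = np.array([[1, 0, 1]])
mostly_right = np.array([[0.9, 0.1, 0.8]])   # probabilities close to the labels
mostly_wrong = np.array([[0.1, 0.9, 0.2]])   # probabilities far from the labels

cost_right = compute_cost(mostly_right, y)
cost_wrong = compute_cost(mostly_wrong, y)
```

Predictions that put high probability on the wrong class are penalized much more heavily, which is the behavior the per-sample cases above describe.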
Gradient descent
Goal: find the optimal parameters $\boldsymbol{w}^{*}$ and $b^{*}$ that minimize the objective function $J$.
Initialize $\boldsymbol{w}$ and $b$ to zero.
The log loss is a convex function, so gradient descent can find the optimal solution from any initial value.
w = np.zeros((X_train.shape[0], 1)) 
Run num_iterations steps of gradient descent:
Compute $\hat{\boldsymbol{y}}$ using the current $\boldsymbol{w}$ and $b$
Compute the loss function:
# compute the loss function
Compute the partial derivatives of the loss function with respect to w and b
# compute the gradients
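For the sigmoid model with the average log loss, the standard gradients are $\partial J / \partial \boldsymbol{w} = \frac{1}{m} X (\hat{\boldsymbol{y}} - \boldsymbol{y})^{T}$ and $\partial J / \partial b = \frac{1}{m} \sum_{i} (\hat{y}_i - y_i)$. The sketch below implements them and compares them against central finite differences as a sanity check. The function names and the random toy data are assumptions for illustration, and the loss is assumed to be averaged over the $m$ samples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, Y):
    """Average log loss for X of shape (n, m) and Y of shape (1, m)."""
    y_hat = sigmoid(np.dot(w.T, X) + b)
    return float(-np.mean(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat)))

def gradients(w, b, X, Y):
    """Analytic gradients of the average log loss."""
    m = X.shape[1]
    y_hat = sigmoid(np.dot(w.T, X) + b)
    dw = np.dot(X, (y_hat - Y).T) / m   # shape (n, 1)
    db = float(np.sum(y_hat - Y) / m)
    return dw, db

# Random toy problem (assumption: 4 features, 20 samples).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 20))
Y = (rng.random((1, 20)) > 0.5).astype(float)
w = rng.normal(size=(4, 1)) * 0.1
b = 0.05

dw, db = gradients(w, b, X, Y)

# Central finite differences, one coordinate at a time.
h = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = h
    dw_num[j, 0] = (cost(w + step, b, X, Y) - cost(w - step, b, X, Y)) / (2 * h)
db_num = (cost(w, b + h, X, Y) - cost(w, b - h, X, Y)) / (2 * h)
```

If the analytic and numerical values agree to several decimal places, the gradient code is very likely correct.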
Take a gradient descent step to obtain the new w and b
# gradient descent update
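After num_iterations steps of gradient descent we obtain our final $\boldsymbol{w}^{*}$ and $b^{*}$.
(Of course, this is not necessarily a fully converged solution; some hyperparameters still need to be tuned.)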
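Putting the four steps together, a training loop could look like the sketch below. The function name `optimize`, the clipping constant, the cost logging interval, and the synthetic data are assumptions made for illustration; the defaults for `num_iterations` and `learning_rate` mirror those in the model function signature.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def optimize(X, Y, num_iterations=2000, learning_rate=0.5):
    """Gradient descent for logistic regression; X is (n, m), Y is (1, m)."""
    n, m = X.shape
    w = np.zeros((n, 1))   # zero initialization
    b = 0.0
    costs = []
    for i in range(num_iterations):
        # 1. forward pass with the current w and b
        y_hat = sigmoid(np.dot(w.T, X) + b)
        # 2. loss (clipped so log() never sees exactly 0 or 1)
        y_safe = np.clip(y_hat, 1e-12, 1 - 1e-12)
        cost = -np.mean(Y * np.log(y_safe) + (1 - Y) * np.log(1 - y_safe))
        # 3. gradients
        dw = np.dot(X, (y_hat - Y).T) / m
        db = np.sum(y_hat - Y) / m
        # 4. update
        w = w - learning_rate * dw
        b = b - learning_rate * db
        if i % 100 == 0:
            costs.append(float(cost))   # record the cost every 100 steps
    return w, b, costs

# Synthetic, linearly separable data (assumption, not MNIST).
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)

w, b, costs = optimize(X, Y, num_iterations=500, learning_rate=0.5)
```

With zero initialization every $\hat{y}_i$ starts at 0.5, so the first recorded cost is $\ln 2$; watching `costs` decrease from there is a quick check that the learning rate is reasonable.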
Substituting $\boldsymbol{w}^{*}$ and $b^{*}$ into the model gives the final $\hat{\boldsymbol{y}}$
# predict y_hat on the training set
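$\hat{\boldsymbol{y}}$ is the probability of belonging to the positive class.
Using our threshold of 0.5, we can then obtain the final prediction (1 for the positive class, 0 for the negative class)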
y_prediction_train = np.zeros((1, m_train)) 
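The thresholding step can also be written without an explicit loop. The following is a vectorized sketch rather than the original code; the function name `predict`, the `threshold` parameter, and the toy values are assumptions for illustration.

```python
import numpy as np

def predict(w, b, X, threshold=0.5):
    """Return 0/1 predictions of shape (1, m) for X of shape (n, m)."""
    y_hat = 1.0 / (1.0 + np.exp(-(np.dot(w.T, X) + b)))
    # Strict inequality: y_hat equal to the threshold counts as negative.
    return (y_hat > threshold).astype(float)

# Toy parameters and data (made-up numbers).
w = np.array([[1.0], [1.0]])
b = -1.0
X = np.array([[2.0, 0.0, 0.5],
              [1.0, 0.0, 0.5]])
Y = np.array([[1.0, 0.0, 1.0]])

y_prediction = predict(w, b, X)
# Accuracy in percent: 100 minus the mean absolute error of the 0/1 labels.
accuracy = 100.0 - np.mean(np.abs(y_prediction - Y)) * 100.0
```

The third sample lands exactly on $\hat{y} = 0.5$, so it is predicted as negative and counted as an error against its label of 1.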
The complete model function is as follows
def (X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False): 
After importing the MNIST dataset, we train 10 classifiers following the one-vs-rest (OvR) idea and use them to classify the digits.
Each classifier treats one class as positive and the other nine classes as negative.
Thus $\text{positive} : \text{negative} = 1 : 9$. Although the positive examples are far fewer than the negative ones, this does not cause a class-imbalance problem, because this ratio is essentially the ratio in the true distribution that generates the data.
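The one-vs-rest scheme can be sketched as follows: train one binary classifier per class, then assign each sample to the class whose classifier reports the highest probability. The helper names (`train_binary`, `train_ovr`, `predict_ovr`), the argmax combination rule, the hyperparameters, and the synthetic three-class data standing in for MNIST are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, Y, num_iterations=500, learning_rate=0.1):
    """Plain gradient descent for one binary classifier; X is (n, m), Y is (1, m)."""
    n, m = X.shape
    w, b = np.zeros((n, 1)), 0.0
    for _ in range(num_iterations):
        y_hat = sigmoid(np.dot(w.T, X) + b)
        w -= learning_rate * np.dot(X, (y_hat - Y).T) / m
        b -= learning_rate * np.sum(y_hat - Y) / m
    return w, b

def train_ovr(X, labels, num_classes):
    """Train one classifier per class: class k is positive, all others negative."""
    W = np.zeros((X.shape[0], num_classes))
    B = np.zeros(num_classes)
    for k in range(num_classes):
        Y_k = (labels == k).astype(float).reshape(1, -1)
        w, b = train_binary(X, Y_k)
        W[:, k], B[k] = w[:, 0], b
    return W, B

def predict_ovr(W, B, X):
    """Pick the class whose classifier gives the highest probability."""
    scores = sigmoid(np.dot(W.T, X) + B.reshape(-1, 1))   # (num_classes, m)
    return np.argmax(scores, axis=0)

# Synthetic stand-in for MNIST: 3 well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[4.0, 0.0], [-4.0, 0.0], [0.0, 4.0]])
labels = np.repeat(np.arange(3), 50)
X = (centers[labels] + rng.normal(scale=0.5, size=(150, 2))).T   # (2, 150)

W, B = train_ovr(X, labels, num_classes=3)
accuracy = np.mean(predict_ovr(W, B, X) == labels)
```

For MNIST the same structure would apply with `num_classes=10` and each image flattened into a column of 784 pixel values.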
See complete code here
Original: Big Box Logistic Regression on MNIST dataset