## Logistic Regression on MNIST dataset

Objective: implement a logistic regression algorithm and get an overall view of how this learner performs on the MNIST data.

## MNIST dataset

Data type: a handwritten-digit dataset with ten categories, 0 through 9. Each image is a $28\times28\times1$ grayscale image.

Dataset size: a training set of 60,000 samples and a test set of 10,000 samples.

Since there are 10 classes, we can use OvR (one-vs-rest) for multi-class classification, or use ECOC encoding for an MvM (many-vs-many) scheme.
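As a quick illustration of the OvR idea, each binary classifier only needs labels that mark one digit as positive and everything else as negative. This is a minimal sketch (the helper name `ovr_labels` is my own, not from the original post):

```python
import numpy as np

def ovr_labels(y, positive_class):
    """Binary labels for one OvR classifier: 1 for the chosen digit, 0 for the other nine."""
    return (y == positive_class).astype(np.float64)

y = np.array([0, 3, 3, 7, 1])
print(ovr_labels(y, 3))  # [0. 1. 1. 0. 0.]
```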

## Logistic regression implementation

### Logistic regression model

Logistic regression can be viewed as a single neural-network unit with a sigmoid activation function.

For a sample $\boldsymbol{x}_i$, our model consists of two parts, a linear part and an activation function:

$$z_i = \boldsymbol{w}^{T}\boldsymbol{x}_i + b,\qquad \hat{y}_i = \sigma(z_i) = \frac{1}{1+e^{-z_i}}$$

If $\hat{y}_i \leq 0.5$, the sample is classified as a negative example; if $\hat{y}_i > 0.5$, as a positive example.

$\hat{y}$ can be interpreted as the probability that the sample belongs to the positive class.
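The model above (linear part, sigmoid activation, and 0.5 threshold) can be sketched in NumPy; the function names here are illustrative, not from the original post:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Linear part z = Xw + b followed by the sigmoid activation."""
    return sigmoid(X @ w + b)

def predict(X, w, b, threshold=0.5):
    """Label 1 (positive) when the probability exceeds the threshold, else 0."""
    return (predict_proba(X, w, b) > threshold).astype(int)
```

With zero weights every sample gets probability exactly 0.5 and is therefore classified as negative, matching the convention above.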

### Loss function

The loss function (negative log-likelihood) for a single sample $\boldsymbol{x}_i$:

$$\ell_i = -\left[y_i\ln\hat{y}_i + (1-y_i)\ln(1-\hat{y}_i)\right]$$

When $y_i = 0$, i.e. $\boldsymbol{x}_i$ belongs to the negative class, the log loss is $-\ln(1-\hat{y}_i)$;
when $y_i = 1$, i.e. $\boldsymbol{x}_i$ belongs to the positive class, the log loss is $-\ln\hat{y}_i$.

The loss function over all $m$ training samples is the average of the single-sample losses:

$$J(\boldsymbol{w}, b) = \frac{1}{m}\sum_{i=1}^{m}\ell_i = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\ln\hat{y}_i + (1-y_i)\ln(1-\hat{y}_i)\right]$$
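This average cross-entropy is straightforward in NumPy. The `eps` clipping below is my own addition for numerical stability (it avoids `log(0)`), not something from the original post:

```python
import numpy as np

def log_loss(y, y_hat, eps=1e-12):
    """Average cross-entropy: -(1/m) * sum(y*ln(y_hat) + (1-y)*ln(1-y_hat))."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # keep predictions away from exactly 0 or 1
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```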

Now we want the parameters $(\boldsymbol{w}^{*}, b^{*})$ to satisfy

$$(\boldsymbol{w}^{*}, b^{*}) = \underset{(\boldsymbol{w}, b)}{\arg\min}\, J(\boldsymbol{w}, b)$$

Objective: to find the optimal parameters $\boldsymbol{w}^{*}$ and $b^{*}$, we minimize the objective function $J$.

• Initialize $\boldsymbol{w}$ and $b$ to zero.
The log loss is a convex function, so gradient descent can find the optimal solution from any initialization.

• Compute $\hat{\boldsymbol{y}}$ with the current $\boldsymbol{w}$ and $b$
• Compute the loss function
• Compute the partial derivatives of the loss with respect to $\boldsymbol{w}$ and $b$
• Take a gradient-descent step to obtain new $\boldsymbol{w}$, $b$
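For reference, the partial derivatives in the step above have the standard closed form for the cross-entropy loss combined with the sigmoid activation:

$$\frac{\partial J}{\partial \boldsymbol{w}} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)\,\boldsymbol{x}_i,\qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)$$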

(Of course, this is not necessarily a fully converged solution; we still need to tune some hyperparameters.)

$\hat{\boldsymbol{y}}$ is the predicted probability of belonging to the positive class.
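The training loop described above can be sketched as follows. This is a minimal batch-gradient-descent version; the function name `train_logreg` and the default learning rate and iteration count are illustrative choices, not values from the original post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent for binary logistic regression.

    X: (m, n) feature matrix; y: (m,) labels in {0, 1}.
    """
    m, n = X.shape
    w = np.zeros(n)   # zero initialization is fine: the loss is convex
    b = 0.0
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)      # forward pass with current w, b
        grad = y_hat - y                # dJ/dz for the cross-entropy loss
        w -= lr * (X.T @ grad) / m      # dJ/dw = (1/m) X^T (y_hat - y)
        b -= lr * grad.mean()           # dJ/db = (1/m) sum(y_hat - y)
    return w, b
```

On a linearly separable toy problem this loop quickly pushes the decision boundary between the two classes.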

After loading the MNIST dataset, we apply the OvR idea: train 10 classifiers, one per digit, and use them together to classify.

Each classifier takes one category as the positive class and the other nine categories as negative.
Thus the ratio of positive to negative examples is $1:9$. Although the number of positive examples is far less than the number of negative examples, this does not produce a class-imbalance problem, because this ratio roughly matches the distribution from which the real data is generated.

See complete code here


Origin www.cnblogs.com/petewell/p/11585149.html