Understanding the logic of a TensorFlow neural network
1. Review of the logical flow of training a neural network
1.1 Create neurons
For each neuron:
$$f_{\vec{w},b}(\vec{x}) = \vec{w}\cdot\vec{x} + b$$
For example, the activation of the first neuron in layer 1, where $g$ is the activation function (the sigmoid in this post):
$$a^{[1]}_1 = g\big(f_{\vec{w}^{[1]}_1,b^{[1]}_1}(\vec{x})\big)$$
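To make the notation concrete, here is a minimal NumPy sketch of one neuron and of a whole layer computed at once. The helper names `neuron` and `dense_layer`, the weight layout (one column per neuron) and all numbers are illustrative assumptions, and $g$ is taken to be the sigmoid.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(w, b, x):
    """One neuron: linear part w·x + b followed by the activation g."""
    return sigmoid(np.dot(w, x) + b)

def dense_layer(W, B, x):
    """A whole layer at once (hypothetical helper).

    W has shape (n_inputs, n_units): column j holds the weights of neuron j.
    B has shape (n_units,): one bias per neuron.
    Returns one activation per neuron.
    """
    return sigmoid(x @ W + B)

x = np.array([1.0, 3.0])
w1 = np.array([2.0, -1.0])
b1 = 0.5
a1 = neuron(w1, b1, x)  # z = 2*1 + (-1)*3 + 0.5 = -0.5, g(-0.5) ≈ 0.3775

W = np.array([[2.0, 0.0],
              [-1.0, 1.0]])
B = np.array([0.5, -3.0])
a = dense_layer(W, B, x)  # entry 0 equals a1; entry 1: z = 0*1 + 1*3 - 3 = 0, so g(0) = 0.5
```

Stacking the neurons' weight vectors as columns of `W` is what lets a layer be evaluated with a single matrix product instead of a loop over neurons.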
Related blog post links:
[Machine Learning] P11 Neural Network
[Machine Learning] P12 Forward Propagation
[Machine Learning] P14 Tensorflow User Guide Dense Sequential Tensorflow Implementation
1.2 Calculation of the loss value
The loss measures, for a single training sample, the difference between the value predicted by the output layer and the actual label. For binary classification with a sigmoid output, this is the binary cross-entropy (logistic) loss:
$$\text{loss} = -y\log\big(f_{\vec{w},b}(\vec{x})\big) - (1-y)\log\big(1-f_{\vec{w},b}(\vec{x})\big)$$
The cost $J$ is the average of the loss values over all $m$ training samples, where $\text{loss}^{[i]}$ is the loss of sample $i$:
$$J(\vec{w},b) = \frac{1}{m}\sum_{i=0}^{m-1}\text{loss}^{[i]}$$
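A quick worked example (numbers chosen purely for illustration, natural log) shows how the loss penalizes a confident wrong prediction much more than a confident correct one:

$$
\begin{aligned}
y = 1,\ f = 0.8:&\quad \text{loss} = -\log(0.8) \approx 0.223\\
y = 0,\ f = 0.8:&\quad \text{loss} = -\log(1 - 0.8) = -\log(0.2) \approx 1.609\\
m = 2:&\quad J = \tfrac{1}{2}(0.223 + 1.609) \approx 0.916
\end{aligned}
$$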
Related blog post links:
[Machine Learning] P6 Logistic Regression Loss Function and Gradient Descent
1.3 Training the model with gradient descent
Gradient descent iteratively updates the model parameters $\vec{w}$ and $b$ so that the cost $J(\vec{w},b)$ becomes as small as possible, i.e. the predictions become as accurate as possible. In each iteration both parameters are updated simultaneously:
Update $\vec{w}$ (for each component $w_j$):
$$w_j = w_j - \alpha \frac{\partial J(\vec{w},b)}{\partial w_j}$$
Update $b$:
$$b = b - \alpha \frac{\partial J(\vec{w},b)}{\partial b}$$
where $\alpha$ is the learning rate and:
$$\frac{\partial J(\vec{w},b)}{\partial w_j} = \frac{1}{m}\sum_{i=0}^{m-1}\big(f_{\vec{w},b}(\vec{x}^{[i]}) - y^{[i]}\big)x^{[i]}_j$$
$$\frac{\partial J(\vec{w},b)}{\partial b} = \frac{1}{m}\sum_{i=0}^{m-1}\big(f_{\vec{w},b}(\vec{x}^{[i]}) - y^{[i]}\big)$$
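The gradient expressions have the same form as for linear regression, which can look surprising. The missing step is the chain rule through the sigmoid: writing $z = \vec{w}\cdot\vec{x} + b$ and $f = g(z)$ with $g'(z) = f(1-f)$, the loss of one sample differentiates as follows (natural log):

$$
\begin{aligned}
\frac{\partial\,\text{loss}}{\partial f} &= -\frac{y}{f} + \frac{1-y}{1-f}\\
\frac{\partial\,\text{loss}}{\partial z} &= \frac{\partial\,\text{loss}}{\partial f}\, f(1-f) = -y(1-f) + (1-y)f = f - y\\
\frac{\partial\,\text{loss}}{\partial w_j} &= (f - y)\,x_j, \qquad \frac{\partial\,\text{loss}}{\partial b} = f - y
\end{aligned}
$$

Averaging these per-sample derivatives over the $m$ samples gives the expressions for $\partial J/\partial w_j$ and $\partial J/\partial b$.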
These formulas cover a single sigmoid neuron (logistic regression). In a multi-layer network, the gradients of every layer's parameters are computed with backpropagation, which will be described in detail in a later blog post. The link is as follows:
xxxxxxxx
2. Python implementation
Related blog posts: [Machine Learning] P9 implements a logistic regression case from beginning to end
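2.1 Create neurons
```python
import numpy as np

def sigmoid(z):  # activation g(z) = 1 / (1 + e^(-z))
    f_x = 1 / (1 + np.exp(-z))
    return f_x

# one neuron: w, x (arrays) and b (scalar) assumed defined elsewhere
z = np.dot(w, x) + b   # linear part w·x + b
f_x = sigmoid(z)       # activation
```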
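2.2 Calculation of the loss value
```python
import numpy as np

def compute_cost(X, y, w, b):
    """Average binary cross-entropy cost over all m training samples."""
    m = X.shape[0]
    cost = 0.
    for i in range(m):
        f_x_i = sigmoid(np.dot(w, X[i]) + b)  # prediction for sample i
        # the loss of sample i uses its own label y[i]
        loss = -y[i] * np.log(f_x_i) - (1 - y[i]) * np.log(1 - f_x_i)
        cost += loss
    cost = cost / m
    return cost
```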
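As an optional alternative (not the post's original code), the same cost can be computed with NumPy array operations instead of an explicit loop over samples. The name `compute_cost_vectorized` and the demo data are hypothetical; `sigmoid` is redefined so the snippet runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_vectorized(X, y, w, b):
    """Average binary cross-entropy, same result as the loop version.

    X: (m, n) samples, y: (m,) labels in {0, 1}, w: (n,), b: scalar.
    """
    f = sigmoid(X @ w + b)                              # predictions for all m samples
    losses = -y * np.log(f) - (1 - y) * np.log(1 - f)  # one loss per sample
    return losses.mean()                                # (1/m) * sum of the losses

X_demo = np.array([[0.5, 1.0], [2.0, -1.0], [-1.0, 0.0]])
y_demo = np.array([1.0, 0.0, 1.0])
print(compute_cost_vectorized(X_demo, y_demo, np.zeros(2), 0.0))  # every prediction is 0.5, so cost = ln 2 ≈ 0.6931
```

The vectorized form avoids Python-level loops, which matters once `m` grows beyond a few thousand samples.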
2.3 Training the model with gradient descent
```python
import numpy as np

def compute_gradient(X, y, w, b):
    """Gradients of the cost J with respect to w and b."""
    m = X.shape[0]
    dj_dw = np.zeros(w.shape)
    dj_db = 0.
    for i in range(m):
        f_wb = sigmoid(np.dot(w, X[i]) + b)
        err = f_wb - y[i]          # prediction error for sample i
        dj_db += err
        dj_dw += err * X[i]
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_dw, dj_db
```
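A common way to check that a gradient function matches the formulas is a numerical gradient check: compare the analytic gradient with central finite differences of the cost. This sketch uses a vectorized equivalent of `compute_gradient` on random data; the helper names, `eps` and the data are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b):
    f = sigmoid(X @ w + b)
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

def analytic_gradient(X, y, w, b):
    """Vectorized form of compute_gradient: (1/m) X^T (f - y) and mean(f - y)."""
    err = sigmoid(X @ w + b) - y
    return X.T @ err / X.shape[0], err.mean()

def numerical_gradient(X, y, w, b, eps=1e-6):
    """Central finite differences of the cost, one parameter at a time."""
    dj_dw = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        dj_dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * eps)
    dj_db = (cost(X, y, w, b + eps) - cost(X, y, w, b - eps)) / (2 * eps)
    return dj_dw, dj_db

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w = rng.normal(size=3)
b = 0.2

dw_a, db_a = analytic_gradient(X, y, w, b)
dw_n, db_n = numerical_gradient(X, y, w, b)
print(np.max(np.abs(dw_a - dw_n)), abs(db_a - db_n))  # both differences should be tiny
```

A large mismatch here usually points to a sign error or a wrong index in the gradient code.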
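```python
import math

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    J_history = []   # cost after each iteration
    w_history = []   # snapshots of w for later inspection
    for i in range(num_iters):
        # compute_gradient returns (dj_dw, dj_db) in that order
        dj_dw, dj_db = gradient_function(X, y, w_in, b_in)
        # simultaneous update of both parameters
        w_in = w_in - alpha * dj_dw   # alpha is the learning rate
        b_in = b_in - alpha * dj_db
        # record the cost (capped to avoid unbounded memory use on very long runs)
        if i < 100000:
            cost = cost_function(X, y, w_in, b_in)
            J_history.append(cost)
        # print progress about 10 times during the run, plus the final iteration
        if i % math.ceil(num_iters / 10) == 0 or i == (num_iters - 1):
            w_history.append(w_in)
            print(f"Iteration {i:4}: Cost {float(J_history[-1]):8.2f}")
    return w_in, b_in, J_history, w_history
```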
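To see the pieces working together, here is a compact, self-contained run on a tiny made-up dataset. It uses vectorized stand-ins for `compute_cost` and `compute_gradient` (same math, no Python loop) and an inline version of the update loop; the data, `alpha = 0.5` and 1000 iterations are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(X, y, w, b):
    f = sigmoid(X @ w + b)
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

def compute_gradient(X, y, w, b):
    err = sigmoid(X @ w + b) - y                 # f_wb(x^[i]) - y^[i] for every sample
    return X.T @ err / X.shape[0], err.mean()    # (dj_dw, dj_db)

# Made-up 1-feature dataset: label is 1 for positive x
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w, b = np.zeros(1), 0.0
alpha = 0.5
costs = []
for _ in range(1000):
    dj_dw, dj_db = compute_gradient(X, y, w, b)
    w = w - alpha * dj_dw
    b = b - alpha * dj_db
    costs.append(compute_cost(X, y, w, b))

pred = (sigmoid(X @ w + b) >= 0.5).astype(float)
print(f"w = {w}, b = {b:.3f}, final cost = {costs[-1]:.4f}")
print("training accuracy:", np.mean(pred == y))
```

Plotting `costs` against the iteration number is a quick way to confirm that the learning rate is small enough: the curve should decrease steadily.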
3. TensorFlow implementation
3.1 Create neurons
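```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(units=25, activation="sigmoid"),  # hidden layer 1: 25 neurons
    Dense(units=15, activation="sigmoid"),  # hidden layer 2: 15 neurons
    Dense(units=1, activation="sigmoid"),   # output layer: probability that y = 1
])
```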
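A minimal NumPy sketch of the forward pass this `Sequential` model performs may help connect it back to the single-neuron formula: each `Dense` layer applies $g(\vec{w}\cdot\vec{a}+b)$ for all of its units at once, and each layer's output feeds the next. The input size of 2 features, the random weights (not Keras's actual initializer) and the helper names `dense` and `forward` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(A_in, W, B):
    """One Dense layer with sigmoid: every unit computes g(w·a + b)."""
    return sigmoid(A_in @ W + B)

rng = np.random.default_rng(42)
n_features = 2            # assumed input size, for illustration only
layer_sizes = [25, 15, 1]

# Random parameters: W has shape (n_in, units), B has shape (units,)
params = []
n_in = n_features
for units in layer_sizes:
    params.append((rng.normal(scale=0.1, size=(n_in, units)), np.zeros(units)))
    n_in = units

def forward(X):
    A = X
    for W, B in params:
        A = dense(A, W, B)   # output of one layer is the input to the next
    return A

X = rng.normal(size=(4, n_features))   # 4 made-up samples
print(forward(X).shape)                 # (4, 1): one probability per sample
```

With 2 input features the three layers hold 75 + 390 + 16 = 481 parameters (weights plus biases), since a `Dense` layer has `n_in * units + units` parameters.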
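3.2 Calculation of the loss value
```python
from tensorflow.keras.losses import BinaryCrossentropy

model.compile(
    loss=BinaryCrossentropy(),  # the binary cross-entropy loss defined earlier
)  # no optimizer given, so Keras falls back to its default ("rmsprop")
```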
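3.3 Training the model with gradient descent
```python
model.fit(X, y, epochs=100)  # 100 passes over the training set, updating the parameters with gradient-based optimization
```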
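`model.fit` loops over the data for the given number of epochs; within each epoch it splits the training set into mini-batches (Keras's default `batch_size` is 32, and data is shuffled by default) and takes one optimizer step per batch, with gradients obtained by backpropagation. As a rough NumPy analogue, assuming plain gradient descent in place of the compiled optimizer and a single sigmoid unit instead of the 25-15-1 network, the loop looks like this (the helper name `fit_minibatch` and the data are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_minibatch(X, y, epochs=100, batch_size=32, alpha=0.1, seed=0):
    """Mini-batch gradient descent for one sigmoid unit (illustrative only)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for epoch in range(epochs):
        order = rng.permutation(m)                       # reshuffle once per epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            err = sigmoid(X[idx] @ w + b) - y[idx]       # f - y on this batch
            w -= alpha * X[idx].T @ err / len(idx)       # dJ/dw estimated on the batch
            b -= alpha * err.mean()                      # dJ/db estimated on the batch
    return w, b

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # made-up, linearly separable labels
w, b = fit_minibatch(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Updating on mini-batches gives many cheaper, noisier steps per epoch instead of one exact step over the full dataset.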