Gradient descent (SSE)
Backpropagation
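A minimal sketch of one gradient-descent step on the SSE error, with the backpropagated error term, for a single sigmoid unit; x, y, w, and learnrate are illustrative values, not project code:

```python
import numpy as np

# One gradient-descent step on the SSE error E = 1/2 * sum((y - y_hat)**2)
# for a single sigmoid unit. x, y, w, learnrate are made-up example values.
x = np.array([1.0, 2.0, 3.0])
y = 0.5
w = np.array([0.5, -0.5, 0.3])
learnrate = 0.5

sigmoid = lambda h: 1 / (1 + np.exp(-h))

h = np.dot(x, w)                          # input to the unit
y_hat = sigmoid(h)                        # prediction
error = y - y_hat                         # from dE/dy_hat
error_term = error * y_hat * (1 - y_hat)  # backpropagated: multiplied by sigmoid'(h)
w += learnrate * error_term * x           # gradient-descent weight update
```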
Regularization
Large weights make the activation too steep, so the model becomes overly confident and easily overfits.
The fix is to add a penalty term on large weights to the error function, with lambda as the penalty coefficient (a sketch of both penalties follows the L1/L2 notes below).
L1 and L2 regularization
L1 regularization:
- Induces sparsity: it shrinks the weights and drives the smaller ones to exactly 0
- Can therefore perform feature selection, zeroing out the weights of unimportant features
L2 regularization:
- Not sparse: it keeps all weights small without zeroing them (it prefers weight vectors with a smaller sum of squares)
- Usually gives better results when training models
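A minimal sketch of the two penalties, assuming an SSE base loss; the function and argument names are illustrative:

```python
import numpy as np

# Illustrative only: SSE error plus an L1 or L2 penalty on the weights,
# with `lam` as the penalty coefficient lambda.
def loss_l1(y, y_hat, w, lam):
    return 0.5 * np.sum((y - y_hat) ** 2) + lam * np.sum(np.abs(w))

def loss_l2(y, y_hat, w, lam):
    return 0.5 * np.sum((y - y_hat) ** 2) + lam * np.sum(w ** 2)
```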
Dropout
Dropout takes a single parameter: the probability of randomly switching each node off during each training pass. This keeps nodes with large weights from dominating training while nodes with small weights never get trained effectively.
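A minimal sketch of the idea, assuming the common "inverted dropout" formulation (the note above parameterizes by the drop probability; this sketch uses the keep probability):

```python
import numpy as np

# Illustrative inverted dropout: each node stays active with probability
# keep_prob; dropped nodes are zeroed, and survivors are rescaled so the
# expected activation is unchanged.
def dropout(activations, keep_prob=0.8):
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob
```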
Local minima
- Restart gradient descent from a new random initialization (random restarts); see the sketch below
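A minimal sketch of random restarts with a toy stand-in for training; `train_and_score` and its toy loss are assumptions for illustration:

```python
import numpy as np

# Illustrative random restarts: train from several random initializations
# and keep the run that ends with the lowest loss.
def train_and_score(seed):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=3)  # random initialization
    loss = np.sum(w ** 2)   # toy stand-in for the loss after training
    return w, loss

best_w, best_loss = min((train_and_score(s) for s in range(10)), key=lambda t: t[1])
```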
Vanishing gradient
When the weights are large, the sigmoid saturates and its gradient is approximately 0, so the weight updates vanish. One fix is to switch to another activation function: 1. ReLU; 2. tanh.
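A quick numeric check of the saturation effect (illustrative values):

```python
import numpy as np

# At a large pre-activation the sigmoid saturates while ReLU does not.
h = 10.0
sig = 1 / (1 + np.exp(-h))
print(sig * (1 - sig))        # sigmoid'(10) ~ 4.5e-05: the gradient vanishes
print(1.0 if h > 0 else 0.0)  # ReLU'(10) = 1: the gradient passes through
```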
Batching and stochastic gradient descent
When the dataset is large, split it into multiple batches and use stochastic (mini-batch) gradient descent: many slightly imprecise steps work better than a single precise step.
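A minimal sketch of splitting a dataset into shuffled mini-batches; the function name and batch size are illustrative:

```python
import numpy as np

# Each yielded batch drives one (noisy) gradient-descent step.
def iterate_minibatches(features, targets, batch_size=128):
    idx = np.random.permutation(len(features))
    for start in range(0, len(features), batch_size):
        batch = idx[start:start + batch_size]
        yield features[batch], targets[batch]
```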
Momentum
Momentum is a constant β in (0, 1) that weights the previous steps: the step from n iterations ago is weighted by β^n, so the most recent steps matter most and older steps fade away.
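A minimal sketch of the update on a toy quadratic loss; beta, learnrate, and the loss are illustrative:

```python
import numpy as np

# Momentum as a running step: unrolling the recurrence gives the current
# gradient weight 1, the previous step weight beta, the one before beta**2, ...
beta, learnrate = 0.9, 0.1
w = np.array([1.0, -2.0])
step = np.zeros_like(w)
for _ in range(100):
    grad = 2 * w               # gradient of the toy loss sum(w**2)
    step = grad + beta * step  # recent steps count more, old ones fade
    w -= learnrate * step
```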
Hands-on project: predicting bike-sharing ridership
```python
import numpy as np

class NeuralNetwork(object):
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights from a normal distribution scaled by 1/sqrt(n).
        self.weights_input_to_hidden = np.random.normal(0.0, self.input_nodes**-0.5,
                                                        (self.input_nodes, self.hidden_nodes))
        self.weights_hidden_to_output = np.random.normal(0.0, self.hidden_nodes**-0.5,
                                                         (self.hidden_nodes, self.output_nodes))
        self.lr = learning_rate

        # Sigmoid activation for the hidden layer.
        self.activation_function = lambda x: 1 / (1 + np.exp(-x))

    def train(self, features, targets):
        ''' Train the network on a batch of features and targets.

            Arguments
            ---------
            features: 2D array, each row is one data record, each column is a feature
            targets: 1D array of target values
        '''
        n_records = features.shape[0]
        delta_weights_i_h = np.zeros(self.weights_input_to_hidden.shape)
        delta_weights_h_o = np.zeros(self.weights_hidden_to_output.shape)
        for X, y in zip(features, targets):
            ### Forward pass ###
            hidden_inputs = np.dot(X, self.weights_input_to_hidden)   # signals into hidden layer
            hidden_outputs = self.activation_function(hidden_inputs)  # signals from hidden layer

            final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)  # signals into final output layer
            final_outputs = final_inputs  # identity activation on the output layer (regression)

            ### Backward pass ###
            # Output layer error is the difference between desired target and actual output.
            error = y - final_outputs
            # The hidden layer's contribution to the error.
            hidden_error = np.dot(self.weights_hidden_to_output, error)

            # Backpropagated error terms: the output activation is the identity,
            # so its derivative is 1; the sigmoid's derivative is f(h) * (1 - f(h)).
            output_error_term = error * 1
            hidden_error_term = hidden_error * hidden_outputs * (1 - hidden_outputs)

            # Weight step (input to hidden)
            delta_weights_i_h += hidden_error_term * X[:, None]
            # Weight step (hidden to output)
            delta_weights_h_o += output_error_term * hidden_outputs[:, None]

        # Gradient descent step, averaging the accumulated steps over the batch.
        self.weights_hidden_to_output += self.lr * delta_weights_h_o / n_records  # update hidden-to-output weights
        self.weights_input_to_hidden += self.lr * delta_weights_i_h / n_records   # update input-to-hidden weights
```
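The excerpt ends with train; the project template also defines a run method for inference. The sketch below is an assumption, reconstructed to mirror the forward pass in train rather than copied from the project, and belongs inside the class:

```python
    # Assumed inference method (not in the excerpt above): it repeats the
    # forward pass from train() without the backward pass.
    def run(self, features):
        ''' Run a forward pass through the network with the input features. '''
        hidden_inputs = np.dot(features, self.weights_input_to_hidden)
        hidden_outputs = self.activation_function(hidden_inputs)
        final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)
        return final_inputs  # identity activation on the output layer
```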