tensorflow(2): neural network optimization (loss, learning_rate)

Case: predict the daily sales of yogurt so that the right amount can be prepared, keeping the loss small (and the profit large). Let the sales volume be y_ and assume two factors, x1 and x2, influence it.
In practice the daily x1, x2 and actual sales y_ would be collected in advance. Here we fake a data set X, Y_ by assuming y_ = x1 + x2 and adding a small noise term in (-0.05, 0.05) to make it more realistic.

import tensorflow as tf
import numpy as np

batch_size = 8   # how much data to feed the neural network at a time
seed = 23455

# Construct the data set
rdm = np.random.RandomState(seed)   # generate random numbers based on the seed
X = rdm.rand(32, 2)                 # 32 samples
# rdm.rand()/10.0 is a random number in (0, 0.1), so the noise is in (-0.05, 0.05)
Y_ = [[x1 + x2 + (rdm.rand()/10.0 - 0.05)] for (x1, x2) in X]
print('X:\n', X)
print('Y_:\n', Y_)

# Define the inputs, parameters and output, i.e. the forward propagation process
x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))   # actual sales (label)
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)

# Define the loss function and the back propagation method
loss_mse = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)
# train_step = tf.train.MomentumOptimizer(0.001, 0.9).minimize(loss_mse)   # other optimizers
# train_step = tf.train.AdamOptimizer(0.001).minimize(loss_mse)

# Generate the session and train
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    print('w1:\n', sess.run(w1))   # output the current (untrained) parameter values
    # train the model
    steps = 20000   # train for 20000 steps
    for i in range(steps):
        start = (i * batch_size) % 32
        end = start + batch_size
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})   # feed 8 samples per step
        if i % 500 == 0:   # print w1 every 500 steps
            print('After %d training steps, w1 is:' % i)
            print(sess.run(w1), '\n')
    print('Final w1 is:', sess.run(w1))
    
The result shows w1 ≈ [0.98, 1.015], which is consistent with Y_ = x1 + x2, so the prediction is correct.
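Once training is done, the fitted w1 can be used for prediction inside the same session. A minimal sketch (the input values [[0.3, 0.5]] are made up for illustration and are not part of the original code):

# Sketch: run the forward pass with the trained w1, still inside the `with tf.Session()` block above.
# The input [[0.3, 0.5]] is an illustrative made-up sample.
pred = sess.run(y, feed_dict={x: [[0.3, 0.5]]})
print('predicted sales for x1=0.3, x2=0.5:', pred)   # should be close to 0.8, since y_ ≈ x1 + x2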

1. Loss optimization

In the example above the loss function is the mean squared error. In practice, however, the loss caused by the gap between the predicted sales volume y (i.e. how much to prepare) and the actual sales volume y_ depends on the production cost (cost) and the sales profit (profit):

predicting too much wastes the cost, predicting too little loses the profit.

So the loss has to be customized here. The code above stays the same; only the loss definition needs to be modified, as sketched below.
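Written out in plain Python (a sketch of the piecewise idea that the TensorFlow code below expresses with tf.where and tf.greater; the helper name custom_loss is made up for illustration):

# Plain-Python sketch of the piecewise loss; custom_loss is an illustrative helper, not TF code.
def custom_loss(y_pred, y_true, cost=1, profit=9):
    if y_pred > y_true:                      # over-prediction: waste the production cost
        return cost * (y_pred - y_true)
    else:                                    # under-prediction: lose the sales profit
        return profit * (y_true - y_pred)

print(custom_loss(1.2, 1.0))   # ~0.2: over-predict by 0.2, lose 0.2 of cost
print(custom_loss(0.8, 1.0))   # ~1.8: under-predict by 0.2, lose 1.8 of profit

Because under-prediction is penalized nine times as heavily here, the trained model will tend to predict on the high side.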

 

batch_size = 8   # how much data to feed the neural network at a time
seed = 23455
cost = 1      # loss per unit of over-prediction (production cost)
profit = 9    # loss per unit of under-prediction (missed sales profit)

# Construct the data set
rdm = np.random.RandomState(seed)   # generate random numbers based on the seed
X = rdm.rand(32, 2)                 # 32 samples
# rdm.rand()/10.0 is a random number in (0, 0.1), so the noise is in (-0.05, 0.05)
Y_ = [[x1 + x2 + (rdm.rand()/10.0 - 0.05)] for (x1, x2) in X]
print('X:\n', X)
print('Y_:\n', Y_)

# Define the inputs, parameters and output, i.e. the forward propagation process
x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))   # actual sales (label)
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)

# Define the custom loss and the back propagation method
# tf.where(condition, a, b) returns a where the condition is true and b where it is false,
# so over-prediction (y > y_) is penalized by cost and under-prediction by profit
loss = tf.reduce_sum(tf.where(tf.greater(y, y_), cost * (y - y_), profit * (y_ - y)))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)
# train_step = tf.train.MomentumOptimizer(0.001, 0.9).minimize(loss)   # other optimizers
# train_step = tf.train.AdamOptimizer(0.001).minimize(loss)


# Generate the session and train
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    print('w1:\n', sess.run(w1))   # output the current (untrained) parameter values
    # train the model
    steps = 20000   # train for 20000 steps
    for i in range(steps):
        start = (i * batch_size) % 32
        end = start + batch_size
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})   # feed 8 samples per step
        if i % 500 == 0:   # print w1 every 500 steps
            print('After %d training steps, w1 is:' % i)
            print(sess.run(w1), '\n')
    print('Final w1 is:', sess.run(w1))

The result is w1 ≈ [1.02, 1.04]; both coefficients are greater than 1. This is because the loss from over-predicting is smaller than the loss from under-predicting, so the model learns to predict on the high side.

If instead cost = 9 and profit = 1, the final parameters come out as w1 ≈ [0.96, 0.97], both less than 1.

Besides the mean squared error and custom losses, there is a third loss: cross entropy.

Cross entropy (ce) measures the distance between two probability distributions: the larger the ce, the larger the distance between the predicted distribution and the true one.

 

# Cross entropy ce
ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))

# For a classifier with n outputs (n classes), combine softmax with cross entropy:
ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
cem = tf.reduce_mean(ce)
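As a quick numeric check of the first formula, here is a sketch with a made-up two-class one-hot label and two candidate predictions (plain numpy, using a sum instead of a mean so the value is the textbook cross entropy):

import numpy as np

# Sketch: cross entropy of a made-up one-hot label against two candidate predictions.
def ce(y_true, y_pred):
    return -np.sum(y_true * np.log(np.clip(y_pred, 1e-12, 1.0)))

y_true = np.array([1.0, 0.0])
print(ce(y_true, np.array([0.6, 0.4])))   # about 0.51
print(ce(y_true, np.array([0.8, 0.2])))   # about 0.22 -- smaller ce, so (0.8, 0.2) is closer to the label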

2. Learning rate (learning_rate) adjustment

Set learning_rate = 0.2:

w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)   # define the loss function
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):   # iterate 40 times
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print('after %d steps, w is %f, loss is %f.' % (i, w_val, loss_val))

With learning_rate = 0.2, w approaches -1 (the minimum of (w + 1)^2). If learning_rate = 0.001, w still tends to -1 but very slowly; if learning_rate = 1, w oscillates and never settles at -1.

So the learning rate can be neither too large nor too small.
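A quick hand calculation makes this concrete. For loss = (w + 1)^2 the gradient is 2(w + 1), so gradient descent updates w ← w − lr · 2(w + 1). The sketch below (plain Python, not part of the original code) traces a few updates for the three learning rates discussed above:

# Sketch: trace the update w <- w - lr * 2 * (w + 1) for loss = (w + 1)^2, starting from w = 5.
def trace_updates(lr, steps=4, w=5.0):
    values = [w]
    for _ in range(steps):
        w = w - lr * 2 * (w + 1)
        values.append(round(w, 4))
    return values

print(trace_updates(0.2))     # [5.0, 2.6, 1.16, 0.296, -0.2224] -- steadily approaches -1
print(trace_updates(0.001))   # [5.0, 4.988, 4.976, ...]         -- moves toward -1 only very slowly
print(trace_updates(1.0))     # [5.0, -7.0, 5.0, -7.0, 5.0]      -- overshoots and oscillates forever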


# The learning rate determines how big each parameter update is.
# It can be decayed dynamically according to how many rounds have already been run:
#   learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ^ (global_step / LEARNING_RATE_STEP)
# LEARNING_RATE_BASE is the initial learning rate; LEARNING_RATE_DECAY is the decay rate, in (0, 1);
# LEARNING_RATE_STEP is how many rounds pass between learning-rate updates,
# usually total number of samples / batch_size.
global_step = tf.Variable(0, trainable=False)   # counts how many batches have been run so far
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,
                                           global_step,
                                           LEARNING_RATE_STEP,
                                           LEARNING_RATE_DECAY,
                                           staircase=True)
# staircase=True decays the learning rate in discrete steps (a staircase); False gives a smooth decay curve
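Plugging in the constants used in the example below (base 0.1, decay 0.9, step 1), the decay formula works out as follows (a plain-Python sketch; with staircase=True the exponent global_step / LEARNING_RATE_STEP is truncated to an integer):

# Sketch: what the decay formula evaluates to for the first few values of global_step.
LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.9
LEARNING_RATE_STEP = 1

for global_step in range(5):
    lr = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step // LEARNING_RATE_STEP)
    print(global_step, lr)   # roughly 0.1, 0.09, 0.081, 0.0729, 0.06561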
# A complete example: decay the learning rate dynamically according to the number of rounds run
LEARNING_RATE_BASE = 0.1    # initial learning rate
LEARNING_RATE_DECAY = 0.9   # learning rate decay rate
LEARNING_RATE_STEP = 1      # how many batches between learning-rate updates,
                            # usually total samples / batch_size; set to 1 here for convenience

global_step = tf.Variable(0, trainable=False)   # counts how many batches have been run so far
# define the exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,
                                           global_step,
                                           LEARNING_RATE_STEP,
                                           LEARNING_RATE_DECAY,
                                           staircase=True)

# parameter to optimize, initial value 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)   # define the loss function
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):   # iterate 40 times
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print('after %d steps, global_step is %f, w is %f, loss is %f.' % (i, global_step_val, w_val, loss_val))

result:

 
