The principle of the gradient descent method, with Python pseudocode

Gradient Descent

The blogger's understanding: for y = wx + b, if the gap between the prediction y and the real value is too large, it means the current w is contributing too much error; at this point we use the gradient to correct it. (a simple intuition)

What is a loss function

A loss function is used to measure the performance of a model on given data. It quantifies the error between the predicted value and the expected value and expresses it as a single real number. The whole process: initialize w, compute the predicted value y, then compute the loss; to minimize the loss, use the gradient descent method to adjust the parameter w.
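To make the loss concrete, here is a minimal sketch of the mean squared error (MSE) in NumPy; the data and the initial w, b are made up for illustration:

import numpy as np

def mse_loss(y_pred, y_true):
    # Average squared gap between prediction and target, as a single real number
    return np.mean((y_pred - y_true) ** 2)

x = np.array([1.0, 2.0, 3.0])
y_true = np.array([2.0, 4.0, 6.0])   # the "real" values
w, b = 0.5, 0.0                      # initial guess for the parameters
y_pred = w * x + b
print(mse_loss(y_pred, y_true))      # a large loss means w, b are still far off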

So what is the gradient descent method?

The blogger's understanding: suppose you are on Mount Everest and want to reach the bottom of the mountain; the process of walking down is the gradient descent process. The slope of the hillside is the gradient, and the pace of each step is the learning rate. Gradient descent is an iterative optimization algorithm for finding a local minimum of a function.
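Written out, each "step down the hill" moves a parameter against its gradient: w ← w − lr · ∂L/∂w. A one-dimensional sketch on a toy function (the function and its derivative are chosen only for illustration):

def gradient_descent(grad, w0, lr, steps):
    # Repeatedly step against the gradient; lr is the pace of each step
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Minimize f(w) = (w - 3)**2, whose gradient is 2*(w - 3); the minimum sits at w = 3
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0, lr=0.1, steps=100)
print(w_min)   # ≈ 3.0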
The size of the learning rate must be chosen carefully (a small demonstration follows this list):

  • if lr is too large, the updates may oscillate around the bottom
  • if lr is too small, convergence is slow
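Both failure modes can be seen on the same toy function f(w) = (w − 3)² used above; the learning-rate values here are arbitrary:

# Compare a too-small, a reasonable, and a too-large learning rate
for lr in (0.01, 0.1, 1.1):
    w = 0.0
    for _ in range(50):
        w -= lr * 2 * (w - 3)        # gradient of (w - 3)**2
    print(f"lr={lr}: w={w:.4f}")
# lr=0.01 crawls toward 3, lr=0.1 converges, lr=1.1 oscillates and blows up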
Local minima

In fact, most of the problems we encounter in practice are non-convex, i.e. the loss surface has many local minima. If lr is set too small, it is easy to get stuck at a local minimum and never get out again. This is why the optimizer needs a momentum parameter: we can use the update direction accumulated over previous steps to help us break out of the current local-minimum region.
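A minimal sketch of a momentum update (the beta and lr values are illustrative, not from the original post): the previous step's velocity is blended into the current one, so accumulated motion can carry the parameter through a shallow basin.

def momentum_step(w, grad_w, velocity, lr=0.01, beta=0.9):
    # Blend the old direction with the new gradient, then take the step
    velocity = beta * velocity - lr * grad_w
    return w + velocity, velocity

# Keep the velocity across iterations so past steps keep pushing the parameter
w, v = 0.0, 0.0
for _ in range(100):
    g = 2 * (w - 3)                  # gradient of the toy function (w - 3)**2
    w, v = momentum_step(w, g, v)
print(w)                             # approaches 3.0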

Python code implementation of the gradient descent method

def train(X, y, W, B, alpha, max_iters):
    m = X.shape[0]                 # number of training samples (the first dimension is the batch)
    for i in range(max_iters):
        dW = 0.0                   # accumulated gradient w.r.t. W
        dB = 0.0                   # accumulated gradient w.r.t. B
        for j in range(m):         # accumulate the gradient over every sample
            y_pred = W * X[j] + B
            dW += (y_pred - y[j]) * X[j]   # per-sample gradient w.r.t. W
            dB += (y_pred - y[j])          # per-sample gradient w.r.t. B
        W = W - alpha * (dW / m)   # one gradient-descent step per pass over the data
        B = B - alpha * (dB / m)
    return W, B
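A quick check of train on toy data (assuming scalar W and B and 1-D NumPy arrays; the numbers are made up):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 1.0                    # ground truth: w = 2, b = 1
W, B = train(X, y, W=0.0, B=0.0, alpha=0.05, max_iters=2000)
print(W, B)                          # should approach 2.0 and 1.0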

Original post: blog.csdn.net/weixin_45074568/article/details/124916545