Hung-yi Lee Machine Learning Course Notes _ ML Lecture 1: Regression - Demo

Introduction:

I recently started learning machine learning. I had heard of Professor Hung-yi Lee for a long time but never had time to watch his courses. After listening to one lecture today, I think it is excellent: easy to follow, focused on the key points, and interleaved with interesting examples that make the material stick.
Video link (bilibili): Hung-yi Lee, Machine Learning (2017)
Other students have also written up transcripts and keep them updated on GitHub: LeeML-Notes
So what follows is just my own summary of the lecture and the points that confused me at the time; if you can help clear anything up, please leave a comment.

To get started with machine learning, let's begin with the demo. This demo completely reproduces the one shown in the lecture.
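Before the code, it helps to write down what is being computed. The model, the loss that the contour grid evaluates, and the gradients used in the update loop are the standard linear-regression quantities (note the update loop below uses the un-normalized sum, so the 1/N factor is effectively folded into the learning rate):

$$y = b + w x$$

$$L(b, w) = \sum_{n=1}^{N} \bigl(y_n - (b + w x_n)\bigr)^2$$

$$\frac{\partial L}{\partial b} = -2 \sum_{n=1}^{N} (y_n - b - w x_n), \qquad \frac{\partial L}{\partial w} = -2 \sum_{n=1}^{N} (y_n - b - w x_n)\, x_n$$

The plotted surface Z divides this loss by N (mean squared error), which only rescales the contours.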

import numpy as np
import matplotlib.pyplot as plt

x_data = [338., 333., 328., 207., 226., 25., 179., 60., 208., 606.]
y_data = [640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.]
# Model: y_data = b + w * x_data

x = np.arange(-200, -100, 1)   # candidate values for the bias b
y = np.arange(-5, 5, 0.1)      # candidate values for the weight w
Z = np.zeros((len(y), len(x))) # loss surface: rows index w, columns index b
X, Y = np.meshgrid(x, y)

# Compute the mean squared error for every (b, w) pair on the grid
for i in range(len(x)):
    for j in range(len(y)):
        b = x[i]
        w = y[j]
        Z[j][i] = 0
        for n in range(len(x_data)):
            Z[j][i] = Z[j][i] + (y_data[n] - b - w * x_data[n]) ** 2
        Z[j][i] = Z[j][i] / len(x_data)
b = -129 # initialize b
w = -4 # initialize w
lr = 0.0000001 # learning rate
iteration = 100000

# Store initial values for plotting
b_history = [b]
w_history = [w]

# Iteration
for i in range(iteration):
    b_grad = 0.0
    w_grad = 0.0
    for n in range(len(x_data)):
        b_grad = b_grad - 2.0*(y_data[n] - b - w*x_data[n])*1.0
        w_grad = w_grad - 2.0*(y_data[n] - b - w*x_data[n])*x_data[n]
    
    # Update parameters
    b = b - lr * b_grad
    w = w - lr * w_grad
    
    # Store the parameters for plotting
    b_history.append(b)
    w_history.append(w)

# plot the figure
plt.contour(x, y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
plt.plot([-188.4], [2.67], 'x', ms=12, markeredgewidth=3, color='orange')
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5,5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()
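As a quick sanity check on the optimum marked with the orange × (roughly b ≈ -188.4, w ≈ 2.67), we can also get the closed-form least-squares fit directly; a minimal sketch using NumPy's polyfit on the same x_data and y_data defined above (this check is my own addition, not part of the original demo):

# Closed-form least-squares fit for y = b + w * x (degree-1 polynomial).
# np.polyfit returns coefficients highest degree first: [w, b].
w_best, b_best = np.polyfit(x_data, y_data, 1)
print(b_best, w_best)  # should land near the x mark at about (-188.4, 2.67)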

The output is:
Figure 1: gradient-descent path on the loss contours with lr = 0.0000001

The horizontal axis is b and the vertical axis is w; the × marks the optimal solution. Clearly, our run does not reach the optimum and ends up quite far from it (one way to quantify the gap is shown in the small sketch below). So we increase the learning rate to lr = 0.000001 (10 times larger) and get the result shown in Figure 2.
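To make "far from the optimum" concrete, we can compare the mean squared error at the final (b, w) with the error near the marked optimum; a small helper (my own addition, reusing x_data, y_data and the histories from the run above):

def mse(b, w):
    # Mean squared error of y = b + w * x over the training data
    return sum((y - b - w * x) ** 2 for x, y in zip(x_data, y_data)) / len(x_data)

print(mse(b_history[-1], w_history[-1]))  # loss after training
print(mse(-188.4, 2.67))                  # loss near the marked optimum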

#### change the lr to 0.000001

b = -129 # initialize b
w = -4 # initialize w
lr = 0.000001 # learning rate
iteration = 100000

# Store initial values for plotting
b_history = [b]
w_history = [w]

# Iteration
for i in range(iteration):
    b_grad = 0.0
    w_grad = 0.0
    for n in range(len(x_data)):
        b_grad = b_grad - 2.0*(y_data[n] - b - w*x_data[n])*1.0
        w_grad = w_grad - 2.0*(y_data[n] - b - w*x_data[n])*x_data[n]
    
    # Update parameters
    b = b - lr * b_grad
    w = w - lr * w_grad
    
    # Store the parameters for plotting
    b_history.append(b)
    w_history.append(w)

# plot the figure
plt.contour(x, y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
plt.plot([-188.4], [2.67], 'x', ms=12, markeredgewidth=3, color='orange')
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5,5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()

Figure 2: gradient-descent path on the loss contours with lr = 0.000001

We then increase the learning rate again, to lr = 0.00001 (another factor of 10), and get the result shown in Figure 3.

#### change the lr to 0.00001

b = -129 # initialize b
w = -4 # initialize w
lr = 0.00001 # learning rate
iteration = 100000

# Store initial values for plotting
b_history = [b]
w_history = [w]

# Iteration
for i in range(iteration):
    b_grad = 0.0
    w_grad = 0.0
    for n in range(len(x_data)):
        b_grad = b_grad - 2.0*(y_data[n] - b - w*x_data[n])*1.0
        w_grad = w_grad - 2.0*(y_data[n] - b - w*x_data[n])*x_data[n]
    
    # Update parameters
    b = b - lr * b_grad
    w = w - lr * w_grad
    
    # Store the parameters for plotting
    b_history.append(b)
    w_history.append(w)

# plot the figure
plt.contour(x, y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
plt.plot([-188.4], [2.67], 'x', ms=12, markeredgewidth=3, color='orange')
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5,5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()

Figure 3: gradient-descent path on the loss contours with lr = 0.00001

With the initial learning rate of 0.0000001, the parameters are still quite far from the optimum after 100,000 iterations, which means the learning rate is too small. Raising it to 0.000001 (10 times larger) introduces some oscillation, but the result is a little better and closer to the optimum. Increasing the learning rate by another factor of 10 makes the trajectory fly off the plot entirely; it oscillates completely and never finds the optimum.
The solution is to give b and w their own adaptive learning rates; this method is called AdaGrad.
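AdaGrad scales each parameter's step by the root of that parameter's accumulated squared gradients, so the update used in the code below is:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\sum_{\tau=0}^{t} g_\tau^2}}\, g_t$$

where $g_t$ is the gradient with respect to that parameter at step $t$ and $\eta$ is the base learning rate (lr = 1 below); b_lr and w_lr in the code accumulate the squared gradients for b and w.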

#### using AdaGrad to solve this problem

b = -129 # initialize b
w = -4 # initialize w
lr = 1 # learning rate
iteration = 100000

b_lr = 0.0 # accumulated squared gradients for b
w_lr = 0.0 # accumulated squared gradients for w

# Store initial values for plotting
b_history = [b]
w_history = [w]

# Iteration
for i in range(iteration):
    b_grad = 0.0
    w_grad = 0.0
    for n in range(len(x_data)):
        b_grad = b_grad - 2.0*(y_data[n] - b - w*x_data[n])*1.0
        w_grad = w_grad - 2.0*(y_data[n] - b - w*x_data[n])*x_data[n]
    
    # Accumulate squared gradients (AdaGrad)
    b_lr = b_lr + b_grad**2
    w_lr = w_lr + w_grad**2
    
    # Update parameters
    b = b - lr/np.sqrt(b_lr) * b_grad
    w = w - lr/np.sqrt(w_lr) * w_grad
    
    # Store the parameters for plotting
    b_history.append(b)
    w_history.append(w)

# plot the figure
plt.contour(x, y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
plt.plot([-188.4], [2.67], 'x', ms=12, markeredgewidth=3, color='orange')
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5,5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()
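As a side note, the inner loop over the data points can be vectorized with NumPy arrays. A minimal sketch of the same AdaGrad update, assuming x_data and y_data from above (variable names such as x_arr, b_acc are my own, not from the original demo):

x_arr = np.array(x_data)
y_arr = np.array(y_data)

b, w = -129.0, -4.0
lr = 1.0
b_acc, w_acc = 0.0, 0.0  # accumulated squared gradients

for i in range(100000):
    err = y_arr - b - w * x_arr          # residuals for all points at once
    b_grad = -2.0 * err.sum()            # same gradients as the explicit loop
    w_grad = -2.0 * (err * x_arr).sum()
    b_acc += b_grad ** 2
    w_acc += w_grad ** 2
    b -= lr / np.sqrt(b_acc) * b_grad
    w -= lr / np.sqrt(w_acc) * w_grad

print(b, w)  # should end up close to the optimum near (-188.4, 2.67)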

The final result is shown:
Figure 4: gradient-descent path on the loss contours with AdaGrad (lr = 1)

Source: blog.csdn.net/leogoforit/article/details/105016322