DL_C1_week_2_2 (Logistic Regression)

Logistic Regression (a very simple Neural Network)

*Note*: please credit the source when reposting.
Main contents:

  • Implement a logistic regression classifier that recognizes whether an image contains a cat.
  • Initialize the parameters
  • Compute the cost and the gradient of the cost function
  • Update the weights (parameters) with an optimization algorithm: gradient descent

1. Use Packages

  • numpy
  • h5py : the training data is stored in HDF5 (.h5) files
  • matplotlib
  • PIL (in Python 3.x, Pillow has replaced PIL; installing pillow is enough)
  • scipy
import h5py
import scipy
import numpy as np
from PIL import Image
from scipy import ndimage
import matplotlib.pyplot as plt
%matplotlib inline

2. Load and Overview the Dataset

The dataset is stored in files in HDF5 (.h5) format.

  • Each image in the training data has a 0/1 label:
    • cat (y = 1)
    • non-cat (y = 0)
  • The training set contains 209 images and the test set contains 50 images
    • m_train = 209
    • m_test = 50
  • Each image has shape (64, 64, 3): height 64, width 64, 3 channels (RGB)
    • image_shape = (64, 64, 3)

2.1 Load Dataset

# Function : load data
def load_data():
    # train_dataset : dict
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', 'r')
    # train data features
    orig_train_x = np.array(train_dataset['train_set_x'][:])
    # train data labels
    train_y = np.array(train_dataset['train_set_y'][:])
    # test_dataset
    test_dataset = h5py.File('datasets/test_catvnoncat.h5', 'r')
    # test data features
    orig_test_x = np.array(test_dataset['test_set_x'][:])
    # test data labels
    test_y = np.array(test_dataset['test_set_y'][:])
    # list of classes
    classes = np.array(test_dataset['list_classes'][:])
    # reshape the labels to shape (1, number of examples)
    train_y = train_y.reshape((1, len(train_y)))
    test_y = test_y.reshape((1, len(test_y)))
    return orig_train_x, train_y, orig_test_x, test_y, classes

The "orig_" prefix (orig_train_x, orig_test_x) marks the raw pixel values of the images; the training data will be preprocessed further on, e.g., standardized.

orig_train_x, train_y, orig_test_x, test_y, classes = load_data()

2.2 Overview dataset

m_train = orig_train_x.shape[0]
m_test = orig_test_x.shape[0]

print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Each image is of size: "+str(orig_train_x[0].shape))
print ("train_set_x shape: " + str(orig_train_x.shape))
print ("train_set_y shape: " + str(train_y.shape))
print ("test_set_x shape: " + str(orig_test_x.shape))
print ("test_set_y shape: " + str(test_y.shape))
print ('classes :' + str(classes))
Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)
classes :[b'non-cat' b'cat']

2.3 Visualize image

  • orig_train_x is a multi-dimensional array; orig_train_x[i] is the 3D array of a single image
# visualize an image 
index = 19
image_label = train_y[0][index]
plt.imshow(orig_train_x[index])
print ("y = "+ str(image_label) +", it's a '"+classes[image_label].decode('utf-8') +"' picture.")
y = 1, it's a 'cat' picture.

[image: training example at index 19, a cat]

index = 10
image_label = train_y[0][index]
plt.imshow(orig_train_x[index])
print ("y = "+ str(image_label) +", it's a '"+ classes[image_label].decode('utf-8') +"' picture.")
y = 0, it's a 'non-cat' picture.

[image: training example at index 10, a non-cat]

2.4 Reshape orig_data

  • Flatten each image's array into a row vector of length 64x64x3
  • After transposing with .T, each column of the matrix corresponds to one example and each row to one feature
flatten_train_x = orig_train_x.reshape(orig_train_x.shape[0],-1).T
# -1 infers the remaining size: 64*64*3 = 12288
flatten_test_x = orig_test_x.reshape(orig_test_x.shape[0],-1).T
flatten_train_x.shape, flatten_test_x.shape
((12288, 209), (12288, 50))

2.5 Standardize

  • Every element of the data is an image pixel value in the range [0, 255]; normalization helps speed up training
  • $normalized\_pixel = \dfrac{pixel - pixel_{min}}{pixel_{max} - pixel_{min}}$
  • Here $pixel_{min} = 0$ and $pixel_{max} = 255$
  • So $normed\_pixel = \dfrac{pixel - 0}{255.0} = \dfrac{pixel}{255.0}$
normed_train_x = flatten_train_x/255.0
normed_test_x = flatten_test_x/255.0

3. General Architecture of the learning algorithm

  • Implement a Logistic Regression model (a simple Neural Network)
  • Using a Neural Network mindset.
  • Train the model to classify images

Mathematical formulation of the algorithm:

$z^{(i)} = w x^{(i)} + b$

$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)})$

$\mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\,y^{(i)}\log(\hat{y}^{(i)}) - (1 - y^{(i)})\log(1 - \hat{y}^{(i)})$

  • $x^{(i)}$ : the column vector of one image, shape = (len($x^{(i)}$), 1)
  • $w$ : the weight matrix connecting the input layer to the output layer, shape = (1, len($x^{(i)}$))
  • $b$ : the bias
  • $\hat{y}^{(i)}$, $a^{(i)}$ : the activated output
  • $\mathrm{sigmoid}(z^{(i)})$ :

    $\mathrm{sigmoid}(z^{(i)}) = \dfrac{1}{1 + e^{-z^{(i)}}}$

  • the average loss over m examples (a quick numeric check follows below):

    $J = \dfrac{1}{m}\sum_{i=1}^{m}\mathcal{L}(a^{(i)}, y^{(i)})$
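
As a quick sanity check of the loss formula (a minimal sketch, not part of the original assignment; sigmoid itself is implemented in section 4.1 below):

import numpy as np

# per-example loss: L(y_hat, y) = -y*log(y_hat) - (1-y)*log(1-y_hat)
def loss(y_hat, y):
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

print(loss(0.9, 1))   # confident correct prediction -> small loss (~0.105)
print(loss(0.1, 1))   # confident wrong prediction   -> large loss (~2.303)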

4. Building the Algorithm

Steps to build a Neural Network:

  • 1. Define the network structure (the number of input and output units; this example has no hidden layer)
  • 2. Initialize the model parameters $w$, $b$
  • 3. Loop:
    • forward propagation: compute the loss $\mathcal{L}$
    • backward propagation: compute the gradients $dw$, $db$
    • gradient descent: update the parameters $w = w - \alpha\, dw$
      • $b = b - \alpha\, db$

4.1 Sigmoid function

$\mathrm{sigmoid}(wx + b) = \dfrac{1}{1 + e^{-(wx + b)}}$

def sigmoid(z):
    # z : scalar or numpy array
    a = 1./(1.+np.exp(-z))
    return a
sigmoid(np.array([1,2,3,4]))
array([0.73105858, 0.88079708, 0.95257413, 0.98201379])
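
One caveat worth noting: for large negative z, np.exp(-z) can overflow and trigger a runtime warning. This is not an issue here because the inputs are normalized, but a numerically safer variant (a sketch using scipy.special.expit, which the original notebook does not use) would be:

from scipy.special import expit

def sigmoid_stable(z):
    # expit computes 1/(1+exp(-z)) in a numerically stable way
    return expit(z)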

4.2 Initializing parameters

  • The weights of LR can be initialized to all zeros (logistic regression has no hidden layer, so zero initialization causes no symmetry problem)
  • They can also be initialized randomly
def initialize_w(dim, s=0):
    # dim : the number of input features
    # s=0 : initialize with all zeros
    # s=1 : initialize randomly
    if s:
        w = np.random.rand(1, dim)
    else:
        w = np.zeros((1, dim))
    b = 0
    return w, b
initialize_w(5)
(array([[0., 0., 0., 0., 0.]]), 0)
initialize_w(5, 1)
(array([[0.7968392 , 0.14891652,  0.06809079, 0.9028713 , 0.11452771]]), 0)

4.3 Forward and Backward propagation

  • Forward Propagation
    • X : the training data matrix   $Z = wX + b$
    • A : the activated output   $A = \sigma(Z) = (a^{(1)}, a^{(2)}, \ldots, a^{(m)})$, where m is the number of examples
    • compute the cost function   $J = -\dfrac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1 - y^{(i)})\log(1 - a^{(i)})\right]$

  • Backward Propagation (per example)

    (1)
    $dw = \dfrac{\partial \mathcal{L}}{\partial w} = \dfrac{\partial \mathcal{L}}{\partial a} \cdot \dfrac{\partial a}{\partial z} \cdot \dfrac{\partial z}{\partial w}$

    $db = \dfrac{\partial \mathcal{L}}{\partial b} = \dfrac{\partial \mathcal{L}}{\partial a} \cdot \dfrac{\partial a}{\partial z} \cdot \dfrac{\partial z}{\partial b}$

    (2)
    $\dfrac{\partial \mathcal{L}}{\partial a} = -\dfrac{y}{a} + \dfrac{1 - y}{1 - a}$

    (3)
    $\dfrac{\partial a}{\partial z} = a(1 - a)$

    (4)
    $\dfrac{\partial z}{\partial w} = x, \qquad \dfrac{\partial z}{\partial b} = 1$

    (5)
    $dw = \left(-\dfrac{y}{a} + \dfrac{1 - y}{1 - a}\right) \cdot a(1 - a) \cdot x = (a - y)\,x$

    $db = a - y$

  • vectorized form (with w a row vector, matching the code below):
    $\dfrac{\partial J}{\partial w} = \dfrac{1}{m}(A - Y)X^{T}$

    $\dfrac{\partial J}{\partial b} = \dfrac{1}{m}\sum_{i=1}^{m}(a^{(i)} - y^{(i)})$
# Function: propagate
# compute the cost and the gradients
def propagate(w, b, X, Y):
    """
    Arguments:
    w : weights, a numpy array, shape (1, len(features))
    b : bias, a scalar
    X : dataset, a numpy array, shape (len(features), m)
    Y : true labels, a numpy array, shape (1, m)
    return :
    cost, grads (a dict with dw, db)
    """
    # number of examples (X has one column per example)
    m = X.shape[1]
    # forward propagation
    Z = np.dot(w, X)+b
    A = sigmoid(Z)
    # cost 
    cost = -1/m * np.sum(Y*np.log(A)+(1-Y)*np.log(1-A))
    # backward propagation
    dw = np.dot((A-Y),X.T)/m
    db = np.sum(A-Y)/m
    grads = {'dw':dw, 'db':db}
    return cost,grads

Verify that the cost and gradient computation is correct:

w, b, X, Y = np.array([[1,2]]),2,np.array([[1,2],[3,4]]),np.array([[1,0]])

cost, grads = propagate(w, b, X, Y)

print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
dw = [[0.99993216 1.99980262]]
db = 0.49993523062470574
cost = 6.000064773192205

Expected results (the values match; dw is a row vector here because w has shape (1, dim)):

**dw** [[0.99993216, 1.99980262]]
**db** 0.499935230625
**cost** 6.000064773192205
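
For extra confidence we can also compare the analytic gradients against centered finite differences. This check is not part of the original notebook; it is a minimal sketch reusing propagate and the test variables w, b, X, Y, grads from above:

# numerical gradient check via centered finite differences
def numeric_grads(w, b, X, Y, eps=1e-7):
    dw = np.zeros_like(w, dtype=float)
    for j in range(w.shape[1]):
        w_plus, w_minus = w.astype(float), w.astype(float)
        w_plus = w_plus.copy(); w_plus[0, j] += eps
        w_minus = w_minus.copy(); w_minus[0, j] -= eps
        cost_plus, _ = propagate(w_plus, b, X, Y)
        cost_minus, _ = propagate(w_minus, b, X, Y)
        dw[0, j] = (cost_plus - cost_minus) / (2 * eps)
    cost_plus, _ = propagate(w, b + eps, X, Y)
    cost_minus, _ = propagate(w, b - eps, X, Y)
    db = (cost_plus - cost_minus) / (2 * eps)
    return dw, db

num_dw, num_db = numeric_grads(w, b, X, Y)
# both differences should be tiny if the analytic gradients are right
print(np.max(np.abs(num_dw - grads['dw'])), abs(num_db - grads['db']))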

5. Optimization

Completed so far:

  • parameter initialization: initialize_w(dim)
  • activation function: sigmoid(z)
  • cost and gradient computation: propagate(w, b, X, Y)
  • Next, use gradient descent to update the parameters w and b; the learning rate $\alpha$ controls the step size:
    • $w = w - \alpha\, dw$
    • $b = b - \alpha\, db$
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=0):
    # num_iterations : loop times
    # learning_rate : control gradient descent
    # print_cost : if 1, print the cost every 10 iterations
    costs = []
    for i in range(num_iterations):
        cost , grads = propagate(w, b, X, Y)
        w = w - learning_rate*grads['dw']
        b = b - learning_rate*grads['db']
        costs.append(cost)
        if print_cost and i%10 == 0:
            print ('Cost after iteration %i: %f'%(i, cost))
    params = {'w':w, 'b':b}
    return params, grads, costs

Verify that optimize works correctly:

# test data
w, b, X, Y = np.array([[1,2]]),2,np.array([[1,2],[3,4]]),np.array([[1,0]])
params , grads, costs = optimize(w, b, X, Y, 100, 0.009, 1)
Cost after iteration 0: 6.000065
Cost after iteration 10: 5.527691
Cost after iteration 20: 5.055445
Cost after iteration 30: 4.583458
Cost after iteration 40: 4.112002
Cost after iteration 50: 3.641644
Cost after iteration 60: 3.173575
Cost after iteration 70: 2.710307
Cost after iteration 80: 2.257084
Cost after iteration 90: 1.824430
plt.plot(costs)

[image: cost decreasing over the 100 iterations]

print ("w  = " + str(params["w"]))
print ("b  = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
w  = [[0.1124579  0.23106775]]
b  = 1.5593049248448891
dw = [[0.90158428 1.76250842]]
db = 0.4304620716786828

Expected results (again as row vectors, since w has shape (1, dim)):

**w** [[0.1124579, 0.23106775]]
**b** 1.55930492484
**dw** [[0.90158428, 1.76250842]]
**db** 0.430462071679

6. Predict

  • After the 100 iterations above, optimize() returns the trained params
  • Use the trained parameters to predict:
    • 1. compute $\hat{Y} = A = \sigma(wX + b)$
    • if $a^{(i)} > 0.5$, label $x^{(i)}$ as 1
    • if $a^{(i)} \le 0.5$, label $x^{(i)}$ as 0
def predict(w, b, X):
    Z = np.dot(w, X)+b
    A = sigmoid(Z)
    # threshold the activations at 0.5 and convert the booleans to 0./1.
    predicted = (A > 0.5)*1.
    return predicted

Verify that predict() works correctly:

# test data
w, b, X= np.array([[1,2]]),2,np.array([[1,2],[3,4]])
print ('predicted :' + str(predict(w, b, X)))
predicted :[[1. 1.]]

Expected output:

**predicted** [[ 1. 1.]]

7. Merge all functions into a model

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000,learning_rate=0.5,print_cost=0,flag=0):
    w, b = initialize_w(X_train.shape[0], flag)
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate,
                                    print_cost)
    test_predicted = predict(params['w'], params['b'], X_test)
    train_predicted = predict(params['w'], params['b'], X_train)
    print("train accuracy: {} %".format(100 - np.mean(np.abs(train_predicted - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(test_predicted - Y_test)) * 100))

    d = {'costs':costs,
         'train_predicted':train_predicted,
         'test_predicted':test_predicted,
         'w':params['w'],
         'b':params['b'],
         'lr':learning_rate,
         'num_iterations':num_iterations,
        }
    return d
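
The accuracy expression in model() works because the predictions and the labels are both 0/1, so np.abs(pred - Y) equals 1 exactly on the misclassified examples. An equivalent helper (a small sketch, not in the original code) makes this explicit:

def accuracy(predicted, Y):
    # both arrays contain 0/1 values; the mean of the element-wise
    # matches is the fraction of correctly classified examples
    return np.mean(predicted == Y) * 100

# e.g., after running the model below: accuracy(d['test_predicted'], test_y)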

Run the model

  • flag = 0: initialize w and b with all zeros
  • flag = 1: initialize w randomly with np.random.rand (values in [0, 1))

flag = 0, zero initialization

d = model(normed_train_x, train_y, normed_test_x, test_y, print_cost=1)
Cost after iteration 0: 0.011789
Cost after iteration 10: 0.028177
Cost after iteration 20: 0.011830
Cost after iteration 30: 0.019752
Cost after iteration 40: 0.017107
Cost after iteration 50: 0.008967
Cost after iteration 60: 0.008224
Cost after iteration 70: 0.008620
Cost after iteration 80: 0.012857
.................................
Cost after iteration 1900: 0.001477
Cost after iteration 1910: 0.001471
Cost after iteration 1920: 0.001464
Cost after iteration 1930: 0.001458
Cost after iteration 1940: 0.001452
Cost after iteration 1950: 0.001446
Cost after iteration 1960: 0.001440
Cost after iteration 1970: 0.001434
Cost after iteration 1980: 0.001429
Cost after iteration 1990: 0.001423
train accuracy: 99.52153110047847 %
test accuracy: 70.0 %

flag = 1, random initialization

d2 = model(normed_train_x, train_y, normed_test_x, test_y, print_cost=0, flag=1)
train accuracy: 98.08612440191388 %
test accuracy: 72.0 %

cost curve

plt.figure(figsize=(14,6))
plt.plot(d['costs'], label='flag=0')
plt.plot(d2['costs'], label='flag=1')
plt.legend()

plt.title('cost curve lr=0.5')
Text(0.5,1,'cost curve lr=0.5')

[image: cost curves for flag=0 and flag=1 (lr=0.5)]

With a learning rate of 0.5, the cost curves oscillate heavily in the early iterations.

The model's predictions on the test data:

Correctly classified

index = 1
pred_label = int(d['test_predicted'][0][index])
true_label = int(test_y[0][index])
plt.imshow(orig_test_x[index])
print ("y = "+ str(true_label) +", the model predicts a '"+ classes[pred_label].decode('utf-8') +"' picture.")
y = 1, the model predicts a 'cat' picture.

[image: test example at index 1, correctly classified as a cat]

Misclassified

index = 5
pred_label = int(d['test_predicted'][0][index])
true_label = int(test_y[0][index])
plt.imshow(orig_test_x[index])
print ("y = "+ str(true_label) +", the model predicts a '"+ classes[pred_label].decode('utf-8') +"' picture.")
y = 0, the model predicts a 'cat' picture.

[image: test example at index 5, a non-cat misclassified as a cat]
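
To list every misclassified test image at once instead of spot-checking indices (a minimal sketch using the arrays above):

# column indices where the prediction disagrees with the true label
mis_idx = np.where(d['test_predicted'] != test_y)[1]
print('misclassified test indices:', mis_idx)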

8. Choice of learning rate

learning_rates = [0.005, 0.01, 0.05, 0.1, 0.3, 1.0]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(normed_train_x, train_y, normed_test_x, test_y,
                           num_iterations = 40000, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')
learning rate is: 0.005
train accuracy: 94.73684210526316 %
test accuracy: 74.0 %

-------------------------------------------------------

learning rate is: 0.01
train accuracy: 97.60765550239235 %
test accuracy: 70.0 %

-------------------------------------------------------

learning rate is: 0.05
train accuracy: 100.0 %
test accuracy: 70.0 %

-------------------------------------------------------

learning rate is: 0.1
train accuracy: 100.0 %
test accuracy: 70.0 %

-------------------------------------------------------

learning rate is: 0.3
train accuracy: 100.0 %
test accuracy: 72.0 %

-------------------------------------------------------

learning rate is: 1.0
train accuracy: 100.0 %
test accuracy: 72.0 %

-------------------------------------------------------
plt.figure(figsize=(15,8))
plt.grid(True)
plt.ylabel('cost')
plt.ylim(0, 0.012)
plt.xlabel('iterations')
plt.title('0 initialization')
for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["lr"]))

legend = plt.legend(loc='best', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

[image: cost curves for each learning rate (zero initialization)]

The larger the learning_rate, the faster the cost curve drops.
If the learning_rate is too large, the cost curve oscillates in the early iterations, but here it still eventually stabilizes.
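
Since each returned dict stores its test predictions, the test accuracies can also be recomputed directly rather than read from the printed logs (a small sketch using the models dict and test_y from above):

# recompute the test accuracy for each learning rate
for lr in learning_rates:
    acc = np.mean(models[str(lr)]['test_predicted'] == test_y) * 100
    print('lr = %s, test accuracy = %.1f %%' % (lr, acc))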

Predict a picture

  • Choose a model with higher test accuracy: models['0.3']

def predict_my_image(fname, model):
    # scipy.ndimage.imread and scipy.misc.imresize were removed from newer
    # SciPy releases, so read and resize the image with PIL instead
    image = Image.open(fname)
    input_x = np.array(image.resize((64, 64))).reshape((1, 64*64*3)).T
    # the model was trained on normalized inputs, so normalize here as well
    input_x = input_x / 255.
    pred_label = int(predict(model['w'], model['b'], input_x)[0, 0])
    plt.imshow(image)
    print('y = ' + str(pred_label) + ', model predicts a ' + classes[pred_label].decode('utf-8'))
predict_my_image('images/my_image2.jpg',models['0.3'])
y = 1, model predicts a cat

[image: my_image2.jpg]

predict_my_image('images/my_image.jpg', models['0.3'])
y = 0, model predicts a non-cat

[image: my_image.jpg]

predict_my_image('images/cat_in_iran.jpg',models['0.3'])
y = 1, model predicts a cat

[image: cat_in_iran.jpg]

Source: blog.csdn.net/u014281392/article/details/80288648