Logistic Regression (a very simple Neural Network)

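Notice: please credit the source when reposting.
Main contents:

Implement a logistic regression classifier that decides whether an image contains a cat.
Initialize the parameters.
Compute the cost and the gradient of the cost function.
Use an optimization algorithm, gradient descent, to update the weights (parameters).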

1. Use Packages

numpy
h5py: the training data is stored in h5 files
matplotlib
PIL (on Python 3.x, Pillow has replaced PIL; installing Pillow is enough)
scipy

import h5py
import scipy
import numpy as np
from PIL import Image
from scipy import ndimage
import matplotlib.pyplot as plt
%matplotlib inline

2. Load and Overview data set

The dataset is stored in h5 files.

Each image is labeled 0 or 1:
- cat (y = 1)
- non-cat (y = 0)
The training set has 209 images and the test set has 50:
- m_train = 209
- m_test = 50
Each image has shape (64, 64, 3): height 64, width 64, 3 channels (RGB).
- image_shape = (64, 64, 3)

2.1 Load Dataset

# Function : load data
def load_data():
    # train_dataset : h5py File object (indexed by key, like a dict)
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', 'r')
    # train data features
    orig_train_x = np.array(train_dataset['train_set_x'][:])
    # train data labels
    train_y = np.array(train_dataset['train_set_y'][:])
    # test_dataset
    test_dataset = h5py.File('datasets/test_catvnoncat.h5', 'r')
    # test data features
    orig_test_x = np.array(test_dataset['test_set_x'][:])
    # test data labels
    test_y = np.array(test_dataset['test_set_y'][:])
    # list of classes
    classes = np.array(test_dataset['list_classes'][:])
    # reshape the labels to (1, number of examples)
    train_y = train_y.reshape((1, len(train_y)))
    test_y = test_y.reshape((1, len(test_y)))
    return orig_train_x, train_y, orig_test_x, test_y, classes

The "orig_" prefix: orig_train_x and orig_test_x hold the raw pixel values of the images. The data is preprocessed (for example, standardized) before it is fed to the model.

orig_train_x, train_y , orig_test_x, test_y, classes = load_data()

2.2 Overview dataset

m_train = orig_train_x.shape[0]
m_test = orig_test_x.shape[0]

print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Each image is of size: "+str(orig_train_x[0].shape))
print ("train_set_x shape: " + str(orig_train_x.shape))
print ("train_set_y shape: " + str(train_y.shape))
print ("test_set_x shape: " + str(orig_test_x.shape))
print ("test_set_y shape: " + str(test_y.shape))
print ('classes :' + str(classes))

Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)
classes :[b'non-cat' b'cat']

2.3 Visualize image

orig_train_x is a multi-dimensional array; orig_train_x[i] is the 3D array of a single image.

# visualize an image 
index = 19
image_label = train_y[0][index]
plt.imshow(orig_train_x[index])
print ("y = "+ str(image_label) +", it's a '"+classes[image_label].decode('utf-8') +"' picture.")

y = 1, it's a 'cat' picture.


index = 10
image_label = train_y[0][index]
plt.imshow(orig_train_x[index])
print ("y = "+ str(image_label) +", it's a '"+ classes[image_label].decode('utf-8') +"' picture.")

y = 0, it's a 'non-cat' picture.


2.4 Reshape orig_data

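Flatten each image array into a row vector of length 64x64x3 = 12288.
After the transpose (.T), each column of the matrix is one example and each row is one feature.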

flatten_train_x = orig_train_x.reshape(orig_train_x.shape[0],-1).T
# -1 : inferred as x.shape[1]*x.shape[2]*x.shape[3] = 12288
flatten_test_x = orig_test_x.reshape(orig_test_x.shape[0],-1).T

flatten_train_x.shape, flatten_test_x.shape

((12288, 209), (12288, 50))
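To see why the reshape keeps the example axis first and only then transposes, here is a minimal sketch on a made-up toy batch (the array `toy` and its shape are assumptions for illustration, not part of the dataset). Reshaping directly to (features, examples) yields the same shape but scrambles pixels across images.

```python
import numpy as np

# Hypothetical toy batch: 2 "images" of shape (2, 2, 3), filled with 0..23
toy = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)

# Same recipe as above: one row per image, then transpose to one column per image
flat = toy.reshape(toy.shape[0], -1).T

# Tempting shortcut that has the same shape but mixes pixels of different images
wrong = toy.reshape(-1, toy.shape[0])

print(flat.shape, wrong.shape)
print(flat[:, 0])    # all 12 pixels of image 0
print(wrong[:, 0])   # every second value of the whole batch
```

Column 0 of `flat` holds exactly the values of the first image, whereas column 0 of `wrong` interleaves both images.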

2.5 Standardize

Every element is a pixel value in the range [0, 255]; scaling the inputs to [0, 1] helps speed up training.
$normalized\_pixel = \frac{pixel - pixel_{min}}{pixel_{max} - pixel_{min}}$
With $pixel_{min} = 0, pixel_{max} = 255$,
this gives $normalized\_pixel = \frac{pixel - 0}{255.0}$

normed_train_x = flatten_train_x/255.0
normed_test_x = flatten_test_x/255.0
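As a quick sanity check of the scaling step, the sketch below applies the same division to a hypothetical three-pixel array (the values are made up). It shows that dividing a uint8 array by 255.0 produces floats in [0, 1].

```python
import numpy as np

# Hypothetical pixel values covering the extremes of the uint8 range
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing by a Python float promotes the result to float64
scaled = pixels / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```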

3. General Architecture of the learning algorithm

Implement logistic regression (a simple neural network),
using a neural-network mindset.
Train the model to classify images.

Mathematical expression of the algorithm

$z^{(i)} = wx^{(i)}+b$

$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})$

$\mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -y^{(i)}log(\hat{y}^{(i)})-(1-y^{(i)})log(1-\hat{y}^{(i)})$

$x^{(i)}$ : column vector of one image, shape=(len( $x^{(i)}$ ), 1)
$w$ : weight matrix connecting the input layer to the output layer, shape=(1, len( $x^{(i)}$ ))
$b$ : bias
$\hat{y}^{(i)},a^{(i)}$ : output after activation
$sigmoid(z^{(i)})$ :

$sigmoid(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}$
Average loss over the m examples:

$J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})$
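The formulas can be traced by hand on a single example. The sketch below uses made-up values for $w$, $x$, $b$ and $y$ (all assumptions, chosen so that $z = 0$) and evaluates $z$, $a$ and the loss in order.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values chosen only for illustration
w = np.array([[0.5, -0.25]])   # shape (1, 2)
x = np.array([[2.0], [4.0]])   # shape (2, 1), one example as a column
b = 0.0
y = 1

z = (np.dot(w, x) + b).item()  # 0.5*2 - 0.25*4 + 0 = 0
a = sigmoid(z)                 # sigmoid(0) = 0.5
loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))  # -log(0.5) = log(2)

print(z, a, loss)
```

An output of $a = 0.5$ means the model is undecided, and the loss $\log 2 \approx 0.693$ is what any example costs under an all-zero model.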

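4. Building algorithm

Steps for building a neural network:

1. Define the network structure (number of input and output units; this example has no hidden layer)
2. Initialize the model parameters $w$, $b$
3. Loop:
- (forward propagation) compute the loss $L$
- (backward propagation) compute the gradients $d_w, d_b$
- (gradient descent) update the parameters
  - $w = w - \alpha d_w$
  - $b = b - \alpha d_b$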

4.1 Sigmoid function

$sigmoid(wx+b) = \frac{1}{1+e^{-(wx+b)}}$

def sigmoid(z):
    # z : scalar or numpy array
    a = 1./(1.+np.exp(-z))
    return a

sigmoid(np.array([1,2,3,4]))

array([0.73105858, 0.88079708, 0.95257413, 0.98201379])
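Two properties of the sigmoid are worth checking numerically: the symmetry $sigmoid(-z) = 1 - sigmoid(z)$, and the derivative $a(1-a)$ that the backward pass relies on. The sketch below is a self-contained check; the sample points and the step `eps` are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, 0.0, 3.0])
a = sigmoid(z)

# Symmetry around z = 0
print(np.allclose(sigmoid(-z), 1.0 - a))

# Central finite difference compared with the closed form a * (1 - a)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.allclose(numeric, a * (1 - a)))
```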

4.2 Initializing parameters

The weights of logistic regression can be initialized to all zeros,
or initialized randomly.

def initialize_w(dim, s=0):
    # dim : number of input features
    # s=0 : all-zero initialization
    # s=1 : random initialization, uniform in [0, 1)
    if s:
        w = np.random.rand(1, dim)
    else:
        w = np.zeros((1, dim))
    b = 0
    return w,b

initialize_w(5)

(array([[0., 0., 0., 0., 0.]]), 0)

initialize_w(5, 1)

(array([[0.7968392 , 0.14891652,  0.06809079, 0.9028713 , 0.11452771]]), 0)

4.3 Forward and Backward propagation

Forward Propagation
- X: training data matrix, $Z = WX+b$
- A: activation output, $A = \sigma(Z) = (a^{(1)},a^{(2)},\dots,a^{(m)})$, where m is the number of examples
- Compute the cost function $J=-\frac{1}{m}\sum_{i=1}^m \left[ y^{(i)}log(a^{(i)})+(1-y^{(i)})log(1-a^{(i)}) \right]$

Backward Propagation

(1)
$d_w = \frac{\partial J}{\partial w} = \frac{\partial J}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}$
$d_b = \frac{\partial J}{\partial b} = \frac{\partial J}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial b}$
(2)
$\frac{\partial J}{\partial a} = \frac{-y}{a} + \frac{1-y}{1-a}$
(3)
$\frac{\partial a}{\partial z} = a(1-a)$
(4)
$\frac{\partial z}{\partial w} = x$, $\frac{\partial z}{\partial b} = 1$
(5)
$d_w = \left(\frac{-y}{a} + \frac{1-y}{1-a}\right) \cdot a(1-a) \cdot x = (a-y)x$
$d_b = a - y$

Vectorized form
$\frac{\partial J}{\partial w} = \frac{1}{m}(A-Y)X^T$
$\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^m(a^{(i)}-y^{(i)})$

# Function: propagate
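# compute the cost and its gradients
def propagate(w, b, X, Y):
    """
    Arguments:
    w : weights, a numpy array, shape (1, len(features))
    b : bias, a scalar
    X : dataset, a numpy array, shape (len(features), m)
    Y : true labels, a numpy array, shape (1, m)
    return :
    cost, grads (a dict with keys 'dw' and 'db')
    """
    # number of examples = number of columns of X (len(X) would be the feature count)
    m = X.shape[1]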
    # forward propagation
    Z = np.dot(w, X)+b
    A = sigmoid(Z)
    # cost 
    cost = -1/m * np.sum(Y*np.log(A)+(1-Y)*np.log(1-A))
    # backward propagation
    dw = np.dot((A-Y),X.T)/m
    db = np.sum(A-Y)/m
    grads = {'dw':dw, 'db':db}
    return cost,grads

Check that the function computing the gradients and cost is correct

w, b, X, Y = np.array([[1,2]]),2,np.array([[1,2],[3,4]]),np.array([[1,0]])

cost, grads = propagate(w, b, X, Y)

print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

dw = [[0.99993216 1.99980262]]
db = 0.49993523062470574
cost = 6.000064773192205

Expected result:

dw	[[ 0.99993216] [ 1.99980262]]
db	0.499935230625
cost	6.000064773192205
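Besides comparing against known values, the analytic gradients can be checked against finite differences of the cost. The sketch below is one way to do that under stated assumptions: `cost_and_grads` is a hypothetical standalone re-statement of the formulas above (with $m$ taken as the number of columns of X), and the random data, seed and step `eps` are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(w, b, X, Y):
    # Hypothetical helper mirroring the cost and gradient formulas above
    m = X.shape[1]
    A = sigmoid(np.dot(w, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(A - Y, X.T) / m
    db = np.sum(A - Y) / m
    return cost, dw, db

# Small random problem: 3 features, 5 examples
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Y = (rng.random((1, 5)) > 0.5) * 1.0
w = rng.normal(size=(1, 3)) * 0.1
b = 0.2

_, dw, db = cost_and_grads(w, b, X, Y)

# Central differences, one weight at a time
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[1]):
    step = np.zeros_like(w)
    step[0, j] = eps
    c_plus = cost_and_grads(w + step, b, X, Y)[0]
    c_minus = cost_and_grads(w - step, b, X, Y)[0]
    dw_num[0, j] = (c_plus - c_minus) / (2 * eps)
db_num = (cost_and_grads(w, b + eps, X, Y)[0]
          - cost_and_grads(w, b - eps, X, Y)[0]) / (2 * eps)

print(np.max(np.abs(dw - dw_num)), abs(db - db_num))
```

Both printed differences should be tiny; a large value would point to a sign or scaling error in the derivation.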

5. Optimization

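Work completed so far

Parameter initialization: initialize_w(dim)
Activation function: sigmoid(z)
Cost and gradient computation: propagate(w, b, X, Y)
Next, use gradient descent to update the parameters w and b; the learning rate $\alpha$ controls the step size:
- $w = w - \alpha d_w$
- $b = b - \alpha d_b$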

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=0):
    # num_iterations : loop times
    # learning_rate : control gradient descent
    # print_cost : if 1, print the cost every 10 iterations
    costs = []
    for i in range(num_iterations):
        cost , grads = propagate(w, b, X, Y)
        w = w - learning_rate*grads['dw']
        b = b - learning_rate*grads['db']
        costs.append(cost)
        if print_cost and i%10 == 0:
            print ('Cost after iteration %i: %f'%(i, cost))
    params = {'w':w, 'b':b}
    return params, grads, costs

Check that optimize works as expected

# test data
w, b, X, Y = np.array([[1,2]]),2,np.array([[1,2],[3,4]]),np.array([[1,0]])
params , grads, costs = optimize(w, b, X, Y, 100, 0.009, 1)

Cost after iteration 0: 6.000065
Cost after iteration 10: 5.527691
Cost after iteration 20: 5.055445
Cost after iteration 30: 4.583458
Cost after iteration 40: 4.112002
Cost after iteration 50: 3.641644
Cost after iteration 60: 3.173575
Cost after iteration 70: 2.710307
Cost after iteration 80: 2.257084
Cost after iteration 90: 1.824430

plt.plot(costs)


print ("w  = " + str(params["w"]))
print ("b  = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))

w  = [[0.1124579  0.23106775]]
b  = 1.5593049248448891
dw = [[0.90158428 1.76250842]]
db = 0.4304620716786828

Expected result:

w	[[ 0.1124579 ] [ 0.23106775]]
b	1.55930492484
dw	[[ 0.90158428][ 1.76250842]]
db	0.430462071679

6. Predict

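After the 100 iterations above, optimize() returns the params at the end of training.
Use the trained model parameters to predict:
- compute $\hat{Y} = A = \sigma(wX+b)$
- if $\hat{y}^{(i)} > 0.5$, the label of $x^{(i)}$ is 1
- if $\hat{y}^{(i)} \le 0.5$, the label of $x^{(i)}$ is 0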

def predict(w, b, X):
    Z = np.dot(w, X)+b
    A = sigmoid(Z)
    predicted = (A > 0.5)*1.
    return predicted

Check that predict() is correct

# test data
w, b, X= np.array([[1,2]]),2,np.array([[1,2],[3,4]])
print ('predicted :' + str(predict(w, b, X)))

predicted :[[1. 1.]]

Expected output:

**predicted**

[[ 1. 1.]]
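The thresholding step on its own can be seen on a few made-up probabilities (the array `A` below is an assumption for illustration). Note that a value of exactly 0.5 maps to label 0, because the comparison is strict.

```python
import numpy as np

# Hypothetical activations for four examples
A = np.array([[0.2, 0.5, 0.51, 0.99]])

# Boolean mask times 1.0 gives float labels, as in predict()
predicted = (A > 0.5) * 1.0

print(predicted)
```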

7. Merge all functions into a model

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000,learning_rate=0.5,print_cost=0,flag=0):
    w, b = initialize_w(X_train.shape[0], flag)
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate,
                                    print_cost)
    test_predicted = predict(params['w'], params['b'], X_test)
    train_predicted = predict(params['w'], params['b'], X_train)
    print("train accuracy: {} %".format(100 - np.mean(np.abs(train_predicted - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(test_predicted - Y_test)) * 100))

    d = {'costs':costs,
         'train_predicted':train_predicted,
         'test_predicted':test_predicted,
         'w':params['w'],
         'b':params['b'],
         'lr':learning_rate,
         'num_iterations':num_iterations,
        }
    return d
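The accuracy expression in model() works because predictions and labels are both 0/1, so the mean absolute difference is the error rate. The sketch below uses made-up labels to compare it with a direct count of matches.

```python
import numpy as np

# Hypothetical labels and predictions for five examples
y_true = np.array([[1, 0, 1, 1, 0]])
y_pred = np.array([[1., 0., 0., 1., 1.]])

# Formula used in model(): 100 minus the error rate in percent
acc_from_error = 100 - np.mean(np.abs(y_pred - y_true)) * 100

# Equivalent formulation: share of positions where prediction equals label
acc_from_matches = np.mean(y_pred == y_true) * 100

print(acc_from_error, acc_from_matches)
```

Three of the five predictions match, so both expressions give 60 percent up to floating-point rounding.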

Run model

flag = 0: initialize w to all zeros (b is always initialized to 0)
flag = 1: initialize w randomly, uniform in [0, 1)

flag = 0, zero initialization

d = model(normed_train_x, train_y, normed_test_x, test_y, print_cost=1)

Cost after iteration 0: 0.011789
Cost after iteration 10: 0.028177
Cost after iteration 20: 0.011830
Cost after iteration 30: 0.019752
Cost after iteration 40: 0.017107
Cost after iteration 50: 0.008967
Cost after iteration 60: 0.008224
Cost after iteration 70: 0.008620
Cost after iteration 80: 0.012857
.................................
Cost after iteration 1900: 0.001477
Cost after iteration 1910: 0.001471
Cost after iteration 1920: 0.001464
Cost after iteration 1930: 0.001458
Cost after iteration 1940: 0.001452
Cost after iteration 1950: 0.001446
Cost after iteration 1960: 0.001440
Cost after iteration 1970: 0.001434
Cost after iteration 1980: 0.001429
Cost after iteration 1990: 0.001423
train accuracy: 99.52153110047847 %
test accuracy: 70.0 %

flag = 1, random initialization

d2 = model(normed_train_x, train_y, normed_test_x, test_y, print_cost=0, flag=1)

train accuracy: 98.08612440191388 %
test accuracy: 72.0 %

cost curve

plt.figure(figsize=(14,6))
plt.plot(d['costs'], label='flag=0')
plt.plot(d2['costs'], label='flag=1')
plt.legend()

plt.title('cost curve lr=0.5')

Text(0.5,1,'cost curve lr=0.5')


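The learning rate is 0.5, so the curve oscillates strongly in the early iterations. The logged costs start at 0.011789 rather than $\ln 2 \approx 0.693$, the cost of an all-zero model, because this run divided the cost and gradients by the feature count (12288) instead of $m = 209$; the effective step size was therefore about $0.5 \times 209 / 12288 \approx 0.0085$.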

Model predictions on the test data

Correctly classified

index = 1
pred_label = int(d['test_predicted'][0][index])
# true label of this test image (the original printed a stale image_label)
true_label = test_y[0][index]
plt.imshow(orig_test_x[index])
print ("y = "+ str(true_label) +", it's a '"+ classes[pred_label].decode('utf-8') +"' picture.")

y = 1, it's a 'cat' picture.


Misclassified

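index = 5
pred_label = int(d['test_predicted'][0][index])
# true label of this test image (the original printed a stale image_label)
true_label = test_y[0][index]
plt.imshow(orig_test_x[index])
print ("y = "+ str(true_label) +", it's a '"+ classes[pred_label].decode('utf-8') +"' picture.")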

y = 0, it's a 'cat' picture.


8. Choice of learning rate

learning_rates = [0.005, 0.01, 0.05, 0.1, 0.3, 1.0]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(normed_train_x, train_y, normed_test_x, test_y,
                           num_iterations = 40000, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

learning rate is: 0.005
train accuracy: 94.73684210526316 %
test accuracy: 74.0 %

-------------------------------------------------------

learning rate is: 0.01
train accuracy: 97.60765550239235 %
test accuracy: 70.0 %

-------------------------------------------------------

learning rate is: 0.05
train accuracy: 100.0 %
test accuracy: 70.0 %

-------------------------------------------------------

learning rate is: 0.1
train accuracy: 100.0 %
test accuracy: 70.0 %

-------------------------------------------------------

learning rate is: 0.3
train accuracy: 100.0 %
test accuracy: 72.0 %

-------------------------------------------------------

learning rate is: 1.0
train accuracy: 100.0 %
test accuracy: 72.0 %

-------------------------------------------------------

plt.figure(figsize=(15,8))
plt.grid(True)
plt.ylabel('cost')
plt.ylim(0, 0.012)
plt.xlabel('iterations')
plt.title('zero initialization')
for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["lr"]))

legend = plt.legend(loc='best', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()


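The larger the learning_rate, the faster the cost curve falls.
If the learning_rate is too large, the cost curve oscillates early on, but in these runs it still settles eventually.

predict picture

Use one of the models with higher test accuracy, models['0.3']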
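The oscillation caused by a large step can be reproduced on a much simpler problem. The sketch below is a toy stand-in, not the logistic cost: it minimizes the assumed objective $f(w) = w^2$ (gradient $2w$) with a small and a large learning rate, using a hypothetical helper `descend`.

```python
def descend(lr, w0=1.0, steps=5):
    # Plain gradient descent on the toy objective f(w) = w**2
    w, path = w0, [w0]
    for _ in range(steps):
        w = w - lr * 2 * w   # gradient of w**2 is 2*w
        path.append(w)
    return path

small = descend(0.1)   # each step multiplies w by 0.8
large = descend(0.9)   # each step multiplies w by -0.8

print(small)
print(large)
```

With the small rate the iterates shrink steadily toward 0; with the large rate they overshoot the minimum and flip sign on every step, while still shrinking in magnitude.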

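def predict_my_image(fname, model):
    # scipy's imread/imresize were removed from recent SciPy releases, so load and resize with Pillow
    image = Image.open(fname).convert('RGB')
    resized = np.array(image.resize((64, 64)))
    # flatten to a (12288, 1) column and scale to [0, 1] like the training data
    input_x = resized.reshape((1, 64*64*3)).T / 255.0
    pred_label = int(predict(model['w'], model['b'], input_x)[0, 0])
    plt.imshow(image)
    print('y = '+str(pred_label) +',model predict is a '+classes[pred_label].decode('utf-8'))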

predict_my_image('images/my_image2.jpg',models['0.3'])

y = 1,model predict is a cat


predict_my_image('images/my_image.jpg', models['0.3'])

y = 1,model predict is a non-cat


predict_my_image('images/cat_in_iran.jpg',models['0.3'])

y = 1,model predict is a cat

