Logistic Regression (a very simple Neural Network)
*Note*: please credit the source when reposting.
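Main contents:
- Implement a logistic regression classifier that recognizes whether an image shows a cat.
- Initialize the parameters.
- Compute the cost and the gradient of the cost function.
- Update the weights (parameters) with an optimization algorithm: gradient descent.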
1. Packages Used
- numpy
- h5py: the training data is stored in .h5 files
- matplotlib
- PIL (in Python 3.x, PIL has been superseded by Pillow; installing pillow is enough)
- scipy
import h5py
import scipy
import numpy as np
from PIL import Image
from scipy import ndimage
import matplotlib.pyplot as plt
%matplotlib inline
2. Load and Overview the Dataset
The dataset is stored in HDF5 (.h5) files.
- Each image in the training data is labeled 0/1:
  - cat (y = 1)
  - non-cat (y = 0)
- The training set contains 209 images and the test set contains 50 images:
  - m_train = 209
  - m_test = 50
- Each image has shape (64, 64, 3): height 64, width 64, 3 channels (RGB)
  - image_shape = (64, 64, 3)
2.1 Load Dataset
# Function: load data
def load_data():
    # h5py.File objects are dict-like; "with" closes each file once its arrays are copied out
    with h5py.File('datasets/train_catvnoncat.h5', 'r') as train_dataset:
        # train data features
        orig_train_x = np.array(train_dataset['train_set_x'][:])
        # train data labels
        train_y = np.array(train_dataset['train_set_y'][:])
    with h5py.File('datasets/test_catvnoncat.h5', 'r') as test_dataset:
        # test data features
        orig_test_x = np.array(test_dataset['test_set_x'][:])
        # test data labels
        test_y = np.array(test_dataset['test_set_y'][:])
        # list of classes
        classes = np.array(test_dataset['list_classes'][:])
    # reshape the labels into row vectors of shape (1, number of examples)
    train_y = train_y.reshape((1, len(train_y)))
    test_y = test_y.reshape((1, len(test_y)))
    return orig_train_x, train_y, orig_test_x, test_y, classes
The "orig_" prefix: orig_train_x and orig_test_x contain the raw pixel values of the images. The data is preprocessed further (for example, standardized) before it is fed to the model.
orig_train_x, train_y , orig_test_x, test_y, classes = load_data()
2.2 Overview dataset
m_train = orig_train_x.shape[0]
m_test = orig_test_x.shape[0]
print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Each image is of size: "+str(orig_train_x[0].shape))
print ("train_set_x shape: " + str(orig_train_x.shape))
print ("train_set_y shape: " + str(train_y.shape))
print ("test_set_x shape: " + str(orig_test_x.shape))
print ("test_set_y shape: " + str(test_y.shape))
print ('classes :' + str(classes))
Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)
classes :[b'non-cat' b'cat']
2.3 Visualize image
- orig_train_x is a multidimensional array of shape (m_train, 64, 64, 3); orig_train_x[index] is the 3-D array of a single image.
# visualize an image
index = 19
image_label = train_y[0][index]
plt.imshow(orig_train_x[index])
print ("y = "+ str(image_label) +", it's a '"+classes[image_label].decode('utf-8') +"' picture.")
y = 1, it's a 'cat' picture.
index = 10
image_label = train_y[0][index]
plt.imshow(orig_train_x[index])
print ("y = "+ str(image_label) +", it's a '"+ classes[image_label].decode('utf-8') +"' picture.")
y = 0, it's a 'non-cat' picture.
2.4 Reshape orig_data
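- Flatten each image's array into a vector of length 64x64x3 = 12288 (one row per image)
- After transposing with .T, each column of the matrix is one example and each row is one feature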
flatten_train_x = orig_train_x.reshape(orig_train_x.shape[0],-1).T
# -1 lets NumPy infer this dimension: x.shape[1]*x.shape[2]*x.shape[3] = 64*64*3 = 12288
flatten_test_x = orig_test_x.reshape(orig_test_x.shape[0],-1).T
flatten_train_x.shape, flatten_test_x.shape
((12288, 209), (12288, 50))
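The following sketch checks that reshape-then-transpose keeps each image intact. It uses a made-up toy array of two 2x2 RGB "images": column i of the result is exactly image i unrolled. The final check shows a common pitfall. Reshaping straight to (-1, m) without the transpose puts pixels from different images into the same column.

```python
import numpy as np

# Made-up stand-in for orig_train_x: 2 "images" of 2x2 pixels with 3 channels,
# i.e. shape (m, height, width, channels) = (2, 2, 2, 3)
toy = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)

# Same transformation as above: one row per image, then transpose
flat = toy.reshape(toy.shape[0], -1).T
print(flat.shape)  # (12, 2): 12 features (2*2*3) x 2 examples

# Column i is exactly image i unrolled in row-major (C) order
for i in range(toy.shape[0]):
    assert np.array_equal(flat[:, i], toy[i].ravel())

# Pitfall: reshaping directly to (-1, m) skips the transpose and
# puts pixels from both images into the same column
wrong = toy.reshape(-1, toy.shape[0])
assert not np.array_equal(wrong, flat)
```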
2.5 Standardize
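- Every element of the data is a pixel value in the range [0, 255]; normalizing the values helps speed up training
- Min-max normalization: $x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$
- Here $x_{min} = 0$ and $x_{max} = 255$
- Therefore $x_{norm} = \frac{x}{255}$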
normed_train_x = flatten_train_x/255.0
normed_test_x = flatten_test_x/255.0
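Dividing by 255 is min-max scaling with the fixed pixel range [0, 255]. It is not mean/standard-deviation standardization. Here is a small sketch on made-up pixel values; min_max_scale is a hypothetical helper, not part of the original code.

```python
import numpy as np

def min_max_scale(x, x_min=0.0, x_max=255.0):
    """Hypothetical helper: (x - x_min) / (x_max - x_min), computed in float64."""
    return (x.astype(np.float64) - x_min) / (x_max - x_min)

# Made-up pixel values; uint8 covers exactly the range [0, 255]
pixels = np.array([[0, 64, 128, 255]], dtype=np.uint8)

scaled = min_max_scale(pixels)
print(scaled)

# With x_min = 0 and x_max = 255 the formula reduces to x / 255
assert np.allclose(scaled, pixels / 255.0)
```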
3. General Architecture of the learning algorithm
- Implement logistic regression (a simple neural network)
- Use a neural-network mindset.
- Train the model to classify the images
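Mathematical formulation of the algorithm, for one example $x^{(i)}$:
- $x^{(i)}$: the image as a column vector, shape = (len($x^{(i)}$), 1)
- $w$: the weight matrix connecting the input layer to the output layer, shape = (1, len($x^{(i)}$))
- $b$: the bias
- $a^{(i)}$: the output after activation
$z^{(i)} = w x^{(i)} + b$, $\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)})$, $\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)}\log(a^{(i)}) - (1-y^{(i)})\log(1-a^{(i)})$
Average loss over the m examples: $J = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(a^{(i)}, y^{(i)})$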
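This sketch links the per-example formulas to the vectorized code used later. It computes the average loss J both ways and checks that they agree. The sizes, data and random seed are arbitrary made-up values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, m = 4, 5                        # toy sizes: 4 features, 5 examples
X = rng.normal(size=(n_x, m))        # one example per column
Y = np.array([[1, 0, 1, 1, 0]])      # labels, shape (1, m)
w = rng.normal(size=(1, n_x))        # row-vector weights, as in this post
b = 0.1

# Per-example formulas: z = w x + b, a = sigma(z), L = -y log a - (1 - y) log(1 - a)
losses = []
for i in range(m):
    x_i = X[:, [i]]                  # column vector, shape (n_x, 1)
    z_i = (w @ x_i + b).item()
    a_i = sigmoid(z_i)
    y_i = Y[0, i]
    losses.append(-y_i * np.log(a_i) - (1 - y_i) * np.log(1 - a_i))
J_loop = sum(losses) / m

# Vectorized version over all m columns at once
A = sigmoid(w @ X + b)
J_vec = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

print(J_loop, J_vec)
```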
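4. Building the Algorithm
Steps for building a neural network:
- 1. Define the network structure (the number of neurons in the input and output layers; this example has no hidden layer)
- 2. Initialize the model parameters $w$, $b$
- 3. Loop:
  - forward propagation: compute the cost
  - backward propagation: compute the gradients
  - gradient descent: update the parameters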
4.1 Sigmoid function
def sigmoid(z):
    # z: scalar or numpy array
    a = 1./(1.+np.exp(-z))
    return a
sigmoid(np.array([1,2,3,4]))
array([0.73105858, 0.88079708, 0.95257413, 0.98201379])
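The sigmoid above is correct, but np.exp(-z) overflows for large negative inputs. For z = -1000, NumPy emits a RuntimeWarning, even though the result still comes out as 0.0. A common remedy, sketched below, is to exponentiate only non-positive numbers. stable_sigmoid is a hypothetical name; scipy.special.expit is an existing library alternative.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that never overflows in np.exp.

    For z >= 0 it uses 1 / (1 + exp(-z)); for z < 0 it uses the algebraically
    equal form exp(z) / (1 + exp(z)). Both branches only need exp(-|z|),
    which always lies in (0, 1].
    """
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(stable_sigmoid(np.array([1, 2, 3, 4])))
print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # no overflow warning
```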
4.2 Initializing parameters
- The weights of logistic regression can be initialized to all zeros
- or randomly
def initialize_w(dim, s=0):
    # dim: number of input features
    # s=0: all-zero initialization
    # s=1: random initialization, uniform in [0, 1)
    if s:
        w = np.random.rand(1, dim)
    else:
        w = np.zeros((1, dim))
    b = 0
    return w, b
initialize_w(5)
(array([[0., 0., 0., 0., 0.]]), 0)
initialize_w(5, 1)
(array([[0.7968392 , 0.14891652, 0.06809079, 0.9028713 , 0.11452771]]), 0)
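The claim that logistic regression tolerates all-zero weights can be checked numerically. A network with hidden units has a symmetry to break; here there is only one output unit. The gradient for each weight w_j depends on its own feature row of X, so the weights stop being equal after the first update. The toy data below is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((3, 6))               # made-up data: 3 features, 6 examples
Y = np.array([[1, 0, 1, 0, 1, 1]])   # made-up labels

w = np.zeros((1, 3))                 # all-zero initialization, as with s=0
b = 0.0
A = 1.0 / (1.0 + np.exp(-(np.dot(w, X) + b)))   # sigmoid(0) = 0.5 everywhere

# Vectorized gradient dw = (A - Y) X^T / m, with m = number of columns of X
dw = np.dot(A - Y, X.T) / X.shape[1]
print(A)
print(dw)   # the three entries generally differ: each uses its own feature row
```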
4.3 Forward and Backward propagation
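- Forward propagation
  - X: training data matrix, shape (len(features), m), one example per column
  - $A = \sigma(wX + b) = (a^{(1)}, a^{(2)}, \dots, a^{(m)})$: activation output; m: number of examples
  - Compute the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]$
- Backward propagation (per example, with $z = wx + b$ and $a = \sigma(z)$)
(1) $\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}$
(2) $\frac{\partial a}{\partial z} = a(1-a)$
(3) $\frac{\partial \mathcal{L}}{\partial z} = \frac{\partial \mathcal{L}}{\partial a}\cdot\frac{\partial a}{\partial z} = a - y$
(4) $\frac{\partial J}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}(a^{(i)} - y^{(i)})\,x^{(i)T}$
(5) $\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(a^{(i)} - y^{(i)})$
- Vectorized form
  $dw = \frac{\partial J}{\partial w} = \frac{1}{m}(A - Y)X^T$, $db = \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(a^{(i)} - y^{(i)})$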
# Function: propagate
# computes the cost and its gradients
def propagate(w, b, X, Y):
    """
    Arguments:
    w : weights, a numpy array, shape (1, len(features))
    b : bias, a scalar
    X : dataset, a numpy array, shape (len(features), m)
    Y : true labels, a numpy array, shape (1, m)
    return :
    cost, grads (a dict with keys 'dw' and 'db')
    """
    # number of examples = number of columns of X
    # (len(X) would return the number of rows, i.e. the number of features)
    m = X.shape[1]
    # forward propagation
    Z = np.dot(w, X)+b
    A = sigmoid(Z)
    # cost
    cost = -1/m * np.sum(Y*np.log(A)+(1-Y)*np.log(1-A))
    # backward propagation
    dw = np.dot((A-Y),X.T)/m
    db = np.sum(A-Y)/m
    grads = {'dw':dw, 'db':db}
    return cost,grads
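As the propagate() docstring says, X holds one example per column, so the number of examples is X.shape[1]. len() on a 2-D NumPy array returns the size of the first axis, which for this data is the number of features. For a square X, such as a 2x2 toy matrix, the two values coincide, so a small square test cannot tell them apart. This sketch uses an all-zero placeholder with the real data's shape. It shows how far apart the two values are and what dividing by the wrong one does to the zero-initialized cost.

```python
import numpy as np

# Same layout as flatten_train_x: 12288 features (rows) x 209 examples (columns)
X = np.zeros((64 * 64 * 3, 209))

print(len(X))      # 12288: size of the first axis, i.e. the features
print(X.shape[1])  # 209: the number of examples

# With zero weights every activation is sigmoid(0) = 0.5,
# so each example contributes a loss of -log(0.5) = ln 2.
total_loss = 209 * np.log(2.0)
print(total_loss / X.shape[1])  # about 0.693147: the true average over 209 examples
print(total_loss / len(X))      # about 0.011789: the same sum divided by 12288
```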
Check that the function computing the gradients and the cost is correct:
w, b, X, Y = np.array([[1,2]]),2,np.array([[1,2],[3,4]]),np.array([[1,0]])
cost, grads = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
dw = [[0.99993216 1.99980262]]
db = 0.49993523062470574
cost = 6.000064773192205
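Expected results (here dw has shape (1, 2), so it prints as a row; the reference lists the same two values as a column):
**dw** | [[0.99993216] [1.99980262]] |
**db** | 0.499935230625 |
**cost** | 6.000064773192205 |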
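A single set of hand-checked numbers tests only one input. A finite-difference gradient check compares the analytic formulas dw = (A - Y)X^T / m and db = mean(A - Y) against central differences of the cost, here on random made-up data. cost_fn, analytic_grads and numeric_grads are hypothetical helpers. They re-implement the same formulas as propagate() (with m = X.shape[1]) so that the block runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_fn(w, b, X, Y):
    """Average cross-entropy cost; m is the number of columns of X."""
    m = X.shape[1]
    A = sigmoid(w @ X + b)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

def analytic_grads(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(w @ X + b)
    return (A - Y) @ X.T / m, np.sum(A - Y) / m

def numeric_grads(w, b, X, Y, eps=1e-6):
    """Central differences: (J(theta + eps) - J(theta - eps)) / (2 * eps)."""
    dw = np.zeros_like(w, dtype=np.float64)
    for j in range(w.shape[1]):
        step = np.zeros_like(dw)
        step[0, j] = eps
        dw[0, j] = (cost_fn(w + step, b, X, Y) - cost_fn(w - step, b, X, Y)) / (2 * eps)
    db = (cost_fn(w, b + eps, X, Y) - cost_fn(w, b - eps, X, Y)) / (2 * eps)
    return dw, db

rng = np.random.default_rng(2)
# 3 features x 4 examples: non-square, so a transposed operand raises a shape error
X = rng.normal(size=(3, 4))
Y = np.array([[1, 0, 0, 1]])
w = rng.normal(size=(1, 3)) * 0.5
b = 0.2

dw_a, db_a = analytic_grads(w, b, X, Y)
dw_n, db_n = numeric_grads(w, b, X, Y)
print(np.max(np.abs(dw_a - dw_n)), abs(db_a - db_n))  # both should be tiny
```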
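5. Optimization
Work completed so far:
- parameter initialization: initialize_w(dim)
- activation function: sigmoid(z)
- cost and gradient computation: propagate(w, b, X, Y)
- Next, use gradient descent to update the parameters w and b: $w := w - \alpha\, dw$, $b := b - \alpha\, db$
$\alpha$ is the learning rate, which controls the step size of gradient descent.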
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=0):
    # num_iterations: number of gradient-descent iterations
    # learning_rate: step size of each gradient-descent update
    # print_cost: if truthy, print the cost every 10 iterations
    costs = []
    for i in range(num_iterations):
        cost, grads = propagate(w, b, X, Y)
        w = w - learning_rate*grads['dw']
        b = b - learning_rate*grads['db']
        costs.append(cost)
        if print_cost and i % 10 == 0:
            print('Cost after iteration %i: %f' % (i, cost))
    params = {'w': w, 'b': b}
    return params, grads, costs
Check that optimize() works correctly:
# test data
w, b, X, Y = np.array([[1,2]]),2,np.array([[1,2],[3,4]]),np.array([[1,0]])
params , grads, costs = optimize(w, b, X, Y, 100, 0.009, 1)
Cost after iteration 0: 6.000065
Cost after iteration 10: 5.527691
Cost after iteration 20: 5.055445
Cost after iteration 30: 4.583458
Cost after iteration 40: 4.112002
Cost after iteration 50: 3.641644
Cost after iteration 60: 3.173575
Cost after iteration 70: 2.710307
Cost after iteration 80: 2.257084
Cost after iteration 90: 1.824430
plt.plot(costs)
print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
w = [[0.1124579 0.23106775]]
b = 1.5593049248448891
dw = [[0.90158428 1.76250842]]
db = 0.4304620716786828
Expected results:
**w** | [[0.1124579] [0.23106775]] |
**b** | 1.55930492484 |
**dw** | [[0.90158428] [1.76250842]] |
**db** | 0.430462071679 |
6. Predict
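- After the 100 iterations above, optimize() returns the parameters at the end of training
- Use the trained model parameters to predict:
- 1. Compute $A = \sigma(wX + b)$
- label = 1 if $a^{(i)} > 0.5$
- label = 0 if $a^{(i)} \le 0.5$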
def predict(w, b, X):
    Z = np.dot(w, X)+b
    A = sigmoid(Z)
    # label 1.0 where the activation exceeds 0.5, otherwise 0.0
    predicted = (A > 0.5)*1.
    return predicted
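The sigmoid is strictly increasing and sigmoid(0) = 0.5, so the rule a > 0.5 is mathematically the same as z > 0. The sigmoid can therefore be skipped at prediction time. In floating point, the two rules can disagree only when |z| is so tiny that sigmoid(z) rounds to exactly 0.5. In this sketch, predict_via_z is a hypothetical name, and the inputs are the same toy values as in the predict() check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_via_z(w, b, X):
    """Equivalent decision rule: sigma(z) > 0.5 exactly when z > 0."""
    Z = np.dot(w, X) + b
    return (Z > 0) * 1.

w, b, X = np.array([[1, 2]]), 2, np.array([[1, 2], [3, 4]])
print(predict_via_z(w, b, X))  # Z = [[9, 12]], so both labels are 1

# Spot-check the equivalence on a few z values, including z = 0
z = np.array([-5.0, -1.0, -0.1, 0.0, 0.1, 1.0, 5.0])
assert np.array_equal(sigmoid(z) > 0.5, z > 0)
```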
Check that predict() is correct:
# test data
w, b, X= np.array([[1,2]]),2,np.array([[1,2],[3,4]])
print ('predicted :' + str(predict(w, b, X)))
predicted :[[1. 1.]]
Expected output:
**predicted** | [[1. 1.]] |
7. Merge all functions into a model
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=0, flag=0):
    # flag is passed to initialize_w: 0 = zero initialization, 1 = random initialization
    w, b = initialize_w(X_train.shape[0], flag)
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate,
                                    print_cost)
    test_predicted = predict(params['w'], params['b'], X_test)
    train_predicted = predict(params['w'], params['b'], X_train)
    # accuracy = 100% minus the percentage of misclassified examples
    print("train accuracy: {} %".format(100 - np.mean(np.abs(train_predicted - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(test_predicted - Y_test)) * 100))
    d = {'costs': costs,
         'train_predicted': train_predicted,
         'test_predicted': test_predicted,
         'w': params['w'],
         'b': params['b'],
         'lr': learning_rate,
         'num_iterations': num_iterations,
         }
    return d
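The accuracy expression in model() relies on 0/1 labels: |pred - y| is 1 for a mistake and 0 for a hit. The mean of that difference is therefore the error rate. This sketch on made-up predictions compares it with the more direct "fraction of matches" form. It also shows that the train accuracy of 99.52153110047847 % reported for flag = 0 corresponds to a single mistake out of 209.

```python
import numpy as np

# Made-up 0/1 predictions and labels, shape (1, m)
predicted = np.array([[1., 0., 1., 1., 0.]])
Y = np.array([[1, 0, 0, 1, 1]])

# Formula used in model(): 100 minus the percentage of mistakes
acc_abs = 100 - np.mean(np.abs(predicted - Y)) * 100
# Equivalent, more direct form for 0/1 labels: percentage of matches
acc_eq = np.mean(predicted == Y) * 100
print(acc_abs, acc_eq)   # both about 60: 3 of the 5 predictions match

# One mistake out of 209 training examples gives 100 - 100/209
one_mistake = 100 - 100 / 209
print(one_mistake)
```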
Run model
- flag = 0: initialize w and b to all zeros
- flag = 1: initialize w randomly, uniformly in [0, 1) (np.random.rand); b is still 0
flag = 0, zero initialization
d = model(normed_train_x, train_y, normed_test_x, test_y, print_cost=1)
Cost after iteration 0: 0.011789
Cost after iteration 10: 0.028177
Cost after iteration 20: 0.011830
Cost after iteration 30: 0.019752
Cost after iteration 40: 0.017107
Cost after iteration 50: 0.008967
Cost after iteration 60: 0.008224
Cost after iteration 70: 0.008620
Cost after iteration 80: 0.012857
.................................
Cost after iteration 1900: 0.001477
Cost after iteration 1910: 0.001471
Cost after iteration 1920: 0.001464
Cost after iteration 1930: 0.001458
Cost after iteration 1940: 0.001452
Cost after iteration 1950: 0.001446
Cost after iteration 1960: 0.001440
Cost after iteration 1970: 0.001434
Cost after iteration 1980: 0.001429
Cost after iteration 1990: 0.001423
train accuracy: 99.52153110047847 %
test accuracy: 70.0 %
flag = 1, random initialization
d2 = model(normed_train_x, train_y, normed_test_x, test_y, print_cost=0, flag=1)
train accuracy: 98.08612440191388 %
test accuracy: 72.0 %
cost curve
plt.figure(figsize=(14,6))
plt.plot(d['costs'], label='flag=0')
plt.plot(d2['costs'], label='flag=1')
plt.legend()
plt.title('cost curve lr=0.5')
With a learning rate of 0.5, the curves oscillate strongly during the early iterations.
The model's predictions on the test data
Correctly classified
index = 1
true_label = int(test_y[0][index])
pred_label = int(d['test_predicted'][0][index])
plt.imshow(orig_test_x[index])
print ("y = "+ str(true_label) +", the model predicts a '"+ classes[pred_label].decode('utf-8') +"' picture.")
y = 1, the model predicts a 'cat' picture.
Misclassified
index = 5
true_label = int(test_y[0][index])
pred_label = int(d['test_predicted'][0][index])
plt.imshow(orig_test_x[index])
print ("y = "+ str(true_label) +", the model predicts a '"+ classes[pred_label].decode('utf-8') +"' picture.")
y = 0, the model predicts a 'cat' picture.
8. Choice of learning rate
learning_rates = [0.005, 0.01, 0.05, 0.1, 0.3, 1.0]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(normed_train_x, train_y, normed_test_x, test_y,
                           num_iterations = 40000, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')
learning rate is: 0.005
train accuracy: 94.73684210526316 %
test accuracy: 74.0 %
-------------------------------------------------------
learning rate is: 0.01
train accuracy: 97.60765550239235 %
test accuracy: 70.0 %
-------------------------------------------------------
learning rate is: 0.05
train accuracy: 100.0 %
test accuracy: 70.0 %
-------------------------------------------------------
learning rate is: 0.1
train accuracy: 100.0 %
test accuracy: 70.0 %
-------------------------------------------------------
learning rate is: 0.3
train accuracy: 100.0 %
test accuracy: 72.0 %
-------------------------------------------------------
learning rate is: 1.0
train accuracy: 100.0 %
test accuracy: 72.0 %
-------------------------------------------------------
plt.figure(figsize=(15,8))
plt.grid(True)
plt.ylabel('cost')
plt.ylim(0, 0.012)
plt.xlabel('iterations')
plt.title('zero initialization')
for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["lr"]))
legend = plt.legend(loc='best', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
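The larger the learning rate, the faster the cost curve decreases.
With a large learning rate the cost curve oscillates in the early iterations, but in these runs it still settles down eventually.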
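The effect of the step size is easy to see on a much simpler problem. This sketch runs gradient descent on the made-up 1-D cost f(w) = w^2 (gradient 2w), not on the logistic-regression cost, so it only illustrates the qualitative behaviour. Each step multiplies w by (1 - 2*lr):
- A small lr shrinks w slowly; a larger one shrinks it faster.
- An lr between 0.5 and 1 overshoots, so w flips sign every step while still shrinking. This is oscillation that settles.
- An lr above 1 makes w grow.

So "oscillates but stabilizes" holds only up to a limit; beyond it, gradient descent diverges.

```python
import numpy as np

def gradient_descent_1d(lr, w0=1.0, steps=20):
    """Gradient descent on the toy cost f(w) = w**2; returns all iterates."""
    ws = [w0]
    w = w0
    for _ in range(steps):
        w = w - lr * 2 * w          # w <- w - lr * f'(w), i.e. w <- (1 - 2*lr) * w
        ws.append(w)
    return np.array(ws)

for lr in [0.05, 0.4, 0.9, 1.1]:
    ws = gradient_descent_1d(lr)
    sign_changes = int(np.sum(np.diff(np.sign(ws)) != 0))
    print(f"lr={lr}: final |w| = {abs(ws[-1]):.3g}, sign changes = {sign_changes}")
```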
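Predict your own picture
- Use models['0.3'] (100% train accuracy, 72% test accuracy; lr=0.005 scored slightly higher on the test set, at 74%)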
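def predict_my_image(fname, model):
    # scipy's ndimage.imread and misc.imresize were removed from recent SciPy releases,
    # so load and resize with Pillow instead; convert('RGB') drops any alpha channel
    image = Image.open(fname).convert('RGB')
    resized = np.array(image.resize((64, 64)))
    # flatten to a (12288, 1) column and scale to [0, 1] like the training data
    input_x = resized.reshape((1, 64*64*3)).T / 255.0
    pred_label = int(predict(model['w'], model['b'], input_x)[0, 0])
    plt.imshow(image)
    print('y = '+str(pred_label) +',model predict is a '+classes[pred_label].decode('utf-8'))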
predict_my_image('images/my_image2.jpg',models['0.3'])
y = 1,model predict is a cat
predict_my_image('images/my_image.jpg', models['0.3'])
y = 1,model predict is a non-cat
predict_my_image('images/cat_in_iran.jpg',models['0.3'])
y = 1,model predict is a cat