Getting Started with tensorflow.keras, Part 3: Regression

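The Boston Housing dataset
The Boston Housing dataset poses a regression problem. It contains 506 observations in total, each with 13 input variables and 1 output variable. Each record describes a house and its surroundings, including the town's crime rate, the nitric oxide concentration, the average number of rooms per dwelling, the weighted distance to employment centers, and so on. The output variable is the median value of owner-occupied homes, in thousands of dollars.
After loading the data, shuffle the training set so that the samples are in random order. The code is as follows: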

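import numpy as np
from tensorflow import keras

boston_housing = keras.datasets.boston_housing
(train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()
# Shuffle the training set: argsort of random numbers gives a random permutation of row indices
order = np.argsort(np.random.random(train_labels.shape))
train_data = train_data[order]
train_labels = train_labels[order]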
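The argsort-of-random-numbers trick above produces a random permutation of the row indices. A minimal sketch of an equivalent approach, assuming NumPy's `default_rng` generator and small synthetic arrays standing in for the real dataset (the names `features` and `targets` and the seed are illustrative only):

```python
import numpy as np

# Synthetic stand-ins for train_data / train_labels (not the real dataset).
features = np.arange(12, dtype=float).reshape(6, 2)  # 6 samples, 2 features
targets = features[:, 0] * 10.0                      # each target is tied to its own row

# Draw one permutation of the row indices and apply it to both arrays,
# so every sample stays paired with its own target.
rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
order = rng.permutation(len(features))
shuffled_features = features[order]
shuffled_targets = targets[order]

# Check that the pairing survived the shuffle.
print(np.allclose(shuffled_targets, shuffled_features[:, 0] * 10.0))
```

The key point is that a single index array is applied to both the features and the labels; shuffling the two arrays separately would break the pairing.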

Viewing the input features as a table:

import pandas as pd
column_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD','TAX', 'PTRATIO', 'B', 'LSTAT']
df = pd.DataFrame(train_data, columns=column_names)
df.head()

	CRIM	ZN	INDUS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.07875	45.0	3.44	0.437	6.782	41.1	3.7886	5.0	398.0	15.2	393.87	6.68
1	4.55587	0.0	18.10	0.718	3.561	87.9	1.6132	24.0	666.0	20.2	354.70	7.12
2	0.09604	40.0	6.41	0.447	6.854	42.8	4.2673	4.0	254.0	17.6	396.90	2.98
3	0.01870	85.0	4.15	0.429	6.516	27.7	8.5353	4.0	351.0	17.9	392.43	6.36
4	0.52693	0.0	6.20	0.504	8.725	83.0	2.8944	8.0	307.0	17.4	382.00	4.63

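Data normalization
Data standardization (normalization) rescales the data so that every feature falls into a small, comparable range. It is often used when comparing or scoring indicators: it removes the units from the data and turns the values into dimensionless numbers, so that indicators with different units or magnitudes can be compared and weighted. The z-score used here subtracts each feature's mean and divides by its standard deviation. Both statistics are computed on the training set only and then reused for the test set.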

# z-score standardization
mean = train_data.mean(axis=0)
std = train_data.std(axis=0)
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std
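To make the effect of the z-score concrete, here is a small sketch on synthetic values (the arrays `train` and `test` are made up for illustration and are not taken from the dataset). It shows that the standardized training columns end up with mean 0 and standard deviation 1, and that the test data reuses the training statistics:

```python
import numpy as np

# Synthetic features on very different scales (illustrative values only).
train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]])
test = np.array([[2.5, 500.0], [5.0, 1000.0]])

# The statistics come from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training mean and std

print(train_scaled.mean(axis=0))  # approximately 0 for every column
print(train_scaled.std(axis=0))   # approximately 1 for every column
print(test_scaled)
```

After scaling, the two columns are directly comparable even though their raw values differ by two orders of magnitude.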

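Model training and prediction: building and training the model
Building a model generally follows three steps: define the network structure (number of layers, number of units, input, output), choose the training configuration (loss function, optimizer, evaluation metrics), and train the model (number of epochs, batch size).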

# Function that builds the model
def build_model():
  model = keras.Sequential([
    keras.layers.Dense(64, activation=tf.nn.relu, 
                       input_shape=(train_data.shape[1],)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(1)
  ])
  # tf.train.RMSPropOptimizer exists only in TensorFlow 1.x; the tf.keras optimizer is used instead
  optimizer = keras.optimizers.RMSprop(0.001)
 
  model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae']) # mean absolute error
  return model
# Build the model
model = build_model()
# Print the model structure
model.summary()
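As a sanity check on what `model.summary()` reports, the parameter count of each Dense layer can be worked out by hand as inputs times units plus one bias per unit. The sketch below uses a hypothetical helper `dense_params` (not a Keras function) and assumes the 13 input features described earlier:

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

# 13 input features, followed by the three Dense layers of build_model().
layer_sizes = [13, 64, 64, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)
```

A network of this size is small, which fits a dataset with only a few hundred training samples.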

The training code is as follows:

# Callback: print one dot per epoch and start a new line every 100 epochs
class PrintDot(keras.callbacks.Callback):
  def on_epoch_end(self,epoch,logs):
    if epoch % 100 == 0: print('')
    print('.', end='')
EPOCHS = 500
# Train the model
history = model.fit(train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=1, # hold out 20% of the training set for validation; verbose controls progress output
                    callbacks=[PrintDot()])
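`validation_split=0.2` holds out part of the training arrays for validation. As the Keras documentation describes it, the held-out samples are the last fraction of the arrays, taken before any per-epoch shuffling, which is one reason to shuffle the dataset beforehand. A rough sketch of that slicing, using a hypothetical helper `split_tail` (not part of Keras) and assuming the 404 training samples returned by the default `load_data()` split:

```python
import math

def split_tail(n_samples, validation_split):
    """Return (train_indices, val_indices), holding out the last fraction.

    Hypothetical helper that mirrors the documented behaviour of
    validation_split; it is not a Keras function.
    """
    split_at = int(math.floor(n_samples * (1.0 - validation_split)))
    indices = list(range(n_samples))
    return indices[:split_at], indices[split_at:]

train_idx, val_idx = split_tail(404, 0.2)
print(len(train_idx), len(val_idx))
```

Under these assumptions the model is fitted on the first block of rows and validated on the tail, so an unshuffled, ordered dataset could yield an unrepresentative validation set.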

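Model prediction
The History object returned by model.fit records the metrics of every epoch, which makes it possible to visualize the training process.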

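# Plot the training and validation mean absolute error per epoch
def plot_history(history):
  # The metric key is 'mean_absolute_error' or 'mae', depending on the TensorFlow version
  mae_key = 'mae' if 'mae' in history.history else 'mean_absolute_error'
  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Abs Error [1000$]')
  plt.plot(history.epoch, np.array(history.history[mae_key]), 
           label='Train MAE')
  plt.plot(history.epoch, np.array(history.history['val_' + mae_key]),
           label = 'Val MAE')
  plt.legend()
  plt.ylim([0,5])
plot_history(history)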

To stop training early, set a stopping condition with the EarlyStopping callback.

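# Rebuild the model so that training starts again from freshly initialized weights
model = build_model()
# Stopping condition: stop training once the validation loss has not improved for 20 consecutive epochs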
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)
 
history = model.fit(train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=0,
                    callbacks=[early_stop, PrintDot()])
 
plot_history(history)
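The patience rule can be illustrated without TensorFlow. The function below is a simplified sketch of the idea, not the Keras implementation: `stopping_epoch` is a hypothetical helper, and the loss values are made up. It stops once the loss has failed to beat the best value seen so far for `patience` consecutive epochs:

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified sketch of patience-based early stopping: stop once the
    loss has not improved on the best value so far for `patience`
    consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0          # improvement: reset the counter
        else:
            wait += 1         # no improvement this epoch
            if wait >= patience:
                return epoch
    return None               # the condition was never met

losses = [5.0, 4.0, 3.5, 3.6, 3.7, 3.55, 3.8]  # made-up validation losses
print(stopping_epoch(losses, patience=3))
```

A larger `patience` tolerates longer plateaus and noisy validation curves, at the cost of more epochs trained after the best point.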

The prediction code is as follows:

test_predictions = model.predict(test_data).flatten()
print(test_predictions)
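Printing the raw predictions does not say how good they are. One way to quantify this is to compare them with the true labels using the same quantities the model was compiled with, MSE as the loss and MAE as the metric. A minimal NumPy sketch, with hypothetical helpers `mse` and `mae` and made-up numbers in place of real predictions:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up prices in thousands of dollars, for illustration only.
labels = [20.0, 30.0, 25.0]
predictions = [22.0, 27.0, 25.0]

print(mse(labels, predictions), mae(labels, predictions))
```

With the arrays from this article, the equivalent call would be `mae(test_labels, test_predictions)`. Because the labels are in thousands of dollars, the MAE reads directly as an average price error in the same unit.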

The complete code is as follows:

# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
# Other libraries
import numpy as np
import matplotlib.pyplot as plt
# Check the TensorFlow version
print(tf.__version__)
boston_housing = keras.datasets.boston_housing
(train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()
# Shuffle the training set
order = np.argsort(np.random.random(train_labels.shape))
train_data = train_data[order]
train_labels = train_labels[order]
import pandas as pd
column_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD','TAX', 'PTRATIO', 'B', 'LSTAT']
df = pd.DataFrame(train_data, columns=column_names)
df.head()
# z-score standardization
mean = train_data.mean(axis=0)
std = train_data.std(axis=0)
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std
# Function that builds the model
def build_model():
  model = keras.Sequential([
    keras.layers.Dense(64, activation=tf.nn.relu, 
                       input_shape=(train_data.shape[1],)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(1)
  ])
 
  # tf.train.RMSPropOptimizer exists only in TensorFlow 1.x; the tf.keras optimizer is used instead
  optimizer = keras.optimizers.RMSprop(0.001)
 
  model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae']) # mean absolute error
  return model
# Build the model
model = build_model()
# Print the model structure
model.summary()
# Callback: print one dot per epoch and start a new line every 100 epochs
class PrintDot(keras.callbacks.Callback):
  def on_epoch_end(self,epoch,logs):
    if epoch % 100 == 0: print('')
    print('.', end='') 
EPOCHS = 500 
# Train the model
history = model.fit(train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=1, # hold out 20% of the training set for validation; verbose controls progress output
                    callbacks=[PrintDot()])
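# Plot the training and validation mean absolute error per epoch
def plot_history(history):
  # The metric key is 'mean_absolute_error' or 'mae', depending on the TensorFlow version
  mae_key = 'mae' if 'mae' in history.history else 'mean_absolute_error'
  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Abs Error [1000$]')
  plt.plot(history.epoch, np.array(history.history[mae_key]), 
           label='Train MAE')
  plt.plot(history.epoch, np.array(history.history['val_' + mae_key]),
           label = 'Val MAE')
  plt.legend()
  plt.ylim([0,5])
plot_history(history)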
model = build_model()
# Stopping condition: stop training once the validation loss has not improved for 20 consecutive epochs
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)
history = model.fit(train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=0,
                    callbacks=[early_stop, PrintDot()])
 
plot_history(history)
test_predictions = model.predict(test_data).flatten()
print(test_predictions)

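The official summary for regression problems:
1. Mean squared error (MSE) is a common loss function for regression problems.
2. Regression is evaluated with different metrics than classification: mean absolute error (MAE) is a common choice, rather than accuracy.
3. Normalizing the input data is essential, especially when features have different ranges.
4. When there is little training data, a smaller model is a better fit, because it is less prone to overfitting.
5. Early stopping is a useful technique for preventing overfitting.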
