TF Multivariate linear regression --- Boston housing problem

Multivariate linear regression

Compared with the previous linear regression example: the model is still linear, but the input features are now multidimensional, so the model must realize a multidimensional linear mapping from the features to the predicted value. This article takes Boston housing prices as an example to build, train, and run predictions with a multidimensional linear model.
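In matrix form, the model to be trained is simply pred = X·w + b, where X is a [number of samples, 12] feature matrix, w is a [12, 1] weight vector, and b is a scalar bias, so pred holds one predicted price per sample.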

The data

  1. There are 506 samples in total; each sample consists of 12 feature values and 1 label value.
  2. The data is stored in CSV format: the file has 507 rows and 13 columns. The first row holds the column names; the first 12 columns are the features, and the last column is the label value.

Reading the data:

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.utils import shuffle

df = pd.read_csv("Boston.csv")
print(df.describe())
print(df)

The code above loads the CSV data. It relies mainly on pandas, an open-source third-party Python library that can load data in many formats, CSV included, and that indexes the columns automatically when a CSV is loaded. The second-to-last line, print(df.describe()), prints summary statistics for each column of the CSV file, including the minimum, maximum, mean, and so on; the last line prints all the data in table form. The pandas DataFrame is later converted to NumPy format for further processing, which is why the numpy module is also imported.
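As a small illustration of the column handling mentioned above (the column name 'MEDV' below is hypothetical; use whatever names the header row of Boston.csv actually contains):

print(df.columns)   # the column names taken from the CSV header row
print(df.head())    # the first five rows, shown as a table
# print(df['MEDV'])  # a single column can be selected by its header name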

Data preparation

# Convert the pandas DataFrame to NumPy format: the data separation
# and normalization that follow are based on NumPy
df = df.values
df = np.array(df)
print(df)  # print all the data

# Normalize the data: there are 13 columns, the first 12 are features
# and the last one is the label
# Normalization compresses each feature's value range, which helps
# the loss converge
# Note: normalize only the feature columns, never the label values
for i in range(12):
    df[:, i] = df[:, i]/(df[:, i].max()-df[:, i].min())

# Separate the CSV data
x_data = df[:, :12]  # take the first 12 columns (the features)
y_data = df[:, 12]  # take the 13th column (the labels)

After the CSV data is read, it is first converted to NumPy format so it can be processed as a matrix. Because each feature's actual value range influences the final result differently, the data is normalized to compress those ranges, which allows fast and effective convergence and makes training well behaved. Note, however, that normalization is applied to the input values only; the label values are not normalized.
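For reference, the more common form of min-max normalization also subtracts each column's minimum, mapping every feature into [0, 1]; a minimal sketch of that variant:

for i in range(12):
    df[:, i] = (df[:, i] - df[:, i].min()) / (df[:, i].max() - df[:, i].min())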

Building the model

# Build the model
# Here x describes one sample, which has 12 features; to support any
# batch size, the number of samples is left unspecified: None can be 1
# or any larger number
# The same applies to y; each label has a single dimension
x = tf.placeholder(tf.float32, [None, 12], name='X')
y = tf.placeholder(tf.float32, [None, 1], name='Y')

with tf.name_scope("Model"):  # group the weights, bias, and forward computation in one name scope
    w = tf.Variable(tf.random_normal([12, 1], stddev=0.01), name='W')
    b = tf.Variable(1.0, name='b')


    def model(x, w, b):  # define the forward-computation model
        return tf.matmul(x, w) + b


    preb = model(x, w, b)  # create the forward-computation operation

The code above builds the multidimensional linear model and creates the corresponding forward-computation operation. It uses a name scope, which is somewhat similar to a namespace in C++; the difference is that here it not only makes the code more readable but also groups the enclosed operations into a subgraph, so the computation graph itself becomes easier to read. Also note the None in the placeholders: the number of input samples is not fixed, which plays an important role in batch training.
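To see why the shapes fit together, here is a small NumPy check of the same forward computation (an illustration only, separate from the TensorFlow graph; xs, w0, and b0 are made-up stand-ins):

xs = np.random.rand(5, 12)   # a batch of 5 samples with 12 features each
w0 = np.random.rand(12, 1)   # the weight column vector
b0 = 1.0                     # the scalar bias, broadcast across the batch
print((xs @ w0 + b0).shape)  # (5, 1): one prediction per sample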

Preparation before training

# Set the hyperparameters
train_epochs = 200
learning_rate = 0.01
logdir = 'E:/log'

with tf.name_scope("Loss_Function"):  # wrap the loss function in this name scope
    loss_function = tf.reduce_mean(tf.pow(y-preb, 2))

# Optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss_function)
# Record the loss on the graph
sum_loss_op = tf.summary.scalar("loss", loss_function)
merged = tf.summary.merge_all()

Before training, the hyperparameters must be set; the model here is simple, so they include only the number of training epochs and the learning rate. The loss function and the optimizer for the model are then created. To display the loss value on the graph, sum_loss_op is defined: tf.summary.scalar displays scalar values in TensorBoard. Note that the merge operation is needed for the display to work; in particular, when several values are to be shown in TensorBoard, this step is required.
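As a sketch of why merge_all matters: when more than one summary is defined, running the merged op collects all of them in a single fetch. The histogram summary below is only for illustration and is not part of the original script.

tf.summary.histogram("weights", w)  # a second, illustrative summary
merged = tf.summary.merge_all()     # bundles every summary op defined so far
# Inside the training loop one would then run:
#   summary_str = sess.run(merged, feed_dict={x: xs, y: ys})
#   writer.add_summary(summary_str, epoch)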

Training

# Start the session
sess = tf.Session()
sess.run(tf.global_variables_initializer())
writer = tf.summary.FileWriter(logdir, sess.graph)  # record the log
print('***********TRAIN***********')
loss_list = []  # collection of loss statistics
for epoch in range(train_epochs):  # train epoch by epoch
    loss_sum = 0.0  # accumulate the loss over this epoch
    for xs, ys in zip(x_data, y_data):  # draw matching samples from the training data and labels
        xs = xs.reshape(1, 12)  # before feeding, make sure the shape matches the placeholder
        ys = ys.reshape(1, 1)
        _, summary_str, loss = sess.run([optimizer, sum_loss_op, loss_function], feed_dict={x: xs, y: ys})  # three operations: optimize, record the loss, compute the loss
        loss_sum = loss_sum+loss  # accumulate the loss over this epoch
        writer.add_summary(summary_str, epoch)  # plot the recorded loss against the epoch number
    x_data, y_data = shuffle(x_data, y_data)  # reshuffle the dataset after each epoch

    btemp = b.eval(session=sess)  # get the value of b from this epoch
    wtemp = w.eval(session=sess)  # get the value of w from this epoch
    loss_average = loss_sum/len(y_data)  # compute this epoch's average loss
    loss_list.append(loss_average)  # record this epoch's average loss
    # Print this epoch's training result
    print("epoch:", epoch+1, "loss: ", loss_average, "b: ", btemp, "w: ", wtemp)

The code above implements the training of the model. Before feeding with feed_dict, make sure the dimensions of the fed values match those the placeholders specify. After each epoch the training data is shuffled, to prevent the model from memorizing the answers by position, in other words, to keep it from learning the sample order as a feature.
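Because the placeholders use None for the batch dimension, the same graph also supports mini-batch training; a minimal sketch of that variant (batch_size is a hypothetical hyperparameter, not part of the original script):

batch_size = 32  # hypothetical hyperparameter, not in the original script
for epoch in range(train_epochs):
    x_data, y_data = shuffle(x_data, y_data)  # reshuffle once per epoch
    for start in range(0, len(y_data), batch_size):
        xs = x_data[start:start+batch_size].reshape(-1, 12)
        ys = y_data[start:start+batch_size].reshape(-1, 1)
        sess.run(optimizer, feed_dict={x: xs, y: ys})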

Model application

# Test the model
n = np.random.randint(506)
print(n)
x_test = x_data[n]
target = y_data[n]
x_test = x_test.reshape(1, 12)
predict = sess.run(preb, feed_dict={x: x_test})

print("prediction: ", predict, "Target: ", target)

writer.close()

The code above uses the model for a quick check. As this is an introductory tutorial, the dataset is not split into training, validation, and test sets; purely for experimental purposes, a random number generator picks an index, the corresponding sample is fed to the model, and the predicted value is compared against the actual label value.
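For a more rigorous check than a single random sample, the data could be split before training; a minimal sketch using sklearn's train_test_split (not part of the original code):

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.2, random_state=42)
# train on (x_train, y_train), then evaluate the loss on (x_test, y_test)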


