[PyTorch Deep Learning] Using graph convolutional neural network and taxi trajectory time series to predict traffic speed (with source code)


1: Data preparation

The dataset is the Shenzhen taxi trajectory dataset covering January 1 to January 31, 2015. The experimental data consists of two parts. The first is a 156×156 adjacency matrix that describes the spatial relationships between roads: each row represents a road, and each value represents the connectivity between two roads. The second is the feature matrix, which describes how the speed on each road changes over time. The traffic speed of each road is aggregated every 15 minutes, which gives 31 × 96 = 2976 time steps, so the raw matrix is 2976×156: each row is a 15-minute time period and each column is a road (the code transposes it so that each row is a road). The speeds of the past 10 time steps are used to predict the speed at the next time step. 80% of the data is used as the training set and 20% as the test set, and 10% of the training set is held out as the validation set. The goal is to predict traffic speed in real time.
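A minimal NumPy sketch of the sliding-window sampling and chronological split described above, run on random toy data in place of the real speed file. The function names make_windows and chronological_split are hypothetical, and the percentage-based test split is an assumption that follows the text; the author's Traffic_speed class below instead holds out the last forecast_day_number days as the test set.

```python
import numpy as np

def make_windows(speed, time_lag=10, pre_len=1):
    """Build (X, Y) samples from a (num_roads, num_steps) speed matrix.

    X[k] holds the time_lag steps before position t; Y[k] holds the next pre_len steps.
    """
    num_roads, num_steps = speed.shape
    xs, ys = [], []
    for t in range(time_lag, num_steps - pre_len + 1):
        xs.append(speed[:, t - time_lag:t])
        ys.append(speed[:, t:t + pre_len])
    return np.stack(xs), np.stack(ys)

def chronological_split(X, Y, test_rate=0.2, val_rate=0.1):
    """Split samples in time order (no shuffling): train | val | test."""
    n = len(X)
    n_test = int(n * test_rate)
    n_trainval = n - n_test
    n_val = int(n_trainval * val_rate)   # validation = last part of the training period
    n_train = n_trainval - n_val
    train = (X[:n_train], Y[:n_train])
    val = (X[n_train:n_trainval], Y[n_train:n_trainval])
    test = (X[n_trainval:], Y[n_trainval:])
    return train, val, test

# Toy data laid out like the transposed speed matrix: 156 roads x 2976 time steps
rng = np.random.default_rng(0)
speed = rng.uniform(0, 60, size=(156, 2976))
X, Y = make_windows(speed, time_lag=10, pre_len=1)
print(X.shape, Y.shape)  # (2966, 156, 10) (2966, 156, 1)
```

Splitting in time order rather than at random keeps the test windows strictly after the training windows, so the model is never evaluated on periods it has already seen.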

The meaning of the parameters of the custom function is shown in the table below

Build a Dataset class that splits the data into training, validation, and test sets.
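The Dataset class also min-max normalizes all speeds with the global minimum and maximum. A small sketch of that scaling and its inverse; the helper names are hypothetical, and the 5-decimal rounding used in the class is omitted:

```python
import numpy as np

def minmax_normalize(data):
    """Scale all values to [0, 1] using the global min and max."""
    d_min, d_max = float(np.min(data)), float(np.max(data))
    return (data - d_min) / (d_max - d_min), d_min, d_max

def minmax_denormalize(norm, d_min, d_max):
    """Invert the scaling: x = x_norm * (max - min) + min."""
    return norm * (d_max - d_min) + d_min

speeds = np.array([[10.0, 25.0, 40.0],
                   [55.0, 70.0, 20.0]])
norm, lo, hi = minmax_normalize(speeds)
restored = minmax_denormalize(norm, lo, hi)
print(norm.min(), norm.max())          # 0.0 1.0
print(np.allclose(restored, speeds))   # True
```

Mapping predictions back by multiplying by the maximum alone is exact only when the minimum speed is 0; the general inverse needs both the (max - min) scale and the + min offset.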

In PyTorch, DataLoader wraps a Dataset and serves its samples in mini-batches, so the training, validation, and test loaders must be built before the model can be trained.
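A minimal example of batching with DataLoader, using TensorDataset and random tensors shaped like the samples built by the dataset class (num_samples × 156 roads × 10 time steps); the data is made up for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random tensors shaped like the dataset samples:
# X: (num_samples, num_roads, time_lag), Y: (num_samples, num_roads, pre_len)
X = torch.rand(100, 156, 10)
Y = torch.rand(100, 156, 1)

# shuffle=False keeps the samples in time order, as in get_speed_dataloader
loader = DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=False)

batch_sizes = []
for batch_X, batch_Y in loader:
    # batch_X is (batch, 156, 10) and batch_Y is (batch, 156, 1)
    batch_sizes.append(batch_X.shape[0])

print(len(loader), batch_sizes)  # 4 [32, 32, 32, 4]
```

The last batch is smaller because 100 is not a multiple of 32; passing drop_last=True would discard it instead.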

2: Model building

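To construct the GCN layer, first compute the fixed term $\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}$ in the GCN propagation rule $H^{(l+1)}=\sigma\left(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}H^{(l)}W^{(l)}\right)$, where $\hat{A}=A+I$ is the adjacency matrix with self-loops added and $\hat{D}$ is the diagonal degree matrix of $\hat{A}$. Because this term depends only on the road network, it is computed once and reused in every layer.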
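A NumPy sketch of this fixed term for a toy three-road graph, assuming the standard renormalization ($\hat{A}=A+I$ and $\hat{D}$ the degree matrix of $\hat{A}$). The author's GetLaplacian.get_normalized_adj is not shown, so its details may differ; normalized_adjacency is a hypothetical name:

```python
import numpy as np

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    degree = a_hat.sum(axis=1)                # >= 1 thanks to the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Scaling rows and columns by d_inv_sqrt equals diag(d_inv_sqrt) @ a_hat @ diag(d_inv_sqrt)
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

# Toy chain of three roads: road 0 - road 1 - road 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
A_norm = normalized_adjacency(adj)
print(np.round(A_norm, 3))
```

The self-loops guarantee every degree is at least 1, so the inverse square root never divides by zero even for isolated roads.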

When initializing a GCN layer, specify the number of input features (in_features) and the number of output features (out_features) for each node.
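A sketch of a single GCN layer parameterized by in_features and out_features, assuming the propagation rule above with the normalized adjacency precomputed and passed in. GCNLayer is a hypothetical name (the author's model/main_model.py is not shown), and the activation $\sigma$ is left to the caller:

```python
import torch
from torch import nn

class GCNLayer(nn.Module):
    """One graph convolution: H_out = A_norm @ H_in @ W + b."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(torch.empty(in_features, out_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x, adj_norm):
        # x: (batch, num_nodes, in_features); adj_norm: (num_nodes, num_nodes)
        support = x @ self.weight                # (batch, num_nodes, out_features)
        return adj_norm @ support + self.bias    # aggregate features from neighbouring roads

layer = GCNLayer(in_features=10, out_features=64)
x = torch.rand(32, 156, 10)     # 32 samples, 156 roads, 10 past time steps as features
adj_norm = torch.eye(156)       # placeholder graph, only for the shape check
out = layer(x, adj_norm)
print(out.shape)  # torch.Size([32, 156, 64])
```

torch.matmul broadcasts the (156, 156) adjacency over the batch dimension, so the same graph is applied to every sample without copying it.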

The model stacks two GCN layers followed by three fully connected layers.
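One possible way to stack two GCN layers and three fully connected layers, sketched under assumptions: the hidden size of 64, the ReLU activations, and applying the fully connected layers to each road independently are guesses, and GCNFCModel is a hypothetical name rather than the author's Model class. The input and output shapes match the test script: (batch, roads, time_lag) in, (batch, roads, pre_len) out.

```python
import torch
from torch import nn

class GCNLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x, adj_norm):
        # Transform node features with a linear layer (weight + bias),
        # then aggregate them over the normalized graph
        return adj_norm @ self.linear(x)

class GCNFCModel(nn.Module):
    """Hypothetical sketch: 2 GCN layers followed by 3 fully connected layers."""

    def __init__(self, time_lag, pre_len, hidden=64):
        super().__init__()
        self.gcn1 = GCNLayer(time_lag, hidden)
        self.gcn2 = GCNLayer(hidden, hidden)
        self.fc = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, pre_len),
        )

    def forward(self, x, adj_norm):
        # x: (batch, num_roads, time_lag) -> (batch, num_roads, pre_len)
        h = torch.relu(self.gcn1(x, adj_norm))
        h = torch.relu(self.gcn2(h, adj_norm))
        return self.fc(h)

model = GCNFCModel(time_lag=10, pre_len=1)
out = model(torch.rand(32, 156, 10), torch.eye(156))
print(out.shape)  # torch.Size([32, 156, 1])
```

The GCN layers mix information across connected roads, while the fully connected layers map each road's hidden features to its next-step speed.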

3: Model termination and evaluation

Training is terminated with early stopping: it ends once the validation loss has stopped improving.
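A minimal sketch of patience-based early stopping. The EarlyStopping class, its patience value, and the toy loss sequence are illustrative assumptions, since the author's training script is not included:

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.counter = 0        # improvement: a checkpoint would typically be saved here
        else:
            self.counter += 1
        return self.counter >= self.patience

stopper = EarlyStopping(patience=3)
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_loss)  # 5 0.35
```

With patience=3, training stops at epoch 5, three epochs after the best loss of 0.35, so the later 0.30 is never reached. Choosing the patience trades training time against the risk of stopping on a temporary plateau.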

Four metrics are used to evaluate the model: root mean square error (RMSE), coefficient of determination (R2), mean absolute error (MAE), and weighted mean absolute percentage error (WMAPE).
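Standard NumPy definitions of the four metrics, as a sketch; the author's Metrics and Metrics_1d classes in utils/metrics.py are not shown and may differ in detail. Here R2 is computed as the coefficient of determination, and WMAPE as the sum of absolute errors divided by the sum of absolute true values:

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)

def wmape(y_true, y_pred):
    # Absolute errors weighted by the magnitude of the true values
    return float(np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true)))

y_true = np.array([30.0, 40.0, 50.0, 60.0])
y_pred = np.array([32.0, 38.0, 50.0, 64.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred), wmape(y_true, y_pred))
```

Unlike plain MAPE, WMAPE does not blow up when an individual true speed is close to zero, which matters for congested or missing road segments.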

4: Model training and testing

During training, each batch is loaded, the model produces predictions, the loss between the predictions and the targets is computed, and the loss value is returned and backpropagated to update the parameters.
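A self-contained sketch of one training epoch following those steps. The random data, the toy target, and the one-layer nn.Linear model are stand-ins for the real loaders and GCN model:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data: the target is the mean speed of each road's input window
X = torch.rand(64, 156, 10)
Y = X.mean(dim=2, keepdim=True)
loader = DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=True)
model = nn.Linear(10, 1)        # maps each road's 10 past steps to 1 future step
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
mse = nn.MSELoss()

def train_one_epoch(model, loader, optimizer, loss_fn):
    model.train()
    total = 0.0
    for batch_X, batch_Y in loader:
        prediction = model(batch_X)          # forward pass
        loss = loss_fn(prediction, batch_Y)  # (input=prediction, target=ground truth)
        optimizer.zero_grad()                # clear gradients from the previous step
        loss.backward()                      # backpropagate
        optimizer.step()                     # update the parameters
        total += loss.item()
    return total / len(loader)               # average loss over the batches

losses = [train_one_epoch(model, loader, optimizer, mse) for _ in range(50)]
print(losses[0], losses[-1])
```

Returning the average batch loss per epoch gives a single number to compare against the validation loss, which is what early stopping monitors.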

For testing, first use torch.load to read the parameter dictionary (state dict) saved during training, then use model.load_state_dict to load it into the model.
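A minimal save-and-load round trip with a stand-in nn.Linear model and a temporary file (the file name is hypothetical). Passing map_location to torch.load lets a checkpoint saved on a GPU be loaded on a CPU-only machine:

```python
import os
import tempfile
import torch
from torch import nn

model = nn.Linear(10, 1)
path = os.path.join(tempfile.mkdtemp(), "model_dict_checkpoint.pth")  # hypothetical file name

# During training: save only the parameter dictionary
torch.save(model.state_dict(), path)

# During testing: rebuild the same architecture, then load the parameters into it
restored = nn.Linear(10, 1)
state_dict = torch.load(path, map_location="cpu")
restored.load_state_dict(state_dict, strict=True)   # strict=True: keys must match exactly
restored.eval()                                     # inference mode for dropout / batch norm

x = torch.rand(4, 10)
with torch.no_grad():
    same = torch.equal(model(x), restored(x))
print(same)  # True
```

Saving the state dict rather than the whole model object keeps the checkpoint independent of the class's module path, but it means the architecture must be rebuilt with the same hyperparameters before loading.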

5: Result display

The results are shown below. The model predicts traffic speed accurately most of the time, but the error is large at some local maxima and minima, so improving predictions at these extremes is the main direction for further work on the model.

6: Code

Finally, part of the code is shown below.

The project structure is as follows.

# data/datasets.py
import torch
from torch.utils.data import Dataset
import numpy as np

"""
Parameter:
time_interval, time_lag, tg_in_one_day, forecast_day_number, is_train=True, is_val=False, val_rate=0.1, pre_len
"""


class Traffic_speed(Dataset):
	def __init__(self, time_interval, time_lag, tg_in_one_day, forecast_day_number, speed_data, pre_len, is_train=True, is_val=False, val_rate=0.1):
		super().__init__()
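		# This part splits the data into training, validation, and test sets.
		# Afterwards X has shape (num, 156, 10), where 10 is the number of past time steps, and Y has shape (num, 156, 1)
		# X: the 10 time steps immediately before the predicted step
		# Y: the next time step for each of the 156 main roads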
		self.time_interval = time_interval
		self.time_lag = time_lag
		self.tg_in_one_day = tg_in_one_day
		self.forecast_day_number = forecast_day_number
		self.tg_in_one_week = self.tg_in_one_day*self.forecast_day_number
		self.speed_data = np.loadtxt(speed_data, delimiter=",").T  # transpose so that each row is a road and each column a time step
		self.max_speed = np.max(self.speed_data)
		self.min_speed = np.min(self.speed_data)
		self.is_train = is_train
		self.is_val = is_val
		self.val_rate = val_rate
		self.pre_len = pre_len

		# Min-max normalization to [0, 1] using the global min and max, rounded to 5 decimals
		self.speed_data_norm = np.round((self.speed_data - self.min_speed) / (self.max_speed - self.min_speed), 5)
		if self.is_train:
			self.start_index = self.tg_in_one_week + time_lag
			self.end_index = len(self.speed_data[0]) - self.tg_in_one_day * self.forecast_day_number - self.pre_len
		else:
			self.start_index = len(self.speed_data[0]) - self.tg_in_one_day * self.forecast_day_number
			self.end_index = len(self.speed_data[0]) - self.pre_len

		self.X = [[] for index in range(self.start_index, self.end_index)]
		self.Y = []
		self.Y_original = []
		# print(self.start_index, self.end_index)
		for index in range(self.start_index, self.end_index):
			temp = self.speed_data_norm[:, index - self.time_lag: index]  # speeds in the time_lag steps before index
			temp = temp.tolist()
			self.X[index - self.start_index] = temp
			self.Y.append(self.speed_data_norm[:, index:index + self.pre_len])
		self.X, self.Y = torch.from_numpy(np.array(self.X)), torch.from_numpy(np.array(self.Y))  # X: (num, 156, time_lag), Y: (num, 156, pre_len)

		# if val is not zero
		if self.val_rate * len(self.X) != 0:
			val_len = int(self.val_rate * len(self.X))
			train_len = len(self.X) - val_len
			if self.is_val:
				self.X = self.X[-val_len:]
				self.Y = self.Y[-val_len:]
			else:
				self.X = self.X[:train_len]
				self.Y = self.Y[:train_len]
		print("X.shape", self.X.shape, "Y.shape", self.Y.shape)

		if not self.is_train:
			for index in range(self.start_index, self.end_index):
				self.Y_original.append(self.speed_data[:, index:index + self.pre_len])  # the predicted speed before normalization
			self.Y_original = torch.from_numpy(np.array(self.Y_original))

	def get_max_min_speed(self):
		return self.max_speed, self.min_speed

	def __getitem__(self, item):
		if self.is_train:
			return self.X[item], self.Y[item]
		else:
			return self.X[item], self.Y[item], self.Y_original[item]

	def __len__(self):
		return len(self.X)

# data/get_dataloader.py
from data.datasets import Traffic_speed
from torch.utils.data import DataLoader


speed_data = "./data/sz_speed-论文数据.csv"

def get_speed_dataloader(time_interval=15, time_lag=5, tg_in_one_day=72, forecast_day_number=5, pre_len=1, batch_size=32):
	# train speed data loader
	print("train speed")
	speed_train = Traffic_speed(time_interval=time_interval, time_lag=time_lag, tg_in_one_day=tg_in_one_day, forecast_day_number=forecast_day_number,
								pre_len=pre_len, speed_data=speed_data, is_train=True, is_val=False, val_rate=0.1)
	max_speed, min_speed = speed_train.get_max_min_speed()
	speed_data_loader_train = DataLoader(speed_train, batch_size=batch_size, shuffle=False)

	# validation speed data loader
	print("val speed")
	speed_val = Traffic_speed(time_interval=time_interval, time_lag=time_lag, tg_in_one_day=tg_in_one_day, forecast_day_number=forecast_day_number,
								pre_len=pre_len, speed_data=speed_data, is_train=True, is_val=True, val_rate=0.1)
	speed_data_loader_val = DataLoader(speed_val, batch_size=batch_size, shuffle=False)

	# test speed data loader
	print("test speed")
	speed_test = Traffic_speed(time_interval=time_interval, time_lag=time_lag, tg_in_one_day=tg_in_one_day, forecast_day_number=forecast_day_number,
								pre_len=pre_len, speed_data=speed_data, is_train=False, is_val=False, val_rate=0)
	speed_data_loader_test = DataLoader(speed_test, batch_size=batch_size, shuffle=False)

	return speed_data_loader_train, speed_data_loader_val, speed_data_loader_test, max_speed, min_speed

# Test script: load the trained model and evaluate it on the test set
import numpy as np
import os, time, torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
from utils.utils import GetLaplacian
from model.main_model import Model
import matplotlib.pyplot as plt
from utils.metrics import Metrics, Metrics_1d
from data.get_dataloader import get_speed_dataloader

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

epoch_num = 1000
lr = 0.001
time_interval = 15
time_lag = 10
tg_in_one_day = 72
forecast_day_number = 15
pre_len = 1
batch_size = 32
station_num = 156
# model_type = 'ours'
# TIMESTAMP = str(time.strftime("%Y_%m_%d_%H_%M_%S"))
# save_dir = './save_model/' + model_type + '_' + TIMESTAMP
# if not os.path.exists(save_dir):
# 	os.makedirs(save_dir)

speed_data_loader_train, speed_data_loader_val, speed_data_loader_test, max_speed, min_speed = \
	get_speed_dataloader(time_interval=time_interval, time_lag=time_lag, tg_in_one_day=tg_in_one_day, forecast_day_number=forecast_day_number, pre_len=pre_len, batch_size=batch_size)

# get normalized adj
adjacency = np.loadtxt('./data/sz_adj1.csv', delimiter=",")
adjacency = torch.tensor(GetLaplacian(adjacency).get_normalized_adj(station_num)).type(torch.float32).to(device)

global_start_time = time.time()
writer = SummaryWriter()

model = Model(time_lag, pre_len, station_num, device)

model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
mse = torch.nn.MSELoss().to(device)

path = './save_model/ours_2021_08_25_08_40_23/model_dict_checkpoint_2929_0.00017515.pth'
checkpoint = torch.load(path, map_location=device)  # map_location lets a GPU checkpoint load on a CPU-only machine
model.load_state_dict(checkpoint, strict=True)

# test
result = []
result_original = []
if not os.path.exists('result/prediction'):
	os.makedirs('result/prediction/')
if not os.path.exists('result/original'):
	os.makedirs('result/original')
with torch.no_grad():
	model.eval()
	test_loss = 0
	for i_batch, (test_speed_X, test_speed_Y, test_speed_Y_original) in enumerate(speed_data_loader_test):
		test_speed_X, test_speed_Y = test_speed_X.type(torch.float32).to(device), test_speed_Y.type(torch.float32).to(device)
		prediction = model(test_speed_X, adjacency)
		loss = mse(input=prediction, target=test_speed_Y)
		test_loss += loss.item()

		# evaluate on the original scale: collect predictions of shape (batch, 156, pre_len)
		# and invert the min-max normalization, x = x_norm * (max - min) + min
		clone_prediction = prediction.cpu().numpy() * (max_speed - min_speed) + min_speed
		for i in range(clone_prediction.shape[0]):
			result.append(clone_prediction[i])

		# collect the ground-truth speeds before normalization
		test_speed_Y_original = test_speed_Y_original.cpu().numpy()
		for i in range(test_speed_Y_original.shape[0]):
			result_original.append(test_speed_Y_original[i])

	print(np.array(result).shape, np.array(result_original).shape)  # (num, 156, 1)
	# truncate to integers and set negative values to 0 (built-in int replaces np.int, which newer NumPy no longer provides)
	result = np.array(result).astype(int)
	result[result < 0] = 0
	result_original = np.array(result_original).astype(int)
	result_original[result_original < 0] = 0

	# pick several roads for plotting: x[k] holds predictions, y[k] holds ground truth
	road_ids = [4, 18, 30, 60, 94]
	x = [[] for _ in road_ids]
	y = [[] for _ in road_ids]
	for i in range(result.shape[0]):
		for k, road in enumerate(road_ids):
			x[k].append(result[i][road][0])
			y[k].append(result_original[i][road][0])
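	# reorder (num, station_num, pre_len) -> (station_num, num * pre_len) so that each row is one road's series
	result = result.transpose(1, 0, 2).reshape(station_num, -1)
	result_original = result_original.transpose(1, 0, 2).reshape(station_num, -1)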

	RMSE, R2, MAE, WMAPE = Metrics(result_original, result).evaluate_performance()

	avg_test_loss = test_loss / len(speed_data_loader_test)
	print('test Loss:', avg_test_loss)

	RMSE_y0, R2_y0, MAE_y0, WMAPE_y0 = Metrics_1d(y[0], x[0]).evaluate_performance()
	RMSE_y1, R2_y1, MAE_y1, WMAPE_y1 = Metrics_1d(y[1], x[1]).evaluate_performance()
	RMSE_y2, R2_y2, MAE_y2, WMAPE_y2 = Metrics_1d(y[2], x[2]).evaluate_performance()
	RMSE_y3, R2_y3, MAE_y3, WMAPE_y3 = Metrics_1d(y[3], x[3]).evaluate_performance()
	RMSE_y4, R2_y4, MAE_y4, WMAPE_y4 = Metrics_1d(y[4], x[4]).evaluate_performance()

# L3, = plt.plot(x[0], color="r")
# L4, = plt.plot(y[0], color="b")
# plt.legend([L3, L4], ["L3-prediction", "L4-true"], loc='best')
# plt.show()

ALL = [RMSE, MAE, WMAPE]
y0_ALL = [RMSE_y0, MAE_y0, WMAPE_y0]
y1_ALL = [RMSE_y1, MAE_y1, WMAPE_y1]
y2_ALL = [RMSE_y2, MAE_y2, WMAPE_y2]
y3_ALL = [RMSE_y3, MAE_y3, WMAPE_y3]
y4_ALL = [RMSE_y4, MAE_y4, WMAPE_y4]

np.savetxt('result/lr_' + str(lr) + '_batch_size_' + str(batch_size) + '_ALL.txt', ALL)
np.savetxt('result/lr_' + str(lr) + '_batch_size_' + str(batch_size) + '_y0_ALL.txt', y0_ALL)
np.savetxt('result/lr_' + str(lr) + '_batch_size_' + str(batch_size) + '_y1_ALL.txt', y1_ALL)
np.savetxt('result/lr_' + str(lr) + '_batch_size_' + str(batch_size) + '_y2_ALL.txt', y2_ALL)
np.savetxt('result/lr_' + str(lr) + '_batch_size_' + str(batch_size) + '_y3_ALL.txt', y3_ALL)
np.savetxt('result/lr_' + str(lr) + '_batch_size_' + str(batch_size) + '_y4_ALL.txt', y4_ALL)
np.savetxt('result/X_original.txt', y)  # y holds the ground-truth speeds
np.savetxt('result/Y_prediction.txt', x)  # x holds the predicted speeds

print("ALL:", ALL)
print("y0_ALL:", y0_ALL)
print("y1_ALL:", y1_ALL)
print("y2_ALL:", y2_ALL)
print("y3_ALL:", y3_ALL)
print("y4_ALL:", y4_ALL)

print("end")

x = x[1]
y = y[1]
plt.xlabel("Time granularity=15min")
plt.ylabel("Speed")
L1, = plt.plot(x, color="r")
L2, = plt.plot(y, color="y")
plt.legend([L1, L2], ["pre", "actual"], loc='best')
plt.show()

Origin blog.csdn.net/jiebaoshayebuhui/article/details/130448954