TCN: From "Abaaaba" to "Balabala"
- The concept of TCN (why it exists and what problems it solves)
- The origin of TCN
- Introduction to the principle of TCN
- On the code!
1. What is a TCN (Temporal Convolutional Network) and what can it do?
Time series forecasting, probabilistic forecasting, time prediction, traffic forecasting.
2. The origin of TCN
ps: Before diving into TCN, you should have a basic understanding of CNNs and RNNs.
- Problems it solves:
TCN is a network architecture for processing time series data. Under certain conditions it performs better than the traditional architectures (RNNs, CNNs, etc.).
3. Introduction to the principle of TCN
[Figure: TCN network structure]
1. The TCN network structure is shown in the figure above. We'll look at it in two parts, the left and the right, starting with the left:
Dilated Causal Conv ---> WeightNorm ---> ReLU ---> Dropout ---> Dilated Causal Conv ---> WeightNorm ---> ReLU ---> Dropout
Obviously, this is just the same sub-block repeated:
(Dilated Causal Conv ---> WeightNorm ---> ReLU ---> Dropout) * 2
OK, let's walk through these four components one by one; if you already know some of them, feel free to skip ahead.
1、Dilated Causal Conv
Dilated causal convolution (literally "expansion causal convolution").
Dilated causal convolution can be broken into three parts: dilation, causality, and convolution.
Convolution refers to the familiar CNN operation: a sliding computation performed by the convolution kernel over the data;
Dilation means the convolution samples its input at intervals. It looks similar to stride in a CNN, but there is a clear difference: stride skips output positions, while dilation inserts gaps between the kernel taps, enlarging the receptive field without shrinking the output.
[Figure: illustration of dilated convolution]
Causality means that the value at time t in layer i depends only on the values at time t and earlier in layer i-1. A causal convolution never reads future data during training, making it a strictly time-constrained model (a runnable sketch follows the illustration below).
[Figure: illustration of causal convolution, without dilation]
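Here is a minimal runnable sketch (mine, not from the original figure) of how a dilated causal convolution is usually built in Paddle: pad both sides by (k-1)*d, then trim the right side so no output position ever sees the future. The full code further down does exactly this with its Chomp1d layer.

import paddle
import paddle.nn as nn

k, d = 3, 2                     # example kernel size and dilation
pad = (k - 1) * d               # padding needed to keep the convolution causal
conv = nn.Conv1D(1, 1, k, padding=pad, dilation=d)

x = paddle.randn([8, 1, 32])    # 8 univariate series of length 32: [batch, channels, time]
y = conv(x)[:, :, :-pad]        # "chomp" the right side -> output keeps the input's length
print(y.shape)                  # [8, 1, 32]; y[..., t] depends only on x[..., :t + 1]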
2、WeightNorm
weight normalization
It normalizes the layer's weight values. If you want to study the normalization procedure and formula in detail, the original weight normalization paper is worth a read; a quick sketch of the reparameterization follows the list below.
Advantages:
1. Low time overhead and fast to compute!
2. Introduces less noise (compared with batch normalization)
3. WeightNorm accelerates training by reparameterizing the weights of the deep network, without introducing any dependence on the minibatch
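For reference, the reparameterization behind WeightNorm fits in one line: each weight vector w is rewritten as w = g * v / ||v||, where the scalar g carries the magnitude and the vector v the direction, and gradient descent optimizes g and v instead of w. In Paddle this is just a wrapper (the same one used verbatim in the code below):

import paddle.nn as nn
from paddle.nn.utils import weight_norm

# store the weight as magnitude g and direction v, with w = g * v / ||v||
conv = weight_norm(nn.Conv1D(16, 32, kernel_size=3))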
3、ReLU()
A common activation function.
Advantages:
1. Speeds up network training
2. Adds nonlinearity to the network, improving the model's expressive power
3. Mitigates the vanishing-gradient problem
4. Makes the network sparse, etc.
Formula: ReLU(x) = max(0, x)
[Figure: plot of the ReLU function]
4、Dropout()
Dropout means that during training, neural network units are temporarily dropped from the network with a certain probability.
Advantages: prevents overfitting and speeds up computation (a tiny behavioral sketch follows).
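A tiny behavioral sketch (example values; Paddle's nn.Dropout defaults to "upscale_in_train" mode): during training each activation is zeroed with probability p and the survivors are scaled by 1 / (1 - p); at inference Dropout is the identity.

import paddle
import paddle.nn as nn

drop = nn.Dropout(p=0.2)
x = paddle.ones([6])
drop.train()
print(drop(x))   # some entries zeroed, survivors scaled to 1 / (1 - 0.2) = 1.25
drop.eval()
print(drop(x))   # identity at inference time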
2. Next, the right side of the figure: the residual connection.
The right branch is a 1*1 convolution block. It lets the network pass information across layers and, by matching the channel dimensions, ensures the input and output can be added together (see the sketch below).
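A minimal sketch of that residual branch (illustrative shapes, not from the article): when the channel counts differ, the 1*1 convolution projects the input so the element-wise addition is valid.

import paddle
import paddle.nn as nn

in_ch, out_ch = 1, 16
x = paddle.randn([8, in_ch, 32])                  # [batch, channels, time]
branch = nn.Conv1D(in_ch, out_ch, 3, padding=1)   # stand-in for the left branch of the block
downsample = nn.Conv1D(in_ch, out_ch, 1) if in_ch != out_ch else None
res = x if downsample is None else downsample(x)  # 1*1 conv matches the channel counts
out = nn.ReLU()(branch(x) + res)                  # add, then a final activation
print(out.shape)                                  # [8, 16, 32]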
3. Advantages of TCN:
1. Parallelism
2. Largely avoids vanishing and exploding gradients
3. A large receptive field, so more history is learned from (see the note after this list)
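To make advantage 3 concrete: assuming the usual dilation schedule d = 2^i and two dilated convolutions per residual block (which is what the code below uses), a TCN with kernel size k and L blocks has a receptive field of

R = 1 + 2 * (k - 1) * (2^L - 1)

since each convolution with dilation d widens the field by (k - 1) * d. For example, with k = 3 and L = 4 the network sees R = 1 + 2 * 2 * 15 = 61 time steps.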
4. Coding it from scratch
import os
import sys
import paddle
import paddle.nn as nn
import numpy as np
import pandas as pd
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc
import paddle.nn.functional as F
from paddle.nn.utils import weight_norm
from sklearn.preprocessing import MinMaxScaler
from pandas.plotting import register_matplotlib_converters
class Chomp1d(nn.Layer):
def __init__(self, chomp_size):
super(Chomp1d, self).__init__()
self.chomp_size = chomp_size
def forward(self, x):
return x[:, :, :-self.chomp_size]
class TemporalBlock(nn.Layer):
def __init__(self,
n_inputs,
n_outputs,
kernel_size,
stride,
dilation,
padding,
dropout=0.2):
super(TemporalBlock, self).__init__()
self.conv1 = weight_norm(
nn.Conv1D(
n_inputs,
n_outputs,
kernel_size,
stride=stride,
padding=padding,
dilation=dilation))
# Chomp1d is used to make sure the network is causal.
# We pad by (k-1)*d on the two sides of the input for convolution,
# and then use Chomp1d to remove the (k-1)*d output elements on the right.
self.chomp1 = Chomp1d(padding)
self.relu1 = nn.ReLU()
self.dropout1 = nn.Dropout(dropout)
self.conv2 = weight_norm(
nn.Conv1D(
n_outputs,
n_outputs,
kernel_size,
stride=stride,
padding=padding,
dilation=dilation))
self.chomp2 = Chomp1d(padding)
self.relu2 = nn.ReLU()
self.dropout2 = nn.Dropout(dropout)
self.net = nn.Sequential(self.conv1, self.chomp1, self.relu1,
self.dropout1, self.conv2, self.chomp2,
self.relu2, self.dropout2)
self.downsample = nn.Conv1D(n_inputs, n_outputs,
1) if n_inputs != n_outputs else None
self.relu = nn.ReLU()
self.init_weights()
def init_weights(self):
self.conv1.weight.set_value(
paddle.tensor.normal(0.0, 0.01, self.conv1.weight.shape))
self.conv2.weight.set_value(
paddle.tensor.normal(0.0, 0.01, self.conv2.weight.shape))
if self.downsample is not None:
self.downsample.weight.set_value(
paddle.tensor.normal(0.0, 0.01, self.downsample.weight.shape))
def forward(self, x):
out = self.net(x)
        res = x if self.downsample is None else self.downsample(x)  # match the input's channels to the output's
return self.relu(out + res)
class TCNEncoder(nn.Layer):
def __init__(self, input_size, num_channels, kernel_size=2, dropout=0.2):
        # input_size: expected number of input features
        # num_channels: number of channels in each TemporalBlock
        # kernel_size: convolution kernel size
super(TCNEncoder, self).__init__()
self._input_size = input_size
self._output_dim = num_channels[-1]
layers = nn.LayerList()
num_levels = len(num_channels)
for i in range(num_levels):
dilation_size = 2 ** i
in_channels = input_size if i == 0 else num_channels[i - 1]
out_channels = num_channels[i]
layers.append(
TemporalBlock(
in_channels,
out_channels,
kernel_size,
stride=1,
dilation=dilation_size,
padding=(kernel_size - 1) * dilation_size,
dropout=dropout))
self.network = nn.Sequential(*layers)
def get_input_dim(self):
return self._input_size
def get_output_dim(self):
return self._output_dim
    def forward(self, inputs):
        # Conv1D expects [batch, channels, time], so transpose the [batch, time, features] input
        inputs_t = inputs.transpose([0, 2, 1])
        # transpose the result to [time, batch, channels] and keep the last time step as the encoding
        output = self.network(inputs_t).transpose([2, 0, 1])[-1]
        return output
class TimeSeriesNetwork(nn.Layer):
def __init__(self, input_size, next_k=1, num_channels=[256]):
super(TimeSeriesNetwork, self).__init__()
self.last_num_channel = num_channels[-1]
self.tcn = TCNEncoder(
input_size=input_size,
num_channels=num_channels,
kernel_size=3,
dropout=0.2
)
self.linear = nn.Linear(in_features=self.last_num_channel, out_features=next_k)
def forward(self, x):
tcn_out = self.tcn(x)
y_pred = self.linear(tcn_out)
return y_pred
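# Quick shape check (illustrative, safe to delete): with input_size=1 the model maps a
# window of SEQ_LEN - 1 = 9 past values to a single prediction:
#   net = TimeSeriesNetwork(input_size=1)
#   net(paddle.randn([4, 9, 1])).shape   # [batch, time, features] -> [4, 1]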
'''
I tried hard to cast myself as the lead in a tragedy,
to push all the blame onto you,
to make you the wicked witch,
utterly deranged.
But I am just an ordinary person,
with sorrows and joys,
with wrongs and rights.
That we ended up where we are today
is on both of us,
and even now I don't feel that I have lost you.
Tell me, have I lost you?
'''
def config_mtp():
sns.set(style='whitegrid', palette='muted', font_scale=1.2)
HAPPY_COLORS_PALETTE = ["#01BEFE", "#FFDD00", "#FF7D00", "#FF006D", "#93D30C", "#8F00FF"]
sns.set_palette(sns.color_palette(HAPPY_COLORS_PALETTE))
rcParams['figure.figsize'] = 14, 10
register_matplotlib_converters()
def read_data():
df_all = pd.read_csv('./data/time_series_covid19_confirmed_global.csv')
# print(df_all.head())
    # We will forecast worldwide case counts, so we don't need each country's latitude/longitude, etc.; we only need the global case count on each date.
df = df_all.iloc[:, 4:]
daily_cases = df.sum(axis=0)
daily_cases.index = pd.to_datetime(daily_cases.index)
# print(daily_cases.head())
plt.figure(figsize=(5, 5))
plt.plot(daily_cases)
plt.title("Cumulative daily cases")
# plt.show()
    # To make the sample time series more stationary, take the first-order difference
daily_cases = daily_cases.diff().fillna(daily_cases[0]).astype(np.int64)
# print(daily_cases.head())
plt.figure(figsize=(5, 5))
plt.plot(daily_cases)
plt.title("Daily cases")
plt.xticks(rotation=60)
plt.show()
return daily_cases
def create_sequences(data, seq_length):
    # Slide a window over the data: the first seq_length - 1 points of each window
    # form the input, and the window's last point is the label.
xs = []
ys = []
for i in range(len(data) - seq_length + 1):
x = data[i:i + seq_length - 1]
y = data[i + seq_length - 1]
xs.append(x)
ys.append(y)
return np.array(xs), np.array(ys)
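# Worked example (illustrative): with seq_length=4, data=[1, 2, 3, 4, 5] yields
#   xs = [[1, 2, 3], [2, 3, 4]] and ys = [4, 5],
# i.e. each window's first seq_length - 1 values predict its last value.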
def preprocess_data(daily_cases):
    TEST_DATA_SIZE, SEQ_LEN = 30, 10
    TEST_DATA_SIZE = int(TEST_DATA_SIZE / 100 * len(daily_cases))
    # hold out the last 30% of the data as the test set for prediction
train_data = daily_cases[:-TEST_DATA_SIZE]
test_data = daily_cases[-TEST_DATA_SIZE:]
print("The number of the samples in train set is : %i" % train_data.shape[0])
print(train_data.shape, test_data.shape)
    # To improve convergence speed and performance, normalize the data with scikit-learn.
scaler = MinMaxScaler()
train_data = scaler.fit_transform(np.expand_dims(train_data, axis=1)).astype('float32')
test_data = scaler.transform(np.expand_dims(test_data, axis=1)).astype('float32')
    # Build the time-series samples.
    # The previous days' case counts predict the current day's count. So that every point in
    # the test set can be predicted, we prepend a few training points to the test set; these
    # extra points are used only as model input.
x_train, y_train = create_sequences(train_data, SEQ_LEN)
test_data = np.concatenate((train_data[-SEQ_LEN + 1:], test_data), axis=0)
x_test, y_test = create_sequences(test_data, SEQ_LEN)
    # Optional: inspect the shapes
'''
print("The shape of x_train is: %s"%str(x_train.shape))
print("The shape of y_train is: %s"%str(y_train.shape))
print("The shape of x_test is: %s"%str(x_test.shape))
print("The shape of y_test is: %s"%str(y_test.shape))
'''
return x_train,y_train,x_test,y_test,scaler
# The dataset is ready; wrap it in CovidDataset so it can be fed to training and prediction.
class CovidDataset(paddle.io.Dataset):
def __init__(self, feature, label):
self.feature = feature
self.label = label
super(CovidDataset, self).__init__()
def __len__(self):
return len(self.label)
def __getitem__(self, index):
return [self.feature[index], self.label[index]]
config_mtp()
data = read_data()
x_train,y_train,x_test,y_test,scaler = preprocess_data(data)
train_dataset = CovidDataset(x_train, y_train)
test_dataset = CovidDataset(x_test, y_test)
network = TimeSeriesNetwork(input_size=1)
# Parameter configuration
LR = 1e-2
model = paddle.Model(network)
optimizer = paddle.optimizer.Adam(learning_rate=LR, parameters=model.parameters())  # optimizer
loss = paddle.nn.MSELoss(reduction='sum')
model.prepare(optimizer, loss)  # configure the model before running
# Training
USE_GPU = False
TRAIN_EPOCH = 100
LOG_FREQ = 20
SAVE_DIR = os.path.join(os.getcwd(),"save_dir")
SAVE_FREQ = 20
if USE_GPU:
paddle.set_device("gpu")
else:
paddle.set_device("cpu")
model.fit(train_dataset,
batch_size=32,
drop_last=True,
epochs=TRAIN_EPOCH,
log_freq=LOG_FREQ,
save_dir=SAVE_DIR,
save_freq=SAVE_FREQ,
          verbose=1  # verbosity mode: 0 = silent, 1 = progress bar, 2 = one line per epoch
)
# Prediction
preds = model.predict(
test_data=test_dataset
)
# Post-processing: invert the normalization back to the original scale, then plot the curves of the true and predicted values.
true_cases = scaler.inverse_transform(
np.expand_dims(y_test.flatten(), axis=0)
).flatten()
predicted_cases = scaler.inverse_transform(
np.expand_dims(np.array(preds).flatten(), axis=0)
).flatten()
print(true_cases.shape, predicted_cases.shape)
# report the RMSE between the true and predicted daily case counts, on the original scale
mse_loss = paddle.nn.MSELoss(reduction='mean')
print(paddle.sqrt(mse_loss(paddle.to_tensor(true_cases), paddle.to_tensor(predicted_cases))))
print(true_cases, predicted_cases)
If you need the data, please leave a comment below; you can also get it by private message.
Don't forget to like, comment, and bookmark; it really matters to me~