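This article introduces three network architectures for time series classification: LSTM, CNN-LSTM, and ConvLSTM. Each one is applied to the standard UCI-HAR-Dataset for human activity recognition.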

Contents

1. LSTM Model

1.1 Model definition
1.2 Model evaluation
1.3 Complete code

2. CNN-LSTM Model

2.1 Input data shape
2.2 Model definition
2.3 Complete code

3. ConvLSTM Model

3.1 Input data shape
3.2 Complete code

4. Extensions

1. LSTM Model

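In this section we develop an LSTM model for the human activity recognition dataset. An LSTM is a type of recurrent neural network that can learn and remember over long sequences of input data. LSTMs are intended for data made up of long sequences, up to about 200 to 400 time steps, so they may be a good fit for this problem. The model can support multiple parallel sequences of input data, such as each axis of the accelerometer and gyroscope data. It learns to extract features from sequences of observations and to map those internal features to the different activity types.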

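The benefit of using an LSTM for sequence classification is that it can learn directly from the raw time series data, so no domain expertise is needed for manual feature engineering. The model can learn an internal representation of the time series and, ideally, achieve performance comparable to models fit on the feature-engineered version of the dataset.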
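To make the idea of parallel input sequences concrete, here is a minimal sketch of how nine per-signal 2D arrays can be combined into the 3D array an LSTM expects. The toy arrays are hypothetical stand-ins for the nine signal files; only np.dstack is taken from the listings in this article.

```python
import numpy as np

# Hypothetical stand-ins for the nine signal files: each one holds
# one row per window and one column per time step.
n_windows, n_timesteps, n_signals = 5, 128, 9
signals = [np.full((n_windows, n_timesteps), float(i)) for i in range(n_signals)]

# Stack the 2D arrays along a new third axis.
X = np.dstack(signals)
print(X.shape)  # (5, 128, 9)

# Each feature channel is exactly one of the original signal arrays.
assert np.array_equal(X[:, :, 3], signals[3])
```

The result has the layout (samples, timesteps, features), so each window carries all nine signals side by side at every time step.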

1.1 Model definition

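We define the model as an LSTM with a single hidden layer. It is followed by a dropout layer intended to reduce overfitting to the training data. Finally, a fully connected layer interprets the features extracted by the LSTM hidden layer before the output layer makes the prediction. Because this is a multi-class classification task, the network is optimized with Adam, a variant of stochastic gradient descent, using the categorical cross-entropy loss. The model is defined as follows: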

model = Sequential()
model.add(LSTM(100, input_shape=(n_timesteps,n_features)))
model.add(Dropout(0.5))
model.add(Dense(100, activation='relu'))
model.add(Dense(n_outputs, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
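As a rough sanity check on the size of this model, the trainable parameters can be counted by hand. The sketch below assumes the standard Keras LSTM layer with one bias vector per gate, and uses n_features=9 and n_outputs=6 as in this dataset; the helper names are hypothetical.

```python
def lstm_params(units, input_dim):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # One weight per input-output pair plus one bias per output unit.
    return units * input_dim + units

n_features, n_outputs = 9, 6
counts = {
    'lstm': lstm_params(100, n_features),
    'dense_hidden': dense_params(100, 100),
    'dense_output': dense_params(n_outputs, 100),
}
total = sum(counts.values())  # the Dropout layer has no parameters
print(counts, total)
```

Under these assumptions most of the parameters sit in the LSTM layer, which is worth keeping in mind when tuning the number of units.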

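In this case the model is trained for 15 epochs with batch_size=64, so 64 windows of data are passed through the model before its weights are updated. Once fit, the model is evaluated on the test dataset and the accuracy on the test dataset is returned. Note that sequence data is usually not shuffled when fitting an LSTM. Here, however, the windows of input data are shuffled during training (the default setting). In this problem we are interested in the LSTM's ability to learn and extract features across the time steps within a window, not across windows.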
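The following sketch illustrates two points from the paragraph above: how many weight updates the chosen batch size implies, and what shuffling windows means. It uses a NumPy permutation on toy data as a stand-in for the shuffling done during training; it is not the Keras implementation.

```python
import math
import numpy as np

n_windows = 7352               # training windows in this dataset
epochs, batch_size = 15, 64

# Weight updates per epoch: the last batch is smaller than the others.
updates_per_epoch = math.ceil(n_windows / batch_size)
print(updates_per_epoch, updates_per_epoch * epochs)

# Toy data: 6 windows, 4 time steps, 1 feature.
rng = np.random.default_rng(0)
X = np.arange(6 * 4).reshape(6, 4, 1)
order = rng.permutation(len(X))
X_shuffled = X[order]

# The windows changed position, but each window's time steps keep their order.
assert all(np.array_equal(X_shuffled[i], X[order[i]]) for i in range(len(X)))
```

Only the first axis (the windows) is permuted, which is why the temporal structure inside each window is untouched.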

1.2 Model evaluation

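We cannot judge the skill of the model from a single evaluation. Neural networks are stochastic: training the same model configuration on the same data produces a different model each time. This is a feature of the network that gives the model its adaptive ability, but it calls for a slightly more involved evaluation. We repeat the evaluation several times and then summarize the model's performance across the runs. Implementation: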

def summarize_results(scores, params):
    print(scores, params)
    # Summarize the mean and standard deviation
    for i in range(len(scores)):
        m, s = np.mean(scores[i]), np.std(scores[i])
        print('Param=%s: %.3f%% (+/-%.3f)' % (params[i], m, s))

def run_experiment(trainX, trainy, testX, testy, repeats=10):
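    # One-hot encode the labels. The dataset labels start at 1 while one-hot
    # encoding starts at 0, so subtract 1 first. Skip this step if load_dataset
    # already encodes the labels, as it does in the complete code in 1.3.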
    trainy = to_categorical(trainy-1)
    testy = to_categorical(testy-1)
    
    scores = []
    for r in range(repeats):
        score = evaluate_model(trainX, trainy, testX, testy)
        score = score * 100.0
        print('>#%d: %.3f' % (r+1, score))
        scores.append(score)
        print(scores)
    
    mean_scores, std_scores = np.mean(scores), np.std(scores)
    print('Accuracy: %.3f%% (+/-%.3f)' % (mean_scores, std_scores))
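The label shift used before one-hot encoding can be illustrated without Keras. This is a minimal NumPy sketch of the same idea, not the to_categorical implementation; the helper name is hypothetical.

```python
import numpy as np

def one_hot_from_one_based(labels, n_classes):
    # Shift labels 1..n_classes to 0..n_classes-1, then pick rows of the identity matrix.
    labels = np.asarray(labels).reshape(-1).astype(int)
    return np.eye(n_classes, dtype='float32')[labels - 1]

y = np.array([[1], [6], [3]])   # column vector, like the labels read from y_train.txt
encoded = one_hot_from_one_based(y, n_classes=6)
print(encoded.shape)            # (3, 6)
print(encoded.argmax(axis=1))   # zero-based class indices
```

Without the subtraction, to_categorical would infer seven classes from labels 1 to 6 and produce an extra first column that is always zero.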

1.3 Complete code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.layers import LSTM
from tensorflow.keras.utils import to_categorical

def load_file(filepath):
    # The files are whitespace-separated text without a header row.
    dataframe = pd.read_csv(filepath, header=None, sep=r'\s+')
    return dataframe.values

def load_dataset(data_rootdir, dirname, group):
    '''
    Stack the signal files of the training or test set into a 3D array
    of shape (samples, timesteps, features).
    '''
    filepath_list = []
    X = []
    
    # os.walk() traverses the directory tree and yields the files in each directory.
    for rootdir, dirnames, filenames in os.walk(data_rootdir + dirname):
        # Sort the names: os.walk() does not guarantee an order, and train and
        # test must stack the nine signals in the same feature order.
        for filename in sorted(filenames):
            filepath_list.append(os.path.join(rootdir, filename))
    
    # Read every signal file into a 2D array of shape (samples, timesteps).
    for filepath in filepath_list:
        X.append(load_file(filepath))
    
    X = np.dstack(X) # stack along the third axis: the first two dimensions are unchanged, the third grows
    y = load_file(data_rootdir+'/y_'+group+'.txt')
    # One-hot encode the labels. As noted in an earlier article, the dataset labels
    # start at 1 while one-hot encoding starts at 0, so subtract 1 first.
    y = to_categorical(y-1)
    print('{}_X.shape:{},{}_y.shape:{}\n'.format(group,X.shape,group,y.shape))
    return X, y


def evaluate_model(trainX, trainy, testX, testy):
    verbose, epochs, batch_size = 0, 15, 64
    n_timesteps, n_features, n_outputs = trainX.shape[1], trainX.shape[2], trainy.shape[1]
    
    model = Sequential()
    model.add(LSTM(100, input_shape=(n_timesteps,n_features)))
    model.add(Dropout(0.5))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(n_outputs, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)

    _, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
    return accuracy
        

def run_experiment(trainX, trainy, testX, testy, repeats=10):

    scores = list()
    for r in range(repeats):
        score = evaluate_model(trainX, trainy, testX, testy)
        score = score * 100.0
        print('>#%d: %.3f' % (r+1, score))
        scores.append(score)
    
    m, s = np.mean(scores), np.std(scores)
    print('Accuracy: %.3f%% (+/-%.3f)' % (m, s))

if __name__ == '__main__':
    train_dir = 'D:/GraduationCode/01 Datasets/UCI HAR Dataset/train/'
    test_dir = 'D:/GraduationCode/01 Datasets/UCI HAR Dataset/test/'
    dirname = '/Inertial Signals/'
    trainX, trainy = load_dataset(train_dir, dirname, 'train')
    testX, testy = load_dataset(test_dir, dirname, 'test')

    run_experiment(trainX, trainy, testX, testy, repeats=10)

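Running the example first loads the dataset. The model is then created and evaluated once per repeat, and the score of each run is printed. Finally, the mean and standard deviation of the scores are printed. The model performs well, reaching a classification accuracy of about 89.7% when trained on the raw data, with a standard deviation of about 1.3. This is a good result, considering that the original paper reported 89% with a model trained on the heavily feature-engineered version of the dataset, with domain-specific features, rather than on the raw data.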
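The summary statistics quoted above can be cross-checked from the ten per-run scores printed in the log of this section. The sketch below uses Python's statistics module instead of NumPy; because the logged scores are rounded to three decimals, the last digit may differ slightly from the logged summary.

```python
import statistics

# Per-run accuracies (%) copied from the log of this section, rounded to 3 decimals.
scores = [90.770, 90.804, 88.768, 88.870, 90.227,
          87.615, 91.313, 87.479, 91.144, 90.092]

mean = statistics.fmean(scores)
pop_std = statistics.pstdev(scores)    # population std, what np.std returns by default
sample_std = statistics.stdev(scores)  # sample std (n - 1 in the denominator)

print('Accuracy: %.3f%% (+/-%.3f)' % (mean, pop_std))
print('Sample std: %.3f' % sample_std)
```

With only ten runs the sample standard deviation is noticeably larger than the population value, so it is worth stating which one a reported spread refers to.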

train_X.shape:(7352, 128, 9),train_y.shape:(7352, 6)

test_X.shape:(2947, 128, 9),test_y.shape:(2947, 6)
>#1: 90.770
>#2: 90.804
>#3: 88.768
>#4: 88.870
>#5: 90.227
>#6: 87.615
>#7: 91.313
>#8: 87.479
>#9: 91.144
>#10: 90.092
Accuracy: 89.708% (+/-1.354)

2. CNN-LSTM Model

2.1 Input data shape

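The CNN-LSTM architecture uses convolutional neural network (CNN) layers to extract features from the input data, combined with an LSTM to support sequence prediction. Using CNN-LSTM models for time series forecasting was covered in an earlier article and is not repeated here. The CNN-LSTM model reads subsequences of the main sequence as blocks, extracts features from each block, and then lets the LSTM interpret the features extracted from each block. One way to implement this model is to split each window of 128 time steps into subsequences for the CNN to process. For example, the 128 time steps in each window can be split into four subsequences of 32 time steps. Implementation: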

# Reshape the data into subsequences
n_steps, n_length = 4, 32
trainX = trainX.reshape((trainX.shape[0], n_steps, n_length, n_features))
testX = testX.reshape((testX.shape[0], n_steps, n_length, n_features))
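To check that this reshape really cuts each window into consecutive blocks of time steps, here is a small sketch on toy data. The array X is a hypothetical stand-in for trainX with only three windows.

```python
import numpy as np

n_windows, n_timesteps, n_features = 3, 128, 9
n_steps, n_length = 4, 32

# Toy data standing in for trainX.
X = np.arange(n_windows * n_timesteps * n_features, dtype='float32')
X = X.reshape(n_windows, n_timesteps, n_features)

X_sub = X.reshape((X.shape[0], n_steps, n_length, n_features))
print(X_sub.shape)  # (3, 4, 32, 9)

# Subsequence k of a window is the slice of time steps [k*32, (k+1)*32).
for k in range(n_steps):
    assert np.array_equal(X_sub[0, k], X[0, k * n_length:(k + 1) * n_length])
```

The reshape works because n_steps * n_length equals the 128 time steps of a window; any other factorization of 128 could be used in the same way.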

2.2 Model definition

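We can then define a CNN model that expects to read sequences of 32 time steps with 9 features. The entire CNN model can be wrapped in TimeDistributed layers so that the same CNN is applied to each of the four subsequences in the window. The extracted features are then flattened and passed to the LSTM, which extracts its own features before the final classification of the activity is made. The model is defined as follows: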

model = Sequential()
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=3, activation='relu'),
                          input_shape=(None, n_length, n_features)))
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=3, activation='relu')))
model.add(TimeDistributed(Dropout(0.5)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(100))
model.add(Dropout(0.5))
model.add(Dense(100, activation='relu'))
model.add(Dense(n_outputs, activation='softmax'))
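The size of the feature vector that each subsequence hands to the LSTM can be worked out by hand. The sketch below assumes the Keras defaults for the layers above ('valid' padding, stride 1 for Conv1D, stride equal to pool_size for MaxPooling1D); the helper names are hypothetical.

```python
def conv1d_out(length, kernel_size):
    # 'valid' padding with stride 1.
    return length - kernel_size + 1

def maxpool1d_out(length, pool_size):
    # 'valid' padding, stride equal to pool_size.
    return length // pool_size

n_steps, n_length, filters = 4, 32, 64
length = conv1d_out(n_length, 3)     # first Conv1D
length = conv1d_out(length, 3)       # second Conv1D
length = maxpool1d_out(length, 2)    # MaxPooling1D
flat = length * filters              # Flatten
print((n_steps, flat))               # per-window input to the LSTM: (time steps, features)
```

Under these assumptions the LSTM reads a sequence of four steps, one per subsequence, each described by the flattened CNN features.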

2.3 Complete code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv1D, MaxPooling1D
from tensorflow.keras.layers import LSTM, TimeDistributed, ConvLSTM2D
from tensorflow.keras.utils import to_categorical

def load_file(filepath):
    # The files are whitespace-separated text without a header row.
    dataframe = pd.read_csv(filepath, header=None, sep=r'\s+')
    return dataframe.values

def load_dataset(data_rootdir, dirname, group):
    '''
    Stack the signal files of the training or test set into a 3D array
    of shape (samples, timesteps, features).
    '''
    filepath_list = []
    X = []
    
    # os.walk() traverses the directory tree and yields the files in each directory.
    for rootdir, dirnames, filenames in os.walk(data_rootdir + dirname):
        # Sort the names: os.walk() does not guarantee an order, and train and
        # test must stack the nine signals in the same feature order.
        for filename in sorted(filenames):
            filepath_list.append(os.path.join(rootdir, filename))
    
    # Read every signal file into a 2D array of shape (samples, timesteps).
    for filepath in filepath_list:
        X.append(load_file(filepath))
    
    X = np.dstack(X) # stack along the third axis: the first two dimensions are unchanged, the third grows
    y = load_file(data_rootdir+'/y_'+group+'.txt')
    # One-hot encode the labels. As noted in an earlier article, the dataset labels
    # start at 1 while one-hot encoding starts at 0, so subtract 1 first.
    y = to_categorical(y-1)
    print('{}_X.shape:{},{}_y.shape:{}\n'.format(group,X.shape,group,y.shape))
    return X, y


def evaluate_model(trainX, trainy, testX, testy):
    verbose, epochs, batch_size = 0, 25, 64
    n_timesteps, n_features, n_outputs = trainX.shape[1], trainX.shape[2], trainy.shape[1]

    n_steps, n_length = 4, 32
    trainX = trainX.reshape((trainX.shape[0], n_steps, n_length, n_features))
    testX = testX.reshape((testX.shape[0], n_steps, n_length, n_features))

    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=3, activation='relu'), 
                              input_shape=(None, n_length, n_features)))
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=3, activation='relu')))
    model.add(TimeDistributed(Dropout(0.5)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(100))
    model.add(Dropout(0.5))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(n_outputs, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)

    _, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
    return accuracy
        

def run_experiment(trainX, trainy, testX, testy, repeats=10):

    scores = list()
    for r in range(repeats):
        score = evaluate_model(trainX, trainy, testX, testy)
        score = score * 100.0
        print('>#%d: %.3f' % (r+1, score))
        scores.append(score)
    
    m, s = np.mean(scores), np.std(scores)
    print('Accuracy: %.3f%% (+/-%.3f)' % (m, s))

if __name__ == '__main__':
    train_dir = 'D:/GraduationCode/01 Datasets/UCI HAR Dataset/train/'
    test_dir = 'D:/GraduationCode/01 Datasets/UCI HAR Dataset/test/'
    dirname = '/Inertial Signals/'
    trainX, trainy = load_dataset(train_dir, dirname, 'train')
    testX, testy = load_dataset(test_dir, dirname, 'test')

    run_experiment(trainX, trainy, testX, testy, repeats=2)

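Running the example prints the model performance for each of the 2 runs and then reports a final summary of performance on the test set. The model reaches an accuracy of about 90.5% with a standard deviation of about 0.37.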

train_X.shape:(7352, 128, 9),train_y.shape:(7352, 6)

test_X.shape:(2947, 128, 9),test_y.shape:(2947, 6)

>#1: 90.126
>#2: 90.872
Accuracy: 90.499% (+/-0.373)

3. ConvLSTM Model

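A further extension of the CNN-LSTM idea is to perform the convolutions of the CNN (that is, how the CNN reads the input sequence data) as part of the LSTM. This combination is called a convolutional LSTM, or ConvLSTM for short, and like the CNN-LSTM it is also used for spatio-temporal data. By default, the ConvLSTM2D class expects input data with the shape [samples, time, rows, cols, channels], where each time step of data is defined as an image of (rows × cols) data points.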

3.1 Input data shape

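In the previous section, a given window of data (128 time steps) was split into four subsequences of 32 time steps. The same subsequence approach can be used to define the ConvLSTM2D input: the number of time steps is the number of subsequences in the window, the number of rows is 1 because we are working with one-dimensional data, and the number of columns is the number of time steps in a subsequence, 32 in this case. For this framing of the problem, the input to ConvLSTM2D is: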

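Samples: n, the number of windows in the dataset.
Time: 4, the four subsequences into which a window of 128 time steps is split.
Rows: 1, for the one-dimensional shape of each subsequence.
Columns: 32, the 32 time steps in an input subsequence.
Channels: 9, the nine input variables (features, i.e. nine-axis sensor data).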

Implementation:

n_timesteps, n_features, n_outputs = trainX.shape[1], trainX.shape[2], trainy.shape[1]
# Reshape into subsequences (samples, timesteps, rows, cols, channels)
n_steps, n_length = 4, 32
trainX = trainX.reshape((trainX.shape[0], n_steps, 1, n_length, n_features))
testX = testX.reshape((testX.shape[0], n_steps, 1, n_length, n_features))

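The ConvLSTM2D class has to be configured in terms of both the CNN and the LSTM. This includes the number of filters (for example 64), the two-dimensional kernel size, here 1 row by 3 columns of subsequence time steps, and the relu activation function. As with the CNN or LSTM models, the output must be flattened into one long vector before the fully connected layers can interpret it.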
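The five-dimensional input and the size of the flattened output can be sketched with plain NumPy and arithmetic. The output shape below assumes the Keras defaults for ConvLSTM2D ('valid' padding, return_sequences=False, channels last); the toy array is a hypothetical stand-in for trainX.

```python
import numpy as np

n_windows, n_timesteps, n_features = 3, 128, 9
n_steps, n_length, filters = 4, 32, 64
kernel_rows, kernel_cols = 1, 3

X = np.zeros((n_windows, n_timesteps, n_features), dtype='float32')
X5d = X.reshape((X.shape[0], n_steps, 1, n_length, n_features))
print(X5d.shape)  # (3, 4, 1, 32, 9)

# With 'valid' padding and return_sequences=False the layer returns
# one feature map per window, from the last time step.
out_rows = 1 - kernel_rows + 1
out_cols = n_length - kernel_cols + 1
flat = out_rows * out_cols * filters
print((out_rows, out_cols, filters), flat)
```

Under these assumptions the Flatten layer turns each window into a single vector of out_rows * out_cols * filters values before the fully connected layers.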

3.2 Complete code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import LSTM, TimeDistributed, ConvLSTM2D
from tensorflow.keras.utils import to_categorical

def load_file(filepath):
    # The files are whitespace-separated text without a header row.
    dataframe = pd.read_csv(filepath, header=None, sep=r'\s+')
    return dataframe.values

def load_dataset(data_rootdir, dirname, group):
    '''
    Stack the signal files of the training or test set into a 3D array
    of shape (samples, timesteps, features).
    '''
    filepath_list = []
    X = []
    
    # os.walk() traverses the directory tree and yields the files in each directory.
    for rootdir, dirnames, filenames in os.walk(data_rootdir + dirname):
        # Sort the names: os.walk() does not guarantee an order, and train and
        # test must stack the nine signals in the same feature order.
        for filename in sorted(filenames):
            filepath_list.append(os.path.join(rootdir, filename))
    
    # Read every signal file into a 2D array of shape (samples, timesteps).
    for filepath in filepath_list:
        X.append(load_file(filepath))
    
    X = np.dstack(X) # stack along the third axis: the first two dimensions are unchanged, the third grows
    y = load_file(data_rootdir+'/y_'+group+'.txt')
    # One-hot encode the labels. As noted in an earlier article, the dataset labels
    # start at 1 while one-hot encoding starts at 0, so subtract 1 first.
    y = to_categorical(y-1)
    print('{}_X.shape:{},{}_y.shape:{}\n'.format(group,X.shape,group,y.shape))
    return X, y


def evaluate_model(trainX, trainy, testX, testy):
    verbose, epochs, batch_size = 0, 25, 64
    n_timesteps, n_features, n_outputs = trainX.shape[1], trainX.shape[2], trainy.shape[1]

    n_steps, n_length = 4, 32

    trainX = trainX.reshape((trainX.shape[0], n_steps, 1, n_length, n_features))
    testX = testX.reshape((testX.shape[0], n_steps, 1, n_length, n_features))

    model = Sequential()
    model.add(ConvLSTM2D(filters=64, kernel_size=(1,3), activation='relu', input_shape=(n_steps, 1, n_length, n_features)))
    model.add(Dropout(0.5))
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(n_outputs, activation='softmax'))
    
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)

    _, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
    return accuracy
        

def run_experiment(trainX, trainy, testX, testy, repeats=10):

    scores = list()
    for r in range(repeats):
        score = evaluate_model(trainX, trainy, testX, testy)
        score = score * 100.0
        print('>#%d: %.3f' % (r+1, score))
        scores.append(score)
    
    m, s = np.mean(scores), np.std(scores)
    print('Accuracy: %.3f%% (+/-%.3f)' % (m, s))

if __name__ == '__main__':
    train_dir = 'D:/GraduationCode/01 Datasets/UCI HAR Dataset/train/'
    test_dir = 'D:/GraduationCode/01 Datasets/UCI HAR Dataset/test/'
    dirname = '/Inertial Signals/'
    trainX, trainy = load_dataset(train_dir, dirname, 'train')
    testX, testy = load_dataset(test_dir, dirname, 'test')

    run_experiment(trainX, trainy, testX, testy, repeats=2)

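As with the previous experiments, running the model prints its performance each time it is fit and evaluated, and a summary of the final model performance is given at the end of the run. The model performs consistently well on the problem, reaching an accuracy of about 90.7%, possibly with fewer resources than the CNN-LSTM model.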

train_X.shape:(7352, 128, 9),train_y.shape:(7352, 6)

test_X.shape:(2947, 128, 9),test_y.shape:(2947, 6)

>#1: 90.159
>#2: 91.347
Accuracy: 90.753% (+/-0.594)

4. Extensions

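For time series classification tasks, the following directions are also worth exploring:

Data preparation. Investigate whether simple data scaling schemes, such as normalization or standardization, can further improve model performance.
LSTM variants. Variants of the LSTM architecture, such as stacked LSTMs and bidirectional LSTMs, may achieve better performance on this problem.
Hyperparameter tuning. Investigate tuning the model hyperparameters, such as units, epochs, and batch_size.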
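As a starting point for the data preparation idea, here is one possible standardization scheme for the 3D arrays used in this article. It is only a sketch under assumptions: the function name is hypothetical, the random arrays stand in for trainX and testX, and per-feature scaling with training-set statistics is one choice among several.

```python
import numpy as np

def standardize_windows(train, test):
    # Per-feature statistics over all windows and time steps of the training set only.
    mean = train.mean(axis=(0, 1), keepdims=True)
    std = train.std(axis=(0, 1), keepdims=True)
    std[std == 0] = 1.0  # guard against constant features
    return (train - mean) / std, (test - mean) / std

# Random toy data with the same layout as trainX and testX.
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(20, 128, 9))
test = rng.normal(loc=5.0, scale=2.0, size=(8, 128, 9))

train_s, test_s = standardize_windows(train, test)
print(train_s.mean(axis=(0, 1)).round(3))
print(train_s.std(axis=(0, 1)).round(3))
```

The test set is scaled with the training statistics so that no information from the test data leaks into the preprocessing.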


datamonday


Time Series Classification 04: How to Develop LSTM Models for Human Activity Recognition (CNN-LSTM, ConvLSTM)