Semi-supervised convolutional neural networks for human activity recognition

Reading notes: human activity recognition with semi-supervised convolutional neural networks

Zeng, Ming & Yu, Tong & Wang, Xiao & Nguyen, Le T. & Mengshoel, Ole & Lane, Ian. (2017). Semi-supervised convolutional neural networks for human activity recognition. 2017 IEEE International Conference on Big Data (Big Data).

Keywords: Human Activity Recognition; Deep Neural Networks; Semi-Supervised Learning; Convolutional Neural Networks

Note: all figures in this post are taken from the paper.

Abstract

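Labeled data for human activity recognition (HAR) is usually limited in size and diversity, so models trained on it generalize poorly.

Semi-supervised learning improves model performance by augmenting the labeled examples with weakly labeled examples.

However, previous work on semi-supervised learning assumes that feature extraction has already been done.

In this paper, convolutional neural networks (CNNs) are used to learn discriminative hidden features.

Our semi-supervised CNNs learn from both labeled and unlabeled data while also performing feature learning on raw sensor data. In other words, the CNNs take part in both the network training and the feature learning.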

1. INTRODUCTION

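Supervised deep neural networks (DNNs) work well for HAR, but obtaining labeled data remains a major problem.

Compared with other domains, the labeling problem in HAR is substantial, because HAR datasets typically have:

(1) Few labeled examples, which cover some activities too coarsely. (For example, the labeled training data may only cover walking at certain speeds. In reality, humans walk at a range of speeds.)

(2) High personalization and diversity. (People may perform the same activity in very different ways. For example, one person's fast walk can be hard to tell apart from another person's run.)

So when labeled data is limited, we can use unlabeled data to adjust the feature representations learned from the labeled data and thereby improve HAR performance. This process is called feature learning.

In previous work, semi-supervised HAR methods usually rely on handcrafted features. Handcrafted features are not obtained through feature learning, which limits how much the unlabeled data can contribute.

In this work, we study how to train accurate and generalizable DNNs with limited labeled data and large-scale unlabeled data.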

2. RELATED WORKS

(1)machine learning in HAR

(2)semi-supervised learning in HAR

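Semi-supervised approaches include:

an on-line adaptation method, self-learning based approaches, and the graph-based approach.

However, these methods treat label propagation and classification as two separate stages, ignoring the correlation between labeled and unlabeled data.

Recent work introduced ladder networks, which simultaneously train a deep auto-encoder on the unlabeled data and a neural network on the labeled data.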

3. SEMI-SUPERVISED CNN BASED MODELS

(1)CNN for Supervised Learning

The basic training procedure of a typical CNN model.

CNN input and output: $z_i^{(1)}, \ldots, z_i^{(L)}, y_i = \mathrm{CNN}(x_i)$

CNN cost (cross-entropy):
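In its standard form, this cost can be written as follows. The notation is an assumption and may differ from the paper's: $t_i$ is the true label of example $x_i$, $N$ is the number of labeled examples, and $P(y_i = t_i \mid x_i)$ is the softmax probability the network assigns to the true label.

$$C_{\mathrm{CNN}} = -\frac{1}{N} \sum_{i=1}^{N} \log P(y_i = t_i \mid x_i)$$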

Conventional neural network training: the model parameters are optimized according to the cost (also called the loss) until the cost is minimized.

(2)CNN Encoder-Decoder for Unsupervised Learning

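That is, a batch of data x first passes through convolution + max-pooling + fully connected layers,

and then through fully connected + deconvolution + unpooling layers (upsampling, since pooling is also called downsampling or subsampling) to produce $\hat{x}$.

(This structure is similar to a denoising autoencoder (DAE).)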
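To make the pooling/unpooling pair concrete, here is a minimal pure-Python sketch on a 1-D signal. The function names are hypothetical, and nearest-neighbour repetition is only one possible form of upsampling; the paper's exact unpooling operation may differ.

```python
def max_pool_1d(signal, size=2):
    """Downsample by keeping the maximum of each non-overlapping window."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]


def upsample_1d(signal, size=2):
    """Nearest-neighbour upsampling (an assumed choice): repeat each value."""
    return [value for value in signal for _ in range(size)]


x = [1.0, 3.0, 2.0, 0.0, 5.0, 4.0]
pooled = max_pool_1d(x)         # encoder side: length 6 -> length 3
restored = upsample_1d(pooled)  # decoder side: length 3 -> length 6
print(pooled, restored)
```

Upsampling restores the original length but not the values discarded by pooling, which is why the decoder also needs learned deconvolution layers to reconstruct the input.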

The cost of this structure (squared error):
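A common way to write such a squared-error reconstruction cost is shown below. The symbol $n$ (number of training examples) and the normalization are assumptions, not taken from the paper.

$$C_{\mathrm{ED}} = \frac{1}{n} \sum_{i=1}^{n} \lVert x_i - \hat{x}_i \rVert^2$$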

For help understanding this part, see http://www.cnblogs.com/yangmang/p/7428014.html

(3)Semi-Supervised CNN-Encoder-Decoder for HAR

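We combine the supervised CNN model with the CNN-Encoder-Decoder model to build a semi-supervised learner for HAR.

In addition to the labeled data $\{(x_i, t_i) \mid 1 \le i \le N\}$, unlabeled data $\{x_i \mid N+1 \le i \le N+M\}$ is also used.

In the model, labeled and unlabeled data go through three paths: clean encoding, noisy encoding, and decoding.

(a) Both labeled and unlabeled data pass through the clean encoder path to compute the hidden representations in each network layer.

(b) In the noisy encoder path, Gaussian noise is first added to both labeled and unlabeled data, which are then passed through the network to obtain $\tilde{z}$.

(b-1) For the noisy labeled data ($\tilde{x}_i$, $1 \le i \le N$), a softmax classifier produces the prediction $\tilde{y}$, which is scored with the cross-entropy cost.

(b-2) For the noisy unlabeled data ($\tilde{x}_i$, $N+1 \le i \le N+M$), the decoder reconstructs the corresponding clean input; this step is scored with the squared-error cost.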
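The corruption step in the noisy encoder path can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name and the noise level `sigma` are hypothetical, since the post does not give the value used in the paper.

```python
import random


def add_gaussian_noise(x, sigma=0.1, rng=None):
    """Return a corrupted copy of x with zero-mean Gaussian noise added.

    sigma is an assumed noise level, not a value from the paper.
    """
    rng = rng or random.Random()
    return [value + rng.gauss(0.0, sigma) for value in x]


rng = random.Random(0)                 # fixed seed for repeatability
x = [0.2, -0.5, 1.3, 0.7]              # one clean sensor window (toy data)
x_noisy = add_gaussian_noise(x, sigma=0.1, rng=rng)
print(x_noisy)
```

The clean `x` is kept alongside `x_noisy`, because the decoder is trained to recover the clean input from the corrupted one.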

So the cost of the whole CNN-Encoder-Decoder model consists of two parts:
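The two parts, a supervised cross-entropy term and an unsupervised reconstruction term, can be combined as in the sketch below. The function names and the weighting factor `lam` are assumptions made for illustration; the paper's exact weighting and normalization may differ.

```python
import math


def cross_entropy(probs, targets):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def squared_error(clean, reconstructed):
    """Squared reconstruction error, averaged over examples."""
    total = sum((a - b) ** 2
                for x, x_hat in zip(clean, reconstructed)
                for a, b in zip(x, x_hat))
    return total / len(clean)


def semi_supervised_cost(probs, targets, clean, reconstructed, lam=1.0):
    """Supervised cost plus lam times the reconstruction cost (lam is assumed)."""
    return cross_entropy(probs, targets) + lam * squared_error(clean, reconstructed)


# Toy values: softmax outputs for two labeled examples,
# and clean inputs vs. reconstructions for two unlabeled examples.
probs = [[0.5, 0.5], [0.25, 0.75]]
targets = [0, 1]
clean = [[1.0, 2.0], [0.0, 1.0]]
recon = [[1.0, 1.0], [0.0, 1.0]]
cost = semi_supervised_cost(probs, targets, clean, recon, lam=0.5)
print(cost)
```

Because both terms are differentiated with respect to the same encoder weights, the unlabeled data shapes the features that the classifier uses.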

With such a CNN-Encoder-Decoder model, we can learn the network and the features at the same time.

(4)Semi-Supervised CNN-Ladder for HAR

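CNN-Ladder contains two kinds of connections: vertical connections and lateral connections.

The vertical connections comprise the clean encoder, the noisy encoder, and the decoder.

Unlike in CNN-Encoder-Decoder, $\hat{z}$ at each decoder layer is computed not only from the $\hat{z}$ of the previous decoder layer; the corresponding $\tilde{z}$ from the noisy encoder also takes part in the reconstruction through the lateral connection.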
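Schematically, this layer-wise reconstruction can be written as below. The function $g$ is a placeholder for whatever combinator the decoder uses; its form is not given in this post, so the expression only shows which quantities feed into each decoder layer.

$$\hat{z}^{(l)} = g\big(\tilde{z}^{(l)},\ \hat{z}^{(l+1)}\big)$$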

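The benefit of this design is that it finds better middle-level representations.

In addition, to improve the reconstruction of the middle-level features, the intermediate decoder layers are made as similar as possible to the corresponding encoder layers, so the cost of this structure is: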
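In the usual ladder-network formulation, this cost is the supervised term plus a weighted sum of per-layer reconstruction terms. The form below is a sketch under that assumption: $C_{\mathrm{CNN}}$ is the cross-entropy on the noisy labeled data, $\lambda_l$ is the weight of layer $l$, and the normalization is assumed rather than taken from the paper.

$$C = C_{\mathrm{CNN}} + \sum_{l=0}^{L} \lambda_l \, C_d^{(l)}, \qquad C_d^{(l)} = \frac{1}{N+M} \sum_{i=1}^{N+M} \big\lVert z_i^{(l)} - \hat{z}_i^{(l)} \big\rVert^2$$

Here $z_i^{(l)}$ comes from the clean encoder and $\hat{z}_i^{(l)}$ from the decoder, so a larger $\lambda_l$ forces layer $l$ to be reconstructed more faithfully.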

4. EXPERIMENTS

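The method is validated on three public datasets.

(1) Comparison with other supervised neural network methods. (2) Comparison with other semi-supervised methods.

The values in the table are F1-scores.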
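For reference, a class-averaged F1-score can be computed as in the sketch below. It assumes the reported metric is the unweighted mean of per-class F1 (as the $F_m$ notation used later in the post suggests); the function names and the toy labels are hypothetical.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, c) for c in labels) / len(labels)


y_true = ["walk", "walk", "run", "run", "sit"]
y_pred = ["walk", "run", "run", "run", "sit"]
score = mean_f1(y_true, y_pred)
print(round(score, 4))
```

Averaging per class rather than per example keeps rare activities from being drowned out by frequent ones, which matters for imbalanced HAR datasets.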

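(3) Experiments with different amounts of labeled and unlabeled data.

(3-a) Varying the amount of labeled data

This shows that, compared with CNN, CNN-Ladder reaches the same accuracy with less labeled data.

(3-b) Varying the amount of unlabeled data

This shows that better latent features in the auto-encoder can be trained with more unlabeled examples and help adjust the latent CNN features, thereby improving accuracy.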

(4) The Impact of Adjusting Features in Different Layers

A high $F_m$ score can typically be achieved by setting a large $\lambda_l$ for the layers representing low-level features.

This indicates that low-level features of the neural networks can be much improved by using the unlabeled data.

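In summary, CNN-Ladder uses the unlabeled data to improve and adjust the low-level features, which other semi-supervised methods and plain CNN models neglect.

(5) Why our method performs better than other semi-supervised methods.

This figure shows the features in the last layer of CNN-Ladder (with unlabeled data) and of CNN (without unlabeled data).

In the CNN model, different users perform the same activity with different movements, so the features the CNN extracts are scattered even within a single activity.

In CNN-Ladder, by contrast, the test examples concentrate in the region where the labeled data are located. This is because, using the low-level feature representations trained with additional unlabeled data, the activities of different users become similar.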
