Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition
Figure 1. Different types of units in neural networks. (a) MLP with three dense layers; (b) recurrent neural network (RNN) with two dense layers. The activation and hidden value of the unit in layer (l + 1) are computed in the same time step t; (c) the recurrent LSTM cell is an extension of RNNs, where the internal memory can be updated, erased or read out.

Figure 2. Representation of a temporal convolution over a single sensor channel in a three-layer convolutional neural network (CNN). Layer (l - 1) defines the sensor data at the input. The next layer (l) is composed of two feature maps (a_1^l(τ) and a_2^l(τ)) extracted by two different kernels (K_11^(l-1) and K_21^(l-1)). The deepest layer (layer (l + 1)) is composed of a single feature map, resulting from the temporal convolution in layer l with a two-dimensional kernel K_1^l. The time axis (which is convolved over) is horizontal.

Figure 3. Architecture of the DeepConvLSTM (Conv, convolutional) framework for activity recognition. From the left, the signals coming from the wearable sensors are processed by four convolutional layers, which allow learning features from the data. Two dense layers then perform a non-linear transformation, which yields the classification outcome with a softmax logistic regression output layer on the right. Input at Layer 1 corresponds to sensor data of size D × S^1, where D denotes the number of sensor channels and S^l the length of the feature maps in layer l. Layers 2–5 are convolutional layers; K^l denotes the kernels in layer l (depicted as red squares) and F^l the number of feature maps in layer l. In convolutional layers, a_i^l denotes the activation that defines feature map i in layer l. Layers 6 and 7 are dense layers; a_{t,i}^l denotes the activation of unit i in hidden layer l at time t. The time axis is vertical.

Figure 4. Placement of on-body sensors used in the OPPORTUNITY dataset (left: inertial measurement units; right: 3-axis accelerometers) [7].

Figure 5. Sequence labelling after segmenting the data with a sliding window. The sensor signals are segmented by a jumping window. The activity class within each sequence is considered to be the ground-truth label annotated at sample T of that window.

Figure 6. Output class probabilities for a ~25 s-long fragment of sensor signals in the test set of the OPPORTUNITY dataset, which comprises 10 annotated gestures. Each point in the plot represents the class probabilities obtained from processing the data within a 500 ms sequence obtained from a sliding window ending at that point. The dashed line represents the Null class. DeepConvLSTM offers better performance in identifying the start and end of gestures.

Figure 7. F1 score performance of DeepConvLSTM on the OPPORTUNITY dataset. Classification performance is displayed individually per gesture, for different lengths of the input sensor data segments. Experiments were carried out with sequences of length 400 ms, 500 ms, 1400 ms and 2750 ms. The horizontal axis represents the ratio between the gesture length and the sequence length (ratios under one represent performance for gestures whose durations are shorter than the sequence duration).

Figure 8. Performance on the Skoda and OPPORTUNITY (gesture recognition, including the Null class) datasets with different numbers of convolutional layers.
Abstract
1. Introduction
- We present DeepConvLSTM: a deep learning framework composed of convolutional and LSTM recurrent layers that is capable of automatically learning feature representations and modelling the temporal dependencies between their activations.
- We demonstrate that this framework is suitable for activity recognition from wearable sensor data by applying it to two families of human activity recognition problems: static/periodic activities (modes of locomotion and postures) and sporadic activities (gestures).
- We show that the framework can be applied seamlessly to different sensor modalities individually and that it can also fuse them to improve performance. We demonstrate this on accelerometers, gyroscopes and combinations thereof.
- We show that the system works directly on the raw sensor data with minimal pre-processing, which makes it particularly general and minimises engineering bias.
- We compare the performance of our approach to that reported by contestants in a recognised activity recognition challenge (OPPORTUNITY) and on another open dataset (Skoda).
- We show that the proposed architecture outperforms published results obtained on the OPPORTUNITY challenge, including a deep CNN, which had already offered state-of-the-art results in previous studies [17].
- We discuss the results, including the characterisation of key parameters' influence on performance, and outline avenues for future research towards taking further advantage of the characteristics of the deep architecture.
2. State of the Art
2.1. From Feedforward to Recurrent Networks
2.2. Feature Learning with Convolutional Networks
2.3. Application of Deep Networks for HAR
3. Architecture
3.1. DeepConvLSTM
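The Figure 3 caption summarises the pipeline: four temporal convolution layers learn feature maps from the raw sensor channels, two recurrent dense (LSTM) layers model the temporal dynamics of those features, and a softmax layer produces the class scores. As a concrete illustration, below is a minimal PyTorch sketch of such a pipeline. It is not the authors' original Lasagne/Theano implementation [34]; the layer sizes (64 feature maps, 5 × 1 kernels, two 128-unit recurrent layers) are chosen to be consistent with the parameter table in Section 3.3, and the 24-sample window length and 113 input channels are only examples.

```python
import torch
import torch.nn as nn

class DeepConvLSTMSketch(nn.Module):
    """Illustrative Conv + LSTM pipeline for multichannel wearable data."""

    def __init__(self, n_channels=113, n_classes=18,
                 n_filters=64, kernel_len=5, lstm_units=128):
        super().__init__()
        # Four temporal convolutions: kernels of size (5, 1) slide along the
        # time axis only, so each sensor channel is convolved independently.
        self.conv = nn.Sequential(
            nn.Conv2d(1, n_filters, (kernel_len, 1)), nn.ReLU(),
            nn.Conv2d(n_filters, n_filters, (kernel_len, 1)), nn.ReLU(),
            nn.Conv2d(n_filters, n_filters, (kernel_len, 1)), nn.ReLU(),
            nn.Conv2d(n_filters, n_filters, (kernel_len, 1)), nn.ReLU(),
        )
        # Two recurrent layers model temporal dependencies of the feature maps.
        self.lstm = nn.LSTM(input_size=n_filters * n_channels,
                            hidden_size=lstm_units, num_layers=2,
                            batch_first=True)
        # Softmax classification layer (the softmax itself lives in the loss).
        self.classifier = nn.Linear(lstm_units, n_classes)

    def forward(self, x):
        # x: (batch, time, channels), raw sensor data with minimal pre-processing
        x = x.unsqueeze(1)                    # (batch, 1, time, channels)
        x = self.conv(x)                      # (batch, 64, time', channels)
        x = x.permute(0, 2, 1, 3).flatten(2)  # (batch, time', 64 * channels)
        x, _ = self.lstm(x)                   # (batch, time', 128)
        return self.classifier(x[:, -1])      # class logits at the last step

model = DeepConvLSTMSketch()
logits = model(torch.randn(8, 24, 113))       # 8 windows of 24 samples each
```

The essential design choice, visible in Figures 2 and 3, is that the convolutions only mix information along the time axis of each channel, while the recurrent layers capture the longer-range temporal structure of the resulting activations.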
3.2. Baseline Deep CNN
3.3. Model Implementation and Training
Number of parameters per layer in DeepConvLSTM and the baseline CNN (K: convolution kernels; W: weight matrices; b: biases):

Layer | DeepConvLSTM: Size per Parameter | DeepConvLSTM: Size per Layer | Baseline CNN: Size per Parameter | Baseline CNN: Size per Layer
---|---|---|---|---
2 | K; b: 64 | 384 | K; b: 64 | 384
3–5 | K; b: 64 | 20,544 | K; b: 64 | 20,544
6 | W; b: 128 | 942,592 | W; b: 128 | 7,405,696
7 | W; b: 128 | 33,280 | W; b: 128 | 16,512
8 | W; b | | W; b |
Total | | 996,800 | |
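As a quick sanity check on the convolutional rows of this table, the per-layer counts follow directly from the number of kernel weights plus biases, assuming 64 feature maps and 5 × 1 kernels. The dense-layer check below additionally assumes 113 sensor channels and 24-sample input windows (reduced to length 8 by the four convolutions); those two values are assumptions for illustration, not entries of the table.

```python
# Parameters of a convolutional layer: out_maps * in_maps * kernel_len weights
# plus one bias per output feature map.
def conv_params(in_maps, out_maps, kernel_len=5):
    return out_maps * in_maps * kernel_len + out_maps

print(conv_params(1, 64))    # layer 2: 384
print(conv_params(64, 64))   # layers 3-5: 20,544 each

# Parameters of a dense layer: weight matrix plus one bias per output unit.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

print(dense_params(64 * 8 * 113, 128))  # baseline CNN layer 6: 7,405,696
print(dense_params(128, 128))           # baseline CNN layer 7: 16,512
```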
4. Experimental Setup
4.1. Benchmark Datasets
4.1.1. The OPPORTUNITY Dataset
- Task A: recognition of modes of locomotion and postures. The goal of this task is to classify modes of locomotion from the full set of body-worn sensors. This is a 5-class segmentation and classification problem.
- Task B: recognition of sporadic gestures. This task concerns recognition of the different right-arm gestures. This is an 18-class segmentation and classification problem.
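For both tasks, the continuous sensor streams are segmented into fixed-length sequences with a sliding window, and each sequence inherits the ground-truth label annotated at its last sample T, as described in the Figure 5 caption. A minimal NumPy sketch of that segmentation follows; the window and step values shown are placeholders, not the settings used in the experiments.

```python
import numpy as np

def sliding_windows(data, labels, window, step):
    """Segment a (time, channels) stream into fixed-length labelled sequences.

    Each sequence is labelled with the annotation of its last sample,
    mirroring the scheme in Figure 5.
    """
    X, y = [], []
    for start in range(0, len(data) - window + 1, step):
        end = start + window
        X.append(data[start:end])
        y.append(labels[end - 1])  # label at sample T, the end of the window
    return np.stack(X), np.array(y)

# Example with hypothetical values: 24-sample windows with a 50% overlap.
# X, y = sliding_windows(sensor_data, sample_labels, window=24, step=12)
```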
Classes in the OPPORTUNITY and Skoda datasets, with the number of recorded repetitions and instances:

OPPORTUNITY Gestures | # of Repetitions | # of Instances | OPPORTUNITY Modes of Locomotion | # of Repetitions | # of Instances | Skoda | # of Repetitions | # of Instances
---|---|---|---|---|---|---|---|---
Open Door 1 | 94 | 1583 | Stand | 1267 | 38,429 | Write on Notepad | 58 | 20,874
Open Door 2 | 92 | 1685 | Walk | 1291 | 22,522 | Open Hood | 68 | 24,444
Close Door 1 | 89 | 1497 | Sit | 124 | 16,162 | Close Hood | 66 | 23,530
Close Door 2 | 90 | 1588 | Lie | 30 | 2866 | Check Gaps Door | 67 | 16,961
Open Fridge | 157 | 196 | Null | 283 | 16,688 | Open Door | 69 | 10,410
Close Fridge | 159 | 1728 | | | | Check Steering Wheel | 69 | 12,994
Open Dishwasher | 102 | 1314 | | | | Open and Close Trunk | 63 | 23,061
Close Dishwasher | 99 | 1214 | | | | Close both Doors | 69 | 18,039
Open Drawer 1 | 96 | 897 | | | | Close Door | 70 | 9783
Close Drawer 1 | 95 | 781 | | | | Check Trunk | 64 | 19,757
Open Drawer 2 | 91 | 861 | | | | | |
Close Drawer 2 | 90 | 754 | | | | | |
Open Drawer 3 | 102 | 1082 | | | | | |
Close Drawer 3 | 103 | 1070 | | | | | |
Clean Table | 79 | 1717 | | | | | |
Drink from Cup | 213 | 6115 | | | | | |
Toggle Switch | 156 | 1257 | | | | | |
Null | 1605 | 69,558 | | | | | |
4.1.2. The Skoda Dataset
4.2. Performance Measure
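The results that follow are reported as F1 scores. A class-weighted F1, in which each per-class F1 is weighted by that class's share of the samples, is the standard way to handle the strong class imbalance in these datasets; the sketch below (equivalent in behaviour to scikit-learn's f1_score(..., average="weighted")) shows one way to compute it.

```python
import numpy as np

def weighted_f1(y_true, y_pred):
    """Class-weighted F1: per-class F1 weighted by class frequency."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, counts = np.unique(y_true, return_counts=True)
    score = 0.0
    for c, n in zip(classes, counts):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
        score += (n / len(y_true)) * f1
    return score
```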
5. Results and Discussion
5.1. Performance Comparison
OPPORTUNITY Dataset: Challenge Submissions [7]

Method | Description
---|---
LDA | Linear discriminant analysis. Gaussian classifier that assumes the features are normally distributed and that all classes share the same covariance matrix.
QDA | Quadratic discriminant analysis. Similar to LDA, this technique also assumes a normal distribution for the features, but the class covariances may differ.
NCC | Nearest centroid classifier. The Euclidean distance between the test sample and the centroid of each class is used for classification.
1NN | k-nearest-neighbour algorithm. Lazy algorithm where the Euclidean distances between a test sample and the training samples are computed, and the most frequently occurring label among the k closest samples is the output.
3NN | See 1NN; using 3 neighbours.
UP | Submission to the OPPORTUNITY challenge from U. of Parma. Pattern comparison using mean, variance, maximum and minimum values.
NStar | Submission to the OPPORTUNITY challenge from U. of Singapore. kNN algorithm using a single neighbour and normalized data.
SStar | Submission to the OPPORTUNITY challenge from U. of Singapore. Support vector machine algorithm using scaled data.
CStar | Submission to the OPPORTUNITY challenge from U. of Singapore. Fusion of a kNN algorithm using the closest neighbour and a support vector machine.
NU | Submission to the OPPORTUNITY challenge from U. of Nagoya. C4.5 decision tree algorithm using mean, variance and energy.
MU | Submission to the OPPORTUNITY challenge from U. of Monash. Decision tree grafting algorithm.

OPPORTUNITY Dataset: Deep Approaches

Method | Description
---|---
CNN [17] | Results reported by Yang et al. in [17]. The value is computed using the average performance for Subjects 1, 2 and 3.

Skoda Dataset: Deep Approaches

Method | Description
---|---
CNN [23] | Results reported by Zeng et al. in [23]. Performance computed using one accelerometer on the right arm to identify all activities.
CNN [43] | Results reported by Alsheikh et al. in [43]. Performance computed using one accelerometer node (id #16) to identify all activities.
Method | Modes of Locomotion | Modes of Locomotion (No Null Class) | Gesture Recognition (No Null Class) | Gesture Recognition
---|---|---|---|---
OPPORTUNITY Challenge Submissions | ||||
LDA | 0.64 | 0.59 | 0.25 | 0.69 |
QDA | 0.77 | 0.68 | 0.24 | 0.53 |
NCC | 0.60 | 0.54 | 0.19 | 0.51 |
1 NN | 0.85 | 0.84 | 0.55 | 0.87 |
3 NN | 0.85 | 0.85 | 0.56 | 0.85 |
UP | 0.84 | 0.60 | 0.22 | 0.64 |
NStar | 0.86 | 0.61 | 0.65 | 0.84 |
SStar | 0.86 | 0.64 | 0.70 | 0.86 |
CStar | 0.87 | 0.63 | 0.77 | 0.88 |
NU | 0.75 | 0.53 | ||
MU | 0.87 | 0.62 | ||
Deep architectures | ||||
CNN [17] | | | | 0.851
Baseline CNN | 0.912 | 0.878 | 0.783 | 0.883 |
DeepConvLSTM | 0.930 | 0.895 | 0.866 | 0.915 |
Confusion matrix for gesture recognition on the OPPORTUNITY dataset, including the Null class (rows: actual gesture; columns: predicted gesture).

Actual Gesture \ Predicted Gesture | Null | Open Door 1 | Open Door 2 | Close Door 1 | Close Door 2 | Open Fridge | Close Fridge | Open Dishwasher | Close Dishwasher | Open Drawer 1 | Close Drawer 1 | Open Drawer 2 | Close Drawer 2 | Open Drawer 3 | Close Drawer 3 | Clean Table | Drink from Cup | Toggle Switch
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Null | 13,532 | 16 | 5 | 15 | 13 | 54 | 35 | 35 | 72 | 10 | 13 | 5 | 4 | 22 | 39 | 7 | 158 | 29
Open Door 1 | 10 | 76 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Open Door 2 | 7 | 0 | 155 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Close Door 1 | 8 | 15 | 0 | 78 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Close Door 2 | 10 | 0 | 0 | 0 | 130 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Open Fridge | 111 | 0 | 0 | 0 | 0 | 253 | 22 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |
Close Fridge | 41 | 0 | 0 | 0 | 0 | 19 | 210 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Open Dishwasher | 61 | 0 | 0 | 0 | 0 | 6 | 0 | 99 | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Close Dishwasher | 43 | 0 | 0 | 0 | 0 | 2 | 0 | 10 | 79 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | |
Open Drawer 1 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 38 | 6 | 2 | 1 | 3 | 1 | 0 | 0 | 1 | |
Close Drawer 1 | 20 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 8 | 46 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Open Drawer 2 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 18 | 2 | 29 | 6 | 1 | 0 | 0 | 0 | 1 | |
Close Drawer 2 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 5 | 4 | 25 | 0 | 3 | 0 | 0 | 0 | |
Open Drawer 3 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 88 | 3 | 0 | 0 | 0 | |
Close Drawer 3 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 9 | 5 | 80 | 0 | 0 | 0 | |
Clean Table | 88 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 81 | 2 | 0 | |
Drink from Cup | 143 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 397 | 0 | |
Toggle Switch | 57 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 122 |
Confusion matrix for gesture recognition on the OPPORTUNITY dataset, including the Null class (rows: actual gesture; columns: predicted gesture).

Actual Gesture \ Predicted Gesture | Null | Open Door 1 | Open Door 2 | Close Door 1 | Close Door 2 | Open Fridge | Close Fridge | Open Dishwasher | Close Dishwasher | Open Drawer 1 | Close Drawer 1 | Open Drawer 2 | Close Drawer 2 | Open Drawer 3 | Close Drawer 3 | Clean Table | Drink from Cup | Toggle Switch
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Null | 13,752 | 5 | 8 | 6 | 5 | 39 | 18 | 14 | 29 | 2 | 0 | 1 | 1 | 40 | 20 | 2 | 114 | 8
Open Door 1 | 17 | 51 | 0 | 28 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Open Door 2 | 15 | 0 | 111 | 0 | 38 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Close Door 1 | 10 | 22 | 0 | 69 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Close Door 2 | 9 | 0 | 7 | 0 | 124 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Open Fridge | 130 | 0 | 0 | 0 | 0 | 220 | 34 | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Close Fridge | 49 | 0 | 0 | 0 | 0 | 76 | 146 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Open Dishwasher | 108 | 0 | 0 | 0 | 0 | 4 | 0 | 45 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Close Dishwasher | 75 | 0 | 0 | 0 | 0 | 4 | 0 | 30 | 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Open Drawer 1 | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27 | 5 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | |
Close Drawer 1 | 40 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Open Drawer 2 | 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 1 | 18 | 1 | 6 | 0 | 0 | 0 | 0 | |
Close Drawer 2 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 13 | 5 | 9 | 0 | 0 | 0 | 0 | |
Open Drawer 3 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 56 | 28 | 0 | 0 | 0 | |
Close Drawer 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 51 | 42 | 0 | 0 | 0 | |
Clean Table | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 73 | 0 | 0 | |
Drink from Cup | 194 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 349 | 0 | |
Toggle Switch | 99 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 82 |
Confusion matrix for gesture recognition on the OPPORTUNITY dataset, excluding the Null class (rows: actual gesture; columns: predicted gesture).

Actual Gesture \ Predicted Gesture | Open Door 1 | Open Door 2 | Close Door 1 | Close Door 2 | Open Fridge | Close Fridge | Open Dishwasher | Close Dishwasher | Open Drawer 1 | Close Drawer 1 | Open Drawer 2 | Close Drawer 2 | Open Drawer 3 | Close Drawer 3 | Clean Table | Drink from Cup | Toggle Switch
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Open Door 1 | 81 | 0 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
Open Door 2 | 0 | 149 | 1 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Close Door 1 | 15 | 0 | 73 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Close Door 2 | 0 | 2 | 1 | 124 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Open Fridge | 1 | 1 | 0 | 0 | 342 | 29 | 11 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | |
Close Fridge | 0 | 0 | 0 | 0 | 10 | 258 | 0 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | |
Open Dishwasher | 0 | 0 | 0 | 0 | 4 | 0 | 151 | 5 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 2 | |
Close Dishwasher | 0 | 0 | 0 | 0 | 4 | 0 | 15 | 107 | 0 | 0 | 0 | 2 | 0 | 6 | 0 | 1 | 2 | |
Open Drawer 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 36 | 17 | 0 | 1 | 0 | 2 | 0 | 0 | 8 | |
Close Drawer 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 5 | 66 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
Open Drawer 2 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 13 | 8 | 35 | 3 | 1 | 0 | 0 | 0 | 5 | |
Close Drawer 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 4 | 2 | 10 | 3 | 26 | 1 | 0 | 0 | 0 | 0 | |
Open Drawer 3 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 2 | 0 | 7 | 4 | 87 | 7 | 0 | 0 | 1 | |
Close Drawer 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 27 | 7 | 66 | 0 | 0 | 0 | |
Clean Table | 0 | 0 | 0 | 0 | 2 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 147 | 17 | 0 | |
Drink from Cup | 1 | 1 | 2 | 1 | 0 | 0 | 24 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 515 | 0 | |
Toggle Switch | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 1 | 1 | 161 |
Confusion matrix for gesture recognition on the OPPORTUNITY dataset, excluding the Null class (rows: actual gesture; columns: predicted gesture).

Actual Gesture \ Predicted Gesture | Open Door 1 | Open Door 2 | Close Door 1 | Close Door 2 | Open Fridge | Close Fridge | Open Dishwasher | Close Dishwasher | Open Drawer 1 | Close Drawer 1 | Open Drawer 2 | Close Drawer 2 | Open Drawer 3 | Close Drawer 3 | Clean Table | Drink from Cup | Toggle Switch
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Open Door 1 | 73 | 0 | 23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
Open Door 2 | 0 | 111 | 0 | 43 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 4 | |
Close Door 1 | 22 | 0 | 63 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |
Close Door 2 | 2 | 4 | 1 | 118 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | |
Open Fridge | 1 | 1 | 0 | 0 | 304 | 59 | 17 | 1 | 4 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 2 | |
Close Fridge | 0 | 0 | 0 | 0 | 20 | 243 | 5 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | |
Open Dishwasher | 0 | 0 | 0 | 0 | 15 | 1 | 121 | 11 | 5 | 0 | 0 | 0 | 6 | 4 | 0 | 1 | 3 | |
Close Dishwasher | 0 | 0 | 0 | 0 | 7 | 11 | 19 | 90 | 1 | 0 | 3 | 1 | 0 | 4 | 1 | 0 | 0 | |
Open Drawer 1 | 0 | 0 | 0 | 0 | 3 | 0 | 2 | 3 | 35 | 12 | 6 | 0 | 1 | 1 | 0 | 0 | 4 | |
Close Drawer 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 16 | 51 | 3 | 0 | 0 | 0 | 0 | 2 | 0 | |
Open Drawer 2 | 0 | 0 | 0 | 0 | 4 | 0 | 2 | 0 | 19 | 3 | 31 | 5 | 2 | 0 | 0 | 0 | 1 | |
Close Drawer 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 4 | 1 | 15 | 18 | 1 | 6 | 0 | 0 | 0 | |
Open Drawer 3 | 0 | 0 | 0 | 0 | 1 | 0 | 6 | 1 | 3 | 0 | 9 | 0 | 62 | 29 | 1 | 0 | 1 | |
Close Drawer 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 14 | 84 | 0 | 0 | 0 | |
Clean Table | 1 | 0 | 2 | 0 | 9 | 11 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 134 | 12 | 0 | |
Drink from Cup | 3 | 1 | 4 | 1 | 4 | 6 | 9 | 14 | 0 | 0 | 3 | 0 | 0 | 0 | 2 | 499 | 0 | |
Toggle Switch | 0 | 1 | 1 | 0 | 0 | 4 | 0 | 0 | 15 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 149 |
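Per-class precision, recall and F1 can be read off confusion matrices such as those above, where rows correspond to the actual gesture and columns to the predicted gesture. A small NumPy sketch (the matrix passed in is assumed to follow exactly that row/column convention):

```python
import numpy as np

def per_class_scores(cm):
    """Precision, recall and F1 per class from a confusion matrix
    whose rows are actual classes and columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurred
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1
```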
5.2. Multimodal Fusion Analysis
 | Accelerometers | Gyroscopes | Accelerometers + Gyroscopes | Accelerometers + Gyroscopes + Magnetic | OPPORTUNITY Sensors Set
---|---|---|---|---|---
# of sensor channels | 15 | 15 | 30 | 45 | 113
F1 score | 0.689 | 0.611 | 0.745 | 0.839 | 0.864
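In these experiments, fusing modalities amounts to presenting more sensor channels to the same network input, which is why the channel count grows from 15 (a single modality) to 113 (the full OPPORTUNITY sensor set) across the columns. A minimal sketch of assembling such a combined input; the array names and shapes are illustrative only.

```python
import numpy as np

# Hypothetical single-modality streams of shape (time, channels).
acc = np.random.randn(10_000, 15)    # accelerometer channels
gyro = np.random.randn(10_000, 15)   # gyroscope channels

# Fusion at the input: concatenate the channels of both modalities, giving the
# 30-channel "Accelerometers + Gyroscopes" configuration from the table above.
fused = np.concatenate([acc, gyro], axis=1)
print(fused.shape)  # (10000, 30)
```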
5.3. Hyperparameters Evaluation
5.4. Discussion
6. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Rashidi, P.; Cook, D.J. The resident in the loop: Adapting the smart home to the user. IEEE Trans. Syst. Man Cybern. Part A 2009, 39, 949–959. [Google Scholar] [CrossRef]
- Patel, S.; Park, H.; Bonato, P.; Chan, L.; Rodgers, M. A review of wearable sensors and systems with application in rehabilitation. J. NeuroEng. Rehabil. 2012, 9. [Google Scholar] [CrossRef] [PubMed]
- Avci, A.; Bosch, S.; Marin-Perianu, M.; Marin-Perianu, R.; Havinga, P. Activity Recognition Using Inertial Sensing for Healthcare, Wellbeing and Sports Applications: A Survey. In Proceedings of the 23rd International Conference on Architecture of Computing Systems (ARCS), Hannover, Germany, 22–23 February 2010; pp. 1–10.
- Mazilu, S.; Blanke, U.; Hardegger, M.; Tröster, G.; Gazit, E.; Hausdorff, J.M. GaitAssist: A Daily-Life Support and Training System for Parkinson’s Disease Patients with Freezing of Gait. In Proceedings of the ACM Conference on Human Factors in Computing Systems (SIGCHI), Toronto, ON, Canada, 26 April–1 May 2014.
- Kranz, M.; Möller, A.; Hammerla, N.; Diewald, S.; Plötz, T.; Olivier, P.; Roalter, L. The mobile fitness coach: Towards individualized skill assessment using personalized mobile devices. Perv. Mob. Comput. 2013, 9, 203–215. [Google Scholar] [CrossRef]
- Stiefmeier, T.; Roggen, D.; Ogris, G.; Lukowicz, P.; Tröster, G. Wearable Activity Tracking in Car Manufacturing. IEEE Perv. Comput. Mag. 2008, 7, 42–50. [Google Scholar] [CrossRef]
- Chavarriaga, R.; Sagha, H.; Calatroni, A.; Digumarti, S.; Millán, J.; Roggen, D.; Tröster, G. The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition. Pattern Recognit. Lett. 2013, 34, 2033–2042. [Google Scholar] [CrossRef]
- Bulling, A.; Blanke, U.; Schiele, B. A Tutorial on Human Activity Recognition Using Body-worn Inertial Sensors. ACM Comput. Surv. 2014, 46, 1–33. [Google Scholar] [CrossRef]
- Roggen, D.; Cuspinera, L.P.; Pombo, G.; Ali, F.; Nguyen-Dinh, L. Limited-Memory Warping LCSS for Real-Time Low-Power Pattern Recognition in Wireless Nodes. In Proceedings of the 12th European Conference Wireless Sensor Networks (EWSN), Porto, Portugal, 9–11 February 2015; pp. 151–167.
- Ordonez, F.J.; Englebienne, G.; de Toledo, P.; van Kasteren, T.; Sanchis, A.; Krose, B. In-Home Activity Recognition: Bayesian Inference for Hidden Markov Models. Perv. Comput. IEEE 2014, 13, 67–75. [Google Scholar] [CrossRef]
- Preece, S.J.; Goulermas, J.Y.; Kenney, L.P.J.; Howard, D.; Meijer, K.; Crompton, R. Activity identification using body-mounted sensors: A review of classification techniques. Physiol. Meas. 2009, 30, 21–27. [Google Scholar] [CrossRef] [PubMed]
- Figo, D.; Diniz, P.C.; Ferreira, D.R.; Cardoso, J.M.P. Preprocessing techniques for context recognition from accelerometer data. Perv. Mob. Comput. 2010, 14, 645–662. [Google Scholar] [CrossRef]
- Lee, H.; Grosse, R.; Ranganath, R.; Ng, A.Y. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), Montreal, QC, Canada, 14–18 June 2009; pp. 609–616.
- Lee, H.; Pham, P.; Largman, Y.; Ng, A. Unsupervised feature learning for audio classification using convolutional deep belief networks. In Proceedings of the 22nd Annual Conference on Advances in Neural Information Processing Systems (NIPS), Vancouver, BC, Canada, 8–10 December 2008; pp. 1096–1104.
- LeCun, Y.; Bengio, Y. Chapter Convolutional Networks for Images, Speech, and Time Series. In The Handbook of Brain Theory and Neural Networks; MIT Press: Cambridge, MA, USA, 1998; pp. 255–258. [Google Scholar]
- Sainath, T.; Vinyals, O.; Senior, A.; Sak, H. Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks. In Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 19–24 April 2015; pp. 4580–4584.
- Yang, J.B.; Nguyen, M.N.; San, P.P.; Li, X.L.; Krishnaswamy, S. Deep Convolutional Neural Networks On Multichannel Time Series For Human Activity Recognition. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina, 25–31 July 2015; pp. 3995–4001.
- Siegelmann, H.T.; Sontag, E.D. Turing computability with neural nets. Appl. Math. Lett. 1991, 4, 77–80. [Google Scholar] [CrossRef]
- Gers, F.A.; Schraudolph, N.N.; Schmidhuber, J. Learning precise timing with LSTM recurrent networks. J. Mach. Learn. Res. 2003, 3, 115–143. [Google Scholar]
- Graves, A.; Mohamed, A.R.; Hinton, G. Speech recognition with deep recurrent neural networks. In Proceedings of the 38th International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 6645–6649.
- Palaz, D.; Magimai.-Doss, M.; Collobert, R. Analysis of CNN-based Speech Recognition System using Raw Speech as Input. In Proceedings of the 16th Annual Conference of International Speech Communication Association (Interspeech), Dresden, Germany, 6–10 September 2015; pp. 11–15.
- Pigou, L.; Oord, A.V.D.; Dieleman, S.; van Herreweghe, M.; Dambre, J. Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video. arXiv Preprint 2015. arXiv:1506.01911. [Google Scholar]
- Zeng, M.; Nguyen, L.T.; Yu, B.; Mengshoel, O.J.; Zhu, J.; Wu, P.; Zhang, J. Convolutional Neural Networks for human activity recognition using mobile sensors. In Proceedings of the 6th IEEE International Conference on Mobile Computing, Applications and Services (MobiCASE), Austin, TX, USA, 6–7 November 2014; pp. 197–205.
- Oord, A.V.D.; Dieleman, S.; Schrauwen, B. Deep content-based music recommendation. In Proceedings of the Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 2643–2651.
- Sainath, T.N.; Kingsbury, B.; Saon, G.; Soltau, H.; Mohamed, A.R.; Dahl, G.; Ramabhadran, B. Deep convolutional neural networks for large-scale speech tasks. Neural Netw. 2015, 64, 39–48. [Google Scholar] [CrossRef] [PubMed]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th Conference on Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint 2013. arXiv:1312.6229. [Google Scholar]
- Toshev, A.; Szegedy, C. Deeppose: Human pose estimation via deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Zurich, Switzerland, 6–12 September 2014; pp. 1653–1660.
- Deng, L.; Platt, J.C. Ensemble deep learning for speech recognition. In Proceedings of the 15th Annual Conference of International Speech Communication Association (Interspeech), Singapore, 14–18 September 2014; pp. 1915–1919.
- Ng, J.Y.H.; Hausknecht, M.; Vijayanarasimhan, S.; Vinyals, O.; Monga, R.; Toderici, G. Beyond short snippets: Deep networks for video classification. arXiv preprint 2015. arXiv:1503.08909. [Google Scholar]
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed]
- Plötz, T.; Hammerla, N.Y.; Olivier, P. Feature Learning for Activity Recognition in Ubiquitous Computing. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 16–22 July 2011; pp. 1729–1734.
- Karpathy, A.; Johnson, J.; Li, F.F. Visualizing and understanding recurrent networks. arXiv preprint 2015. arXiv:1506.02078. [Google Scholar]
- Dieleman, S.; Schlüter, J.; Raffel, C.; Olson, E.; Sønderby, S.K.; Nouri, D.; Maturana, D.; Thoma, M.; Battenberg, E.; Kelly, J.; et al. Lasagne: First Release; Zenodo: Geneva, Switzerland, 2015. [Google Scholar]
- Dauphin, Y.N.; de Vries, H.; Chung, J.; Bengio, Y. RMSProp and equilibrated adaptive learning rates for non-convex optimization. arXiv 2015. arXiv:1502.04390. [Google Scholar]
- Roggen, D.; Calatroni, A.; Rossi, M.; Holleczek, T.; Förster, K.; Tröster, G.; Lukowicz, P.; Bannach, D.; Pirkl, G.; Ferscha, A.; et al. Collecting complex activity data sets in highly rich networked sensor environments. In Proceedings of the 7th IEEE International Conference on Networked Sensing Systems (INSS), Kassel, Germany, 15–18 June 2010; pp. 233–240.
- Reiss, A.; Stricker, D. Introducing a New Benchmarked Dataset for Activity Monitoring. In Proceedings of the 16th International Symposium on Wearable Computers (ISWC), Newcastle, UK, 18–22 June 2012; pp. 108–109.
- Zappi, P.; Lombriser, C.; Farella, E.; Roggen, D.; Benini, L.; Tröster, G. Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection. In Proceedings of the 5th European Conference on Wireless Sensor Networks (EWSN), Bologna, Italy, 30 January–1 February 2008; pp. 17–33.
- Banos, O.; Garcia, R.; Holgado, J.A.; Damas, M.; Pomares, H.; Rojas, I.; Saez, A.; Villalonga, C. mHealthDroid: a novel framework for agile development of mobile health applications. In Proceedings of the 6th International Work-conference on Ambient Assisted Living an Active Ageing, Belfast, UK, 2–5 December 2014; pp. 91–98.
- Gordon, D.; Czerny, J.; Beigl, M. Activity recognition for creatures of habit. Pers. Ubiquitous Comput. 2014, 18, 205–221. [Google Scholar] [CrossRef]
- Opportunity Dataset. 2012. Available online: https://archive.ics.uci.edu/ml/datasets/OPPORTUNITY+Activity+Recognition (accessed on 19 November 2015).
- Skoda Dataset. 2008. Available online: http://www.ife.ee.ethz.ch/research/groups/Dataset (accessed on 19 November 2015).
- Alsheikh, M.A.; Selim, A.; Niyato, D.; Doyle, L.; Lin, S.; Tan, H.P. Deep Activity Recognition Models with Triaxial Accelerometers. arXiv preprint 2015. arXiv:1511.04664. [Google Scholar]
- Japkowicz, N.; Stephen, S. The class imbalance problem: A systematic study. Intell. Data Anal. 2002, 6, 429–449. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper With Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Berchtold, M.; Budde, M.; Gordon, D.; Schmidtke, H.R.; Beigl, M. Actiserv: Activity recognition service for mobile phones. In Proceedings of the International Symposium on Wearable Computers (ISWC), Seoul, Korea, 10–13 October 2010; pp. 1–8.
- Cheng, K.T.; Wang, Y.C. Using mobile GPU for general-purpose computing: A case study of face recognition on smartphones. In Proceedings of the International Symposium on VLSI Design, Automation and Test (VLSI-DAT), Hsinchu, Taiwan, 25–28 April 2011; pp. 1–4.
- Welbourne, E.; Tapia, E.M. CrowdSignals: A call to crowdfund the community’s largest mobile dataset. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, ACM, Seattle, WA, USA, 13–17 September 2014; pp. 873–877.
- Ordonez, F.J.; Roggen, D. DeepConvLSTM. Available online: https://github.com/sussexwearlab/DeepConvLSTM (accessed on 23 December 2015).
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).