Online Program

All Times EDT

Friday, June 5
Practice and Applications
Practice and Applications Posters, Part 1
Fri, Jun 5, 10:00 AM - 1:00 PM
TBD

A Novel Application of Time Series Classification Using the Continuous Wavelet Transform and Convolutional Neural Networks on Smartphone Sensor Data (308366)

*William Robert Nadolski, SAS Institute 
Ahmet Taspinar, ING Group 

Keywords: Time Series Classification, Deep Learning, Continuous Wavelet Transformation, Image Scalograms, Human Activity Recognition, Smartphone Sensor Data, Convolutional Neural Network, Frequency Analysis

ABSTRACT:

An end-to-end deep learning approach for accurate classification of multivariate time series is provided, along with an example of its application to the Human Activity Recognition (HAR) dataset. Specifically, the approach utilizes the Continuous Wavelet Transform (CWT) to convert each sensor recording from the time domain into a corresponding time-scale (frequency) image. These images are then passed to a Convolutional Neural Network (CNN) and used to predict which of six different human activities is being performed by the participant. A maximum validation accuracy of 94.29% is achieved using a 70/30 training/validation split. The benefits of this approach include not only higher classification accuracy compared to traditional machine learning approaches based upon summary statistics, but also the elimination of handcrafted feature engineering based upon domain expertise.

INTRODUCTION:

Classification of multivariate time series data has historically posed a significant challenge to the data science and machine learning community. This is primarily due to the extremely high dimensionality of such data and the frequent need to intelligently reduce it to a more manageable representation, one that allows accurate classification while maintaining reasonable computational complexity. The problem is further compounded by the increasing sampling rates of machine sensors, and by the growing number of available sensors, as the Industrial Internet of Things (IIoT) spreads to more areas of personal and commercial life.

There are commonly two approaches used to handle such challenges. The first relies upon extracting basic statistical aggregates (e.g., mean, median, min, max, standard deviation, variance, skewness, kurtosis, autocorrelation) from each series for use as input features to machine learning models. While computationally inexpensive, these features tend to oversimplify the dynamics of complex waveform behavior and result in suboptimal classification accuracy. The alternative approach has typically been to rely on domain experts to define handcrafted features of interest, which are then extracted from the relevant time series and used as modelling inputs. When intelligently designed, these handcrafted features tend to produce better classification outcomes, but at the expense of significant domain-specific knowledge and time spent manually developing the required logic.
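
To make the first approach concrete, the following minimal sketch computes such aggregate features for a single signal using NumPy and SciPy; the exact feature list is illustrative rather than prescribed by this paper.

import numpy as np
from scipy import stats

def aggregate_features(signal):
    # Collapse one 1-D signal into basic summary statistics, the
    # "statistical aggregate" baseline described above.
    return np.array([
        signal.mean(), np.median(signal), signal.min(), signal.max(),
        signal.std(), signal.var(),
        stats.skew(signal), stats.kurtosis(signal),
    ])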

Deep learning approaches promise to alleviate this trade-off and allow for automatic extraction of informative and highly discriminative features without the need for domain expertise or handcrafted feature engineering. Despite the success of deep learning approaches in a wide variety of settings, researchers (Fawaz et al., 2018) have noted that there have been relatively few applications of these approaches to the task of multivariate time series classification.

The purpose of this paper is to lay out an approach for implementing a deep learning-based method of multivariate time series classification and to review its performance as applied to the Human Activity Recognition (HAR) dataset. Specifically, the Continuous Wavelet Transform (CWT) is first applied to the time series sensor data in order to convert it from the time domain to the time-scale (frequency) domain. The resulting set of scalogram images for each instance of each activity is then passed to a Convolutional Neural Network (CNN), which is trained to classify the activity being performed by each participant.

DATA:

The data used is the Human Activity Recognition dataset originally prepared by Davide Anguita et al. (2013) and hosted by the UC Irvine Machine Learning Repository (2012). It consists of triaxial accelerometer, gyroscope, and total acceleration sensor recordings captured via a waist-worn smartphone (Samsung Galaxy) while the participant was performing one of six different daily activities: walking, standing, sitting, laying, walking upstairs, and walking downstairs.

The data is recorded at a frequency of 50 Hz and segmented into 10,299 signals, where each signal consists of 128 samples corresponding to a duration of 2.56 seconds. The data has been split into training and validation sets with a ratio of 70/30. The signals have also been standardized and pre-processed with noise filters, and the acceleration signal has been separated into body and gravity components using a Butterworth low-pass filter with a 0.3 Hz cutoff. The resulting dataset contains 10,299 labelled time series signals, each 128 samples long and with 9 components: body_acc_x, body_acc_y, body_acc_z, body_gyro_x, body_gyro_y, body_gyro_z, tot_acc_x, tot_acc_y, tot_acc_z.
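
For readers who wish to work with the raw signals directly, a minimal loading sketch is given below. It assumes the file layout of the UCI repository download (a per-split "Inertial Signals" folder with one text file per component, where the files are named total_acc_* rather than tot_acc_*); nothing here is specific to this paper's method.

import numpy as np

COMPONENTS = ["body_acc_x", "body_acc_y", "body_acc_z",
              "body_gyro_x", "body_gyro_y", "body_gyro_z",
              "total_acc_x", "total_acc_y", "total_acc_z"]

def load_signals(split="train", root="UCI HAR Dataset"):
    # Returns X of shape (n_signals, 128, 9) and labels y of shape (n_signals,).
    arrays = [np.loadtxt(f"{root}/{split}/Inertial Signals/{c}_{split}.txt")
              for c in COMPONENTS]
    X = np.stack(arrays, axis=-1)
    y = np.loadtxt(f"{root}/{split}/y_{split}.txt").astype(int)
    return X, y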

METHODS:

Continuous Wavelet Transformation Details:

It is reasonable to expect that the dynamic waveform behavior associated with human acceleration and gyroscopic motion can be meaningfully converted from the time domain into the frequency domain. Numerous techniques exist for performing such a transformation including, but not limited to, the Discrete Fourier Transform (DFT), Short-Time Fourier Transform (STFT), and Continuous Wavelet Transform (CWT). The DFT provides detailed information about the frequency domain but in the process loses all information about the time domain. The STFT provides information about both the time and frequency domains but is constrained by the uncertainty principle to a single fixed trade-off in resolution between the two.

We believe the Continuous Wavelet Transform (CWT) to be a superior approach since it provides detailed information about both the frequency domain and the time domain by evaluating the signal at multiple scales. A detailed explanation of the CWT is outside the scope of this paper, but we refer the reader to the literature for more information (Blatter, 2002).

The method proposed herein utilizes the Haar-based CWT to convert each time series signal component into a two-dimensional representation of the signal's power across both time and frequency. In this way, each time series signal with n samples results in an (n x m) matrix which can be visualized as a scalogram, where m is the number of scales evaluated; in our case m = 128, resulting in a square matrix.
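
As a concrete illustration, the sketch below converts one signal into a scalogram with the PyWavelets library. One hedge: the paper describes a Haar-based CWT, but pywt.cwt accepts only continuous wavelets, so the Morlet wavelet ("morl") stands in here; the scale count of 128 matches the text.

import numpy as np
import pywt

def signal_to_scalogram(signal, n_scales=128, wavelet="morl"):
    # CWT of a 1-D signal across n_scales scales; pywt.cwt returns a
    # (n_scales x len(signal)) coefficient matrix, whose magnitude is
    # the power surface visualized as a scalogram.
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(signal, scales, wavelet)
    return np.abs(coeffs)

# Example: a 128-sample signal yields a square 128 x 128 scalogram.
sig = np.random.randn(128)
print(signal_to_scalogram(sig).shape)  # (128, 128)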

To support processing within the SAS DLPy environment, the nine CWT scalograms for each activity instance must be stitched together to produce a single 3n x 3n image, allowing one image to represent the time-frequency behavior of all signal components. Of utmost importance is ensuring consistency in the relative position of each signal component within this stitched image. Alternatively, the nine scalogram matrices can instead be stacked into a nine-channel tensor if using Python for execution.
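
A minimal sketch of both layouts follows. The 3x3 component ordering is illustrative; the essential point, per the paragraph above, is only that whatever ordering is chosen must be held fixed across all instances.

import numpy as np

def stitch_scalograms(scalograms):
    # Arrange nine (n x n) scalograms into one (3n x 3n) image, row by
    # row, preserving a fixed component order across instances.
    assert len(scalograms) == 9
    rows = [np.hstack(scalograms[i:i + 3]) for i in range(0, 9, 3)]
    return np.vstack(rows)

def stack_scalograms(scalograms):
    # Python-only alternative: an (n x n x 9) tensor, one channel per
    # signal component.
    return np.stack(scalograms, axis=-1)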

Convolutional Neural Network Details:

By converting the time series signals into stitched image scalograms, we have essentially recast the task as an image classification problem, which can be adequately handled by a convolutional neural network (CNN). To reduce computational processing time and the size of the CNN weights, each 3n x 3n stitched image was first downsampled and resized to a final 127x127 pixel input image. These final images were then passed to a modified LeNet-5 CNN featuring successive convolution and max pooling layers followed by a fully connected layer with a softmax activation function.
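
The sketch below shows one plausible LeNet-5-style architecture in Keras. Only the 127x127 input size, the convolution/max-pooling pattern, and the six-class softmax output are taken from the text; the filter counts, dense-layer width, and single-channel input are illustrative assumptions rather than the authors' exact configuration.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(127, 127, 1)),          # stitched scalogram image
    layers.Conv2D(6, kernel_size=5, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(16, kernel_size=5, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(6, activation="softmax"),      # six activity classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])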

RESULTS:

A CNN model was trained for 100 epochs using the SAS DLPy coding environment and achieved a maximum validation accuracy of 94.29%. For comparison, a naïve benchmark utilizing gradient boosting in combination with traditional aggregated summary statistics (min, max, mean, standard deviation, skewness, and kurtosis) of each signal component achieved a validation accuracy of only 85.95%. The CWT & CNN approach evaluated in this paper thus provided an improvement of 8.34 percentage points without requiring the extraction of any domain-specific handcrafted signal features.
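
For orientation, a sketch of such a summary-statistics benchmark is shown below. It reuses load_signals() and aggregate_features() from the earlier sketches, and scikit-learn's gradient boosting stands in for whichever implementation the authors used; the feature set and hyperparameters here are assumptions.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def featurize(X):
    # Collapse each (128, 9) signal into per-component summary statistics.
    return np.array([np.concatenate([aggregate_features(sig[:, c])
                                     for c in range(sig.shape[1])])
                     for sig in X])

X_train, y_train = load_signals("train")
X_valid, y_valid = load_signals("test")

clf = GradientBoostingClassifier()
clf.fit(featurize(X_train), y_train)
print("validation accuracy:", clf.score(featurize(X_valid), y_valid))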

A normalized confusion matrix was also produced, indicating that the model struggled most to differentiate between the sitting, standing, and laying activities while having very little trouble discriminating among the other three (walking, walking upstairs, walking downstairs).

CONCLUSION:

This paper has demonstrated a novel method of performing multivariate time series classification. The proposed approach is able to obtain better validation accuracy as compared to traditional machine learning approaches without requiring the creation of handcrafted domain-specific features. It is our hope that the method outlined in this paper can provide statisticians, data scientists, and machine learning practitioners with another valuable approach for tackling multivariate time series classification tasks using deep learning.

Full code for replicating these results within the SAS DLPy coding environment can be found on William’s GitHub repository (2019). Code for implementing a similar analysis exclusively within the Python environment can be found on Ahmet’s GitHub repository, along with an informative web post providing thorough details on all aspects of the proposed methodology (2018).