1. Introduction
According to the World Health Organization (WHO) [1], about of adults over 65 experience a fall each year, and over 300,000 individuals may die annually from these types of incidents. Based on the assessments of fall victims [2], falls were identified as one of the primary causes of injury in this age group, with at least one further fall occurring within the next six months for of the elderly who had already fallen. The injuries caused by this sort of incident can lead to major trauma and fractures, among other consequences, in addition to psychological issues and potential future psychological stress [3]. Accidents following a fall are often fatal if the person is not found and rescued in time [4]. From the discussion above, it is clear that research on fall-detection methods is crucial to lowering the number of accidents and their health consequences among the elderly. The present work adopts a technological perspective, attempting to detect falls using wearable equipment without addressing medical, geriatric, or sociological issues. The design of an information and communication technology model would complete a fall-detection system, complementing the current study [5].
Advances in sensor and signal processing technologies have paved the way for developing autonomous, body-based systems for monitoring activities and detecting falls. Sensor data are typically acquired and processed in a central unit to infer information about the individual’s posture. These platforms consist of different types of sensors, either at the body level (accelerometers, smartphones, or smartwatches) or in the environment (Passive Infrared Sensors (PIRs), microphones, or digital cameras) [6,7,8]. While the former are associated with monitoring the physical body, the latter cannot accompany an individual on trips outside their usual residential environment and can potentially raise privacy concerns; thus, they are not considered in this work. The first group is a field where artificial intelligence (AI) has been applied with some success [9]. In the literature, we find deep convolutional network structures [10], recurrent neural networks [11], and support vector machines [12] applied to the fall-detection problem. On the one hand, the reliability in detecting falls reaches significant values with the aid of AI methods, that is, high effective detection percentages in several works published in the literature. On the other hand, these results are obtained with simulation data reflecting the profiles of specific falls and can present high false-positive rates (alarm triggering without an actual fall).
The current work is thus framed within the study and development of algorithms for detecting falls in elderly populations based on AI and on-body measurements. Technically, it is based on readings of physical quantities describing the individual’s state, and it stands out for its innovation regarding where the data are intended to be processed. In addition to acquiring the usual on-body acceleration magnitudes, the computational platform, which is just a wearable device, also takes care of the inference of a pre-trained neural network, as opposed to central or edge computation [13]. As such, the system will only transmit in the case of a predicted fall. The main challenge is energy management, since the computational platform of a wearable is typically of low processing power and limited memory and should work with low duty cycles. In contrast, classical neural network inference usually relies on edge or cloud computing, with the consequent rise of privacy issues and a large amount of data to be transferred through communication channels [14].
The overall methodology comprises the following: (i) the proposal of a memory occupancy model for evaluating the memory required to implement the on-device inference of a Long Short-Term Memory (LSTM)-type neural network; (ii) a sensitivity analysis of the network hyper-parameters through a grid search procedure to refine the network topology; (iii) the proposal of a new methodology to quantize the network by reducing the memory occupancy of the hyper-parameter values. The concept of symmetry is also used to select the network structure to deploy on the embedded processor; that is, there is a symmetric relation between the model’s structure and the memory footprint on the embedded processor, as the number of parameters and the memory size mirror each other. Beyond the novelty of this general approach, it is shown that the proposed quantization method improves on the current state of the art.
While the Gate Disclosure technique was initially proposed in [15], the current manuscript greatly extends the methodology, considering a complete workflow that takes into account the embedded platform where the technique is to be implemented and its resource limitations. In addition to the overall methodology, which is original, two major novelties can be stated:
The memory modeling of the LSTM network, cross-validated with the network accuracy, as a tool to tune the LSTM size;
Quantizing the network with almost no degradation in accuracy so that it can be stored and deployed in an embedded environment by a low-power, low-complexity microcontroller.
The remainder of the work is organized as follows. Section 2 reviews the current state of the art with regard to applying AI to the fall-detection problem and to quantizing neural networks. Section 3 presents and analyzes a memory occupancy model for assessing the network memory footprint. Section 4 provides a sensitivity analysis of the network hyper-parameters through a grid search procedure, and Section 5 describes the quantization methodology in detail. Section 6 provides a description and a performance analysis of the results. Finally, Section 7 summarizes the main findings and addresses future work directions.
2. Related Work
In the last several years, deep learning methods have replaced threshold- or state-based algorithms for fall detection among the senior population [16,17]. Since the dynamics of the problem are unique to each individual, gathering data from the environment is not the best option and might limit freedom and portability, as stated in Section 1. Nevertheless, approaches with excellent accuracy are available in the computer vision literature [18]. In the context of hardware for acquiring angular velocity and/or acceleration, a wristwatch or smartphone is typically the first device that comes to mind. Nevertheless, when it comes to elderly individuals, these devices do not have the same widespread adoption and penetration potential as they have among younger users [19,20]. Despite this clear drawback, the study of environment-based settings is still quite active [21]. While there are many techniques for learning-based fall detection in the literature, recurrent neural network methods typically have the advantage of being more flexible in learning a large number of different features [22,23]. Other techniques include the hidden Markov model [24], decision trees [25], K-Nearest Neighbors [26], and the Support Vector Machine [27]. From a different perspective, one method that achieves the highest accuracy when applied to the fall-detection problem is the use of Attention-based Neural Networks. Although this approach reports accuracy as high as , the intricacy of the network makes its inference impractical on a low-power embedded device [28].
Obtaining a simplification that allows the models to be inferred more quickly is essential, both for the reasons mentioned above and because of the additional difficulties brought on by the growing complexity of machine learning algorithms. Quantization, the process of converting continuous real values into a smaller collection of discrete and finite data points, is the most prominent technique for reducing a model’s complexity. Since the bit width of the information is closely correlated with the memory occupancy, fewer computing operations must be performed, resulting in a reduction in the Central Processing Unit (CPU) power requirements, starting from the 32-bit floating-point representation obtained after the training stage. The Binary Neural Network, which operates with only 1-bit weights and activations [29], is the ultimate example of quantization. The quantization process can be applied either during or after training; the present work concentrates on quantization after training, known as Post-Training Quantization (PTQ). Using quantized hyper-parameters during training would entail Quantization-Aware Training (QAT), often utilizing fixed-point notation or half- or quarter-precision [30]. Quantization presents a hurdle because discretization by rounding is not continuous and, hence, has a mostly zero derivative; consequently, the standard gradient-based optimization techniques cannot be applied directly. Uniform quantization, in which a quantization function simply maps the input values onto equally spaced levels, is the most widely used and simplest method. Non-uniform quantization, on the other hand, maps the quantization steps (thresholds) as intervals with widths and bins determined by a mapping function. By allocating bits and discretizing the range of parameters in an irregular manner, non-uniform quantization is often more effective in capturing the information contained in the input values. Nonetheless, implementing non-uniform quantization techniques efficiently on common computing hardware, such as embedded microcontrollers, is usually challenging. Because of its ease of use and efficient translation to low-level code, uniform quantization is generally preferred over its non-uniform counterpart [31].
The absence of gradients has been addressed in the scientific literature in a number of ways [32]. In order to improve the quantizer with respect to uniform schemes, more recent work formulates non-uniform quantization as an optimization problem [33,34,35]. Also related, Stochastic Quantization (SQ) approaches exploit the stochastic characteristics of the data during quantization [36]. While deterministic methods cannot describe the inputs as accurately as optimization-based and SQ methods, the latter depend heavily on the stochastic features of the data, and their computational complexity rises, increasing the number of operations that an embedded microcontroller has to perform [22].
Keeping the previous discussion in mind, the following study applies a uniform quantization step to LSTM network architectures. The metrics employed and provided to the LSTM comprise 3-axis accelerations obtained using an Inertial Measurement Unit (IMU), which is assumed to be worn on the body as a bracelet [37]. To the authors’ knowledge, although the need for wearables in these types of applications is well known, there is no work proposing techniques that match AI (namely, LSTM) networks with low-power embedded processors. The methodology described and analyzed here shows a new way of deploying LSTM networks to embedded devices.
3. Memory Occupancy Model
Managing the data storage layout becomes critical when considering the embedded implementation of an LSTM network on a microcontroller. Firstly, the storage space is limited: whether the network structure is saved in non-volatile memory (flash, EEPROM, or other) or in volatile memory such as RAM, typical embedded microcontrollers only feature a few kilobytes or megabytes [38]. Secondly, the increasing size of the networks used for gathering more knowledge, together with the sparsity caused by optimization tools, implies higher latency and inference time [39,40]. It is thus highly pertinent to develop strategies for reducing the complexity of neural networks. Pruning, the process of deleting parameters from an existing neural network at a post-training stage, is one technique for network compression and could be a solution to minimize memory occupancy. Nonetheless, the sparsity of the resulting matrix operations increases latency through memory jumps and increases the floating-point operations (FLOPs), a metric that is often mistakenly neglected. Also, high pruning rates will usually strongly affect the network accuracy [41].
Regardless of the problem they address, neural networks are becoming increasingly complex, with progressively deeper architectures implying a high number of parameters (and a consequently large increase in memory) and growing latency. For example, an ESP32 Tensilica Xtensa LX7 dual-core 32-bit microprocessor from Espressif Systems features 512 kB of RAM and 384 kB of ROM. If it were applied to a computer vision problem with AlexNet (winner of the ImageNet Large-Scale Visual Recognition Challenge in 2012) [42], it would only support of the network’s 62,378,344 parameters with a 32-bit floating-point representation [14,43].
Let us first consider a standard recurrent neural network for time series classification. An LSTM unit comprises a forget gate, an input gate, an output gate, a candidate for the update stage, and the update stage itself (Figure 1). The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell through a cell state and a hidden state. Each of these gates is modeled as a non-linear function of the input signals, the previous hidden state, the cell state, and a constant value known as the bias. The following equations model the mathematical operations:

$$
\begin{aligned}
f_t &= \sigma\left(U_f x_t + W_f h_{t-1} + b_f\right)\\
i_t &= \sigma\left(U_i x_t + W_i h_{t-1} + b_i\right)\\
o_t &= \sigma\left(U_o x_t + W_o h_{t-1} + b_o\right)\\
\tilde{c}_t &= \tanh\left(U_c x_t + W_c h_{t-1} + b_c\right)\\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t\\
h_t &= o_t \circ \tanh\left(c_t\right)
\end{aligned}
\qquad (1)
$$

where the matrices $U_{\{f,i,o,c\}}$ are known as the Input Weights, $W_{\{f,i,o,c\}}$ as the Recurrent Weights, and $b_{\{f,i,o,c\}}$ as the Bias. The non-linearities are due to the sigmoid function $\sigma(\cdot)$ and the hyperbolic tangent $\tanh(\cdot)$. The operator ∘ denotes the element-wise product. The cell output corresponds to the last hidden state, $h_t$, when $t$ reaches the end of the input sequence. The inference process of the LSTM network consists of reading a set of accelerations ($a_x$, $a_y$, $a_z$) from an IMU and supplying it as a 3 × 1 input vector, identified as $x_t$ in (1).
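To make the inference step in (1) concrete, the following sketch (a minimal, framework-free Python implementation, assuming the trained matrices U, W, and b have already been exported and stacked gate by gate; the gate ordering and variable names are illustrative) runs one forward pass over a sequence of 3-axis acceleration samples and returns the final hidden state used for classification.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x_seq, U, W, b, h_size):
    """Apply Expression (1) to a (T, 3) sequence of acceleration samples.

    U: (4*h, 3) input weights, W: (4*h, h) recurrent weights, b: (4*h,) bias,
    assumed stacked in the gate order [input, forget, candidate, output].
    Returns the last hidden state h_T, later fed to the FC classification layer."""
    h = np.zeros(h_size)                # hidden state h_{t-1}
    c = np.zeros(h_size)                # cell state c_{t-1}
    for x_t in x_seq:
        z = U @ x_t + W @ h + b         # all four gate pre-activations at once
        i_t = sigmoid(z[0 * h_size:1 * h_size])      # input gate
        f_t = sigmoid(z[1 * h_size:2 * h_size])      # forget gate
        c_tilde = np.tanh(z[2 * h_size:3 * h_size])  # candidate cell state
        o_t = sigmoid(z[3 * h_size:4 * h_size])      # output gate
        c = f_t * c + i_t * c_tilde     # cell state update
        h = o_t * np.tanh(c)            # hidden state update
    return h

# Example: 400 samples (2 s at 200 Hz) of 3-axis acceleration, 100 hidden units
rng = np.random.default_rng(0)
h_size = 100
U = 0.1 * rng.standard_normal((4 * h_size, 3))
W = 0.1 * rng.standard_normal((4 * h_size, h_size))
b = np.zeros(4 * h_size)
h_T = lstm_forward(rng.standard_normal((400, 3)), U, W, b, h_size)
print(h_T.shape)  # (100,)
```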
The memory footprint required by the LSTM cell corresponds to storing the matrices U, W, and b. While U depends on the input size, in this case the 3 dimensions of the acceleration, as well as on the cell size h, both W and b depend only on the cell size h. This parameter, also known as the Unit Size or the Cell Size, is the key feature of an LSTM and controls the amount and complexity of the information memorized by the LSTM cell [44]. The design stage of an LSTM network has to determine the number of units through trial and error, which is performed in this work by applying a grid search methodology.
Given the knowledge of the LSTM cell and its structure, a general case of the network’s global architecture to apply to the fall-detection problem is considered here, where the inputs correspond to the 3-axis accelerations (Figure 2).
The first step to consider is a normalization of the input values, which prevents vanishing gradient problems [45]. Secondly, one or multiple LSTM layers acting as a feature extraction stage are considered, where the number of layers and the number of units in each layer are parameters to be determined. The third block of the overall structure corresponds to one or multiple Fully-Connected (FC) layers that perform the classification, in this case, a “FALL” or an “ADL” (Activity of Daily Living). As such, the output of the FC layers will always consist of two cells, and their input will depend on the number of units in the last LSTM layer. Although, theoretically, only one FC layer would be needed, if the number of LSTM units is high, it may be advantageous for the training process to add more FC layers to make the transition smoother. Lastly, a SoftMax activation layer converts the vector of numbers from the FC output into a vector of probabilities.
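As an illustration of this pipeline, the short sketch below (hypothetical names; it reuses the lstm_forward function from the previous listing and random stand-ins for trained weights) chains input normalization, the LSTM feature extractor, a two-output FC layer, and a SoftMax, mirroring the block structure of Figure 2.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())             # numerically stable SoftMax
    return e / e.sum()

def classify(x_seq, U, W, b, h_size, fc_W, fc_b):
    """Normalization -> LSTM -> FC -> SoftMax, returning [P(FALL), P(ADL)]."""
    x = (x_seq - x_seq.mean(axis=0)) / x_seq.std(axis=0)  # input normalization
    h_T = lstm_forward(x, U, W, b, h_size)                # last hidden state
    logits = fc_W @ h_T + fc_b                            # FC: h_size -> 2 cells
    return softmax(logits)                                # vector of probabilities
```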
The memory occupancy required by the mentioned structure can be estimated using the following expression:

$$
\mathrm{Mem} = \sum_{i=1}^{L}\left(B_{b}^{i}\,n_{b}^{i} + B_{U}^{i}\,n_{U}^{i} + B_{W}^{i}\,n_{W}^{i}\right) + \sum_{j=1}^{F}\left(B_{Wfc}^{j}\,n_{Wfc}^{j} + B_{bfc}^{j}\,n_{bfc}^{j}\right)
\qquad (2)
$$

where $B_{b}^{i}$, $B_{U}^{i}$, $B_{W}^{i}$, $B_{Wfc}^{j}$, and $B_{bfc}^{j}$ correspond to the number of bytes used for the quantity representation in layer $i$ (or $j$) of the LSTM bias, input weights, and recurrent weights, and of the FC weights and bias, respectively, while $n$ denotes the corresponding number of parameters. The sum limits $L$ and $F$ are the number of LSTM and FC layers, respectively. Depending on the sensitivity analysis stage, which determines some parameters of the network, these representations might differ; this is one of the critical features of the proposed methodology.
The expression in (2) can be further refined by considering the number of units of the LSTM layers and a single FC layer as follows:

$$
\mathrm{Mem} = \sum_{i=1}^{L} 4\,h_i\left(B_{b}^{i} + B_{U}^{i}\,d_i + B_{W}^{i}\,h_i\right) + 2\,B_{Wfc}\,h_L + 2\,B_{bfc}
\qquad (3)
$$

where $h_i$ is the number of units of LSTM layer $i$, $d_1 = d$ is the number of inputs, $d_i = h_{i-1}$ for the subsequent layers, and the factor 4 accounts for the four gate weight sets of each LSTM unit. Here, the FC layer directly maps the $h_L$ units of the last LSTM layer to the two classification outputs. The model presented in (3) serves as a benchmark for selecting the adequate network topology (number of layers and their parameters) alongside the performance metrics of the classification problem.
Expression (3) shows that the highest-order term of the memory occupancy is quadratic in the cell unit number $h_i$, which makes the growth of the memory occupancy much steeper for high values of $h_i$.
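To make the model concrete, the sketch below is an illustrative Python implementation of Expression (3) under the parameter-count assumptions stated above (four gate weight sets per LSTM unit); it is not the authors’ original script. It evaluates the footprint of stacked LSTM layers plus a two-output FC layer so that candidate topologies can be checked against a given RAM budget.

```python
def lstm_fc_memory_bytes(units, d=3, bytes_per_value=4, fc_outputs=2):
    """Memory footprint (bytes) of stacked LSTM layers plus one FC layer.

    Per LSTM layer with h units: 4*h bias values, 4*h*d input weights, and
    4*h*h recurrent weights; the FC layer maps h_L units to fc_outputs cells."""
    total = 0
    d_i = d                                     # input size seen by the first layer
    for h in units:
        total += 4 * h * bytes_per_value        # biases b
        total += 4 * h * d_i * bytes_per_value  # input weights U
        total += 4 * h * h * bytes_per_value    # recurrent weights W (quadratic term)
        d_i = h                                 # next layer sees h features
    total += (units[-1] * fc_outputs + fc_outputs) * bytes_per_value  # FC weights + bias
    return total

# Two LSTM layers of 30 and 40 units, 32-bit floats, 3-axis acceleration input
mem = lstm_fc_memory_bytes([30, 40], d=3, bytes_per_value=4)
print(f"{mem} bytes ({mem / 1024:.1f} kB)")  # compare against the target MCU RAM budget
```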
Figure 3 represents a case where two LSTM layers are considered, with a single-precision floating-point representation (all B equal to 4 bytes) and an input size d = 3, as it would be when considering a fall-detection problem based on IMU acceleration measures. Also, to cross-reference the memory model with standard and typical embedded processors, two planes that represent of the available volatile memory on two processors are added. One plane regards an ultra-low-power 8-bit 8051-compatible Silicon Labs C8051F98x, featuring 8 kB Flash and 512 kB RAM, and the other an ultra-low-power 16-bit Microchip MSP430, featuring 512 kB Flash and 66 kB RAM [46]. From the intersection line of the surfaces (plotted in red), one can see that, for the MSP430, a combination of 10 and 50 cell units in the two layers can be deployed, with a middle conjunction between 30 and 40 cell units. When considering the C8051F98x, the supported network can be more complex, featuring a combination of 10 and 150 cell units, with a middle point of 100 cell units for each layer. It should be noted that the memory model is unambiguous (there is no uncertainty), as it maps the model’s parameters into a memory footprint based on the number of bits of the parameter representation and the model’s structure.
When considering two bytes for the representation of all quantities (bias and weights of both the LSTM and FC layers), the memory required by a given network is expected to shrink, allowing more complex networks to fit. This case could be obtained with a half-precision floating-point or a 16-bit integer representation. From Figure 4, one can see that the MSP430 could support a combination of 10 and 70 cell units for the two LSTM layers, with a middle interval of 50 cell units. The C8051F98x would support a combination of 80 and 200 cell units with a mid-range of 150 cell units.
The proposed model is used in the following section, in conjunction with an evaluation of the accuracy and other related metrics obtained using a public dataset of accelerations collected during different falls and diverse daily activities. It is worth mentioning that, in the event that more models coexist, for example, to perform another kind of detection, it would be enough to replicate the arithmetic. Another no less important relationship that can be drawn from the presented model concerns energy consumption and latency, which are very significant in the wearable context. Since the energy consumption of the CPU is directly correlated with the number of operations and can be quantified as a normalized energy cost or cycle count, the presented model can also be seen as a metric to evaluate the energy and latency efficiency of the embedded device [47].
4. Sensitivity Analysis
For simplicity and to highlight the proposed methodology, the topology of the network to analyze consists of the minimal requirements for a time series classification problem, that is, one LSTM layer for the feature extraction, one FC layer that has the task of classifying the features, and finally, a SoftMax activation layer that converts the vector of numbers from the FC into a vector of probabilities (Figure 5).
Since it is intended to implement the inference of the network on a low-power, low-complexity embedded microcontroller, the network is trained repeatedly while varying the number of cells of the LSTM layer from 1 to 200.
The SisFall dataset [37] is one of the most popular options for obtaining the data required for network training. Recordings of significant duration (between 10 s and 180 s, comprising ADLs and falls) and a wide diversity of emulated movements (19 classes of ADLs, including basic and sports activities, and 15 classes of falls) are included in the dataset. The 38 experimental volunteers span a wide age range. The number of samples, both of falls and of daily activities, is considered sufficient for the number of parameters to be tuned [48].
The first step consists of annotating the data used for training purposes. The SisFall dataset is publicly available as raw data, that is, samples acquired using an ADXL345 accelerometer (Analog Devices, Wilmington, MA, USA) at a sampling rate of 200 Hz. Using a MatLab® script, each data file is read keeping the sampling rate, but first adjusting the scale using Expression (4), considering the accelerometer’s full-scale range R (in g) and 13-bit resolution, as obtained from the manufacturer’s datasheet:

$$ a\,[\mathrm{g}] = \frac{2R}{2^{13}}\, a_{raw} \qquad (4) $$
Secondly, a 4th-order low-pass digital Butterworth filter with a cutoff frequency of 5 Hz is applied to the data samples to remove high-frequency noise. Thirdly, batch normalization is performed by rescaling the values to a distribution with zero mean and unit standard deviation as

$$ \hat{a} = \frac{a - \mu}{\sigma} \qquad (5) $$

where $\mu$ and $\sigma$ are the mean and the standard deviation of the samples.
Finally, a 2 s window is taken from the total number of samples, centered on the fall event when considering falls and centered on the recording interval when considering ADLs. This is because the number of samples differs between events, and some activities are acquired within a very long time frame. It is known that a fall event usually occurs within a window of 500 ms to 2 s [49]. Also, from each collection of falls and ADLs, is annotated as training samples and is reserved for validation.
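A minimal Python sketch of this pre-processing chain is given below; it assumes a raw SisFall recording already loaded as an integer array, and the accelerometer range and event index are illustrative placeholders rather than values taken from the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200          # SisFall sampling rate [Hz]
RANGE_G = 16      # assumed accelerometer full-scale range [g] (placeholder)
RESOLUTION = 13   # ADC resolution [bits]

def preprocess(raw_counts, event_center_idx, window_s=2.0):
    """Scale raw counts to g, low-pass filter, normalize, and cut a 2 s window."""
    acc = raw_counts * (2.0 * RANGE_G) / (2 ** RESOLUTION)   # Expression (4)
    b, a = butter(4, 5.0, btype="low", fs=FS)                # 4th-order, 5 Hz cutoff
    acc = filtfilt(b, a, acc, axis=0)                        # zero-phase filtering per axis
    acc = (acc - acc.mean(axis=0)) / acc.std(axis=0)         # Expression (5)
    half = int(window_s * FS / 2)                            # 2 s -> 400 samples
    return acc[event_center_idx - half:event_center_idx + half]

# Example with synthetic data standing in for one SisFall file (N samples, 3 axes)
raw = np.random.randint(-2**12, 2**12, size=(6000, 3))
window = preprocess(raw, event_center_idx=3000)
print(window.shape)  # (400, 3)
```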
All the simulations are carried out with MatLab® R2022a under the Ubuntu operating system on an NVIDIA DGX Station (Santa Clara, CA, USA) with 2 Intel Xeon E5-2698 v4 CPUs (20 cores each), 256 GB of RAM, and 4 NVIDIA Tesla V100 GPUs. The training is performed with a batch size of 31, meaning that, with the 2852 training samples, 92 iterations occur per epoch. Also, the training runs for 15 epochs, considered sufficient for obtaining a reasonable accuracy without underfitting and yet not enough to induce overfitting. The validation frequency is set to 46 iterations, implying two validations per epoch. The Adam optimizer is used, with a learning rate of and a gradient decay factor of .
When the number of cell units h is considered, one can see from Figure 6 that, when the number of cells is close to one, the accuracy is relatively low. Nonetheless, after around cells, the accuracy enters a steady state, varying from to , as seen in Figure 6. The maximum accuracy occurs at h = 100, and, as such, there is no further advantage in increasing the number of cells. It should be noted that, when considering the 2 s window over the 200 Hz sampling frequency, a total of 400 samples represent the fall or the ADL event. Thus, the LSTM is clearly fulfilling its long-term-memory purpose, since the best accuracy is achieved with only 100 cells.
In addition to the accuracy and the loss metrics, in a problem such as fall detection it is important to analyze in what way the accuracy falls short of 100% correct predictions. A FALL being classified as an ADL, or vice versa, has different real-life consequences. There are only four cases for any classification result:
True Positive (TP): Prediction is a FALL, and there is, in fact, a FALL. This is a desirable situation;
True Negative (TN): Prediction is an ADL, and the subject is not falling. This is also a desirable situation;
False Positive (FP): The prediction is a FALL, but the subject performs an ADL. This is a false alarm and not desirable;
False Negative (FN): The prediction is an ADL, but the subject suffered a FALL. This is the worst situation and not desirable.
While both FP and FN are undesirable, one is worse than the other: when dealing with an FP, a second, more detailed check will correct it, whereas, when dealing with an FN, the person has fallen and may be injured, but no emergency response will be triggered. The related metrics are the Precision, formulated as $P = TP/(TP+FP)$, which evaluates “How many of those labeled as FALL are actually FALLs”, and the Recall, formulated as $R = TP/(TP+FN)$, which evaluates “Of all the samples that are FALLs, how many were correctly predicted”.
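For reference, these two metrics can be computed directly from the confusion matrix counts; the snippet below is a generic illustration in which the FP count is hypothetical (the TP and FN values match the numbers reported for Figure 8 later in the text).

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# 338 of 345 actual FALLs detected (7 FNs); 25 false alarms is a made-up example
p, r = precision_recall(tp=338, fp=25, fn=7)
print(f"Precision = {p:.3f}, Recall = {r:.3f}")
```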
Figure 7 shows the Precision and the Recall versus the Accuracy.
When considering both P and R, one can see from Figure 7 that, although their behavior is not monotonic, both increase with the number of cells in the LSTM layer. Moreover, the Recall reaches its maximum value at the same cell unit number as the accuracy, which is desirable, as explained previously.
The results obtained here are considered for implementation in the case where the number of LSTM layer cell units corresponds to 100. The Confusion Matrix shown in Figure 8 provides some more details on what is happening with the predictions of the chosen architecture. Of the 345 actual FALLs, 338 are correctly predicted as FALLs, and only 7 are predicted as ADLs, representing about 2% of the cases. This is why the Recall reaches a value as high as approximately 98%. In this case, the Precision is lower ( ), but, as stated previously, this is a more tolerable situation.
Regarding the memory occupancy, as can be seen from Figure 9, although the network could be implemented on a C8051F98x, this is the only analyzed processor that could allocate the network, with a memory occupancy of . When considering 8-bit microcontrollers such as the STM8L or the 16-bit MSP430, the first would only support a network with 12 cells, and the second would only support 40. The STM32L would need all the memory for a network of 80 cells.
5. Proposed Methodology
This section begins with the implementation of one LSTM layer with 100 cells and one FC layer with 100 inputs and 2 outputs, based on the memory model presented in Section 3 and the sensitivity analysis given in Section 4. The mathematics of batch normalization is not covered here because it is regarded as a pre-processing step. Similarly, the SoftMax layer, which is regarded as a post-processing stage, is not involved. The purpose is to find the most effective technique to store the network hyper-parameters on an embedded microcontroller that would function as a wearable portable device.
In accordance with Figure 10, the quantization is treated as PTQ in this instance: a trained model is quantized before being stored, pending an inference procedure. The layer computations are then carried out once the hyper-parameters have been loaded and de-quantized. The de-quantization procedure receives a control word with characteristics established during the quantization process. An alternative approach would be to carry out the training while taking the quantized hyper-parameters into account, that is, Quantization-Aware Training (QAT). While that method is conceptually more natural, building a stable network with optimal performance becomes harder due to the added complexity of the training process [50].
To characterize the network, the histograms of the LSTM and FC weights and biases are analyzed (Figure 11). Also, an equivalent Normal Distribution is plotted, with the mean and standard deviation set from the hyper-parameters. Table 1 lists each layer’s maximum and minimum hyper-parameter values.
The results of the data analysis show that, after pre-processing with batch normalization, there is a high degree of similarity between the LSTM input weights and the LSTM recurrent weights (Figure 11). However, neither the FC weights nor the LSTM bias fit a normal distribution. It is important to note that, when examining the LSTM bias in further depth (Figure 11b), it combines four distinct sets of biases pertaining to the forget, input, output, and cell gates. Therefore, it is essential to understand how the distribution in Figure 11b is arranged with respect to the various gates, as seen in Figure 12. It is worth mentioning that, when dealing with a large number of parameters, a Normal Distribution among them is expected. This is due to the Central Limit Theorem, which states that, as the sample size tends to infinity, the sample mean becomes normally distributed. On the contrary, biases are known to be more stable, alternating between stable values [51,52]. This explains why Figure 11a,c,d are quite similar to the Normal Distribution while Figure 11b shows two distinct groups of values.
From Figure 12, it is clear that the unbalanced bias values in Figure 11b are due to the fact that the network keeps the input gate always active, while the same does not happen for the other gates. This situation is visible in Table 2, where the mean value of the input gate bias is close to one, and those of the remaining gates are close to zero.
The above discussion and the identified feature are the motivation for a global methodology to apply to the fall-detection problem. Based on the histogram information, one can see that uniform quantization is adequate for the way the hyper-parameters are distributed. Moreover, when individualizing the gate biases, that is, when considering a higher granularity, a simpler, hardware-friendly linear quantization can be applied to the pre-trained weights. The quantization function Q(r) takes the form of

$$ Q(r) = \operatorname{round}\left(\frac{r}{S}\right) + Z \qquad (6) $$

where r is the 32-bit floating-point number to quantize, S is a real-valued scaling factor, and Z is an integer zero displacement. The scaling factor S ensures that the numerical range of the original floating-point values is properly represented within the limited range of the quantized format, that is,

$$ S = \frac{r_{max} - r_{min}}{2^{b} - 1} \qquad (7) $$

where $r_{max}$ and $r_{min}$ correspond to the maximum and minimum values of the hyper-parameter set, respectively, and b is the number of bits used for the representation. The zero displacement Z is calculated as

$$ Z = q_{min} - \operatorname{round}\left(\frac{r_{min}}{S}\right) \qquad (8) $$

where $q_{min}$ corresponds to the minimum quantized value.
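The sketch below is a direct Python transcription of Expressions (6)–(8) for an arbitrary set of hyper-parameters; the function names, the signed integer grid, and the clipping step are illustrative choices rather than details taken from the authors’ implementation.

```python
import numpy as np

def calibrate(values, n_bits):
    """Return scaling factor S (7) and zero displacement Z (8) for one parameter set."""
    r_min, r_max = float(values.min()), float(values.max())
    q_min = -(2 ** (n_bits - 1))                 # assumed signed b-bit grid
    S = (r_max - r_min) / (2 ** n_bits - 1)
    Z = q_min - round(r_min / S)
    return S, Z

def quantize(values, S, Z, n_bits):
    """Expression (6): map 32-bit floating-point values onto b-bit integers."""
    q_min, q_max = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    q = np.round(values / S) + Z
    return np.clip(q, q_min, q_max).astype(np.int32)

def dequantize(q, S, Z):
    """Inverse of (6): recover approximate floats as S * (Q(r) - Z)."""
    return S * (q.astype(np.float32) - Z)

# Round trip on a toy weight vector with an 8-bit representation
w = np.array([-0.42, -0.03, 0.11, 0.37], dtype=np.float32)
S, Z = calibrate(w, n_bits=8)
w_hat = dequantize(quantize(w, S, Z, n_bits=8), S, Z)
print(np.abs(w - w_hat).max())  # quantization error below one step S
```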
The overall methodology consists of calibrating the LSTM and FC layers while individualizing the gates of the LSTM, a procedure called here Gate Disclosure. The calibration stage delivers the $r_{min}$ and $r_{max}$ parameters for the quantization. Secondly, a uniform quantization is applied through (6), (7), and (8). The obtained network can then be stored in the memory of an embedded computer system.
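Building on the calibrate and quantize helpers of the previous listing, the following sketch illustrates the Gate Disclosure idea under the assumption (consistent with Figure 12) that the LSTM bias vector stacks the four gate biases; each gate slice gets its own (S, Z) pair instead of a single pair for the whole vector. The gate ordering and names are assumptions for illustration only.

```python
def quantize_bias_gate_disclosure(bias, h, n_bits=2):
    """Quantize an LSTM bias of length 4*h gate by gate (Gate Disclosure).

    Returns the quantized slices plus one (S, Z) control pair per gate, so the
    de-quantization step knows how to restore each gate's values separately."""
    gates = ("input", "forget", "candidate", "output")   # assumed stacking order
    quantized, controls = {}, {}
    for k, name in enumerate(gates):
        slice_k = bias[k * h:(k + 1) * h]
        S, Z = calibrate(slice_k, n_bits)                # per-gate calibration
        quantized[name] = quantize(slice_k, S, Z, n_bits)
        controls[name] = (S, Z)
    return quantized, controls

# With per-layer calibration, a 2-bit grid must span the whole bias range;
# with per-gate calibration, each gate's much narrower range gets its own grid.
```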
Figure 13 summarizes the proposed method. Initially, the network is quantized in an offline step. To obtain the quantized version based on the pre-trained LSTM model, a calibration is performed to obtain $r_{min}$ and $r_{max}$, the minimum and maximum values of the hyper-parameters. Secondly, the scaling factor S and the zero displacement Z are calculated through expressions (7) and (8). Next, the hyper-parameters are quantized with expression (6) and stored in the embedded microcontroller memory (Figure 13a). From here, the inference happens online: acceleration points are acquired, quantized, and fed to the quantized network (Figure 13b). Finally, a simulation stage is considered here to obtain performance metrics, namely, the accuracy of the de-quantized network. To that end, the hyper-parameters are recovered in 32-bit floating-point format from the inverse of (6), that is, $\tilde{r} = S\,(Q(r) - Z)$.
6. Results and Discussion
The proposed methodology was implemented in MatLab® R2022a and run on the NVIDIA DGX Station described in Section 4, using the network also described in Section 4. Further to the quantization with Gate Disclosure, and for comparison purposes, the network was also quantized considering a calibration stage on the individual layers, without discriminating the gates. The numbers of bits considered for the quantized representation were 16, 8, 4, and 2.
The first set of results, represented in Figure 14, corresponds to the current state of the art, where uniform quantization is applied to the network with a prior calibration performed at the layer level, that is, the scaling factor is obtained individually for each layer. From the confusion matrices in Figure 14, one can see that the network suffers no degradation when quantized with a 16-bit integer representation and barely any degradation with an 8-bit integer representation. Regarding lower-bit representations, the network precision degrades as the number of bits is lowered; the 2-bit-wide representation implies the complete degradation of the network, where all samples are classified as falls. Nevertheless, a substantial memory reduction can be obtained when considering the 16-bit or the 8-bit quantization: the footprint drops from bytes in the 32-bit floating-point representation to bytes for 16-bit or bytes for 8-bit, meaning reductions of 50% and 75%, respectively.
Secondly, the proposed Gate Disclosure methodology is applied, with the resulting confusion matrices represented in Figure 15. Figure 15 shows evidence of the improved performance provided by the Gate Disclosure methodology. One can see that the 16-bit, 8-bit, and 4-bit quantizations did not introduce any network performance degradation, as the confusion matrices are identical. When using 2 bits for the representation, the network performance slightly decreases but still keeps an accuracy of . The explanation for this behavior is, as discussed in Section 5, the distribution of the hyper-parameters.
While, in the first case (Figure 14), the small number of combinations available with 2-bit quantization has to be distributed over a wide range of bias values, in the second case, applying the Gate Disclosure (Figure 15), the four combinations are distributed over a much narrower range. Figure 16 illustrates this situation, where the red circles are the pre-trained 32-bit floating-point values, the blue squares denote the de-quantized values, and the dashed magenta lines represent the quantization bins. This also explains the RMSE for each case summarized in Table 3. Whether considering the calibration of the layers or the Gate Disclosure calibration method, the error increases as the number of bits decreases, but at a lower rate in the second case.
To conclude the performance discussion, and recalling the proposed memory model, if the quantization options of the Gate Disclosure method are overlaid on it (Figure 17), one can see that the 16-bit MSP430 microcontroller could now perfectly support the specified network with 100 hidden cells. Also, the STM8L would support an LSTM with 40 hidden cells, with an accuracy of .
7. Conclusions
This work proposes a new approach for quantizing an LSTM neural network with a high level of granularity, that is, down to the level of the LSTM gates, called Gate Disclosure. After detailed simulations and analyses of the network topology (1 LSTM layer + 1 FC layer), the method is applied to an LSTM with 100 hidden cells.
A memory occupancy model was also proposed to evaluate the feasibility of deploying this network on an embedded device used as a wearable for fall detection in senior groups. By relating the network topology and the binary representation of the network hyper-parameters to the memory footprint, the model allows the implementability of the network to be assessed.
The proposed methodology is compared with state-of-the-art uniform quantization at different bit widths (16 down to 2), demonstrating that the quantized and de-quantized networks remain accurate in predicting falls from acceleration measurements among elderly people. The numerical results show that, for an LSTM network, using different quantization thresholds “per gate” can improve the accuracy of the quantized model.
For future work, the authors will consider testing the methodology on different public datasets to strengthen the results and implementing other, non-uniform quantization functions for comparison. Additionally, the methodology could be applied to other problems with the same architecture (time series classification), such as predicting tremors in Parkinson’s disease.