Article

Development and Implementation of a Machine Learning Model to Identify Emotions in Children with Severe Motor and Communication Impairments

by Caryn Vowles, Kate Patterson and T. Claire Davies *
Mechanical and Materials Engineering, Queen’s University, Kingston, ON K7L 3N6, Canada
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(5), 2850; https://doi.org/10.3390/app15052850
Submission received: 13 January 2025 / Revised: 13 February 2025 / Accepted: 27 February 2025 / Published: 6 March 2025

Featured Application

Children with severe motor and communication impairments (SMCIs) face significant challenges in expressing emotions, often leading to unmet needs and social isolation. The identification of emotions from physiological signals using machine learning could provide enhanced support for the emotional well-being of children with complex communication needs and increase their quality of life.

Abstract

Children with severe motor and communication impairments (SMCIs) face significant challenges in expressing emotions, often leading to unmet needs and social isolation. This study investigated the potential of machine learning to identify emotions in children with SMCIs through the analysis of physiological signals. A model was created based on the data from the DEAP online dataset to identify the emotions of typically developing (TD) participants. The DEAP model was then adapted for use by participants with SMCIs using data collected within the Building and Designing Assistive Technology Lab (BDAT). Key adaptations to the DEAP model resulted in the exclusion of respiratory signals, a reduction in wavelet levels, and the analysis of shorter-duration data segments to enhance the model’s applicability. The adapted SMCI model demonstrated an accuracy comparable to the DEAP model, performing better than chance in TD populations and showing promise for adaptation to SMCI contexts. The models were not reliable for the effective identification of emotions; however, these findings highlight the feasibility of using machine learning to bridge communication gaps for children with SMCIs, enabling better emotional understanding. Future efforts should focus on expanding the data collection of physiological signals for diverse populations and developing personalized models to account for individual differences. This study underscores the importance of collecting data from populations with SMCIs for the development of inclusive technologies to promote empathetic care and enhance the quality of life of children with communication difficulties.

1. Introduction

Approximately 0.5% of children face chronic medical complexities affecting multiple bodily systems, leading to frequent hospitalizations and significant healthcare costs [1,2,3]. These children and their caregivers exhibit remarkable resilience, yet their emotional well-being is often undermined by factors such as stress and disrupted social interactions [4]. Children with severe motor and communication impairments (SMCIs) are particularly affected, as their limited ability to communicate can lead to social isolation and challenges in expressing their needs [5].
Children with SMCIs use gestures and vocalizations to express themselves to familiar communication partners, and they often require a familiar interpreter in their daily lives to translate these expressions for others. However, relying solely on a familiar interpreter to identify the emotional well-being of a child with SMCIs can introduce bias, and even persons familiar with a child may not be able to interpret their emotions from facial expressions [6].
Communication methods can be categorized based on the complexity of the technology used to support communication or emotion assessment. No-tech methods rely on unaided natural communication, such as gestures, facial expressions, and body language, requiring no external devices or tools [5]. In contrast, low-tech methods employ simple, non-electronic aids, such as communication boards, which are cost-effective and easy to use but still provide structured support for communication [5]. High-tech methods use advanced electronic systems, including speech-generating devices (SGDs) accessed through eye-gaze technology or brain signals [5]. These distinctions span the spectrum of technologies available, from basic to sophisticated, to address the diverse needs of individuals with communication and emotional assessment challenges.
The United Nations Convention on the Rights of the Child emphasized the importance of respecting every child’s preferences [7]. In this context, accurately identifying and understanding the emotions of children who have communication difficulties is crucial for ensuring that their preferences are respected and effectively met, especially in healthcare settings.
Emotional responses can be difficult to assess due to their complexity and humans’ natural ability to hide emotion. Current research using psychophysiological signals focuses on the emotional assessment of typically developing adult participants [8,9,10,11,12]. Psychologists often classify emotions into six discrete categories: happiness, sadness, fear, anger, surprise, and disgust [13]. Psychologists also use dimensional theories of emotion that describe emotions along three dimensions: pleasure, arousal, and dominance (PAD) [13]. Each dimension is scored on a continuous scale: pleasure, sometimes referred to as valence, describes how positive or happy a feeling is; arousal describes how actively one engages with an emotion; and dominance describes whether the environment influences an individual or vice versa (i.e., is the individual controlling the situation or is the situation controlling the individual). When a stimulus is presented in a research setting, participants respond with their pleasure, arousal, and dominance scores. Each PAD value can then be plotted on a set of axes representing pleasure, arousal, and dominance. The responses to a specific stimulus from a number of participants allow for the creation of a three-dimensional graph with the mean at the center of a sphere and the standard deviation defining its radius [14]. Based on the three-dimensional location of these spheres, an emotion can theoretically be identified [15].
The current standard for self-reporting emotions is the Self-Assessment Manikin (SAM) method [12,13]. This approach uses digitized images depicting various emotional states, asking users to select images across three dimensions from a scale of one to nine representing PAD: pleasure (happy—1 vs. unhappy—9), arousal (excited—1 vs. calm—9), and dominance (controlled—1 vs. in control—9). While the SAM facilitates non-verbal interaction, it still requires participants to have communication skills and a cognitive understanding of emotions to navigate and respond to the images [13].
In psychology and affective computing, many researchers use Paul Ekman’s theory, which identifies six basic emotions: fear, anger, happiness, sadness, disgust, and surprise [16]. However, researchers sometimes substitute joy for happiness [17]. Some researchers are able to differentiate among four basic emotions [18], while others suggest that they can identify twenty-seven [19]. A systematic review examining the use of physiological signals for emotion recognition determined that only 15% of studies investigated more than four emotions [20].
While PAD scores are recognized within research environments as emotion identifiers, the colloquial terminology that represents emotion is more easily understood by the general population. However, very little research exists that maps PAD scores onto emotions. Using machine learning techniques to identify emotional clusters based on PAD scores, our current group has found that six basic emotions are distinguishable [21]. In a previous paper [15], an Emotion Identification (EI) model was developed that maps PAD scores onto one of six basic emotion clusters using the subjective words representing emotions.
The autonomic nervous system responds to stimuli through psychophysiological signals of which a person may be unaware. Emotion is mediated by the sympathetic and parasympathetic nervous systems, which produce arousing and calming responses, respectively [22]. Arousal may be detected by an increased heart rate, increased blood flow to skin blood vessels, and/or an increased rate of breathing; calm may be indicated by decreased heart and respiration rates [22]. Electrocardiography (ECG), photoplethysmography (PPG), galvanic skin response (GSR), and respiration (RESP) sensors allow for the detection of these signals [22]. The monitoring of physiological signals with an inexpensive wearable method could be used to detect emotions discreetly, without the need for an individual to explicitly express their emotions. By combining the model in this current research with the EI model that links PAD scores to basic emotions, emotions can be determined from physiological signals [15].
Many devices have been investigated for detecting emotion in typically developing participants in laboratory settings, with good success. These devices include either single sensors or the integration of multiple sensors. Domínguez-Jiménez et al. [11] detected only three emotions (amusement, sadness, and neutral), relying on PPG and GSR signals from 37 typically developing volunteers. The device developed by Katsis et al. [8] integrated EMG, ECG, RESP, and EDA (electrodermal activity, similar to GSR) sensors into a driving mask and focused on real-life driving situations to identify four emotions (high stress, low stress, disappointment, and euphoria), but the test volunteers were above average with respect to their driving skills and abilities. This focus on average or above-average participants does not adequately address the need for a device for participants who have difficulty with communication.
Within affective computing, researchers typically categorize each of the pleasure, arousal, and dominance dimensions into only two or three categories (high, low, and neutral) [20], but to the best of our knowledge, no researcher has attempted to categorize them into nine categories (for each of the three PAD dimensions). Most affective computing research focuses primarily on the pleasure and arousal dimensions of emotion, neglecting the dominance component. The use of nine categories would represent the full spectrum of emotions available when an individual self-identifies using the SAM [8,9,10,11,12,23,24], potentially enabling the detection of all six Ekman emotions from physiological signals. A study by Goshvarpour [10] used only the pleasure and arousal dimensions to classify four quadrants of emotion, using wavelets for both the ECG and GSR with 100% and 94% accuracy, whereas Zied et al. [23] achieved 95% accuracy by fusing the features from four physiological signals and using a continuous wavelet to identify four emotions. None of the current devices has been tested on participants under the age of 18 or those with medical complexities.
The specific objective of the current research was to create a model that identifies emotion using physiological data from both typically developing (TD) participants and participants who have SMCIs.

2. Materials and Methods

In machine learning, the process involves pre-processing to clean and prepare the data, feature extraction to identify data characteristics, feature selection to retain the features that are relevant, model selection to determine the best algorithm, and classification to predict categorical labels and evaluate the model’s performance (Figure 1). The DEAP model was developed using data from the DEAP (Database for Emotion Analysis using Physiological Signals) dataset [15] of typically developing participants. However, the same protocols could not be used for the collection of data from participants with SMCIs, so a new data collection protocol was developed for the BDAT dataset (Building and Designing Assistive Technology Lab, the authors’ lab). The original DEAP machine learning model was adapted to draw only on data that were comparable to the BDAT dataset (the SMCI model). The new SMCI model was retrained using the DEAP data and finally tested using the BDAT lab data for persons with SMCIs.
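As a rough illustration of how these stages can be chained together, the sketch below composes standardization, PCA-based feature selection, and an SVM classifier into a scikit-learn pipeline evaluated with ten-fold cross-validation. The array shapes, random placeholder data, and default hyperparameters are assumptions for illustration only and do not reproduce the authors’ code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: one row of wavelet-derived features per trial,
# one self-reported score (1-9) per trial.
X = np.random.rand(320, 108)
y = np.random.randint(1, 10, size=320)

pipeline = Pipeline([
    ("scale", StandardScaler()),         # standardize features before PCA
    ("select", PCA(n_components=0.95)),  # keep components explaining 95% of variance
    ("classify", SVC(kernel="rbf")),     # one of the classifiers compared in the study
])

scores = cross_val_score(pipeline, X, y, cv=10)  # ten-fold cross-validation
print(f"Mean ten-fold accuracy: {scores.mean():.2%}")
```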

2.1. Datasets

Two different datasets were employed in this model: the DEAP dataset and the BDAT dataset.

2.1.1. DEAP Dataset

In the current study, the experimental component of the DEAP dataset was used (Table 1) [17]. This dataset was created by having 32 participants view 40 one-minute music video clips. The participants were asked to record their emotional responses relative to three categories: pleasure, arousal, and dominance. PPG, GSR, and RESP were recorded while the participants were exposed to the music video stimuli.

2.1.2. BDAT Dataset

There were 10 participants in the BDAT dataset: 9 children with SMCIs and 1 matched, typically developing (TDM) child. This research was approved by Queen’s University’s Health Sciences and Affiliated Teaching Hospitals’ Research Ethics Board, and the consent and assent of all the participants were obtained. The participant group with SMCIs consisted of children with motor and communication impairments that limited their ability to respond to traditional scales (paper-and-pencil or verbal self-report methods). The selection criteria for the group of children with SMCIs were the same as those used in the systematic review by Noyek et al. [5]. The age range (5–25 years old) was specified to be consistent with the age range of school-age children. In Ontario, Canada, school age encompasses individuals under 25 years of age [25]. Further, multiple international health assessments and school-related participation guidelines specify children as under 25 years old. A minimum age of five years was specified to ensure an appropriate developmental level of comprehension and expression [26]. Primary guardians of children with SMCIs were also required to be able to comprehend and communicate fluently in English. To ensure that this study was accessible to the participant group with SMCIs, it was completed at the participants’ homes in most cases. It is recognized that this is a small group of participants, especially when implementing machine learning algorithms. However, the recruitment of participants with disabilities, especially those who have severe motor and communication impairments, is extremely difficult. Strategies were undertaken to identify as many potential participants as possible, with the recommendations of Collins et al. [27] followed throughout.
The IAPS (International Affective Picture System) was used to evoke an emotional response [28]. The IAPS was chosen as it is a standardized method that uses the SAM tool to assess emotion. The IAPS is a collection of photographs that have been rated by participants for pleasure, arousal, and dominance. The present protocol selected 30 IAPS images that were suitable for children. Participants were shown ten IAPS images, each for 6 s, and self-reported pleasure, arousal, and dominance using the SAM tool on a nine-point scale. A grey screen was presented between images, and the timing for this screen depended on when the participant was ready for the next image. This protocol was repeated for two or three rounds, depending on the willingness of the participant. During the display of the images, PPG (photoplethysmography) and GSR (galvanic skin response) data were collected with a Gazepoint GP-3 (Gazepoint, Vancouver, BC, Canada), which includes a biometrics system. For children who were unable to self-report their scores on the SAM, caregivers’ evaluations of the child’s response were collected.

2.2. Algorithmic Approach (Figure 1)

Using the DEAP dataset, multiple models were iteratively built to improve the accuracy and speed of the model. Since the DEAP and BDAT datasets were collected using different tools, the DEAP model had to be modified to enable analysis of the BDAT dataset. Unlike the DEAP dataset, which was sampled for one minute at 128 Hz, the BDAT dataset was sampled at 60 Hz for six seconds. Adaptations to the model had to be implemented to account for this difference. Once the adaptations were implemented, the model was retrained with the DEAP dataset before testing it on the SMCI data.

2.2.1. Pre-Processing

All model creation was coded in Python (version 3.9) using the scikit-learn [29] and NeuroKit [30] libraries. The data were cleaned using NeuroKit, and the baseline average was subtracted from the GSR signals. The data were prepared to allow for testing using all three signals (PPG, GSR, and RESP) as inputs, both individually and in combination. The duration of the input data was also determined during the pre-processing stage.
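A minimal sketch of this stage is shown below, assuming NeuroKit2’s standard cleaning functions (nk.ppg_clean and nk.eda_clean) and a short baseline window at the start of each trial; the specific functions, sampling rate, and window length are illustrative, as the paper does not specify them.

```python
import numpy as np
import neurokit2 as nk

def preprocess_trial(ppg_raw, gsr_raw, sampling_rate=128, baseline_seconds=3):
    """Clean the PPG and GSR signals and express GSR relative to its baseline."""
    ppg = nk.ppg_clean(ppg_raw, sampling_rate=sampling_rate)
    gsr = nk.eda_clean(gsr_raw, sampling_rate=sampling_rate)

    # Subtract the mean of an initial baseline window so that GSR values are
    # expressed relative to the participant's resting level.
    baseline_samples = int(baseline_seconds * sampling_rate)
    gsr = gsr - np.mean(gsr[:baseline_samples])
    return ppg, gsr
```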

2.2.2. Feature Extraction

Discrete wavelet transforms were applied to extract statistical features from each level of the wavelet decomposition. A comparison was drawn between the level 8 wavelet method and the level 5 wavelet. Wavelet selection was based on results from Goshvarpour et al., which identified the most responsive wavelet for each signal type: for both GSR and ECG, a Daubechies wavelet was determined to be the most accurate, while for RESP, a symlet wavelet was indicated [10,23].
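The snippet below is an illustrative decomposition using PyWavelets with the wavelet families and levels reported for the DEAP model (Daubechies 4 for PPG and GSR, symlet 2 for RESP); the library choice and function names are assumptions, not the authors’ implementation.

```python
import numpy as np
import pywt

# Wavelet family and decomposition level per signal type (values as reported
# for the DEAP model in Section 3.1.2).
WAVELETS = {
    "PPG": ("db4", 8),    # Daubechies 4, level 8
    "GSR": ("db4", 8),    # Daubechies 4, level 8
    "RESP": ("sym2", 6),  # symlet 2, level 6
}

def decompose(signal, signal_type):
    """Return the list of wavelet sub-bands (approximation + detail coefficients)."""
    wavelet, level = WAVELETS[signal_type]
    return pywt.wavedec(np.asarray(signal, dtype=float), wavelet, level=level)

# Example: a 60 s trial sampled at 128 Hz yields 9 sub-bands for a level 8 transform.
subbands = decompose(np.random.rand(60 * 128), "PPG")
print(len(subbands))  # -> 9
```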

2.2.3. Feature Selection

A systematic review [20] found that most researchers abandon the feature selection stage of physiological processing, even though it can greatly decrease computational complexity. Principal component analysis (PCA) is a dimensionality reduction technique used to simplify complex datasets by transforming the original variables into a new set of uncorrelated variables, called principal components, which capture the variance in the data [31]. PCA can be used to select a smaller set of combined features that provides the most information from the larger set; by focusing on these principal components, PCA reduces the number of features while preserving the essential information. PCA was used in the present research to decrease the number of features and the computational power required. However, PCA has its weaknesses: it assumes linear relationships among variables, which may not capture the true structure of the data, and it is sensitive to scaling, requiring the standardization of data [29]. Additionally, PCA may discard important information if too many dimensions are removed, and the new principal components are often less interpretable than the original features. PCA was performed using all 40 trials for each of the 32 participants in the DEAP dataset, and the explained variance ratio was used to select the features with a cutoff of 95% explained variance. PCA was conducted on each signal independently as well as on the combined signals.
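A brief sketch of this selection step with scikit-learn is shown below; the placeholder feature matrix and its dimensions are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X_features: one row per trial, one column per wavelet-derived feature.
X_features = np.random.rand(1280, 108)   # placeholder for a single signal's features

X_scaled = StandardScaler().fit_transform(X_features)  # PCA is sensitive to scaling
pca = PCA(n_components=0.95)                           # retain 95% explained variance
X_reduced = pca.fit_transform(X_scaled)

print(pca.n_components_)                           # number of retained components
print(pca.explained_variance_ratio_.cumsum()[-1])  # cumulative variance retained
```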
PCA was chosen over kernel PCA and LDA (Linear Discriminant Analysis) based on results from Goshvarpour et al. [10]. Goshvarpour et al. used wavelets to extract features and then selected features using kernel PCA, PCA, and LDA [10]. Kernel principal component analysis (Kernel PCA) is an extension of PCA that allows for nonlinear dimensionality reduction. While standard PCA is limited to linear transformations, Kernel PCA uses kernel methods to project data onto a higher-dimensional space where linear separations can capture complex, nonlinear relationships in the original data [10]. Linear Discriminant Analysis (LDA) is a supervised dimensionality reduction and classification technique used to separate two or more classes by finding the linear combination of features that maximizes the separation between them [10]. Goshvarpour et al. compared Kernel PCA, LDA, and PCA; PCA was most accurate, with up to 100% accuracy [10].

2.2.4. Classification

The model was tested using ten-fold cross-validation [29]. The dataset was divided into training and validation data; during model development, the model was trained on nine folds and tested on the tenth. This process was repeated ten times, with each fold serving as the test set once, and the results were then averaged. The DEAP model and the new SMCI model with adaptations were both trained and tested using this cross-validation method on the DEAP dataset.
The features extracted from each independent signal (PPG, GSR, and RESP) and the fused features were provided as inputs to nine different machine learning (ML) models that have previously shown success in research identified in the literature [20]. The models encompassed various types of trees (Decision, Random Forest, Gradient Boosting Machines, and AdaBoost), support vector machines (RBF, linear, poly, and sigmoid), and a naïve Bayes model. Decision Trees classify data points using predefined rules. Each node in a Decision Tree represents a prediction based on measurable data features, with branches indicating possible outcomes. A Random Forest model combines multiple Decision Trees, each trained on random subsets of data and features, and makes predictions by averaging the outputs of all trees. In a Random Forest, all trees carry equal weight in the final prediction. Gradient Boosting Machines and AdaBoost are akin to Random Forests but differ in how trees contribute to predictions—each tree’s impact is weighted, influencing the final prediction differently. Support Vector Machines (SVMs) seek an optimal hyperplane to separate data into classes. By mapping data onto a higher-dimensional space, SVMs identify a hyperplane maximizing margin between classes. Different kernel functions (RBF, linear, poly, and sigmoid) map data onto varied spaces. Naïve Bayes is a probabilistic model using Bayes’ theorem to predict class probabilities based on conditionally independent features. The ML model with the best performance accuracy was used as the robust DEAP model for future analyses and provided output pleasure, arousal, and dominance scores from the signals.
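As a hedged sketch, the snippet below compares the nine candidate classifiers with ten-fold cross-validation using scikit-learn defaults; the hyperparameters and placeholder data are assumptions and may differ from those used in the study.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

candidates = {
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "SVM-rbf": SVC(kernel="rbf"),
    "SVM-linear": SVC(kernel="linear"),
    "SVM-poly": SVC(kernel="poly"),
    "SVM-sigmoid": SVC(kernel="sigmoid"),
    "Naive Bayes": GaussianNB(),
}

X = np.random.rand(320, 54)             # placeholder: PCA-reduced fused features
y = np.random.randint(1, 10, size=320)  # placeholder: nine-category pleasure scores

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=10)  # ten-fold cross-validation
    print(f"{name}: {scores.mean():.2%}")
```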

2.2.5. Evaluation

Affective computing researchers usually decrease model complexity by grouping the output PAD scores (nine categories in the present study) into two or three categories. The present research tested different levels of complexity. First, all ML models were tested with the cross-validation method described above to find the most accurate at identifying pleasure, arousal, and dominance scores across nine categories (the SAM self-reported integer options). A three-category version was also evaluated by separating the groups into low (1–3), neutral (4–6), and high (7–9) scores. Finally, a two-category version of each ML model that split the data into low (1–5) and high (5–9) scores was evaluated.
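A minimal sketch of these category groupings is shown below; because the two-category ranges quoted above overlap at a score of 5, the example treats 5 as “low”, which is an assumption.

```python
def to_three_categories(score):
    """Collapse a 1-9 SAM score into low (1-3), neutral (4-6), or high (7-9)."""
    if score <= 3:
        return "low"
    if score <= 6:
        return "neutral"
    return "high"

def to_two_categories(score):
    """Collapse a 1-9 SAM score into low or high; a score of 5 is treated as low
    here, since the ranges quoted in the text overlap at 5."""
    return "low" if score <= 5 else "high"

print([to_three_categories(s) for s in range(1, 10)])
print([to_two_categories(s) for s in range(1, 10)])
```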
In addition to reducing the categories of scores in the arousal and pleasure dimensions, many researchers do not model dominance, the third dimension, further decreasing the complexity of models and improving the results. However, removal of the dominance axis is inappropriate [15], so the dominance component was included for all models in this study.
The DEAP dataset consists of two sub-studies: an online dataset in which participants self-reported both PAD scores and an emotional word (from a list of 16), and an experimental study in which participants’ physiological signals were collected and they self-reported only PAD scores. Once the PAD score is obtained using the ML model, the results can be mapped onto words that represent emotions. The current approach maps the complete PAD score and links those PAD scores to six words that represent emotions. These six words were chosen as a result of a mapping evaluation among datasets that included both PAD scores and words that represent emotions [15].
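A hypothetical sketch of this final mapping step is shown below: a predicted PAD triple is assigned to the nearest of six emotion-word cluster centers. The numerical centers are placeholders chosen for illustration; the actual clusters are defined by the EI model [15].

```python
import numpy as np

# Placeholder cluster centers in (pleasure, arousal, dominance) space; the
# actual EI model [15] derives its own clusters from self-reported data.
EMOTION_CENTERS = {
    "happiness": (7.6, 6.2, 6.6),
    "sadness":   (2.4, 3.1, 3.3),
    "fear":      (2.2, 6.9, 2.7),
    "anger":     (2.5, 7.2, 5.9),
    "surprise":  (6.0, 7.0, 4.5),
    "disgust":   (2.8, 5.1, 5.5),
}

def pad_to_emotion(pad_scores):
    """Assign a PAD triple to the nearest emotion-word cluster center."""
    pad = np.asarray(pad_scores, dtype=float)
    return min(EMOTION_CENTERS,
               key=lambda name: np.linalg.norm(pad - np.asarray(EMOTION_CENTERS[name])))

print(pad_to_emotion((8, 6, 7)))  # -> "happiness" with these placeholder centers
```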
Using the physiological signals collected from typically developing individuals (the DEAP dataset with PAD scores from auditory stimuli), and from persons with SMCIs (the BDAT dataset with IAPS PAD scores), the accuracy of identified emotions was evaluated.

3. Results

Each model, the DEAP model and the adapted SMCI model, was evaluated using the five steps shown in Figure 1. The accuracies of the various machine learning models that mapped physiological signals to pleasure, arousal, and dominance scores were compared.

3.1. DEAP Model

3.1.1. Pre-Processing

The DEAP dataset included data that had already been pre-processed and reordered, with downsampling to 128 Hz. The first three seconds of each trial were removed.

3.1.2. Feature Extraction

Discrete wavelet transforms were used to extract different frequency sub-bands from the DEAP database’s PPG, GSR, and RESP signals. For the PPG and GSR signals, a Daubechies 4 level 8 wavelet was used. For the RESP signal, a symlet 2 level 6 wavelet was evaluated. From each sub-band, 12 features were extracted: entropy, percentiles (5th, 25th, 50th, 75th, and 95th), mean, standard deviation, variance, root mean square value, zero crossing, and mean crossing rate. After the wavelet transform, 108, 108, and 84 features were extracted from the PPG, GSR, and RESP signals, respectively, to create the DEAP ML models.
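The snippet below sketches the computation of these 12 statistics for a single sub-band; the entropy definition (Shannon entropy of a coefficient histogram) and the crossing-count conventions are assumptions, as the paper does not specify them.

```python
import numpy as np
from scipy.stats import entropy as shannon_entropy

def subband_features(coeffs):
    """Return the 12 statistics listed above for one wavelet sub-band."""
    coeffs = np.asarray(coeffs, dtype=float)
    hist, _ = np.histogram(coeffs, bins=32, density=True)
    hist = hist[hist > 0]                                 # drop empty bins before entropy
    signs = np.sign(coeffs)
    centered_signs = np.sign(coeffs - coeffs.mean())
    return [
        shannon_entropy(hist),                            # entropy
        *np.percentile(coeffs, [5, 25, 50, 75, 95]),      # 5th-95th percentiles
        coeffs.mean(),                                    # mean
        coeffs.std(),                                     # standard deviation
        coeffs.var(),                                     # variance
        np.sqrt(np.mean(coeffs ** 2)),                    # root mean square
        int(np.sum(np.abs(np.diff(signs)) > 0)),          # zero crossings
        int(np.sum(np.abs(np.diff(centered_signs)) > 0)), # mean crossings
    ]

features = subband_features(np.random.randn(960))
print(len(features))  # -> 12
```

With nine sub-bands from the level 8 decomposition and seven from the level 6 symlet, this yields 9 × 12 = 108 and 7 × 12 = 84 features per signal, matching the counts reported above.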

3.1.3. Feature Selection

A principal component analysis was performed on all the input features from each signal independently as well as on a combined model. Using the PCA decreased the time to fit the model. For example, the time to process using the Gradient Boosting model decreased by 26 s (the largest differential).
When applying the PCA to each of the signals independently, 21, 24, and 14 features were selected for the PPG, GSR, and RESP signals, respectively. Applying the PCA independently to each of the signals and then summing the total resulted in 59 selected features. However, when the PCA was applied to all three signals combined, 54 features were identified (Table 2). This could indicate that the combined PCA removed some of the features that were already identified within another principal component.

3.1.4. Classification

Although nine different ML models were compared, only the results from the models with the highest accuracy (and lowest fitting time) for each signal or grouping of signals are shown in Table 3. These models were trained using 60 s of DEAP data and used a PCA for feature identification. There was a slight improvement when using multiple signals as compared to the independent signals of PPG, GSR, and RESP (23% accuracy as compared to 21%). Overall, the SVM-rbf had the best accuracy along with the lowest fitting times, while the Gradient Boosting and SVM-poly models had the worst fitting times and inferior accuracies.

3.1.5. Evaluation—DEAP Model

The output accuracy of the scores for pleasure, arousal, and dominance for each of the ML models and each level of complexity were calculated (Table 4 reports the accuracy of the best ML model for the pleasure, arousal, and dominance scores relative to the level of complexity). For levels of complexity of nine, three (high, neutral, and low), and two (high and low) categories of PAD scores, to ensure the model was performing above chance, an accuracy higher than 11%, 33%, and 50%, respectively, was required. All the models outperformed chance, and the most accurate ML model was the SVM-rbf in all cases except the two-category arousal. However, the accuracy was not sufficient for identifying the complexities of emotions, especially across nine categories. Using nine categories, the accuracies were 20%, 21%, and 24% for pleasure, arousal, and dominance, respectively.
To apply this research to emotion recognition, the mapping of PAD scores onto emotions is an important output. The EI model described earlier [15] was used to map the participants’ self-reported PAD scores onto the terminology that represents emotions. The physiological signals were used to identify the PAD scores, which were then mapped onto emotions. These results were compared to the EI model. If the output emotion from the physiological signal matched the output emotion from the EI model, it was considered accurate. Using the DEAP dataset with the TD participants, the model correctly predicted the emotion with an accuracy level of 24%.

3.2. SMCI Model Adaptations

Once a reliable model was developed using data from members of the typically developing population (DEAP dataset), it could also be applied to data collected from a population of persons with SMCIs. Four of the participants with SMCIs were able to use an alternative communication device to provide a self-reported emotion word to express how they were feeling after each image was shown. This provided the ground-truth data for the model.
The same experimental protocol used for the DEAP dataset could not be replicated with participants who had SMCIs. Changes to the data collection were required, and the duration of the trials was reduced. As a result, the DEAP model had to be modified to account for these differences (Figure 2). The following changes to the DEAP model were implemented (and are discussed further below): the removal of the RESP signal, the cleaning of the BDAT data using NeuroKit (version 2), a reduction in the duration of the data samples (the BDAT data were collected for only 6 s, while the DEAP data were collected for 60 s), and the use of level 5 wavelets instead of level 8 wavelets. Instead of examining the various model complexities with different numbers of categories, only the nine-category models were evaluated, and the emotion word predicted from the physiological signals was compared to the DEAP equivalent.
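These adaptations can be summarized as a small configuration comparison; the dictionary structure and key names below are illustrative, not the authors’ code.

```python
# Settings reported for the original DEAP model.
DEAP_MODEL = {
    "signals": ["PPG", "GSR", "RESP"],
    "wavelet_level": 8,
    "training_segment_s": 60,   # full 60 s DEAP trials at 128 Hz
}

# Settings reported for the adapted SMCI model.
SMCI_MODEL = {
    "signals": ["PPG", "GSR"],  # RESP not collected by the Gazepoint system
    "wavelet_level": 5,         # level 8 incompatible with the shorter segments
    "training_segment_s": 12,   # 12 s DEAP segments for training
    "test_segment_s": 6,        # 6 s BDAT trials (60 Hz) for testing
}
```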

3.2.1. Pre-Processing

When creating the adapted SMCI model, the DEAP data were used for training and testing; however, the RESP signal was removed. The Gazepoint system used for the data collection of the participants with SMCIs did not collect the respiration data. The average baseline GSR was also subtracted from all the values to normalize the data and keep them within the same range as the DEAP GSR data.
The data were cleaned using the NeuroKit process function. This cleaning step was applied to the pre-processed DEAP data in addition to the BDAT data. The additional cleaning phase decreased the training accuracy by less than one percent for each of the pleasure, arousal, and dominance values relative to the original DEAP model.
Since the stimuli in the BDAT dataset were only shown for six seconds, an adaptation to the DEAP model was required to decrease the duration of the training trials. However, as the duration decreased, the number of unique outputs for validation also decreased. For the DEAP data, NeuroKit could not process training segments shorter than 11 s due to the limited number of peaks. Thus, the SMCI model was trained on 12 s trials from the DEAP dataset, while the BDAT test set for persons with SMCIs used 6 s trials (NeuroKit was able to process the original data of over 6 s from the SMCI participants).
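A minimal sketch of this duration handling is shown below; the trimming function and placeholder arrays are assumptions for illustration, with sampling rates taken from the text (128 Hz for DEAP, 60 Hz for BDAT).

```python
import numpy as np

def first_seconds(signal, sampling_rate_hz, seconds):
    """Keep only the first `seconds` of a trial."""
    return np.asarray(signal)[: int(sampling_rate_hz * seconds)]

# DEAP training trials: 60 s at 128 Hz, trimmed to 12 s segments.
deap_trial = np.random.rand(60 * 128)            # placeholder trial
deap_training_segment = first_seconds(deap_trial, 128, 12)

# BDAT test trials: 6 s at 60 Hz, used in full.
bdat_trial = np.random.rand(6 * 60)              # placeholder trial
bdat_test_segment = first_seconds(bdat_trial, 60, 6)

print(len(deap_training_segment), len(bdat_test_segment))  # -> 1536 360
```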

3.2.2. Feature Extraction

When extracting features with the adapted model and the SMCI data, the Daubechies 4 level 8 wavelet was not compatible. The highest wavelet transform that could be run was a level 5 wavelet. As a result, the adapted model for SMCIs used a Daubechies 4 level 5 wavelet for the PPG and GSR signals. For the PPG and GSR signals, the number of features extracted with level 5 wavelets was 72 for each.

3.2.3. Feature Selection

No significant changes were made in the PCA parameters for the adapted model, and the PCA was refit with a 95% explained variance threshold. Before the PCA, there were 144 features and after the PCA there were 16 features.

3.2.4. Classification

The nine-category classification using the adapted SMCI model was trained and tested using the DEAP data before the BDAT dataset was used for testing. The DEAP model was compared to the SMCI model on the same DEAP training data, with a maximum difference of 2% in accuracy between the two models. The pleasure and dominance results of the SMCI model were similar in accuracy to those of the original DEAP model (Table 5).

3.2.5. Evaluation—BDAT Model

The SMCI model, using physiological signals from participants with SMCIs, was combined with the EI model to identify a word that represented an emotion. This was validated against the word provided by the participants with SMCIs. The accuracy of identifying an emotion from six potential outcomes was 14.81% (less than chance, one-sixth or approximately 16.7%).
The most likely explanation for the poor outcomes is the data collection issues. Of the 10 participants (9 SMCIs and 1 TD), only 5 had valid (continuously recorded data throughout the trial) GSR and HR data values for at least 1 image stimulus (out of a potential 30 images). SMCI01 and SMCI07 each had valid data for only 1 image, while SMCI08 had valid data for 13 images, and SMCI09 had valid data for 9 images. The typically developing participant (TDM01) had valid data for two images. Therefore, 84.62% of the valid data came from two participants (SMCI08 and SMCI09). The model only predicted the correct emotions for two participants: TDM01 and SMCI08. The accuracy of predicting the emotion was 50% for TDM01 and 23.08% for SMCI08. The small amount of data available for validation may have led to overfitting, though the DEAP dataset was used for training while the BDAT set was used for the validation phase.
Out of the 300 IAPS image presentations to the participants with SMCIs, 20 different image stimuli had valid GSR and HR data values that could be used to predict emotions. While the protocol had been tested and checked, it became evident that flexibility in data collection was required when working with participants with disabilities. From the twenty images with usable data, six were viewed by two participants, while fourteen were viewed by only one. Of these, two image stimuli resulted in physiological signals that were mapped onto the intended emotion with 100% accuracy, while another two resulted in 50% accuracy. The other sixteen image stimuli resulted in physiological signals that could not be correctly mapped onto an emotion.
The model was further tested on only the “good data”. If the model could not predict the emotions of a specific participant, that participant’s data were removed. The accuracy of the model improved from 14.81% to 33.33%. It was evident that the limited number of image stimuli that had valid data and the small number of participants with valid GSR and HR data were the main factors contributing to the lower accuracy of the model.

4. Discussion

The focus of this study was to develop a model that could predict emotion from the physiological signals of children with SMCIs. Based on the DEAP model, data from typically developing individuals could be used to identify emotions with results greater than chance. However, when the model was applied to data collected from children with SMCIs, the results were less convincing. Only in one case, SMCI08, were the emotional outcomes similar in accuracy to the DEAP dataset.
During our study of the data from participants with SMCIs, their emotional responses were found to be difficult to assess due to the complexity of mixed and base emotions. A component of this may have also been influenced by the natural ability to hide or suppress emotion. A few participants tried to be explicit in the expression of their emotions so that their caregivers could easily interpret their feelings, while others chose to artificially express their feelings (based on discussions with the caregivers during breaks in the testing). Though some of the children with SMCIs did not pay any attention to our attempts to assess their emotions (in conjunction with their caregivers), some were able to focus and provide an accurate description of their feelings. If participants were given the opportunity to wear a device in a longer-term study, the novelty of “tricking the system” would diminish. This would enhance the ability to record data that are more meaningful from a research perspective as the system would be a part of their regular environment. This would lead to fewer attempts to hide emotions.
This study used the SAM tool, which facilitates non-verbal interaction; however, the participant does require communication skills to navigate this method, and the responses are based on a cognitive understanding of emotion. Though self-reporting is essential to understanding emotion, it relies on participants being able to comprehend their emotions fully and reliably and to successfully communicate these feelings to researchers [20]. The SAM tool can add an additional barrier, as participants have to be able to understand the nuanced components of pleasure, arousal, and dominance without prior experience. We found that using words attributable to emotions was more attainable for some of the children with SMCIs, especially when they had difficulty with the PAD concepts.
Some of the participants could not hold their hand steady enough to keep the biophysiological recording device secure, which prevented the recording of accurate signals and reduced the amount of data collected. A device that attaches securely to the participant’s torso or larger limbs could be more beneficial than a hand sensor for collecting accurate data. Better data from more participants with disabilities would enhance the accuracy of the model and allow it to be more generalizable. Following the suggestions of Collins et al. [27] would enhance this data collection process.
When training the model, different training durations were evaluated. Based on the standard IAPS protocol, images should be shown for six seconds, which was implemented in this study for the participants with SMCIs. However, Verduyn et al. [32] found that different emotions have different durations. It is possible that different time windows may allow for the better identification of different emotions. Since this study was focused on the six basic emotions, it is possible that the duration did not allow for accurate identification. Verduyn et al. [32] also found that women feel emotion for a longer time period than men. Since the DEAP and SMCI models were not person-specific, nuances in emotional response were not evaluated and may be considered in future work.
Another aspect that could affect accuracy when detecting emotions is that the importance of an event also influences how long an emotion is felt. Verduyn et al. [32] found that different training durations were better or worse depending on the duration of the signal evaluated. A song (which was used to collect the DEAP dataset) may elicit an emotion for longer than the IAPS images used in the BDAT dataset. This suggests that the stimulus used in training the model should be the same as the stimulus used for testing. Additionally, when training a model for everyday use, more environmentally suitable stimuli are required.
Kim and Andre [33] found that a person-dependent model is more accurate than an independent model for identifying emotions (using a pseudoinverse linear determinant analysis). Person-dependent models could rely on data collected over longer timeframes and that are updated based on prior experiences. Given the diversity of the population of persons with disabilities, it is more appropriate to consider a person-specific model that focuses on one individual and updates the model as it learns from changes in their physiological signals and emotions over time. By developing person-dependent models, rather than having a generalized model that applies to the typically developed population, individualized algorithms would be able to learn and develop based on the individuals for whom they were designed.
This study looked at basic emotions since initial responses are thought to be linked to neurological processes. Some words that are used for emotional expression are similar and can be mapped onto a single word, but there are some emotion words that can contain a combination of two or more basic emotions. One participant, who responded using her own assistive communication device that her mother had programmed, was limited to only the six basic emotion words. The participant very distinctly identified first surprise and subsequently fear for the same image. This furthers the idea that there is not one expression that can explicitly identify surprise but two: one is a happy surprise such as a surprise birthday party, and one is a fearful surprise like seeing a shark in the water. This is also consistent with a review by Saganowski et al., who identified that the meaning of some discrete emotions is unclear, in that case referring to the differentiation between anger and frustration [20].
To the best of our knowledge, no researchers have attempted to model all nine levels of emotional response from the PAD scores, and most researchers have not attempted to model dominance. The choice to include all the levels for pleasure, arousal, and dominance was undertaken as a first step in the design of a system that could provide information about the emotional expression of an individual who has severe motor and communication impairments. The expansion of the research to include all aspects of emotion, rather than only positive and negative, identifies the importance of the pursuit towards more useable and applicable measures of emotion.

5. Conclusions and Future Work

This study demonstrates the feasibility of using a machine learning model to identify emotions in children with severe motor and communication impairments (SMCIs) by analyzing their physiological signals. By adapting a model developed for typically developing participants from the DEAP dataset, we created a version suitable for SMCI participants using the BDAT dataset. Key adaptations, such as removing the respiratory signal and training the model on shorter duration segments, allowed for comparable accuracy levels to be identified. The model performed better than chance, suggesting the potential for future refinement and application in healthcare settings.
Future work should focus on expanding the dataset for SMCI participants, integrating additional signals such as ECG for improved accuracy, and developing personalized models to account for individual variations. Incorporating the dominance dimension and recognizing a broader range of emotions could further enhance the model’s utility. This research highlights the potential of machine learning to improve the understanding and support of the emotional well-being of children with complex communication needs.
Data that include children with SMCIs are required to better understand how their physiological signals may be the same or differ from the typically developing population. Some research has indicated that heart, skin, and lung activity differ in populations with diseases, but the specific physiological differences have not been described relative to the various populations of persons with disabilities. This severely limits the knowledge base about how these sensors function for populations of participants with neurological disabilities, such as SMCIs.
Future studies could also introduce the collection of data from different sensor placements to identify whether placement also affects the data relative to typically developing populations. If the GSR could be optimized for other locations on the body, there may be opportunities for the collection of longer-term data. Placing the sensors on the hand creates difficulties, especially for persons with spastic movements or discomfort. Though ECG is not currently used often in daily applications for collecting HR data, newer models of smartwatches will make these data more accessible in the future, and they should be used for assessing emotion.
The development of a device that could detect the emotions of children with SMCIs and provide that information to an unfamiliar person, such as a doctor, a nurse, or even a friend at school, could greatly enhance the responses of these communication partners and allow them to better cater to the needs of the child.

Author Contributions

Conceptualization, C.V. and T.C.D.; methodology, T.C.D.; software, C.V. and K.P.; validation, C.V.; formal analysis, C.V.; data curation, C.V.; writing—original draft preparation, C.V. and K.P.; writing—review and editing, T.C.D.; supervision, T.C.D.; project administration, C.V. and T.C.D.; funding acquisition, T.C.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada [NSERC RGPIN 2016-04669 and RGPIN 2023-05354] and a CREATE grant [497303-2017].

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by Queen’s University Health Sciences and Affiliated Teaching Hospitals’ Research Ethics Board (HSREB) on 24 August 2022 (MECH-76-22).

Informed Consent Statement

Informed consent was obtained from the parents of the participants involved in this study while the participants provided their informed assent.

Data Availability Statement

All the data from the development of the DEAP model are available online [17]. Permission was obtained to collect physiological data from parents and participants for the BDAT dataset, but not to share them publicly.

Acknowledgments

We would like to acknowledge the support of and data collection assistance from Mackenzie Collins and Sydney van Engelen.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cohen, E.; Kuo, D.Z.; Agrawal, R.; Berry, J.G.; Bhagat, S.K.; Simon, T.D.; Srivastava, R. Children with medical complexity: An emerging population for clinical and research initiatives. Pediatrics 2011, 127, 529–538. [Google Scholar] [CrossRef] [PubMed]
  2. Cohen, E.; Berry, J.G.; Camacho, X.; Anderson, G.; Wodchis, W.; Guttmann, A. Patterns and costs of health care use of children with medical complexity. Pediatrics 2012, 130, e1463–e1470. [Google Scholar] [CrossRef]
  3. Berry, J.G. What Children with Medical Complexity, Their Families, and Healthcare Providers Deserve from an Ideal Healthcare System; Lucile Packard’s Foundation for Children’s Health: Palo Alto, CA, USA, 2015. [Google Scholar]
  4. Pollard, E.L.; Lee, P.D. Child Well-being: A Systematic Review of the Literature. Soc. Indic. Res. 2003, 61, 59–78. [Google Scholar] [CrossRef]
  5. Noyek, S.; Vowles, C.; Batorowicz, B.; Davies, C.; Fayed, N. Direct assessment of emotional well-being from children with severe motor and communication impairment: A systematic review. Disabil. Rehabil. Assist. Technol. 2022, 17, 501–514. [Google Scholar] [CrossRef]
  6. Noyek, S.; Davies, C.; Champagne, M.; Batorowicz, B.; Fayed, N. Emotional Well-Being of Children and Youth with Severe Motor and Communication Impairment: A Conceptual Understanding. Dev. Neurorehabilit. 2022, 25, 554–575. [Google Scholar] [CrossRef] [PubMed]
  7. United Nations. Convention on the Rights of the Child; UN: New York, NY, USA, 1990. [Google Scholar]
  8. Katsis, C.D.; Katertsidis, N.; Ganiatsas, G.; Fotiadis, D.I. Toward Emotion Recognition in Car-Racing Drivers: A Biosignal Processing Approach. IEEE Trans. Syst. Man. Cybern. Part A Syst. Hum. 2008, 38, 502–512. [Google Scholar] [CrossRef]
  9. Yoo, G.; Hong, S. Emotion Evaluation Analysis and System Design of Biosignal. In Proceedings of the 2016 International Conference on Platform Technology and Service (PlatCon), Jeju, Republic of Korea, 15–17 February 2016; pp. 1–4. [Google Scholar]
  10. Goshvarpour, A.; Abbasi, A.; Goshvarpour, A. An accurate emotion recognition system using ECG and GSR signals and matching pursuit method. Biomed. J. 2017, 40, 355–368. [Google Scholar] [CrossRef]
  11. Domínguez-Jiménez, J.A.; Campo-Landines, K.C.; Martínez-Santos, J.C.; Delahoz, E.J.; Contreras-Ortiz, S.H. A machine learning model for emotion recognition from physiological signals. Biomed. Signal Process. Control 2020, 55, 101646. [Google Scholar] [CrossRef]
  12. Choi, K.-H.; Kim, J.; Kwon, O.S.; Kim, M.J.; Ryu, Y.H.; Park, J.-E. Is heart rate variability (HRV) an adequate tool for evaluating human emotions? A focus on the use of the International Affective Picture System (IAPS). Psychiatry Res. 2017, 251, 192–196. [Google Scholar] [CrossRef]
  13. Broekens, J.; Brinkman, W.-P. AffectButton: A method for reliable and valid affective self-report. Int. J. Hum. Comput. Stud. 2013, 71, 641–667. [Google Scholar] [CrossRef]
  14. Zhang, S.; Wu, Z.; Meng, H.M.; Cai, L. Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar. In Modeling Machine Emotions for Realizing Intelligence; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
  15. Vowles, C.J. Development and Testing of a System to Interpret Emotion for Children with Severe Motor and Communication Impairments (SMCI). Ph.D. Thesis, Queen’s University, Kingston, ON, Canada, 2025. [Google Scholar]
  16. Ekman, P. Universal Facial Expressions of Emotion. Calif. Ment. Health Res. Dig. 1970, 8, 151–158. [Google Scholar]
  17. Koelstra, S.; Muhl, C.; Soleymani, M.; Jong-Seok, L.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. DEAP: A Database for Emotion Analysis; Using Physiological Signals. IEEE Trans. Affect. Comput. 2012, 3, 18–31. [Google Scholar] [CrossRef]
  18. Jack, R.E.; Garrod, O.G.B.; Schyns, P.G. Dynamic Facial Expressions of Emotion Transmit an Evolving Hierarchy of Signals over Time. Curr. Biol. 2014, 24, 187–192. [Google Scholar] [CrossRef]
  19. Cowen, A.S.; Keltner, D. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proc. Natl. Acad. Sci. USA 2017, 114, E7900–E7909. [Google Scholar] [CrossRef] [PubMed]
  20. Saganowski, S.; Perz, B.; Polak, A.G.; Kazienko, P. Emotion Recognition for Everyday Life Using Physiological Signals From Wearables: A Systematic Literature Review. IEEE Trans. Affect. Comput. 2023, 14, 1876–1897. [Google Scholar] [CrossRef]
  21. Vowles, C.; Collins, M.C.; Davies, T.C. Assessing Basic Emotion via Machine Learning: Comparative Analysis of Number of Basic Emotions and Algorithms. In Proceedings of the Annual International Conference IEEE Engineering in Medicine and Biology Society, Orlando, FL, USA, 15–19 July 2024. [Google Scholar]
  22. Agrafioti, F.; Hatzinakos, D.; Anderson, A.K. ECG Pattern Analysis for Emotion Detection. IEEE Trans. Affect. Comput. 2012, 3, 102–115. [Google Scholar] [CrossRef]
  23. Zied, G.; Lachiri, Z.; Maaoui, C.; Pruski, A. Emotion recognition from physiological signals using fusion of wavelet based features. In Proceedings of the 7th International Conference on Modelling, Identification and Control (ICMIC), Sousse, Tunisia, 18–20 December 2015. [Google Scholar] [CrossRef]
  24. Goshvarpour, A.; Goshvarpour, A. Evaluation of Novel Entropy-Based Complex Wavelet Sub-bands Measures of PPG in an Emotion Recognition System. J. Med. Biol. Eng. 2020, 40, 451–461. [Google Scholar] [CrossRef]
  25. Health Canada. List of Recognized Standards for Medical Devices; Health Canada: Ottawa, ON, Canada, 2019. [Google Scholar]
  26. Baxter, S.; Enderby, P.; Evans, P.; Judge, S. Interventions using high-technology communication devices: A state of the art review. Folia Phoniatr. Logop. 2012, 64, 137–144. [Google Scholar] [CrossRef]
  27. Collins, M.L.; Vowles, C.; Davies, T.C. Challenges with recruitment, collection, and analysis: A research study on physiological signals and emotional experiences in youth with severe motor and communication impairments. In Beyond Tech Fixes: Towards an AI Future Where Disability Justice Thrives; El-Lahib, Y., El Morr, C., Gorman, R., Eds.; Springer: Toronto, ON, Canada, 2025. [Google Scholar]
  28. Lang, P.J.; Bradley, M.M.; Cuthbert, B.N. International Affective Picture System (IAPS): Affective Ratings of Pictures and Instruction Manual; University of Florida: Gainesville, FL, USA, 2008. [Google Scholar]
  29. Scikit-Learn Developers. User Guide. 2024. Available online: https://scikit-learn.org/stable/user_guide.html (accessed on 26 February 2025).
  30. Makowski, D.; Pham, T.; Lau, Z.J.; Brammer, J.C.; Lespinasse, F.; Pham, H.; Schölzel, C.; Chen, S.A. NeuroKit2: A Python toolbox for neurophysiological signal processing. Behav. Res. Methods 2021, 53, 1689–1696. [Google Scholar] [CrossRef]
  31. Holbrook, R.; Cook, A. Principal Component Analysis: Discover New Features by Analyzing Variation. 2025. Available online: https://www.kaggle.com/code/ryanholbrook/principal-component-analysis/data (accessed on 15 August 2024).
  32. Verduyn, P.; Delvaux, E.; Van Coillie, H.; Tuerlinckx, F.; Van Mechelen, I. Predicting the duration of emotional experience: Two experience sampling studies. Emotion 2009, 9, 83–91. [Google Scholar] [CrossRef]
  33. Kim, J.; Andre, E. Emotion Recognition Based on Physiological Changes in Music Listening. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 30, 2067–2083. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Algorithm development.
Figure 2. Modifications to the DEAP model when adapting for participants with SMCIs (the colours match the process steps identified in Figure 1).
Table 1. Datasets overview.

                 DEAP Dataset               BDAT Dataset
Stimuli          40 music videos            30 images selected from IAPS
Signals          PPG, GSR, and RESP         PPG, GSR
Output           PAD                        Emotion
Participants     32 typically developing    9 SMCIs, 1 TDM
Table 2. PCA number of features extracted.

Physiological Sensors     Features    PCA
PPG                       108         21
GSR                       108         24
RESP                      84          14
TOTAL (individually)      300         59
ALL                       300         54
Table 3. Comparative accuracy of ML models based on each signal or combination of signals.

Signal(s)             Score      Pleasure    Arousal       Dominance        PAD Average
PPG                   Model      SVM-rbf     SVM-linear    SVM-rbf
                      Accuracy   21%         21%           23%              21%
GSR                   Model      SVM-rbf     SVM-rbf       SVM-rbf
                      Accuracy   20%         22%           20%              21%
RESP                  Model      SVM-poly    SVM-linear    Random Forest
                      Accuracy   20%         21%           23%              21%
PPG, GSR, and RESP    Model      SVM-rbf     SVM-poly      SVM-rbf
                      Accuracy   19%         21%           22%              21%
PPG and GSR           Model      SVM-rbf     SVM-rbf       SVM-rbf
                      Accuracy   20%         22%           26%              23%
Dimension Average     Accuracy   20%         21%           24%              22%
Table 4. Comparison of complexity (categories) of all three signals when fused (PPG, GSR, and RESP).

# of Categories    Pleasure                Arousal                      Dominance
                   Model       Acc.        Model            Acc.        Model       Acc.
2                  SVM-rbf     56%         Random Forest    60%         SVM-rbf     61%
3                  SVM-rbf     47%         SVM-rbf          49%         SVM-rbf     54%
9                  SVM-rbf     20%         SVM-rbf          21%         SVM-rbf     24%
Table 5. Comparison of models with different durations.

              DEAP Model (Level 8 Wavelets, 60 s Duration)    SMCI Model with Changes (Level 5 Wavelets, 12 s Duration)
              Model            Accuracy                       Model        Accuracy
Pleasure      Random Forest    20%                            SVM-rbf      20%
Arousal       SVM-rbf          21%                            SVM-poly     19%
Dominance     SVM-rbf          22%                            SVM-rbf      22%