
2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

A Machine Learning Emotion Detection Platform to Support Affective Well Being

Michael Healy, Department of Computer Science, Cork Institute of Technology, Cork City, Ireland, Michael.healy2@mycit.ie
Ryan Donovan, Department of Computer Science, Cork Institute of Technology, Cork City, Ireland, Brendan.donovan@mycit.ie
Paul Walsh, Department of Computer Science, Cork Institute of Technology, Cork City, Ireland, paul.walsh@cit.ie
Huiru Zheng, Department of Computer Science, Ulster University, Antrim, Northern Ireland, h.zheng@ulster.ac.uk

Abstract- This paper describes a new emotion detection system based on a real-time video feed. It demonstrates how a bespoke machine learning support vector machine (SVM) can be utilized to provide quick and reliable classification. The features used in the study are 68-point facial landmarks. In a lab setting, the application has been trained to detect six different emotions by monitoring changes in facial expressions. Its utility as a basis for evaluating the emotional condition of people using video and machine learning is discussed.

Keywords: Affective Computing; Machine Learning; Emotion Detection

I. INTRODUCTION

Emotions are an integral part of experiencing the world. Functioning emotions help us to perceive, think, and act correctly. The crucial role of emotions in general well-being becomes self-evident when they become dysfunctional. Consider that one of the main aims of psychotherapy is to help people deal with difficult emotions [1]; that the likelihood of experiencing psychopathology has been linked to the tendency to experience extreme levels of emotion [2]; and that our ability to make seemingly innocuous, everyday choices, such as what clothes to wear, becomes impaired if the areas of the brain related to emotions are damaged [3]. This latter example of dysfunction is of particular concern as people increasingly live longer and, as a result, become more susceptible to neurodegenerative diseases such as dementia.

The world's population is aging. In 2017, 13% of the general population were aged 60 or over, and some estimates expect this percentage to double by 2050 [4]. As people get older, their likelihood of developing dementia sharply increases [5]. As dementia becomes more prevalent, the need to deal with its negative consequences becomes more pertinent. People with dementia (PwD) tend to suffer from a variety of affective problems, which can damage their cognition, relationships, and general well-being [6]. Examples of these affective problems are difficulty in managing emotions, difficulty in articulating and expressing emotions, and increased levels of agitation and frustration. Furthermore, PwD are also at an increased risk of suffering from debilitating affective conditions such as depression, which can further damage their quality of life. In order to understand how to manage these problems effectively, researchers need to be able to measure emotions accurately [7].

To measure a phenomenon we first need to describe it [8]. Yet despite the fact that scholars from multiple perspectives, as far back as Plato, have sought to explain emotions, nobody has yet provided an agreed-upon definition. Even folk conceptions cannot be relied on, as people differentiate emotions based on their raw conscious experience of those emotions [9]. This method of subjective introspection is unsuited to objective scientific categorization. As one prominent affective neuroscientist wrote: "Unfortunately, one of the most significant things ever said about emotions may be that everyone knows what it is until they are asked to define it" [10]. Hence, the essence of emotions remains unclear.

This inability to define emotions has encouraged more systematic research. The component viewpoint, for example, aims to identify the physical patterns that accompany or underlie the experience of an emotion and what causes such responses [11]. These patterns range from neuronal activity in the central nervous system and facial expressions that accompany emotions, to general changes in behavior (e.g., fist clenching and a rising tone of voice while angry). Under this viewpoint, emotions are a set of sub-cortical goal-monitoring systems.

This fits neatly with the basic emotion approach, which separates emotions based on their ability to produce consistent yet particular forms of appraisal, action preparation, physiological patterns, and subjective experience [12]. These emotions are considered basic in the sense that they are deeply rooted adaptations that helped our ancestors navigate their social environments. Currently, there are six proposed universal basic emotions: Joy, Fear, Disgust, Anger, Surprise, and Sadness. There also exists a 'higher' level of emotions that are more mediated by socio-cognitive factors (e.g., shame). One of the main characteristics that distinguishes basic emotions from these latter forms is the presence of universal signals, such as facial expressions. Based on this view, facial expressions offer researchers a way to measure at least a subset of key emotions.

For this view to be correct, at least three strands of evidence are required. First, facial expressions signaling emotions are universal. Second, facial expressions are a valid or 'honest' signal of underlying emotions. Third, we can reliably decipher emotional expressions. On the first point, although there is still some debate on this issue, there is independent research indicating that facial emotional expressions are consistent cross-culturally [13].



On the second point, in their review of the facial expression literature, Schmidt & Cohn [14] came to the conclusion that facial expressions can be honest expressions of emotions. The main piece of evidence for this is that faking emotional expressions is too cognitively demanding to maintain repeatedly. Similarly, those who do repeatedly fake emotional expressions are more likely to be considered as having duplicitous motives. Therefore, cognitive limitations and social pressures encourage honest emotional signaling via facial expressions. On the third point, which will be the remaining focus of this article, advances in affective computing have given researchers the means to quickly and accurately decipher emotional expressions.

Affective computing is an emerging field that attempts to model technology to detect, predict, and display emotions with the goal of improving human-computer interactions [15], [16]. One example of affective computing in action is the SenseCare project, which aims to integrate multiple methods of emotion detection in order to provide objective insight into people's well-being [17], [18]. Another example is the SliceNet project (https://5g-ppp.eu/slicenet/), which aims to detect patient demeanour over 5G networks in order to monitor patients in ambulances. In this paper, we present the development of the SliceNet Emotion Viewer (SEV), a real-time video-based emotion detection application. A Support Vector Machine (SVM) classifier is used to detect emotional expressions. The proposed SEV can potentially be used to detect emotions in a variety of clinical settings, including ambulances.

II. STATE OF THE ART

There have been a number of different approaches to detecting emotion from facial data. A team at Jadavpur University [19] proposed a way of detecting emotion by monitoring regions of the face, such as the mouth and eyes, and applying fuzzy logic. Using this technique the authors achieved an accuracy of 90%. Research conducted by Philipp Michel adopted an SVM approach to classifying emotions [20]. The authors measured the displacement of particular facial regions between a neutral and a peak video frame. This displacement was used as an input parameter to a machine learning algorithm and an accuracy of 86% for still images and 71.8% for video streams was achieved. Other researchers focused on using the Particle Swarm Optimization (PSO) algorithm to detect facial muscle movements, known as action units, on the human face [21], which obtained an average success rate of 85%.

Much of the existing literature focuses on different feature extractions and machine learning methods to achieve a high accuracy. There has been little or no development of software tools that utilise this research to assist people in the areas of health, well-being, and emotional functioning. Our aim is to develop a system that is capable of analysing emotional data in real-time from either a live camera stream or a pre-recorded source such as Youtube. Given the evolving and dynamic nature of machine learning, the system shall be designed in such a way that the models used can easily be replaced with new or updated models, without needing a code change to the system. Finally, we aimed to overcome the problem of the lack of data available to produce machine learning models by leveraging existing libraries that can take data from multiple different sources and combine them to produce the initial model used in the system. Thus the SEV emotion viewer was developed to analyse a live or pre-recorded video stream while outputting emotion detection information in real-time.

III. METHODOLOGY

The SEV emotion viewer was developed as a web service. Video data is streamed to the SEV using a standard PC or laptop webcam. The results of the emotion detection are fed back in real-time in an HTML webpage format. Figure 1 illustrates a high-level flow of events for the system.

Figure 1 High-Level Flow of Events in SEV

The various methods and implementations that were used to create the system are detailed below.

1. Machine learning: Machine learning is a branch of artificial intelligence that gives computers the ability to learn without being explicitly programmed. It was summarised by Tom M. Mitchell as follows: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E" [22]. Machine learning algorithms are used in a wide range of applications and problems. Some modern uses of machine learning include spam filtering, speech recognition, facial recognition, document classification, and natural language processing. There are many categories of machine learning applications; the most commonly used are clustering, regression, and classification. A classification problem is a form of supervised learning which aims to assign predetermined labels to unseen data based on previous training examples. In this research, we apply a classification model to detect an emotion from facial expressions.

To assist with the complex mathematics of creating a classification model we used a Support Vector Machine (SVM). The SVM has been used in supervised learning to assist with the generation of a model by using built-in algorithms to find the optimal hyperplane. The hyperplane is the largest separation of the classes from the training examples. New unseen examples are added to the same space and their class is predicted based on which side of the gap they fall. The SVM used is LIBSVM [23]. LIBSVM is an integrated software library for support vector classification, regression, and distribution estimation. It also has support for multi-class classification, which enables the algorithm to compare the given data to multiple classes and is useful when attempting to classify multiple emotional states. LIBSVM was originally written in C but now has support for a wide range of programming languages such as Java and Python. Details of the parameters used can be found in [24].
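To illustrate this classification step, the following is a minimal sketch of training a multi-class emotion classifier on landmark-distance feature vectors. It uses scikit-learn's SVC, which wraps LIBSVM internally; the input files and label ordering are hypothetical, and the C and gamma values simply mirror the grid-search result reported in Section IV rather than reproducing the SEV's exact configuration (see [24] for the actual parameters).

```python
# Minimal sketch: multi-class emotion classification with an RBF-kernel SVM.
# scikit-learn's SVC wraps LIBSVM; the feature and label files are hypothetical.
import numpy as np
from sklearn.svm import SVC

EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

# X: one row per training image, one column per landmark-distance feature.
# y: integer emotion labels (indices into EMOTIONS).
X = np.load("distance_features.npy")
y = np.load("emotion_labels.npy")

# C and gamma here mirror the grid-search result reported later in the paper;
# scikit-learn handles the multi-class case internally (one-vs-one).
clf = SVC(kernel="rbf", C=64, gamma=0.5)
clf.fit(X, y)

# Predict the emotion for a new, unseen feature vector.
probe = X[:1]
print(EMOTIONS[int(clf.predict(probe)[0])])
```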

2. Datasets Used: In order to create machine learning models for the Emotion Viewer application, training data was taken from the Cohn-Kanade (CK+) database [25] and the Multimedia Understanding Group (MUG) database [26]. Both databases contain images of people in lab environments displaying Ekman's six basic emotions [12].

Table 1 Datasets Under Study

Name   Subjects   Images   Emotions
CK+    123        593      6
MUG    46         1496     6

3. Feature Extraction and Selection: There are many different features which can be extracted from a human face to be used as inputs to machine learning algorithms. In order to analyse in real-time, it is necessary to leverage existing methods for extracting features of a human face. One of these methods is to use facial landmarks. Facial landmarks are defined as the detection and localisation of certain key points on a human face. They are also known as vertices or anchor points. There are various annotation styles available for detecting landmark points. The SEV uses the 68-point facial landmark style created by Multi-PIE [27], as illustrated in Figure 2.

Figure 2 Multi-PIE 68-point mark-up

The Multi-PIE mark-up is applied to both the training data and unseen data. It also allows the use of multiple training databases which do not use compatible mark-ups in their own annotations of the images. Using these facial landmarks, the classification features were chosen to be the Euclidean distance from each point to every other point, forming a mesh-like structure where each straight-line distance represents a parameter. The Euclidean distance is calculated using the straight-line formula (eq. 1):

D_{12} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}    (1)

where (x_1, y_1) are the coordinates of the first landmark, (x_2, y_2) are the coordinates of the second landmark, and D_{12} is the straight-line distance between them.

Figure 3 Sample of parameter inputs to the SVM classification model

IV. RESULTS & EXPERIMENTATION

This section of the paper covers the training of the machine learning model, the graphical interface of the application, and its run-time operations, and presents examples of some ongoing experiments currently using this software.

Using the distances between all 68 landmark points results in 2278 features to be input to the SVM. This is intensive for a CPU to compute and could result in slow performance when classifying multiple images in a real-time scenario. To maximise the accuracy and efficiency of the model, we utilised techniques outlined by Yi-Wei Chen and Chih-Jen Lin [28]. Following the steps outlined in the referenced paper, we performed an F-score analysis to calculate the discriminatory power of each feature, which ranks all the features by their discriminatory power values. Next, the 17 highest-scoring features were used in a grid search to find the hyperparameters "C" and "γ" used in the SVM algorithm. The output of the grid search is the optimal accuracy that could be achieved using these features as training data. This step was repeated 8 times, each time doubling the number of features used (in descending order from highest scoring).
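As a concrete illustration of this feature construction, the sketch below computes the full set of pairwise Euclidean distances between the 68 landmark points, which for n = 68 gives n(n-1)/2 = 2278 features per face, and then ranks them by a univariate score. The landmark and label files are hypothetical, and scikit-learn's ANOVA F-test is used only as a stand-in for the F-score procedure of Chen and Lin [28] that the SEV actually follows.

```python
# Sketch: build the 2278 pairwise-distance features from 68 (x, y) landmarks
# and rank them by discriminatory power. The ANOVA F-test below is a stand-in
# for the F-score procedure of Chen and Lin [28]; file names are hypothetical.
import numpy as np
from itertools import combinations
from sklearn.feature_selection import f_classif

def distance_features(landmarks):
    """landmarks: array of shape (68, 2) -> vector of 68*67/2 = 2278 distances."""
    pts = np.asarray(landmarks, dtype=float)
    return np.array([np.linalg.norm(pts[i] - pts[j])        # eq. (1)
                     for i, j in combinations(range(len(pts)), 2)])

landmark_sets = np.load("landmarks.npy")        # shape (n_images, 68, 2)
labels = np.load("emotion_labels.npy")          # shape (n_images,)

X = np.vstack([distance_features(lm) for lm in landmark_sets])
print(X.shape)                                  # (n_images, 2278)

# Rank features by score and keep the top k (17, 35, 71, ... in the paper).
scores, _ = f_classif(X, labels)
top_k = np.argsort(scores)[::-1][:569]          # e.g. the 569 best-scoring features
X_selected = X[:, top_k]
```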

Figure 4 Accuracy versus number of features (testing accuracy for 17, 35, 71, 142, 284, 569, 1139 and 2278 features, ranging from 64.08% to 89.84%)

Figure 4 above shows that the accuracy increases from 17 to 569 features before reaching a plateau and then slightly decreasing.

Figure 5 Grid Search using 569 Features

Figure 5 above shows the output from running the grid search using the 569 highest-scoring features. As seen in the graph, the hyperparameter values are C = 64 and γ = 0.5, which gives 88.76% accuracy under 10-fold cross-validation. Figure 6 illustrates the confusion matrix of the trained model. All the classes perform well apart from the "Anger" class, largely due to the few samples from the "Anger" class used in testing compared to the other classes.

Figure 6 Confusion Matrix
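To make the grid-search step concrete, the following is a minimal sketch of a cross-validated search over C and γ using scikit-learn's GridSearchCV with 10-fold cross-validation. The parameter grid and the selected feature matrix are illustrative assumptions rather than the exact ranges searched for the SEV.

```python
# Sketch: 10-fold cross-validated grid search over the SVM hyperparameters
# C and gamma, using an already-selected feature matrix and labels.
# The grid below is illustrative; it is not the exact range searched for the SEV.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

X_selected = np.load("selected_features.npy")   # hypothetical (n_images, 569)
labels = np.load("emotion_labels.npy")

param_grid = {
    "C":     [2 ** p for p in range(-5, 11, 2)],     # 2^-5 ... 2^9
    "gamma": [2 ** p for p in range(-11, 4, 2)],     # 2^-11 ... 2^3
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10, n_jobs=-1)
search.fit(X_selected, labels)

print(search.best_params_)   # e.g. {'C': 64, 'gamma': 0.5}
print(search.best_score_)    # mean 10-fold cross-validation accuracy
```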
1. Graphical User Interface: As the project is a web service, there is a graphical user interface in HTML. When the application is running, the user can enter the IP address and port number of the application into a browser and they will be brought to the home screen, which can be seen in Figure 7.

Figure 7 Emotion Viewer Homepage

The user is prompted, via a browser notification, to allow the use of their webcam. If approved, video from the user's webcam begins to stream to the server. There are two switch options available on the GUI: "Track Face" and "Track Emotion". The "Track Face" switch enables the face detection feature, which draws a rectangular box around all faces detected in the picture. The "Track Emotion" switch enables the feature to receive emotional analysis for each frame. The "Track Face" option must be enabled before this feature can be used.

The next item on the GUI is the "Voting Count". This was implemented as a way to control the transitions between emotions. For example, while a person is talking, the detected emotion will change repeatedly as the facial expression changes. To overcome this, a particular emotion must reach a user-defined number of consecutive votes before the analysis text changes to that emotion.
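A minimal sketch of this voting mechanism is shown below: the displayed emotion only changes once the same per-frame prediction has been seen a user-defined number of times in a row. The class name and default threshold are illustrative assumptions, not the SEV's actual implementation.

```python
# Sketch: "Voting Count" smoothing. The displayed emotion changes only after the
# same per-frame prediction has occurred a user-defined number of consecutive times.
class EmotionVoter:
    def __init__(self, votes_required=5):
        self.votes_required = votes_required
        self.current = None      # emotion currently displayed
        self.candidate = None    # emotion currently accumulating votes
        self.votes = 0

    def update(self, prediction):
        """Feed one per-frame prediction; return the emotion to display."""
        if prediction == self.candidate:
            self.votes += 1
        else:
            self.candidate, self.votes = prediction, 1
        if self.votes >= self.votes_required:
            self.current = self.candidate
        return self.current

voter = EmotionVoter(votes_required=5)
for frame_prediction in ["joy", "joy", "anger", "anger", "anger", "anger", "anger"]:
    shown = voter.update(frame_prediction)
print(shown)   # "anger", once five consecutive anger votes have accumulated
```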
2. Run-Time Operations: For reference purposes, the operating system used was Windows 10 x64 and the hardware was as follows:
• Intel Core i7-8550U (Laptop)
• 8GB DDR4 RAM
• Nvidia GeForce MX150 (Mobile)

Once the webpage is loaded, a Javascript file is executed which makes a web socket connection to the SEV server. Immediately after the connection is established, the video begins streaming the image data to the server. The rate at which the images are streamed can be adjusted.

When the server receives the image data it begins an analysis. Before the analysis can be conducted, all faces in the images must be found. This is achieved using the Dlib library [29], which contains a convolutional neural network trained for face detection. Dlib is highly portable and contains very few dependencies, making it the ideal choice for the project. It also has support for Nvidia's CUDA library. CUDA is a parallel computing platform that is used for general computing on GPUs. Using CUDA requires a compatible GPU; however, it drastically speeds up the performance of the Emotion Viewer application. A CUDA-enabled and a standard version of the application have been developed. The second function of Dlib is to extract the 68 landmark points from the detected face.

This information is then sent to the internal algorithms where it is preprocessed. Then, with the use of LIBSVM and the pretrained model, a classification is made. This classification result is returned to the client along with the image that contains the rectangular box around the detected face. The client's browser displays this feed on the HTML page.
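The per-frame, server-side portion of this pipeline can be sketched roughly as follows, using Dlib's CNN face detector and 68-point shape predictor together with a previously trained classifier. The model file names, the distance_features helper from the earlier sketch, and the clf object are assumptions for illustration; they are not the SEV's exact components.

```python
# Sketch of the per-frame analysis: detect faces with Dlib's CNN detector,
# extract 68 landmarks, convert them to distance features, and classify.
# Model file names and the distance_features/clf objects reuse the earlier
# sketches and are illustrative assumptions, not the SEV's exact components.
import dlib
import numpy as np

cnn_detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def analyse_frame(image, clf):
    """image: RGB numpy array. Returns a list of (face rectangle, predicted label)."""
    results = []
    for detection in cnn_detector(image, 1):        # 1 = upsample the image once
        rect = detection.rect                       # CNN detections wrap a rectangle
        shape = shape_predictor(image, rect)
        landmarks = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])
        features = distance_features(landmarks).reshape(1, -1)
        label = clf.predict(features)[0]
        results.append((rect, label))
    return results
```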

3. Experiments: The review of current literature outlined some issues with using training data that was generated in a lab setting. Given that the emotional expressions were posed and not spontaneous, this casts doubt on their external validity. Therefore we decided to test the application by analysing speeches taken from Youtube. During a speech, a speaker can exaggerate expressions or emotions as a way to engage and captivate the attention of the attendees or listeners. It can also help to drive the point they are trying to make. Testing on Youtube videos also allowed us to test the speed at which the system can classify frames from the video.

The first examples, in Figures 8a and 8b, are taken from a speech by Donald Trump during a period when certain news outlets had made accusations of wrongdoing during his campaign. Throughout the speech Trump is visually distressed, which the Emotion Viewer detects. During the clip the detector outputs an anger/disgust classification. This aligned closely with the overall narrative of the speech. The full analysis is available on Youtube at the following link:
https://www.youtube.com/watch?v=NaCe8bchs9I&index=2&list=PLwagddoyFHYZOCeOVoTnM2UFYKhyMuwEJ&t=0s

Figure 8a Screenshot from Trump analysis

Figure 8b Screenshot from Trump analysis

The next analysis was one in a sombre setting. The video used was taken from former US president Barack Obama discussing issues around gun control and referencing school shootings, which he felt passionately about. The SEV detected two dominant emotions during this video, i.e. sadness and anger. This is consistent with Obama visibly shedding tears and stating in the speech that "every time I think of those kids it gets me mad" [0:54-0:58]. Some extracts from this analysis can be seen in Figures 9a and 9b. The full analysis video is available on Youtube:
https://www.youtube.com/watch?v=q1VyU02wgzs&list=PLwagddoyFHYZOCeOVoTnM2UFYKhyMuwEJ&index=2

Figure 9a Screenshot from Obama analysis

Figure 9b Screenshot from Obama analysis

The last analysis featured a montage of people smiling and laughing in a variety of different settings. Although most of the expressions seem exaggerated, there is not much of a visual difference between a true expression of happiness and an exaggerated expression of happiness. A screenshot can be seen in Figure 10. The full analysis video is also available on Youtube using the following link:
https://www.youtube.com/watch?v=pvjK5LVvz2A&index=4&list=PLwagddoyFHYZOCeOVoTnM2UFYKhyMuwEJ

Figure 10 Screenshot from "Happy" video analysis

V. CONCLUSION

Facial expressions are a gateway to detecting emotions. The ability to accurately make face-to-state classifications opens the potential for researchers to investigate emotions in new settings. In particular, this paper discussed the SEV platform, which uses a machine learning support vector machine in the analysis of emotions on real-time video. The results suggest that the prototype has external validity, as the emotions detected were consistent with the emotions presented by the speakers. Using the laptop described in section 2 of Results & Experimentation, the application could classify frames at a speed of 8 frames per second. This could be improved by deploying the application to more powerful hardware, and we hope to achieve classification on 30 fps video in the future through the use of mobile edge computing (MEC). The next step of this project will be to test and evaluate the system in real-time applications in a mobile ambulance use case in the SliceNet project. However, given the accuracy found in the results, the initial signs suggest that affective computing research is close to providing a powerful new tool to quickly and objectively determine fundamental aspects of human well-being.

VI. ACKNOWLEDGEMENT

The authors MH and PW are supported by the SliceNet project (Grant Number: 761913); HZ and RD are supported by the SenseCare project (Grant Number: 690862), funded by the European Commission Horizon 2020 Programme.

VII. REFERENCES

[1] L. F. Campbell, J. C. Norcross, M. J. Vasquez and N. J. Kaslow, "Recognition of psychotherapy effectiveness: The APA resolution," Psychotherapy, vol. 50, no. 1, p. 98, 2013.
[2] K. L. Davis and J. Panksepp, The Emotional Foundations of Personality: A Neurobiological and Evolutionary Approach, WW Norton & Company, 2018.
[3] J. S. Lerner, Y. Li, P. Valdesolo and K. S. Kassam, "Emotion and decision making," Annual Review of Psychology, vol. 66, 2015.
[4] United Nations, "Ageing," 2017. [Online]. Available: http://www.un.org/en/sections/issues-depth/ageing/. [Accessed 29 August 2018].
[5] A. F. Jorm and D. Jolley, "The incidence of dementia: A meta-analysis," Neurology, vol. 51, no. 1, pp. 728-733, 1998.
[6] A. Burns and S. Iliffe, "Dementia," BMJ (Clinical Research), vol. 338, p. B75, 2009.
[7] M. Mulvenna, H. Zheng, R. Bond, P. McAllister, H. Wang and R. Riestra, "Participatory design-based requirements elicitation involving people living with dementia towards a home-based platform to monitor emotional wellbeing," in 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, 2017.
[8] R. B. Cattell, "The description of personality: Principles and findings in a factor analysis," The American Journal of Psychology, vol. 58, no. 1, pp. 69-90, 1945.
[9] J. LeDoux, "Rethinking the emotional brain," Neuron, vol. 73, no. 4, pp. 653-676, 2012.
[10] J. LeDoux, The Emotional Brain: The Mysterious Underpinnings of Emotional Life, Simon and Schuster, 1998.
[11] K. R. Scherer, "What are emotions? And how can they be measured?," Social Science Information, vol. 44, no. 4, pp. 695-729, 2005.
[12] P. Ekman, "Basic emotions," in Handbook of Cognition and Emotion, pp. 45-60, 1999.
[13] D. Matsumoto, M. G. Frank and H. S. Hwang, Nonverbal Communication: Science and Applications, Sage, 2013.
[14] R. W. Picard, Affective Computing, 1995.
[15] R. W. Picard, "Affective Computing for HCI," presented at HCI (1), pp. 829-833, 1999.
[16] M. Healy, A. Keary and P. Walsh, "Prototype proof of concept for a mobile agitation tracking system for use in elderly and dementia care use cases," in CERC, Cork, 2016.
[17] R. R. Bond, H. Zheng, H. Wang, M. D. Mulvenna, P. McAllister, K. Delaney, P. Walsh, A. Keary, R. Riestra and S. Guaylupo, "SenseCare: using affective computing to manage and care for the emotional wellbeing of older people," in eHealth 360°, vol. 181, K. Giokas, B. Laszlo and F. Hopfgartner, Eds., Springer, 2017, pp. 352-356.
[18] M. Healy and P. Walsh, "Detecting demeanor for healthcare with machine learning," in 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Missouri, 2017.
[19] A. Chakraborty, A. Konar, U. K. Chakraborty and A. Chatterjee, "Emotion Recognition From Facial Expressions and Its Control Using Fuzzy Logic," IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 39, no. 4, pp. 726-743, 2009.
[20] P. Michel and R. El Kaliouby, "Real time facial expression recognition in video using support vector machines," in Proceedings of the 5th International Conference on Multimodal Interfaces, 2003.
[21] B. M. Ghandi, R. Nagarajan and H. Desa, "Real-Time System for Facial Emotion Detection," in 2010 IEEE Symposium on Industrial Electronics and Applications, Penang, 2010.
[22] T. M. Mitchell, Machine Learning, 1997.
[23] C.-C. Chang and C.-J. Lin, "LIBSVM -- A Library for Support Vector Machines," 23 July 2018. [Online]. Available: https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
[24] M. Healy, A. Keary and P. Walsh, "Performing real-time emotion classification using an Intel RealSense camera, multiple facial expression databases and a Support Vector Machine," in CERC, Karlsruhe, 2017.
[25] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar and I. Matthews, "The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, San Francisco, CA, 2010.
[26] N. Aifanti, C. Papachristou and A. Delopoulos, "The MUG Facial Expression Database," in Proc. 11th Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Desenzano, 2010.
[27] R. Gross, I. Matthews, J. Cohn, T. Kanade and S. Baker, "Multi-PIE," Image and Vision Computing, vol. 28, no. 5, pp. 807-813, 2010.
[28] Y.-W. Chen and C.-J. Lin, "Combining SVMs with Various Feature Selection Strategies," in Feature Extraction, Studies in Fuzziness and Soft Computing, Heidelberg, Springer, 2006, pp. 315-324.
[29] D. E. King, "Dlib-ml: A Machine Learning Toolkit," Journal of Machine Learning Research, vol. 10, pp. 1755-1758, 2009.

