Department of Sociology and Social Research
Master’s Degree in
Data Science
Final Dissertation
Supervisors Student
Bruno Lepri
Fausto Giunchiglia
In presenting this thesis as part of the requirements for obtaining a Master’s degree from the University
of Trento, I hereby grant permission for the University’s Libraries to make it available for inspection.
Furthermore, I consent to the potential copying of this thesis, either in its entirety or in part, for
scholarly purposes. Such permission may be granted by the professor or professors who supervised
my thesis work or, in their absence, by the Head of the Department in which my thesis research was
conducted.
I understand and acknowledge that any reproduction or publication of this thesis or its components
for commercial gain is strictly prohibited without my written consent. I also expect that proper
acknowledgment and credit will be given to both me and the University of Trento in any scholarly use
of the material from my thesis.
For inquiries or requests related to the reproduction or use of material from this thesis, whether in
whole or in part, please direct your correspondence to the Head of the Department of Sociology and
Social Research at the following address:
I extend my gratitude to Dr. Ivano Bison, my supervisor, for giving me the opportunity to pursue this
work, providing guidance along the way, and fostering an inspiring and supportive environment that
made this journey both meaningful and enjoyable. I am grateful to Dr. Bruno Lepri and Dr. Fausto
Giunchiglia, who agreed to co-supervise this work.
My heartfelt appreciation goes out to my parents for supporting my decision to pursue a master’s degree.
I am deeply grateful to my friends, whose companionship made this journey truly unforgettable.
I am grateful to all the benevolent individuals whose selfless acts of kindness, whether through their
advice, shared wisdom, or a word of encouragement, have made a profound impact on my journey.
Lastly, I would like to express my gratitude to all the professors and colleagues who played a part in
this academic endeavor. Thank you all for your contributions.
Contents
Abstract
1 Introduction
2 Literature Review
2.1 Evolution of Personality Concepts
2.2 Personality Traits and the Big Five Model
2.3 Smartphone data and Personality traits
2.4 Rationale for Our Study
3 Background
3.1 Supervised Machine Learning
3.1.1 Ordinary Least Square Regression
3.1.2 LASSO Regression
3.1.3 Ridge Regression
3.1.4 Random Forest
3.1.5 XGBoost
3.1.6 Support Vector Machine
3.1.7 K-Nearest Neighbour
3.1.8 Decision Tree
3.2 Baseline Models
3.2.1 Mean Model
3.2.2 ZeroR Model
3.3 Model Evaluation
3.3.1 Cross Validation
3.3.2 RMSE
3.3.3 Accuracy
4 Methods
4.1 Study Design and Data Collection
4.2 Ethics and privacy
4.3 The Sensor Data
4.4 Feature Extraction
4.4.1 Big Five Survey
4.4.2 App Usage Logs
4.4.3 Bluetooth
4.4.4 Ring Mode
4.4.5 Step Counter
4.4.6 Touch Event
4.4.7 Music Event
4.4.8 Screen Event
4.4.9 Battery Charge Event
4.4.10 Doze Event
4.5 Exploratory Data Analysis
4.5.1 Univariate Analysis
4.5.2 Multivariate Analysis
4.6 Prediction Analysis
4.6.1 Regression Models
4.6.2 Classification Models
5 Results
5.1 Regression Results
5.1.1 Statistical Analysis (RMSE)
5.1.2 Fit Line Plots Analysis
5.1.3 Residual Plots Analysis
5.2 Classification Results
5.2.1 Statistical Analysis (Accuracy)
Bibliography
Abstract
Human behavior is inherently intricate, often challenging to explain through traditional mathematical
models. To simplify this complexity, researchers frequently develop intermediate psychological models
that capture specific facets of human behavior. These intermediate models, often derived from per-
sonality assessments, undergo validation using established survey instruments and tend to correlate
with observable behaviors. Typically, these constructs are utilized to predict specific, standardized
aspects of behavior.
The advent of novel sensing systems has ushered in an era of remarkably precise behavior tracking,
raising the intriguing question of whether the reverse process is feasible: Can we deduce psychological
constructs for individuals from their behavioral data? Modern smartphones are equipped with an
array of sensors capable of capturing, filtering, combining, and analyzing data to generate abstract
measures of human behavior. The ability to extract personal profiles or personality types directly
from mobile phone data, without requiring participant interaction, holds potential applications in
marketing, as well as in the initiation of social or health interventions.
In this study, we aim to model a well-established personality inventory, the Big Five framework
[17]. The activities of students were observed over a two-month period using parameters readily
available from the sensors of the participants' smartphones. Correlation analyses were performed, and
supervised machine learning algorithms were trained and evaluated with cross-validation to predict
the participants' personality traits from smartphone sensor data. The results indicate that the root
mean squared error is small enough to allow actionable predictions of an individual's personality
from smartphone data.
1 Introduction
In today’s age of technology, smartphones have become an integral part of our daily lives. We use them
not just for communication but also for entertainment, education, productivity, navigation and so on.
What many people do not realize, however, is that these devices can also collect vast amounts of
data about us, even when we are not actively using them. With more than a dozen sensors housed
within them, smartphones can track our movements, measure our physiological responses, and even
monitor our environment.
This wealth of data presents a unique opportunity to gain insights into human behavior and
personality traits. In particular, it offers the potential to measure and predict personality, a crucial
aspect of human psychology that has long been studied only through self-reported questionnaires.
Traditional ways of measuring personality through questionnaires have limitations [21]. Administering
the tests requires considerable time, resources and effort, and the results remain open to bias:
self-reports can be distorted by social desirability bias or by the respondent's limited self-awareness.
Smartphones therefore offer researchers a new way to measure personality in a more objective and
passive manner, leveraging vast amounts of sensor-generated data [36].
Conventional instruments for assessing personality traits include the Myers-Briggs Type Indicator
(MBTI) [32], the Big Five Personality Inventory [17] and the NEO Personality Inventory [9].
The MBTI is based on Carl Jung's theory of personality types and assigns individuals to one of 16
personality types defined by four dichotomies: Extraversion vs. Introversion, Sensing vs. Intuition,
Thinking vs. Feeling, and Judging vs. Perceiving. It was designed to help individuals understand
their own preferences and how they interact with others, and it can be used in personal development,
team building, and leadership training. The NEO Personality Inventory (NEO-PI) measures an
individual's personality across five dimensions: openness, conscientiousness, extraversion,
agreeableness, and neuroticism. The NEO-PI consists of 240 questions. Later, a more compact
instrument based on the same five-factor model (FFM), the Big Five Inventory (BFI), gained
popularity; it uses only 44 questions instead of the 240 of the NEO-PI. A further trade-off between
the two is that the NEO-PI measures more specific facets within each dimension, whereas the BFI
assesses only the broad level of each dimension. Although the Big Five framework has its own flaws
and has been criticized on several occasions for its inability to capture overall behavioral
characteristics, it remains one of the most widely accepted inventories, with consistent results across
populations [45].
According to the Big Five framework, every individual's personality can be described along five latent
dimensions: Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism. These traits
are defined and described as follows:
1. Openness - This trait refers to a person’s openness to new experiences, ideas, and ways of
thinking. Open individuals tend to be imaginative, curious, and creative. People who score high
in Openness may be more willing to challenge traditional ways of doing things and to think
outside the box. People who score low on this trait tend to be more practical and focused on
the present.
2. Conscientiousness - This trait refers to a person's level of organization, responsibility, and
self-discipline. Conscientious individuals tend to be reliable, hardworking, and detail-oriented. This
is also called the orderliness dimension. People with high Conscientiousness tend to be orderly
and organized. People who score low on this trait tend to be more laid-back and less
focused on achieving specific goals.
3. Extraversion - This trait refers to a person's level of sociability, outgoingness, and assertiveness.
Extraverted individuals enjoy being around people and tend to be energized by social interactions.
They may be seen as talkative and enthusiastic. People who score low on this trait tend to be more
introverted and prefer quiet, solitary activities.
4. Agreeableness - This trait refers to a person’s level of cooperativeness, empathy, and kindness.
Agreeable individuals tend to be friendly, compassionate, and willing to compromise. People
who score low on this trait tend to be more competitive and may prioritize their own needs over
the needs of others.
5. Neuroticism - This trait refers to a person's tendency toward emotional instability. People who
score high on this trait tend to experience more negative emotions, such as anxiety, stress, and
sadness. They may be more sensitive to criticism and tend to worry more than others. People who
score low on this trait tend to be more emotionally stable and less prone to intense negative emotions
and frequent mood swings.
Ideally, personality traits should be assessed unobtrusively to minimize bias. An assessment is
considered unobtrusive if it does not require any attention from the person being assessed [25, 52].
This not only makes the assessment more convenient for that person and can remove subjective bias,
but also reduces the risk of measurements being affected by behavior that changes because the person
is consciously aware of being assessed [25]. Modern smartphones are well suited to this purpose,
since they are equipped with a host of sensors that can be used to unobtrusively gather data about
individuals' behavior [11]. Smartphones are also a good option for this kind of study because they are
already widely used and are carried around by people for most of the day [21]. Physical as well as
logical sensors related to location, communication, phone state (e.g. screen lit, charging status), phone
orientation, and connections to other devices and to the internet can be used to understand activities
such as movements, interactions and daily habits [24].
The purpose of this study is to establish whether individual differences in personality traits can
be detected through data collected from smartphones. For example, users who actively use
communication applications such as WhatsApp and Telegram may score high on Extraversion, and
people who use productivity apps such as a calculator, calendar or to-do list may score high on
Conscientiousness. These hypothetical examples serve as guidelines for studying the correlation
between smartphone data and personality.
Using exploratory analyses, correlation analyses and machine learning algorithms, a substantial body
of existing research has already linked behavioral indicators derived from smartphone data to
personality traits. However, to the best of our knowledge, there is no publicly available dataset for
investigating these connections. Therefore, we used data from the WeNet study, which was designed
to address this and other research gaps. In our research, we performed an extensive exploratory
analysis of self-assessed personality traits and indicators derived from smartphone data. Using
feature selection, we identified indicators that were informative about people's personality. We then
adopted a predictive approach using linear as well as non-linear models, with a specific focus on
whether combinations of features extracted from various smartphone sensors could help predict
individuals' personality traits.
My main hypothesis is that an individual's Big Five personality traits can be predicted using
real-world behavioral data collected from smartphones.
This thesis is composed of six chapters. Chapter 2 provides the foundational context for this thesis:
it reviews previous research on human behavior based on diverse types of technology-mediated data,
discusses its limitations, and addresses various types of personality traits. Chapter 3 outlines the
research background, including various machine learning models, baseline models and model
evaluation techniques. Chapter 4 describes the experimental setup used to gather the necessary data,
gives thorough explanations of the feature extraction techniques, and explains the methodologies and
machine learning models used. Chapter 5 presents the results, while Chapter 6 summarizes the
findings and the significant contributions of this research.
2 Literature Review
2.1 Evolution of Personality Concepts
Personality traits are patterns of thought, emotion, and behavior that are relatively consistent over
time and across situations. They can be described with familiar words such as “reliable”, “sociable”, or
“cheerful”, as well as more specialized terms such as “narcissistic”, “authoritarian”, or “conscientious”.
Psychology has developed an impressive and useful technology for assessing personality traits, but
personality assessment is not limited to psychologists: Everybody does it, every day. We all make
judgments about our own personalities as well as of the personalities of people we meet, and these
judgments are consequential [13].
Several non-human animal species also exhibit individual differences in behavioral patterns, indicating
the possible existence of personality traits that may even predate humanity [18]. It is likely that
humans have long observed these differences among members of their community, even before they
had a means to effectively record or communicate these ideas in writing. However, the specific
ways in which prehistoric people conceptualized these differences, such as those between an individual
who excelled at cave painting and one who was skilled at ensuring the safety of fellow tribe members
during hunting, are likely lost to history.
The earliest known theory of personality can be traced back to the Greek physician Hippocrates
at about 400 B.C. [30]. He suggested a classification of individual temperaments into four types:
Sanguine (people who are optimistic and hopeful), Melancholic (people who are sad or depressed),
Choleric (people who are irascible, i.e. easily angered), and Phlegmatic (people who are apathetic, i.e.
indifferent and passionless).
This four temperament theory was further developed by Wilhelm Wundt, a German physiologist
who is considered the “father of experimental psychology” [4]. According to Wundt’s model, the four
temperaments represent the extreme ends of a two-dimensional space that is defined by the emotional
vs. unemotional and changeable vs. unchangeable dimensions [30].
During the 1920s, Carl Jung, a Swiss psychiatrist, introduced the terms “extraversion” and “intro-
version” to describe different orientations of personality. Although initially ignored by academic psy-
chologists, Jung’s work has endured despite being based on introspective and interpretive techniques
within the psychoanalytic tradition established by Sigmund Freud. Extraversion and introversion are
still considered important principles in modern models of personality [49].
Personality psychology as an academic field began to take shape in the 1930s, with the estab-
lishment of the first journal, “Journal of Personality,” in 1932. Hans Eysenck, a German-British
psychologist, was an early influential figure who, in the late 1940s, introduced a three-dimensional
model of personality that consisted of “extraversion”, “neuroticism”, and “psychoticism”. Initially,
personality psychologists struggled to establish personality as a relevant construct because behavior
can change depending on context. Later, however, they found that a person's behavior over long
periods of time and across different situations remains consistent and is related to the person's
personality. The
lexical approach proposed by Goldberg in 1982 involved creating a comprehensive list of adjectives
to describe an individual’s character, which led to the discovery of the largely independent Big Five
factors [16]. These factors are highly stable over time and are predictive of important life outcomes
[29, 35]. As a result, the Big Five factors are now commonly employed both within and outside the
domain of personality psychology.
2.2 Personality Traits and the Big Five Model
For this thesis, we define personality traits as patterns of behavior, thought, and feeling that
remain consistent across various situations. We use questionnaires based on item response theory
to assess personality traits [19, 41, 28]. In personality psychology, items usually consist
of statements like “They like to take risks.” and the test-taker responds on a Likert scale from “Very
much like me” to “Not like me at all.” This theory assumes that answers given in a test are informative
about hypothetical latent variables that affect the answers. These latent variables cannot be measured
directly, but can be inferred based on directly observed manifest variables. Factor analysis is used to
discover the main dimensions along which people vary, and five factors consistently emerge in various
populations. These are known as the Big Five and include extraversion, openness to experience,
agreeableness, conscientiousness, and neuroticism [29].
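To make the idea of latent variables inferred from manifest item responses more concrete, the following Python sketch simulates answers to six hypothetical questionnaire items driven by two unobserved traits and counts how many dimensions emerge. As a simplification of factor analysis, it inspects the eigenvalues of the item correlation matrix and applies the Kaiser criterion (retain eigenvalues greater than 1). The loadings, the sample size and the use of continuous rather than discrete Likert scores are illustrative assumptions, not the instrument used in this thesis.

```python
import numpy as np

# Hypothetical example: 6 questionnaire items driven by 2 latent traits.
rng = np.random.default_rng(42)
n_people = 2000
latent = rng.standard_normal((n_people, 2))  # unobserved trait scores

# Items 0-2 load on trait 0 and items 3-5 on trait 1 (loading 0.8 assumed).
loadings = np.array([
    [0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8],
])
noise = rng.standard_normal((n_people, 6)) * 0.6  # item-specific variance 0.36
items = latent @ loadings.T + noise               # observed (manifest) responses

# Eigenvalues of the item correlation matrix, largest first.
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Kaiser criterion: keep dimensions whose eigenvalue exceeds 1.
n_factors = int(np.sum(eigenvalues > 1.0))
print(np.round(eigenvalues, 2))
print("Retained dimensions:", n_factors)
```

With six items loading on two traits, two eigenvalues clearly exceed 1 and the other four fall well below it, so two dimensions are retained. A full factor analysis would additionally estimate the loadings and the item-specific variances.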
the timing of calls, and the quantity of text messages sent and received. These extracted features
were then utilized to construct a social communication network. To predict the Big Five personality
traits, a supervised learning approach based on Support Vector Machines (SVM) was employed. The
results of this analysis yielded mean squared errors ranging from 0.73 to 0.86 on a 7-point scale. In
another comprehensive approach, researchers harnessed both standard mobile phone information and
GPS data to predict individuals’ personality traits [31].
They collected conventional carrier logs, including phone calls and text messages, from a group of 69
participants. From this data, they calculated the entropy of calls and texts and also assessed the inter-
event time between text messages and calls. In addition, GPS data was utilized to determine the radius
of gyration, daily travel distances, and the number of distinct places visited by each participant. The
participants’ self-reported Big Five personality traits were categorized into three classes: low, average,
and high. To build predictive models, they employed a Support Vector Machine (SVM) classifier.
Using ten-fold cross-validation, they classified the traits with accuracies of 49% for Openness,
51% for Conscientiousness, 61% for Extraversion, 51% for Agreeableness, and 63% for Neuroticism.
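Two of the steps described above, computing the radius of gyration from location traces and grouping continuous trait scores into low, average and high classes, can be sketched in Python as follows. The sketch assumes that GPS fixes have already been projected to planar (x, y) coordinates in kilometres and uses sample tertiles as class boundaries; the cited study may have used different thresholds, and the function names are hypothetical.

```python
import math
import statistics

def radius_of_gyration(points):
    """Root-mean-square distance of visited points from their centre of mass.

    Assumes (x, y) coordinates in kilometres on a local plane; raw
    latitude/longitude would need a geodesic distance instead.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))

def tertile_classes(scores):
    """Label each trait score 'low', 'average' or 'high' using sample tertiles."""
    low_cut, high_cut = statistics.quantiles(scores, n=3)
    labels = []
    for s in scores:
        if s <= low_cut:
            labels.append("low")
        elif s <= high_cut:
            labels.append("average")
        else:
            labels.append("high")
    return labels

# Hypothetical data: two places 2 km apart, visited equally often.
print(radius_of_gyration([(0.0, 0.0), (2.0, 0.0), (0.0, 0.0), (2.0, 0.0)]))  # 1.0
# Hypothetical Extraversion scores on a 1-5 scale.
print(tertile_classes([2.1, 3.4, 4.8, 2.9, 3.8, 4.1]))
# ['low', 'average', 'high', 'low', 'average', 'high']
```

A larger radius of gyration indicates that a participant's movements are spread over a wider area, while the tertile split yields roughly balanced classes for a three-class classifier.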
The researchers also noted correlations between specific personality traits and mobile phone behavior.
For instance, they found that Extraversion and Agreeableness traits were associated with the entropy
of participants’ contacts, suggesting a connection between these personality traits and the diversity
of their social interactions. Furthermore, the variance in the time intervals between phone calls was
correlated with the Conscientiousness trait, indicating a potential link between conscientiousness and
communication patterns.
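The entropy of contacts mentioned above summarizes how evenly a person's calls or messages are spread across different people. A minimal Python sketch, assuming Shannon entropy in bits over the relative frequency of each contact (the exact definition used in the cited study may differ) and hypothetical call logs, is:

```python
import math
from collections import Counter

def contact_entropy(contacts):
    """Shannon entropy (bits) of the distribution of interactions over contacts."""
    counts = Counter(contacts)
    total = sum(counts.values())
    # Sum of p * log2(1/p) over contacts, where p is each contact's share of interactions
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical call logs: each entry is the (anonymised) contact of one call.
focused = ["A"] * 8                   # all calls to a single person
diverse = ["A", "B", "C", "D"] * 2    # calls spread evenly over four people
print(contact_entropy(focused))  # 0.0
print(contact_entropy(diverse))  # 2.0
```

A person who calls only one contact scores 0 bits, whereas calls spread evenly over four contacts score log2(4) = 2 bits, so higher values indicate more diverse social interactions.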
A study of a similar nature, but with a considerably larger sample size, incorporated Bluetooth
sensor data to further explore the prediction of Big Five personality traits [33]. This extensive study
involved the collection of mobile data, including telecommunication data (calls and texts), GPS data,
and Bluetooth sensor logs, from a substantial group of 636 students over a period of 24 months. From
this dataset, various features were extracted. These features encompassed aspects such as face-to-
face contacts or physical proximity to others, as determined by Bluetooth signal strength, as well
as geo-spatial mobility patterns, and the analysis of text messages, calls, and social network friends’
contact lists. The Big Five trait values were categorized into three classes: low, medium, and high.
Researchers utilized Support Vector Machines (SVM) to build models for predicting personality traits.
However, the study reported successful identification only for the Extraversion trait.
Touch screen swipe behaviors have been employed as a means to identify personality traits. In
a study involving 98 participants, researchers collected data on touch screen swipes and extracted
various touch/swipe-related features [1]. These features included parameters such as average velocity,
mean pressure, mean finger area, and others. To predict specific personality traits, the researchers
utilized the self-reported Eysenck personality questionnaire in combination with the extracted touch
screen swipe features. They employed machine learning classifiers, specifically K-Nearest Neighbors
(KNN) and Random Forests, in their analysis, and predicted the Extraversion and Neuroticism traits
with an average accuracy of 62.9%. In a related study, smartphone data, including
call logs, SMS logs, Bluetooth scans, and app usage, was used to predict Big Five Personality traits
in 83 participants over 8 months [7]. Features like Bluetooth IDs, call durations, unique contacts,
SMS length, and app usage were extracted. Personality traits were categorized as low and high,
and a Support Vector Machine binary classifier achieved accuracies of 69.3% for Openness, 74.4%
for Conscientiousness, 75.9% for Extraversion, 69.6% for Agreeableness, and 71.5% for Neuroticism.
The study also found correlations between personality traits and smartphone usage: for example,
Extraversion correlated with internet usage, conscientious individuals used media apps less, and
extraverts spent more time on calls.
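Touch-based features such as average velocity, mean pressure and mean finger area can be derived from the sequence of touch samples that make up a swipe. The following Python sketch uses a hypothetical `TouchSample` record (timestamp, position, pressure, contact size) and defines velocity as path length divided by swipe duration; both the schema and that definition are assumptions for illustration, not the feature set of [1].

```python
import math
from dataclasses import dataclass

@dataclass
class TouchSample:
    """One sample of a swipe gesture (hypothetical schema)."""
    t: float         # timestamp in seconds
    x: float         # horizontal position in pixels
    y: float         # vertical position in pixels
    pressure: float  # normalised touch pressure
    size: float      # normalised finger contact area

def swipe_features(samples):
    """Return average velocity, mean pressure and mean finger area of one swipe."""
    # Path length: sum of straight-line distances between consecutive samples
    path_length = sum(
        math.hypot(b.x - a.x, b.y - a.y) for a, b in zip(samples, samples[1:])
    )
    duration = samples[-1].t - samples[0].t
    n = len(samples)
    return {
        "avg_velocity": path_length / duration if duration > 0 else 0.0,  # px/s
        "mean_pressure": sum(s.pressure for s in samples) / n,
        "mean_area": sum(s.size for s in samples) / n,
    }

# Hypothetical upward swipe of 200 px lasting 0.1 s
swipe = [
    TouchSample(0.00, 100, 500, 0.4, 0.10),
    TouchSample(0.05, 100, 400, 0.5, 0.12),
    TouchSample(0.10, 100, 300, 0.6, 0.14),
]
features = swipe_features(swipe)
print(features)
```

Per-swipe features like these would then be aggregated per participant (for example, averaged over all swipes) before being passed to a classifier.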
Another study involved 32 participants who provided data from various sources, including appli-
cation usage logs, phone calls, SMS messages, email messages, and self-reported mood states collected
four times a week [27]. An application named Moodscope was developed as a tool for detecting and as-
sessing mood based on smartphone usage, and it successfully demonstrated that mood can be inferred
from sensor data. Researchers employed a multi-linear regression model to analyze this dataset and
determine participants' mood, inferring it with an accuracy of 66%.
Another study used app usage data, geospatial records (university arrival and exit times) and
behavioral parameters (such as charging time) collected from 80 students over 1461 days to estimate
the participants' personality traits unobtrusively, without parsing app-specific or social media
content [24]. Together, these studies underscore the potential of smartphone data for assessing and
monitoring individuals' personality, emotional states and well-being.
3 Background
3.1 Supervised Machine Learning
Figure 3.1: Example of a simple OLS Regression.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$$
that minimizes the sum of squared differences between the predicted values and the actual values.
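The OLS estimate can be computed directly with NumPy. This sketch generates hypothetical data from known coefficients, adds a column of ones so that β0 acts as the intercept, and solves the least-squares problem with `numpy.linalg.lstsq`; it also checks the result against the normal equations (X^T X)β = X^T y. The data and variable names are illustrative only.

```python
import numpy as np

# Hypothetical data generated from known coefficients beta_0, beta_1, beta_2
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
true_beta = np.array([1.5, 2.0, -0.5])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones, so beta_0 is the intercept
X_design = np.column_stack([np.ones(n), X])

# OLS: the beta that minimises sum((y - X_design @ beta) ** 2)
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Same solution from the normal equations (X^T X) beta = X^T y
beta_normal = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)

print(np.round(beta_hat, 2))  # approximately [1.5, 2.0, -0.5]
```

`lstsq` is preferred over explicitly inverting X^T X because it remains numerically stable when the features are highly correlated.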
In OLS, the model can become overly complex when there are many features or when there
is multicollinearity (high correlation between independent variables). This complexity can lead to
overfitting, where the model fits the training data very closely but performs poorly on unseen data.
Ridge regression addresses overfitting by adding a regularization term to the OLS objective func-
tion. The objective function of Ridge regression can be defined as:
$$\text{Minimize: } \mathrm{RSS} + \alpha \sum_{i=1}^{n} \beta_i^2$$
Where:
• RSS (Residual Sum of Squares) is the same as in OLS, measuring the sum of squared differences
between predicted and actual values.
• $\sum_{i=1}^{n} \beta_i^2$ is the sum of squared regression coefficients. The regularization term penalizes large
coefficients.
The α parameter controls the trade-off between fitting the data well (minimizing RSS) and keeping
the model simple (minimizing the sum of squared coefficients). A larger α leads to a more regularized
model with smaller coefficient values, which is helpful in reducing the impact of multicollinearity and
overfitting.
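The effect of α can be illustrated with the closed-form ridge solution β = (X^T X + αI)^(-1) X^T y applied to centred data. In the Python sketch below, two hypothetical features are almost collinear, the situation described above. The helper `ridge_fit` is illustrative, and centring the data is one common way to leave the intercept β0 unpenalized, matching a penalty that sums over β1 to βn only.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; the intercept is not penalised.

    Features and target are centred first, so the penalty alpha * sum(beta_i ** 2)
    applies only to the slope coefficients beta_1..beta_n.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Hypothetical data with two almost collinear features.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=100)

for alpha in [0.0, 1.0, 10.0, 100.0]:
    _, beta = ridge_fit(X, y, alpha)
    print(alpha, np.round(beta, 2), "sum of squares:", round(float(beta @ beta), 2))
```

With α = 0 (plain OLS) the two nearly identical features can receive large coefficients of opposite sign; as α increases, the sum of squared coefficients shrinks and the weight is shared more evenly between the two features.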
decision tree learning algorithm creates a model by splitting along the input dimension of greatest
variance according to a heuristic or a cost function. Essentially, decision trees learn a hierarchy of
(often true/false binary) decisions, leading to a classification of the data. A typical problem with
decision tree learning is its tendency to overfit the data, that is, to model the noise in the data as
well as the underlying trend, leading to poor generalization. Random forests are employed to reduce
this overfitting. As shown in Figure 3.2, a random forest draws many random subsets of the data set
by sampling with replacement (bootstrap samples) and constructs a decision tree for each subset,
usually considering only a random subset of the features at each split. An aggregate statistic
(usually the mean or mode) of the outputs of the ensemble (forest) of decision trees is taken as the
final answer. In regression tasks, Random Forest operates as a Random Forest Regressor: instead of
predicting discrete categories, it estimates continuous numerical values. As in classification, it
constructs an ensemble of decision trees but uses a different aggregation method. Each tree in the
ensemble predicts a numerical value, and the final prediction is obtained by averaging (or, less
commonly, taking the median of) these individual tree predictions.
Random Forest Regression is well suited to capturing non-linear relationships and handling
noisy data, and averaging many decorrelated trees reduces the risk of overfitting compared with a
single tree. Additionally, it provides estimates of feature importance, allowing researchers to identify
the most influential variables in predicting the target variable. This flexibility and robustness make
Random Forest a popular choice for both classification and regression tasks. For an overview of the
various decision tree architectures, learning modes and applications see [26].
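The bootstrap-and-average procedure described above can be sketched by hand with scikit-learn's `DecisionTreeRegressor` as the base learner. This is an illustration under stated assumptions (50 trees, square-root feature subsampling, hypothetical synthetic data, hypothetical helper names), not the configuration used in this work; scikit-learn's `RandomForestRegressor` packages the same idea.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_forest(X, y, n_trees=50, seed=0):
    """Train one regression tree per bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of the rows
        tree = DecisionTreeRegressor(
            max_features="sqrt",  # random feature subset at each split (assumed setting)
            random_state=int(rng.integers(1_000_000)),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    """Regression forest output: the mean of the individual tree predictions."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)

def forest_importances(trees):
    """Impurity-based feature importance averaged over the trees."""
    return np.mean([tree.feature_importances_ for tree in trees], axis=0)

# Hypothetical data: the target depends strongly on feature 0, weakly on
# feature 1, and not at all on feature 2.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(400, 3))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=400)
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]

trees = fit_forest(X_train, y_train)
test_mse = np.mean((y_test - predict_forest(trees, X_test)) ** 2)
importances = forest_importances(trees)
print(f"test MSE: {test_mse:.4f}, importances: {np.round(importances, 3)}")
```

The averaged importances should single out feature 0, mirroring how the importance scores mentioned above point to the most influential variables.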
Figure 3.2: Sample Random Forest model with three decision trees, created for the purpose of
demonstration.
3.1.5 XGBoost
XGBoost, which stands for Extreme Gradient Boosting, is a powerful machine learning algorithm
that falls under the category of ensemble learning. It was developed by Tianqi Chen and is known
for its effectiveness on a wide range of machine learning problems, especially those involving
structured (tabular) data. XGBoost is an implementation of the gradient boosting framework, a machine
learning technique that builds predictive models by combining the predictions of multiple weaker
models, typically decision trees. Gradient boosting trains a series of weak learners sequentially,
fitting each new learner to the negative gradient of the loss (for squared error, the residuals) of the
current ensemble so as to minimize a specified loss function. XGBoost primarily uses
decision trees as base learners. Decision trees are simple, non-linear models that make predictions
by partitioning the input data into subsets and assigning a constant value to each subset. The
decision trees used in XGBoost are often shallow, with a limited number of nodes, which makes them
weak learners. XGBoost incorporates several techniques to control overfitting and improve model
generalization. Regularization is applied through L1 and L2 regularization terms added to the loss
function, which penalize complex models. This helps prevent the model from fitting the training data
too closely.
Figure 3.3: Sample XGBoost model created for demonstration
While XGBoost uses common loss functions such as squared error for regression and log loss for
classification by default, it also allows users to define custom loss functions, making it adaptable to a
wide range of problems. Figure 3.3 shows a demonstration of how XGBoost works.
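The sequential, residual-fitting idea can be made concrete with plain gradient boosting for squared error, using shallow trees and an L2 penalty (λ, here `lam`) on the leaf weights in the spirit of XGBoost's regularized objective. This is a simplified sketch under explicit assumptions: the tree structure is found with ordinary CART splits rather than XGBoost's gain formula, there is no L1 term, the helper names are hypothetical, and it is not the `xgboost` library API.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted(X, y, n_rounds=50, learning_rate=0.3, max_depth=2, lam=1.0):
    """Gradient boosting for squared error with L2-shrunk leaf weights.

    For the loss 0.5 * (y - pred)^2 the gradient is pred - y and the Hessian
    is 1, so an XGBoost-style leaf weight -G / (H + lam) becomes
    sum(residual) / (count + lam).
    """
    base = float(np.mean(y))
    pred = np.full(len(y), base)
    stages = []
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of the squared-error loss
        # The shallow tree is only used to partition the inputs into leaves.
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        leaves = tree.apply(X)
        weights = {}
        for leaf in np.unique(leaves):
            in_leaf = leaves == leaf
            weights[leaf] = residual[in_leaf].sum() / (in_leaf.sum() + lam)
        pred = pred + learning_rate * np.array([weights[leaf] for leaf in leaves])
        stages.append((tree, weights))
    return {"base": base, "stages": stages, "learning_rate": learning_rate}

def predict_boosted(model, X):
    """Sum the base value and the shrunken leaf weights of every stage."""
    pred = np.full(len(X), model["base"])
    for tree, weights in model["stages"]:
        leaves = tree.apply(X)  # every leaf holds at least one training row
        pred = pred + model["learning_rate"] * np.array([weights[leaf] for leaf in leaves])
    return pred

# Hypothetical one-dimensional regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

for rounds in (5, 50):
    model = fit_boosted(X, y, n_rounds=rounds)
    mse = np.mean((y - predict_boosted(model, X)) ** 2)
    print(f"{rounds:2d} rounds: training MSE = {mse:.4f}")
```

Each round of weak learners lowers the training error a little; a larger `lam` shrinks every leaf weight towards zero, which is the kind of complexity penalty the L2 term provides.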
Support Vector Machines (SVM) represent a powerful and versatile class of supervised machine learning
algorithms used extensively in both classification and regression tasks. In classification, SVM aims
to find a hyperplane that best separates different classes or categories within a dataset. It does so
by maximizing the margin, that is, the distance between the hyperplane and the nearest data points of
each class. SVM is particularly effective in scenarios with complex decision boundaries or high-dimensional
feature spaces, as it can employ various kernel functions (e.g., linear, polynomial, or radial basis
function) that implicitly map the data into higher-dimensional spaces (as shown in Fig. 3.4) where the
classes become more separable. This non-linear transformation enables SVM to handle intricate patterns and achieve
high classification accuracy, making it a popular choice in image recognition, text classification, and
bioinformatics.
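The idea behind Fig. 3.4 can be reproduced on a small synthetic example: two concentric rings cannot be separated by a line in 2-D, but become separable by a plane once a third coordinate z = x₁² + x₂² is added, and an RBF kernel achieves a similar effect implicitly. The data set and parameter values are illustrative assumptions; only standard scikit-learn calls are used.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# (1) Linear SVM on the raw inputs: no straight line can separate the rings.
acc_raw = SVC(kernel="linear").fit(X, y).score(X, y)

# (2) Explicit map to 3-D by adding z = x1^2 + x2^2 (squared distance from the origin).
#     The inner ring now sits low and the outer ring high, so a plane separates them.
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])
acc_lifted = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

# (3) RBF kernel: the higher-dimensional mapping is implicit (kernel trick).
acc_rbf = SVC(kernel="rbf", gamma="scale").fit(X, y).score(X, y)

print(f"linear, raw 2-D:    {acc_raw:.2f}")
print(f"linear, lifted 3-D: {acc_lifted:.2f}")
print(f"RBF kernel, 2-D:    {acc_rbf:.2f}")
```

The lifted and RBF models separate the rings almost perfectly, while the linear model on the raw inputs cannot.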
In regression tasks, SVM becomes Support Vector Regression (SVR), which fits a function to the data
while minimizing prediction errors. Unlike ordinary linear regression, SVR can capture non-linear
relationships by using kernel functions to map input features into a higher-dimensional space. The
objective is to find a function that keeps the predictions within a margin of width ε (the
ε-insensitive tube) around the target values wherever possible: errors smaller than ε are not
penalized, which balances fitting the training data against generalizing to unseen data points. SVR
works well when the data exhibit nonlinear patterns, and it is less sensitive to outliers than
squared-error regression because its loss grows only linearly outside the tube. The fitted function
depends only on the support vectors, the data points that lie on or outside the tube. As a result, SVM
is a valuable tool in both classification and regression, with applications in fields such as finance,
healthcare, and natural language processing.
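The role of the ε-tube can be seen with scikit-learn's `SVR` on a hypothetical noisy sine curve: a narrow tube leaves many points on or outside it (many support vectors), while a wide tube leaves few. The data and the values of `C` and `epsilon` are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical data: a noisy sine curve.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=100)

results = {}
for epsilon in (0.05, 0.5):
    svr = SVR(kernel="rbf", C=10.0, epsilon=epsilon).fit(X, y)
    # Only points on or outside the epsilon-tube become support vectors.
    results[epsilon] = (svr.score(X, y), len(svr.support_))
    print(f"epsilon={epsilon}: R^2={results[epsilon][0]:.3f}, "
          f"support vectors={results[epsilon][1]}")
```

Widening the tube makes the model ignore small deviations, so it relies on far fewer support vectors.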
Figure 3.4: Visual representation of a Support Vector Machine transforming non-linearly separable
data into a higher-dimensional space
The K-Nearest Neighbors (KNN) algorithm is a fundamental and intuitive machine learning technique
used for both classification and regression tasks. It is based on the principle of proximity, assuming
that similar data points in a feature space tend to have similar target values or belong to the same
class. KNN is considered a non-parametric and instance-based learning method because it does not
make assumptions about the data distribution and makes predictions based on local information. To
make a prediction for a new, unseen data point, KNN identifies its k nearest neighbors in the
training dataset. A distance metric (commonly Euclidean or Manhattan distance) is used to measure
the proximity between the new data point and each training data point.
In classification tasks, KNN assigns to the new data point the class label that is most frequently
represented among its k nearest neighbors, usually by a simple majority vote. In regression tasks, KNN
calculates the average (or weighted average) of the target values of the k nearest neighbors and
assigns this value as the predicted target value for the new data point. The choice of the parameter k
in KNN is crucial. A small k makes the model sensitive to noise and outliers, potentially leading to
overfitting. A large k can oversmooth decision boundaries, potentially leading to underfitting. KNN is
often computationally expensive for large datasets, as a naive implementation computes distances
between the new data point and all training data points. Data structures such as KD-trees and other
optimizations can be used to speed up this process. Fig. 3.5 shows the Voronoi tessellation of 19
samples marked with a "+", together with the Voronoi cell surrounding each sample. The Voronoi cell of
a sample contains all points of the space that are closer to that sample than to any other sample. For
an overview refer to [37].
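The prediction rule described above can be written as a brute-force sketch in NumPy: compute the Euclidean distance to every training point, keep the k closest, then take a majority vote (classification) or an unweighted mean (regression). The function name and toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
    """Brute-force k-NN prediction for a single query point x_new."""
    # Euclidean distance from the query to every training point.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    if task == "classification":
        # Majority vote among the k nearest labels.
        return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
    # Regression: plain (unweighted) average of the neighbours' target values.
    return float(y_train[nearest].mean())

# Hypothetical toy data: two well-separated groups of three points each.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])
values = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])

print(knn_predict(X_train, labels, np.array([0.2, 0.2]), k=3))                    # 0
print(knn_predict(X_train, values, np.array([5.2, 5.2]), k=3, task="regression"))  # 11.0
```

Because every query scans all training points, this version has exactly the cost problem mentioned above; scikit-learn's `KNeighborsClassifier` accepts `algorithm="kd_tree"` to use a KD-tree instead.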
Figure 3.5: Voronoi tessellation showing a sample k-NN classifier
3.2 Baseline Models
3.2.1 Mean Model
A Mean model serves as a fundamental baseline model for regression problems. In this simplistic
model, the output value is always predicted as the population mean of the target, disregarding any
variations in the input values.
For instance, Table 3.1 includes data for five participants, encompassing three input features and
an output feature known as the "actual trait". There is also a column labeled "predicted trait", which
represents the values predicted by the mean model. As demonstrated, the mean model consistently
returns a predicted trait value of 0.5 for all participants. This uniform prediction arises because the
mean of the population in this case is 0.5, and the mean model simply assigns this
value as the prediction for each participant, irrespective of their individual input values.
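A mean model needs only a fit step that stores the target mean and a predict step that repeats it for every row. The sketch below uses a hypothetical class name and hypothetical values chosen so that the mean is 0.5; they are not the values of Table 3.1.

```python
import numpy as np

class MeanModel:
    """Baseline regressor: ignores the inputs and always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # One identical prediction per input row, whatever the feature values are.
        return np.full(len(X), self.mean_)

# Hypothetical data for five participants (three input features each); not Table 3.1.
X = np.array([[0.1, 0.7, 0.3], [0.9, 0.2, 0.5], [0.4, 0.4, 0.8],
              [0.6, 0.1, 0.2], [0.3, 0.9, 0.6]])
y = np.array([0.2, 0.9, 0.5, 0.1, 0.8])

model = MeanModel().fit(X, y)
print(model.predict(X))  # every entry is (approximately) 0.5, the mean of y
```

scikit-learn offers the same baseline as `DummyRegressor(strategy="mean")`.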
allows for testing the model against unseen data without introducing potential bias that could arise
from randomly selecting a particularly favorable or unfavorable test set.
For instance, consider ten-fold cross-validation. The dataset is initially split into ten sets (folds) of
roughly equal size. During the first round, one fold is designated as the validation set, and the model
is trained on the remaining nine folds. The process is repeated until each of the ten folds has served as
the validation set exactly once. The accuracy or error for each round is computed separately, and the
final accuracy or error for the model is determined by averaging the results from all ten rounds. This
provides a more robust measure of the model's performance across different subsets of the data than a
single train/test split.
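The ten-fold procedure above can be sketched in NumPy: shuffle the indices once, split them into ten folds, and in each round validate on one fold while training on the other nine. The helper names, the synthetic data, and the use of RMSE with a mean-model stand-in are assumptions made for illustration.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the row indices once and split them into k folds of (almost) equal size."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Return the mean RMSE over k rounds and the list of per-round RMSEs."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on the other k - 1 folds, validate on fold i.
        train_idx = np.concatenate([fold for j, fold in enumerate(folds) if j != i])
        y_pred = fit_predict(X[train_idx], y[train_idx], X[val_idx])
        scores.append(float(np.sqrt(np.mean((y[val_idx] - y_pred) ** 2))))
    return float(np.mean(scores)), scores

def mean_baseline(X_train, y_train, X_val):
    """Hypothetical stand-in model: predict the training mean for every validation row."""
    return np.full(len(X_val), y_train.mean())

# Hypothetical synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.uniform(0, 1, size=100)

mean_rmse, per_fold = cross_validate(X, y, mean_baseline, k=10)
print(f"10-fold RMSE of the mean model: {mean_rmse:.3f} ({len(per_fold)} rounds)")
```

`sklearn.model_selection.KFold` produces equivalent train/validation index splits.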
3.3.2 RMSE
The root mean squared error is a quality metric for regression models. It is computed as the square
root of the mean of the squared differences between the actual values and the predicted values. Since
this is an error metric, a model with a lower root mean squared error is considered a better model. For
the sample mean-model data shown in Table 3.1, the RMSE is 0.28.
The Root Mean Square Error (RMSE) is calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (3.1)$$
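Where:
• $n$ is the number of observations.
• $y_i$ is the actual value of the $i$-th observation.
• $\hat{y}_i$ is the value predicted by the model for the $i$-th observation.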
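Equation (3.1) translates directly into a few lines of NumPy. The input values below are hypothetical and chosen so that the result can be checked by hand.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (3.1): sqrt((1/n) * sum_i (y_i - yhat_i)^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical values: the errors are 1, 0 and -2, so RMSE = sqrt((1 + 0 + 4) / 3).
print(rmse([3, 5, 2], [2, 5, 4]))  # about 1.291
```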
3.3.3 Accuracy
Accuracy is a key statistical metric for evaluating classification models. It is calculated as the ratio
of the number of correct predictions made by the model to the total number of predictions made. A
model with a higher accuracy is generally considered to have better predictive capabilities.
For instance, consider the sample classification dataset presented in Table 3.2. In this scenario, the
model made three correct predictions out of a total of five tests. Consequently, the accuracy of the
model is 3/5 = 60%.
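Accuracy is simply the fraction of positions where the predicted label equals the true label. The labels below are hypothetical (not Table 3.2) but reproduce the same three-out-of-five situation.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Hypothetical labels (not Table 3.2): 3 of the 5 predictions are correct.
y_true = ["A", "B", "A", "C", "B"]
y_pred = ["A", "B", "C", "C", "A"]
print(accuracy(y_true, y_pred))  # 0.6
```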
4 Methods
4.1 Study Design and Data Collection
The data for this study was collected under a project called WeNet, funded by the European Union's
Horizon 2020 programme under Grant Agreement number 823783 [51]. The data collection process
spanned a six-week period and was organized into two stages:
1. The initial synchronic data collection involved the use of three standard closed-ended questionnaires.
This stage enabled the collection of self-reported general information regarding social
practices.
2. The subsequent diachronic data collection took place through a smartphone app, facilitating the
observation of the students' daily routines.
As described in Figure 4.1, the first two weeks were dedicated to sample recruitment. This was
performed by sending two initial questionnaires, i.e., an invitation and an assessment of habits. The
remaining month was entirely dedicated to data collection through the app installed on the students'
smartphones. Throughout the data collection, a help desk was available to support students with any
problems that arose.
The questionnaires were managed with the LimeSurvey platform [47]. Invitations to participate
in the online survey were sent out through LimeSurvey to the email addresses of students enrolled at
various universities. This data collection method was based on the use of time diaries, which are a
well-established tool in the social sciences. Time diaries ask respondents to record three key aspects
of their daily lives: the activities they engage in, the locations they visit, and the people they interact
with. Time diaries can be administered in two ways: as "leave-behind diaries", where respondents
record their activities in real time as the day progresses, or as "recall diaries", where respondents
recall their activities from the previous day. In our study, we used the iLog app, which allowed students
to provide real-time responses. The questions and answers were structured in accordance with the
HETUS (Harmonised European Time Use Survey) standard [46, 55].
The sample was chosen from the entire student population of the University of Trento. An invitation
to participate in the survey was extended to all students, with an initial exclusion criterion applied
to those who did not possess a smartphone compatible with the study (specifically, only Android
Operating System versions greater than 5.0) or who did not regularly attend classes. After the initial
contact via email, the online questionnaire was sent to ask about their habits and routines. The final
stage involved sending a password for downloading and installing the iLog application. This process
resulted in 1042 responses. Among these responses, those from students born after 1993 (with the aim
of restricting the involvement of latecomers to the university) and from students who did not actively
engage in university life were removed. From the resulting pool of 860 eligible candidates, a total of
318 students were selected, with the sample size for each department proportional to that
department's student population. This adjustment was made to prevent any misrepresentation of
daily routines stemming from variations in schedules and university sub-communities. Data cleaning
during the data preparation phase resulted in a final dataset that includes information from only 149
participants. This reduction in size, compared with the 318 students selected, is the outcome of
excluding all participants with limited survey participation. In addition, because of computational
limitations, only the data collected between November 2020 and December 2020 was used in this work,
which also contributed to the reduction in size.
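The text states that the 318 places were distributed in proportion to each department's student population but does not say how fractional quotas were rounded. The sketch below assumes largest-remainder rounding; the helper name, department names, and department sizes are hypothetical and are not the University of Trento figures.

```python
import math

def proportional_allocation(population, sample_size):
    """Split sample_size across strata in proportion to their sizes.

    Quotas are floored first; the places left over are given to the strata
    with the largest fractional remainders (largest-remainder rounding).
    """
    total = sum(population.values())
    quotas = {name: sample_size * size / total for name, size in population.items()}
    allocation = {name: math.floor(q) for name, q in quotas.items()}
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

# Hypothetical department sizes, NOT the real University of Trento figures.
departments = {"Engineering": 220, "Natural Sciences": 150, "Social Sciences": 180,
               "Economics": 170, "Law": 140}
print(proportional_allocation(departments, 318))
```

Whatever rounding rule is used, each department's share differs from its exact proportional quota by less than one student, and the shares add up to the target sample size.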
                    %
Gender
  Female         48.7
  Male           51.3
Age
  <22            47.5
  22-26          52.5
Department
  STEM          44.57
  Non-STEM      55.42
Total             100 (N=149)
Table 4.1 shows how the sample is balanced according to its main characteristics, namely gender,
age and the department (STEM or non-STEM) in which the students were enrolled. Furthermore,
it shows the range of annotations given by the participants. The psycho-social characteristics of
the participants are described in Table 4.2. Concerning personality traits (BFI-10), the mean scores
range from 49.15 to 76.26, with a maximum standard deviation of 23.69 reached in the case of
the Extraversion variable. The range of responses goes from 0 to 100. Students enrolled in the
departments of Engineering and Applied Sciences, Natural Sciences, Medicine and Veterinary Medicine,
and Agriculture were categorized as STEM. Conversely, students enrolled in departments such as
Social Sciences, Business/Economics, Law, Humanities, and International Relations and Public
Administration were categorized as non-STEM. Apart from psychosocial traits, data was also collected
on various other aspects, including (i) daily and extraordinary journeys, which encompassed the times
and means of transportation used; (ii) work routines; and (iii) study and class attendance routines. In
total, 27 questions were posed, resulting in the collection of 78 variables. However, this additional
data was not used in the present analysis and is therefore not described further.
                       mean    std median    min    max
Total Population
  Extraversion        49.15  23.69  50.00   0.00    100
  Agreeableness       76.26  15.56  75.00  25.00    100
  Conscientiousness   64.88  19.27  62.50  12.50    100
  Neuroticism         49.70  20.70  50.00   0.00    100
  Openness            71.46  18.66  75.00   6.25    100
Female Population
  Extraversion        48.72  22.76  50.00   0.00    100
  Agreeableness       80.15  14.27  81.25  25.00    100
  Conscientiousness   63.69  20.40  62.50  12.50    100
  Neuroticism         54.49  19.90  53.13   0.00    100
  Openness            70.42  20.37  75.00   6.25    100
Male Population
  Extraversion        49.71  24.98  50.00   0.00    100
  Agreeableness       71.09  15.77  75.00  31.25    100
  Conscientiousness   66.47  17.62  68.75  18.75    100
  Neuroticism         43.34  20.10  43.75   0.00    100
  Openness            72.84  16.11  75.00  31.25    100
more, for experiments conducted outside of Europe, the activities and results have been designed to
align with the requirements of a specific European country, as stipulated by the European Commission.
In this context, Italian legislation was chosen as the reference point. Additional details pertaining to
these compliance measures are provided in [14].
2. Software (SW) sensors, referring to all the software events that can be captured from the operating
system and software applications, such as Wi-Fi connectivity events. A comprehensive list of the
software sensors employed in our study is provided in Table 4.3.
Table 4.3 shows the list of sensors used in the study along with their measurement frequency
and their respective number of observations. Among the sensors listed in the table, the Step
Counter is a software sensor whose data is derived from a hardware sensor, the accelerometer. The
Step Counter data provides minute-by-minute records of the number of steps taken by users. Similarly,
the Touch Event sensor records the number of touch events occurring every minute. For the Screen
Event, Battery Charge Event, Doze Event, and Music Event data, each row in the dataset corresponds to
a change in the event status, recorded as True/False or On/Off. The Ring Event data comprises instances
when the ring mode changed, with three registered ring modes: Normal, Silent, and Vibrate. The
Bluetooth sensor captures the name and brand of devices in close proximity to the user's smartphone.
It includes a feature called "bond" that indicates whether a device is paired with the host smartphone.
This data offers insights into how crowded the user's surroundings are at a given time. The data in its
existing form could not be used for analysis and, therefore, needed to be cleaned, transformed, and
preprocessed. Furthermore, meaningful features