Computer Analysis of Human Behavior
Albert Ali Salah · Theo Gevers
Editors
Albert Ali Salah
Department of Computer Engineering
Boğaziçi University
Bebek, Istanbul 34342
Turkey
salah@boun.edu.tr
Theo Gevers
Informatics Institute
University of Amsterdam
Science Park 904
1098 XH Amsterdam
Netherlands
th.gevers@uva.nl
ISBN 978-0-85729-993-2
e-ISBN 978-0-85729-994-9
DOI 10.1007/978-0-85729-994-9
Springer London Dordrecht Heidelberg New York
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Control Number: 2011939770
© Springer-Verlag London Limited 2011
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the
Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to
the publishers.
The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a
specific statement, that such names are exempt from the relevant laws and regulations and therefore free
for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information
contained in this book and cannot accept any legal responsibility or liability for any errors or omissions
that may be made.
Cover design: VTeX UAB, Lithuania
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
Overview and Goals
Human behavior is complex, but not random. Computer analysis of human behavior
in its multiple scales and settings leads to a steady influx of new applications in
diverse domains like human-computer interaction, affective computing, social signal
processing, and ambient intelligence. We envision seamlessly integrated plug and
play devices that can be used to endow a given environment with an awareness of
the physical, functional, temporal and social organization of its internal domestic
dynamics, as well as the personalities and social relationships of its inhabitants,
providing a vast array of new services for ordinary people. We picture intuitive
tools for social scientists, psychologists and doctors to observe and quantify human
behavior. We desire realistic virtual agents and engaging robots that can analyze
and properly respond to social context. We seek intelligent algorithms to process
vast collections of multimedia information to retrieve relevant material in response
to semantic queries.
The realization of all these tools and systems requires a fundamental grasp of the
key issues, as well as knowledge of and experience with the computational tools. The
goal of this book is to provide a solid foundation toward achieving this.
Most significantly, the focus of the book is on advanced pattern recognition techniques to automatically interpret complex behavioral patterns generated when humans interact with machines or with others. This is a challenging problem where
many issues are still open, including the joint modeling of behavioral cues taking
place at different time scales, the inherent uncertainty of machine-detectable evidence of human behavior, the mutual influence of people involved in interactions,
the presence of long-term dependencies in observations extracted from human behavior, and the important role of dynamics in human behavior understanding.
These problems connect directly to the field of pattern recognition. The editors of the present volume (together with Alessandro Vinciarelli
from the Univ. of Glasgow and Nicu Sebe from the Univ. of Trento) organized the
First Workshop on Human Behavior Understanding as a satellite workshop to the Int.
Conference on Pattern Recognition (ICPR) in 2010, with wide attendance. Similarly,
the Human Communicative Behaviour Analysis Workshop has been held as a satellite to
IEEE’s Computer Vision and Pattern Recognition (CVPR) Conference since 2008.
The topics presented in the book are actively researched in the pattern recognition
community.
Target Audience
The book is planned as a graduate textbook/guidebook on computer analysis of
human behavior. Starting from the preliminaries, it covers major aspects concisely,
introduces some of the most frequently used techniques and algorithms in detail,
and provides examples of real applications. Each chapter is a stand-alone treatment
of a relevant subject, discussing key issues, classic and recent approaches, as well
as open questions and future research directions. Since the subject matter is very
broad, we have restricted the number of chapters to ensure that the book can be
covered in one semester and focused on providing both a good background and a
comprehensive vision of the field. Each chapter is supplemented with educational
material, including chapter summary, glossary, questions, and online lecture slides
to help the instructor.
Organization of the Book
We have divided the book into four parts. The first part, called “The Tools of the
Trade”, is a selection of basic topics that the reader will repeatedly come across in
human behavior analysis. These chapters are all written in an intuitive and didactic
way, and pave the way for more advanced discussions.
The first of these four chapters introduces Bayesian methods for behavior analysis, because there are numerous uncertainties in measuring and assessing behavior,
and one also needs to deal with idiosyncrasies and inconsistencies. In particular,
Gaussian processes and Dirichlet processes are covered in this chapter.
Almost all human behavior is temporal, but the time scale may range from milliseconds (e.g. the movement of a facial muscle) to hours (e.g. sleep cycles) or
months (e.g. habits). The temporal dimension is the sine qua non of human behavior, and consequently, the second chapter introduces basic methods for temporal
analysis, including Hidden Markov Models, Conditional Random Fields, and variants thereof. This chapter also gives a concise introduction to graphical models,
their factorization, and how to perform inference in them. It therefore
complements the first chapter nicely.
The third chapter discusses how we can detect and track humans by computer
vision methods to understand their actions, and is a prerequisite for most material
presented in Parts II and III. The visual modality can provide a system with high-dimensional and dynamic data, creating some of the most formidable challenges in
behavior analysis, and most analysis pipelines start with detection and tracking.
The fourth chapter is an introduction to computational visual attention, and explains how humans make sense of the immensely rich perceptual input (the firehose
of experience, in the memorable expression of Ben Kuipers), as well as how computer systems can mimic the process of attention to reduce their computational load,
especially in terms of bottom-up (data driven) and top-down (semantically driven)
approaches. The VOCUS attention system is described in detail, and applications
are given from the field of mobile robotics.
Two major application areas of human behavior analysis are activity recognition,
and analysis of social signals, including those that pertain to affect. The second part
of the book is devoted to “Analysis of Activities”, whereas the third part deals with
social and affective behavior.
The first chapter in the second part (i.e. Chap. 5) is on gait and posture analysis,
which is a relevant problem for clinical applications, surveillance, and ergonomics.
In this chapter the reader will be introduced to a host of sensors (like gyroscopes and
accelerometers) that can be used to measure useful physiological and movement-related signals. Most of these sensors are easily integrated into smartphones, creating
a huge potential for mobile phone applications (and games) that are based on human
behavior analysis.
The second part continues with Chap. 6 on hand gesture analysis. Hand gestures
can be used to define natural interfaces in human-computer interaction, but they are
also rich sources of social and contextual cues during conversations. This chapter
builds heavily on the temporal analysis and tracking material of the first part.
Automatic analysis of complex human behavior is, as we mentioned earlier, one
of the grand challenges of multimedia retrieval. Some behaviors are simple (e.g.
walking) and can be detected by looking at simple cues. Some behaviors are complex (e.g. flirting) and require extensive knowledge and processing of context. Chapter 7 is on semantics of human behavior in image sequences, and discusses how
environment influences the perceived activities, and how bottom-up and top-down
approaches can be integrated for recognizing events.
With the third part of the book, “Social and Affective Behaviors”, we move toward applications where pattern recognition and machine learning methods need
to be complemented with psychological background knowledge and models. Social behaviors constitute a major research area, largely overlapping with the field of
affective computing. This part opens with a psychological treatise on social signals, written in a very accessible way for an audience with a mainly computer science
background. A wide range of signals like dominance, persuasion, shame, pride, and
enthusiasm are introduced and discussed in Chap. 8.
Poggi and D’Errico define a social signal as a communicative or informative
signal that, either directly or indirectly, conveys information about social actions,
social interactions, social emotions, social attitudes and social relationships (2010).
The audio modality is more frequently used for processing such signals compared
to the visual modality. Chapter 9 is an extensive and technical discussion of voice
and speech analysis for assessing human communicative behavior.
Chapter 10 is on analysis of affect from combined audiovisual cues. It discusses
how affective signals can be measured and evaluated in a continuous manner, as
well as how to perform multimodal fusion. While the discussions of data annotation
and experimental design in Chap. 10 pertain mainly to affective displays, the identified issues and challenges are valid for almost the entire range of human behavior
analysis.
The last chapter of the third part of the book, Chap. 11, discusses social interactions and group dynamics. Four case studies from a meeting scenario are presented,
where the authors combine audiovisual cues to estimate the most and least dominant person, emerging leaders, and functional roles in the group, as well as the
group dynamics as a whole.
The fourth part of the book is devoted to “Selected Applications” from three
different research fields (ambient intelligence, biometrics, and gaming, respectively)
that reflect the diversity and scope of behavioral cues and their usage.
Chapter 12 describes a vision of ambient assisted living, where a smart environment monitors the activities of its inhabitants for health care purposes. The aging
population of developed countries calls for technologies that allow the elderly to remain
longer in their home environments, giving them a higher quality of life, as well as
reducing the costs of health care. Smart monitoring tools, provided that they deal
with acceptance and privacy issues properly, are of great value.
Behavioral biometrics is the identification of a person via behavioral cues. Chapter 13 surveys this new field, and reveals that an astonishing number of behavioral
cues, measured directly or indirectly, can be used to verify the identity of a person.
This is a joint achievement of improvements in pattern recognition methods, as well
as sensor technologies.
Finally, Chap. 14 deals with games, which are major economic drivers behind
computer science research. Games serve more than entertainment; there are
games to exercise the body and mind, and sometimes a game is the means to a completely different end. Take for instance the robotics community, which uses robot
soccer as a driver behind great advances in robot mechanics, coordination, planning, and a host of other challenges. This is an application area where both real and
virtual human behavior can be analyzed for improving engagement and interaction,
as well as for teaching computers and robots skills at human level.
Taken together, these chapters cover most of the field of human behavior analysis.
It is possible to include many more tools and applications, as the field is positioned
at a confluence of many different and mature research areas. Nonetheless, we hope
that this collection will be a useful teaching tool for initiating newcomers, as well
as a timely reference work that sums up recent research in this advancing area.
Amsterdam, The Netherlands
Albert Ali Salah
Theo Gevers
Contents
Part I Tools of the Trade
1 Bayesian Methods for the Analysis of Human Behaviour
Gwenn Englebienne
2 Introduction to Sequence Analysis for Human Behavior Understanding
Hugues Salamin and Alessandro Vinciarelli
3 Detecting and Tracking Action Content
Alper Yilmaz
4 Computational Visual Attention
Simone Frintrop
Part II Analysis of Activities
5 Methods and Technologies for Gait Analysis
Elif Surer and Alper Kose
6 Hand Gesture Analysis
Cem Keskin, Oya Aran, and Lale Akarun
7 Semantics of Human Behavior in Image Sequences
Nataliya Shapovalova, Carles Fernández, F. Xavier Roca, and Jordi Gonzàlez
Part III Social and Affective Behaviors
8 Social Signals: A Psychological Perspective
Isabella Poggi and Francesca D’Errico
9 Voice and Speech Analysis in Search of States and Traits
Björn Schuller
10 Continuous Analysis of Affect from Voice and Face
Hatice Gunes, Mihalis A. Nicolaou, and Maja Pantic
11 Analysis of Group Conversations: Modeling Social Verticality
Oya Aran and Daniel Gatica-Perez
Part IV Selected Applications
12 Activity Monitoring Systems in Health Care
Ben Kröse, Tim van Oosterhout, and Tim van Kasteren
13 Behavioral, Cognitive and Virtual Biometrics
Roman V. Yampolskiy
14 Human Behavior Analysis in Ambient Gaming and Playful Interaction
Ben A.M. Schouten, Rob Tieben, Antoine van de Ven, and David W. Schouten
Index
Contributors
Lale Akarun Computer Engineering Department, Boğaziçi University, Istanbul,
Turkey, akarun@boun.edu.tr
Oya Aran Idiap Research Institute, Martigny, Switzerland, oya.aran@idiap.ch
Francesca D’Errico Roma Tre University, Rome, Italy, fderrico@uniroma3.it
Gwenn Englebienne University of Amsterdam, Science Park 904, Amsterdam,
The Netherlands, G.Englebienne@uva.nl
Carles Fernández Departament de Ciències de la Computació and Computer
Vision Center, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia,
Spain, perno@cvc.uab.es
Simone Frintrop Institute of Computer Science III, Rheinische Friedrich-Wilhelms-Universität
Bonn, Römerstrasse 164, 53117 Bonn, Germany,
frintrop@iai.uni-bonn.de
Daniel Gatica-Perez Idiap Research Institute, Martigny, Switzerland,
gatica@idiap.ch
Jordi Gonzàlez Departament de Ciències de la Computació and Computer Vision
Center, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain,
poal@cvc.uab.es
Hatice Gunes Queen Mary University of London, London, UK, haticeg@ieee.org
Tim van Kasteren Boğaziçi University, Bebek, Istanbul, Turkey,
tim0306@gmail.com
Cem Keskin Computer Engineering Department, Boğaziçi University, Istanbul,
Turkey, keskinc@cmpe.boun.edu.tr
Alper Kose Department of Biomedical Sciences, University of Sassari, Sassari,
Italy, akose@uniss.it
Ben Kröse University of Amsterdam, Amsterdam, The Netherlands,
b.j.a.krose@uva.nl; Amsterdam University of Applied Science, Amsterdam, The
Netherlands
Mihalis A. Nicolaou Imperial College, London, UK, mihalis@imperial.ac.uk
Tim van Oosterhout Amsterdam University of Applied Science, Amsterdam, The
Netherlands, T.J.M.van.Oosterhout@hva.nl
Maja Pantic Imperial College, London, UK, m.pantic@imperial.ac.uk; University
of Twente, Twente, The Netherlands
Isabella Poggi Roma Tre University, Rome, Italy
F. Xavier Roca Departament de Ciències de la Computació and Computer Vision
Center, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain,
xavir@cvc.uab.es
Hugues Salamin School of Computing Science, University of Glasgow, Glasgow,
Scotland, hsalamin@dcs.gla.ac.uk
Ben A.M. Schouten Department of Industrial Design, Eindhoven University of
Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands,
b.a.m.schouten@tue.nl
David W. Schouten S. Nicolas Highschool, Prinses Irenestraat 21, 1077 WT Amsterdam, The Netherlands
Björn Schuller Institute for Human–Machine Communication, Technische Universität München, 80290 Munich, Germany, schuller@tum.de
Nataliya Shapovalova Departament de Ciències de la Computació and Computer
Vision Center, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia,
Spain, shapovalova@cvc.uab.es
Elif Surer Department of Biomedical Sciences, University of Sassari, Sassari,
Italy, esurer@uniss.it
Rob Tieben Department of Industrial Design, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands, r.tieben@tue.nl
Antoine van de Ven Fontys University of Applied Sciences, Postbus 347, 5600 AH
Eindhoven, The Netherlands, antoine.vandeven@fontys.nl
Alessandro Vinciarelli School of Computing Science, University of Glasgow,
Glasgow, Scotland, vincia@dcs.gla.ac.uk; Idiap Research Institute, Martigny,
Switzerland
Roman V. Yampolskiy University of Louisville, Louisville, KY, USA,
roman.yampolskiy@louisville.edu
Alper Yilmaz The Ohio State University, Columbus, OH 43035, USA,
yilmaz.15@osu.edu
Acronyms
A-V        Arousal-valence
ADL        Activities of daily living
AF         Anatomical frame
ANN        Artificial neural network
ASR        Automatic speech recognition
BCI        Brain-computer interface
BLSTM-NN   Bi-directional long short-term memory neural network
BoW        Bag of words
CAST       Calibrated anatomical system technique
CRF        Conditional random field
DAG        Directed acyclic graph
DCT        Discrete cosine transform
DLT        Direct linear transform
DoG        Difference of Gaussians
DP         Dirichlet process
DPMM       Dirichlet process mixture model
ECG        Electrocardiogram
EEG        Electroencephalography
EM         Expectation-Maximization algorithm
EMG        Electromyography
EOG        Electrooculogram
FEF        Frontal eye field
FFT        Fast Fourier transform
FIT        Feature Integration Theory
FMTL       Fuzzy metric temporal Horn logic
FOA        Focus of attention
FRCS       Functional role coding scheme
GF         Global frame
GP         Gaussian process
GSR        Galvanic skin response
HCI        Human–Computer interaction
HCRF       Hidden conditional random field
HEU        Human event understanding
HMM        Hidden Markov model
HNR        Harmonics-to-noise ratio
HoG        Histograms of oriented gradients
ICA        Independent components analysis
IOHMM      Input output hidden Markov model
IOR        Inhibition of return
IR         Infrared
IT         Infero-temporal cortex
KDE        Kernel density estimation
KF         Kalman filter
kNN        k-Nearest neighbor
LDA        Latent Dirichlet allocation
LDA        Linear discriminant analysis
LDCRF      Latent dynamic conditional random field
LGN        Lateral geniculate nucleus
LIP        Lateral intraparietal area
LLD        Low-level descriptors
LPC        Linear predictive coding
MAP        Maximum a posteriori
MARG       Magnetic, angular rate and gravity
MCMC       Markov chain Monte Carlo
MDF        Most discriminative features
MFCC       Mel-Frequency cepstrum coefficients
MLE        Maximum likelihood estimation
MMORPG     Massively multiplayer online role-playing games
MoG        Mixture of Gaussians
MRF        Markov random field
MSE        Mean squared error
MUD        Massively multi user dungeon
NMF        Non-negative matrix factorization
PAD        Pleasure-arousal-dominance
PCA        Principal components analysis
PDF        Probability density function
PF         Particle filter
PHOG       Pyramid of histograms of oriented gradients
PO         Parieto-occipital cortex
PP         Posterior parietal cortex
RCFL       Recursive coarse-to-fine localization
RF         Random forests
RFID       Radio frequency identification
RMSE       Root mean squared error
ROI        Region of interest
RPG        Role-playing video games
RT         Reaction time
SAL        Sensitive artificial listener
SAN        Social affiliation networks
SC         Superior colliculus
SGT        Situation graph tree
SIFT       Scale invariant feature transform
SLAM       Simultaneous localization and mapping
STIP       Space time interest points
SVM        Support vector machines
SVR        Support vector regression
TF         Technical frame
TOF        Time of flight
V1         Primary visual cortex
VDR        Visual Dominance Ratio
WoZ        Wizard-of-Oz
WTA        Winner-Take-All network