
Computer Analysis of Human Behavior

Albert Ali Salah · Theo Gevers
Editors

Albert Ali Salah
Department of Computer Engineering
Boğaziçi University
Bebek, Istanbul 34342
Turkey
salah@boun.edu.tr

Theo Gevers
Informatics Institute
University of Amsterdam
Science Park 904
1098 XH Amsterdam
Netherlands
th.gevers@uva.nl

ISBN 978-0-85729-993-2    e-ISBN 978-0-85729-994-9
DOI 10.1007/978-0-85729-994-9
Springer London Dordrecht Heidelberg New York

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2011939770

© Springer-Verlag London Limited 2011

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Cover design: VTeX UAB, Lithuania

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Overview and Goals

Human behavior is complex, but not random.
Computer analysis of human behavior in its multiple scales and settings leads to a steady influx of new applications in diverse domains like human-computer interaction, affective computing, social signal processing, and ambient intelligence. We envision seamlessly integrated plug-and-play devices that can be used to endow a given environment with an awareness of the physical, functional, temporal and social organization of its internal domestic dynamics, as well as the personalities and social relationships of its inhabitants, providing a vast array of new services for ordinary people. We picture intuitive tools for social scientists, psychologists and doctors to observe and quantify human behavior. We desire realistic virtual agents and engaging robots that can analyze and properly respond to social context. We seek intelligent algorithms to process vast collections of multimedia information and retrieve relevant material in response to semantic queries. The realization of all these tools and systems requires a fundamental grasp of the key issues, as well as knowledge of and experience with the computational tools. The goal of this book is to provide a solid foundation toward achieving this.

Most significantly, the focus of the book is on advanced pattern recognition techniques to automatically interpret complex behavioral patterns generated when humans interact with machines or with others. This is a challenging problem where many issues are still open, including the joint modeling of behavioral cues taking place at different time scales, the inherent uncertainty of machine-detectable evidence of human behavior, the mutual influence of people involved in interactions, the presence of long-term dependencies in observations extracted from human behavior, and the important role of dynamics in human behavior understanding. These problems clearly fall within the scope of pattern recognition.
The editors of the present volume (together with Alessandro Vinciarelli from the University of Glasgow and Nicu Sebe from the University of Trento) organized the First Workshop on Human Behavior Understanding as a satellite workshop to the International Conference on Pattern Recognition (ICPR) in 2010, which was widely attended. Similarly, the Human Communicative Behaviour Analysis Workshop has been held as a satellite to IEEE’s Computer Vision and Pattern Recognition (CVPR) Conference since 2008. The topics presented in the book are actively researched in the pattern recognition community.

Target Audience

The book is planned as a graduate textbook and guidebook on computer analysis of human behavior. Starting from the preliminaries, it covers major aspects concisely, introduces some of the most frequently used techniques and algorithms in detail, and provides examples of real applications. Each chapter is a stand-alone treatment of a relevant subject, discussing key issues, classic and recent approaches, as well as open questions and future research directions. Since the subject matter is very broad, we have restricted the number of chapters so that the book can be covered in one semester, and we have focused on providing both a good background and a comprehensive vision of the field. Each chapter is supplemented with educational material, including a chapter summary, a glossary, questions, and online lecture slides to help the instructor.

Organization of the Book

We have divided the book into four parts. The first part, called “The Tools of the Trade”, is a selection of basic topics that the reader will repeatedly come across in human behavior analysis. These chapters are all written in an intuitive and didactic way, and pave the way for more advanced discussions. The first of these four chapters introduces Bayesian methods for behavior analysis, because there are numerous uncertainties in measuring and assessing behavior, and one also needs to deal with idiosyncrasies and inconsistencies.
In particular, Gaussian processes and Dirichlet processes are covered in this chapter. Almost all human behavior is temporal, but the time scale may range from milliseconds (e.g. the movement of a facial muscle) to hours (e.g. sleep cycles) or months (e.g. habits). The temporal dimension is the sine qua non of human behavior; accordingly, the second chapter introduces basic methods for temporal analysis, including Hidden Markov Models, Conditional Random Fields, and variants thereof. This chapter also gives a concise introduction to graphical models, their factorization, and how to perform inference in them. In this way, it complements the first chapter nicely.

The third chapter discusses how we can detect and track humans by computer vision methods to understand their actions, and is a prerequisite for most material presented in Parts II and III. The visual modality can provide a system with high-dimensional and dynamic data, creating some of the most formidable challenges in behavior analysis, and most analysis pipelines start with detection and tracking.

The fourth chapter is an introduction to computational visual attention. It explains how humans make sense of the immensely rich perceptual input (the firehose of experience, in the memorable expression of Ben Kuijpers), as well as how computer systems can mimic the process of attention to reduce their computational load, especially in terms of bottom-up (data-driven) and top-down (semantically driven) approaches. The VOCUS attention system is described in detail, and applications are given from the field of mobile robotics.

Two major application areas of human behavior analysis are activity recognition and the analysis of social signals, including those that pertain to affect. The second part of the book is devoted to “Analysis of Activities”, whereas the third part deals with social and affective behavior. The first chapter in the second part (i.e. Chap. 5) is on gait and posture analysis, which is a relevant problem for clinical applications, surveillance, and ergonomics. In this chapter the reader will be introduced to a host of sensors (like gyroscopes and accelerometers) that can be used to measure useful physiological and movement-related signals. Most of these sensors are easily integrated into smartphones, creating a huge potential for mobile phone applications (and games) that are based on human behavior analysis.

The second part continues with Chap. 6 on hand gesture analysis. Hand gestures can be used to define natural interfaces in human-computer interaction, but they are also rich sources of social and contextual cues during conversations. This chapter builds heavily on the temporal analysis and tracking material of the first part.

Automatic analysis of complex human behavior is, as we mentioned earlier, one of the grand challenges of multimedia retrieval. Some behaviors are simple (e.g. walking) and can be detected by looking at simple cues. Others are complex (e.g. flirting) and require extensive knowledge and processing of context. Chapter 7 is on the semantics of human behavior in image sequences, and discusses how the environment influences perceived activities, and how bottom-up and top-down approaches can be integrated for recognizing events.

With the third part of the book, “Social and Affective Behaviors”, we move toward applications where pattern recognition and machine learning methods need to be complemented with psychological background knowledge and models. Social behaviors constitute a major research area, largely overlapping with the field of affective computing. This part opens with a psychological treatise on social signals, written in a very accessible way for an audience with a mainly computer science background. A wide range of signals like dominance, persuasion, shame, pride, and enthusiasm are introduced and discussed in Chap. 8.
Poggi and D’Errico (2010) define a social signal as a communicative or informative signal that, either directly or indirectly, conveys information about social actions, social interactions, social emotions, social attitudes and social relationships. The audio modality is used more frequently than the visual modality for processing such signals. Chapter 9 is an extensive and technical discussion of voice and speech analysis for assessing human communicative behavior. Chapter 10 is on the analysis of affect from combined audiovisual cues. It discusses how affective signals can be measured and evaluated in a continuous manner, as well as how to perform multimodal fusion. While the discussions of data annotation and experimental design in Chap. 10 pertain mainly to affective displays, the issues and challenges identified there are valid for almost the entire range of human behavior analysis. The last chapter of the third part of the book, Chap. 11, discusses social interactions and group dynamics. Four case studies from a meeting scenario are presented, where the authors combine audiovisual cues to estimate the most and least dominant person, emerging leaders, and functional roles in the group, as well as the group dynamics as a whole.

The fourth part of the book is devoted to “Selected Applications” from three different research fields (ambient intelligence, biometrics, and gaming) that reflect the diversity and scope of behavioral cues and their usage. Chapter 12 describes a vision of ambient assisted living, where a smart environment monitors the activities of its inhabitants for health care purposes. The aging population of developed countries calls for technologies that allow the elderly to remain longer in their home environments, giving them a higher quality of life as well as reducing the costs of health care. Smart monitoring tools, provided that they deal with acceptance and privacy issues properly, are of great value.
Behavioral biometrics is the identification of a person via behavioral cues. Chapter 13 surveys this new field and reveals that an astonishing number of behavioral cues, measured directly or indirectly, can be used to verify the identity of a person. This is a joint achievement of improvements in pattern recognition methods and in sensor technologies. Finally, Chap. 14 deals with games, which are major economic drivers behind computer science research. Games do not serve only entertainment; there are games to exercise the body and mind, and sometimes a game is the means to a completely different end. Take, for instance, the robotics community, which uses robot soccer as a driver behind great advances in robot mechanics, coordination, planning, and a host of other challenges. This is an application area where both real and virtual human behavior can be analyzed for improving engagement and interaction, as well as for teaching computers and robots human-level skills.

Taken together, these chapters cover most of the field of human behavior analysis. Many more tools and applications could have been included, as the field is positioned at a confluence of many different and mature research areas. Nonetheless, we hope that this collection will be a useful teaching tool for initiating newcomers, as well as a timely reference work that sums up recent research in this advancing area.

Amsterdam, The Netherlands
Albert Ali Salah
Theo Gevers

Contents

Part I Tools of the Trade
1 Bayesian Methods for the Analysis of Human Behaviour
  Gwenn Englebienne
2 Introduction to Sequence Analysis for Human Behavior Understanding
  Hugues Salamin and Alessandro Vinciarelli
3 Detecting and Tracking Action Content
  Alper Yilmaz
4 Computational Visual Attention
  Simone Frintrop
Part II Analysis of Activities
5 Methods and Technologies for Gait Analysis
  Elif Surer and Alper Kose
6 Hand Gesture Analysis
  Cem Keskin, Oya Aran, and Lale Akarun
7 Semantics of Human Behavior in Image Sequences
  Nataliya Shapovalova, Carles Fernández, F. Xavier Roca, and Jordi Gonzàlez
Part III Social and Affective Behaviors
8 Social Signals: A Psychological Perspective
  Isabella Poggi and Francesca D’Errico
9 Voice and Speech Analysis in Search of States and Traits
  Björn Schuller
10 Continuous Analysis of Affect from Voice and Face
  Hatice Gunes, Mihalis A. Nicolaou, and Maja Pantic
11 Analysis of Group Conversations: Modeling Social Verticality
  Oya Aran and Daniel Gatica-Perez
Part IV Selected Applications
12 Activity Monitoring Systems in Health Care
  Ben Kröse, Tim van Oosterhout, and Tim van Kasteren
13 Behavioral, Cognitive and Virtual Biometrics
  Roman V. Yampolskiy
14 Human Behavior Analysis in Ambient Gaming and Playful Interaction
  Ben A.M. Schouten, Rob Tieben, Antoine van de Ven, and David W. Schouten
Index
Contributors

Lale Akarun, Computer Engineering Department, Boğaziçi University, Istanbul, Turkey, akarun@boun.edu.tr
Oya Aran, Idiap Research Institute, Martigny, Switzerland, oya.aran@idiap.ch
Francesca D’Errico, Roma Tre University, Rome, Italy, fderrico@uniroma3.it
Gwenn Englebienne, University of Amsterdam, Science Park 904, Amsterdam, The Netherlands, G.Englebienne@uva.nl
Carles Fernández, Departament de Ciències de la Computació and Computer Vision Center, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain, perno@cvc.uab.es
Simone Frintrop, Institute of Computer Science III, Rheinische Friedrich-Wilhelms Universität Bonn, Römerstrasse 164, 53117 Bonn, Germany, frintrop@iai.uni-bonn.de
Daniel Gatica-Perez, Idiap Research Institute, Martigny, Switzerland, gatica@idiap.ch
Jordi Gonzàlez, Departament de Ciències de la Computació and Computer Vision Center, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain, poal@cvc.uab.es
Hatice Gunes, Queen Mary University of London, London, UK, haticeg@ieee.org
Tim van Kasteren, Boğaziçi University, Bebek, Istanbul, Turkey, tim0306@gmail.com
Cem Keskin, Computer Engineering Department, Boğaziçi University, Istanbul, Turkey, keskinc@cmpe.boun.edu.tr
Alper Kose, Department of Biomedical Sciences, University of Sassari, Sassari, Italy, akose@uniss.it
Ben Kröse, University of Amsterdam, Amsterdam, The Netherlands, b.j.a.krose@uva.nl; Amsterdam University of Applied Science, Amsterdam, The Netherlands
Mihalis A. Nicolaou, Imperial College, London, UK, mihalis@imperial.ac.uk
Tim van Oosterhout, Amsterdam University of Applied Science, Amsterdam, The Netherlands, T.J.M.van.Oosterhout@hva.nl
Maja Pantic, Imperial College, London, UK, m.pantic@imperial.ac.uk; University of Twente, Twente, The Netherlands
Isabella Poggi, Roma Tre University, Rome, Italy
F. Xavier Roca, Departament de Ciències de la Computació and Computer Vision Center, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain, xavir@cvc.uab.es
Hugues Salamin, School of Computing Science, University of Glasgow, Glasgow, Scotland, hsalamin@dcs.gla.ac.uk
Ben A.M. Schouten, Department of Industrial Design, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands, b.a.m.schouten@tue.nl
David W. Schouten, S. Nicolas Highschool, Prinses Irenestraat 21, 1077 WT Amsterdam, The Netherlands
Björn Schuller, Institute for Human–Machine Communication, Technische Universität München, 80290 Munich, Germany, schuller@tum.de
Nataliya Shapovalova, Departament de Ciències de la Computació and Computer Vision Center, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain, shapovalova@cvc.uab.es
Elif Surer, Department of Biomedical Sciences, University of Sassari, Sassari, Italy, esurer@uniss.it
Rob Tieben, Department of Industrial Design, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands, r.tieben@tue.nl
Antoine van de Ven, Fontys University of Applied Sciences, Postbus 347, 5600 AH Eindhoven, The Netherlands, antoine.vandeven@fontys.nl
Alessandro Vinciarelli, School of Computing Science, University of Glasgow, Glasgow, Scotland, vincia@dcs.gla.ac.uk; Idiap Research Institute, Martigny, Switzerland
Roman V. Yampolskiy, University of Louisville, Louisville, KY, USA, roman.yampolskiy@louisville.edu
Alper Yilmaz, The Ohio State University, Columbus, OH 43035, USA, yilmaz.15@osu.edu

Acronyms

A-V       Arousal-valence
ADL       Activities of daily living
AF        Anatomical frame
ANN       Artificial neural network
ASR       Automatic speech recognition
BCI       Brain-computer interface
BLSTM-NN  Bi-directional long short-term memory neural network
BoW       Bag of words
CAST      Calibrated anatomical system technique
CRF       Conditional random field
DAG       Directed acyclic graph
DCT       Discrete cosine transform
DLT       Direct linear transform
DoG       Difference of Gaussians
DP        Dirichlet process
DPMM      Dirichlet process mixture model
ECG       Electrocardiogram
EEG       Electroencephalography
EM        Expectation-Maximization algorithm
EMG       Electromyography
EOG       Electrooculogram
FEF       Frontal eye field
FFT       Fast Fourier transform
FIT       Feature Integration Theory
FMTL      Fuzzy metric temporal Horn logic
FOA       Focus of attention
FRCS      Functional role coding scheme
GF        Global frame
GP        Gaussian process
GSR       Galvanic skin response
HCI       Human–computer interaction
HCRF      Hidden conditional random field
HEU       Human event understanding
HMM       Hidden Markov model
HNR       Harmonics-to-noise ratio
HoG       Histograms of oriented gradients
ICA       Independent components analysis
IOHMM     Input output hidden Markov model
IOR       Inhibition of return
IR        Infrared
IT        Inferotemporal cortex
KDE       Kernel density estimation
KF        Kalman filter
kNN       k-Nearest neighbor
LDA       Latent Dirichlet allocation
LDA       Linear discriminant analysis
LDCRF     Latent dynamic conditional random field
LGN       Lateral geniculate nucleus
LIP       Lateral intraparietal area
LLD       Low-level descriptors
LPC       Linear predictive coding
MAP       Maximum a posteriori
MARG      Magnetic, angular rate and gravity
MCMC      Markov chain Monte Carlo
MDF       Most discriminative features
MFCC      Mel-frequency cepstral coefficients
MLE       Maximum likelihood estimation
MMORPG    Massively multiplayer online role-playing games
MoG       Mixture of Gaussians
MRF       Markov random field
MSE       Mean squared error
MUD       Massively multi-user dungeon
NMF       Non-negative matrix factorization
PAD       Pleasure-arousal-dominance
PCA       Principal components analysis
PDF       Probability density function
PF        Particle filter
PHOG      Pyramid of histograms of oriented gradients
PO        Parieto-occipital cortex
PP        Posterior parietal cortex
RCFL      Recursive coarse-to-fine localization
RF        Random forests
RFID      Radio frequency identification
RMSE      Root mean squared error
ROI       Region of interest
RPG       Role-playing video games
RT        Reaction time
SAL       Sensitive artificial listener
SAN       Social affiliation networks
SC        Superior colliculus
SGT       Situation graph tree
SIFT      Scale invariant feature transform
SLAM      Simultaneous localization and mapping
STIP      Space time interest points
SVM       Support vector machines
SVR       Support vector regression
TF        Technical frame
TOF       Time of flight
V1        Primary visual cortex
VDR       Visual Dominance Ratio
WoZ       Wizard-of-Oz
WTA       Winner-Take-All network