Survey on approaches and challenges in multimodal biometric authentication systems

Ho Chiung Ching (1), Ng Hu (2), C. Eswaran (3)

(1) Centre of Multimedia Computing, Faculty of Information Technology, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor Darul Ehsan, Malaysia. Tel: +60-3-83125412, Fax: +60-3-83125264, E-mail: ccho@mmu.edu.my
(2) Centre of Virtual Reality and Computer Graphic, Faculty of Information Technology, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor Darul Ehsan, Malaysia. Tel: +60-3-83125411, Fax: +60-3-83125264, E-mail: nghu@mmu.edu.my
(3) Centre of Multimedia Computing, Faculty of Information Technology, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor Darul Ehsan, Malaysia. Tel: +60-3-83125831, Fax: +60-3-83125264, E-mail: eswaran@mmu.edu.my

Abstract
This survey paper highlights current trends and approaches in multimodal biometric authentication system (MMBAS) research. Biometrics is the authentication of people using distinctive features of the body or behavioral actions. Multimodality can be defined as the combination of different physiological or behavioral features for authentication purposes, and authentication is the process whereby a user proves a claimed identity. This paper reviews existing MMBAS and the multimodal biometric datasets commonly used for testing and benchmarking results. At the same time, a brief overview of fusion strategies and techniques is given.

Keywords: Multimodal biometric authentication, Authentication, Security, Artificial Intelligence

Introduction
Security has been stepped up worldwide due to recent unrest. Terrorist activity increases the need for a full range of visual surveillance and monitoring applications, and it has also driven the installation of biometric verification systems for securing building entry.
Human identification, or biometrics, refers to the automatic recognition of individuals based on their physical and/or behavioral characteristics. Biometric traits suitable for this purpose need to fulfill four requirements: universality, distinctiveness, permanence and collectability. Recognizing a person by face, gait and spoken voice is something that has happened naturally since the beginning of time. In the mid-19th century, Alphonse Bertillon developed one of the first systematic biometric identification systems, based on anthropometric measurements, for criminal identification. Since then, many biometric systems have been developed for criminal identification and for authentication applications such as building access and computer system access. Biometrics can be categorized into two groups: static and dynamic. Static biometrics include modalities such as face, retina scanning, iris scanning, DNA, face thermograph, hand geometry, ear shape, fingerprint and palm print. Dynamic biometrics include gait, spoken voice, keystroke dynamics and signature analysis (covering not only the signature pattern but also pressure, speed and other writing characteristics). However, the performance of an authentication system using only a single biometric trait (a unimodal biometric system) is not always consistent or reliable. For example, the spoken voice may change with the legitimate user's emotion or health, while captured face images may be degraded by the surrounding environment (poor lighting or dust on the camera lens). A study found that 5% of the human population is not suitable for fingerprint-based biometrics because cuts, erosion or genetic factors leave the fingerprint too fine or faint to be scanned [1]. At the same time, recent research has shown that stealing a single biometric trait, or mounting a spoofing attack on a unimodal biometric system, is no longer impossible [2][3].
To overcome these issues, Multimodal Biometric Authentication Systems (MMBAS) have been suggested. A MMBAS uses multiple capture systems to acquire different types of biometric data and combines these data for authentication. If one of the capture sensors breaks down, the remaining sensors in the MMBAS can still perform biometric data acquisition. Such a MMBAS can be used to secure access to a highly sensitive room by authenticating a person through the face, gait and voice modalities. As the person walks towards the secure room, cameras capture the walking pattern (gait). Upon reaching the door, the person says a particular phrase into a microphone placed near the door while another camera captures the face. The voice data from the microphone and the facial and gait data from the cameras are then used in the decision-making process that grants or denies permission to open the door and enter the room.

I. Multimodal Biometric Systems and Multimodal Biometric Datasets
Part 1 of this section highlights examples of MMBAS that have been attempted before, grouped by the combination of biometric modalities used. Part 2 reviews multimodal biometric datasets that have been used for benchmarking and testing multimodal biometric algorithms.

1. Multimodal biometric systems

Acoustic and visual features
As early as 1993, Chibelushi et al. [4] proposed incorporating audio and visual speech (the motion of the visible articulators) for speaker recognition, using a simple linear combination scheme. Brunelli and Falavigna [5] proposed a person identification system based on acoustic and visual features, with a HyperBF network as the best performing fusion module. Choudhury et al. [6] proposed a multimodal person recognition system using unconstrained audio and video.
The combination of the two experts is performed using a Bayesian net.

Face and voice
Duc et al. [7] proposed a person identification system based on face and text-dependent speech, using a simple averaging technique. Ben-Yacoub [8] attempted a multimodal data fusion approach to person authentication, based on Support Vector Machines (SVM), to combine the results obtained from face identification and text-dependent speaker verification.

Face, lip and voice
Dieckmann et al. [9] implemented a decision-level fusion scheme based on 2-out-of-3 majority voting, incorporating the face and voice modalities, which authenticates a legitimate user by comparing the static face, lip motion and spoken voice with pre-stored data.

Lip motion and voice
Jourlin et al. [10] used a lip tracker based on visual features together with text-dependent speech recognition. The fused score is computed as the weighted sum of the scores generated by the two experts.

Frontal face, face profile, and voice
Kittler et al. [11] developed a MMBAS incorporating frontal face, face profile and voice; the best combination results were obtained with a simple sum rule. Pigeon [12][13] developed a MMBAS based on simple fusion algorithms incorporating frontal face, face profile and voice. Verlinde et al. [14] tested a biometric system with various fusion modules combining the same three traits.

Face and fingerprint
L. Hong et al. [15] implemented a MMBAS incorporating two different biometrics, face and fingerprint. Their fusion algorithm operates at the expert (soft) decision level, combining the matching scores from the different experts under a statistical independence hypothesis. Arun Ross et al. [16] developed a MMBAS incorporating face and fingerprint, using fusion based on the simple sum rule and a decision tree.

2. Multimodal Biometric Datasets

BIOMET
The BIOMET [17] dataset was developed by researchers from the Institut National des Télécommunications, Dept. EPH, France; the Royal Military Academy, Belgium; the DIVA Group, Informatics Dept., University of Fribourg, Switzerland; and the COMMLEC Dept., Paris, France. A total of 327 individuals participated in the data acquisition process. Five modalities are present in the BIOMET database: audio, face images, hand images, fingerprints and signatures. Three camera systems were involved in capturing the audio and face data: a conventional digital camera for audio-video sequences of a talking person; an infrared camera to reduce the influence of ambient light; and a structured-light system, consisting of a camera and a projector, for 3D model acquisition. Hand images were scanned using a conventional off-the-shelf scanner (HP Scanjet 5300), while signatures were captured using an A6-sized graphics tablet, the INTUOS 2 from WACOM [18]. For fingerprint images, two sensors were used, a SAGEM Morphotouch 3.0 and a GEMPLUS PC Touch 430, to detect the differences between the ridges and the valleys of the fingerprints.

MCYT
The MCYT [19] database consists of fingerprint features and online signatures from the same individuals. It is aimed at the evaluation of biometric recognition algorithms and fusion algorithms for bimodal biometrics. A total of 330 individuals were acquired at four institutions: Universidad Politecnica de Madrid, University of Valladolid, University of the Basque Country and Escola Universitaria Politecnica de Mataro, Barcelona. In the MCYT_Fingerprint subcorpus, 12 samples of each finger were scanned with two sensors (optical and capacitive) for each participating individual, producing 79,200 fingerprints in total. In the MCYT_Signature subcorpus, 25 legitimate signatures and 25 forged signatures were taken from each participant.
This produced 16,500 signatures for the database.

BANCA
BANCA [20] is a bimodal biometric database with face and speech modalities. It was developed as part of the BANCA project, which aims to build a secure and reliable access control scheme for applications over the Internet. A total of 208 people were recorded, half men and half women, in three different scenarios (controlled, degraded and adverse) over a period of three months. The acquisition system for BANCA used two cameras, a cheap web camera and a high-quality camera, both left in automatic mode during acquisition, together with a poor-quality microphone and a high-quality microphone for recording speech. The database was recorded on a PAL DV system, with the video compressed at a 5:1 ratio and the audio left uncompressed. Each subject recorded 12 sessions, each containing one true client access and one impostor attack, spread across the controlled, degraded (low-quality cameras and microphones) and adverse conditions. An open-set verification scheme was used to evaluate the database; in an open-set scheme, new users can be added without changing the authentication system.

XM2VTS
Multi Modal Verification for Teleservices and Security applications, or XM2VTS [21], is a bimodal face and speech database created by the Centre for Vision, Speech and Signal Processing at the University of Surrey. It was created to help researchers test their algorithms on a large-scale, high-quality dataset. A total of 295 people were recorded, each person twice: once for speech and once for a rotating head movement. The entire database was recorded using a Sony VX1000E digital video camera and a DHR1000UX digital VCR, capturing video at a colour sampling resolution of 4:2:0 and audio at a frequency of 32 kHz with 16-bit sampling.
The camera was chosen because it could be connected to a computer via its FireWire port, and software was written to allow the video camera to be controlled directly from the PC. Lighting was provided, and the background was uniformly colored to ease segmentation. A closed-set verification scheme was used to evaluate the database. In a closed-set scheme the population of clients is fixed; the representation adopted for each client and the verification scheme are based on the same training data, and anyone not in the training data is considered an impostor.

II. Performance evaluation of Multimodal Biometric Authentication Systems
Arun Ross et al. [16] have shown that, using the simple sum rule, a MMBAS combining face, fingerprint and hand geometry can produce a Genuine Accept Rate (GAR) of 99.2% at a False Accept Rate (FAR) of 0.1%. Reducing the modalities to face and fingerprint only achieved a GAR of 94%, while in another experiment a MMBAS using face and hand geometry produced a GAR of 87%. The study also shows that such a tri-modal biometric authentication system performs far better than a unimodal system using only fingerprint (GAR of 83%), only face (GAR of 68%) or only hand geometry (GAR of 46%). These results show that three biometric modalities perform better than two, and suggest that the fingerprint and face modalities are somewhat more accurate than hand geometry. Other research results, from Changhan Park et al. [22], show that a face and speaker bimodal authentication system with a FAR of 0.0001% achieved a 99.99% verification rate, better than that of a unimodal biometric authentication system using face only (98.5%) or speech only (97.37%).

III. Performance evaluation of Multimodal Biometric Authentication Systems with Fusion
Verlinde et al.
[14] achieved a Total Success Rate of 99.9% after testing a MMBAS with various fusion modules combining three biometric traits: the user's profile image from color-based segmentation, the user's frontal face image and the user's text-independent voice. Research results from T. Ko [23] show that, using multi-instance fusion, a face-and-fingerprint MMBAS at a FAR of 0.1% can achieve a True Accept Rate of 95% (left index finger), 95.5% (right index finger) or 98.8% (max/sum or average), while the True Accept Rates for the unimodal biometrics are only 68% (face) and 83% (fingerprint). Kar-Ann Toh et al. [24] show that a fingerprint-and-voice MMBAS at a FAR of 0.1% achieved an Authentic Acceptance Rate of 86% (Parzen), 90% (Optimal Weighting Method), 91% (Neural Network), 92% (PROD), 93% (TanhNet) and 94% (SUM). In that work, the True Accept Rate for a single biometric modality ranged from 60% (voice) to 86% (fingerprint).

IV. Fusion Levels, Fusion Strategy and Fusion Techniques
Part 1 of this section discusses the various levels at which fusion can take place. Part 2 describes two fusion strategies that can be adopted for multimodal biometric research. Part 3 gives an introduction to a few fusion techniques.

1. Fusion Levels
Fusion is the stage where the individual biometric modalities are combined. Fusion can be done at different levels [16]: at the feature extraction level, where features extracted using two or more sensors are linked together, or at the matching score level, where the matching scores obtained from multiple matchers are combined. Because the features of one biometric are independent of the features extracted from another, the feature vectors can be concatenated into one at the cost of higher dimensionality; the new feature vector will hopefully be more discriminant. At the matching score level, the proximity of a feature vector to the template vector is expressed as a score.
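Matching-score fusion of this kind can be sketched in a few lines of Python. The following is a minimal illustration only: the modality names, raw score ranges, weights and acceptance threshold are hypothetical, not taken from any of the surveyed systems.

```python
# Minimal sketch of matching-score fusion via a weighted sum rule.
# Modality names, score ranges, weights and threshold are hypothetical.

def min_max_normalize(score: float, lo: float, hi: float) -> float:
    """Map a raw matcher score into [0, 1] given that matcher's score range."""
    return (score - lo) / (hi - lo)

def fuse_scores(scores: dict, weights: dict) -> float:
    """Weighted sum rule over normalized per-modality matching scores."""
    return sum(weights[m] * s for m, s in scores.items())

def verify(scores: dict, weights: dict, threshold: float = 0.5) -> bool:
    """Accept the identity claim if the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold

# Hypothetical raw matcher outputs: (score, matcher_min, matcher_max).
raw = {"face": (72.0, 0.0, 100.0), "fingerprint": (0.81, 0.0, 1.0)}
normalized = {m: min_max_normalize(s, lo, hi) for m, (s, lo, hi) in raw.items()}
weights = {"face": 0.4, "fingerprint": 0.6}  # weights sum to 1.0

decision = verify(normalized, weights)  # fused score 0.4*0.72 + 0.6*0.81 = 0.774
```

In practice the weights would be tuned on training data, and the threshold chosen to trade off false accepts against false rejects.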
Fusion can also be done at the decision level, where the accept/reject decisions of the multiple systems are consolidated. These fusion methods will be evaluated for their effectiveness and improved to help arrive at a novel method of fusion.

2. Fusion Strategy
Fusion strategy can be divided into two approaches. First, biometric multimodality can be treated as a classifier combination problem [25]; work has been done using a Bayesian framework to classify modalities as clients and impostors [11]. The second approach is to treat fusion as a pattern classification problem, where the scores given by the individual expert modalities are assigned accept/reject labels [26].

3. Fusion Techniques

Sum rule
In this technique, the weighted average of the scores from the multiple modalities is taken, over all possible combinations of modalities. Ross's work [16] showed that this technique gives better results than using single modalities.

Decision trees
A decision tree applies if-then-else rules learned from the training data to give a class label to input data; at each node, the feature that maximizes information gain is selected. This technique has again been demonstrated to work better with multiple modalities than with single modalities.

Linear Discriminant Analysis
This technique transforms the 3-dimensional score vectors into a new space that maximizes between-class separation. Distance rules, such as the Mahalanobis distance, can then be used to differentiate genuine vectors from impostor vectors.

Voting
A global decision is made by fusing the hard decisions of two modality experts using AND/OR rules. Fusion with this technique is severe, limiting its usage to early testing.

V. Challenges for Multimodal Biometric Authentication System Research

1. Are appropriate multimodal datasets available?
Many multimodal datasets are available, either free to download, such as the HumanID Gait Challenge Problem [27], or for purchase at a nominal cost, such as BANCA [20] and XM2VTS [21]. As these datasets are not by themselves research results or evaluation reports, there is no standard protocol to follow during their production. A well-designed, good-quality dataset can greatly facilitate research and be used to benchmark the performance of different algorithms. A poorly designed dataset, however, can be a disaster, wasting the work and time of everyone who tests and evaluates on it. Recently, many of the well-known datasets, such as BANCA [20], XM2VTS [21] and the HumanID Gait Challenge Problem [27], have been recognized by major journals and treated as published papers. However, datasets like BANCA [20] and XM2VTS [21] are not fully utilized for multimodal biometric performance evaluation; instead they are used only for face authentication [28][29].

2. Finding the suitable fusion level and fusion algorithm
Fusion can take place at several levels, consolidating the data at (i) the trait capture level, (ii) the feature level, (iii) the match score level, (iv) the rank level or (v) the decision level, and there are many fusion algorithms to choose from. Which fusion algorithm produces the best verification rate, and at what point should fusion be introduced during the development of the MMBAS?

3. Design of MMBAS datasets
The question of the smallest sample set that captures a given level of variation has to be considered. Too little data for testing and training may not convince others when evaluating MMBAS performance, yet the time and cost spent on building a large dataset need to be justified. Handling criteria such as pose, lighting, facial expression and the subject's outfit during biometric trait capture is another challenge.
4. Cost versus benefits
Is the benefit of implementing a multimodal biometric system proportional to the investment? The cost of building the datasets (for training, testing and live operation) for multimodal biometrics is higher than for single-modality biometrics. How is the return on investment measured: in terms of better authentication performance?

VI. Conclusion
This paper was written in an attempt to understand MMBAS and their related problems, resources and challenges. Multimodal biometric datasets are available, as are various strategies and techniques that can be built upon to solve multimodal biometric problems. It is hoped that this paper will be useful to multimodal biometric researchers who are looking for resources as well as for results against which to compare their own work.

References
[1] National Institute of Standards and Technology (NIST), 13 Nov 2003. Summary of NIST Standards for Biometric Accuracy, Tamper Resistance and Interoperability. URL http://www.itl.nist.gov/iad/894.03/NISTAPP_Nov02.pdf
[2] J. Leyden, 16 May 2002. Gummi bears defeat fingerprint sensors. URL http://www.theregister.com/2002/16/gummi_bears_defeat_fingerprint_sensors
[3] D. Kingsley, 20 June 2002. Fingerprint security easy to fool. URL http://www.abc.net.au/science/news/stories/s585792.htm
[4] C.C. Chibelushi, J.S. Mason, and F. Deravi, 1993. Integration of acoustic and visual speech for speaker recognition. In EUROSPEECH'93, 157–160.
[5] R. Brunelli et al., 1995. Person identification using multiple cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(10):955–966.
[6] T. Choudhury et al., 1999. Multimodal Person Recognition using Unconstrained Audio and Video. In Second International Conference on Audio- and Video-based Biometric Person Authentication, 176–181, Washington D.C., USA.
[7] B. Duc et al., 1997. Person authentication by fusing face and speech information.
In Proceedings of the First International Conference on Audio- and Video-based Biometric Person Authentication, Lecture Notes in Computer Science. Springer Verlag.
[8] S. Ben-Yacoub, 1998. Multi-Modal Data Fusion for Person Authentication using SVM. IDIAP-RR 7, IDIAP.
[9] U. Dieckmann et al., 1997. SESAM: A biometric person identification system using sensor fusion. Pattern Recognition Letters, 18(9):827–833.
[10] P. Jourlin et al., 1997. Acoustic-labial speaker verification. In Proceedings of the First International Conference on Audio- and Video-based Biometric Person Authentication, Lecture Notes in Computer Science. Springer Verlag.
[11] J. Kittler et al., 1998. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226–239.
[12] S. Pigeon et al., 1997. Profile authentication using a Chamfer matching algorithm. In Proceedings of the First International Conference on Audio- and Video-based Biometric Person Authentication, Lecture Notes in Computer Science, 185–192. Springer Verlag.
[13] S. Pigeon, 1999. Authentification multimodale d'identité. PhD thesis, Université Catholique de Louvain.
[14] P. Verlinde et al., July 2000. Multi-Modal Identity Verification Using Expert Fusion. Information Fusion, 1(1):17–33.
[15] L. Hong et al., 1998. Integrating Faces and Fingerprints for Personal Identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1295–1307.
[16] Arun Ross and Anil Jain, 2003. Information Fusion in Biometrics. Pattern Recognition Letters, 24:2115–2125. Elsevier Science B.V.
[17] Sonia Garcia-Salicetti et al., 2003. BIOMET: A Multimodal Person Authentication Database Including Face, Voice, Fingerprint, Hand and Signature Modalities. In 4th International Conference, AVBPA 2003, Guildford, UK, Proceedings, 845–853.
[18] Wacom Technology Co. URL http://www.wacom.com
[19] J. Ortega-Garcia et al., 2003.
MCYT baseline corpus: a bimodal biometric database. IEE Proceedings – Vision, Image and Signal Processing, 150(6).
[20] Enrique Bailly-Bailliére et al., 2003. The BANCA Database and Evaluation Protocol. In Proceedings of the 4th International Conference on Audio- and Video-based Biometric Person Authentication, 625–638.
[21] K. Messer et al., 1999. XM2VTSDB: The Extended M2VTS Database. In Proceedings of the 2nd International Conference on Audio- and Video-based Biometric Person Authentication, 72–77.
[22] Changhan Park et al., Jan 2006. Multi-Modal Human Verification Using Face and Speech. In Proceedings of the Fourth IEEE International Conference on Computer Vision Systems (ICVS 2006), 54.
[23] Teddy Ko, 19–21 Oct 2005. Multimodal Biometric Identification for Large User Population Using Fingerprint, Face and Iris Recognition. In Proceedings of the 34th Applied Imagery and Pattern Recognition Workshop (AIPR05).
[24] Kar-Ann Toh et al., April 2004. Combination of Hyperbolic Functions for Multimodal Biometrics Data Fusion. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, 34(2):1196–1209.
[25] L. Xu et al., 1992. Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition. IEEE Transactions on Systems, Man, and Cybernetics, 22(3):418–435.
[26] S. Theodoridis et al., 1999. Pattern Recognition. Academic Press.
[27] S. Sarkar et al., 2005. The HumanID Gait Challenge Problem: Data Sets, Performance, and Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(2):162–171.
[28] K. Messer et al., 2003. Face Verification Competition on the XM2VTS Database. In 4th International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA), Guildford, UK.
[29] K. Messer et al., 2004. Face Authentication Competition on the BANCA Database. In International Conference on Biometric Authentication (ICBA 2004).