Improving the Head Pose Variation Problem in Face Recognition for Mobile Robots
Figure 1. Typical pipeline for face recognition (FR) systems.
Figure 2. Embedding distance comparison between various poses of the same individual using ArcFace with its standard distance threshold of 0.4. The distance between (a,b) is 0.412, which is considered a negative match. In contrast, the distance between (b,c) is 0.347, which is considered a positive match.
Figure 3. Schema of the HPE system stacking three methods, where x̄ stands for the average of the three estimated poses.
Figure 4. Log-in view of the developed application. It is the first window appearing when the tool is launched.
Figure 5. Interface of the interactive application for collecting face images. The interface is composed of 5 parts: the collection state (①), the camera feed (②), the control buttons (③), a black pointer (④), and a colorbar (⑤).
Figure 6. Estimated pitch and yaw results from a single individual in the MAPIR Faces dataset.
Figure 7. Face similarity results of ArcFace distributed across the selected head pose space. The blue samples represent correct identifications, while the red ones represent false rejections. The hue of the color is proportional to the similarity scores. The images on the left are examples of: a selected pose (a), a correct identification (b), and a false rejection (c).
Figure 8. Face similarity results of ArcFace using a non-frontal face image. The gallery image depicts a head pose with an estimated pitch of 20° and a yaw of −37.2°. These samples appear in white at the top-left of the figure. The images on the left are examples of: a selected pose (a), a correct identification (b), and a false rejection (c).
Figure 9. Length of a combination against the number of combinations of said length in trillions.
Figure 10. Metrics for the top-1 accuracy configurations found. (a) Accuracy against the number of poses. (b) Average distance to the nearest true match against the number of poses. (c) Average distance to the nearest false match against the number of poses.
Figure 11. Top-5 accuracy configurations for all lengths of the gallery set. For example, the first row in the 1–10 column reports, as black squares, the 5 most optimal poses to be stored in the gallery set from a 7 × 7 grid of them.
Figure 12. Example of the top-1 combination with 3 poses. (a) shows the poses in black in a 7 × 7 grid. (b) shows sample images for each of the 3 poses for an individual.
1. Introduction
- It requires minimal interaction from the user as long as the robot has a sufficiently clear view of their face. This feature is essential in HRI to achieve more natural interactions that rely less on human cooperation. In comparison, other methods are more intrusive in the users’ daily lives: for example, voice recognition always requires the user to speak, and fingerprint recognition requires the user to physically touch a sensor.
- The only peripheral needed for FR is a camera, which is a common component in most mobile robots.
- Since the robot is mobile, it can reposition itself to find the right perspective of the face. This is not always possible due to obstacles and requires the robot to estimate the pose of the head.
- Since the robot is equipped with speakers to interact with humans, it can talk with the user in an effort to get a frontal face image. This is more intrusive in the users’ daily lives, and if these occurrences are too common they might annoy the users.
2. Related Works
2.1. Face Recognition
2.2. Head Pose Estimation
2.3. Related Datasets
- Biwi Kinect Head Pose Database [37]:
- This repository has been a major source of inspiration for this paper. It contains a large number of images per individual () in various poses, each annotated with its corresponding Euler angles. The major downside of this dataset is that the available poses vary greatly from one individual to another, and most individuals lack some of the more extreme poses.
- UcoHead dataset [38]:
- This dataset shares many commonalities with our desired dataset: it provides many images from a set of individuals and a uniform distribution of pitch and yaw poses. Despite these similarities, some of its characteristics make it unfit for analyzing FR algorithms, namely the low resolution of its images () and the fact that they are all grayscale.
- UMDFaces dataset [39]:
- One of the largest datasets available with fine-grained pose annotations. Nevertheless, they are gathered from public media (e.g., newspapers, internet, etc.), and therefore the poses available for each individual vary considerably from one another.
3. Face Collection Tool
3.1. Head Pose Estimation
- 3D Dense Face Alignment (3DDFA) [34]
- is a pose estimation method that attempts to fit a 3D morphable face model using a cascaded CNN and a set of face landmarks. A PyTorch implementation of the method is provided as a GitHub repository. This implementation iterates on the original paper and provides various models pretrained on the 300W and 300W-LP datasets [34,40].
- Hopenet [35]
- is a CNN-based HPE method that aims to compute the 3D pose components directly from an RGB image instead of estimating face landmarks as an intermediate step. Notable contributions of this paper are the use of multiple loss functions (one per angle component) and a study of the effects of low-resolution images and of the usage of landmarks in pose estimation. A public implementation is available on GitHub, which contains some models pre-trained on 300W-LP.
- FSA-Net [36]
- is one of the most recent HPE publications available. The authors introduce several ideas to the field of HPE, e.g., they borrow ideas from age estimation systems and use fine-grained spatial structures. At the time of this work, they reportedly surpass most state-of-the-art methods in HPE benchmarks such as Biwi [37] and AFLW-2000 [34]. A public implementation is available on GitHub, which also contains some models pre-trained on 300W-LP and Biwi.
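The caption of Figure 3 describes the three estimators as being stacked by averaging their outputs. A minimal sketch of that averaging step follows, assuming each method returns a (roll, pitch, yaw) triple in degrees. The function name `stack_poses` and the sample values are illustrative and are not taken from the paper.

```python
import numpy as np


def stack_poses(estimates):
    """Average several (roll, pitch, yaw) estimates given in degrees.

    A plain arithmetic mean is only adequate while the angles stay far
    from the +/-180 degree wrap-around, which is assumed here.
    """
    poses = np.asarray(estimates, dtype=float)
    if poses.ndim != 2 or poses.shape[1] != 3:
        raise ValueError("expected a sequence of (roll, pitch, yaw) triples")
    return poses.mean(axis=0)


# Hypothetical outputs of the three methods for a single frame.
pose_3ddfa = (1.0, 10.0, -30.0)
pose_hopenet = (2.0, 12.0, -33.0)
pose_fsanet = (3.0, 14.0, -36.0)

mean_pose = stack_poses([pose_3ddfa, pose_hopenet, pose_fsanet])
print(mean_pose)
```

Averaging is the simplest way to combine the estimators; it damps the error of any single method as long as their errors are not strongly correlated.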
3.2. Interactive Application
- Log-in.
- As the user starts the application, they are prompted to follow a number of initial configuration steps: entering an identifier, selecting a storage location, and choosing a camera, as seen in Figure 4. The application has been developed with the usage of an RGB-D camera in mind. At the time of writing, the application supports the Orbbec Astra RGB-D camera through the OpenNI library. Additionally, RGB camera support is provided via the OpenCV library. This lets a greater number of devices run the application, since RGB-D cameras have yet to see widespread adoption.
- Image collection.
- The user is presented with the main view of the application (Figure 5). This view contains three main components that let the user control the collection process and receive feedback about it: the collection state (Figure 5①), the camera feed (Figure 5②) and the control buttons (Figure 5③). The camera feed and the control buttons are straightforward: the former shows the video feed provided by the camera, while the latter are used to pause or finish the collection process. The collection state contains a 2D grid (Figure 5①) which evenly divides a pitch-yaw space: the yaw ranges between and the pitch ranges between . The limits of this 2D space were chosen to preserve the accuracy of the methods selected in Section 3.1, as both their accuracy and stability decay significantly outside these bounds. A black pointer (Figure 5④) shows the current estimation provided by the system in real time. The images stored in the dataset are chosen according to their estimated head pose. Currently, the application stores a single image for each cell. The distance from the current yaw-pitch pose to the center of the cell is used as a scoring function: the nearer a pose is to the center, the more useful it is considered to the system. The color of each cell therefore represents the value of this score at the current moment. The colorbar on the right (Figure 5⑤) shows the user the range of scores, where blue cells indicate that a pose close to the cell center is already stored within the system.
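The per-cell selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: the 7 × 7 grid size comes from the captions of Figures 11 and 12, while the class name `PoseGrid`, the Euclidean distance, and the ±35° limits used in the example are hypothetical placeholders, since the actual ranges are not given in this text.

```python
import math


class PoseGrid:
    """Keep, for every cell of a yaw-pitch grid, the image nearest its center."""

    def __init__(self, yaw_range, pitch_range, cells=7):
        self.yaw_min, self.yaw_max = yaw_range
        self.pitch_min, self.pitch_max = pitch_range
        self.cells = cells
        self.best = {}  # (row, col) -> (distance_to_center, image)

    def _locate(self, value, lo, hi):
        """Return (cell index, cell center) for value, or None if out of range."""
        if not lo <= value <= hi:
            return None
        width = (hi - lo) / self.cells
        idx = min(int((value - lo) / width), self.cells - 1)
        return idx, lo + (idx + 0.5) * width

    def offer(self, yaw, pitch, image):
        """Store image if it is the closest pose seen so far for its cell."""
        col = self._locate(yaw, self.yaw_min, self.yaw_max)
        row = self._locate(pitch, self.pitch_min, self.pitch_max)
        if col is None or row is None:
            return False  # pose outside the collected space
        key = (row[0], col[0])
        dist = math.hypot(yaw - col[1], pitch - row[1])
        if key not in self.best or dist < self.best[key][0]:
            self.best[key] = (dist, image)
            return True
        return False


# Hypothetical limits; the real bounds are whatever the application uses.
grid = PoseGrid(yaw_range=(-35.0, 35.0), pitch_range=(-35.0, 35.0))
grid.offer(3.0, -2.0, "frame_a")   # first image for the central cell
grid.offer(1.0, 1.0, "frame_b")    # closer to the center, replaces frame_a
```

The stored distance doubles as the score that would drive the cell color: a smaller value means the cell already holds a pose close to its center.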
4. MAPIR Faces Dataset
- roll: Roll component of the estimation in degrees.
- pitch: Pitch component of the estimation in degrees.
- yaw: Yaw component of the estimation in degrees.
- rgb_image: Name of the corresponding intensity image (from the current folder).
- depth_image: Name of the corresponding depth image (from current folder) if any is available.
- bbox: Contains the position of the face present in the image in the form of a bounding box (an array containing the [left, top, right, bottom] positions in the image in pixels). This estimated bounding box has been computed using MTCNN [43]—a common choice accompanying many state-of-the-art FR methods.
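As an illustration of how the `bbox` field could be consumed, the sketch below crops the face region from an image array. The on-disk format of the annotations is not specified in this text, so the record is shown as a plain dictionary with hypothetical values and file names, and `crop_face` is an illustrative helper rather than part of the dataset tooling.

```python
import numpy as np


def crop_face(image, bbox):
    """Crop a face given bbox = [left, top, right, bottom] in pixels."""
    left, top, right, bottom = (int(round(v)) for v in bbox)
    height, width = image.shape[:2]
    # Clamp to the image bounds, since a detector may return a box that
    # extends slightly outside the frame.
    left, right = max(left, 0), min(right, width)
    top, bottom = max(top, 0), min(bottom, height)
    return image[top:bottom, left:right]


# Hypothetical annotation record using the fields listed above.
record = {
    "roll": 1.5,
    "pitch": -10.0,
    "yaw": 20.0,
    "rgb_image": "0001_rgb.png",
    "depth_image": None,
    "bbox": [40, 30, 100, 110],
}

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for the loaded RGB image
face = crop_face(image, record["bbox"])
print(face.shape)
```

Note that rows are indexed by the vertical coordinates (top, bottom) and columns by the horizontal ones (left, right), which is the usual source of mistakes when slicing image arrays.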
5. Evaluation on Face Recognition Algorithms
- A gallery set of known faces chosen from the complete dataset. This subset is used as the representative images for each individual. Most commonly, the gallery set is composed of frontal images (center of the grid described in Section 3.2), although it can contain more images, a topic discussed in more depth in Section 6.
- A query set composed of the remaining images in the dataset, which is used to evaluate the FR system according to the different metrics. This process is carried out by matching the images of the query set to the most similar face images in the gallery set. This comparison commonly requires: (i) the face embeddings computed by a DNN, (ii) a comparison function, and (iii) a distance threshold used to accept or reject the match.
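The three ingredients of the comparison (embeddings, a comparison function, and a threshold) can be put together as in the sketch below. It assumes cosine distance as the comparison function, which is an illustrative choice since each FR method defines its own, and it uses toy 2D vectors in place of real embeddings. The threshold of 0.4 is the ArcFace value quoted in the caption of Figure 2; the names `cosine_distance` and `identify` are hypothetical.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify(query, gallery, threshold):
    """Match a query embedding against a gallery of (identity, embedding).

    Returns (identity, distance); identity is None when the nearest
    gallery embedding is farther than the threshold.
    """
    best_id, best_dist = None, float("inf")
    for identity, embedding in gallery:
        dist = cosine_distance(query, embedding)
        if dist < best_dist:
            best_id, best_dist = identity, dist
    if best_dist > threshold:
        return None, best_dist
    return best_id, best_dist


# Toy gallery with made-up identities and 2D "embeddings".
gallery = [("alice", [1.0, 0.0]), ("bob", [0.0, 1.0])]
print(identify([0.9, 0.1], gallery, threshold=0.4))    # accepted as "alice"
print(identify([-1.0, -1.0], gallery, threshold=0.4))  # rejected as unknown
```

With the distances reported in Figure 2, the same rule would reject the pair at 0.412 and accept the pair at 0.347.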
- FaceNet [11].
- One of the most influential FR papers in recent years. It introduced fundamental concepts such as direct optimization of the embeddings and the Triplet Loss function. This optimization technique attempts to learn embeddings whose distances are smaller for all pairs of images of the same person (positive pairs) than for pairs of different persons (negative pairs). This work uses a community implementation of FaceNet based on TensorFlow. It provides various pretrained models; the model trained on the VGGFace2 dataset [20] is used here, as it is the most accurate.
- ArcFace [12].
- One of the more recent FR systems, which shows significant performance improvements over FaceNet across most common benchmarks. It introduces the ArcFace Loss function which, following the steps of [13,44], optimizes the angular distance between classes using a modified softmax loss function. The official implementation uses MXNet and provides various models trained on a cleaned version of the MS-Celeb-1M dataset [21]. This work employs the provided model LResNet100E-IR.
- Probabilistic Face Embeddings (PFE) [45].
- A recent state-of-the-art FR approach that represents the usual face embeddings as Gaussian distributions. The method rests on the observation that part of the feature space is wasted on accounting for unreliable features such as noise and blur, an effect that probabilistic embeddings can mitigate. An official code implementation based on TensorFlow accompanies the paper. In particular, this work uses the model trained on the MS-Celeb-1M dataset.
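To make the Triplet Loss idea mentioned for FaceNet concrete, the sketch below computes the loss in its commonly stated form, max(‖a − p‖² − ‖a − n‖² + margin, 0), for an anchor, a positive and a negative embedding. It is a NumPy illustration only, not the training code of any of the implementations cited above, and the default margin is an illustrative value.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean triplet loss over one or more (anchor, positive, negative) triplets.

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (in squared Euclidean distance).
    """
    anchor, positive, negative = (
        np.asarray(x, dtype=float) for x in (anchor, positive, negative)
    )
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


# Well-separated triplet: the positive coincides with the anchor.
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Violating triplet: the "positive" is far and the "negative" coincides.
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(easy, hard)
```

Only triplets that violate the margin contribute a gradient, which is why the choice of triplets matters during training.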
6. Optimization of Face Recognition Algorithms
- Top-1 accuracy.
- It measures how many images are correctly identified by nearest-neighbor classification among the face embeddings. Additionally, the distance of the nearest pair found is thresholded according to the FR method used. This is a typical requirement in these pipelines in order to filter out unknown individuals, if any exist.
- Distance to nearest true pair.
- It measures how well the FR system maps the face images to the deep face embeddings. Each sample embedding is compared to the same-individual embeddings in the gallery set. The resulting metric is the distance to the nearest of these embeddings; the smaller it is, the better the FR system performs.
- Distance to the nearest false pair.
- It measures the separation between embeddings of different individuals in the dataset. The sample embeddings are compared to all different-individual embeddings in the gallery set. The smaller the distance between the sample embedding and the nearest different-individual embedding, the more likely false positives become, particularly when the distance is below the threshold.
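The three metrics above can be computed together in one pass over the query set, as in the following sketch. It assumes cosine distance between embeddings and uses toy 2D vectors; the function name `evaluate`, the identities, and the threshold value in the example are illustrative.

```python
import numpy as np


def evaluate(query, gallery, threshold):
    """Compute the three metrics for lists of (identity, embedding) pairs."""
    g_ids = np.array([identity for identity, _ in gallery])
    g_emb = np.array([emb for _, emb in gallery], dtype=float)
    g_emb /= np.linalg.norm(g_emb, axis=1, keepdims=True)

    correct, true_d, false_d = 0, [], []
    for identity, emb in query:
        q = np.asarray(emb, dtype=float)
        dists = 1.0 - g_emb @ (q / np.linalg.norm(q))  # cosine distances
        same = g_ids == identity
        nearest = int(np.argmin(dists))
        # Top-1: the nearest gallery entry has the right identity and
        # its distance passes the threshold.
        if g_ids[nearest] == identity and dists[nearest] <= threshold:
            correct += 1
        if same.any():
            true_d.append(dists[same].min())
        if (~same).any():
            false_d.append(dists[~same].min())

    return {
        "top1_accuracy": correct / len(query),
        "mean_true_distance": float(np.mean(true_d)),
        "mean_false_distance": float(np.mean(false_d)),
    }


gallery = [("alice", [1.0, 0.0]), ("bob", [0.0, 1.0])]
query = [("alice", [1.0, 0.0]), ("bob", [0.0, 1.0]), ("bob", [1.0, 0.0])]
metrics = evaluate(query, gallery, threshold=0.4)
print(metrics)
```

In this toy case the third query is a "bob" image that lands on the "alice" embedding, so it lowers the accuracy, raises the mean true distance, and lowers the mean false distance at the same time.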
- An initial population is built from the top-30 accuracy individuals of the last population found by exhaustive search, mostly the configurations of length . To bring each individual up to the required length, the remaining spots are filled with random poses.
- The mutation operator for each individual of length N can change each of the poses with a probability of . As each individual is considered a set, the poses used for the substitution are selected from the ones not already in the set.
- No crossover operator has been used.
- The most promising individuals from the population are chosen using binary tournament selection.
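The genetic search outlined in the list above can be sketched as follows. This is a simplified illustration under several assumptions: the mutation probability `p_mut`, the number of generations, and the generational replacement scheme are free choices here, the seeds and the toy fitness function are made up, and in the paper the fitness would be the accuracy of a gallery configuration. All function names are hypothetical.

```python
import random


def fill(seed, all_poses, length, rng):
    """Extend a seed configuration with random unused poses up to `length`."""
    remaining = sorted(set(all_poses) - set(seed))
    return frozenset(seed) | frozenset(rng.sample(remaining, length - len(seed)))


def mutate(individual, all_poses, p_mut, rng):
    """Replace each pose with probability p_mut by a pose not in the set."""
    result = set(individual)
    for pose in sorted(individual):
        if rng.random() < p_mut:
            candidates = sorted(set(all_poses) - result)
            if candidates:
                result.remove(pose)
                result.add(rng.choice(candidates))
    return frozenset(result)


def binary_tournament(population, fitness, rng):
    """Pick two distinct individuals at random and keep the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(seeds, all_poses, length, fitness, p_mut, generations, rng):
    """Selection plus mutation only; no crossover is applied."""
    population = [fill(seed, all_poses, length, rng) for seed in seeds]
    for _ in range(generations):
        population = [
            mutate(binary_tournament(population, fitness, rng), all_poses, p_mut, rng)
            for _ in population
        ]
    return max(population, key=fitness)


# Toy setup: poses are cells of a 7 x 7 grid, and the stand-in fitness
# simply rewards poses near the central cell.
all_poses = [(r, c) for r in range(7) for c in range(7)]
seeds = [frozenset({(3, 3), (3, 2)}), frozenset({(3, 3), (2, 3)}), frozenset({(3, 4), (3, 3)})]


def toy_fitness(individual):
    return -sum(abs(r - 3) + abs(c - 3) for r, c in individual)


best = evolve(seeds, all_poses, length=4, fitness=toy_fitness,
              p_mut=0.1, generations=5, rng=random.Random(0))
print(sorted(best))
```

Representing each individual as a `frozenset` enforces the set semantics described above: a pose cannot appear twice, and mutation always preserves the configuration length.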
7. Discussion
Future Directions
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
- Orlandini, A.; Kristoffersson, A.; Almquist, L.; Björkman, P.; Cesta, A.; Cortellessa, G.; Galindo, C.; Gonzalez-Jimenez, J.; Gustafsson, K.; Kiselev, A.; et al. ExCITE Project: A review of forty-two months of robotic telepresence technology evolution. Presence Teleoperators Virtual Environ. 2016, 25, 204–221.
- Borghese, N.A.; Bulgheroni, M.; Miralles, F.; Savanovic, A.; Ferrante, S.; Kounoudes, T.; Cid Gala, M.; Loutfi, A.; Cangelosi, A.; Gonzalez-Jimenez, J.; et al. Heterogeneous Non Obtrusive Platform to Monitor, Assist and Provide Recommendations to Elders at Home: The MoveCare Platform. In Ambient Assisted Living; Lecture Notes in Electrical Engineering; Casiddu, N., Porfirione, C., Monteriù, A., Cavallo, F., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 53–69.
- Sabhanayagam, T.; Venkatesan, V.P.; Senthamaraikannan, K. A Comprehensive Survey on Various Biometric Systems. Int. J. Appl. Eng. Res. 2018, 13, 2276–2297.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25; Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2012; pp. 1097–1105.
- Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1701–1708.
- Huang, G.B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments; Technical report; University of Massachusetts: Amherst, MA, USA, 2007.
- Klare, B.F.; Klein, B.; Taborsky, E.; Blanton, A.; Cheney, J.; Allen, K.; Grother, P.; Mah, A.; Burge, M.; Jain, A.K. Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1931–1939, ISSN: 1063-6919.
- Zheng, T.; Deng, W. Cross-Pose LFW: A Database for Studying Cross-Pose Face Recognition in Unconstrained Environments; Technical Report 18-01; Beijing University of Posts and Telecommunications: Beijing, China, 2018.
- Zou, X.; Kittler, J.; Messer, K. Illumination Invariant Face Recognition: A Survey. In Proceedings of the 2007 First IEEE International Conference on Biometrics: Theory, Applications, and Systems, Crystal City, VA, USA, 27–29 September 2007; pp. 1–8.
- Hong, S.; Im, W.; Ryu, J.; Yang, H.S. SSPP-DAN: Deep domain adaptation network for face recognition with single sample per person. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 825–829, ISSN: 2381-8549.
- Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A Unified Embedding for Face Recognition and Clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823.
- Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4690–4699.
- Wang, H.; Wang, Y.; Zhou, Z.; Ji, X.; Gong, D.; Zhou, J.; Li, Z.; Liu, W. CosFace: Large Margin Cosine Loss for Deep Face Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5265–5274.
- González-Jiménez, J.; Galindo, C.; Ruiz-Sarmiento, J.R. Technical improvements of the Giraff telepresence robot based on users’ evaluation. In Proceedings of the 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication, Paris, France, 9–13 September 2012; pp. 827–832.
- Prince, S.J. Computer Vision: Models, Learning, and Inference; Cambridge University Press: Cambridge, UK, 2012.
- Adjabi, I.; Ouahabi, A.; Benzaoui, A.; Taleb-Ahmed, A. Past, Present, and Future of Face Recognition: A Review. Electronics 2020, 9, 1188.
- Turk, M.; Pentland, A. Eigenfaces for Recognition. J. Cogn. Neurosci. 1991, 3, 71–86.
- Belhumeur, P.; Hespanha, J.; Kriegman, D. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 711–720.
- Bledsoe, W.W. The Model Method in Facial Recognition; Panoramic Research Inc.: Palo Alto, CA, USA, 1964.
- Cao, Q.; Shen, L.; Xie, W.; Parkhi, O.M.; Zisserman, A. VGGFace2: A Dataset for Recognising Faces across Pose and Age. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 67–74.
- Guo, Y.; Zhang, L.; Hu, Y.; He, X.; Gao, J. MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition. In Computer Vision—ECCV 2016; Lecture Notes in Computer Science; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 87–102.
- Kemelmacher-Shlizerman, I.; Seitz, S.M.; Miller, D.; Brossard, E. The MegaFace Benchmark: 1 Million Faces for Recognition at Scale. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4873–4882.
- Nech, A.; Kemelmacher-Shlizerman, I. Level Playing Field for Million Scale Face Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7044–7053.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9, ISSN: 1063-6919.
- Huang, R.; Zhang, S.; Li, T.; He, R. Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis. arXiv 2017, arXiv:1704.04086.
- Cao, K.; Rong, Y.; Li, C.; Tang, X.; Change Loy, C. Pose-Robust Face Recognition via Deep Residual Equivariant Mapping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5187–5196.
- Saragih, J.M.; Lucey, S.; Cohn, J.F. Deformable Model Fitting by Regularized Landmark Mean-Shift. Int. J. Comput. Vis. 2011, 91, 200–215.
- Zhu, X.; Ramanan, D. Face detection, pose estimation, and landmark localization in the wild. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2879–2886.
- Masi, I.; Chang, F.; Choi, J.; Harel, S.; Kim, J.; Kim, K.; Leksut, J.; Rawls, S.; Wu, Y.; Hassner, T.; et al. Learning Pose-Aware Models for Pose-Invariant Face Recognition in the Wild. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 379–393.
- Cootes, T.; Edwards, G.; Taylor, C. Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 681–685.
- Matthews, I.; Baker, S. Active Appearance Models Revisited. Int. J. Comput. Vis. 2004, 60, 135–164.
- Zhu, X.; Lei, Z.; Liu, X.; Shi, H.; Li, S.Z. Face Alignment Across Large Poses: A 3D Solution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 146–155.
- Ruiz, N.; Chong, E.; Rehg, J.M. Fine-Grained Head Pose Estimation Without Keypoints. arXiv 2018, arXiv:1710.00925.
- Yang, T.Y.; Chen, Y.T.; Lin, Y.Y.; Chuang, Y.Y. FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1087–1096.
- Fanelli, G.; Dantone, M.; Gall, J.; Fossati, A.; Van Gool, L. Random Forests for Real Time 3D Face Analysis. Int. J. Comput. Vis. 2013, 101, 437–458.
- Muñoz-Salinas, R.; Yeguas-Bolivar, E.; Saffiotti, A.; Medina-Carnicer, R. Multi-camera head pose estimation. Mach. Vis. Appl. 2012, 23, 479–490.
- Bansal, A.; Nanduri, A.; Castillo, C.; Ranjan, R.; Chellappa, R. UMDFaces: An Annotated Face Dataset for Training Deep Networks. arXiv 2017, arXiv:1611.01484.
- Sagonas, C.; Tzimiropoulos, G.; Zafeiriou, S.; Pantic, M. 300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 2–8 December 2013; pp. 397–403.
- Savran, A.; Alyüz, N.; Dibeklioğlu, H.; Çeliktutan, O.; Gökberk, B.; Sankur, B.; Akarun, L. Bosphorus Database for 3D Face Analysis. In Biometrics and Identity Management; Lecture Notes in Computer Science; Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 47–56.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Available online: (accessed on 19 January 2021).
- Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503.
- Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 212–220.
- Shi, Y.; Jain, A.K. Probabilistic Face Embeddings. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019.
- Goldberg, D.E. Genetic Algorithms in Search, Optimization and Machine Learning, 1st ed.; Addison-Wesley Longman Publishing Co., Inc.: Reading, MA, USA, 1989.
Method | Roll | Pitch | Yaw | Average
3DDFA | | | |
Hopenet | | | |
FSA-Net | | | |
Stacking | | | |

Method | CPU | GPU
3DDFA | |
Hopenet | |
FSA-Net | |
Stacking | |

Method | | | |
FaceNet [11] | | | |
ArcFace [12] | | | |
PFE [45] | | | |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Baltanas, S.-F.; Ruiz-Sarmiento, J.-R.; Gonzalez-Jimenez, J. Improving the Head Pose Variation Problem in Face Recognition for Mobile Robots. Sensors 2021, 21, 659.