Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment

Jie Zhang^19,20,
Shiguang Shan¹⁹,
Meina Kan¹⁹ &
…
Xilin Chen¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 8690))

Included in the following conference series:

European Conference on Computer Vision

19k Accesses
3 Altmetric

Abstract

Accurate face alignment is a vital prerequisite step for most face perception tasks such as face recognition, facial expression analysis and non-realistic face re-rendering. It can be formulated as the nonlinear inference of the facial landmarks from the detected face region. Deep network seems a good choice to model the nonlinearity, but it is nontrivial to apply it directly. In this paper, instead of a straightforward application of deep network, we propose a Coarse-to-Fine Auto-encoder Networks (CFAN) approach, which cascades a few successive Stacked Auto-encoder Networks (SANs). Specifically, the first SAN predicts the landmarks quickly but accurately enough as a preliminary, by taking as input a low-resolution version of the detected face holistically. The following SANs then progressively refine the landmark by taking as input the local features extracted around the current landmarks (output of the previous SAN) with higher and higher resolution. Extensive experiments conducted on three challenging datasets demonstrate that our CFAN outperforms the state-of-the-art methods and performs in real-time(40+fps excluding face detection on a desktop).

Download to read the full chapter text

Chapter PDF

DC-EDN: densely connected encoder-decoder network with reinforced depthwise convolution for face alignment

Article 06 January 2021

CED-Net: contextual encoder–decoder network for 3D face reconstruction

Article 05 May 2022

CLN: Complementary Learning Network for 3D Face Reconstruction and Alignment

Keywords

References

300 faces in-the-wild challenge, http://ibug.doc.ic.ac.uk/resources/300-W/
Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Robust discriminative response map fitting with constrained local models. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3444–3451 (2013)
Google Scholar
Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J., Kumar, N.: Localizing parts of faces using a consensus of exemplars. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 545–552 (2011)
Google Scholar
Bengio, Y.: Learning deep architectures for AI. Foundations and Trends® in Machine Learning 2(1), 1–127 (2009)
Article MATH MathSciNet Google Scholar
Burgos-Artizzu, X.P., Perona, P., Dollár, P.: Robust face landmark estimation under occlusion. In: IEEE International Conference on Computer Vision, ICCV (2013)
Google Scholar
Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 2887–2894 (2012)
Google Scholar
Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 23(6), 681–685 (2001)
Article Google Scholar
Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models-their training and application. Computer Vision and Image Understanding (CVIU) 61(1), 38–59 (1995)
Article Google Scholar
Cristinacce, D., Cootes, T.F.: Feature detection and tracking with constrained local models. In: British Machine Vision Conference (BMVC), vol. 17, pp. 929–938 (2006)
Google Scholar
Cristinacce, D., Cootes, T.F.: Boosted regression active shape models. In: British Machine Vision Conference (BMVC), pp. 1–10 (2007)
Google Scholar
Dantone, M., Gall, J., Fanelli, G., Van Gool, L.: Real-time facial feature detection using conditional regression forests. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2578–2585 (2012)
Google Scholar
Dollár, P., Welinder, P., Perona, P.: Cascaded pose regression. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1078–1085 (2010)
Google Scholar
Grangier, D., Bottou, L., Collobert, R.: Deep convolutional networks for scene parsing. In: International Conference on Machine Learning Workshops, vol. 3 (2009)
Google Scholar
Gross, R., Matthews, I., Baker, S.: Generic vs. person specific active appearance models. Image and Vision Computing (IVC) 23(12), 1080–1093 (2005)
Article Google Scholar
Gu, L., Kanade, T.: A generative shape regularization model for robust face alignment. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 413–426. Springer, Heidelberg (2008)
Chapter Google Scholar
Jesorsky, O., Kirchberg, K.J., Frischholz, R.W.: Robust face detection using the hausdorff distance. In: International Conference on Audio-and Video-based Biometric Person Authentication (AVBPA), pp. 90–95 (2001)
Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 1106–1114 (2012)
Google Scholar
Le, V., Brandt, J., Lin, Z., Bourdev, L., Huang, T.S.: Interactive facial feature localization. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 679–692. Springer, Heidelberg (2012)
Chapter Google Scholar
Liu, X.: Discriminative face alignment. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 31(11), 1941–1954 (2009)
Article Google Scholar
Luo, P., Wang, X., Tang, X.: Hierarchical face parsing via deep learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2480–2487 (2012)
Google Scholar
Matthews, I., Baker, S.: Active appearance models revisited. International Journal of Computer Vision (IJCV) 60(2), 135–164 (2004)
Article Google Scholar
Messer, K., Matas, J., Kittler, J., Luettin, J., Maitre, G.: Xm2vtsdb: The extended m2vts database. In: International Conference on Audio and Video-based Biometric Person Authentication (AVBPA), vol. 964, pp. 965–966 (1999)
Google Scholar
Milborrow, S., Nicolls, F.: Locating facial features with an extended active shape model. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part IV. LNCS, vol. 5305, pp. 504–513. Springer, Heidelberg (2008)
Chapter Google Scholar
Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: A semi-automatic methodology for facial landmark annotation. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 896–903 (2013)
Google Scholar
Saragih, J.M., Lucey, S., Cohn, J.F.: Face alignment through subspace constrained mean-shifts. In: IEEE International Conference on Computer Vision (ICCV), pp. 1034–1041 (2009)
Google Scholar
Sun, Y., Wang, X., Tang, X.: Deep convolutional network cascade for facial point detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3476–3483 (2013)
Google Scholar
Toshev, A., Szegedy, C.: Deeppose: Human pose estimation via deep neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2014)
Google Scholar
Valstar, M., Martinez, B., Binefa, X., Pantic, M.: Facial point detection using boosted regression and graph models. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2729–2736 (2010)
Google Scholar
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, p. I–511 (2001)
Google Scholar
Wu, Y., Wang, Z., Ji, Q.: Facial feature tracking under varying facial expressions and face poses based on restricted boltzmann machines. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3452–3459 (2013)
Google Scholar
Xiong, X., De la Torre, F.: Supervised descent method and its applications to face alignment. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2013)
Google Scholar
Yu, X., Huang, J., Zhang, S., Yan, W., Metaxas, D.N.: Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model. In: IEEE International Conference on Computer Vision, ICCV (2013)
Google Scholar
Zhao, X., Kim, T.K., Luo, W.: Unified face analysis by iterative multi-output random forests. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2014)
Google Scholar
Zhao, X., Shan, S., Chai, X., Chen, X.: Locality-constrained active appearance model. In: Asian Conference on Computer Vision (ACCV), pp. 636–647 (2013)
Google Scholar
Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2879–2886 (2012)
Google Scholar

Download references

Author information

Authors and Affiliations

Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
Jie Zhang, Shiguang Shan, Meina Kan & Xilin Chen
University of Chinese Academy of Sciences, Beijing, 100049, China
Jie Zhang

Authors

Jie Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Shiguang Shan
View author publications
You can also search for this author in PubMed Google Scholar
Meina Kan
View author publications
You can also search for this author in PubMed Google Scholar
Xilin Chen
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
KU Leuven, ESAT - PSI, iMinds, Kasteelpark Arenberg, 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zhang, J., Shan, S., Kan, M., Chen, X. (2014). Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8690. Springer, Cham. https://doi.org/10.1007/978-3-319-10605-2_1

Download citation

DOI: https://doi.org/10.1007/978-3-319-10605-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10604-5
Online ISBN: 978-3-319-10605-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics