Efficient and robust deep learning with Correntropy-induced loss function

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Deep learning systems use hierarchical models to learn high-level features from low-level ones, and progress in the field has been rapid in recent years. The robustness of learning systems with deep architectures, however, has rarely been studied and needs further investigation. In particular, the mean square error (MSE), a commonly used optimization cost function in deep learning, is rather sensitive to outliers (or impulsive noise). Robust methods are needed to improve learning performance and to suppress the harmful influence of outliers, which are pervasive in real-world data. In this paper, we propose an efficient and robust deep learning model based on stacked auto-encoders and the Correntropy-induced loss function (CLF), called CLF-based stacked auto-encoders (CSAE). CLF, a nonlinear measure of similarity, is robust to outliers and can approximate different norms (from \(l_0\) to \(l_2\)) of the data; essentially, it is the MSE computed in a reproducing kernel Hilbert space. Unlike conventional stacked auto-encoders, which generally use the MSE as the reconstruction loss and the KL divergence as the sparsity penalty term, CSAE builds both the reconstruction loss and the sparsity penalty on CLF. The fine-tuning procedure in CSAE is also based on CLF, which further enhances learning performance. The excellent and robust performance of the proposed model is confirmed by simulation experiments on the MNIST benchmark dataset.
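As a rough illustration of why CLF is less sensitive to outliers than the MSE (a sketch, not taken from the paper's own equations), correntropy between an output and its target can be estimated from the residuals \(e_i\) as \(\hat{V}_\sigma = \frac{1}{N}\sum_{i=1}^{N}\exp\left(-\frac{e_i^2}{2\sigma^2}\right)\), and the induced loss is commonly written as \(L_{\mathrm{CLF}}(e) = \beta\left[1-\exp\left(-\frac{e^2}{2\sigma^2}\right)\right]\) with \(\beta = \left[1-\exp\left(-\frac{1}{2\sigma^2}\right)\right]^{-1}\). The kernel width \(\sigma\) and the normalization follow the standard C-loss form and are assumptions here, not values reported in the paper. The short Python sketch below contrasts this loss with the squared error on residuals that contain one outlier:

import numpy as np

def clf_loss(error, sigma=1.0):
    # Correntropy-induced loss with a Gaussian kernel of width sigma.
    # A minimal sketch following the common C-loss normalization;
    # sigma = 1.0 is an arbitrary illustrative choice, not the paper's setting.
    beta = 1.0 / (1.0 - np.exp(-1.0 / (2.0 * sigma ** 2)))
    return beta * (1.0 - np.exp(-error ** 2 / (2.0 * sigma ** 2)))

# Residuals from a hypothetical reconstruction, with 8.0 acting as an outlier.
residuals = np.array([0.1, -0.2, 0.05, 8.0])
print("per-sample squared error:", residuals ** 2)        # outlier contributes 64.0
print("per-sample CLF          :", clf_loss(residuals))   # outlier contributes ~2.54 (the loss saturates)

Because the exponential saturates, a large residual contributes at most \(\beta\) to the CLF, whereas it contributes quadratically to the MSE; replacing the squared reconstruction error and the sparsity penalty of a stacked auto-encoder with terms of this form is, at a high level, the mechanism behind the robustness described above. For small \(\sigma\) the loss behaves like an \(l_0\)-type penalty, and for large \(\sigma\) it approaches the MSE, which matches the \(l_0\)-to-\(l_2\) interpolation mentioned in the abstract.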

Acknowledgments

This work was supported by 973 Program (No. 2015CB351703), 863 Project (No. 2014AA01A701) and National Natural Science Foundation of China (Nos. 61372152, 61371087).

Author information

Corresponding author

Correspondence to Badong Chen.

About this article

Cite this article

Chen, L., Qu, H., Zhao, J. et al. Efficient and robust deep learning with Correntropy-induced loss function. Neural Comput & Applic 27, 1019–1031 (2016). https://doi.org/10.1007/s00521-015-1916-x
