Abstract
We present a new system that optimizes feature extraction from 2D-topological data such as images in deep learning by exploiting correlation among training samples through curriculum learning optimization (CLO). The system treats every sample as a 2D random variable, where each pixel in the sample is modelled as an independent and identically distributed (i.i.d.) realization. Under this model, we use information-theoretic and statistical measures to rank individual training samples and the relationships between samples, and from these ranks construct a syllabus. Each sample's rank then determines when that sample is fed to the network during training. Comparative evaluation of multiple state-of-the-art networks, including ResNet, GoogLeNet, and VGG, on benchmark datasets demonstrates that a syllabus ranking samples by measures such as the joint entropy between adjacent samples can improve learning and significantly reduce the number of training steps required to reach a desired training accuracy. We present results indicating that our approach produces robust feature maps that in turn reduce loss by as much as a factor of 9 compared with conventional, no-curriculum training.
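To make the ranking idea concrete, the minimal sketch below (ours, not the authors' implementation) scores 8-bit grayscale images by the joint entropy of adjacent samples and orders them into a syllabus. The function names, the 256-bin histogram estimator, and the low-entropy-first presentation order are all illustrative assumptions; the abstract does not specify the exact procedure.

```python
# Minimal sketch (not the authors' code): rank training images by the joint
# entropy of adjacent samples to form a curriculum "syllabus".
# Assumes same-shaped 8-bit grayscale images; names here are hypothetical.
import numpy as np

def joint_entropy(img_a: np.ndarray, img_b: np.ndarray, bins: int = 256) -> float:
    """Joint Shannon entropy H(A, B) of two same-shaped images, treating
    corresponding pixels as i.i.d. draws from a joint distribution."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(),
                                bins=bins, range=[[0, 256], [0, 256]])
    p = hist / hist.sum()          # empirical joint intensity distribution
    p = p[p > 0]                   # drop zero cells; 0 * log 0 := 0
    return float(-np.sum(p * np.log2(p)))

def build_syllabus(images: list[np.ndarray]) -> list[int]:
    """Return a training order: each image is scored by its joint entropy
    with the adjacent (previous) sample, then samples are presented in
    ascending score order (low joint entropy first -- an assumption)."""
    scores = [joint_entropy(images[i - 1], images[i]) for i in range(1, len(images))]
    scores.insert(0, 0.0)          # first sample has no predecessor; pin it first
    return sorted(range(len(images)), key=lambda i: scores[i])

# Usage: feed batches to the network in syllabus order during training.
rng = np.random.default_rng(0)
data = [rng.integers(0, 256, size=(32, 32)).astype(np.uint8) for _ in range(8)]
print(build_syllabus(data))
```

Estimating H(A, B) from a 2D intensity histogram is the standard plug-in estimator for Shannon entropy; swapping in another measure from the paper's family (e.g., mutual information or relative entropy between adjacent samples) would only require replacing `joint_entropy`.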