Abstract
In recent years, convolutional neural networks (CNNs), a key class of deep neural networks (DNNs), have achieved great success in computer vision. However, convolution dominates the computation time of these networks. To improve the efficiency of CNNs, many solutions focusing on training algorithms and parallelism strategies have been proposed. In this paper, departing from traditional GPU-based algorithms, we propose a novel look-up-table-based algorithm to accelerate CNNs with small filters on GPUs. By transforming the costly matrix multiplications in the convolution computation into simple table-based summations, the overhead of convolution can be reduced considerably. Both building the table and looking up values in it are well suited to parallelization on a GPU. Experimental results show that, compared with existing state-of-the-art works, the proposed approach speeds up convolution computation by 20–30 % with little accuracy loss.
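To illustrate the core idea, the following is a minimal sketch in Python/NumPy, not the authors' GPU implementation: the input is quantized onto a small set of levels, every product of a level with a filter weight is precomputed into a table, and the convolution inner loop then performs only table lookups and additions. The function name lut_conv2d, the uniform quantization scheme, and the parameter n_levels are illustrative assumptions; the paper's actual quantization and table layout may differ.

import numpy as np

def lut_conv2d(x, w, n_levels=256):
    """2-D 'valid' convolution via a precomputed look-up table:
    only table reads and additions in the inner loop (sketch)."""
    # Quantize the input onto n_levels uniform levels (hypothetical scheme).
    lo, hi = float(x.min()), float(x.max())
    levels = np.linspace(lo, hi, n_levels)
    q = np.round((x - lo) / max(hi - lo, 1e-12) * (n_levels - 1)).astype(np.int32)

    # Precompute every (level * weight) product once:
    # table[v, i, j] = levels[v] * w[i, j].
    kh, kw = w.shape
    table = levels[:, None, None] * w[None, :, :]

    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    ri = np.arange(kh)[:, None]   # filter row indices
    ci = np.arange(kw)[None, :]   # filter column indices
    y = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            # Replace multiply-accumulate with lookup-accumulate.
            y[r, c] = table[q[r:r+kh, c:c+kw], ri, ci].sum()
    return y

# Usage: compare against direct convolution on a 3x3 "small filter";
# the residual error comes only from input quantization.
x = np.random.rand(8, 8)
w = np.random.rand(3, 3)
direct = np.array([[(x[r:r+3, c:c+3] * w).sum() for c in range(6)]
                   for r in range(6)])
print(np.max(np.abs(lut_conv2d(x, w) - direct)))  # small quantization error

On a GPU, the table build and the per-output lookups are each embarrassingly parallel, which is the property the abstract appeals to; this sketch only demonstrates the arithmetic transformation, not the parallel kernel design.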
Acknowledgments
This work was supported by the National Natural Science Foundation of China under grant No. 61133008, the National High-tech Research and Development Program of China (863 Program) under grant No. 2012AA010905, and the Scientific Research Foundation of the Ministry of Education of China-China Mobile under grant No. MCM20122041. We gratefully acknowledge NVIDIA Corporation for providing the Titan Z GPU used in this research.
Cite this article
Jiang, W., Chen, Y., Jin, H. et al. A Novel GPU-Based Efficient Approach for Convolutional Neural Networks with Small Filters. J Sign Process Syst 86, 313–325 (2017). https://doi.org/10.1007/s11265-016-1129-2