Abstract
In recent years, convolutional neural networks (CNNs), a key class of deep neural networks (DNNs), have achieved great success in computer vision. However, convolution dominates the computation time of these networks. To improve the efficiency of CNNs, many solutions focusing on training algorithms and parallelism strategies have been proposed. In this paper, departing from traditional GPU-based algorithms, we propose a novel look-up-table-based algorithm to accelerate CNNs with small filters on GPUs. By transforming the costly matrix multiplications in the convolution computation into simple table-based summations, the overhead of convolution can be reduced considerably. Both building the table and looking up values in it are well suited to parallelization on a GPU. Experimental results show that, compared with existing state-of-the-art works, the proposed approach speeds up convolution computation by 20–30 % with little accuracy loss.
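To illustrate the core idea, the following is a minimal sketch in Python/NumPy, not the authors' GPU implementation: the input is quantized onto a small set of levels, every product of a level with a filter weight is precomputed into a table, and the convolution inner loop then performs only table lookups and additions. The function name lut_conv2d, the uniform quantization scheme, and the parameter n_levels are illustrative assumptions; the paper's actual quantization and table layout may differ.

import numpy as np

def lut_conv2d(x, w, n_levels=256):
    """2-D 'valid' convolution via a precomputed look-up table:
    only table reads and additions in the inner loop (sketch)."""
    # Quantize the input onto n_levels uniform levels (hypothetical scheme).
    lo, hi = float(x.min()), float(x.max())
    levels = np.linspace(lo, hi, n_levels)
    q = np.round((x - lo) / max(hi - lo, 1e-12) * (n_levels - 1)).astype(np.int32)

    # Precompute every (level * weight) product once:
    # table[v, i, j] = levels[v] * w[i, j].
    kh, kw = w.shape
    table = levels[:, None, None] * w[None, :, :]

    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    ri = np.arange(kh)[:, None]   # filter row indices
    ci = np.arange(kw)[None, :]   # filter column indices
    y = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            # Replace multiply-accumulate with lookup-accumulate.
            y[r, c] = table[q[r:r+kh, c:c+kw], ri, ci].sum()
    return y

# Usage: compare against direct convolution on a 3x3 "small filter";
# the residual error comes only from input quantization.
x = np.random.rand(8, 8)
w = np.random.rand(3, 3)
direct = np.array([[(x[r:r+3, c:c+3] * w).sum() for c in range(6)]
                   for r in range(6)])
print(np.max(np.abs(lut_conv2d(x, w) - direct)))  # small quantization error

On a GPU, the table build and the per-output lookups are each embarrassingly parallel, which is the property the abstract appeals to; this sketch only demonstrates the arithmetic transformation, not the parallel kernel design.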
Acknowledgments
This work was supported by the National Natural Science Foundation of China under grant No. 61133008, the National High-tech Research and Development Program of China (863 Program) under grant No. 2012AA010905, and the Scientific Research Foundation of the Ministry of Education of China-China Mobile under grant No. MCM20122041. We gratefully acknowledge NVIDIA Corporation for providing the Titan Z GPU used in this research.
Cite this article
Jiang, W., Chen, Y., Jin, H. et al. A Novel GPU-Based Efficient Approach for Convolutional Neural Networks with Small Filters. J Sign Process Syst 86, 313–325 (2017). https://doi.org/10.1007/s11265-016-1129-2