Abstract
Memory performance is a key bottleneck for deep learning systems. Binarizing both activations and weights is a promising approach for scaling to highly energy-efficient systems, since it uses the lowest possible precision. In this paper, we apply and analyze binarized neural networks for human detection in infrared images. Our results show that the binarized network achieves algorithmic performance comparable to a 32-bit floating-point network, with the added benefits of greatly simplified computation and reduced memory overhead. In addition, we present a system architecture designed specifically for computation with binary representations; it achieves at least a 4× speedup and improves energy efficiency by three orders of magnitude over a GPU.
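The simplified computation mentioned above comes from a standard property of binarized networks: once activations and weights are constrained to ±1, a multiply-accumulate reduces to a bitwise XNOR followed by a population count. The sketch below is purely illustrative (it is not the paper's implementation, and all function names are invented for this example); it shows the equivalence between the ±1 dot product and its XNOR/popcount form on small packed integers.

```python
def binarize(x):
    """Map real values to ±1 by sign (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in x]

def dot_pm1(a, b):
    """Plain dot product of two ±1 vectors."""
    return sum(x * y for x, y in zip(a, b))

def pack(pm1):
    """Pack a ±1 vector into an integer bit mask (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, v in enumerate(pm1):
        if v == 1:
            bits |= 1 << i
    return bits

def dot_xnor(a_bits, b_bits, n):
    """Same dot product via bitwise XNOR and popcount.

    XNOR marks positions where the two signs agree; if m positions
    agree out of n, the ±1 dot product is m - (n - m) = 2m - n.
    """
    matches = bin(~(a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

# The two formulations agree on an example pair of vectors.
a = binarize([0.3, -1.2, 0.7, -0.1])   # -> [ 1, -1,  1, -1]
b = binarize([-0.5, -0.9, 0.2, 0.4])   # -> [-1, -1,  1,  1]
assert dot_pm1(a, b) == dot_xnor(pack(a), pack(b), len(a))  # both 0
```

In hardware, the XNOR/popcount form replaces wide floating-point multipliers with single-gate operations over packed words, which is the source of the speed and energy gains reported in the abstract.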
Acknowledgments
This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA), the Air Force Research Laboratory (AFRL), and NSF (#1526399). The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
Cite this article
Kung, J., Zhang, D., van der Wal, G. et al. Efficient Object Detection Using Embedded Binarized Neural Networks. J Sign Process Syst 90, 877–890 (2018). https://doi.org/10.1007/s11265-017-1255-5