Abstract
Image/video compression and communication need to serve both human vision and machine vision. To address this need, we propose a scalable image compression solution. We assume that machine vision needs less information, mainly information related to semantics, whereas human vision needs more information, namely enough to reconstruct the signal. We therefore propose semantics-to-signal scalable compression, in which a partial bitstream is decodable for machine vision and the entire bitstream is decodable for human vision. Our method is inspired by the scalable image coding standard JPEG2000 and similarly adopts subband-wise representations. We first design a trainable and revertible transform based on the lifting structure, which converts an image into a pyramid of multiple subbands; the transform is trained so that the partial representations are useful for multiple machine vision tasks. We then design an end-to-end optimized encoding/decoding network for compressing the multiple subbands, which jointly optimizes compression ratio, semantic analysis accuracy, and signal reconstruction quality. We experiment with two datasets, CUB200-2011 and FGVC-Aircraft, taking coarse-to-fine image classification tasks as an example. Experimental results demonstrate that the proposed method achieves semantics-to-signal scalable compression and outperforms JPEG2000 in compression efficiency. The proposed method sheds light on a generic approach to image/video coding for humans and machines.
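To illustrate why a transform built on the lifting structure is revertible regardless of what its internal operators compute, the following is a minimal 1-D sketch in Python with NumPy. It is not the transform described in the abstract: `predict` and `update` are hypothetical fixed functions standing in for trained networks, the signal is 1-D rather than an image, and its length is assumed to be divisible by 2 raised to the number of levels. The coarse subband returned by `pyramid_forward` plays the role of a partial representation, while the coarse subband together with all detail subbands recovers the input up to floating-point rounding.

```python
import numpy as np

def predict(even):
    # Hypothetical stand-in for a trained prediction operator (any function works).
    return np.tanh(even)

def update(detail):
    # Hypothetical stand-in for a trained update operator (any function works).
    return 0.5 * detail

def lifting_forward(x):
    # Split into even and odd samples, then apply one predict step and one update step.
    even, odd = x[0::2], x[1::2]
    detail = odd - predict(even)      # high-frequency-like subband
    coarse = even + update(detail)    # low-frequency-like subband
    return coarse, detail

def lifting_inverse(coarse, detail):
    # Undo the steps in reverse order with the signs flipped.
    even = coarse - update(detail)
    odd = detail + predict(even)
    x = np.empty(even.size + odd.size, dtype=even.dtype)
    x[0::2], x[1::2] = even, odd
    return x

def pyramid_forward(x, levels):
    # Repeatedly decompose the coarse subband to build a pyramid of subbands.
    details = []
    coarse = x
    for _ in range(levels):
        coarse, d = lifting_forward(coarse)
        details.append(d)
    return coarse, details

def pyramid_inverse(coarse, details):
    # Rebuild the signal from the coarsest level outward.
    for d in reversed(details):
        coarse = lifting_inverse(coarse, d)
    return coarse

x = np.arange(16, dtype=np.float64)
coarse, details = pyramid_forward(x, levels=2)
restored = pyramid_inverse(coarse, details)
```

Because the inverse only subtracts what the forward pass added, invertibility does not depend on `predict` or `update` being linear or invertible themselves, which is what makes it possible to replace them with trainable operators.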










Acknowledgements
This work was supported by the National Key Research and Development Program of China under Grant 2018YFA0701603, by the Natural Science Foundation of China under Grant 61772483, and by the Fundamental Research Funds for the Central Universities under Contract WK3490000005. We acknowledge the support of the GPU cluster built by MCC Lab of the School of Information Science and Technology of USTC.
Additional information
Communicated by Dong Xu.
About this article
Cite this article
Liu, K., Liu, D., Li, L. et al. Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations. Int J Comput Vis 129, 2605–2621 (2021). https://doi.org/10.1007/s11263-021-01491-7
DOI: https://doi.org/10.1007/s11263-021-01491-7