Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations

Kang Liu¹,
Dong Liu¹ (ORCID: orcid.org/0000-0001-9100-2906),
Li Li¹,
Ning Yan¹ &
Houqiang Li¹


Abstract

Image/video compression and communication need to serve both human vision and machine vision. To address this need, we propose a scalable image compression solution. We assume that machine vision needs less information, chiefly information related to semantics, whereas human vision needs more information, namely what is required to reconstruct the signal. We therefore propose semantics-to-signal scalable compression, in which a partial bitstream is decodable for machine vision and the entire bitstream is decodable for human vision. Our method is inspired by the scalable image coding standard JPEG2000 and similarly adopts subband-wise representations. We first design a trainable and revertible transform based on the lifting structure, which converts an image into a pyramid of multiple subbands; the transform is trained so that the partial representations are useful for multiple machine vision tasks. We then design an end-to-end optimized encoding/decoding network for compressing the multiple subbands, which jointly optimizes compression ratio, semantic analysis accuracy, and signal reconstruction quality. We experiment with two datasets, CUB200-2011 and FGVC-Aircraft, taking coarse-to-fine image classification tasks as an example. Experimental results demonstrate that the proposed method achieves semantics-to-signal scalable compression and outperforms JPEG2000 in compression efficiency. The proposed method sheds light on a generic approach to image/video coding for humans and machines.
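
The revertibility of a lifting-based transform can be illustrated with a minimal sketch. This is not the authors' trained transform: the functions `predict` and `update` below are hypothetical stand-ins for learned operators, the signal is 1D rather than an image, and NumPy is assumed. The point it shows is that a lifting step can be inverted exactly in structure whatever the predict and update operators are, and that repeating the step on the coarse band yields a pyramid of subbands.

```python
import numpy as np

def predict(even):
    # Hypothetical nonlinear predictor standing in for a trained network.
    return np.tanh(even)

def update(detail):
    # Hypothetical update operator; any function keeps the scheme invertible.
    return 0.5 * detail

def lifting_forward(x):
    """One lifting step: split an even-length 1D signal into coarse and detail bands."""
    even, odd = x[0::2], x[1::2]
    detail = odd - predict(even)    # predict step: residual of odd samples
    coarse = even + update(detail)  # update step: smoothed even samples
    return coarse, detail

def lifting_inverse(coarse, detail):
    """Undo one lifting step by reversing the operations in the opposite order."""
    even = coarse - update(detail)
    odd = detail + predict(even)
    x = np.empty(even.size + odd.size, dtype=even.dtype)
    x[0::2], x[1::2] = even, odd    # interleave the samples again
    return x

def pyramid(x, levels):
    """Apply the lifting step repeatedly to the coarse band."""
    details = []
    coarse = x
    for _ in range(levels):
        coarse, d = lifting_forward(coarse)
        details.append(d)
    return coarse, details

def pyramid_inverse(coarse, details):
    """Rebuild the signal from the coarsest band and all detail bands."""
    for d in reversed(details):
        coarse = lifting_inverse(coarse, d)
    return coarse

x = np.linspace(-1.0, 1.0, 16)
coarse, details = pyramid(x, 3)
print(np.allclose(pyramid_inverse(coarse, details), x))
```

In a scalable setting of the kind the abstract describes, a decoder that receives only some of the subbands could use them for analysis, while a decoder that receives all of them can run the inverse transform to recover the signal. Which subbands serve which task is determined by training in the paper and is not modelled here.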

Notes

https://bellard.org/bpg/.

Acknowledgements

This work was supported by the National Key Research and Development Program of China under Grant 2018YFA0701603, by the Natural Science Foundation of China under Grant 61772483, and by the Fundamental Research Funds for the Central Universities under Contract WK3490000005. We acknowledge the support of the GPU cluster built by MCC Lab of the School of Information Science and Technology of USTC.

Author information

Authors and Affiliations

CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, 230027, China
Kang Liu, Dong Liu, Li Li, Ning Yan & Houqiang Li

Corresponding authors

Correspondence to Dong Liu or Houqiang Li.

Additional information

Communicated by Dong Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, K., Liu, D., Li, L. et al. Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations. Int J Comput Vis 129, 2605–2621 (2021). https://doi.org/10.1007/s11263-021-01491-7

Received: 20 December 2020
Accepted: 09 June 2021
Published: 22 June 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s11263-021-01491-7
