CNN-based segmentation of speech balloons and narrative text boxes from comic book page images

Arpita Dutta¹,
Samit Biswas¹ &
Amit Kumar Das¹


Abstract

Most recent research on comic document images has focused on reading and distributing comics digitally, driven by advances in technology. This work addresses the extraction of narrative text boxes and speech balloons, which carry the conversations among comic characters along with their feelings. Because drawing styles vary widely, speech balloons take complex shapes and are difficult to extract. We present a shape-aware dual-stream convolutional neural network for segmenting narrative text boxes and speech balloons of various shapes. In this dual-stream architecture, an added shape module processes edge information of the speech balloons and narrative text boxes alongside the main module. The outputs of the two modules are then concatenated, which produces more accurate segmentation of speech balloons and narrative text boxes. The proposed method improves significantly on other state-of-the-art methods in both region accuracy (mIoU) and boundary accuracy (F-measure and Hausdorff distance) on several publicly available comic datasets in different languages (eBDtheque, DCM and a subset of the Manga 109 dataset). In addition, we have developed a new dataset (BCBId) for comics in Bangla, the eighth most spoken language in the world, and propose a semiautomatic method for producing ground-truth images.
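
The abstract reports region accuracy as mIoU and boundary accuracy partly as a Hausdorff distance. The sketch below illustrates how these two quantities can be computed for a single binary balloon mask. It is a generic illustration, not the authors' evaluation code: the function names (`iou`, `mean_iou`, `boundary`, `hausdorff`, `square`), the two-class averaging, the 4-connected boundary definition, and the toy 6×6 masks are all assumptions made here, and the paper may use a different Hausdorff variant or boundary tolerance.

```python
from math import hypot


def iou(pred, truth):
    """Intersection over union of two same-sized binary masks (lists of 0/1 rows)."""
    inter = union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly, so treat 0/0 as 1.0.
    return inter / union if union else 1.0


def mean_iou(pred, truth):
    """Mean of the foreground IoU and the background IoU (two-class mIoU)."""
    def invert(mask):
        return [[1 - v for v in row] for row in mask]
    return (iou(pred, truth) + iou(invert(pred), invert(truth))) / 2


def boundary(mask):
    """Foreground pixels that touch background or the image edge (4-connectivity)."""
    h, w = len(mask), len(mask[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    points.append((y, x))
                    break
    return points


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        return max(min(hypot(sy - dy, sx - dx) for dy, dx in dst) for sy, sx in src)
    return max(directed(a, b), directed(b, a))


def square(top, left, size=4, dim=6):
    """Hypothetical toy mask: a filled size x size square on a dim x dim page."""
    return [[1 if top <= y < top + size and left <= x < left + size else 0
             for x in range(dim)] for y in range(dim)]


truth = square(1, 1)   # ground-truth balloon
pred = square(1, 2)    # prediction shifted one pixel to the right

print(f"IoU       = {iou(pred, truth):.3f}")        # 0.600
print(f"mIoU      = {mean_iou(pred, truth):.3f}")   # 0.633
print(f"Hausdorff = {hausdorff(boundary(pred), boundary(truth)):.3f}")  # 1.000
```

A one-pixel shift costs 40% of the foreground IoU on this small mask while the Hausdorff distance stays at one pixel, which is one reason region and boundary measures are usually reported together.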

Notes

Code and data are available at https://github.com/Arpi07/Arpi07-2/tree/Speech_balloon_segmentation.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, India
Arpita Dutta, Samit Biswas & Amit Kumar Das

Corresponding author

Correspondence to Arpita Dutta.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dutta, A., Biswas, S. & Das, A.K. CNN-based segmentation of speech balloons and narrative text boxes from comic book page images. IJDAR 24, 49–62 (2021). https://doi.org/10.1007/s10032-021-00366-4

Received: 13 February 2020
Revised: 03 March 2021
Accepted: 18 March 2021
Published: 21 April 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s10032-021-00366-4
