Abstract
Detection and localization of text in photorealistic images is a difficult and not yet completely solved problem. We propose an approach to this problem based on semantic image segmentation. In this interpretation, text characters are treated as objects to be segmented. This paper proposes a network architecture for text localization, describes the procedure for constructing the training set, and presents an image pre-processing algorithm that reduces the amount of processed data and simplifies segmentation of the "background" class. The network architecture is a modification of the well-known DeepLabv3 network that takes into account the specifics of images of text characters. The proposed method determines the location of text characters in images with acceptable accuracy. Experimental evaluation of text localization quality by the IoU (Intersection over Union) criterion showed that the obtained accuracy is sufficient for subsequent text recognition.
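The IoU criterion used in the evaluation can be computed for a predicted segmentation mask and a ground-truth mask as in the following sketch. This is a generic illustration of the metric, not code from the paper; the function name `iou` and the NumPy implementation are our own.

```python
import numpy as np

def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection over Union between two binary segmentation masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum()) / float(union)

# Example: two 2x2 text-character masks overlapping in a 2-pixel strip.
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:3] = 1          # predicted character region (4 pixels)
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 2:4] = 1            # ground-truth region shifted right (4 pixels)
print(iou(pred, gt))        # intersection 2 / union 6
```

A detection is typically counted as correct when its IoU with the ground truth exceeds a fixed threshold (0.5 is a common choice in the localization literature).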
References
Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2315–2324. IEEE, Las Vegas, NV (2016). https://doi.org/10.1109/CVPR.2016.254
Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: TextBoxes: A fast text detector with a single deep neural network. In: 31st AAAI Conference on Artificial Intelligence, pp. 4161–4167. AAAI, San Francisco (2017)
Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman A.: Reading text in the wild with convolutional neural networks. Int. J. Comput. Vis. 116(1), 1–20 (2016). https://doi.org/10.1007/s11263-015-0823-z
Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: 2011 International Conference on Computer Vision, pp. 1457–1464. IEEE, Barcelona, Spain (2011). https://doi.org/10.1109/ICCV.2011.6126402
Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010. LNCS, vol. 6494, pp. 770–783. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19318-7_60
Grishkin, V.: Document image segmentation based on wavelet features. In: 2015 Computer Science and Information Technologies (CSIT), pp. 82–84. IEEE, Yerevan, Armenia (2015). https://doi.org/10.1109/CSITechnol.2015.7358255
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2963–2970. IEEE, San Francisco, CA (2010). https://doi.org/10.1109/CVPR.2010.5540041
Yao, C., Bai, X., Shi, B., Liu, W.: Strokelets: A learned multi-scale representation for scene text recognition. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 4042–4049. IEEE, Columbus, OH (2014). https://doi.org/10.1109/CVPR.2014.515
Wang, K., Belongie, S.: Word spotting in the wild. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 591–604. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15549-9_43
Bartz, C., Yang, H., Meinel, C.: STN-OCR: A single neural network for text detection and text recognition. https://arxiv.org/abs/1707.08831. Accessed 20 Mar 2019
Busta, M., Neumann, L., Matas, J.: Deep TextSpotter: An end-to-end trainable scene text localization and recognition framework. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2223–2231. IEEE, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.242
Chen, L., Barron, J.T., Papandreou, G., Murphy, K., Yuille, A.L.: Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4545–4554. IEEE, Las Vegas, NV (2016). https://doi.org/10.1109/CVPR.2016.492
Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. https://arxiv.org/abs/1412.7062. Accessed 10 Mar 2019
Chen, L., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. https://arxiv.org/abs/1706.05587. Accessed 20 Mar 2019
Pascal VOC data set mirror. https://pjreddie.com/projects/pascal-voc-dataset-mirror/. Accessed 27 Feb 2019
Karatzas, D., et al.: ICDAR 2013 robust reading competition. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1484–1493. IEEE, Washington, DC (2013). https://doi.org/10.1109/ICDAR.2013.221
The Street View Text Dataset (SVT). http://tc11.cvc.uab.es/datasets/SVT_1. Accessed 3 Mar 2019
The Chars74K dataset. http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/. Accessed 10 Feb 2019
Acknowledgment
The authors acknowledge Saint-Petersburg State University for research grant 39417687.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Valery, G., Alexander, E., Oleg, I. (2019). Localization of Text in Photorealistic Images. In: Misra, S., et al. (eds.) Computational Science and Its Applications – ICCSA 2019. Lecture Notes in Computer Science, vol. 11622. Springer, Cham. https://doi.org/10.1007/978-3-030-24305-0_63
Print ISBN: 978-3-030-24304-3
Online ISBN: 978-3-030-24305-0