Abstract
Detection and localization of text in photorealistic images is a difficult and not yet completely solved problem. We propose an approach to this problem based on semantic image segmentation. In this interpretation, text characters are treated as objects to be segmented. This paper proposes a network architecture for text localization, describes the procedure for constructing the training set, and presents an image pre-processing algorithm that reduces the amount of processed data and simplifies segmentation of the "background" class. The network architecture is a modification of the well-known DeepLabv3 network that takes into account the specifics of images of text characters. The proposed method determines the location of text characters in images with acceptable accuracy. Experimental evaluation of text localization quality by the IoU (Intersection over Union) criterion showed that the obtained accuracy is sufficient for subsequent text recognition.
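The IoU criterion used in the evaluation can be computed for a predicted segmentation mask and a ground-truth mask as in the following sketch. This is a generic illustration of the metric, not code from the paper; the function name `iou` and the NumPy implementation are our own.

```python
import numpy as np

def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection over Union between two binary segmentation masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum()) / float(union)

# Example: two 2x2 text-character masks overlapping in a 2-pixel strip.
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:3] = 1          # predicted character region (4 pixels)
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 2:4] = 1            # ground-truth region shifted right (4 pixels)
print(iou(pred, gt))        # intersection 2 / union 6
```

A detection is typically counted as correct when its IoU with the ground truth exceeds a fixed threshold (0.5 is a common choice in the localization literature).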
References
Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2315–2324. IEEE, Las Vegas, NV (2016). https://doi.org/10.1109/CVPR.2016.254
Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: TextBoxes: A fast text detector with a single deep neural network. In: 31st AAAI Conference on Artificial Intelligence, pp. 4161–4167. AAAI, San Francisco (2017)
Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman A.: Reading text in the wild with convolutional neural networks. Int. J. Comput. Vis. 116(1), 1–20 (2016). https://doi.org/10.1007/s11263-015-0823-z
Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: 2011 International Conference on Computer Vision, pp. 1457–1464. IEEE, Barcelona, Spain (2011). https://doi.org/10.1109/ICCV.2011.6126402
Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010. LNCS, vol. 6494, pp. 770–783. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19318-7_60
Grishkin, V.: Document image segmentation based on wavelet features. In: 2015 Computer Science and Information Technologies (CSIT), pp. 82–84. IEEE, Yerevan, Armenia (2015). https://doi.org/10.1109/CSITechnol.2015.7358255
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2963–2970. IEEE, San Francisco, CA (2010). https://doi.org/10.1109/CVPR.2010.5540041
Yao, C., Bai, X., Shi, B., Liu, W.: Strokelets: A learned multi-scale representation for scene text recognition. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 4042–4049. IEEE, Columbus, OH (2014). https://doi.org/10.1109/CVPR.2014.515
Wang, K., Belongie, S.: Word spotting in the wild. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 591–604. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15549-9_43
Bartz, C., Yang, H., Meinel, C.: STN-OCR: A single neural network for text detection and text recognition. https://arxiv.org/abs/1707.08831. Accessed 20 Mar 2019
Busta, M., Neumann, L., Matas, J.: Deep TextSpotter: An end-to-end trainable scene text localization and recognition framework. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2223–2231. IEEE, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.242
Chen, L., Barron, J.T., Papandreou, G., Murphy, K., Yuille, A.L.: Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4545–4554. IEEE, Las Vegas, NV (2016). https://doi.org/10.1109/CVPR.2016.492
Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. https://arxiv.org/abs/1412.7062. Accessed 10 Mar 2019
Chen, L., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. https://arxiv.org/abs/1706.05587. Accessed 20 Mar 2019
Pascal VOC data set mirror. https://pjreddie.com/projects/pascal-voc-dataset-mirror/. Accessed 27 Feb 2019
Karatzas, D., et al.: ICDAR 2013 robust reading competition. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1484–1493. IEEE, Washington, DC (2013). https://doi.org/10.1109/ICDAR.2013.221
The Street View Text Dataset (SVT). http://tc11.cvc.uab.es/datasets/SVT_1. Accessed 3 Mar 2019
The Chars74K dataset. http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/. Accessed 10 Feb 2019
Acknowledgment
The authors acknowledge Saint-Petersburg State University for research grant 39417687.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Valery, G., Alexander, E., Oleg, I. (2019). Localization of Text in Photorealistic Images. In: Misra, S., et al. (eds.) Computational Science and Its Applications – ICCSA 2019. Lecture Notes in Computer Science, vol. 11622. Springer, Cham. https://doi.org/10.1007/978-3-030-24305-0_63
Print ISBN: 978-3-030-24304-3
Online ISBN: 978-3-030-24305-0