Open Datasets and Tools for Arabic Text Detection and Recognition in News Video Frames
Figures:
- Figure 1: Example of an Arabic video frame including scene and artificial texts (a). Decomposition of an Arabic word into characters (b).
- Figure 2: Typical samples from the ICDAR2003 (a), MSRA-TD500 (b), NEOCR (c) and KAIST (d) datasets.
- Figure 3: Some examples of text detection systems [18,19,20] showing the evolution of this area of research over ten years.
- Figure 4: Architecture of AcTiV 2.0 and statistics of the detection (D) and recognition (R) datasets.
- Figure 5: Typical video frames from the AcTiV-D dataset. From left to right: examples of RussiaToday Arabic, France24 Arabe, TunisiaNat1 (El Wataniya 1) and AljazeeraHD frames.
- Figure 6: Examples of text images from AcTiV-R depicting typical characteristics of video text images.
- Figure 7: The AcTiV-GT open-source tool displaying a labeled frame.
- Figure 8: Part of a global XML file annotating a video sequence of Aljazeera TV. This figure contains ground-truth information for three text boxes out of a total of 17.
- Figure 9: Example of AcTiV 2.0 specific XML files: (a) part of the detection XML file of France24 TV; and (b) a recognition ground-truth file and its corresponding textline image.
- Figure 10: The AcTiV-D evaluation tool. The user can apply the evaluation procedure to the current frame ("Evaluate CF" button) or to all video frames ("Evaluate All" button) (a). The "Performance Value" button displays precision, recall and F-score values (b).
- Figure 11: Example of CRR and WRR computation based on output errors.
- Figure 12: Pipeline of the text detection algorithm. Two passes are performed, one for each text polarity (dark text on light background or light text on dark background).
Abstract
1. Introduction
- Text pattern variability: unknown font size and font family, different colors and alignments (even within the same TV channel).
- Background complexity: text-like objects in video frames, such as fences, bricks and signs, can be confused with text characters.
- Video quality: acquisition conditions, compression artifacts and low resolution.
2. Literature Review
3. Proposed Datasets
3.1. Data Characteristics and Statistics
- AcTiV-D is a dataset of non-redundant frames used to build and evaluate methods for detecting text regions in HD/SD frames. A total of 4063 frames were hand-selected, with particular attention paid to achieving a high diversity of the depicted text regions. Figure 5 provides examples from AcTiV-D illustrating typical problems in video text detection. To test a system's ability to locate text under different conditions, the proposed dataset includes some frames that contain the same text region over different backgrounds, as well as frames without any text component.
- AcTiV-R is a dataset of textline images that can be used to build and evaluate Arabic text recognition systems. Different fonts (more than six), sizes, backgrounds, colors, contrasts and occlusions are represented in the dataset. Figure 6 illustrates typical examples from AcTiV-R. The collected text images cover a broad range of characteristics that distinguish video frames from scanned documents. AcTiV-R consists of 10,415 textline images, 44,583 words and 259,192 characters. To obtain an easily accessible representation of Arabic text, each character is transformed into a Latin label with a suffix that encodes the letter's position in the word (_B: Begin; _M: Middle; _E: End; _I: Isolated). An example is shown in Figure 1. During the annotation process, we considered 164 Arabic character forms:
- 125 letters, i.e., taking this positional variability into account;
- 15 additional characters, i.e., letters combined with the diacritic sign "Chadda";
- 10 digits; and
- 14 punctuation marks, including the white space.
Table 3 lists the different character labels and gives, for each character, its frequency in the dataset.
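To make the labeling scheme concrete, here is a minimal, hypothetical sketch; the Latin label names and the letter mapping are illustrative only, as the actual inventory of 164 labels is the one given in Table 3.

```python
# Hypothetical sketch of the positional labeling described above.
# The Latin label names are illustrative; the real inventory of 164
# labels is the one listed in Table 3.

POSITION_SUFFIX = {"begin": "_B", "middle": "_M", "end": "_E", "isolated": "_I"}

# Illustrative transliteration table (assumed, not the actual AcTiV mapping).
LATIN_LABEL = {"\u0628": "ba", "\u062A": "ta", "\u0645": "mim"}

def label_for(letter: str, position: str) -> str:
    """Map an Arabic letter and its in-word position to a label like 'ba_B'."""
    return LATIN_LABEL[letter] + POSITION_SUFFIX[position]

print(label_for("\u0628", "begin"))     # -> ba_B
print(label_for("\u0645", "isolated"))  # -> mim_I
```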
3.2. Annotation Guidelines
- Position: x, y, width and height.
- Content: text string, text color, background color and background type (transparent or opaque).
- Interval: appearance interval of the textline (Frame_S (start) and Frame_E (end)).
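As an illustration of these attributes, the sketch below builds one hypothetical text-box record. The element and attribute names are our assumptions for illustration only; the actual AcTiV-GT schema is the one shown in Figures 8 and 9.

```python
import xml.etree.ElementTree as ET

# Hypothetical text-box record carrying the attributes listed above;
# element and attribute names are illustrative, not the real AcTiV schema.
box = ET.Element("textbox", {
    "x": "56", "y": "620", "width": "410", "height": "42",  # position
    "text_color": "white", "bg_color": "blue",              # content colors
    "bg_type": "opaque",                                    # background type
    "Frame_S": "120", "Frame_E": "245",                     # appearance interval
})
box.text = "..."  # the transcribed text string goes here
print(ET.tostring(box, encoding="unicode"))
```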
4. Evaluation Protocols and Metrics
4.1. Detection Protocols and Metrics
- Protocol 1 aims to measure the performance of single-frame-based methods in detecting text in HD frames.
- Protocol 4 is similar to Protocol 1, differing only in channel resolution. All SD (720 × 576) channels in our database can be targeted by this protocol, which is split into four sub-protocols: three channel-dependent (Protocols 4.1, 4.2 and 4.3) and one channel-free (Protocol 4.4).
- Protocol 4bis is dedicated to the newly added resolution (480 × 360) of the TunisiaNat1 TV channel. The main idea of this protocol is to train a given system with SD (720 × 576) data, i.e., Protocol 4.3, and to test it on data of a different resolution and quality.
- Protocol 7 is the generic version of the previous protocols, where text detection is evaluated regardless of data quality. For all detection protocols, the evaluation tool reports precision, recall and F-score (Figure 10); a sketch of this computation follows this list.
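The following is a minimal sketch of how precision, recall and F-score can be derived from box matches. It assumes a simple greedy one-to-one overlap matching with an IoU threshold; the actual matching strategy implemented in the AcTiV-D evaluation tool may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def detection_scores(detections, ground_truth, thresh=0.5):
    """Precision, recall and F-score under greedy one-to-one matching."""
    unmatched = set(range(len(ground_truth)))  # ground-truth boxes not yet matched
    tp = 0
    for d in detections:
        best = max(unmatched, key=lambda g: iou(d, ground_truth[g]), default=None)
        if best is not None and iou(d, ground_truth[best]) >= thresh:
            unmatched.remove(best)
            tp += 1
    p = tp / len(detections) if detections else 0.0
    r = tp / len(ground_truth) if ground_truth else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For the "Evaluate All" mode of Figure 10, the per-frame true-positive and box counts would simply be accumulated before the three scores are computed.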
4.2. Recognition Protocols and Metrics
- Protocol 3 aims to evaluate the performance of OCR systems in recognizing text in HD frames.
- Protocol 6 is similar to Protocol 3, differing only in channel resolution. All SD (720 × 576) channels in our dataset can be targeted by this protocol, which is split into four sub-protocols: three channel-dependent (Protocols 6.1, 6.2 and 6.3) and one channel-free (Protocol 6.4).
- Protocol 6bis is dedicated to the newly added resolution (480 × 360) of the TunisiaNat1 TV channel. The main idea of this protocol is to train a given system with SD (720 × 576) data, i.e., Protocol 6.3, and to test it on data of a different resolution and quality.
- Protocol 9 is the generic version of Protocols 3 and 6, where text recognition is assessed without considering data quality. For all recognition protocols, performance is measured by the character recognition rate (CRR) and the word recognition rate (WRR) (Figure 11); a sketch of this computation follows this list.
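Below is a minimal sketch of how CRR and WRR can be computed from output errors, assuming both rates are derived from the edit distance (substitutions, insertions and deletions) between recognized and ground-truth sequences, as suggested by Figure 11; the exact error weighting of the AcTiV tool may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def recognition_rate(references, hypotheses):
    """Rate = (N - total edit errors) / N, accumulated over all lines."""
    n = sum(len(r) for r in references)
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return (n - errors) / n

# CRR operates on character sequences, WRR on word sequences:
crr = recognition_rate(["overlay text"], ["overlay texl"])            # 11/12
wrr = recognition_rate([["overlay", "text"]], [["overlay", "texl"]])  # 1/2
print(round(crr, 3), round(wrr, 3))
```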
5. Application of AcTiV Datasets
5.1. LADI Detector
- The gradient direction dp is calculated at each edge pixel p; this direction is roughly perpendicular to the stroke orientation.
- A search ray starting from edge pixel p is shot along the gradient direction dp until another edge pixel q is found. If these two edge pixels have nearly opposite gradient orientations, the ray is considered valid, and all pixels inside it are labeled with the length ‖p − q‖ (see the sketch after this list).
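Below is a minimal sketch of this ray-shooting step, assuming a Canny edge map and Sobel gradients via OpenCV. It covers a single polarity pass (the second pass of Figure 12 is obtained by shooting rays along the negative gradient direction) and omits the later stages of LADI, such as the auto-encoder-based filtering [46].

```python
import numpy as np
import cv2

def stroke_width_rays(gray, max_len=60):
    """One SWT pass: from each edge pixel p, shoot a ray along the gradient
    direction until an opposing edge pixel q is hit, then label the ray's
    pixels with the length ||p - q||. `gray` is a uint8 grayscale image."""
    edges = cv2.Canny(gray, 100, 200)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    mag = np.hypot(gx, gy) + 1e-9
    dx, dy = gx / mag, gy / mag              # unit gradient directions
    swt = np.full(gray.shape, np.inf)
    h, w = gray.shape
    for y, x in zip(*np.nonzero(edges)):
        ray = [(y, x)]
        for t in range(1, max_len):
            qx, qy = int(x + dx[y, x] * t), int(y + dy[y, x] * t)
            if not (0 <= qx < w and 0 <= qy < h):
                break
            ray.append((qy, qx))
            if edges[qy, qx]:
                # valid ray: gradients at p and q are nearly opposite
                if dx[y, x] * dx[qy, qx] + dy[y, x] * dy[qy, qx] < -0.9:
                    width = np.hypot(qx - x, qy - y)
                    for ry, rx in ray:
                        swt[ry, rx] = min(swt[ry, rx], width)
                break
    return swt
```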
5.2. SID OCR
5.3. Experimental Results
5.3.1. Comparison with Other Methods
- Pixel-color clustering using k-means to form pairs of thresholds for each RGB channel.
- Creation of a binary map for each pair of thresholds.
- Extraction of connected components (CCs).
- Preliminary filtering according to an "area stability" criterion.
- Second filtering based on a set of statistical and geometric rules.
- Horizontal merging of the remaining components to form textlines.
- Textline segmentation into words by thresholding the gaps between CCs.
- Over-segmentation of characters into primitive segments.
- Character recognition using a 64-dimensional chain-code-histogram feature vector and the modified quadratic discriminant function (MQDF).
- Word recognition by dynamic programming, using the total likelihood of the characters as the objective function (see the sketch after this list).
- False-word reduction by measuring the average of the character likelihoods in a word and comparing it to a predefined threshold.
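To illustrate the dynamic-programming step used for word recognition, here is a minimal sketch. The `score_char` function stands in for an assumed character classifier (e.g., the MQDF stage) returning a label and a likelihood for a run of merged primitive segments; the merge bound `max_merge` is also our assumption.

```python
import math

def best_word(segments, score_char, max_merge=4):
    """DP over primitive segments: best[i] holds the top total
    log-likelihood (and labels) of any character sequence covering
    segments[:i]. `score_char(run)` is an assumed classifier returning
    (label, likelihood > 0) for a run of merged primitive segments."""
    n = len(segments)
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(1, n + 1):
        for k in range(1, min(max_merge, i) + 1):
            label, like = score_char(segments[i - k:i])
            cand = best[i - k][0] + math.log(like)
            if cand > best[i][0]:
                best[i] = (cand, best[i - k][1] + [label])
    return best[n][1], best[n][0]
```

The false-word reduction step can then reuse the returned total score, dividing it by the number of recognized characters to obtain the average compared against the threshold.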
5.3.2. Training with AcTiV 2.0
6. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Lu, T.; Palaiahnakote, S.; Tan, C.L.; Liu, W. Video Text Detection; Springer: London, UK, 2014.
- Ye, Q.; Doermann, D. Text detection and recognition in imagery: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1480–1500.
- Yin, X.C.; Zuo, Z.Y.; Tian, S.; Liu, C.L. Text detection, tracking and recognition in video: A comprehensive survey. IEEE Trans. Image Process. 2016, 25, 2752–2773.
- Lienhart, R. Video OCR: A survey and practitioner's guide. In Video Mining; Springer: Boston, MA, USA, 2003; pp. 155–183.
- Yang, H.; Quehl, B.; Sack, H. A framework for improved video text detection and recognition. Multimed. Tools Appl. 2014, 69, 217–245.
- Poignant, J.; Bredin, H.; Le, V.B.; Besacier, L.; Barras, C.; Quénot, G. Unsupervised speaker identification using overlaid texts in TV broadcast. In Proceedings of Interspeech 2012, Portland, OR, USA, 9–13 September 2012; p. 4.
- Märgner, V.; El Abed, H. Guide to OCR for Arabic Scripts; Springer: Berlin, Germany, 2012.
- Touj, S.M.; Amara, N.E.B.; Amiri, H. Arabic handwritten words recognition based on a planar hidden Markov model. Int. Arab J. Inf. Technol. 2005, 2, 318–325.
- Lorigo, L.M.; Govindaraju, V. Offline Arabic handwriting recognition: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 712–724.
- Chammas, E.; Mokbel, C.; Likforman-Sulem, L. Arabic handwritten document preprocessing and recognition. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 451–455.
- Jamil, A.; Siddiqi, I.; Arif, F.; Raza, A. Edge-based features for localization of artificial Urdu text in video images. In Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 18–21 September 2011; pp. 1120–1124.
- Halima, M.B.; Karray, H.; Alimi, A.M. Arabic text recognition in video sequences. arXiv 2013, arXiv:1308.3243.
- Yousfi, S.; Berrani, S.A.; Garcia, C. Arabic text detection in videos using neural and boosting-based approaches: Application to video indexing. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 3028–3032.
- Zayene, O.; Hennebert, J.; Touj, S.M.; Ingold, R.; Amara, N.E.B. A dataset for Arabic text detection, tracking and recognition in news videos—AcTiV. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 996–1000.
- Elagouni, K.; Garcia, C.; Mamalet, F.; Sébillot, P. Text recognition in videos using a recurrent connectionist approach. In Proceedings of the International Conference on Artificial Neural Networks, Lausanne, Switzerland, 11–14 September 2012; pp. 172–179.
- Khare, V.; Shivakumara, P.; Raveendran, P. A new histogram oriented moments descriptor for multi-oriented moving text detection in video. Expert Syst. Appl. 2015, 42, 7627–7640.
- Lucas, S.M.; Panaretos, A.; Sosa, L.; Tang, A.; Wong, S.; Young, R.; Ashida, K.; Nagai, H.; Okamoto, M.; Yamamoto, H.; et al. ICDAR 2003 robust reading competitions: Entries, results, and future directions. Int. J. Doc. Anal. Recognit. (IJDAR) 2005, 7, 105–122.
- Lucas, S.M. ICDAR 2005 text locating competition results. In Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR), Seoul, Korea, 31 August–1 September 2005; pp. 80–84.
- Huang, W.; Lin, Z.; Yang, J.; Wang, J. Text localization in natural images using stroke feature transform and text covariance descriptors. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 1241–1248.
- Zhu, Y.; Yao, C.; Bai, X. Scene text detection and recognition: Recent advances and future trends. Front. Comput. Sci. 2016, 10, 19–36.
- Jaderberg, M.; Simonyan, K.; Vedaldi, A.; Zisserman, A. Reading text in the wild with convolutional neural networks. Int. J. Comput. Vis. 2016, 116, 1–20.
- Shahab, A.; Shafait, F.; Dengel, A. ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. In Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 18–21 September 2011; pp. 1491–1496.
- Liao, M.; Shi, B.; Bai, X.; Wang, X.; Liu, W. TextBoxes: A fast text detector with a single deep neural network. In Proceedings of the 2017 AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 4161–4167.
- Karatzas, D.; Shafait, F.; Uchida, S.; Iwamura, M.; i Bigorda, L.G.; Mestre, S.R.; Mas, J.; Mota, D.F.; Almazan, J.A.; de las Heras, L.P. ICDAR 2013 robust reading competition. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), Washington, DC, USA, 25–28 August 2013; pp. 1484–1493.
- Karatzas, D.; Gomez-Bigorda, L.; Nicolaou, A.; Ghosh, S.; Bagdanov, A.; Iwamura, M.; Matas, J.; Neumann, L.; Chandrasekhar, V.R.; Lu, S.; et al. ICDAR 2015 competition on robust reading. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 1156–1160.
- Yao, C.; Bai, X.; Liu, W.; Ma, Y.; Tu, Z. Detecting texts of arbitrary orientations in natural images. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 1083–1090.
- Liu, Z.; Li, Y.; Qi, X.; Yang, Y.; Nian, M.; Zhang, H.; Xiamixiding, R. Method for unconstrained text detection in natural scene image. IET Comput. Vis. 2017, 11, 596–604.
- Wang, K.; Belongie, S. Word spotting in the wild. In Proceedings of the European Conference on Computer Vision (ECCV), Heraklion, Greece, 5–11 September 2010; pp. 591–604.
- Shi, B.; Bai, X.; Yao, C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2298–2304.
- Mishra, A.; Alahari, K.; Jawahar, C. Unsupervised refinement of color and stroke features for text binarization. Int. J. Doc. Anal. Recognit. (IJDAR) 2017, 20, 105–121.
- Lee, S.; Cho, M.S.; Jung, K.; Kim, J.H. Scene text extraction with edge constraint and text collinearity. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 23–26 August 2010; pp. 3983–3986.
- Zhu, Y.; Zhang, K. Text segmentation using superpixel clustering. IET Image Process. 2017, 11, 455–464.
- Nagy, R.; Dicker, A.; Meyer-Wegener, K. NEOCR: A configurable dataset for natural image text recognition. In Proceedings of the International Workshop on Camera-Based Document Analysis and Recognition, Beijing, China, 22 September 2011; pp. 150–163.
- Veit, A.; Matera, T.; Neumann, L.; Matas, J.; Belongie, S. COCO-Text: Dataset and benchmark for text detection and recognition in natural images. arXiv 2016, arXiv:1601.07140.
- Gomez, R.; Shi, B.; Gomez, L.; Neumann, L.; Veit, A.; Matas, J.; Belongie, S.; Karatzas, D. ICDAR2017 robust reading challenge on COCO-Text. In Proceedings of the 2017 International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 10–15 November 2017; pp. 1435–1443.
- Ch'ng, C.K.; Chan, C.S. Total-Text: A comprehensive dataset for scene text detection and recognition. In Proceedings of the 2017 International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 10–15 November 2017; pp. 935–942.
- Pechwitz, M.; Maddouri, S.S.; Märgner, V.; Ellouze, N.; Amiri, H. IFN/ENIT-database of handwritten Arabic words. In Proceedings of the Colloque International Francophone sur l'Ecrit et le Document (CIFED), Hammamet, Tunisia, 21–23 October 2002; pp. 127–136.
- Mahmoud, S.A.; Ahmad, I.; Al-Khatib, W.G.; Alshayeb, M.; Parvez, M.T.; Märgner, V.; Fink, G.A. KHATT: An open Arabic offline handwritten text database. Pattern Recognit. 2014, 47, 1096–1112.
- Slimane, F.; Ingold, R.; Kanoun, S.; Alimi, A.M.; Hennebert, J. A new Arabic printed text image database and evaluation protocols. In Proceedings of the 2009 International Conference on Document Analysis and Recognition (ICDAR), Barcelona, Spain, 26–29 July 2009; pp. 946–950.
- Kherallah, M.; Tagougui, N.; Alimi, A.M.; El Abed, H.; Märgner, V. Online Arabic handwriting recognition competition. In Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 18–21 September 2011; pp. 1454–1458.
- Halima, M.B.; Alimi, A.; Vila, A.F.; Karray, H. NF-SAVO: Neuro-fuzzy system for Arabic video OCR. arXiv 2012, arXiv:1211.2150.
- Moradi, M.; Mozaffari, S. Hybrid approach for Farsi/Arabic text detection and localisation in video frames. IET Image Process. 2013, 7, 154–164.
- Yousfi, S.; Berrani, S.A.; Garcia, C. Deep learning and recurrent connectionist-based approaches for Arabic text recognition in videos. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 1026–1030.
- Yousfi, S.; Berrani, S.A.; Garcia, C. ALIF: A dataset for Arabic embedded text recognition in TV broadcast. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 1221–1225.
- Zayene, O.; Hajjej, N.; Touj, S.M.; Ben Mansour, S.; Hennebert, J.; Ingold, R.; Amara, N.E.B. ICPR2016 contest on Arabic text detection and recognition in video frames—AcTiVComp. In Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 187–191.
- Zayene, O.; Seuret, M.; Touj, S.M.; Hennebert, J.; Ingold, R.; Amara, N.E.B. Text detection in Arabic news video based on SWT operator and convolutional auto-encoders. In Proceedings of the 12th IAPR Workshop on Document Analysis Systems (DAS), Santorini, Greece, 11–14 April 2016; pp. 13–18.
- Zayene, O.; Touj, S.M.; Hennebert, J.; Ingold, R.; Amara, N.E.B. Semi-automatic news video annotation framework for Arabic text. In Proceedings of the 4th International Conference on Image Processing Theory, Tools and Applications, Paris, France, 14–17 October 2014; pp. 1–6.
- Zayene, O.; Touj, S.M.; Hennebert, J.; Ingold, R.; Amara, N.E.B. Data, protocol and algorithms for performance evaluation of text detection in Arabic news video. In Proceedings of the 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Monastir, Tunisia, 21–23 March 2016; pp. 258–263.
- Graves, A. Offline Arabic handwriting recognition with multidimensional recurrent neural networks. In Guide to OCR for Arabic Scripts; Springer: Berlin, Germany, 2012; pp. 297–313.
- Graves, A.; Fernández, S.; Gomez, F.; Schmidhuber, J. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh, PA, USA, 25–29 June 2006; pp. 369–376.
- Zayene, O.; Essefi, S.A.; Amara, N.E.B. Arabic video text recognition based on multi-dimensional recurrent neural networks. In Proceedings of the International Conference on Computer Systems and Applications (AICCSA), Hammamet, Tunisia, 30 October–3 November 2017; pp. 725–729.
- Gaddour, H.; Kanoun, S.; Vincent, N. A new method for Arabic text detection in natural scene images based on color homogeneity. In Proceedings of the International Conference on Image and Signal Processing, Trois-Rivières, QC, Canada, 30 May–1 June 2016; pp. 127–136.
- Iwata, S.; Ohyama, W.; Wakabayashi, T.; Kimura, F. Recognition and transition frame detection of Arabic news captions for video retrieval. In Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 4005–4010.
Dataset (Year) | Category | Source | Task | # Images (Train/Test) | # Text Regions (Train/Test) | Script | Best Score
---|---|---|---|---|---|---|---
ICDAR'03 [18] (2003) | Scene text | Camera | D/R | 509 (258/251) | 2276 (1110/1156) | English | 93.1% (R)
KAIST [31] (2010) | Scene text | Camera, mobile phone | D/S | 3000 | >5000 | English, Korean | 88% (S)
SVT [28] (2010) | Scene text | Google Street View | D/S/R | 350 (100/250) | 904 (257/647) | English | 80.8% (R), 90% (S)
NEOCR [33] (2011) | Scene text | Camera | D/R | 659 | 5238 | Eight languages | -
ICDAR'11 [22] (2011) | Scene text | Camera | D/R | 485 | 1564 | English | 82% (D)
MSRA-TD500 [26] (2012) | Scene text | Camera | D | 500 (300/200) | - | English, Chinese | 75% (D)
ICDAR'13 [24] (2013) | Scene text | Camera | D/S/R | 229/233 | 848/1095 | Spanish, French, English | -
ICDAR'13 [24] (2013) | Artificial text | Web | D/S/R | 410/141 | 3564/1439 | Spanish, French, English | -
ICDAR'13 [24] (2013) | Video scene text | Camera | D/T/R | 28 videos | - | Spanish, French, English | -
ALIF [44] (2015) | Artificial text | Video frames | R | 6532 (4152/2199) | - | Arabic | 55.03% (R)
COCO-Text [34] (2016) | Scene text | MS COCO dataset | D/R | 63,686 (43.6k/10k) | 173,000 | English | 67.16% (D)
Total-Text [36] (2017) | Curved scene text | Web | D/R | 1555 (1255/300) | 9330 (words) | English | -
Dataset | # Resolutions | # Videos | # Frames | # Cropped Images
---|---|---|---|---
AcTiV 1.0 | 2 | 80 | 1843 | -
AcTiV 2.0 | 3 | 189 | 4063 | 10,415
Protocol | TV Channel | Training Set 1 (# Frames) | Training Set 2 (# Frames) | Test Set 1 (# Frames) | Test Set 2 (# Frames) | Closed Set (# Frames)
---|---|---|---|---|---|---
1 | AlJazeeraHD | 337 | 610 | 87 | 196 | 103
4 | France24 | 331 | 600 | 80 | 170 | 104
 | Russia Today | 323 | 611 | 79 | 171 | 100
 | TunisiaNat1 | 492 | 788 | 116 | 205 | 106
 | All SD | 1146 | 1999 | 275 | 546 | 310
4bis | TunisiaNat1+ | - | - | - | 149 | 150
7 | All | 1483 | 2609 | 362 | 891 | 563
Protocol | TV Channel | Train #Lns | Train #Wds | Train #Chars | Test #Lns | Test #Wds | Test #Chars | Closed #Lns | Closed #Wds | Closed #Chars
---|---|---|---|---|---|---|---|---|---|---
3 | AlJazeeraHD | 1909 | 8110 | 46,563 | 196 | 766 | 4343 | 262 | 1082 | 6283
6 | France24 | 1906 | 5683 | 32,085 | 179 | 667 | 3835 | 191 | 734 | 4600
 | Russia Today | 2127 | 13,462 | 78,936 | 250 | 1483 | 8749 | 256 | 1598 | 9305
 | TunisiaNat1 | 2001 | 9338 | 54,809 | 189 | 706 | 4087 | 221 | 954 | 5597
 | All SD | 6034 | 28,483 | 165,830 | 618 | 2856 | 16,671 | 668 | 3286 | 19,502
6bis | TunisiaNat1+ | - | - | - | 320 | 1487 | 8726 | 311 | 1148 | 6645
9 | All | 7943 | 36,593 | 212,393 | 814 | 3622 | 21,014 | 930 | 4368 | 25,785
Parameters | Values |
---|---|
MDLSTM Size | 2, 10 and 50 |
Feed-forward Size | 6 and 20 |
InputBlock Size | 2 × 4 |
HiddenBlock Sizes | 1 × 4 and 1 × 4 |
Learn rate | 10⁻⁴
Momentum | 0.9 |
Protocol | System | Precision | Recall | F-measure
---|---|---|---|---
1 | LADI [46] | 0.86 | 0.84 | 0.85
 | SysA [14] | 0.77 | 0.76 | 0.76
 | Gaddour [52] | 0.52 | 0.49 | 0.51
4.1 | LADI [46] | 0.74 | 0.76 | 0.75
 | SysA [14] | 0.69 | 0.60 | 0.64
 | Gaddour [52] | 0.47 | 0.61 | 0.54
4.2 | LADI [46] | 0.80 | 0.75 | 0.77
 | SysA [14] | 0.66 | 0.55 | 0.60
 | Gaddour [52] | 0.41 | 0.50 | 0.45
4.3 | LADI [46] | 0.85 | 0.82 | 0.83
 | SysA [14] | 0.68 | 0.71 | 0.69
 | Gaddour [52] | 0.34 | 0.49 | 0.41
4.4 | LADI [46] | 0.71 | 0.76 | 0.73
 | SysA [14] | 0.50 | 0.49 | 0.49
 | Gaddour [52] | - | - | -
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).