Jihad El-Sana

Ben Gurion University of the Negev, Computer Science, Faculty Member

Researcher of Computer Science, interested in Image Processing, Computer Graphics, Computer Vision, Augmented Reality, and Historical Document Images
Address: Israel

Papers by Jihad El-Sana

Writer Identification for Historical Arabic Documents

2014 22nd International Conference on Pattern Recognition, 2014

Text line segmentation for gray scale historical document images

Proceedings of the 2011 Workshop on Historical Document Imaging and Processing, 2011

SHREC'10 track: Robust shape retrieval

Eurographics Workshop on 3D Object Retrieval, EG 3DOR, 2010

Is a Deep Learning Algorithm Effective for the Classification of Medieval Hebrew Scripts?

Jewish Studies in the Digital Age

Learning Free Line Detection in Manuscripts using Distance Transform Graph

2019 International Conference on Document Analysis and Recognition (ICDAR), 2019

We present a fully automated, learning-free method for line detection in manuscripts. We begin by separating components that span multiple lines, then we remove noise and small connected components such as diacritics. We apply a distance transform to the image to create the image skeleton. The skeleton is pruned, and its vertices and edges are detected in order to generate the initial document graph. We calculate each vertex's v-score from its t-score and l-score, quantifying its distance from being an absolute link in a line. In a greedy manner, we classify each edge in the graph as a link, a bridge, or a conflict edge. We merge every two edges classified as links, then we merge the conflict edges. Finally, we remove the bridge edges from the graph, generating the final form of the graph. Each edge in the final graph corresponds to one extracted line. We applied the method to both the public and private sections of the DIVA-HisDB dataset. The public section was used in the recently conducted Layout Analysis for Challenging Medieval Manuscripts competition, and our results surpass those of the vast majority of the participating systems.
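
The distance transform step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary image given as nested lists (1 for foreground, 0 for background) and uses the classic two-pass city-block transform, and the function name `distance_transform` is hypothetical. Pixels whose distance is a local maximum lie on the ridge that a skeleton would follow.

```python
def distance_transform(binary):
    """City-block distance from each foreground pixel (1) to the nearest background pixel (0).

    Illustrative two-pass algorithm; assumes a non-empty rectangular grid.
    """
    rows, cols = len(binary), len(binary[0])
    inf = rows + cols  # larger than any city-block distance inside the grid
    dist = [[0 if binary[r][c] == 0 else inf for c in range(cols)]
            for r in range(rows)]

    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if dist[r][c] == 0:
                continue
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)

    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if dist[r][c] == 0:
                continue
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


# A 3x3 foreground square surrounded by background.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
result = distance_transform(image)
```

In this toy image the centre pixel is the only one at distance 2, so it is the ridge of the distance map. A production pipeline would more likely rely on a library routine for a Euclidean transform.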

View-Dependent Rendering for Polygonal Datasets

Text line extraction using fully convolutional network and energy minimization

Text lines are important parts of handwritten document images and are easier for downstream applications to analyze. Despite recent progress in text line detection, text line extraction from a handwritten document remains an unsolved task. This paper proposes to use a fully convolutional network for text line detection and energy minimization for text line extraction. Detected text lines are represented by blob lines that strike through the text lines. These blob lines assist an energy function for text line extraction. The detection stage can locate arbitrarily oriented text lines. Furthermore, the extraction stage can find the pixels of text lines with various heights and interline proximity, independent of their orientations. It can also finely split touching and overlapping text lines without an orientation assumption. We evaluate the proposed method on the VML-AHTE, VML-MOC and DIVA-HisDB datasets. The first contains overlapping, touching and close text lines wi...

Unsupervised Learning of Text Line Segmentation by Differentiating Coarse Patterns

Document Analysis and Recognition – ICDAR 2021, 2021

Engaging Students in Covariational Reasoning within an Augmented Reality Environment

Augmented Reality in Educational Settings, 2019

Segmentation-Free Online Arabic Handwriting Recognition

International Journal of Pattern Recognition and Artificial Intelligence, 2011

Arabic script is naturally cursive and unconstrained and, as a result, automatic recognition of its handwriting is a challenging problem. The analysis of Arabic script is further complicated in comparison to Latin script by the obligatory dots/strokes that are placed above or below most letters. In this paper, we introduce a new approach that performs online Arabic word recognition on a continuous word-part level, while performing training on the letter level. In addition, we appropriately handle delayed strokes by first detecting them and then integrating them into the word-part body. Our current implementation is based on Hidden Markov Models (HMM) and correctly handles most of the Arabic script recognition difficulties. We have tested our implementation using various dictionaries and multiple writers and have achieved encouraging results for both writer-dependent and writer-independent recognition.

Using Scale-Space Anisotropic Smoothing for Text Line Extraction in Historical Documents

Lecture Notes in Computer Science, 2014

Hierarchical On-line Arabic Handwriting Recognition

2009 10th International Conference on Document Analysis and Recognition, 2009

Statistics Arabic Language Statistics for Efficient Script Recognition

Arabic script is naturally cursive and unconstrained. As a result, automatic recognition of its handwriting is a challenging problem. In comparison to Latin script, analysis of Arabic script is further complicated by both cursiveness and the obligatory dots or strokes that are placed above or below most letters. The inherent cursiveness and the large number and varied positions of additional strokes discourage the segmentation-free approach to analysis because of the anticipated huge number of combinations needed to produce different words. At the same time, segmentation of Arabic script into individual characters is almost impossible and is often responsible for many misclassified items in Arabic script recognizers. This paper presents statistics on the Arabic language using a very large corpus of Arabic words. These statistical results, which could be used to improve the efficiency and accuracy of Arabic script recognizers, also indicate that a holistic approach is computationally a...

Out-of-core algorithms for scientific visualization and computer graphics

IEEE Visualization, 2003

This course will focus on describing techniques for handling datasets larger than main memory in scientific visualization and computer graphics. Recently, several external memory techniques have been developed for a wide variety of graphics and visualization problems, including surface simplification, volume rendering, isosurface generation, ray tracing, surface reconstruction, and so on. This work has had significant impact given that in recent years there has been a rapid increase in the raw size of datasets. Several ...

Digital Hebrew Paleography: Script Types and Modes

Journal of Imaging

Paleography is the study of ancient and medieval handwriting. It is essential for understanding, authenticating, and dating historical texts. Across many archives and libraries, many handwritten manuscripts are yet to be classified. Human experts can process only a limited number of manuscripts; therefore, there is a need for an automatic tool for script type classification. In this study, we utilize a deep-learning methodology to classify medieval Hebrew manuscripts into 14 classes based on their script style and mode. Hebrew paleography recognizes six regional styles and three graphical modes of scripts. We experiment with several input image representations and network architectures to determine the appropriate ones and explore several approaches for script classification. We obtained the highest accuracy using a hierarchical classification approach. At the first level, the regional style of the script is classified. Then, the patch is passed to the corresponding model at the second lev...
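
The two-level dispatch described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the classifiers are toy threshold functions standing in for trained networks, and the labels `style_A`, `style_B`, `mode_1`, `mode_2`, `mode_3` and the function `classify_hierarchically` are hypothetical placeholders, not names from the paper.

```python
from typing import Callable, Dict, Sequence, Tuple

Patch = Sequence[float]              # stand-in for an image patch
Classifier = Callable[[Patch], str]  # maps a patch to a label


def classify_hierarchically(
    patch: Patch,
    style_model: Classifier,
    mode_models: Dict[str, Classifier],
) -> Tuple[str, str]:
    """First predict the regional style, then ask that style's own model for the mode."""
    style = style_model(patch)
    mode = mode_models[style](patch)
    return style, mode


# Toy stand-ins for trained models (assumptions for illustration only).
def toy_style_model(patch: Patch) -> str:
    return "style_A" if sum(patch) / len(patch) < 0.5 else "style_B"


toy_mode_models: Dict[str, Classifier] = {
    "style_A": lambda p: "mode_1" if max(p) < 0.4 else "mode_2",
    "style_B": lambda p: "mode_3",
}

prediction = classify_hierarchically([0.1, 0.2, 0.3], toy_style_model, toy_mode_models)
```

The design point is that each second-level model only has to separate the modes within a single style, which is a smaller problem than separating all 14 classes at once.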

Deep learning for paleographic analysis of medieval Hebrew manuscripts

Unsupervised text line segmentation

ArXiv, 2020

We present an unsupervised text line segmentation method that is inspired by the relative variance between text lines and spaces among text lines. Handwritten text line segmentation is important for the efficiency of further processing. A common method is to train a deep learning network for embedding the document image into an image of blob lines that are tracing the text lines. Previous methods learned such embedding in a supervised manner, requiring the annotation of many document images. This paper presents an unsupervised embedding of document image patches without a need for annotations. The main idea is that the number of foreground pixels over the text lines is relatively different from the number of foreground pixels over the spaces among text lines. Generating similar and different pairs relying on this principle definitely leads to outliers. However, as the results show, the outliers do not harm the convergence and the network learns to discriminate the text lines from th...
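
The pair-generation idea in this abstract can be sketched in a few lines. The abstract does not say how pairs are actually formed, so the rule below is an assumption: two binary patches are labelled "similar" when their foreground pixel counts differ by at most a chosen threshold and "different" otherwise. The names `foreground_count`, `label_pair`, and `threshold` are hypothetical.

```python
def foreground_count(patch):
    """Number of foreground pixels (value 1) in a binary patch given as nested lists."""
    return sum(sum(row) for row in patch)


def label_pair(patch_a, patch_b, threshold):
    """Label a pair of patches by comparing foreground pixel counts.

    Assumed rule for illustration: a small count difference means 'similar'.
    """
    difference = abs(foreground_count(patch_a) - foreground_count(patch_b))
    return "similar" if difference <= threshold else "different"


# Toy patches: two dense ones (as if over text) and one sparse one (as if over a gap).
text_patch_1 = [[1, 1, 0], [1, 0, 1]]   # 4 foreground pixels
text_patch_2 = [[1, 0, 1], [1, 1, 0]]   # 4 foreground pixels
space_patch = [[0, 0, 0], [0, 1, 0]]    # 1 foreground pixel

same_label = label_pair(text_patch_1, text_patch_2, threshold=1)
diff_label = label_pair(text_patch_1, space_patch, threshold=1)
```

A rule this simple will mislabel some pairs, for example two text patches with very different ink density, which matches the abstract's remark that the generated pairs contain outliers.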

Experiment study on utilizing convolutional neural networks to recognize historical Arabic handwritten text

2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), 2017

Synthesizing versus Augmentation for Arabic Word Recognition with Convolutional Neural Networks

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), 2018

In this paper, we present a sub-word recognition method for historical Arabic manuscripts, using convolutional neural networks. We investigate the benefit of extending the training set with synthetically created samples in comparison to augmentation. We show that annotating around ten pages of a manuscript and extending the resulting training set is sufficient for successful sub-word recognition in the whole manuscript. In addition, we show the contribution of using different combinations of training sets and compare their sub-word recognition performance in the whole manuscript.

Automatic Synthesis of Historical Arabic Text for Word-Spotting

2016 12th IAPR Workshop on Document Analysis Systems (DAS), 2016

We present a novel framework for automatic and efficient synthesis of historical handwritten Arabic text. The main purpose of this framework is to assist word spotting and keyword searching in handwritten historical documents. The proposed framework consists of two main procedures: building a letter connectivity map and synthesizing words. A letter connectivity map includes multiple instances of the various shapes of each letter, since a letter in Arabic usually has multiple shapes depending on its position in the word. Each map represents one writer and encodes the specific handwriting style. The letter connectivity map is used to guide the synthesis of any Arabic continuous subword, word, or sentence. The proposed framework automatically generates the letter connectivity map annotation from several previously annotated historical pages. Once the letter connectivity map is available, our framework can synthesize the pictorial representation of any Arabic word or sentence from its text representation. The writing style of the synthesized text resembles the writing style of the input pages. The synthesized words can be used in word spotting and many other historical document processing applications. The proposed approach provides an intuitive and easy-to-use framework to search for a keyword in the rest of the manuscript. Our experimental study shows that our approach enables accurate results in word spotting algorithms.