
Oriol Terrades

Computer vision is the computer science discipline concerned with designing algorithms that interpret digital images. When the images correspond to digitized documents, we are in the subdiscipline of document image analysis and recognition. In this article we review the current state of this technology and its potential for application in solving both the most traditional optical character recognition (OCR) problems and those associated with other document types such as manuscripts, especially historical ones, and graphical documents. First, we review the state of the art of the technology. We then describe two practical cases arising from projects carried out at the Computer Vision Center of the UAB that are relevant to the archival field: the massive analysis of administrative documents and of historical handwritten demographic documents.
In this paper, we propose a new approach to extract text regions from graphical documents. In our method, we first empirically construct two sequences of learned dictionaries for the text and graphical parts, respectively. Then, we compute the sparse representations of non-overlapping document patches of different sizes in these learned dictionaries. Based on these representations, each patch can be classified into the text or graphic category by comparing its reconstruction errors. Same-sized patches in one category are then merged together to define the corresponding text or graphic layers, which are combined to create a final text/graphic layer. Finally, in a post-processing step, text regions are further filtered using learned thresholds.
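A minimal sketch of the reconstruction-error test described above, assuming scikit-learn for dictionary learning and OMP sparse coding (patch size, number of atoms, and sparsity level are illustrative, not the authors' settings):

```python
# Sketch: label a patch as text or graphic by which learned dictionary
# reconstructs it with lower error. Parameters are illustrative.
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

def learn_dictionary(patches, n_atoms=64):
    """Learn a dictionary from flattened patches (rows = samples)."""
    dl = DictionaryLearning(n_components=n_atoms, transform_algorithm='omp',
                            transform_n_nonzero_coefs=5, max_iter=20)
    dl.fit(patches)
    return dl.components_  # shape: (n_atoms, patch_dim)

def classify_patch(patch, D_text, D_graphic, k=5):
    """Return the category whose dictionary reconstructs the patch better."""
    x = patch.reshape(1, -1)
    errors = []
    for D in (D_text, D_graphic):
        code = sparse_encode(x, D, algorithm='omp', n_nonzero_coefs=k)
        errors.append(np.linalg.norm(x - code @ D))
    return 'text' if errors[0] < errors[1] else 'graphic'
```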
We successfully combine the Expectation-Maximization algorithm and variational approaches for parameter learning and inference on Markov random fields. This is a general method that can be applied to many computer vision tasks. In this paper, we apply it to handwritten text line segmentation. We conduct several experiments demonstrating that our method deals with common issues of this task, such as complex document layouts or non-Latin scripts. The obtained results show that our method achieves state-of-the-art performance on different benchmark datasets without any particular fine-tuning step.
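For illustration, a minimal mean-field sketch of the variational inference ingredient on a Potts-style MRF over the pixel grid (the EM parameter-learning loop is omitted, and beta is an assumed coupling weight):

```python
# Sketch: mean-field variational inference for a Potts MRF on a 2-D grid.
# unary: (H, W, L) log-potentials for L labels; beta is illustrative.
import numpy as np

def mean_field(unary, beta=1.0, n_iters=10):
    """Return approximate label posteriors q of shape (H, W, L)."""
    q = np.exp(unary - unary.max(axis=-1, keepdims=True))
    q /= q.sum(axis=-1, keepdims=True)
    for _ in range(n_iters):
        # Sum neighbour beliefs (4-connectivity) via shifted copies.
        nb = np.zeros_like(q)
        nb[1:] += q[:-1]; nb[:-1] += q[1:]
        nb[:, 1:] += q[:, :-1]; nb[:, :-1] += q[:, 1:]
        logits = unary + beta * nb          # Potts coupling rewards agreement
        logits -= logits.max(axis=-1, keepdims=True)
        q = np.exp(logits)
        q /= q.sum(axis=-1, keepdims=True)
    return q
```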
Counterfeiting and piracy are a form of theft that has been steadily growing in recent years. Banknotes and identity documents are two common objects of counterfeiting. Aiming to detect these counterfeits, the present survey covers a wide range of anti-counterfeiting security features, categorizing them into three components: security substrate, security inks, and security printing. From the computer vision perspective, we present works in the literature covering these three categories. Other topics, such as the history of counterfeiting, its effects on society and document experts, counterfeiters' types of attacks, and current trends, among others, are also covered. Readers ranging from the non-experienced to professionals in security documents can thus be introduced to, or deepen their knowledge of, anti-counterfeiting measures.
Transcription of handwritten text in (old) documents is an important, time-consuming task for digital libraries. In this paper, an efficient interactive-predictive transcription prototype called GIDOC (Gimp-based Interactive transcription of old text DOCuments) is presented. GIDOC is a first attempt to provide integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. It is based on GIMP and uses advanced techniques and tools for language and handwritten text modelling. Results are given on a real transcription task on a 764-page Spanish manuscript from 1891.
This paper proposes a new approach to localize symbols in graphical documents using sparse representations of local descriptors over a learned dictionary. More specifically, a training database of local descriptors extracted from documents is used to build the learned dictionary. Then, candidate regions in the documents are defined following the similarity between the sparse representations of their local descriptors. A vector model for candidate regions and for a query symbol is constructed based on sparsity in a visual vocabulary whose visual words are the columns of the learned dictionary. The matching process is performed by comparing the similarity between vector models. A first evaluation on the SESYD database demonstrates that the proposed method is promising.
In this paper we present a knowledge base of architectural documents aiming at improving existing methods of floor plan classification and understanding. It consists of an ontological definition of the domain and the inclusion of real instances coming from both automatically interpreted and manually labeled documents. The knowledge base has proven to be an effective tool to structure our knowledge and to easily maintain and upgrade it. Moreover, it is an appropriate means to automatically check the consistency of relational data and a convenient complement to hard-coded knowledge interpretation systems.
Confocal Laser Endomicroscopy (CLE) is an emerging imaging technique that allows the in-vivo acquisition of cell patterns of potentially malignant lesions. Such patterns could discriminate between inflammatory and neoplastic lesions and, thus, serve as a first in-vivo biopsy to discard cases that do not actually require a cell biopsy.
In this paper we evaluate the use of Relative Location Features (RLF) on a historical document segmentation task, and compare the quality of the results obtained on structured and unstructured documents with and without RLF. We show that using these features improves the final segmentation on documents with a strong structure, while their application to unstructured documents does not show significant improvement. Although this paper is not focused on segmenting unstructured documents, the results obtained on a benchmark dataset match or even surpass previous results of similar works.
This paper describes an exhaustive comparative analysis and evaluation of different existing texture descriptor algorithms to differentiate between genuine and counterfeit documents. We include in our experiments different categories of algorithms and compare them in different scenarios with several counterfeit datasets, comprising banknotes and identity documents. The computational time of each descriptor's extraction is important because the final objective is to use it in a real industrial scenario. HoG- and CNN-based descriptors stand out statistically over the rest in terms of F1-score/time ratio performance.
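A sketch of the kind of measurement behind the F1-score/time ratio, assuming skimage's HOG descriptor and a linear SVM (dataset, labels, and cell size are placeholders, not the paper's protocol):

```python
# Sketch: time a descriptor's extraction and relate it to classification F1.
import time
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

def evaluate_descriptor(train_imgs, y_train, test_imgs, y_test):
    """Return the F1/time ratio used here to rank descriptors."""
    t0 = time.perf_counter()
    X_train = np.array([hog(im, pixels_per_cell=(16, 16)) for im in train_imgs])
    X_test = np.array([hog(im, pixels_per_cell=(16, 16)) for im in test_imgs])
    extraction_time = time.perf_counter() - t0
    clf = LinearSVC().fit(X_train, y_train)          # genuine vs. counterfeit
    f1 = f1_score(y_test, clf.predict(X_test))
    return f1 / extraction_time
```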
We propose an improvement of the Graph Embedding (GEM) method proposed in previous work, which takes advantage of structural pattern representation and structured distortion. It models an Attributed Graph (AG) as a Probabilistic Graphical Model (PGM). Then, it learns the parameters of this PGM, represented by a vector, as a new signature of the AG in a lower-dimensional vector space. We focus on adapting the structured learning algorithm via the 1-slack formulation with a suitable risk function, the Graph Edit Distance (GED). It defines the dissimilarity between the ground-truth and predicted graph labels, and is computed by error-tolerant graph matching using a bipartite graph matching algorithm. We apply Structured Support Vector Machines (SSVM) for the classification task. In our experiments, we obtained results on the GREC dataset.
In this paper, we present a sparse-based denoising algorithm for scanned documents. This method can be applied to any kind of scanned document with satisfactory results. Unlike other approaches, the proposed approach encodes noisy documents through sparse representation and visual dictionary learning techniques without any prior noise model. Moreover, we propose a precision parameter estimator. Experiments on several datasets demonstrate the robustness of the proposed approach compared to state-of-the-art document denoising methods.
This paper is focused on the detection of counterfeit photocopied banknotes. The main difficulty is to work in a real industrial scenario without any constraint on the acquisition device and with a single image. The main contributions of this paper are twofold: first, the adaptation and performance evaluation of existing approaches to classify genuine and photocopied banknotes using background texture printing analysis, which had not been applied in this context before; second, a new dataset of Euro banknote images acquired with several cameras under different luminance conditions to evaluate these methods. Experiments on the proposed algorithms show that mixing SIFT features and sparse coding dictionaries achieves quasi-perfect classification using a linear SVM with the created dataset. Approaches using dictionaries to cover all possible texture variations have proven robust and outperform the state-of-the-art methods on the proposed benchmark.
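A hedged sketch of the winning combination described (SIFT features plus sparse coding over a learned dictionary, followed by a linear SVM); the pooling choice and parameters are assumptions, not the authors' exact pipeline:

```python
# Sketch: SIFT local descriptors -> sparse codes over a learned dictionary D
# -> max pooling -> fixed-length image vector for a linear SVM.
import cv2
import numpy as np
from sklearn.decomposition import sparse_encode

sift = cv2.SIFT_create()

def image_vector(gray, D):
    """Encode an image as the max-pooled sparse codes of its SIFT descriptors.
    D: dictionary of shape (n_atoms, 128), learned offline from texture patches."""
    _, desc = sift.detectAndCompute(gray, None)
    if desc is None:                       # no keypoints found
        return np.zeros(D.shape[0])
    codes = sparse_encode(desc.astype(np.float64), D,
                          algorithm='omp', n_nonzero_coefs=5)
    return np.abs(codes).max(axis=0)       # max pooling over keypoints

# Usage (assumed names): clf = LinearSVC().fit(
#     [image_vector(g, D) for g in train_images], y_train)
```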
Fragment: symbols have lower distances in the direction from this symbol to the other ones; an experiment tests the robustness of the representation obtained with the Radon transform to shape distortions. In Proceedings of the First IbPRIA, June 2003, Andratx, Spain.
In this paper, we propose the use of an Attributed Graph Grammar as a unique framework to model and recognize the structure of floor plans. This grammar represents a building as a hierarchical composition of structurally and semantically related elements, where common representations are learned stochastically from annotated data. Given an input image, the parsing consists of constructing the graph representation that best agrees with the probabilistic model defined by the grammar. The proposed method provides several advantages with respect to traditional floor plan analysis techniques. It uses an unsupervised statistical approach for detecting walls that adapts to different graphical notations and relaxes strong structural assumptions such as straightness and orthogonality. Moreover, the independence between the knowledge model and the parsing implementation allows the method to learn different building configurations automatically and thus to cope with the existing variability. These advantages are clearly demonstrated by comparing it with the most recent floor plan interpretation techniques on 4 datasets of real floor plans with different notations.
Nowadays, automatic identity document recognition, including passport and driving license recognition, is at the core of many applications within the administrative and service sectors, such as police, hospitality, car renting, etc. In the past, the document information was manually extracted, whereas today this data is recognized automatically from images obtained by flat-bed scanners. Yet, since these scanners tend to be expensive and voluminous, companies in the sector have recently turned their attention to cheaper, small and yet computationally powerful scanners: mobile devices. Identity document recognition from mobile images entails several new difficulties with respect to traditional scanned images, such as the loss of a controlled background, perspective, blurring, etc. In this paper we present a real application for identity document classification of images taken with mobile devices. This classification process is of extreme importance since prior knowledge of the document type and origin strongly facilitates the subsequent information extraction. The proposed method is based on a traditional Bag-of-Words model in which we have taken into consideration several key aspects to enhance the recognition rate. The method's performance has been studied on three datasets containing more than 2000 images from 129 different document classes.
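For reference, a minimal Bag-of-Words baseline of the kind the method builds on (vocabulary size and classifier are illustrative; the paper's enhancements are not reproduced here):

```python
# Sketch: Bag-of-Words over local descriptors -- k-means vocabulary,
# hard-assignment word histograms, linear SVM classifier.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

def build_vocabulary(all_local_descriptors, n_words=500):
    """Cluster pooled training descriptors into a visual vocabulary."""
    return MiniBatchKMeans(n_clusters=n_words).fit(all_local_descriptors)

def bow_histogram(local_descriptors, vocab):
    """L1-normalised histogram of visual-word assignments for one image."""
    words = vocab.predict(local_descriptors)
    h = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return h / max(h.sum(), 1.0)

# Usage (assumed names):
# clf = LinearSVC().fit([bow_histogram(d, vocab) for d in train_descs], y_train)
```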
In this paper, we explore the use of a dictionary learning algorithm (K-SVD) to build dictionaries adapted to the image properties. In addition, our model also models the noise energy based on the normalized cross-correlation between noisy and noise-free documents identified in the training set. We have evaluated this method on the GREC2005 dataset. The experimental results demonstrate the robustness of our approach compared with state-of-the-art methods.
In some Thai documents, a single text line of a document page may contain both Thai and English scripts. For the optical character recognition (OCR) of such a document page, it is better to first identify the Thai and English script portions and then use the individual OCR system of the respective script on these identified portions. In this paper, an SVM-based method is proposed for the identification of word-wise printed English and Thai scripts from a single line of a document page. Here, the document is first segmented into lines, and the lines are then segmented into character groups (words). In the proposed scheme, we identify the script of each character group by combining different character features obtained from structural shape, profiles, component overlapping information, topological properties, the water reservoir concept, etc. Based on an experiment on 6110 samples, we obtained 99.36% script identification accuracy with the proposed scheme.
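A small sketch of one feature family mentioned (projection profiles) feeding an SVM; the full scheme combines many more structural features and is not reproduced here:

```python
# Sketch: fixed-length projection-profile features for a binarised word image,
# classified with an RBF-kernel SVM (feature choice is a simplification).
import numpy as np
from sklearn.svm import SVC

def profile_features(word_img, n_bins=16):
    """word_img: binary array with ink = 1. Returns a 2*n_bins feature vector."""
    rows = word_img.sum(axis=1).astype(float)   # horizontal projection
    cols = word_img.sum(axis=0).astype(float)   # vertical projection
    def resample(v):
        idx = np.linspace(0, len(v) - 1, n_bins)
        return np.interp(idx, np.arange(len(v)), v) / max(v.max(), 1.0)
    return np.concatenate([resample(rows), resample(cols)])

# Usage (assumed names):
# clf = SVC(kernel='rbf').fit([profile_features(w) for w in words], labels)
```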
We propose a set of shape descriptors for image retrieval of graphic documents based on the ridgelet transform, which can be seen as a combination of the Radon transform and the wavelet transform. It is especially well suited to detecting linear features, ...
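A sketch of that construction, assuming skimage's Radon transform and PyWavelets for the 1-D wavelet stage (number of angles, wavelet family, and decomposition level are illustrative):

```python
# Sketch: ridgelet coefficients as a Radon transform followed by a 1-D
# wavelet decomposition along each projection direction.
import numpy as np
import pywt
from skimage.transform import radon

def ridgelet_coefficients(image, n_angles=64, wavelet='db4', level=3):
    """Return per-angle wavelet coefficients of the image's Radon projections.
    Linear features in the image show up as sparse, large coefficients."""
    theta = np.linspace(0., 180., n_angles, endpoint=False)
    sinogram = radon(image, theta=theta)       # columns are 1-D projections
    return [pywt.wavedec(sinogram[:, i], wavelet, level=level)
            for i in range(sinogram.shape[1])]
```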
Many different kinds of shape descriptors have been defined, but usually each of them is only suitable for some particular kinds of shapes. Thus, a strategy to improve performance on arbitrary shapes is to use several descriptors. In this paper, we address the problem of how to combine several shape descriptors into a single representation. We present an adaptation ...
Retrieval and recognition of symbols in graphic images requires a good symbol representation, able to identify those features providing the most relevant information about the shape and visual appearance of symbols. In this work we have introduced the ridgelet transform, as it permits detecting linear singularities in an image, which are the most important source of information in graphic images. Sparsity is ...
This paper presents a new use of the ridgelet transform as a multiscale method for indexing linear symbols in graphic documents. The ridgelet transform is useful for detecting linear singularities in images. Therefore, it can be used to get a good representation of ...
Shape descriptors play an important role in many document analysis applications. In this paper we review some of the shape descriptors proposed in recent years from a new point of view. We propose definitions of descriptor and primitive and introduce the ...
Fragment: "Descriptors for Symbol Representation", Ernest Valveny, Salvatore Tabbone, Oriol Ramos, and Emilie Philippot (Computer Vision Center). ... It is invariant to translation and scaling. Pixel-level constraint (PLC) [14]: it is based on the points of the skeleton of the shape. ...
A method for text block detection is introduced for old handwritten documents. The proposed method takes advantage of the sequential book structure, taking into account layout information from previously transcribed pages. This glance at the past is used to predict the position of text blocks in the current page with the help of conventional layout analysis methods. The method is integrated ...
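A toy sketch of the "glance at the past" idea: a running bounding-box prior updated from previously transcribed pages (the smoothing weight and the integration with a layout analyser are assumptions):

```python
# Sketch: exponentially smoothed prior on the main text block's bounding box,
# updated each time a page's layout is confirmed by transcription.
import numpy as np

class BlockPrior:
    def __init__(self, alpha=0.3):          # alpha: assumed smoothing weight
        self.alpha, self.box = alpha, None   # box = (x0, y0, x1, y1)

    def update(self, confirmed_box):
        """Blend the newly confirmed box into the running estimate."""
        b = np.asarray(confirmed_box, float)
        self.box = b if self.box is None else \
            (1 - self.alpha) * self.box + self.alpha * b

    def predict(self):
        """Predicted block position for the next page, or None on page one."""
        return None if self.box is None else tuple(self.box)
```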
According to the Cambridge Dictionaries Online, a symbol is a sign, shape, or object that is used to represent something else. Symbol recognition is a subfield of general pattern recognition problems that focuses on identifying, detecting, and recognizing symbols in technical drawings, maps, or miscellaneous documents such as logos and musical scores. This chapter aims at providing the reader with an overview of the different existing ways of describing and recognizing symbols and how the field has evolved to attain a certain degree of maturity.
Recent results on structured learning methods have shown the impact of structural information in a wide range of pattern recognition tasks. In the field of document image analysis, there is long experience with structural methods for the analysis and information extraction of multiple types of documents. Yet, the lack of conveniently annotated and freely accessible databases has hampered progress in some areas such as technical drawing understanding. In this paper, we present a floor plan database, named CVC-FP, that is annotated with the architectural objects and their structural relations. To construct this database, we have implemented a groundtruthing tool, the SGT tool, that allows this sort of information to be specified in a natural manner. The tool has been made for general-purpose groundtruthing: it allows users to define their own object classes and properties, supports multiple labeling options, enables cooperative work, and provides user and version control. Finally, we have collected some of the recent work on floor plan interpretation and present a quantitative benchmark for this database. Both the CVC-FP database and the SGT tool are freely released to the research community to ease comparisons between methods and boost reproducible research.
