Dr. Sakila Ayyasamy

Bharathiar University, Coimbatore, Tamilnadu, INDIA, Computer science and engineering, Post-Doc

Address: India

Papers

Text Extraction from Tamil and Hindi Document Images using Open Source Optical Character Recognition tools

Optical Character Recognition (OCR) is a technique used to extract text from document images and convert it into an editable text format. This kind of information retrieval is called recognition-based retrieval; the extracted text can be edited, searched, and stored more efficiently. OCR is used in many applications, such as libraries, organizations, bank cheques, number plate recognition, historical book analysis, and many others. Various OCR tools are available for converting document images in different languages. The primary objective of this work is to compare the performance of three OCR tools for extracting text from Tamil and Hindi document images. The OCR tools considered in this analysis are Google Docs, Free Online OCR and i2OCR. Based on conversion accuracy, Free Online OCR was observed to perform better than the other OCR tools. Keywords: Optical Character Recognition, OCR architecture for Tamil and Hindi...

A Survey on Word Spotting Techniques for Document Image Retrieval

Document images are becoming more popular and are increasingly made available over the internet; scanned or captured documents are used in paperless offices and digital libraries. Paper documents can be converted into digital form using digitization equipment and stored in document image databases. When documents are stored in image formats, searching them is very difficult. The conventional way to retrieve information from document images is to convert them into text using Optical Character Recognition (OCR) and then perform retrieval on the text. The drawback is that OCR does not achieve 100% accurate conversion, which makes information retrieval from document images difficult. For this reason, there is a need for an inventive method to search for keywords in document images without performing the conversion. Word spotting technique is one of s...

Image Enhancement using Morphological Operations

International journal of scientific research in science, engineering and technology, 2017

Today, image processing has become one of the popular and essential research domains in computer science and information technology. Important tasks of image processing are image enhancement, image compression and information extraction. Image enhancement techniques improve the quality of an image, whereas information extraction techniques extract statistical information about a portion of an image or any particular image. Image compression reduces the size of a digital image and allows more images to be stored on a computer disk. In this digital era, people handle large numbers of digital documents, and several tasks need to be performed on these documents. Digital images may contain both machine-printed and handwritten documents. Machine-printed document images might have been produced using various technologies with good quality. A handwritten digital document has poor quality like blurred images, old handwritten manuscripts and handwritte...

Content Based Text Information Search and Retrieval in Document Images for Digital Library

Journal of Digital Information Management, 2018

Multimedia Mining Research – An Overview

International Journal of Computer Graphics & Animation, 2015

Performance Comparison of OCR Tools

International Journal of UbiComp, 2015

Template Matching Technique for Searching Words in Document Images

International Journal on Cybernetics & Informatics, 2015

Document Image Compression using Hybrid Compression Technique

Images play a vital role in internet technology today; people share large numbers of photos and images over the internet through social media and internet messengers. Image size is an issue when downloading, so image compression is needed for the transmission and storage of digital images. Image compression techniques are of two types: lossy and lossless. Lossy compression loses some image quality; lossless compression does not. This research work proposes a hybrid technique and compares it with two existing compression techniques, Discrete Wavelet Transform (DWT) and Set Partitioning in Hierarchical Trees (SPIHT). The performance analysis found that the hybrid technique produces better accuracy than the other two compression techniques, with the highest values measured by PSNR and SSIM.
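As a rough illustration of one of the quality measures named in this abstract, the sketch below computes PSNR between an original and a decompressed grayscale image. It is not the paper's code: the function name `psnr`, the list-of-rows image representation, and the 8-bit peak value of 255 are assumptions made for the example.

```python
import math


def psnr(original, compressed, max_value=255):
    """Peak signal-to-noise ratio in dB between two grayscale images.

    Images are given as equal-sized lists of rows of pixel intensities
    (an assumed representation for this sketch).
    """
    if len(original) != len(compressed) or any(
        len(a) != len(b) for a, b in zip(original, compressed)
    ):
        raise ValueError("images must have the same dimensions")

    pixel_count = sum(len(row) for row in original)
    if pixel_count == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixel pairs.
    squared_error = sum(
        (a - b) ** 2
        for row_a, row_b in zip(original, compressed)
        for a, b in zip(row_a, row_b)
    )
    mse = squared_error / pixel_count

    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the decompressed image is closer to the original, so a lossless codec yields an infinite value under this definition.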

Content Based Text Information Search and Retrieval in Document Images for Digital Library

by Dr. Sakila Ayyasamy and Vijayarani Mohan

The main objective of this research work is to find keywords in captured/scanned print document images in an image database. Document images are becoming more popular in today’s world and are used in paperless offices and digital libraries. Information retrieval from document images is a very challenging task. Hence, developing search strategies to find the required information in these document images according to the user’s needs has become essential. Traditionally, Optical Character Recognition (OCR) tools are used for information retrieval from document images, but this is not an efficient method. Word spotting is an inventive method for searching document images and retrieving relevant information without any conversion. In this work, an algorithm named Enhanced Dynamic Time Warping (EDTW), based on the word spotting technique, is proposed for finding keywords in document images. Different matching algorithms are available for word spotting; popular ones are Normalized Cross Correlation (NCC) and Dynamic Time Warping (DTW). In this work, we compare the performance of these two existing algorithms with the proposed EDTW algorithm. Different image formats and different image sizes are used for experimentation. The results show that the proposed algorithm produces better results than the existing ones.
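For readers unfamiliar with the baseline mentioned here, the following is a minimal sketch of classic Dynamic Time Warping, not the Enhanced DTW proposed in the paper (whose details are not given in this abstract). It assumes each word image has already been reduced to a one-dimensional feature sequence, for example an ink count per pixel column; that feature choice and the name `dtw_distance` are illustrative assumptions.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    Uses absolute difference as the local cost and allows the three
    standard steps (insertion, deletion, match).
    """
    n, m = len(a), len(b)
    inf = float("inf")

    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

In a word spotting setting, a query word's sequence would be compared against each candidate word's sequence, and candidates with the smallest distance would be reported as matches. Because DTW tolerates local stretching, the same word rendered at slightly different widths can still align with low cost.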

A Performance Comparison of Edge Detection Techniques for Printed and Handwritten Document Images

Template Matching Technique for Searching Words in Document Images

Template matching is useful for searching for and locating a template image (a small part of an image) within a larger image. The technique is also used in Optical Character Recognition (OCR) tools, which convert scanned document images into plain text. Template matching finds and recognizes the template image within a given input image. In this research work, template matching is applied to scanned document images that contain characters (both uppercase and lowercase) and numerals. To compare the template image with the input image, we use the Performance Index method and compare it with the normalized cross correlation and cross correlation methods. The comparisons performed in this work are (i) matching a single character within a word, sentence and paragraph; and (ii) matching multiple characters (words) within a word, sentence and paragraph.
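The normalized cross correlation baseline mentioned above can be sketched as a sliding-window search. This is an illustrative, unoptimized version using the zero-mean form of NCC; it does not reproduce the paper's Performance Index method, and the names `ncc` and `match_template` and the list-of-rows image format are assumptions for the example.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross correlation of two equal-sized patches."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)

    numerator = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    denominator = math.sqrt(
        sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t)
    )
    # A constant patch has zero variance; treat it as uncorrelated.
    return numerator / denominator if denominator else 0.0


def match_template(image, template):
    """Return (best_score, (row, col)) of the best template position."""
    t_rows, t_cols = len(template), len(template[0])
    best_score, best_pos = -2.0, (0, 0)  # NCC scores lie in [-1, 1]

    for r in range(len(image) - t_rows + 1):
        for c in range(len(image[0]) - t_cols + 1):
            patch = [row[c:c + t_cols] for row in image[r:r + t_rows]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos
```

A score close to 1 indicates that the window and the template have the same shape up to brightness and contrast changes, which is why NCC is commonly preferred over plain cross correlation for scanned pages with uneven illumination.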

Performance Comparison of OCR Tools

Optical Character Recognition (OCR) is a technique used to convert a scanned image into an editable text format. Many different OCR tools are commercially available today, and OCR is a useful and popular method for many types of applications. The accuracy of an OCR result depends on the text pre-processing and segmentation algorithms. Image quality is one of the most important factors affecting recognition quality in OCR tools. Images can be processed independently (.png, .jpg, and .gif files) or in multi-page PDF documents (.pdf). The primary objective of this work is to provide an overview of various OCR tools and to analyse their performance using two measures of OCR tool performance: accuracy and error rate.
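The abstract does not define how accuracy and error rate were computed. One common convention, assumed here purely for illustration, is a character-level error rate based on the edit distance between the OCR output and a ground-truth transcription, with accuracy taken as one minus that rate. The function names below are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_char != hyp_char),  # substitution
            ))
        previous = current
    return previous[-1]


def char_error_rate(reference, hypothesis):
    """Character error rate relative to the ground-truth length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def char_accuracy(reference, hypothesis):
    """Accuracy under the assumed convention: 1 - error rate."""
    return 1.0 - char_error_rate(reference, hypothesis)
```

For example, an OCR tool that misreads "document" as "docurnent" (a typical "m" to "rn" confusion) makes two character edits against an eight-character reference. Note that under this convention the accuracy can become negative when the output contains many spurious characters.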

Multimedia Mining Research – An Overview

Multimedia data mining is a popular research domain that helps to extract interesting knowledge from multimedia data sets such as audio, video, images, graphics, speech, text and combinations of several types of data. Normally, multimedia data are categorized into unstructured and semi-structured data. These data are stored in multimedia databases, and multimedia mining is used to find useful information in large multimedia database systems using various multimedia techniques and powerful tools. This paper presents the basic concepts of multimedia mining and its essential characteristics. It also discusses multimedia mining architectures for structured and unstructured data, research issues in multimedia mining, data mining models used for multimedia mining, and applications. It helps researchers learn how to carry out research in the field of multimedia mining.
