Hasnat et al., 2009 - Google Patents

An open source Tesseract based optical character recognizer for Bangla script

Document ID: 1643606859736684784
Author: Hasnat M; Chowdhury M; Khan M
Publication year: 2009
Publication venue: 2009 10th International Conference on Document Analysis and Recognition

Snippet

BanglaOCR is currently the only open source optical character recognition (OCR) software for the Bangla (Bengali) script developed by the Center for Research on Bangla Language Processing (CRBLP). Tesseract, maintained by Google, is considered to be one of the most …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
          - G06F17/28—Processing or translating of natural language
            - G06F17/2863—Processing of non-latin text
          - G06F17/21—Text processing
            - G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
              - G06F17/2217—Character encodings
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
      - G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
          - G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
        - G06K9/20—Image acquisition
          - G06K9/2054—Selective acquisition/locating/processing of specific regions, e.g. highlighted text, fiducial marks, predetermined fields, document type identification
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image
        - G06K9/00442—Document analysis and understanding; Document recognition
      - G06K15/00—Arrangements for producing a permanent visual presentation of the output data, e.g. computer output printers
        - G06K15/02—Arrangements for producing a permanent visual presentation of the output data, e.g. computer output printers using printers
          - G06K15/18—Conditioning data for presenting it to the physical printing elements
            - G06K15/1848—Generation of the printable image
      - G06K2209/00—Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T11/00—2D [Two Dimensional] image generation

Similar Documents

Publication	Publication Date	Title
US10418029B2 (en)	2019-09-17	Method of selecting training text for language model, and method of training language model using the training text, and computer and computer program for executing the methods
KR101376863B1 (en)	2014-03-20	Grammar Analysis of Document Visual Structures
JP5647919B2 (en)	2015-01-07	Character recognition device, character recognition method, character recognition system, and character recognition program
US8494280B2 (en)	2013-07-23	Automated method for extracting highlighted regions in scanned source
AU2010311067B2 (en)	2016-08-04	System and method for increasing the accuracy of optical character recognition (OCR)
US20030200078A1 (en)	2003-10-23	System and method for language translation of character strings occurring in captured image data
Hasnat et al.	2009	An open source Tesseract based optical character recognizer for Bangla script
US20040267734A1 (en)	2004-12-30	Document search method and apparatus
JP2014106961A (en)	2014-06-09	Method executed by computer for automatically recognizing text in Arabic, and computer program
Vafaie et al.	2022	Handwritten and printed text identification in historical archival documents
CN114821613A (en)	2022-07-29	A method and system for extracting table information in PDF
Vasantharajan et al.	2022	Adapting the Tesseract open-source OCR engine for Tamil and Sinhala legacy fonts and creating a parallel corpus for Tamil-Sinhala-English
Chowdhury et al.	2015	Implementation of an optical character reader (OCR) for Bengali language
Chakraborty et al.	2013	An open source Tesseract based tool for extracting text from images with application in Braille translation for the visually impaired
JP2010061403A (en)	2010-03-18	Character string recognition device, method, and program
US12333240B2 (en)	2025-06-17	Systems and processes of extracting unstructured data from complex documents
Chowdhury et al.	0	An open source Tesseract based Optical Character Recognizer for Bangla script
Sarkar et al.	2024	Printed OCR for extremely low-resource Indic languages
Shafait	2009	Document image analysis with OCRopus
JP4334068B2 (en)	2009-09-16	Keyword extraction method and apparatus for image document
JP7172343B2 (en)	2022-11-16	Document retrieval program
CN1226692C (en)	2005-11-09	Machine translation system based on semanteme and its method
JP7552113B2 (en)	2024-09-18	Information processing device and program
JP7533773B2 (en)	2024-08-14	Feature selection device, feature selection method, and feature selection program
US20240212376A1 (en)	2024-06-27	OCR based on ML text segmentation input