Hasnat et al., 2009 - Google Patents
An open source tesseract based optical character recognizer for bangla scriptHasnat et al., 2009
- Document ID
- 1643606859736684784
- Author
- Hasnat M
- Chowdhury M
- Khan M
- Publication year
- Publication venue
- 2009 10th international conference on document analysis and recognition
External Links
Snippet
BanglaOCR is currently the only open source optical character recognition (OCR) software for the Bangla (Bengali) script developed by the Center for Research on Bangla Language Processing (CRBLP). Tesseract, maintained by Google, is considered to be one of the most …
- 230000003287 optical 0 title abstract description 4
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/20—Image acquisition
- G06K9/2054—Selective acquisition/locating/processing of specific regions, e.g. highlighted text, fiducial marks, predetermined fields, document type identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00442—Document analysis and understanding; Document recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K15/00—Arrangements for producing a permanent visual presentation of the output data, e.g. computer output printers
- G06K15/02—Arrangements for producing a permanent visual presentation of the output data, e.g. computer output printers using printers
- G06K15/18—Conditioning data for presenting it to the physical printing elements
- G06K15/1848—Generation of the printable image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K2209/00—Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10418029B2 (en) | Method of selecting training text for language model, and method of training language model using the training text, and computer and computer program for executing the methods | |
KR101376863B1 (en) | Grammar Analysis of Document Visual Structures | |
JP5647919B2 (en) | Character recognition device, character recognition method, character recognition system, and character recognition program | |
US8494280B2 (en) | Automated method for extracting highlighted regions in scanned source | |
AU2010311067B2 (en) | System and method for increasing the accuracy of optical character recognition (OCR) | |
US20030200078A1 (en) | System and method for language translation of character strings occurring in captured image data | |
Hasnat et al. | An open source tesseract based optical character recognizer for bangla script | |
US20040267734A1 (en) | Document search method and apparatus | |
JP2014106961A (en) | Method executed by computer for automatically recognizing text in arabic, and computer program | |
Vafaie et al. | Handwritten and printed text identification in historical archival documents | |
CN114821613A (en) | A method and system for extracting table information in PDF | |
Vasantharajan et al. | Adapting the tesseract open-source OCR engine for Tamil and Sinhala legacy fonts and creating a parallel corpus for Tamil-Sinhala-English | |
Chowdhury et al. | Implementation of an optical character reader (ocr) for bengali language | |
Chakraborty et al. | An open source tesseract based tool for extracting text from images with application in braille translation for the visually impaired | |
JP2010061403A (en) | Character string recognition device, method, and program | |
US12333240B2 (en) | Systems and processes of extracting unstructured data from complex documents | |
Chowdhury et al. | An open source Tesseract based Optical Character Recognizer for Bangla script | |
Sarkar et al. | Printed ocr for extremely low-resource indic languages | |
Shafait | Document image analysis with OCRopus | |
JP4334068B2 (en) | Keyword extraction method and apparatus for image document | |
JP7172343B2 (en) | Document retrieval program | |
CN1226692C (en) | Machine translation system based on semanteme and its method | |
JP7552113B2 (en) | Information processing device and program | |
JP7533773B2 (en) | Feature selection device, feature selection method, and feature selection program | |
US20240212376A1 (en) | Ocr based on ml text segmentation input |