[go: up one dir, main page]

Hasnat et al., 2009 - Google Patents

An open source tesseract based optical character recognizer for bangla script

Hasnat et al., 2009

Document ID
1643606859736684784
Author
Hasnat M
Chowdhury M
Khan M
Publication year
Publication venue
2009 10th international conference on document analysis and recognition

External Links

Snippet

BanglaOCR is currently the only open source optical character recognition (OCR) software for the Bangla (Bengali) script developed by the Center for Research on Bangla Language Processing (CRBLP). Tesseract, maintained by Google, is considered to be one of the most …
Continue reading at ieeexplore.ieee.org (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/28Processing or translating of natural language
    • G06F17/2863Processing of non-latin text
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/21Text processing
    • G06F17/22Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G06F17/2217Character encodings
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/20Image acquisition
    • G06K9/2054Selective acquisition/locating/processing of specific regions, e.g. highlighted text, fiducial marks, predetermined fields, document type identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/68Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46Extraction of features or characteristics of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00442Document analysis and understanding; Document recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K15/00Arrangements for producing a permanent visual presentation of the output data, e.g. computer output printers
    • G06K15/02Arrangements for producing a permanent visual presentation of the output data, e.g. computer output printers using printers
    • G06K15/18Conditioning data for presenting it to the physical printing elements
    • G06K15/1848Generation of the printable image
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K2209/00Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints

Similar Documents

Publication Publication Date Title
US10418029B2 (en) Method of selecting training text for language model, and method of training language model using the training text, and computer and computer program for executing the methods
KR101376863B1 (en) Grammar Analysis of Document Visual Structures
JP5647919B2 (en) Character recognition device, character recognition method, character recognition system, and character recognition program
US8494280B2 (en) Automated method for extracting highlighted regions in scanned source
AU2010311067B2 (en) System and method for increasing the accuracy of optical character recognition (OCR)
US20030200078A1 (en) System and method for language translation of character strings occurring in captured image data
Hasnat et al. An open source tesseract based optical character recognizer for bangla script
US20040267734A1 (en) Document search method and apparatus
JP2014106961A (en) Method executed by computer for automatically recognizing text in arabic, and computer program
Vafaie et al. Handwritten and printed text identification in historical archival documents
CN114821613A (en) A method and system for extracting table information in PDF
Vasantharajan et al. Adapting the tesseract open-source OCR engine for Tamil and Sinhala legacy fonts and creating a parallel corpus for Tamil-Sinhala-English
Chowdhury et al. Implementation of an optical character reader (ocr) for bengali language
Chakraborty et al. An open source tesseract based tool for extracting text from images with application in braille translation for the visually impaired
JP2010061403A (en) Character string recognition device, method, and program
US12333240B2 (en) Systems and processes of extracting unstructured data from complex documents
Chowdhury et al. An open source Tesseract based Optical Character Recognizer for Bangla script
Sarkar et al. Printed ocr for extremely low-resource indic languages
Shafait Document image analysis with OCRopus
JP4334068B2 (en) Keyword extraction method and apparatus for image document
JP7172343B2 (en) Document retrieval program
CN1226692C (en) Machine translation system based on semanteme and its method
JP7552113B2 (en) Information processing device and program
JP7533773B2 (en) Feature selection device, feature selection method, and feature selection program
US20240212376A1 (en) Ocr based on ml text segmentation input