[go: up one dir, main page]

Rychlik et al., 2020 - Google Patents

Development of a New Image-to-Text Conversion System for Pashto, Farsi and Traditional Chinese

Rychlik et al., 2020

View PDF
Document ID
3714176763887357002
Author
Rychlik M
Nwaigwe D
Han Y
Murphy D
Publication year
Publication venue
arXiv preprint arXiv:2005.08650

External Links

Snippet

We report upon the results of a research and prototype building project\emph {Worldly~ OCR} dedicated to developing new, more accurate image-to-text conversion software for several languages and writing systems. These include the cursive scripts Farsi and Pashto …
Continue reading at arxiv.org (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/68Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
    • G06K9/6807Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
    • G06K9/6842Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2765Recognition
    • G06F17/277Lexical analysis, e.g. tokenisation, collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2705Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/21Text processing
    • G06F17/22Manipulating or registering by use of codes, e.g. in sequence of text characters
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/72Methods or arrangements for recognition using electronic means using context analysis based on the provisionally recognized identity of a number of successive patterns, e.g. a word
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6267Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/28Processing or translating of natural language
    • G06F17/2863Processing of non-latin text
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/20Image acquisition
    • G06K9/34Segmentation of touching or overlapping patterns in the image field
    • G06K9/342Cutting or merging image elements, e.g. region growing, watershed, clustering-based techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/28Processing or translating of natural language
    • G06F17/2809Data driven translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/20Image acquisition
    • G06K9/2054Selective acquisition/locating/processing of specific regions, e.g. highlighted text, fiducial marks, predetermined fields, document type identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00442Document analysis and understanding; Document recognition
    • G06K9/00469Document understanding by extracting the logical structure, e.g. chapters, sections, columns, titles, paragraphs, captions, page number, and identifying its elements, e.g. author, keywords, ZIP code, money amount
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46Extraction of features or characteristics of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00852Recognising whole cursive words
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/03Detection or correction of errors, e.g. by rescanning the pattern
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K2209/00Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K2209/01Character recognition

Similar Documents

Publication Publication Date Title
US10936862B2 (en) System and method of character recognition using fully convolutional neural networks
Singh Optical character recognition techniques: a survey
EP3539051B1 (en) System and method of character recognition using fully convolutional neural networks
Boillet et al. Robust text line detection in historical documents: learning and evaluation methods
Mostafa et al. Ocformer: A transformer-based model for arabic handwritten text recognition
Rigaud et al. Segmentation-free speech text recognition for comic books
Sankaran et al. Devanagari text recognition: A transcription based formulation
Bilgin Tasdemir Printed Ottoman text recognition using synthetic data and data augmentation: EF Bilgin Tasdemir
Salaheldin Kasem et al. Advancements and challenges in Arabic optical character recognition: A comprehensive survey
Alrasheed et al. Evaluation of deep learning techniques for content extraction in Spanish colonial notary records
Al Ghamdi A novel approach to printed Arabic optical character recognition
Yasin et al. Transformer-based neural machine translation for post-OCR error correction in cursive text
Pan et al. A new dataset for mongolian online handwritten recognition
Asadi-zeydabadi et al. IDPL-PFOD2: A New Large-Scale Dataset for Printed Farsi Optical Character Recognition
Chaudhury et al. A deep OCR for degraded Bangla documents
Kumar et al. Lipi Gnani: a versatile OCR for documents in any language printed in Kannada script
Tasdemir et al. Automatic transcription of ottoman documents using deep learning
Chamchong et al. Thai handwritten recognition on BEST2019 datasets using deep Learning
Rychlik et al. Development of a New Image-to-Text Conversion System for Pashto, Farsi and Traditional Chinese
Ruldeviyani et al. Character recognition system for pegon typed manuscript
Jayanthi et al. Recognition of HandwrittenWords from DigitalWriting Pad Using MMU-SNet.
Ashraf et al. An analysis of optical character recognition (ocr) methods
Alothman et al. A survey on the online Arabic handwriting recognition: challenges, datasets, and future directions
Vaddadi et al. A Handwriting Recognition System That Outputs Editable Text And Audio
Nakarmi et al. Nepal Script Text Recognition using CRNN CTC Architecture