Salih et al., 2025 - Google Patents
KSTRV1: A scene text recognition dataset for central Kurdish in (Arabic-Based) scriptSalih et al., 2025
View HTML- Document ID
- 138066164338734310
- Author
- Salih S
- Jacksi K
- Publication year
- Publication venue
- Data in Brief
External Links
Snippet
Abstract Scene Text Recognition (STR) has advanced significantly in recent years, yet languages utilizing Arabic-based scripts, such as Kurdish, remain underrepresented in existing datasets. This paper introduces KSTRV1, the first large-scale dataset designed for …
- 238000013515 script 0 title abstract description 36
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G06F17/2223—Handling non-latin characters, e.g. kana-to-kanji conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/24—Editing, e.g. insert/delete
- G06F17/242—Editing, e.g. insert/delete by use of digital ink
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/211—Formatting, i.e. changing of presentation of document
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/20—Image acquisition
- G06K9/32—Aligning or centering of the image pick-up or image-field
- G06K9/3233—Determination of region of interest
- G06K9/325—Detection of text region in scene imagery, real life image or Web pages, e.g. licenses plates, captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/20—Image acquisition
- G06K9/2054—Selective acquisition/locating/processing of specific regions, e.g. highlighted text, fiducial marks, predetermined fields, document type identification
- G06K9/2063—Selective acquisition/locating/processing of specific regions, e.g. highlighted text, fiducial marks, predetermined fields, document type identification based on a marking or identifier characterising the document or the area
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/20—Image acquisition
- G06K9/22—Image acquisition using hand-held instruments
- G06K9/222—Image acquisition using hand-held instruments the instrument generating sequences of position coordinates corresponding to handwriting; preprocessing or recognising digital ink
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00442—Document analysis and understanding; Document recognition
- G06K9/00463—Document analysis by extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics, paragraphs, words or letters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/20—Image acquisition
- G06K9/34—Segmentation of touching or overlapping patterns in the image field
- G06K9/342—Cutting or merging image elements, e.g. region growing, watershed, clustering-based techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00402—Recognising digital ink, i.e. recognising temporal sequences of handwritten position coordinates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00852—Recognising whole cursive words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K2209/00—Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111723807B (en) | End-to-end deep learning recognition machine for typing characters and handwriting characters | |
| US8209600B1 (en) | Method and apparatus for generating layout-preserved text | |
| US8254681B1 (en) | Display of document image optimized for reading | |
| US20040202352A1 (en) | Enhanced readability with flowed bitmaps | |
| US20150146985A1 (en) | Handwritten document processing apparatus and method | |
| Markou et al. | A convolutional recurrent neural network for the handwritten text recognition of historical greek manuscripts | |
| Gervais et al. | Mathwriting: A dataset for handwritten mathematical expression recognition | |
| Zharikov et al. | DDI-100: dataset for text detection and recognition | |
| Elanwar et al. | Extracting text from scanned Arabic books: a large-scale benchmark dataset and a fine-tuned Faster-R-CNN model | |
| CN111291758B (en) | Method and device for recognizing stamp characters | |
| Nagy | Disruptive developments in document recognition | |
| CN115004261B (en) | Text line detection | |
| Salih et al. | KSTRV1: A scene text recognition dataset for central Kurdish in (Arabic-Based) script | |
| Kumar et al. | An automated invoice handling method using OCR | |
| Kumar et al. | Lipi Gnani: a versatile OCR for documents in any language printed in Kannada script | |
| Mathew et al. | Towards deployable OCR models for indic languages | |
| Ruldeviyani et al. | Character recognition system for pegon typed manuscript | |
| CN119129529A (en) | PDF document conversion method, device, equipment, storage medium and product | |
| Memon et al. | Glyph identification and character recognition for Sindhi OCR | |
| Gautam et al. | The dataset for printed Brahmi word recognition | |
| Ashraf et al. | An analysis of optical character recognition (ocr) methods | |
| Nom et al. | Cross-Lingual Learning for Low-Resource Khmer Scene Text Detection and Recognition | |
| Nom et al. | WildKhmerST: a comprehensive dataset and benchmark for khmer scene text detection and recognition in the wild | |
| Sasirekha et al. | Text extraction from PDF document | |
| Sen et al. | BYANJON: a ground truth preparation system for online handwritten Bangla documents |