Oluwaseyi et al., 2024 - Google Patents

Automatic spelling corrector for yorubá language using edit distance and n-gram language models

Document ID: 12761878395526956345
Author: Oluwaseyi E; Abiodun O; Badeji-Ajisafe B; et al.
Publication year: 2024
Publication venue: 2024 International Conference on Science, Engineering and Business for Driving Sustainable Development Goals (SEB4SDG)

Snippet

The lack of tools and resources to support higher-level Natural Language Processing (NLP) tasks for African languages has been a significant obstacle to developing NLP research in Africa. This research proposes an automatic spell correction for the Yoruba language. The …

Full text available at ieeexplore.ieee.org.
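The title names two techniques: edit distance and n-gram language models. Because the abstract above is truncated, this record does not say how the paper combines them or which n-gram order, smoothing, or distance threshold it uses. The sketch below is a minimal, hypothetical illustration of one common way to pair the two, not the authors' implementation. It assumes Levenshtein distance for candidate generation, a bigram model with add-one smoothing for ranking, a `max_distance` of 2, and Unicode NFC normalization so that tone-marked Yorùbá letters compare consistently. `BigramSpeller`, `nfc`, and the toy corpus are illustrative names and data, not taken from the paper.

```python
import unicodedata
from collections import Counter
from typing import Iterable, List, Tuple


def nfc(text: str) -> str:
    """Normalize to Unicode NFC so precomposed and combining-mark spellings
    of a tone-marked letter (e.g. "à" vs "a" + U+0300) compare as equal."""
    return unicodedata.normalize("NFC", text)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning `a` into `b` (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


class BigramSpeller:
    """Hypothetical toy corrector: edit distance proposes candidate words,
    an add-one-smoothed bigram model ranks candidates of equal distance."""

    def __init__(self, sentences: Iterable[List[str]], max_distance: int = 2):
        self.max_distance = max_distance  # assumed threshold, not from the paper
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sent in sentences:
            tokens = ["<s>"] + [nfc(t) for t in sent]  # <s> marks sentence start
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = {w for w in self.unigrams if w != "<s>"}

    def bigram_prob(self, prev: str, word: str) -> float:
        """P(word | prev) with Laplace (add-one) smoothing."""
        prev, word = nfc(prev), nfc(word)
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + len(self.vocab))

    def candidates(self, word: str) -> List[Tuple[str, int]]:
        """Known words within max_distance edits, paired with their distance."""
        word = nfc(word)
        if word in self.vocab:
            return [(word, 0)]
        scored = ((w, levenshtein(word, w)) for w in self.vocab)
        return [(w, d) for w, d in scored if d <= self.max_distance]

    def correct(self, sentence: List[str]) -> List[str]:
        out: List[str] = []
        prev = "<s>"
        for token in sentence:
            cands = self.candidates(token)
            if cands:
                # Smallest distance first, then highest P(w | prev), then the
                # word itself so ties resolve deterministically.
                best = min(cands, key=lambda c: (c[1], -self.bigram_prob(prev, c[0]), c[0]))[0]
            else:
                best = nfc(token)  # no candidate: leave the token unchanged
            out.append(best)
            prev = best
        return out


if __name__ == "__main__":
    # Tiny hand-made toy corpus (not data from the paper).
    corpus = [["bàbá", "lọ", "sí", "ọjà"],
              ["ìyá", "lọ", "sí", "ilé"],
              ["ọmọ", "lọ", "sí", "ilé"]]
    speller = BigramSpeller(corpus)
    print(speller.correct(["baba", "lọ", "sí", "ile"]))  # ['bàbá', 'lọ', 'sí', 'ilé']

    # Context decides between equally distant candidates for "hat".
    toy = BigramSpeller([["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "bat", "ran"]])
    print(toy.correct(["the", "hat", "sat"]))  # ['the', 'cat', 'sat']
    print(toy.correct(["a", "hat", "ran"]))    # ['a', 'bat', 'ran']
```

Ranking by edit distance first and using the bigram probability only to break ties is one simple design choice. A common alternative is a noisy-channel formulation that multiplies an error-model probability by the language-model probability, so a slightly more distant but much more likely word can win.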

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2765—Recognition
              - G06F17/277—Lexical analysis, e.g. tokenisation, collocates
              - G06F17/2775—Phrasal analysis, e.g. finite state techniques, chunking
            - G06F17/2705—Parsing
              - G06F17/2715—Statistical methods
            - G06F17/2795—Thesaurus; Synonyms
            - G06F17/274—Grammatical analysis; Style critique
          - G06F17/28—Processing or translating of natural language
            - G06F17/2809—Data driven translation
              - G06F17/2827—Example based machine translation; Alignment
            - G06F17/2872—Rule based translation
              - G06F17/2881—Natural language generation
            - G06F17/2863—Processing of non-latin text
          - G06F17/21—Text processing
            - G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
            - G06F17/30634—Querying
              - G06F17/30657—Query processing
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
            - G06K9/6807—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
              - G06K9/6842—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling

Similar Documents

Authors	Year	Title
Sen et al.	2022	Bangla natural language processing: A comprehensive analysis of classical, machine learning, and deep learning-based methods
Benajiba et al.	2009	Arabic named entity recognition: A feature-driven study
Gupta et al.	2013	A survey of common stemming techniques and existing stemmers for Indian languages
Azmi et al.	2019	Real-word errors in Arabic texts: A better algorithm for detection and correction
Kumar et al.	2010	Part of speech taggers for morphologically rich Indian languages: a survey
Jabbar et al.	2018	An improved Urdu stemming algorithm for text mining based on multi-step hybrid approach
Virpioja et al.	2011	Empirical comparison of evaluation methods for unsupervised learning of morphology
Dutta et al.	2015	Text normalization in code-mixed social media text
Mishra et al.	2013	A survey of spelling error detection and correction techniques
Mosavi Miangah	2014	FarsiSpell: A spell-checking system for Persian using a large monolingual corpus
Etaiwi et al.	2017	Statistical Arabic name entity recognition approaches: A survey
Jain et al.	2018	“UTTAM” an efficient spelling correction system for Hindi language based on supervised learning
Cing et al.	2020	Improving accuracy of part-of-speech (POS) tagging using hidden Markov model and morphological analysis for Myanmar Language
Wong et al.	2014	iSentenizer‐μ: Multilingual Sentence Boundary Detection Model
Sen et al.	2021	Bangla natural language processing: A comprehensive review of classical machine learning and deep learning based methods
Onyenwe et al.	2019	Toward an effective Igbo part-of-speech tagger
López et al.	2015	Experiments on sentence boundary detection in user-generated web content
Pal et al.	2020	Vartani Spellcheck--Automatic Context-Sensitive Spelling Correction of OCR-generated Hindi Text Using BERT and Levenshtein Distance
Büyük et al.	2021	Learning from mistakes: Improving spelling correction performance with automatic generation of realistic misspellings
Bhat	2012	Morpheme segmentation for Kannada standing on the shoulder of giants
Azmi et al.	2021	Light diacritic restoration to disambiguate homographs in modern Arabic texts
Mittra et al.	2019	A Bangla spell checking technique to facilitate error correction in text entry environment
Shirko	2020	Part of speech tagging for Wolaita language using transformation based learning (TBL) approach
Mehmood	2021	On multi-domain sentence level sentiment analysis for Roman Urdu