Kirov et al., 2024 - Google Patents

Context-aware transliteration of romanized south asian languages

Kirov et al., 2024

Document ID: 3105610305272418493
Author: Kirov C; Johny C; Katanova A; Gutkin A; Roark B
Publication year: 2024
Publication venue: Computational Linguistics

External Links

Cited by

Snippet

While most transliteration research is focused on single tokens such as named entities—for example, transliteration of from the Gujarati script to the Latin script “Ahmedabad” footnoteThe most populous city in the Indian state of Gujarat. the informal romanization …

Continue reading at direct.mit.edu (HTML) (other versions)

238000013515 script 0 abstract description 145

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G06F17/2223—Handling non-latin characters, e.g. kana-to-kanji conversion
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/2715—Statistical methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G06F17/2881—Natural language generation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
- G06F17/30669—Translation of the query language, e.g. Chinese to English
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/2775—Phrasal analysis, e.g. finite state techniques, chunking
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2795—Thesaurus; Synonyms
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/274—Grammatical analysis; Style critique
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2755—Morphological analysis
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker

Similar Documents

Publication	Publication Date	Title
Roark et al.	2020	Processing South Asian languages written in the Latin script: the Dakshina dataset
Nelken et al.	2005	Arabic diacritization using weighted finite-state transducers
WO2009035863A2 (en)	2009-03-19	Mining bilingual dictionaries from monolingual web pages
Said et al.	2013	A hybrid approach for Arabic diacritization
Masmoudi et al.	2019	Transliteration of Arabizi into Arabic script for Tunisian dialect
Scherrer et al.	2016	Modernising historical Slovene words
Hasegawa-Johnson et al.	2020	Grapheme-to-phoneme transduction for cross-language ASR
Younes et al.	2022	Romanized tunisian dialect transliteration using sequence labelling techniques
Kirov et al.	2024	Context-aware transliteration of romanized south asian languages
Li et al.	2012	Improving text normalization using character-blocks based models and system combination
Uthayamoorthy et al.	2019	Ddspell-a data driven spell checker and suggestion generator for the tamil language
JP2002149643A (en)	2002-05-24	Method for predicting reading of japanese ideogram
de Silva et al.	2021	Singlish to sinhala transliteration using rule-based approach
Lyons	2020	A review of Thai–English machine translation
Scherrer et al.	2018	New developments in tagging pre-modern orthodox Slavic texts
Al-Mannai et al.	2014	Unsupervised word segmentation improves dialectal Arabic to English machine translation
Karim et al.	2021	On the training of deep neural networks for automatic arabic-text diacritization
JP2010244385A (en)	2010-10-28	Machine translation device, machine translation method, and program
Amri et al.	2019	Amazigh POS tagging using TreeTagger: a language independant model
Lu et al.	2019	An automatic spelling correction method for classical mongolian
US8977538B2 (en)	2015-03-10	Constructing and analyzing a word graph
JP2022515048A (en)	2022-02-17	Transliteration for speech recognition training and scoring
Núñez et al.	2019	Phonetic normalization for machine translation of user generated content
Salam et al.	2019	Developing the bangladeshi national corpus-a balanced and representative bangla corpus
Pretkalniņa et al.	2012	Making historical Latvian texts more intelligible to contemporary readers