Lightly supervised learning of text normalization: Russian number names

Document ID: 1505993940776950666
Author: Sproat R
Publication year: 2010
Publication venue: 2010 IEEE Spoken Language Technology Workshop

Snippet

Most areas of natural language processing today make heavy use of automatic inference from large corpora. One exception is text normalization for such applications as text-to-speech synthesis, where it is still the norm to build grammars by hand for such tasks as …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/2775—Phrasal analysis, e.g. finite state techniques, chunking
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/2715—Statistical methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/02—Input arrangements using manually operated switches, e.g. using keyboards or dials
- G06F3/023—Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
- G06F3/0233—Character input methods
- G06F3/0237—Character input methods using prediction or retrieval techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
- G06K9/6807—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
- G06K9/6842—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German

Similar Documents

Publication	Publication Date	Title
US20210157975A1 (en)	2021-05-27	Device, system, and method for extracting named entities from sectioned documents
Issar	1996	Estimation of language models for new spoken language applications
US20050226512A1 (en)	2005-10-13	Character string identification
Kumar et al.	2010	Part of speech taggers for morphologically rich indian languages: a survey
CN106570180A (en)	2017-04-19	Artificial intelligence based voice searching method and device
Dinarelli et al.	2011	Discriminative reranking for spoken language understanding
WO2018206784A1 (en)	2018-11-15	Fault-tolerant information extraction
Onyenwe et al.	2019	Toward an effective igbo part-of-speech tagger
Tufiş et al.	2008	DIAC+: A professional diacritics recovering system
Dey et al.	2013	Named entity recognition using gazetteer method and n-gram technique for an inflectional language: A hybrid approach
Elshafei et al.	2006	Machine Generation of Arabic Diacritical Marks.
Melero et al.	2012	Holaaa!! writin like u talk is kewl but kinda hard 4 NLP
Tufiş et al.	2004	Extracting multilingual lexicons from parallel corpora
Alkahtani	2015	Building and verifying parallel corpora between Arabic and English
Ahmadi	2021	Hunspell for Sorani Kurdish spell checking and morphological analysis
Daya et al.	2008	Identifying semitic roots: Machine learning with linguistic constraints
US20140093173A1 (en)	2014-04-03	Classifying a string formed from hand-written characters
JP2008059389A (en)	2008-03-13	Vocabulary candidate output system, vocabulary candidate output method, and vocabulary candidate output program
Olinsky et al.	2000	Non-standard word and homograph resolution for asian language text analysis.
Sharma et al.	2016	Improving existing punjabi grammar checker
CN110162617B (en)	2022-11-04	Method, apparatus, language processing engine and medium for extracting summary information
Daya et al.	2007	Learning to identify Semitic roots
Naz et al.	2015	A hybrid approach for NER system for scarce resourced language-URDU: Integrating n-gram with rules and gazetteers
Megyesi	1999	Brill’s PoS tagger with extended lexical templates for Hungarian