Bao, 2018 - Google Patents
Design and implementation of Cyrillic Mongolian syllable text corpus systemBao, 2018
- Document ID
- 9302219875947329313
- Author
- Bao Q
- Publication year
- Publication venue
- Information, Computer and Application Engineering
External Links
Snippet
Based on the building of Cyrillic Mongolian corpus, this paper makes statistics of relevant data on the word and syllable levels of Cyrillic Mongolian. At the word level, this paper respectively makes statistics of word length distribution with syllable and character as the …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
- G06F17/30669—Translation of the query language, e.g. Chinese to English
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G06F17/2223—Handling non-latin characters, e.g. kana-to-kanji conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2795—Thesaurus; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/274—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/273—Orthographic correction, e.g. spelling checkers, vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Saad et al. | Osac: Open source arabic corpora | |
CN107169067A (en) | The dictionary picking up method and system of a kind of utilization speech polling Chinese character | |
Qu et al. | Finding ideographic representations of Japanese names written in Latin script via language identification and corpus validation | |
Bharadwaja Kumar et al. | Statistical analyses of Telugu text corpora | |
Alotaiby et al. | Processing large Arabic text corpora: Preliminary analysis and results | |
Jiao et al. | Chinese keyword extraction based on N-gram and word co-occurrence | |
CN103164397A (en) | Chinese-Kazakh electronic dictionary and automatic translating Chinese- Kazakh method thereof | |
CN106294310B (en) | A kind of Tibetan language tone prediction technique and system | |
CN103164395A (en) | Chinese-Kirgiz language electronic dictionary and automatic translating Chinese-Kirgiz language method thereof | |
Bao | Design and implementation of Cyrillic Mongolian syllable text corpus system | |
Khurshudyan et al. | Eastern Armenian national corpus: State of the art and perspectives | |
Khaltar et al. | Extracting loanwords from Mongolian corpora and producing a Japanese-Mongolian bilingual dictionary | |
CN112487762A (en) | Natural language processing method based on Chinese character pronunciation and meaning structure Chinese character coding | |
Al-Zyoud et al. | Arabic stemming techniques: comparisons and new vision | |
Michos et al. | An empirical text categorizing computational model based on stylistic aspects | |
Nahli et al. | Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects | |
Minghu et al. | Segmentation of Mandarin Braille word and Braille translation based on multi-knowledge | |
Dash et al. | Why do we need to develop corpora in Indian languages | |
Chen et al. | Automatic title generation for Chinese spoken documents using an adaptive k nearest-neighbor approach. | |
Belkhouche et al. | Analysis of primary school Arabic language textbooks | |
US20240394283A1 (en) | Phonology-centric searching | |
Aksan et al. | A corpus-based word frequency list of Turkish: Evidence from the subcorpora of Turkish National Corpus project | |
Meftouh et al. | Modeling Arabic Language using statistical methods | |
Ashti Afrasyaw et al. | A Brief Overview of Kurdish Natural Language Processing | |
Michos et al. | Categorizing texts by using a three level functional style description |