Chinese language word embeddings based on the corpus Hanku
Garabík, 2021
- Document ID
- 8763793741444106428
- Author
- Garabík R
- Publication year
- 2021
- Publication venue
- Jazykovedný časopis
Snippet
Vector models based on word embeddings are an indispensable part of advanced Natural Language Processing research and language analysis. We describe several Chinese language (Pǔtōnghuà) word embeddings, the differences from "western" language models …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
- G06F17/30669—Translation of the query language, e.g. Chinese to English
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G06F17/2223—Handling non-latin characters, e.g. kana-to-kanji conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/273—Orthographic correction, e.g. spelling checkers, vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/211—Formatting, i.e. changing of presentation of document
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages
- G09B19/08—Printed or written appliances, e.g. text books, bilingual letter assemblies, charts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/018—Input/output arrangements for oriental characters
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Dahlmeier et al. | Correcting semantic collocation errors with L1-induced paraphrases | |
Nguyen et al. | Korean-Vietnamese neural machine translation system with Korean morphological analysis and word sense disambiguation | |
Surana et al. | A more discerning and adaptable multilingual transliteration mechanism for Indian languages | |
Aswani et al. | A hybrid approach to align sentences and words in English-Hindi parallel corpora | |
Kang | Spoken language to sign language translation system based on HamNoSys | |
CN102541837A (en) | Method for correcting inputted Chinese characters | |
Alotaiby et al. | Arabic vs. English: Comparative statistical study | |
Jamro | Sindhi language processing: A survey | |
Phadte et al. | Towards normalising Konkani-English code-mixed social media text | |
Spina | The Dictionary of Italian Collocations: Design and Integration in an Online Learning Environment. | |
Garabík | Chinese language word embeddings based on the corpus Hanku | |
Nowakowski et al. | Applying support vector machines to POS tagging of the Ainu language | |
Rajendran et al. | Text processing for developing unrestricted Tamil text to speech synthesis system | |
Abumalloh et al. | Building Arabic corpus applied to part-of-speech tagging | |
Lazareva et al. | Technology for mastering Russian vocabulary by Chinese students in the field of international trade | |
Lau et al. | The construction of a large-scale Hong Kong Chinese lexicon with multilingual translations for Chinese-as-an-Additional-Language (CAL) students | |
Liu | The technical analyses of named entity translation | |
Li | From Mandarin to Cantonese Lexicography: A genealogical study of Robert Morrison’s Vocabulary of the Canton Dialect (1828) | |
Dutta et al. | System for identification and analysis of reduplication words in Hindi corpus | |
Huu et al. | Integrating pronunciation into Chinese-Vietnamese statistical machine translation | |
Moghadam et al. | A Survey of Part of Speech Tagging of Latin and non-Latin Script Languages: A more vivid view on Persian | |
Yadav et al. | Normalization of Spelling Variations in Code-Mixed Data | |
Garabík et al. | A cross linguistic database of children's printed words in three Slavic languages | |
Petrovčič et al. | The New Chinese Corpus of Literary Texts Litchi | |
Odinye | Mandarin Chinese Pinyin: Pronunciation, Orthography and Tone |