Garabík, 2021 - Google Patents

Chinese language word embeddings based on the corpus Hanku

Document ID: 8763793741444106428
Author: Garabík R
Publication year: 2021
Publication venue: Jazykovedný časopis

Snippet

Vector models based on word embeddings are an indispensable part of advanced Natural Language Processing research and language analysis. We describe several Chinese language (Pǔtōnghuà) word embeddings, the differences from "western" language models …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
- G06F17/30669—Translation of the query language, e.g. Chinese to English
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G06F17/2223—Handling non-latin characters, e.g. kana-to-kanji conversion
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/273—Orthographic correction, e.g. spelling checkers, vowelisation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/211—Formatting, i.e. changing of presentation of document
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages
- G09B19/08—Printed or written appliances, e.g. text books, bilingual letter assemblies, charts
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/018—Input/output arrangements for oriental characters

Similar Documents

Publication	Publication Date	Title
Dahlmeier et al.	2011	Correcting semantic collocation errors with L1-induced paraphrases
Nguyen et al.	2019	Korean-Vietnamese neural machine translation system with Korean morphological analysis and word sense disambiguation
Surana et al.	2008	A more discerning and adaptable multilingual transliteration mechanism for Indian languages
Aswani et al.	2005	A hybrid approach to align sentences and words in English-Hindi parallel corpora
Kang	2019	Spoken language to sign language translation system based on HamNoSys
CN102541837A (en)	2012-07-04	Method for correcting inputted Chinese characters
Alotaiby et al.	2014	Arabic vs. English: Comparative statistical study
Jamro	2017	Sindhi language processing: A survey
Phadte et al.	2017	Towards normalising Konkani-English code-mixed social media text
Spina	2010	The Dictionary of Italian Collocations: Design and Integration in an Online Learning Environment
Nowakowski et al.	2019	Applying support vector machines to POS tagging of the Ainu language
Rajendran et al.	2015	Text processing for developing unrestricted Tamil text to speech synthesis system
Abumalloh et al.	2016	Building Arabic corpus applied to part-of-speech tagging
Lazareva et al.	2020	Technology for mastering Russian vocabulary by Chinese students in the field of international trade
Lau et al.	2023	The construction of a large-scale Hong Kong Chinese lexicon with multilingual translations for Chinese-as-an-Additional-Language (CAL) students
Liu	2015	The technical analyses of named entity translation
Li	2022	From Mandarin to Cantonese Lexicography: A genealogical study of Robert Morrison’s Vocabulary of the Canton Dialect (1828)
Dutta et al.	2016	System for identification and analysis of reduplication words in Hindi corpus
Huu et al.	2018	Integrating pronunciation into Chinese-Vietnamese statistical machine translation
Moghadam et al.	2021	A Survey of Part of Speech Tagging of Latin and non-Latin Script Languages: A more vivid view on Persian
Yadav et al.	2022	Normalization of Spelling Variations in Code-Mixed Data
Garabík et al.	2007	A cross linguistic database of children's printed words in three Slavic languages
Petrovčič et al.	2020	The New Chinese Corpus of Literary Texts Litchi
Odinye	2020	Mandarin Chinese Pinyin: Pronunciation, Orthography and Tone