Garabík, 2021 - Google Patents

Chinese language word embeddings based on the corpus Hanku

Document ID
8763793741444106428
Author
Garabík R
Publication year
2021
Publication venue
Jazykovedný časopis

Snippet

Vector models based on word embeddings are an indispensable part of advanced Natural Language Processing research and language analysis. We describe several Chinese language (Pǔtōnghuà) word embeddings, the differences from "western" language models …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2809 Data driven translation
    • G06F17/2827 Example based machine translation; Alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634 Querying
    • G06F17/30657 Query processing
    • G06F17/3066 Query translation
    • G06F17/30669 Translation of the query language, e.g. Chinese to English
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G06F17/2217 Character encodings
    • G06F17/2223 Handling non-latin characters, e.g. kana-to-kanji conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2863 Processing of non-latin text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2765 Recognition
    • G06F17/277 Lexical analysis, e.g. tokenisation, collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2872 Rule based translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/289 Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2705 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/273 Orthographic correction, e.g. spelling checkers, vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/211 Formatting, i.e. changing of presentation of document
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/06 Foreign languages
    • G09B19/08 Printed or written appliances, e.g. text books, bilingual letter assemblies, charts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/018 Input/output arrangements for oriental characters

Similar Documents

Publication: Title
Dahlmeier et al.: Correcting semantic collocation errors with L1-induced paraphrases
Nguyen et al.: Korean-Vietnamese neural machine translation system with Korean morphological analysis and word sense disambiguation
Surana et al.: A more discerning and adaptable multilingual transliteration mechanism for Indian languages
Aswani et al.: A hybrid approach to align sentences and words in English-Hindi parallel corpora
Kang: Spoken language to sign language translation system based on HamNoSys
CN102541837A (en): Method for correcting inputted Chinese characters
Alotaiby et al.: Arabic vs. English: Comparative statistical study
Jamro: Sindhi language processing: A survey
Phadte et al.: Towards normalising Konkani-English code-mixed social media text
Spina: The Dictionary of Italian Collocations: Design and Integration in an Online Learning Environment
Garabík: Chinese language word embeddings based on the corpus Hanku
Nowakowski et al.: Applying support vector machines to POS tagging of the Ainu language
Rajendran et al.: Text processing for developing unrestricted Tamil text to speech synthesis system
Abumalloh et al.: Building Arabic corpus applied to part-of-speech tagging
Lazareva et al.: Technology for mastering Russian vocabulary by Chinese students in the field of international trade
Lau et al.: The construction of a large-scale Hong Kong Chinese lexicon with multilingual translations for Chinese-as-an-Additional-Language (CAL) students
Liu: The technical analyses of named entity translation
Li: From Mandarin to Cantonese Lexicography: A genealogical study of Robert Morrison’s Vocabulary of the Canton Dialect (1828)
Dutta et al.: System for identification and analysis of reduplication words in Hindi corpus
Huu et al.: Integrating pronunciation into Chinese-Vietnamese statistical machine translation
Moghadam et al.: A Survey of Part of Speech Tagging of Latin and non-Latin Script Languages: A more vivid view on Persian
Yadav et al.: Normalization of Spelling Variations in Code-Mixed Data
Garabík et al.: A cross linguistic database of children's printed words in three Slavic languages
Petrovčič et al.: The New Chinese Corpus of Literary Texts Litchi
Odinye: Mandarin Chinese Pinyin: Pronunciation, Orthography and Tone