The Role of Parallel Corpora in Bilingual Lexicography. Héja, 2010
- Document ID
- 10117826592182191070
- Author
- Héja E
- Publication year
- 2010
- Publication venue
- LREC
Snippet
This paper describes an approach based on word alignment on parallel corpora, which aims at facilitating the lexicographic work of dictionary building. Although this method has been widely used in the MT community for at least 16 years, as far as we know, it has not been …
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
          - G06F17/28—Processing or translating of natural language
            - G06F17/2809—Data driven translation
              - G06F17/2827—Example based machine translation; Alignment
            - G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
            - G06F17/2872—Rule based translation
              - G06F17/2881—Natural language generation
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2765—Recognition
              - G06F17/277—Lexical analysis, e.g. tokenisation, collocates
            - G06F17/2705—Parsing
              - G06F17/271—Syntactic parsing, e.g. based on context-free grammar [CFG], unification grammars
            - G06F17/2785—Semantic analysis
            - G06F17/274—Grammatical analysis; Style critique
          - G06F17/21—Text processing
            - G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
            - G06F17/30634—Querying
              - G06F17/30657—Query processing
                - G06F17/3066—Query translation
                  - G06F17/30669—Translation of the query language, e.g. Chinese to English
          - G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
Similar Documents
Publication | Title |
---|---|
Banea et al. | A bootstrapping method for building subjectivity lexicons for languages with scarce resources |
Vyas et al. | POS tagging of English-Hindi code-mixed social media content |
JP3906356B2 (en) | Syntax analysis method and apparatus |
Othman et al. | English-ASL gloss parallel corpus 2012: ASLG-PC12 |
Volk et al. | Machine translation of TV subtitles for large scale production |
Mititelu et al. | CoRoLa - The Reference Corpus of Contemporary Romanian Language |
Gupta et al. | Problems with automating translation of movie/TV show subtitles |
Dayter | Collocations in non-interpreted and simultaneously interpreted English: a corpus study |
Popović | On reducing translation shifts in translations intended for MT evaluation |
Héja | The Role of Parallel Corpora in Bilingual Lexicography |
Popović | Evaluating conjunction disambiguation on English-to-German and French-to-German WMT 2019 translation hypotheses |
Sanjaya et al. | Analysis of Category Shift on Emma Heesters’s Cover Song Lyrics on YouTube |
Li et al. | Uzbek-English and Turkish-English morpheme alignment corpora |
Jian et al. | TANGO: Bilingual collocational concordancer |
Marujo et al. | BP2EP-adaptation of Brazilian Portuguese texts to European Portuguese |
Skadiņa et al. | Latvian Language in the Digital Age: The Main Achievements in the Last Decade |
Volk | The automatic translation of film subtitles. A machine translation success story? |
Hamed et al. | A survey of code-switched Arabic NLP: Progress, challenges, and future directions |
Héja et al. | Dictionary building based on parallel corpora and word alignment |
Arkhangelskiy et al. | Sound-aligned corpus of Udmurt dialectal texts |
Srdanović | Corpus-based collocation research targeted at Japanese language learners |
Weller-Di Marco et al. | Modeling complement types in phrase-based SMT |
Burchardt et al. | Machine translation quality in an audiovisual context |
Song et al. | Entity Translation and Alignment in the ACE-07 ET Task |
Héja et al. | An online dictionary browser for automatically generated bilingual dictionaries |