Identifying cognates by phonetic and semantic similarity
Kondrak, 2001
- Document ID
- 4175948356455235394
- Author
- Kondrak G
- Publication year
- 2001
- Publication venue
- Second Meeting of the North American Chapter of the Association for Computational Linguistics
Snippet
I present a method of identifying cognates in the vocabularies of related languages. I show that a measure of phonetic similarity based on multivalued features performs better than “orthographic” measures, such as the Longest Common Subsequence Ratio (LCSR) or …
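The snippet names the Longest Common Subsequence Ratio (LCSR) as one of the "orthographic" baselines. As the measure is commonly defined, LCSR divides the length of the longest common subsequence of two words by the length of the longer word. The sketch below is an illustration under that definition, not code from the paper; the function names `lcs_length` and `lcsr` are hypothetical, and scoring two empty strings as 0 is an assumption.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a character of a or b
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length over the longer word's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: two empty strings are scored 0
    return lcs_length(a, b) / longest


if __name__ == "__main__":
    print(lcsr("night", "nacht"))   # LCS "nht" = 3, longer word = 5
    print(lcsr("father", "vater"))  # LCS "ater" = 4, longer word = 6
```

For English *father* and German *vater*, LCSR gives 4/6: the measure compares letters only as identical or different, so it gets no credit for *f* and *v* being phonetically close. A measure built on phonetic features, as the snippet describes, can in principle score such partial matches between segments.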
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2705—Parsing
            - G06F17/2765—Recognition
            - G06F17/274—Grammatical analysis; Style critique
          - G06F17/28—Processing or translating of natural language
            - G06F17/2809—Data driven translation
            - G06F17/2863—Processing of non-latin text
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
            - G06F17/30634—Querying
        - G06F17/50—Computer-aided design
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
            - G06K9/6807—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
              - G06K9/6842—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image
Similar Documents
Publication | Title |
---|---|
Kondrak | Identifying cognates by phonetic and semantic similarity |
CN106598937B (en) | Language Identification, device and electronic equipment for text |
Sporleder et al. | Unsupervised recognition of literal and non-literal use of idiomatic expressions |
Mackay et al. | Computing word similarity and identifying cognates with Pair Hidden Markov Models |
WO1997004405A9 (en) | Method and apparatus for automated search and retrieval processing |
JP2008262587A (en) | Example based machine translation system |
Darwish et al. | Using Stem-Templates to Improve Arabic POS and Gender/Number Tagging. |
Adouane et al. | Identification of languages in Algerian Arabic multilingual documents |
Tedeschi et al. | ID10M: Idiom identification in 10 languages |
Bedrick et al. | Robust kaomoji detection in Twitter |
Charoenpornsawat et al. | Automatic sentence break disambiguation for Thai |
Frunza et al. | Identification and disambiguation of cognates, false friends, and partial cognates using machine learning techniques |
Vikram et al. | Development of prototype morphological analyzer for the South Indian language of Kannada |
Loftsson et al. | Developing a PoS-tagged corpus using existing tools |
Kondrak | Combining evidence in cognate identification |
US20110106849A1 (en) | New case generation device, new case generation method, and new case generation program |
Parida et al. | Translating short segments with NMT: A case study in English-to-Hindi |
Sharma et al. | Improving existing Punjabi grammar checker |
Graham | Using natural language processing to search for textual references |
Gardner et al. | Automatic link detection: a sequence labeling approach |
Kulick et al. | Parsing Early Modern English for Linguistic Search |
Frunza | Automatic identification of cognates, false friends, and partial cognates |
Garcia et al. | A Method to Automatically Identify Diachronic Variation in Collocations. |
Tyrkkö et al. | Semi-automatic discovery of multilingual elements in English historical corpora: Methods and challenges |
Hurskainen | Optimizing disambiguation in Swahili |