Kondrak, 2001 - Google Patents

Identifying cognates by phonetic and semantic similarity

Document ID: 4175948356455235394
Author: Kondrak G
Publication year: 2001
Publication venue: Second Meeting of the North American Chapter of the Association for Computational Linguistics

Snippet

I present a method of identifying cognates in the vocabularies of related languages. I show that a measure of phonetic similarity based on multivalued features performs better than “orthographic” measures, such as the Longest Common Subsequence Ratio (LCSR) or …
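
For reference, LCSR is commonly defined as the length of the longest common subsequence of two words divided by the length of the longer word. The following is a minimal Python sketch under that definition; the function names `lcs_length` and `lcsr`, and the choice to return 0.0 for two empty strings, are illustrative assumptions and are not taken from the paper.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping a char in a or in b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """LCS length divided by the length of the longer string.

    Returning 0.0 when both strings are empty is an assumption made here
    to avoid division by zero.
    """
    if not a and not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    for pair in [("colour", "color"), ("night", "nacht")]:
        print(pair, round(lcsr(*pair), 3))
```

Because this measure compares letters only, a pair such as "night" and "nacht" is scored purely on shared characters, with no notion of how close the differing sounds are; that gap is what a feature-based phonetic measure is meant to address.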

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2705—Parsing
            - G06F17/2765—Recognition
            - G06F17/274—Grammatical analysis; Style critique
          - G06F17/28—Processing or translating of natural language
            - G06F17/2809—Data driven translation
            - G06F17/2863—Processing of non-latin text
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
            - G06F17/30634—Querying
        - G06F17/50—Computer-aided design
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
            - G06K9/6807—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
              - G06K9/6842—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image

Similar Documents

Publication	Publication Date	Title
Kondrak	2001	Identifying cognates by phonetic and semantic similarity
CN106598937B (en)	2019-10-18	Language Identification, device and electronic equipment for text
Sporleder et al.	2009	Unsupervised recognition of literal and non-literal use of idiomatic expressions
Mackay et al.	2005	Computing word similarity and identifying cognates with Pair Hidden Markov Models
WO1997004405A9 (en)	1997-07-31	Method and apparatus for automated search and retrieval processing
JP2008262587A (en)	2008-10-30	Example based machine translation system
Darwish et al.	2014	Using Stem-Templates to Improve Arabic POS and Gender/Number Tagging.
Adouane et al.	2017	Identification of languages in Algerian Arabic multilingual documents
Tedeschi et al.	2022	ID10M: Idiom identification in 10 languages
Bedrick et al.	2012	Robust kaomoji detection in Twitter
Charoenpornsawat et al.	2001	Automatic sentence break disambiguation for Thai
Frunza et al.	2009	Identification and disambiguation of cognates, false friends, and partial cognates using machine learning techniques
Vikram et al.	2007	Development of prototype morphological analyzer for the south indian language of kannada
Loftsson et al.	2010	Developing a PoS-tagged corpus using existing tools
Kondrak	2004	Combining evidence in cognate identification
US20110106849A1 (en)	2011-05-05	New case generation device, new case generation method, and new case generation program
Parida et al.	2018	Translating short segments with nmt: A case study in english-to-hindi
Sharma et al.	2016	Improving existing punjabi grammar checker
Graham	2019	Using natural language processing to search for textual references
Gardner et al.	2009	Automatic link detection: a sequence labeling approach
Kulick et al.	2020	Parsing Early Modern English for Linguistic Search
Frunza	2006	Automatic identification of cognates, false friends, and partial cognates
Garcia et al.	2019	A Method to Automatically Identify Diachronic Variation in Collocations.
Tyrkkö et al.	2017	Semi-automatic discovery of multilingual elements in English historical corpora: Methods and challenges
Hurskainen	2004	Optimizing disambiguation in Swahili