Castro et al., 2017 - Google Patents

Smoothed n-gram based models for tweet language identification: A case study of the Brazilian and European Portuguese national varieties

Document ID: 2894885897539103061
Author: Castro D; Souza E; Vitório D; Santos D; Oliveira A
Publication year: 2017
Publication venue: Applied Soft Computing

Snippet

Identifying the language of a text is an important step for several natural language processing applications. State-of-the-art language identification (LID) systems perform very well when discriminating between unrelated languages on standard datasets. However, the …

Continue reading at www.sciencedirect.com
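The title refers to smoothed n-gram models for telling the Brazilian and European Portuguese varieties apart, but the snippet does not say which n-gram order or smoothing scheme the authors use. As an illustration only, here is a minimal Python sketch of one common variant, assumed here rather than taken from the paper: one character trigram model per variety with add-k smoothing (Laplace smoothing when k = 1). A text is assigned to the variety whose model gives it the highest log-likelihood. The class name `SmoothedNgramLID`, the helper `char_ngrams`, the `pt-BR`/`pt-PT` labels and the toy sentences are all hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Split text into overlapping character n-grams. Leading spaces give the
    first characters full-length contexts; a trailing space marks the end."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class SmoothedNgramLID:
    """Hypothetical per-variety character n-gram classifier with add-k smoothing."""

    def __init__(self, n=3, k=1.0):
        self.n = n
        self.k = k
        self.ngram_counts = {}    # label -> Counter of full n-grams
        self.context_counts = {}  # label -> Counter of (n-1)-character histories
        self.vocab = {" "}        # characters seen in training (shared by all labels)

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts.setdefault(label, Counter()).update(grams)
            self.context_counts.setdefault(label, Counter()).update(g[:-1] for g in grams)
            self.vocab.update(text.lower())
        return self

    def log_prob(self, text, label):
        grams = self.ngram_counts[label]
        contexts = self.context_counts[label]
        v = len(self.vocab) + 1  # +1 reserves mass for an unseen character
        total = 0.0
        for g in char_ngrams(text, self.n):
            # Add-k estimate: P(c | h) = (C(h c) + k) / (C(h) + k * V)
            total += math.log((grams[g] + self.k) / (contexts[g[:-1]] + self.k * v))
        return total

    def predict(self, text):
        # Pick the variety whose model assigns the text the highest log-likelihood.
        return max(self.ngram_counts, key=lambda label: self.log_prob(text, label))


# Hypothetical toy training data, invented for illustration (not the paper's corpus).
TOY_TEXTS = [
    "você está fazendo o quê", "peguei o ônibus e o trem",
    "meu celular quebrou", "a equipe está jogando",
    "tu estás a fazer o quê", "apanhei o autocarro e o comboio",
    "o meu telemóvel avariou", "a equipa está a jogar",
]
TOY_LABELS = ["pt-BR"] * 4 + ["pt-PT"] * 4

model = SmoothedNgramLID(n=3, k=1.0).fit(TOY_TEXTS, TOY_LABELS)
print(model.predict("o ônibus e o trem"))        # pt-BR
print(model.predict("o autocarro e o comboio"))  # pt-PT
```

Smoothing matters here because tweets are short: without it, a single n-gram never seen in one variety's training data would drive that variety's likelihood to zero. Whether the paper uses add-k or a different smoothing method cannot be determined from the snippet.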

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
            - G06F17/30634—Querying
              - G06F17/30657—Query processing
                - G06F17/30675—Query execution
                  - G06F17/30684—Query execution using natural language analysis
                - G06F17/3066—Query translation
                  - G06F17/30669—Translation of the query language, e.g. Chinese to English
            - G06F17/30705—Clustering or classification
              - G06F17/3071—Clustering or classification including class or cluster creation or modification
              - G06F17/30707—Clustering or classification into predefined classes
          - G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2765—Recognition
              - G06F17/277—Lexical analysis, e.g. tokenisation, collocates
            - G06F17/2705—Parsing
              - G06F17/2715—Statistical methods
            - G06F17/2785—Semantic analysis
          - G06F17/28—Processing or translating of natural language
            - G06F17/2809—Data driven translation
          - G06F17/21—Text processing
            - G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
            - G06K9/6807—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
              - G06K9/6842—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling

Similar Documents

Publication	Publication Date	Title
Chen et al.	2022	A comparative study of automated legal text classification using random forests and deep learning
Castro et al.	2017	Smoothed n-gram based models for tweet language identification: A case study of the Brazilian and European Portuguese national varieties
Rangel et al.	2016	A low dimensionality representation for language variety identification
Sazzed et al.	2019	A sentiment classification in Bengali and machine translated English corpus
Poon et al.	2009	Unsupervised morphological segmentation with log-linear models
Zhang et al.	2014	Authorship identification from unstructured texts
Pranckevičius et al.	2016	Application of logistic regression with part-of-the-speech tagging for multi-class text classification
Fourkioti et al.	2019	Language models and fusion for authorship attribution
US9355372B2 (en)	2016-05-31	Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus
Hande et al.	2021	Offensive language identification in low-resourced code-mixed Dravidian languages using pseudo-labeling
Suleiman et al.	2018	Comparative study of word embeddings models and their usage in Arabic language applications
Atia et al.	2015	Increasing the accuracy of opinion mining in Arabic
CA2917153A1 (en)	2015-01-08	Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus
Jayakrishnan et al.	2018	Multi-class emotion detection and annotation in Malayalam novels
Ljubešić et al.	2015	Discriminating between closely related languages on Twitter
Utomo et al.	2019	Text classification of British English and American English using support vector machine
Balazevic et al.	2016	Language detection for short text messages in social media
Al-Thubaity et al.	2020	Arabic diacritization using bidirectional long short-term memory neural networks with conditional random fields
Kolchyna et al.	2015	Methodology for twitter sentiment analysis
Joo et al.	2019	Author profiling on social media: An ensemble learning model using various features
Hövelmann et al.	2017	Fasttext and gradient boosted trees at GermEval-2017 on relevance classification and document-level polarity
Osterrieder	2023	A primer on natural language processing for finance
Mridha et al.	2019	Semantic error detection and correction in Bangla sentence
pal Singh et al.	2018	Naive Bayes classifier for word sense disambiguation of Punjabi language
Muthukumaran et al.	2017	Text analysis for product reviews for sentiment analysis using NLP methods