Peng et al., 2014 - Google Patents
PU text classification enhanced by term frequency–inverse document frequency‐improved weighting
- Document ID
- 18141893391316333065
- Author
- Peng T
- Liu L
- Zuo W
- Publication year
- 2014
- Publication venue
- Concurrency and computation: practice and experience
Snippet
Term frequency–inverse document frequency (TF–IDF), one of the most popular feature (also called term or word) weighting methods used to describe documents in the vector space model and the applications related to text mining and information retrieval, can …
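For orientation, the following is a minimal sketch of the classic textbook TF–IDF weighting that the snippet refers to. It is not the improved weighting proposed by Peng et al., which the truncated snippet does not describe. The function name `tf_idf`, the sample documents, and the choice of a relative term frequency with an unsmoothed natural-log IDF are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: tf is the term count divided by document length,
    idf is log(N / df) with no smoothing (one common textbook variant).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up toy corpus, tokenized by whitespace.
docs = [
    "positive unlabeled learning for text".split(),
    "text classification with unlabeled data".split(),
    "term weighting for text retrieval".split(),
]
weights = tf_idf(docs)
```

In this toy corpus the term "text" occurs in every document, so its IDF is log(1) = 0 and it receives zero weight, while a term confined to one document, such as "positive", receives the largest IDF.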
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/3071—Clustering or classification including class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/30707—Clustering or classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30587—Details of specialised database models
- G06F17/30595—Relational databases
- G06F17/30598—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Peng et al. | PU text classification enhanced by term frequency–inverse document frequency‐improved weighting | |
| Alsmadi et al. | Term weighting scheme for short-text classification: Twitter corpuses | |
| Moldagulova et al. | Using KNN algorithm for classification of textual documents | |
| CN107992633B (en) | Method and system for automatic classification of electronic documents based on keyword features | |
| Petkova et al. | Hierarchical language models for expert finding in enterprise corpora | |
| Kalaivani et al. | Sentiment classification of movie reviews by supervised machine learning approaches | |
| Basavaraju et al. | A novel method of spam mail detection using text based clustering approach | |
| US20090006391A1 (en) | Automatic categorization of document through tagging | |
| Duwairi et al. | Feature reduction techniques for Arabic text categorization | |
| Hsiao et al. | An incremental cluster-based approach to spam filtering | |
| Asim et al. | Comparison of feature selection methods in text classification on highly skewed datasets | |
| Alaa | A comparative study on Arabic text classification | |
| Khan et al. | Lifelong aspect extraction from big data: knowledge engineering | |
| Rahman et al. | Text classification using the concept of association rule of data mining | |
| Dommati et al. | Bug Classification: Feature Extraction and Comparison of Event Model using Naïve Bayes Approach | |
| Murthy et al. | A comparative study on term weighting methods for automated Telugu text categorization with effective classifiers | |
| Isabella et al. | Analysis and evaluation of Feature selectors in opinion mining | |
| Badawi et al. | Termset weighting by adapting term weighting schemes to utilize cardinality statistics for binary text categorization | |
| Lahiri et al. | Learning from litigation: Graphs and LLMs for retrieval and reasoning in eDiscovery | |
| Govindarajan | A novel framework for evaluating the software project management efficiency–an artificial intelligence approach | |
| Kamruzzaman et al. | Text categorization using association rule and naive Bayes classifier | |
| Ado et al. | Comparative analysis of integrating multiple filter-based feature selection methods using vector magnitude score on text classification | |
| Rosen | E-mail Classification in the Haystack Framework | |
| Lee et al. | A comparative study on statistical machine learning algorithms and thresholding strategies for automatic text categorization | |
| Decherchi et al. | K-means clustering for content-based document management in intelligence |