CN102637205B - Document classification method based on Hadoop - Google Patents
- Publication number
- CN102637205B (application CN201210072522.3A)
- Authority
- CN
- China
- Prior art keywords
- key
- document
- word
- value
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210072522.3A CN102637205B (zh) | 2012-03-19 | 2012-03-19 | Document classification method based on Hadoop |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210072522.3A CN102637205B (zh) | 2012-03-19 | 2012-03-19 | Document classification method based on Hadoop |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102637205A (zh) | 2012-08-15 |
CN102637205B (zh) | 2014-10-15 |
Family
ID=46621599
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210072522.3A Active CN102637205B (zh) | Document classification method based on Hadoop | 2012-03-19 | 2012-03-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102637205B (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
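CN103713885A (zh) * | 2013-12-27 | 2014-04-09 | Computer Network Information Center, Chinese Academy of Sciences | SMO parallel processing method for multi-core clusters |
CN105938561A (zh) * | 2016-04-13 | 2016-09-14 | Nanjing University | Computer data attribute reduction method based on canonical correlation analysis |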
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
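CN1339756A (zh) * | 2000-08-23 | 2002-03-13 | Matsushita Electric Industrial Co., Ltd. | Document retrieval and classification method and apparatus |
CN1452098A (zh) * | 2002-04-19 | 2003-10-29 | Hitachi, Ltd. | Document classification system and program for implementing it |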
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ATE396943T1 (de) * | 2004-06-04 | 2008-06-15 | Rue De Int Ltd | Sorting method for documents |
2012
- 2012-03-19: CN application CN201210072522.3A, patent CN102637205B (zh), status Active
Non-Patent Citations (2)
Title |
---|
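Xiang Xiaojun et al. Parallelization of massive text classification based on the Hadoop platform. Computer Science (《计算机科学》). 2011, Vol. 38, No. 10, pp. 184-188. |
Parallelization of massive text classification based on the Hadoop platform; Xiang Xiaojun et al.; Computer Science (《计算机科学》); 2011-10-31; Vol. 38, No. 10; pp. 184-188 * |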
Also Published As
Publication number | Publication date |
---|---|
CN102637205A (zh) | 2012-08-15 |
Similar Documents
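Publication | Publication Date | Title |
---|---|---|
CN107330446B (zh) | | Optimization method for deep convolutional neural networks for image classification |
US10719664B2 (en) | | Cross-media search method |
Xie et al. | | Comparison among dimensionality reduction techniques based on Random Projection for cancer classification |
CN105117429A (zh) | | Scene image annotation method based on active learning and multi-label multi-instance learning |
CN103886048B (zh) | | Clustering-based incremental digital book recommendation method |
CN104239554A (zh) | | Cross-domain, cross-category news comment sentiment prediction method |
CN101770580B (zh) | | Training method and classification method for a cross-domain text sentiment classifier |
CN107832353A (zh) | | Method for identifying false information on social media platforms |
CN105631482A (zh) | | Dangerous goods image classification method based on a convolutional neural network model |
CN103425996B (zh) | | Parallel distributed large-scale image recognition method |
CN104331506A (zh) | | Multi-class sentiment analysis method and system for bilingual microblog text |
CN102289522A (zh) | | Method for intelligent text classification |
CN102298646A (zh) | | Method and device for classifying subjective and objective text |
CN104881689A (zh) | | Multi-label active learning classification method and system |
CN104463208A (zh) | | Multi-view collaborative semi-supervised classification algorithm with combined labeling rules |
CN105005794A (zh) | | Image pixel semantic annotation method fusing multi-granularity context information |
CN103412878B (zh) | | Document topic partitioning method based on the community structure of a domain knowledge map |
CN113282701B (zh) | | Composition material generation method and apparatus, electronic device, and readable storage medium |
CN106959946A (zh) | | Deep-learning-based optimization method for text semantic feature generation |
CN110727758A (zh) | | Public opinion analysis method and system based on concatenation of multi-length text vectors |
CN106776740A (zh) | | Social network text clustering method based on convolutional neural networks |
CN107292348A (zh) | | Bagging_BSJ short text classification method |
CN105184322B (zh) | | Multi-temporal image classification method based on incremental ensemble learning |
CN102637205B (zh) | | Document classification method based on Hadoop |
CN103258212A (zh) | | Semi-supervised ensemble remote sensing image classification method based on affinity propagation clustering |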
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
C41 | Transfer of patent application or patent right or utility model | |
TR01 | Transfer of patent right | Effective date of registration: 20161026. Address after: No. 163 Xianlin Avenue, Nanjing, Jiangsu Province, 210046. Patentee after: Nanjing University. Address before: No. 163 Xianlin Avenue, Nanjing, Jiangsu Province, 210046. Patentee before: Nanjing University; Jiangyin Institute of Information Technology of Nanjing University. |
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20120815. Assignee: Xiamen Nebula sea Mdt InfoTech Ltd. Assignor: Nanjing Univ. Contract record no.: 2016320000239. Denomination of invention: Document classification method based on Hadoop. Granted publication date: 20141015. License type: Exclusive License. Record date: 20161228. |
LICC | Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model | |