AU1907300A - Term-length term-frequency method for measuring document similarity and classifying text
Info
- Publication number
- AU1907300A
- Authority
- AU
- Australia
- Prior art keywords
- term
- frequency method
- length
- document similarity
- classifying text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US20156998A | 1998-11-30 | 1998-11-30 | |
| US09201569 | 1998-11-30 | | |
| PCT/US1999/025686 WO2000033215A1 (en) | 1998-11-30 | 1999-11-01 | Term-length term-frequency method for measuring document similarity and classifying text |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| AU1907300A (en) | 2000-06-19 |
Family
ID=22746357
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU19073/00A Abandoned AU1907300A (en) | 1998-11-30 | 1999-11-01 | Term-length term-frequency method for measuring document similarity and classifying text |
Country Status (2)
| Country | Link |
|---|---|
| AU (1) | AU1907300A (en) |
| WO (1) | WO2000033215A1 (en) |
Families Citing this family (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP3573688B2 (en) * | 2000-06-28 | 2004-10-06 | 松下電器産業株式会社 | Similar document search device and related keyword extraction device |
| AUPR208000A0 (en) * | 2000-12-15 | 2001-01-11 | 80-20 Software Pty Limited | Method of document searching |
| US7412453B2 (en) | 2002-12-30 | 2008-08-12 | International Business Machines Corporation | Document analysis and retrieval |
| EP1462950B1 (en) * | 2003-03-27 | 2007-08-29 | Sony Deutschland GmbH | Method for language modelling |
| US7321880B2 (en) | 2003-07-02 | 2008-01-22 | International Business Machines Corporation | Web services access to classification engines |
| WO2007057809A2 (en) * | 2005-11-15 | 2007-05-24 | Koninklijke Philips Electronics N.V. | Method of obtaining a representation of a text |
| JP5027483B2 (en) * | 2006-11-10 | 2012-09-19 | 富士通株式会社 | Information search apparatus and information search method |
| US8244767B2 (en) | 2009-10-09 | 2012-08-14 | Stratify, Inc. | Composite locality sensitive hash based processing of documents |
| US9355171B2 (en) | 2009-10-09 | 2016-05-31 | Hewlett Packard Enterprise Development Lp | Clustering of near-duplicate documents |
| CN103218435B (en) * | 2013-04-15 | 2017-01-25 | 上海嘉之道企业管理咨询有限公司 | Method and system for clustering Chinese text data |
| US8837835B1 (en) | 2014-01-20 | 2014-09-16 | Array Technology, LLC | Document grouping system |
| CN114492446B (en) * | 2022-02-16 | 2023-06-16 | 平安科技(深圳)有限公司 | Legal document processing method and device, electronic equipment and storage medium |
| CN115757680A (en) * | 2022-11-16 | 2023-03-07 | 北京彩智科技有限公司 | Keyword extraction method and device, electronic equipment and storage medium |
| CN116933766B (en) * | 2023-06-02 | 2024-08-16 | 盐城工学院 | An Ad-hoc Information Retrieval Model Based on Triple Frequency Scheme |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5748953A (en) * | 1989-06-14 | 1998-05-05 | Hitachi, Ltd. | Document search method wherein stored documents and search queries comprise segmented text data of spaced, nonconsecutive text elements and words segmented by predetermined symbols |
| JP3270783B2 (en) * | 1992-09-29 | 2002-04-02 | ゼロックス・コーポレーション | Multiple document search methods |
| US5642502A (en) * | 1994-12-06 | 1997-06-24 | University Of Central Florida | Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text |
1999
- 1999-11-01 AU AU19073/00A patent/AU1907300A/en not_active Abandoned
- 1999-11-01 WO PCT/US1999/025686 patent/WO2000033215A1/en not_active Ceased
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105956010A (en) * | 2016-04-20 | 2016-09-21 | 浙江大学 | Distributed information retrieval set selection method based on distributed representation and local ordering |
| CN105956010B (en) * | 2016-04-20 | 2019-03-26 | 浙江大学 | Distributed Information Retrieval Set Selection Method Based on Distributed Representation and Local Sorting |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2000033215A1 (en) | 2000-06-08 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| AU2001264928A1 (en) | System and method for automatically classifying text | |
| AU4905997A (en) | Management and analysis of document information text | |
| EP0996927A4 (en) | Text classification system and method | |
| AU3869899A (en) | Secure document reader and method therefor | |
| AU4698899A (en) | Computer audio reading device providing highlighting of either character or bitmapped based text images | |
| GB2345771B (en) | Apparatus for classifying or disambiguating data | |
| AUPP764398A0 (en) | Method and apparatus for computing the similarity between images | |
| AU4320299A (en) | Methods and apparatuses for processing security documents | |
| GB2318439B (en) | Device and method for representing handwriting, and an alphabet therefor | |
| AU2001275422A1 (en) | Method and system for text analysis | |
| AU1432001A (en) | Method for analyzing and classifying three dimensional seismic information | |
| AU3825197A (en) | Book-like interface for browsing on-line documents and methods therefor | |
| AU4043797A (en) | Method and apparatus for processing and determining the orientation of documents | |
| AU2001287447A1 (en) | Information system and method using analyses based on object-centric longitudinal data | |
| AU6401599A (en) | Environmental material ticket reader (emtr) and environmental material ticket (emt) system | |
| AU1580100A (en) | Classification method and apparatus | |
| AU6265999A (en) | Computer curve construction system and method | |
| AU4620899A (en) | Electronic file retrieval method and system | |
| AU1907300A (en) | Term-length term-frequency method for measuring document similarity and classifying text | |
| AU2198300A (en) | Improved techniques for spatial representation of data and browsing based on similarity | |
| IL138806A0 (en) | System and method for searching electronic documents created with optical character recognition | |
| AU2277900A (en) | Method and device for object recognition | |
| GB2320390B (en) | Document paper size detection | |
| AU2002243337A1 (en) | System and method for providing data analysis and interpretation | |
| AU2002217843A1 (en) | Computer method and apparatus for classifying objects |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | MK6 | Application lapsed section 142(2)(f)/reg. 8.3(3) - pct applic. not entering national phase | |