Term-length term-frequency method for measuring document similarity and classifying text

Info

Publication number
AU1907300A
Authority
AU
Australia
Prior art keywords
term
frequency method
length
document similarity
classifying text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU19073/00A
Inventor
Mark Kantrowitz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JustSystems Corp
Original Assignee
Justsystem Pittsburgh Research Center Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1998-11-30
Filing date
1999-11-01
Publication date
2000-06-19
Application filed by Justsystem Pittsburgh Research Center Inc
Publication of AU1907300A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US20156998A 1998-11-30 1998-11-30
US09201569 1998-11-30
PCT/US1999/025686 WO2000033215A1 (en) 1998-11-30 1999-11-01 Term-length term-frequency method for measuring document similarity and classifying text

Publications (1)

Publication Number Publication Date
AU1907300A (en) 2000-06-19

Family

ID=22746357

Family Applications (1)

Application Number Title Priority Date Filing Date
AU19073/00A Abandoned AU1907300A (en) 1998-11-30 1999-11-01 Term-length term-frequency method for measuring document similarity and classifying text

Country Status (2)

Country Link
AU (1) AU1907300A (en)
WO (1) WO2000033215A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105956010A (en) * | 2016-04-20 | 2016-09-21 | 浙江大学 | Distributed information retrieval set selection method based on distributed representation and local ordering

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP3573688B2 (en) * | 2000-06-28 | 2004-10-06 | 松下電器産業株式会社 | Similar document search device and related keyword extraction device
AUPR208000A0 (en) * | 2000-12-15 | 2001-01-11 | 80-20 Software Pty Limited | Method of document searching
US7412453B2 (en) | 2002-12-30 | 2008-08-12 | International Business Machines Corporation | Document analysis and retrieval
EP1462950B1 (en) * | 2003-03-27 | 2007-08-29 | Sony Deutschland GmbH | Method for language modelling
US7321880B2 (en) | 2003-07-02 | 2008-01-22 | International Business Machines Corporation | Web services access to classification engines
WO2007057809A2 (en) * | 2005-11-15 | 2007-05-24 | Koninklijke Philips Electronics N.V. | Method of obtaining a representation of a text
JP5027483B2 (en) * | 2006-11-10 | 2012-09-19 | 富士通株式会社 | Information search apparatus and information search method
US8244767B2 (en) | 2009-10-09 | 2012-08-14 | Stratify, Inc. | Composite locality sensitive hash based processing of documents
US9355171B2 (en) | 2009-10-09 | 2016-05-31 | Hewlett Packard Enterprise Development Lp | Clustering of near-duplicate documents
CN103218435B (en) * | 2013-04-15 | 2017-01-25 | 上海嘉之道企业管理咨询有限公司 | Method and system for clustering Chinese text data
US8837835B1 (en) | 2014-01-20 | 2014-09-16 | Array Technology, LLC | Document grouping system
CN114492446B (en) * | 2022-02-16 | 2023-06-16 | 平安科技(深圳)有限公司 | Legal document processing method and device, electronic equipment and storage medium
CN115757680A (en) * | 2022-11-16 | 2023-03-07 | 北京彩智科技有限公司 | Keyword extraction method and device, electronic equipment and storage medium
CN116933766B (en) * | 2023-06-02 | 2024-08-16 | 盐城工学院 | An Ad-hoc Information Retrieval Model Based on Triple Frequency Scheme

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5748953A (en) * | 1989-06-14 | 1998-05-05 | Hitachi, Ltd. | Document search method wherein stored documents and search queries comprise segmented text data of spaced, nonconsecutive text elements and words segmented by predetermined symbols
JP3270783B2 (en) * | 1992-09-29 | 2002-04-02 | ゼロックス・コーポレーション | Multiple document search methods
US5642502A (en) * | 1994-12-06 | 1997-06-24 | University Of Central Florida | Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105956010A (en) * | 2016-04-20 | 2016-09-21 | 浙江大学 | Distributed information retrieval set selection method based on distributed representation and local ordering
CN105956010B (en) * | 2016-04-20 | 2019-03-26 | 浙江大学 | Distributed Information Retrieval Set Selection Method Based on Distributed Representation and Local Sorting

Also Published As

Publication number Publication date
WO2000033215A1 (en) 2000-06-08

Similar Documents

Publication Publication Date Title
AU2001264928A1 (en) System and method for automatically classifying text
AU4905997A (en) Management and analysis of document information text
EP0996927A4 (en) Text classification system and method
AU3869899A (en) Secure document reader and method therefor
AU4698899A (en) Computer audio reading device providing highlighting of either character or bitmapped based text images
GB2345771B (en) Apparatus for classifying or disambiguating data
AUPP764398A0 (en) Method and apparatus for computing the similarity between images
AU4320299A (en) Methods and apparatuses for processing security documents
GB2318439B (en) Device and method for representing handwriting, and an alphabet therefor
AU2001275422A1 (en) Method and system for text analysis
AU1432001A (en) Method for analyzing and classifying three dimensional seismic information
AU3825197A (en) Book-like interface for browsing on-line documents and methods therefor
AU4043797A (en) Method and apparatus for processing and determining the orientation of documents
AU2001287447A1 (en) Information system and method using analyses based on object-centric longitudinal data
AU6401599A (en) Environmental material ticket reader (emtr) and environmental material ticket (emt) system
AU1580100A (en) Classification method and apparatus
AU6265999A (en) Computer curve construction system and method
AU4620899A (en) Electronic file retrieval method and system
AU1907300A (en) Term-length term-frequency method for measuring document similarity and classifying text
AU2198300A (en) Improved techniques for spatial representation of data and browsing based on similarity
IL138806A0 (en) System and method for searching electronic documents created with optical character recognition
AU2277900A (en) Method and device for object recognition
GB2320390B (en) Document paper size detection
AU2002243337A1 (en) System and method for providing data analysis and interpretation
AU2002217843A1 (en) Computer method and apparatus for classifying objects

Legal Events

Date Code Title Description
MK6 Application lapsed section 142(2)(f)/reg. 8.3(3) - pct applic. not entering national phase