[go: up one dir, main page]

Agarwal et al., 2013 - Google Patents

A high-speed and large-scale dictionary matching engine for information extraction systems

Agarwal et al., 2013

View PDF
Document ID
10580573574076663441
Author
Agarwal K
Polig R
Publication year
Publication venue
2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors

External Links

Snippet

Dictionary matching is a commonly used operation in Information Extraction (IE) systems. It involves matching a set of strings in a document against a dictionary of pre-defined patterns. In this paper, we describe a high performance and scalable hardware architecture to enable …
Continue reading at pdfs.semanticscholar.org (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30943Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
    • G06F17/30964Querying
    • G06F17/30979Query processing
    • G06F17/30985Query processing by using string matching techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30613Indexing
    • G06F17/30619Indexing indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634Querying
    • G06F17/30657Query processing
    • G06F17/30675Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30943Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
    • G06F17/30946Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
    • G06F17/30949Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30386Retrieval requests
    • G06F17/30424Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30705Clustering or classification
    • G06F17/3071Clustering or classification including class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30587Details of specialised database models
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F2207/00Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/02Indexing scheme relating to groups G06F7/02 - G06F7/026
    • G06F2207/025String search, i.e. pattern matching, e.g. find identical word or best match in a string
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures

Similar Documents

Publication Publication Date Title
US10339141B2 (en) Detecting at least one predetermined pattern in stream of symbols
Goel et al. Small subset queries and bloom filters using ternary associative memories, with applications
US10579661B2 (en) System and method for machine learning and classifying data
US8266179B2 (en) Method and system for processing text
EP2421223A2 (en) Method and Apparatus for Approximate Pattern Matching
US20090210412A1 (en) Method for searching and indexing data and a system for implementing same
Breitinger et al. FRASH: A framework to test algorithms of similarity hashing
CN107567621A (en) For performing the method, system and computer program product of numeric search
Upchurch et al. Malware provenance: code reuse detection in malicious software at scale
Berberich et al. Computing n-gram statistics in MapReduce
CN104881439A (en) Method and system for space-efficient multi-pattern matching
Sood et al. Probabilistic near-duplicate detection using simhash
US20130332146A1 (en) High Speed Large Scale Dictionary Matching
Agarwal et al. A high-speed and large-scale dictionary matching engine for information extraction systems
Hahn et al. Raw filtering of JSON data on FPGAs
Sinha et al. Improving suffix array locality for fast pattern matching on disk
Lin et al. Pipelined parallel ac-based approach for multi-string matching
Li et al. Forensic analysis of document fragment based on SVM
CN113642331B (en) Financial named entity identification method and system, storage medium and terminal
CN110347804A (en) A kind of sensitive information detection method of linear time complexity
Tseng et al. A fast scalable automaton-matching accelerator for embedded content processors
Vanderbauwhede et al. A hybrid CPU-FPGA system for high throughput (10Gb/s) streaming document classification
Wang et al. Hardware accelerator to detect multi-segment virus patterns
Lin et al. A platform-based SoC design and implementation of scalable automaton matching for deep packet inspection
Ebrahim Finding the Top-K Heavy Hitters in Data Streams: A Reconfigurable Accelerator Based on an FPGA-Optimized Algorithm. Electronics 2023, 12, 2376