Agarwal et al., 2013 - Google Patents
A high-speed and large-scale dictionary matching engine for information extraction systemsAgarwal et al., 2013
View PDF- Document ID
- 10580573574076663441
- Author
- Agarwal K
- Polig R
- Publication year
- Publication venue
- 2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors
External Links
Snippet
Dictionary matching is a commonly used operation in Information Extraction (IE) systems. It involves matching a set of strings in a document against a dictionary of pre-defined patterns. In this paper, we describe a high performance and scalable hardware architecture to enable …
- 238000000605 extraction 0 title abstract description 17
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30964—Querying
- G06F17/30979—Query processing
- G06F17/30985—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
- G06F17/30949—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/3071—Clustering or classification including class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30587—Details of specialised database models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/02—Indexing scheme relating to groups G06F7/02 - G06F7/026
- G06F2207/025—String search, i.e. pattern matching, e.g. find identical word or best match in a string
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10339141B2 (en) | Detecting at least one predetermined pattern in stream of symbols | |
| Goel et al. | Small subset queries and bloom filters using ternary associative memories, with applications | |
| US10579661B2 (en) | System and method for machine learning and classifying data | |
| US8266179B2 (en) | Method and system for processing text | |
| EP2421223A2 (en) | Method and Apparatus for Approximate Pattern Matching | |
| US20090210412A1 (en) | Method for searching and indexing data and a system for implementing same | |
| Breitinger et al. | FRASH: A framework to test algorithms of similarity hashing | |
| CN107567621A (en) | For performing the method, system and computer program product of numeric search | |
| Upchurch et al. | Malware provenance: code reuse detection in malicious software at scale | |
| Berberich et al. | Computing n-gram statistics in MapReduce | |
| CN104881439A (en) | Method and system for space-efficient multi-pattern matching | |
| Sood et al. | Probabilistic near-duplicate detection using simhash | |
| US20130332146A1 (en) | High Speed Large Scale Dictionary Matching | |
| Agarwal et al. | A high-speed and large-scale dictionary matching engine for information extraction systems | |
| Hahn et al. | Raw filtering of JSON data on FPGAs | |
| Sinha et al. | Improving suffix array locality for fast pattern matching on disk | |
| Lin et al. | Pipelined parallel ac-based approach for multi-string matching | |
| Li et al. | Forensic analysis of document fragment based on SVM | |
| CN113642331B (en) | Financial named entity identification method and system, storage medium and terminal | |
| CN110347804A (en) | A kind of sensitive information detection method of linear time complexity | |
| Tseng et al. | A fast scalable automaton-matching accelerator for embedded content processors | |
| Vanderbauwhede et al. | A hybrid CPU-FPGA system for high throughput (10Gb/s) streaming document classification | |
| Wang et al. | Hardware accelerator to detect multi-segment virus patterns | |
| Lin et al. | A platform-based SoC design and implementation of scalable automaton matching for deep packet inspection | |
| Ebrahim | Finding the Top-K Heavy Hitters in Data Streams: A Reconfigurable Accelerator Based on an FPGA-Optimized Algorithm. Electronics 2023, 12, 2376 |