Zhou et al., 2023 - Google Patents
ForestZip: An Effective Parallel Parser for Log Compression
Zhou et al., 2023
- Document ID
- 653570963222931083
- Author
- Zhou Y
- Su Y
- Publication year
- 2023
- Publication venue
- Proceedings of the 2023 3rd Guangdong-Hong Kong-Macao Greater Bay Area Artificial Intelligence and Big Data Forum
Snippet
Nowadays, cloud services generate a significant amount of log streams. Storing these log streams consumes a large amount of disk space and leads to high costs. Traditional compression tools and algorithms work well for small-scale text processing but are not …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30067—File systems; File servers
- G06F17/30129—Details of further file system functionalities
- G06F17/3015—Redundancy elimination performed by the file system
- G06F17/30156—De-duplication implemented within the file system, e.g. based on file segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30312—Storage and indexing structures; Management thereof
- G06F17/30321—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30289—Database design, administration or maintenance
- G06F17/30303—Improving data quality; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/3071—Clustering or classification including class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
-
- H—ELECTRICITY
- H03—BASIC ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information or similar information or a subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurrence of a fault, e.g. fault tolerance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
Similar Documents
| Publication | Title |
|---|---|
| US7996369B2 (en) | Method and apparatus for improving performance of approximate string queries using variable length high-quality grams |
| US8244767B2 (en) | Composite locality sensitive hash based processing of documents |
| RU2464630C2 (en) | Two-pass hash extraction of text strings |
| US8175875B1 (en) | Efficient indexing of documents with similar content |
| CN107210753B (en) | Lossless reduction of data by deriving data from prime data units residing in a content association filter |
| CN112800008A (en) | Compression, search and decompression of log messages |
| CN111950263A (en) | A log parsing method, system and electronic device |
| Nevill-Manning et al. | On-line and off-line heuristics for inferring hierarchies of repetitions in sequences |
| Yu et al. | Unlocking the power of numbers: Log compression via numeric token parsing |
| Ferragina et al. | Learned monotone minimal perfect hashing |
| Ferragina et al. | On optimally partitioning a text to improve its compression |
| Zhou et al. | ForestZip: An Effective Parallel Parser for Log Compression |
| US20150082142A1 (en) | Method for storing and applying related sets of pattern/message rules |
| Deypir et al. | EclatDS: An efficient sliding window based frequent pattern mining method for data streams |
| Zhang | Transform based and search aware text compression schemes and compressed domain text retrieval |
| CN111488439B (en) | System and method for saving and analyzing log data |
| Dong et al. | Content-aware partial compression for big textual data analysis acceleration |
| Nevill-Manning et al. | Phrase hierarchy inference and compression in bounded space |
| Nishimoto et al. | Dynamic suffix array in optimal compressed space |
| Klein | Improving static compression schemes by alphabet extension |
| Platos et al. | Word-based text compression |
| Bookstein et al. | Models of bitmap generation: A systematic approach to bitmap compression |
| Xie et al. | OCSL: An Online Compression Scheme for Streaming Semi-Structured Logs |
| Amir et al. | Quasi-distinct parsing and optimal compression methods |
| Liu et al. | LogPrism: Unifying Structure and Variable Encoding for Effective Log Compression |