[go: up one dir, main page]

Zhou et al., 2023 - Google Patents

ForestZip: An Effective Parallel Parser for Log Compression

Zhou et al., 2023

View PDF
Document ID
653570963222931083
Author
Zhou Y
Su Y
Publication year
Publication venue
Proceedings of the 2023 3rd Guangdong-Hong Kong-Macao Greater Bay Area Artificial Intelligence and Big Data Forum

External Links

Snippet

Nowadays, cloud services generate a significant amount of log streams. Storing these log streams consumes a large amount of disk space and leads to high costs. Traditional compression tools and algorithms work well for small-scale text processing but are not …
Continue reading at dl.acm.org (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30067File systems; File servers
    • G06F17/30129Details of further file system functionalities
    • G06F17/3015Redundancy elimination performed by the file system
    • G06F17/30156De-duplication implemented within the file system, e.g. based on file segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30613Indexing
    • G06F17/30619Indexing indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30312Storage and indexing structures; Management thereof
    • G06F17/30321Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30289Database design, administration or maintenance
    • G06F17/30303Improving data quality; Data cleansing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30943Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
    • G06F17/30946Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30705Clustering or classification
    • G06F17/3071Clustering or classification including class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/21Text processing
    • G06F17/22Manipulating or registering by use of codes, e.g. in sequence of text characters
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2765Recognition
    • HELECTRICITY
    • H03BASIC ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information or similar information or a subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled

Similar Documents

Publication Publication Date Title
US7996369B2 (en) Method and apparatus for improving performance of approximate string queries using variable length high-quality grams
US8244767B2 (en) Composite locality sensitive hash based processing of documents
RU2464630C2 (en) Two-pass hash extraction of text strings
US8175875B1 (en) Efficient indexing of documents with similar content
CN107210753B (en) Lossless reduction of data by deriving data from prime data units residing in a content association filter
CN112800008A (en) Compression, search and decompression of log messages
CN111950263A (en) A log parsing method, system and electronic device
Nevill-Manning et al. On-line and off-line heuristics for inferring hierarchies of repetitions in sequences
Yu et al. Unlocking the power of numbers: Log compression via numeric token parsing
Ferragina et al. Learned monotone minimal perfect hashing
Ferragina et al. On optimally partitioning a text to improve its compression
Zhou et al. ForestZip: An Effective Parallel Parser for Log Compression
US20150082142A1 (en) Method for storing and applying related sets of pattern/message rules
Deypir et al. EclatDS: An efficient sliding window based frequent pattern mining method for data streams
Zhang Transform based and search aware text compression schemes and compressed domain text retrieval
CN111488439B (en) System and method for saving and analyzing log data
Dong et al. Content-aware partial compression for big textual data analysis acceleration
Nevill-Manning et al. Phrase hierarchy inference and compression in bounded space
Nishimoto et al. Dynamic suffix array in optimal compressed space
Klein Improving static compression schemes by alphabet extension
Platos et al. Word-based text compression
Bookstein et al. Models of bitmap generation: A systematic approach to bitmap compression
Xie et al. OCSL: An Online Compression Scheme for Streaming Semi-Structured Logs
Amir et al. Quasi-distinct parsing and optimal compression methods
Liu et al. LogPrism: Unifying Structure and Variable Encoding for Effective Log Compression