Selimovic, 2019 - Google Patents

Compressing Massive Sequencing Data with Multiple Attribute Tree

Document ID: 17107973334260349913
Author: Selimovic D
Publication year: 2019

Snippet

The significant drop in DNA sequencing costs brought about by Next-Generation Sequencing has led to the production of massive amounts of raw sequencing data. This data is stored in FASTQ files, which are text files containing a large number of reads, each composed of a …

Full text: core.ac.uk (PDF)

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
            - G06F17/30312—Storage and indexing structures; Management thereof
              - G06F17/30321—Indexing structures
            - G06F17/30587—Details of specialised database models
            - G06F17/30386—Retrieval requests
          - G06F17/30067—File systems; File servers
            - G06F17/30129—Details of further file system functionalities
              - G06F17/3015—Redundancy elimination performed by the file system
                - G06F17/30156—De-duplication implemented within the file system, e.g. based on file segments
                - G06F17/30153—Redundancy elimination performed by the file system using compression, e.g. sparse files
          - G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
            - G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
      - G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
        - G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
          - G06F19/22—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
          - G06F19/14—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for phylogeny or evolution, e.g. evolutionarily conserved regions determination or phylogenetic tree construction
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
      - G06F8/00—Arrangements for software engineering
        - G06F8/40—Transformations of program code

Similar Documents

Publication	Publication Date	Title
CN110506272B (en)	2023-08-01	Method and device for accessing bioinformatic data structured in access units
Navarro	2016	Compact data structures: A practical approach
Ferragina et al.	2012	Lightweight data indexing and compression in external memory
Benoit et al.	2015	Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph
Novak et al.	2017	A graph extension of the positional Burrows–Wheeler transform and its applications
US20070255748A1 (en)	2007-11-01	Method of structuring and compressing labeled trees of arbitrary degree and shape
Deorowicz	2020	FQSqueezer: k-mer-based compression of sequencing data
Yanovsky	2011	ReCoil: an algorithm for compression of extremely large datasets of DNA data
Qu et al.	2022	Clover: tree structure-based efficient DNA clustering for DNA-based data storage
CN110168652B (en)	2023-11-21	Method and system for storing and accessing bioinformatic data
Sirén	2016	Burrows-Wheeler transform for terabases
Liu et al.	2014	GPU-accelerated BWT construction for large collection of short reads
Pibiri et al.	2024	Meta-colored compacted de Bruijn graphs
KR20190062551A (en)	2019-06-05	Method and apparatus for accessing bioinformatics data structured as an access unit
Sirén	2012	Compressed Full-Text Indexes for Highly Repetitive Collections
Alanko et al.	2022	Succinct k-mer sets using subset rank queries on the spectral Burrows-Wheeler transform
White et al.	2008	Compressing DNA sequence databases with coil
KR20190113971A (en)	2019-10-08	Compression representation method and apparatus of bioinformatics data using multiple genome descriptors
Guerra et al.	2016	Performance comparison of sequential and parallel compression applications for DNA raw data
Selimovic	2019	Compressing Massive Sequencing Data with Multiple Attribute Tree
Ediger et al.	2013	Computational graph analytics for massive streaming data
Huo et al.	2016	CS2A: A compressed suffix array-based method for short read alignment
Chen et al.	2022	CMIC: an efficient quality score compressor with random access functionality
US20240119027A1 (en)	2024-04-11	Compression and search process on a data set based on multiple strategies
Womack	2019	Cigarcoil: A new algorithm for the compression of DNA sequencing data