Selimovic, 2019 - Google Patents
Compressing Massive Sequencing Data with Multiple Attribute Tree (Selimovic, 2019)
- Document ID: 17107973334260349913
- Author: Selimovic D
- Publication year: 2019
Snippet
The significant drop in DNA Sequencing costs caused by Next-Generation Sequencing has led to the production of massive amounts of raw sequencing data. This data is stored in FASTQ files, which are text files containing a large number of reads, each composed of a …
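The snippet above describes FASTQ files as text files of reads. Conventionally, each FASTQ record spans four lines: an `@`-prefixed read identifier, the base sequence, a `+` separator line, and a quality string with one character per base. The minimal sketch below splits a FASTQ stream into records. It assumes the simple unwrapped four-line layout and Unix line endings. The names `FastqRecord` and `read_fastq` are hypothetical, and the code only illustrates the input format. It is not the multiple-attribute-tree compression method described in this document.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    # Hypothetical container for one read (not from the source document).
    header: str    # read identifier without the leading '@'
    sequence: str  # bases, e.g. "ACGT"
    quality: str   # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream.

    Assumption: no line wrapping and '\\n' line endings.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input reached on a record boundary
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGC\n+\n!!#\n")
for rec in read_fastq(example):
    print(rec.header, rec.sequence, rec.quality)
```

Splitting records into separate fields (identifier, bases, qualities) is the usual first step for FASTQ compressors, because each field has very different statistics and is typically modeled separately.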
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
            - G06F17/30312—Storage and indexing structures; Management thereof
              - G06F17/30321—Indexing structures
            - G06F17/30587—Details of specialised database models
            - G06F17/30386—Retrieval requests
          - G06F17/30067—File systems; File servers
            - G06F17/30129—Details of further file system functionalities
              - G06F17/3015—Redundancy elimination performed by the file system
                - G06F17/30156—De-duplication implemented within the file system, e.g. based on file segments
                - G06F17/30153—Redundancy elimination performed by the file system using compression, e.g. sparse files
          - G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
            - G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
      - G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
        - G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
          - G06F19/22—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
          - G06F19/14—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for phylogeny or evolution, e.g. evolutionarily conserved regions determination or phylogenetic tree construction
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
      - G06F8/00—Arrangements for software engineering
        - G06F8/40—Transformations of program code
Similar Documents
Publication or author | Title |
---|---|
CN110506272B (en) | Method and device for accessing bioinformatic data structured in access units |
Navarro | Compact data structures: A practical approach |
Ferragina et al. | Lightweight data indexing and compression in external memory |
Benoit et al. | Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph |
Novak et al. | A graph extension of the positional Burrows–Wheeler transform and its applications |
US20070255748A1 (en) | Method of structuring and compressing labeled trees of arbitrary degree and shape |
Deorowicz | FQSqueezer: k-mer-based compression of sequencing data |
Yanovsky | ReCoil-an algorithm for compression of extremely large datasets of DNA data |
Qu et al. | Clover: tree structure-based efficient DNA clustering for DNA-based data storage |
CN110168652B (en) | Method and system for storing and accessing bioinformatic data |
Sirén | Burrows-Wheeler transform for terabases |
Liu et al. | GPU-accelerated BWT construction for large collection of short reads |
Pibiri et al. | Meta-colored compacted de Bruijn graphs |
KR20190062551A (en) | Method and apparatus for accessing bioinformatics data structured as an access unit |
Sirén | Compressed Full-Text Indexes for Highly Repetitive Collections |
Alanko et al. | Succinct k-mer sets using subset rank queries on the spectral Burrows-Wheeler transform |
White et al. | Compressing DNA sequence databases with coil |
KR20190113971A (en) | Compression representation method and apparatus of bioinformatics data using multiple genome descriptors |
Guerra et al. | Performance comparison of sequential and parallel compression applications for DNA raw data |
Selimovic | Compressing Massive Sequencing Data with Multiple Attribute Tree |
Ediger et al. | Computational graph analytics for massive streaming data |
Huo et al. | CS2A: A compressed suffix array-based method for short read alignment |
Chen et al. | CMIC: an efficient quality score compressor with random access functionality |
US20240119027A1 (en) | Compression and search process on a data set based on multiple strategies |
Womack | Cigarcoil: A new algorithm for the compression of DNA sequencing data |