Nothaft, 2017 - Google Patents

Scalable systems and algorithms for genomic variant analysis

Document ID
5681884128893361648
Author
Nothaft F
Publication year
2017
Snippet

With the cost of sequencing a human genome dropping below $1,000, population-scale sequencing has become feasible. With projects that sequence more than 10,000 genomes becoming commonplace, there is a strong need for genome analysis tools that can scale …
Continue reading at escholarship.org (PDF) (other versions)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30386 Retrieval requests
    • G06F17/30424 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30312 Storage and indexing structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30289 Database design, administration or maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/22 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/18 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for functional genomics or proteomics, e.g. genotype-phenotype associations, linkage disequilibrium, population genetics, binding site identification, mutagenesis, genotyping or genome annotation, protein-protein interactions or protein-nucleic acid interactions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformations of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/28 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for programming tools or database systems, e.g. ontologies, heterogeneous data integration, data warehousing or computing architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/76 Adapting program code to run in a different environment; Porting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements

Similar Documents

Publication Publication Date Title
Garrison et al. A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar
Van der Auwera et al. From FastQ data to high‐confidence variant calls: the genome analysis toolkit best practices pipeline
US10600217B2 (en) Methods for the graphical representation of genomic sequence data
Timón-Reina et al. An overview of graph databases and their applications in the biomedical domain
JP5791149B2 (en) Computer-implemented method, computer program, and data processing system for database query optimization
Coombe et al. ntLink: a toolkit for de novo genome assembly scaffolding and mapping using long reads
Nothaft Scalable systems and algorithms for genomic variant analysis
Li et al. ntEdit+Sealer: efficient targeted error resolution and automated finishing of long‐read genome assemblies
Nothaft Scalable Genome Resequencing with ADAM and avocado
Ricketts et al. Using LICHeE and BAMSE for reconstructing cancer phylogenetic trees
Shajii et al. A Python-based optimization framework for high-performance genomics
Robinson et al. Postprocessing the Alignment
US12164516B2 (en) Click-to-script reflection
Forer et al. Cloudflow: a framework for MapReduce pipeline development in biomedical research
Petkau A framework for the indexing, querying, clustering, and visualization of microbial genomes for surveillance and outbreak investigation
Schatz High performance computing for DNA sequence alignment and assembly
Spiegelberg et al. Tuplex: robust, efficient analytics when Python rules
Kaye Approaches to genome analysis through the application of graph theory
Ruano Implementing Bioinformatics Pipelines and User Interfaces for Selection of Immunotherapeutic Targets in Colorectal Cancer
Purohit PostGUI: A Modern Web Application for Sharing Biological Big Data
Oguchi A Comparison of Sensitive Splice Aware Aligners in RNA Sequence Data Analysis in Leaping towards Benchmarking
Novak Infrastructure for Scalable Analysis of Genomic Variation
Rengasamy Engineering High Performance Workflows for End-to-End Acceleration of Genomic Applications
Yuan BESS: bounded evaluation SQL systems
Kozanitis Compressing and Querying the Human Genome