Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity.
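The observation that some disease-causing alleles are fixed in other species can be illustrated with a minimal sketch. It assumes a pairwise alignment of a human protein and one ortholog, given as equal-length strings with "-" for gaps. The names `MissenseVariant` and `fixed_in_ortholog` and the toy sequences are hypothetical and are not taken from the study; the authors' actual pipeline is not described here.

```python
from typing import List, NamedTuple


class MissenseVariant(NamedTuple):
    position: int  # 1-based position in the ungapped human protein
    ref: str       # human reference amino acid
    alt: str       # disease-associated amino acid


def fixed_in_ortholog(human_aln: str, ortholog_aln: str,
                      variants: List[MissenseVariant]) -> List[MissenseVariant]:
    """Return variants whose disease allele is the ortholog's residue
    at the aligned position (hypothetical helper, for illustration)."""
    if len(human_aln) != len(ortholog_aln):
        raise ValueError("aligned sequences must have equal length")

    # Map each ungapped human position to its alignment column.
    column_of = {}
    position = 0
    for column, residue in enumerate(human_aln):
        if residue != "-":
            position += 1
            column_of[position] = column

    hits = []
    for variant in variants:
        column = column_of.get(variant.position)
        if column is None:
            continue  # position lies outside the human sequence
        if human_aln[column] != variant.ref:
            continue  # reference residue disagrees with the alignment
        if ortholog_aln[column] == variant.alt:
            hits.append(variant)
    return hits


# Toy example: invented sequences and variants.
human = "MKT-AYLR"
ortholog = "MRTSAYIR"
candidates = [
    MissenseVariant(2, "K", "R"),
    MissenseVariant(6, "L", "P"),
    MissenseVariant(6, "L", "I"),
    MissenseVariant(4, "A", "S"),
]
result = fixed_in_ortholog(human, ortholog, candidates)
```

A variant flagged this way is only a candidate: the sketch says nothing about which other residue, if any, compensates for it in the ortholog.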
How disease-associated mutations impair protein activities in the context of biological networks remains mostly undetermined. Although a few renowned alleles are well characterized, functional information is missing for over 100,000 disease-associated variants. Here we functionally profile several thousand missense mutations across a spectrum of Mendelian disorders using various interaction assays. The majority of disease-associated alleles exhibit wild-type chaperone binding profiles, suggesting they preserve protein folding or stability. While common variants from healthy individuals rarely affect interactions, two-thirds of disease-associated alleles perturb protein-protein interactions, with half corresponding to "edgetic" alleles affecting only a subset of interactions while leaving most other interactions unperturbed. With transcription factors, many alleles that leave protein-protein interactions intact affect DNA binding. Different mutations in the same gene leadin...
Whereas genome-era technologies produced the sequence of the complete human genome, modern post-genome technologies aim to understand the mechanisms by which genetic information is processed and to elucidate within-species variation. Single nucleotide polymorphisms (SNPs) account for the majority of polymorphism in the human population. Non-synonymous coding SNPs (nsSNPs), together with SNPs in regulatory regions, are believed to have the highest impact on complex disease etiology, quantitative traits and response to drug treatment. PolyPhen is a computational tool for predicting putatively functional nsSNPs, with application areas such as the genetics of complex disease, birth defects, identification of functional mutations in model organisms and evolutionary genetics.
Research Interests: Genetics, Molecular Biology, Polymorphism, Computational Biology, Evolutionary genetics, Information Processing, Genetics of complex disease, Humans, Human Genome, Single Nucleotide Polymorphism, Phenotypic variation, Quantitative Trait Loci, Amino Acid Substitution Rates, Congenital Defect, and Biochemistry and cell biology
The ability to sequence cost-effectively all of the coding regions of a given individual genome is rapidly approaching, with the potential for whole-genome resequencing not far behind. Initiatives are currently underway to phenotype hundreds of thousands of individuals for major human traits. Here, we determine the power for de novo discovery of genes related to human traits by resequencing all
The characterization of proteomes by mass spectrometry is largely limited to organisms with sequenced genomes. To identify proteins from organisms with unsequenced genomes, database sequences from related species must be employed for sequence-similarity protein identifications. Peptide sequence tags (Mann, 1994) have been used successfully for the identification of proteins in sequence databases using partially interpreted tandem mass spectra of tryptic peptides. We have extended the ability of sequence tag searching to the identification of proteins whose sequences are yet unknown but are homologous to known database entries. The MultiTag method presented here assigns statistical significance to matches of multiple error-tolerant sequence tags to a database entry and ranks alignments by their significance. The MultiTag approach has the distinct advantage over other sequence-similarity approaches of being able to perform sequence-similarity identifications using only very short (2-4) amino acid residue stretches of peptide sequences, rather than complete peptide sequences deduced by de novo interpretation of tandem mass spectra. This feature facilitates the identification of low abundance proteins, since noisy and low-intensity tandem mass spectra can be utilized.
Analysis of human genetic variation can shed light on the problem of the genetic basis of complex disorders. Nonsynonymous single nucleotide polymorphisms (SNPs), which affect the amino acid sequence of proteins, are believed to be the most frequent type of variation associated with the respective disease phenotype. Complete enumeration of nonsynonymous SNPs in the candidate genes will enable further association
The MultiTag method (Sunyaev et al., Anal. Chem. 2003, 75, 1307-1315) employs multiple error-tolerant searches with peptide sequence tags (Mann and Wilm, Anal. Chem. 1994, 66, 4390-4399) for the identification of proteins from organisms with unsequenced genomes. Here we demonstrate that the error-tolerant capabilities of MultiTag increased the number of peptide alignments and improved the confidence of identifications in an EST database. MultiTag outperformed conventional database searching software that relies only on stringent matching of tandem mass spectra to nucleotide sequences of ESTs.
Non-African populations have experienced size reductions in the time since their split from West Africans, leading to the hypothesis that natural selection to remove weakly deleterious mutations has been less effective in the history of non-Africans. To test this hypothesis, we measured the per-genome accumulation of nonsynonymous substitutions across diverse pairs of populations. We find no evidence for a higher load of deleterious mutations in non-Africans. However, we detect significant differences among more divergent populations, as archaic Denisovans have accumulated nonsynonymous mutations faster than either modern humans or Neanderthals. To reconcile these findings with patterns that have been interpreted as evidence of the less effective removal of deleterious mutations in non-Africans than in West Africans, we use simulations to show that the observed patterns are not likely to reflect changes in the effectiveness of selection after the populations split but are instead li...
We propose a method for estimating the evolutionary distance between DNA sequences in terms of insertions and deletions (indels), defined as the per-site number of indels accumulated in the course of divergence of the two sequences. We derive a maximum likelihood estimate of this distance from differences between lengths of orthologous introns or other segments of sequences delimited by
The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic info...
Amino acid composition of proteins varies substantially between taxa and, thus, can evolve. For example, proteins from organisms with (G + C)-rich (or (A + T)-rich) genomes contain more (or fewer) amino acids encoded by (G + C)-rich codons. However, no universal trends in ongoing changes of amino acid frequencies have been reported. We compared sets of orthologous proteins encoded
Structural biology can provide three-dimensional structures for proteins of unknown function. When sequence or structure comparisons fail to suggest a function, insights can come from discovery of functionally important local structural patterns. Existing methods to detect such patterns lack rigorous statistics needed for widespread application. Here, we derive a formula to calculate statistical significance of the root-mean-square deviation between atoms
We study the fitness landscape in the space of protein sequences by relating sets of human pathogenic missense mutations in 32 proteins to amino acid substitutions that occurred in the course of evolution of these proteins. On average, 10% of deviations of a nonhuman protein from its human ortholog are compensated pathogenic deviations (CPDs), i.e., are caused by an amino acid
Research Interests: Molecular Evolution, Multidisciplinary, Humans, Sequence alignment, Mutation, Protein evolution, Protein structure, Proteins, Protein Sequence Analysis, Positive Selection, Protein Secondary Structure Prediction, Amino Acid Profile, Protein Conformation, Amino Acid Sequence, and Amino Acid Substitution Rates
INDIVIDUAL VARIATION IN PROTEIN-CODING SEQUENCES OF HUMAN GENOME SHAMIL SUNYAEV, JENS HANKE, DAVID BRETT, ATAKAN AYDIN, INGA ZASTROW, WARREN LATHE, PEER BORK, and JENS REICH Max-Delbrück-Centrum of Molecular Medicine, ...
Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component of risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases than in controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even high...
Mammalian genomes contain many highly conserved nongenic sequences (CNGs) whose functional significance is poorly understood. Sets of CNGs have previously been identified by selecting the most conserved elements from a chromosome or genome, but in these highly selected samples, conservation may be unrelated to purifying selection. Furthermore, conservation of CNGs may be caused by mutation rate variation rather than selective
Balancing selection has been shown to act on several genes in short-term evolutionary contexts, but it is not known whether this force is responsible for maintaining a significant number of long-term polymorphisms. We aligned 7628 chimpanzee virtual transcripts and 5524 chimp ESTs to the 4x chimp draft genome assembly and identified polymorphisms in chimpanzee that also occurred in the human single nucleotide polymorphism database (dbSNP). Our analysis suggests that the incidence of ancestral polymorphism is low or absent and that balancing selection on the time-scale of chimpanzee-human divergence has not been a significant force in human evolution.
The parametric description of residue environments through solvent accessibility, backbone conformation, or pairwise residue-residue distances is the key to the comparison between amino acid types at protein sequence positions and residue locations in structural templates (the condition of a protein sequence-structure match). For the first time, the results presented in this study clarify and make it possible to quantify, on a rigorous statistical basis, to what extent the amino acid type-specific distributions of commonly used environment parameters discriminate among the 20 amino acid types. Relying on the Bahadur theory, we estimate the probability of error in a single-sequence-structure alignment based on weak or absent discriminative power in a learning database of protein structures. We present the results for many residue environment variables and demonstrate that each fold description parameter is sensitive to only a few amino acid types while indifferent to most of the others. Even complex structural characteristics combining solvent-accessible surface area, backbone conformation, and pairwise distances distinguish only some amino acid types, whereas the others remain nondiscriminated. We find that the knowledge-based potentials currently in use treat Ala, Asp, Gln, His, Ser, Thr, and Tyr in particular as essentially "average" amino acids. Thus, highly discriminative amino acid types define the alignment register in gapless sequence-structure alignments. The introduction of gaps leads to alignment ambiguities at sequence positions occupied by nondiscriminated amino acid types. Therefore, local sequence-structure alignments produced by techniques with gaps cannot be reliable. Conceptually new and more sensitive environment parameters must be invented.
Research Interests: Genetics, Molecular Evolution, Phylogeny, Humans, Mutation, Phenotype, Adaptive evolution, PLoS Genetics, Amino Acid Profile, Host Specificity, Sialic Acid, Virus infection, Demyelinating disease, Healthy Subjects, Binding Site, Capsid Protein, and Progressive Multifocal Leukoencephalopathy
Research Interests: Genetics, Population Genetics, Evolutionary genetics, Molecular Evolution, Humans, African American, Female, Animals, Male, Population expansion, Positive Selection, Human Genome, Pan troglodytes, Single Nucleotide Polymorphism, Association Mapping, PLoS Genetics, European Continental Ancestry Group, Genetic variation, Amino Acid Profile, Demographic History, Amino Acid Substitution Rates, and Complex Traits
Human single nucleotide polymorphisms (SNPs) represent the most frequent type of human population DNA variation. One of the main goals of SNP research is to understand the genetics of the human phenotype variation and especially the genetic basis of human complex diseases. Non-synonymous coding SNPs (nsSNPs) comprise a group of SNPs that, together with SNPs in regulatory regions, are believed to have the highest impact on phenotype. Here we present a World Wide Web server to predict the effect of an nsSNP on protein structure and function. The prediction method enabled analysis of the publicly available SNP database HGVbase, which gave rise to a dataset of nsSNPs with predicted functionality. The dataset was further used to compare the effect of various structural and functional characteristics of amino acid substitutions responsible for phenotypic display of nsSNPs. We also studied the dependence of selective pressure on the structural and functional properties of proteins. We found that in our dataset the selection pressure against deleterious SNPs depends on the molecular function of the protein, although it is insensitive to several other protein features considered. The strongest selective pressure was detected for proteins involved in transcription regulation.
Research Interests: Genetics, Transcription Regulation, Biological Sciences, Protein Structure and Function, Environmental Sciences, Genetics of complex disease, Humans, Mutation, World Wide Web, Protein structure, Nucleic Acids, Proteins, Information Storage and Retrieval, Single Nucleotide Polymorphism, Phenotypic variation, Amino Acid Substitution Rates, and Internet
Research Interests: Pharmacology, Biochemistry, Bioinformatics, Evolutionary Biology, Genetics, Marine Biology, Neuroscience, Environmental Science, Geophysics, Physics, Materials Science, Quantum Physics, Developmental Biology, Immunology, Climate Change, Molecular Biology, Structural Biology, Genomics, RNA, Computational Biology, Transcriptomics, Molecular Evolution, Biotechnology, Systems Biology, Cancer, Biology, Metabolomics, Cell Cycle, Proteomics, Ecology, Drug Discovery, Evolution, Nanotechnology, Astrophysics, Neurobiology, Medicine, Multidisciplinary, Palaeobiology, Functional Genomics, Nature, Signal Transduction, Astronomy, DNA, Humans, Sequence alignment, Mutation, Mice, Animals, Cell Signalling, Medical Research, Positive Selection, Rats, Negative Selection Algorithm, Genetic variation, Codon, Amino Acid Substitution Rates, Earth Science, and Nucleotides
Research Interests: Polymorphism, Computational Biology, Africa, Protein Structure Prediction, Comparative Genomics, Multidisciplinary, Nature, Humans, African American, Europe, United States, Polymerase Chain Reaction, Human Genome, Single Nucleotide Polymorphism, Genetic variation, Amino Acid Profile, Nucleotides, and DNA sequence
The accumulation of genome-wide information on single nucleotide polymorphisms in humans provides an unprecedented opportunity to detect the evolutionary forces responsible for heterogeneity of the level of genetic variability across loci. Previous studies have shown that history of recombination events has produced long haplotype blocks in the human genome, which contribute to this heterogeneity. Other factors, however, such as natural selection or the heterogeneity of mutation rates across loci, may also lead to heterogeneity of genetic variability. We compared synonymous and non-synonymous variability within human genes with their divergence from murine orthologs. We separately analyzed the non-synonymous variants predicted to damage protein structure or function and the variants predicted to be functionally benign. The predictions were based on comparative sequence analysis and, in some cases, on the analysis of protein structure. A strong correlation between non-synonymous, benign variability and non-synonymous human-mouse divergence suggests that selection played an important role in shaping the pattern of variability in coding regions of human genes. However, the lack of correlation between deleterious variability and evolutionary divergence shows that a substantial proportion of the observed non-synonymous single-nucleotide polymorphisms reduces fitness and never reaches fixation. Evolutionary and medical implications of the impact of selection on human polymorphisms are discussed.