Charlotte Deane

University of Oxford, Statistics, Faculty Member

Papers by Charlotte Deane

KA-Search, a method for rapid and exhaustive sequence identity search of known antibodies

Scientific Reports

Antibodies with similar amino acid sequences, especially across their complementarity-determining regions, often share properties. Finding that an antibody of interest has a similar sequence to naturally expressed antibodies in healthy or diseased repertoires is a powerful approach for the prediction of antibody properties, such as immunogenicity or antigen specificity. However, as the number of available antibody sequences is now in the billions and continuing to grow, repertoire mining for similar sequences has become increasingly computationally expensive. Existing approaches are limited by either being low-throughput, non-exhaustive, not antibody specific, or only searching against entire chain sequences. Therefore, there is a need for a specialized tool, optimized for a rapid and exhaustive search of any antibody region against all known antibodies, to better utilize the full breadth of available repertoire sequences. We introduce Known Antibody Search (KA-Search), a tool that ...

Specific attributes of the VL domain influence both the structure and structural variability of CDR-H3 through steric effects

Antibodies, through their ability to target virtually any epitope, play a key role in driving the adaptive immune response in jawed vertebrates. The binding domains of standard antibodies are their variable light (VL) and heavy (VH) domains, both of which present analogous complementarity-determining region (CDR) loops. It has long been known that the VH CDRs contribute more heavily to the antigen-binding surface (paratope), with the CDR-H3 loop providing a major modality for the generation of diverse paratopes. Here, we provide evidence for an additional role of the VL domain as a modulator of CDR-H3 structure, using a diverse set of antibody crystal structures and a large set of molecular dynamics simulations. We show that specific attributes of the VL domain such as CDR canonical forms and genes can influence the structural diversity of the CDR-H3 loop, and provide a physical model for how this effect occurs through inter-loop contacts and packing of CDRs against each other. Our stu...

works at Multiple Scales

Protein loop structure prediction

This dissertation concerns the study and prediction of loops in protein structures. Proteins perform crucial functions in living organisms. Despite their importance, we are currently unable to predict their three-dimensional structure accurately. Loops are segments that connect regular secondary structures of proteins. They tend to be located on the surface of proteins and often interact with other biological agents. As loops are generally subject to more frequent mutations than the rest of the protein, their sequences and structural conformations can vary significantly even within the same protein family. Although homology modelling is the most accurate computational method for protein structure prediction, difficulties still arise in predicting protein loops. Protein loop structure prediction is therefore a bottleneck in solving the protein structure prediction problem. Reflecting on the success of homology modelling, I implement an improved version of a database search method, FREAD. I show how sequence similarity as quantified by environment-specific substitution scores can be used to significantly improve loop prediction. FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than ab initio methods; FREAD's predictive ability is length independent. In general, it produces results within 2 Å root mean square deviation (RMSD) from the native conformations, compared to an average of over 10 Å for loop length 20 for any of the tested ab initio methods. I then examine FREAD's predictive ability on a specific type of loop called complementarity determining regions (CDRs) in antibodies. CDRs consist of six hypervariable loops and form the majority of the antigen binding site. I examine CDR loop structure prediction as a general case of the loop structure prediction problem.
FREAD achieves accuracy similar to specific CDR predictors. However, it fails to accurately predict CDR-H3, which is known to be the most challenging CDR. Various FREAD versions including FREAD with contact information (ConFREAD) are examined. The FREAD variants improve predictions for CDR-H3 on homology models and docked structures. Lastly, I focus on the local properties of protein loops and demonstrate that the protein loop structure prediction problem is a local protein folding problem. The end-to-end distance of loops (loop span) follows a distinctive frequency distribution, regardless of the secondary structure elements connected or the number of residues in the loop. I show that the loop span distribution follows a Maxwell-Boltzmann distribution. Based on my research, I propose future directions in protein loop structure prediction including estimating experimentally undetermined local structures using FREAD, multiple loop structure prediction using contact information and a novel ab initio method which makes use of loop stretch.

ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins

Immune receptor proteins play a key role in the immune system and have shown great promise as biotherapeutics. The structure of these proteins is critical for understanding their antigen binding properties. Here, we present ImmuneBuilder, a set of deep learning models trained to accurately predict the structure of antibodies (ABodyBuilder2), nanobodies (NanoBodyBuilder2) and T-Cell receptors (TCRBuilder2). We show that ImmuneBuilder generates structures with state-of-the-art accuracy while being far faster than AlphaFold2. For example, on a benchmark of 34 recently solved antibodies, ABodyBuilder2 predicts CDR-H3 loops with an RMSD of 2.81 Å, a 0.09 Å improvement over AlphaFold-Multimer, while being over a hundred times faster. Similar results are also achieved for nanobodies (NanoBodyBuilder2 predicts CDR-H3 loops with an average RMSD of 2.89 Å, a 0.55 Å improvement over AlphaFold2) and TCRs. By predicting an ensemble of structures, ImmuneBuilder also gives an error estimate for every...

Testing the limits of SMILES-based de novo molecular generation with curriculum and deep reinforcement learning

Nature Machine Intelligence

Characterisation of the immune repertoire of a humanised transgenic mouse through immunophenotyping and high-throughput sequencing

Immunoglobulin loci-transgenic animals are widely used in antibody discovery and increasingly in vaccine response modelling. In this study, we phenotypically characterised B-cell populations from the Intelliselect® Transgenic mouse (Kymouse), demonstrating full B-cell development competence. Comparison of the naïve B-cell receptor (BCR) repertoires of Kymice with naïve human and murine BCR repertoires revealed key differences in germline gene usage and junctional diversification. These differences result in Kymice having CDRH3 length and diversity intermediate between mice and humans. To compare the structural space explored by CDRH3s in each species repertoire, we used computational structure prediction to show that Kymouse naïve BCR repertoires are more human-like than mouse-like in their predicted distribution of CDRH3 shape. Our combined sequence and structural analysis indicates that the naïve Kymouse BCR repertoire is diverse with key similarities to human repertoires, while im...

Fragment libraries designed to be functionally diverse recover protein binding information more efficiently than standard structurally diverse libraries

Current fragment-based drug design relies on the efficient exploration of chemical space through the use of structurally diverse libraries of small fragments. However, structurally dissimilar compounds can exploit the same interactions on a target, and thus be functionally similar. Using 3D structures of many fragments bound to multiple targets, we examined if there exists a better strategy for selecting fragments for screening libraries. We show that structurally diverse fragments can be described as functionally redundant, often making the same interactions. Ranking fragments by the number of novel interactions they made, we show that functionally diverse selections of fragments substantially increase the amount of information recovered for unseen targets compared to other methods of selection. Using these results, we design small functionally efficient libraries that are able to give significantly more information about new protein targets than similarly sized structurally diverse...

Ranking of communities in multiplex spatiotemporal models of brain dynamics

Applied Network Science, 2022

As a relatively new field, network neuroscience has tended to focus on aggregate behaviours of the brain averaged over many successive experiments or over long recordings in order to construct robust brain models. These models are limited in their ability to explain dynamic state changes in the brain, which occur spontaneously as a result of normal brain function. Hidden Markov Models (HMMs) trained on neuroimaging time series data have since arisen as a method to produce dynamical models that are easy to train but can be difficult to fully parametrise or analyse. We propose an interpretation of these neural HMMs as multiplex brain state graph models we term Hidden Markov Graph Models. This interpretation allows for dynamic brain activity to be analysed using the full repertoire of network analysis techniques. Furthermore, we propose a general method for selecting HMM hyperparameters in the absence of external data, based on the principle of maximum entropy, and use this to select t...

BIOINFORMATICS doi:10.1093/bioinformatics/btu447 Alignment-free

Vol. 30 ECCB 2014, pages i430–i437

Positioning of Membrane Proteins Within the Lipid Bilayer

ABlooper: Fast accurate antibody CDR loop structure prediction with accuracy estimation

Antibodies are a key component of the immune system and have been extensively used as biotherapeutics. Accurate knowledge of their structure is central to understanding their antigen binding function. The key area for antigen binding and the main area of structural variation in antibodies is concentrated in the six complementarity determining regions (CDRs), with the most important for binding and most variable being the CDR-H3 loop. The sequence and structural variability of CDR-H3 make it particularly challenging to model. Recently, deep learning methods have offered a step change in our ability to predict protein structures. In this work we present ABlooper, an end-to-end equivariant deep-learning based CDR loop structure prediction tool. ABlooper rapidly predicts the structure of CDR loops with high accuracy and provides a confidence estimate for each of its predictions. On the models of the Rosetta Antibody Benchmark, ABlooper makes predictions with an average CDR-H3 RMSD of 2.4...

Learning from Docked Ligands: Ligand-Based Features Rescue Structure-Based Scoring Functions When Trained on Docked Poses

Journal of Chemical Information and Modeling, 2021

MHC binding affects the dynamics of different T-cell receptors in different ways

PLOS Computational Biology, 2019

Ligity: A Non-Superpositional, Knowledge-Based Approach to Virtual Screening

Journal of Chemical Information and Modeling, 2019

RFQAmodel: Random Forest Quality Assessment to identify a predicted protein structure in the correct fold

While template-free protein structure prediction protocols now produce good quality models for many targets, modelling failure remains common. For these methods to be useful it is important that users can both choose the best model from the hundreds to thousands of models that are commonly generated for a target, and determine whether this model is likely to be correct. We have developed Random Forest Quality Assessment (RFQAmodel), which assesses whether models produced by a protein structure prediction pipeline have the correct fold. RFQAmodel uses a combination of existing quality assessment scores with two predicted contact map alignment scores. These alignment scores are able to identify correct models for targets that are not otherwise captured. Our classifier was trained on a large set of protein domains that are structurally diverse and evenly balanced in terms of protein features known to have an effect on modelling success, and then tested on a second set of 244 protein do...

Association between a common immunoglobulin heavy chain allele and rheumatic heart disease risk in Oceania

Nature Communications, May 11, 2017

The indigenous populations of the South Pacific experience a high burden of rheumatic heart disease (RHD). Here we report a genome-wide association study (GWAS) of RHD susceptibility in 2,852 individuals recruited in eight Oceanian countries. Stratifying by ancestry, we analysed genotyped and imputed variants in Melanesians (607 cases and 1,229 controls) before follow-up of suggestive loci in three further ancestral groups: Polynesians, South Asians and Mixed or other populations (totalling 399 cases and 617 controls). We identify a novel susceptibility signal in the immunoglobulin heavy chain (IGH) locus centring on a haplotype of nonsynonymous variants in the IGHV4-61 gene segment corresponding to the IGHV4-61*02 allele. We show each copy of IGHV4-61*02 is associated with a 1.4-fold increase in the risk of RHD (odds ratio 1.43, 95% confidence interval 1.27-1.61, P = 4.1 × 10^-9). These findings provide new insight into the role of germline variation in the IGH locus in disease sus...

Generating weighted and thresholded gene coexpression networks using signed distance correlation

Even within well-studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes or proteins, using a network of gene coexpression data that includes functional annotations. Signed distance correlation has proved useful for the construction of unweighted gene coexpression networks. However, transforming correlation values into unweighted networks may lead to a loss of important biological information related to the intensity of the correlation. Here we introduce a principled method to construct weighted gene coexpression networks using signed distance correlation. These networks contain weighted edges only between those pairs of genes whose correlation value is higher than a given threshold. We analyse data from different organisms and find that networks generated with our method based on signed distance correlation are more stable and capture more biological information compared to net...
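
The thresholding step mentioned in this abstract can be pictured with a minimal sketch. It assumes the pairwise correlation values are already available; it does not compute signed distance correlation, and it is not the authors' implementation. The gene names, matrix values, threshold, and the function name `threshold_network` are all made up for illustration.

```python
def threshold_network(genes, corr, threshold):
    """Keep a weighted edge only where the correlation exceeds the threshold.

    genes: list of gene names (hypothetical placeholders here)
    corr: symmetric matrix (list of lists) of precomputed correlation values
    threshold: edges with corr <= threshold are dropped
    Returns a dict mapping (gene_a, gene_b) to the edge weight.
    """
    edges = {}
    n = len(genes)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, no self-loops
            if corr[i][j] > threshold:
                edges[(genes[i], genes[j])] = corr[i][j]
    return edges


# Toy input: three hypothetical genes and an invented correlation matrix.
genes = ["g1", "g2", "g3"]
corr = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
edges = threshold_network(genes, corr, 0.4)
```

Unlike an unweighted network, the surviving edges keep their correlation value as a weight, which is the information the abstract says is lost when values are binarised.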

Two examples of helix pairs, which are (A) not significantly different and (B) significantly different

PDB code, chain identifier and residue numbers are given for each helix. The black residues are at the most disrupted site (see Methods, doi:10.1371/journal.pone.0157553) in each helix pair. r_n and r_c give the quality of the cylinder fit (see Eq (1)) to the backbone atoms on the N- (red) and C- (blue) terminal sides of the kink site. θ is the angle measured between the two cylinders. ε is the estimated error of the angle measurement, calculated from r_n + r_c using Eq (4). If θ_max − θ_min > ε_1 + ε_2, the confidence intervals do not overlap; therefore we consider the angles to be significantly different.
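
The comparison rule stated in this caption can be written as a short check. This is only a sketch of the inequality as given in the caption; the function name and the example angles and errors are hypothetical, and the error estimate from Eq (4) is taken as an input rather than computed.

```python
def angles_significantly_different(theta_1, eps_1, theta_2, eps_2):
    """Apply the caption's rule: theta_max - theta_min > eps_1 + eps_2.

    theta_1, theta_2: kink angles of the two helices (same units, e.g. degrees)
    eps_1, eps_2: estimated errors of the two angle measurements
    Returns True when the two confidence intervals do not overlap.
    """
    theta_max = max(theta_1, theta_2)
    theta_min = min(theta_1, theta_2)
    return theta_max - theta_min > eps_1 + eps_2


# Invented example values: the first pair differs by 20 with combined error 9,
# the second pair differs by 8 with combined error 11.
pair_a = angles_significantly_different(30.0, 4.0, 10.0, 5.0)
pair_b = angles_significantly_different(20.0, 6.0, 12.0, 5.0)
```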

Distributions of angles measured at each site of the seven transmembrane helices in the GPCR family, after smoothing

The label at each site shown on the x-axis is the Class A numbering u...
