Progress in structure prediction of α-helical membrane proteins

Transmembrane (TM) proteins comprise 20-30% of the genome but, because of experimental difficulties, they represent less than 1% of the Protein Data Bank. The dearth of membrane protein structures makes computational prediction a potentially important means of obtaining novel structures. Recent advances in computational methods have been combined with experimental data to constrain the modeling of three-dimensional structures. Furthermore, threading and ab initio modeling approaches that were effective for soluble proteins have been applied to TM domains. Surprisingly, experimental structures, proteomic analyses and bioinformatics have revealed unexpected architectures that counter long-held views on TM protein structure and stability. Future computational and experimental studies aimed at understanding the thermodynamic and evolutionary bases of these architectural details will greatly enhance predictive capabilities.

We review recent computational advances in the study of membrane proteins, focusing on those that have at least one transmembrane helix. Transmembrane protein regions are, in many respects, easier to investigate computationally than experimentally: their structure and interactions are relatively uniform (e.g. consisting predominantly of nearly parallel helices packed together), whereas their poor solubility complicates experimental work. We present the progress made on identifying and classifying membrane proteins into families, predicting their structure from amino-acid sequence patterns (using many different methods), and analyzing their interactions and packing. Together, this work allows us for the first time to begin to think about the membrane protein interactome, the set of all interactions between distinct transmembrane helices in the lipid bilayer.

Since high-resolution structural data are still scarce, different kinds of theoretical structure prediction algorithms are of major importance in membrane protein biochemistry. But how well do the current prediction methods perform? Which structural features can be predicted and which cannot? And what can we expect in the next few years?

Sarel J Fleishman and Nir Ben-Tal

Addresses: Department of Biochemistry, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Ramat Aviv 69978, Israel
Corresponding author: Ben-Tal, Nir (nirb@tauex.tau.ac.il)

Current Opinion in Structural Biology 2006, 16:496–504
This review comes from a themed issue on Membranes, edited by Roderick MacKinnon and Gunnar von Heijne
Available online 5th July 2006
0959-440X/$ – see front matter © 2006 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.sbi.2006.06.003

Introduction

Transmembrane (TM) proteins comprise 20–30% of the genome [1,2] and are involved in many crucial cellular processes, such as cell-to-cell signaling, metabolite transport and energy production. Solving the structures of these proteins is therefore imperative for clear mechanistic understanding of central processes in physiology.
However, despite recent advances in production of TM protein crystals, membrane protein structures are difficult to obtain and comprise less than 1% of the entries in the Protein Data Bank (PDB) [3]. Comparative- or homology-based approaches to structure prediction have been immensely successful with soluble proteins [4]. These methods require a homologous protein for which a structure has been solved. Because of this requirement, homology modeling has been most useful for the few TM protein families for which at least one member has been crystallized. A recent analysis of homology-modeling accuracy for membrane proteins has shown that the protocols that are successful in comparative modeling of soluble proteins perform similarly well for membrane proteins [5]. However, because at present only a few representative atomic-resolution structures of TM protein families are available, homology modeling cannot serve as a general-purpose approach for structural modeling. In this review, we will therefore focus on recent advances in structure prediction that do not rely on homology to solve structures (homology-based modeling is covered in [6,7]).

Membrane protein folding can be conceptually decomposed into two consecutive steps: folding of the individual hydrophobic segments into helices followed by helix association (Figure 1) [8]. Accordingly, the problem of predicting the structure of α-helical TM proteins has been approached by breaking it down into the following steps: (i) delineating the boundaries of the TM segments, each of which will assume a helical conformation; (ii) determining the topology of the protein (i.e. which extramembrane segments reside inside the cytoplasm and, conversely, which segments reside outside the cell); and (iii) predicting the tertiary conformation of the protein (i.e. the way in which the helices are packed with respect to one another).
The past few years have seen considerable advances in all of these steps. In this review, we will describe some of these advances and emphasize the discovery of novel features of TM protein folds that bear on the goal of structure prediction.

Figure 1. TM protein folding can be thought to proceed in two stages [8]: the folding of individual TM segments into helices (top) followed by helix packing (bottom). The topology of the protein is often determined by the positive-inside rule [17], with the cytoplasmic loops tending to be enriched in positively charged residues in comparison with the extracellular loops.

Identification of TM α-helices in the protein sequence

Early attempts to predict the locations in the sequence of membrane-integral segments were based on the notion that a sequence segment would partition into the membrane if it were sufficiently long and hydrophobic. Starting with the method of Kyte and Doolittle [9], various algorithms for detecting membrane-embedded sequence segments were proposed on the basis of experimental and computational data. At the core of these methods lies a hydrophobicity scale that assigns to each amino acid residue a score that can be roughly interpreted as the free energy of its transfer from hydrophilic to hydrophobic media, corresponding to its insertion probability into the membrane. The typical approach would then be to search the sequence for a sufficiently hydrophobic stretch of residues comprising approximately 20 amino acids, which is the minimal length necessary for an α-helix to traverse the 30 Å hydrophobic core of the membrane [10].
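The sliding-window search described above can be illustrated with a minimal sketch. It uses the published Kyte and Doolittle hydropathy values; the window length of 19 residues and the threshold of 1.6 are conventional choices for this scale and are treated here as adjustable assumptions, and the function names are hypothetical. The sketch is not a reimplementation of any of the prediction programs discussed in this review.

```python
# Kyte-Doolittle hydropathy values, one per standard amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy reaches the threshold.

    Returns (start, end) pairs, 0-based with end exclusive.
    The window and threshold defaults are assumptions, not fitted values.
    """
    segments = []
    for i, mean in enumerate(window_hydropathy(seq, window)):
        if mean < threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous hit: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

Note that merging windows tends to extend the predicted segment beyond the hydrophobic stretch itself, which gives a feel for why helix termini are hard to delineate with this kind of approach.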
During the 1990s, there was a departure from physicochemically based approaches to methods that rely on statistical inference, such as hidden Markov models, support vector machines and neural nets, all of which make use of the existing knowledge on the partitioning of particular sequence segments to the membrane. These methods appeared at first to be superior to the simple hydrophobicity-based methods, with success rates of 90% and above [1]. However, a fundamental difficulty in the validation of statistical methods is to obtain sufficiently disparate datasets for training and validation. Indeed, when Rost and co-workers recently revisited the problem of TM sequence prediction [11] using datasets that were carefully constructed with the aim of decreasing redundancy, they found that the success of the statistical approaches was overrated, and that these approaches in fact achieved results that were not much better than those obtained by some of the hydrophobicity-based methods. In this respect it is important to emphasize that an overlap of only three amino acids between the predicted and observed helices is considered sufficient for a prediction to be counted as accurate [11]. Accordingly, a recent survey demonstrated that, on average, the best-performing prediction methods were in error by a little more than two turns at the helix termini [12]. Because most structural modeling approaches rely on the correct identification of the helical segments in the sequence (see below), these large errors are likely to propagate in subsequent modeling stages, requiring manual intervention. A more alarming conclusion made in this survey concerned the inability of current prediction methods to identify ‘irregular’ structures, such as half helices and re-entrant loops, such as those seen in the structure of the potassium channel (Figure 2) [13] and the aquaporin family [14].
Hopefully, with the likely increase in the number of proteins exhibiting such irregularities over the next few years, some unifying principles will emerge from their sequences, enabling prediction of these features.

Recently, the hydrophobicity-based approach to detecting membrane-embedded segments was given another boost from the experimental studies by von Heijne and co-workers [15]. The authors reported a series of experiments that attempted to obtain a hydrophobicity scale using an experimental setup that is far closer to the physiological system than previous experimental reports, including the translocon protein-conducting channel and membranes from the endoplasmic reticulum (ER). Concerns were raised regarding the possibility that some of the measured partitioning energies encompass contributions from interactions between the probe sequence segments and other protein components in the system, thus limiting the generality of the scale produced by these measurements [16]. Nevertheless, this experimental approach is promising, raising hope that the prediction of the location of TM helices in the sequence of membrane proteins will eventually be based on algorithms that account for the various factors that affect protein translocation in biological systems.

Figure 2. The potassium channel [13] is one of the several structures of membrane proteins that show structural ‘irregularities’, such as half helices (blue) and re-entrant loops. These irregularities cannot be identified from the sequence by current methods [12]. For clarity, only three out of four of the subunits comprising the potassium channel are shown. Figure generated with MolScript [70] and rendered with Raster3d [71]. Figure reproduced with permission from [37].
Topology

Determining the topology of a membrane protein is a crucial preliminary step to modeling its structure as it constrains the way individual TM segments could associate within the membrane, as well as subunits within complexes. The positive-inside rule (i.e. the observation that the segments in the cytoplasmic loops and the TM segments that are adjacent to the cytoplasm are often enriched in the positively charged lysine (K) and arginine (R) residues when compared with the extracellular loops (Figure 1) [17]) has remained the most powerful tool for predicting the topology of a protein from its sequence for almost two decades. The factors contributing to the (K + R) bias are under intense study, and it is still unclear whether the bias originates from properties of the translocon [18] or the cytoplasmic membrane [19], but a recent statistical survey of 107 genomes reconfirmed the validity of this empirical rule [20]. The (K + R) bias can serve as a rule for predicting topology, by requiring that more positively charged residues face the cytoplasm [1].

Recently, von Heijne and co-workers have conducted a whole-proteome experimental analysis of the topology of TM proteins in the Escherichia coli inner membrane [21]. They used two reporter proteins that were linked to the C-terminus of each putative membrane-integral protein in E. coli. One of these reporters is only active in the cytoplasm, whereas the other is exclusively activated in the periplasm. By measuring the activities of the reporters, the authors assigned the topology of 601 out of 700 predicted TM proteins in the E. coli genome. Comparing these data to the predictions of TMHMM, a widely used algorithm based on a hidden Markov model [2], the authors found that roughly 80% of the predictions were in accord with the experimentally determined topologies.
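To make the (K + R) rule described above concrete, the following minimal sketch counts lysine and arginine residues in the loops on either side of the membrane and assigns the more positively charged side to the cytoplasm. The function names and the simple majority vote are illustrative assumptions; published predictors such as TMHMM [2] use considerably richer models, and this sketch takes the TM segment boundaries as given.

```python
def count_kr(segment):
    """Number of lysine (K) and arginine (R) residues in a sequence segment."""
    return sum(1 for aa in segment if aa in "KR")


def predict_n_terminus(seq, tm_segments):
    """Vote on the location of the N-terminus from the (K + R) bias.

    tm_segments: sorted (start, end) pairs, 0-based with end exclusive.
    Returns 'in' (cytoplasmic N-terminus), 'out', or 'ambiguous' on a tie.
    """
    loops = []
    previous_end = 0
    for start, end in tm_segments:
        loops.append(seq[previous_end:start])
        previous_end = end
    loops.append(seq[previous_end:])

    # Loops alternate sides of the membrane: even-indexed loops lie on the
    # same side as the N-terminus, odd-indexed loops on the opposite side.
    same_side = sum(count_kr(loop) for loop in loops[0::2])
    other_side = sum(count_kr(loop) for loop in loops[1::2])

    if same_side > other_side:
        return "in"
    if other_side > same_side:
        return "out"
    return "ambiguous"
```

A tie, or a very small difference between the two counts, corresponds to the weak (K + R) bias that the source associates with candidates for dual topology.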
The roughly 80% agreement shows that the major aspects affecting protein topology are captured by contemporary computational methods, but that these methods still have significant room for improvement. These experimental results can serve as a much-needed large-scale benchmark for validation and comparison of future topology prediction algorithms.

The vast majority of proteins in von Heijne and co-workers’ analysis exhibited unique topology [21], whereby their C-terminus was found to be either cytoplasmic or periplasmic. However, for five out of 601 proteins both reporters were activated, implying that for each of these five proteins, some of the protein copies inserted with one topology, and the others with the reverse topology [21,22]. The five proteins with dual topology are relatively small in size, comprising approximately 100 amino acid residues, and are predicted to contain four TM domains. Furthermore, as expected, all five exhibit very small (K + R) biases. For at least one of these proteins, the prototypical small multidrug resistance antiporter EmrE, the suggestion of dual topology was already made in the past on the basis of structural data and the lack of clear (K + R) bias [23]. Nevertheless, it is important to note that a previous study based on a different biochemical assay reported a unique topology for this protein [24]. This conflict between two lines of experimental evidence still needs to be resolved, but the suggestion that some TM proteins insert with opposite topology has significant implications for understanding structures and functions of these proteins.

Threading and ab initio structure prediction

On the one hand, integral membrane proteins exhibit much higher uniformity of secondary structure (mostly α-helical bundles) than soluble proteins, and are highly constrained in their conformations because of the presence of the membrane [25].
It could therefore be expected that ab initio structure prediction, whereby the protein structure is predicted without resorting to homology with other proteins or to experimental data, should be a more feasible goal for TM than for soluble proteins. On the other hand, as sampling significant portions of conformation space remains a very challenging aspect of ab initio structure prediction [26], success in soluble protein structure prediction has been restricted to small proteins, consisting of approximately 80 amino acid residues [27]. Membrane proteins are usually much larger; for instance, visual rhodopsin, which serves as a prototype for the large family of 7-TM G protein-coupled receptors (GPCRs), consists of more than 300 amino acid residues.

Two similar methods, MembStruk [28–31] and PREDICT [32,33], were specifically tailored to predict the structures of GPCRs on the basis of physicochemical principles. For both methods, a full-atom model of the GPCR is automatically obtained, based on the amino acid sequence of the protein alone. In the first step, the boundaries of the seven TM helices are predicted by means of hydrophobicity scales. A preliminary (tentative) coarse-grained model of the packing of these helices into a compact and closed structure is constructed, and various conformations in the vicinity of this state are sampled at random, favoring conformations in which hydrophobic residues face the lipid. Full-atom models of the TM domains of these structures are built and subjected to several cycles of optimization using molecular dynamics (MD) simulations. The outcome is a full-atom model of the entire protein, including the extra-membrane loops. The methods produced 3D models of bovine rhodopsin, the only GPCR structure available in the PDB, with 3 Å root-mean-square deviation (RMSD) from the native structure in the TM region.
Further validation of this approach includes in silico docking of known drug-like compounds to the receptors. Model structures of several GPCRs, including the β2-adrenergic [30] and D2 dopamine [28] receptors, were built this way and used successfully for drug design [32]. This suggests that important structural aspects of the ligand-binding site were accurately captured by these methods. However, it was not shown unambiguously that the remainder of the structure is correct too.

Another potentially promising approach utilizes the two-step TASSER method that threads the sequence on parts of solved protein structures, and then refines the resulting template [34]. Validation on a set of 38 nonhomologous TM protein structures yielded 17 structures for which the RMSD to native was less than 6.5 Å, but many others with RMSD to native greater than 10 Å. When applied to predicting the structure of bovine rhodopsin, TASSER produced a model with a low 2.1 Å RMSD from native on the Cα coordinates of the TM domain. Subsequently, the method was applied to model the structures of most of the 900 human GPCRs, and a few of these models were examined and appeared to be consistent with the available experimental data. It is important to note that although the method’s success in modeling rhodopsin is promising, only a few other GPCRs showed substantial similarity (>30% sequence identity) to bovine rhodopsin [7,34], and it is therefore uncertain that the other models are as faithful to the native state as the model of rhodopsin. Also, it is not known yet whether TASSER’s GPCR models are likely to be closer to the receptors’ inactive or active form, the latter of which is pharmaceutically more interesting [7].
Nevertheless, the models generated by TASSER might provide an important resource for probing structure–function relationships in this important class of receptors, as many of the current approaches to modeling GPCR structures rely on homology to bovine rhodopsin [6], despite the low sequence identity.

Recently, the Rosetta algorithm for structure prediction, which has been successful in the free-modeling category of the community-wide experiment on critical assessment of structure prediction (CASP) [35], was adopted and implemented for TM protein structures [36]. Inter-residue contact potentials were derived from a set of solved protein structures, and enriched with their sequence homologues. Validation on a set of solved TM protein structures showed that the performance of this implementation of Rosetta (RMSD below 4 Å for 51–145 of the superimposed residues) is comparable to that of Rosetta for soluble proteins in the same size range. Although full-atom prediction was shown to produce significant improvements in prediction accuracy of soluble proteins [27], it was not tested in this implementation of Rosetta, partly because of the prohibitive computational load associated with full-atom prediction for large proteins.

Structure prediction based on experimental constraints

One potential avenue for obtaining novel structures, which has been explored by several groups in recent years, is the exploitation of functional and low-resolution structural data on TM proteins to constrain models [37]. Such data could involve site-specific mutagenesis, chemical crosslinking, intermediate-resolution structures and biophysical data, such as NMR, EPR and FTIR. These heterogeneous data are interpreted as constraints on the positions of individual amino acid residues or on the structural relationships among them.
For instance, positions that are intolerant to substitution are likely to be packed inside the protein core, and positions that crosslink are likely to be vicinal. In addition to these experimental data, the modeling methods assume that the hydrophobic sequence segments form α-helices that traverse the membrane. The pioneering work of Herzyk and Hubbard [38] employing such disparate data sources produced very promising results, with a model of bacteriorhodopsin matching the native-state structure to within a low RMSD of 1.87 Å. However, further modeling attempts that relied primarily on mutation and crosslinking data demonstrated that it is difficult to interpret many of these data in a structurally unequivocal way [37]. Recent implementations of this approach have therefore relied on more limited data sources. For instance, a method was suggested recently that employs data that can be interpreted as distance constraints between amino acid residues from EPR, FTIR and chemical crosslinking [39]. Models consisting of α-helices were sampled using a Monte Carlo strategy. The conformations were scored according to the extent to which they satisfied the experimental distance constraints and structural parameters derived from a set of solved TM proteins, including preferred helix-packing angles and distances, pairwise amino acid contact preferences and overall structural compactness. Encouragingly, this method was shown to produce a model of rhodopsin, which was 3.2 Å RMSD from the native-state structure, based on only 27 experimentally derived distance constraints (taken from published studies), demonstrating that it might be possible to obtain close-to-native models of large membrane proteins on the basis of a limited set of experimental constraints.
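The idea of scoring sampled conformations against experimental distance constraints can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring function of [39]: it covers only the distance-constraint term (the knowledge-based packing terms mentioned above are omitted), uses a quadratic penalty on upper-bound violations, and pairs it with a standard Metropolis acceptance test. All names and the penalty form are hypothetical.

```python
import math
import random


def restraint_penalty(coords, restraints):
    """Sum of squared violations of upper-bound distance restraints.

    coords: dict mapping residue number -> (x, y, z) in angstroms.
    restraints: list of (residue_i, residue_j, max_distance) tuples.
    """
    total = 0.0
    for i, j, max_distance in restraints:
        excess = math.dist(coords[i], coords[j]) - max_distance
        if excess > 0:
            total += excess ** 2
    return total


def metropolis_accept(old_score, new_score, temperature, rng):
    """Standard Metropolis criterion: always accept improvements,
    accept a worse score with probability exp(-delta / temperature)."""
    if new_score <= old_score:
        return True
    return rng.random() < math.exp(-(new_score - old_score) / temperature)
```

In a sampling loop, each trial move of a helix would be scored with `restraint_penalty` and kept or rejected with `metropolis_accept`, using a seeded `random.Random` instance for reproducibility.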
Several groups have recently suggested methods that employ data from cryo-electron microscopy (cryo-EM) intermediate-resolution structures, together with data on hydrophobicity, evolutionary patterns and the lengths of the loops that connect neighboring TM segments [37]. For several proteins, cryo-EM structures are available at in-plane resolutions of 5–10 Å (e.g. the gap junction [40] and EmrE [23]). At this resolution, it is impossible to position individual amino acid residues, or even to assign TM segments unambiguously to the helices observed in the cryo-EM structure. Hence, structure prediction based on cryo-EM typically comprises helix assignment, followed by orientation of the helices around their principal axes.

To solve the helix assignment problem, various studies used biochemical data on the functional roles of individual TM segments [41,42]. A complementary approach relies on the fact that some of the loops that connect TM helices are quite short (less than eight amino acid residues). Such short loops constrain the distance between the helix termini that they connect. Based on this constraint, an algorithm was recently suggested, which, for a given cryo-EM structure and the lengths of each of the interconnecting loops, scans all possible assignments (potentially n! permutations, where n is the number of helices in the map), and ranks them by their compatibility with the cryo-EM structure [43]. The performance of the algorithm was found to be sensitive to the exact delineation of the helix start and end points, which are difficult to predict with accuracy. Another proposed method that suffers less from such sensitivity ranks each TM sequence segment according to its overall hydrophobicity and evolutionary conservation [44].
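The exhaustive scan over helix assignments described above [43] can be sketched in a few lines. The sketch makes several simplifying assumptions that are not taken from the source: each helix in the map is reduced to a single in-plane position, a loop of k residues is assumed to span at most 3.5 Å per residue, and assignments are ranked by the total distance by which loops would be overstretched. Function and parameter names are hypothetical.

```python
import itertools
import math


def rank_assignments(helix_xy, loop_lengths, reach_per_residue=3.5):
    """Rank all assignments of sequence segments to map helices.

    helix_xy: in-plane (x, y) position of each helix in the map, in angstroms.
    loop_lengths: number of residues in the loop between consecutive
        TM segments (len(helix_xy) - 1 entries).
    reach_per_residue: assumed maximal span per loop residue, in angstroms.

    Returns a sorted list of (violation, assignment) pairs, where
    assignment[k] is the map helix given to the k-th TM segment.
    """
    results = []
    for assignment in itertools.permutations(range(len(helix_xy))):
        violation = 0.0
        for k, loop_length in enumerate(loop_lengths):
            distance = math.dist(
                helix_xy[assignment[k]], helix_xy[assignment[k + 1]]
            )
            # Penalize only the part of the distance the loop cannot span.
            violation += max(0.0, distance - loop_length * reach_per_residue)
        results.append((violation, assignment))
    results.sort()
    return results
```

For four helices on a line with short loops, the two zero-violation assignments are the forward and reversed orderings, which illustrates why loop lengths alone may leave ambiguities that other data have to resolve. The factorial growth in the number of permutations also limits this brute-force form to modest numbers of helices.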
In this ranking scheme [44], highly conserved and hydrophilic segments were ranked as helices that are likely to be buried within the protein core, and more variable and hydrophobic segments were assigned to lipid-exposed positions.

Once the helix assignment problem is solved for a given protein, canonical α-helices are constructed to fit the data in the cryo-EM map, and are rotated around their principal axes to identify the native-state conformation. Following the work of Baldwin et al. [45] on the prediction of the structure of the TM domain of rhodopsin based on its cryo-EM structure and sequence analysis, two similar methods [46,47] were recently and independently suggested. It was shown that the cores of many TM protein structures are much more evolutionarily conserved than their peripheries, and tend to pack the most polar residues [48]. These observations can be framed as predictive rules, according to which orientations that pack conserved and hydrophilic positions in the helix bundle are more favored than others. One of the methods generates only Cα models [47], whereas the other adds sidechains and uses manual refinements and minimization to generate full-atom models [46]. It should be noted, however, that often the energy landscape for full-atom models is extremely rugged and even 1 Å differences in the atom positions from the native-state structure can result in large energy penalties [26]; thus, it still remains to be seen whether the addition of sidechains improves the resulting models. The two methods were applied to intermediate-resolution structures of TM proteins, for which atomic-resolution data were not available: the oxalate transporter OxlT [46] and the gap junction [49].
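The rule that conserved positions should face the bundle core can be turned into a simple rotational scan. The sketch below is an illustration only and is not the scoring scheme of [46] or [47]: it assumes an ideal α-helix with 100° between successive residues, takes a per-residue conservation score as input, and scans rotations about the helix axis for the one that best points conserved residues toward an assumed core direction. The names and the cosine weighting are hypothetical.

```python
import math


def best_rotation(conservation, core_direction_deg=0.0, step_deg=5):
    """Find the rotation about the helix axis that best buries conserved residues.

    conservation: per-residue scores (higher = more conserved), in sequence order.
    core_direction_deg: assumed direction of the bundle core, in degrees.
    step_deg: angular resolution of the scan.

    Returns (rotation_in_degrees, score) for the best-scoring rotation.
    """
    best = None
    for rotation in range(0, 360, step_deg):
        score = 0.0
        for index, weight in enumerate(conservation):
            # Ideal alpha-helix: successive residues are offset by 100 degrees.
            facing = math.radians(rotation + 100.0 * index - core_direction_deg)
            score += weight * math.cos(facing)
        if best is None or score > best[1]:
            best = (rotation, score)
    return best
```

The same scan could weight positions by a combination of conservation and hydrophilicity, in keeping with the observation that bundle cores tend to pack both conserved and polar residues.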
Because the evolutionary-conservation pattern on two of the helices of the gap-junction-forming protein, connexin, was not informative enough to constrain their orientations, another sequence analysis method [50] was employed that identified correlated amino acid positions, thus predicting which pairs of amino acid residues could interact.

Part of the attractiveness of an approach to structure prediction that uses information from sequences and cryo-EM structures lies in the fact that it does not necessarily rely on large amounts of previously published functional data. Hence, it is possible to subsequently use these data for validation. In the modeling of the gap junction TM domain, for instance, it was shown that, although the model was not constrained by clinical data, it placed almost 30 disease-causing but physicochemically mild mutations in the core of the helix bundle, where they would disrupt folding, whereas two physicochemically radical polymorphisms were placed in more spacious regions of the protein structure [49]. Similarly, the model structure of OxlT placed residues that were found to crosslink in experimental assays in proximal positions [46].

Kinks in TM proteins are known to have important functional roles [51,52] but, until recently, could not be predicted from sequence information. Recently, it was shown that, in many cases where a kink is present in a TM protein structure, prolines are observed in the multiple-sequence alignment, even if the solved protein structure does not contain a proline at that position [53]. The direction and magnitude of the kink might also be predicted from local sequence features [54]. Accordingly, it might be possible to model kinks where these have been observed in low-resolution structures, as in EmrE [23], or to bias the ab initio predictions to produce kinks and, thus, generate more native-like models.
Computational validation of structures

Recently, a small number of atomic-resolution structures of membrane-integral proteins were suggested to represent conformations that are distorted with respect to the native-state structure [55,56]. Atomic-resolution structures inspire a large amount of (usually very productive) work aimed at understanding structure–function relationships. Conversely, physiologically irrelevant structures might cause much work to be done in vain, on top of supplying a wrong view of the protein. Usually, the ultimate test for the physiological relevance of a structure is its compatibility with carefully crafted biochemical and biophysical analysis. However, such analyses are often difficult to conduct. Because some of the computational analyses described above can be used to predict the structures of membrane-integral proteins, it is reasonable to expect that they might provide grounds for doubting structures that have not been sufficiently supported by biochemical data. As an example of this approach, Figure 3 shows two structures of the bacterial multidrug resistance protein EmrE obtained by X-ray crystallography at 3.8 Å and 3.7 Å resolution [57,58]. Both structures are clearly at odds with the observation made on many TM protein structures that evolutionarily conserved positions tend to be packed in the core of the α-helix bundle, whereas the variable residues face the lipid environment [46,47,59,60]. The discrepancy between the conservation pattern and the packing of residues parallels an analysis, reported in this issue of Current Opinion in Structural Biology [61], that compares these structures with the known biochemical and biophysical data on EmrE, concluding that they most likely do not represent the physiological native state of the protein.

Figure 3. Two recently solved structures of homodimers of the multidrug resistance protein EmrE from E. coli are shown, which are incompatible with the observation that amino acid residues at the core of many membrane-integral proteins tend to be evolutionarily conserved, whereas those on the periphery are variable. (a) The structure of substrate-bound EmrE [58] exhibits highly variable residues on helix M2 forming tight contacts with M3, whereas highly conserved positions on M1, M2, M3, M3’ and M4’ are exposed to lipid. The substrate tetraphenylphosphonium molecule is shown in space-fill mode, with the phosphorus atom colored in yellow, and carbon atoms in green. The structure is viewed perpendicular to the proposed membrane plane. (b) Similarly, the structure of EmrE without bound substrate [57] locates highly variable residues in the tight interface formed between M2 and M2’, and highly conserved residues on M1, M4, M1’, M3’, and M4’ in lipid-exposed positions. The incompatibility between the conservation pattern and the burial of amino acid residues parallels the observation that both structures have many features that are in contradiction with biochemical data on EmrE [61]. Evolutionary conservation was computed using a multiple-sequence alignment of 99 small multidrug resistance proteins with the ConSurf webserver [72]. Figure generated with MolScript [70] and rendered with Raster3d [71].

Future directions

In recent years, computational methods have been implemented for the prediction of TM protein structures. However, the roles of different energetic factors in contributing to TM protein folding are still poorly understood [25,62] and therefore difficult to predict. For instance, it was proposed that in low-dielectric environments polar bonds would make a large contribution to protein stability [10].
Indeed, in engineered systems, hydrogen bonds were shown to drive the interaction between TM helices [63,64], but recent measurements of the strengths of polar interactions in membrane proteins have yielded smaller magnitudes [65,66] than anticipated by computations on ideal hydrogen bonds [67,68]. Based on these and other measurements of the energetics of helix association in the membrane, it has been suggested that the primary contribution to helix interactions in the membrane comes from van der Waals packing and originates from buried surface area as in soluble proteins [69]. This suggestion, which requires additional experimental support, is crucial because it implies that the major factors that are currently embodied in ab initio methods for structure prediction in soluble proteins, such as steric packing [27], might be equally useful in membrane-integral proteins. It is likely that the relative contributions of polar and van der Waals interactions to membrane protein stability will continue to be a matter of intense experimental investigation over the next few years, and that the lessons learned from these studies will be incorporated into the force fields of ab initio and threading algorithms for membrane proteins [34,36]. The use of these lessons could reduce, in part, the need for deriving pairwise contact potentials from the small number of solved TM protein structures. One impediment on the way to the application of ab initio techniques to membrane proteins is the fact that these proteins are very large in comparison with soluble proteins, to which these methods were successfully applied, thus making full-atom prediction impractical [36]. 
However, as modeling approaches that make use of experimental information, such as low-resolution cryo-EM structures and distance constraints, have been clearly successful in identifying near-native, although coarse-grained, conformations of TM proteins [38,39,45–47], a synergy might be attainable from combining these methods with full-atom predictions. This would result in reliable atomic models at a computationally feasible cost.

With the advent of new structures and the application of novel biochemical assays to membrane-integral proteins, the last few years have seen a large increase in the qualitative understanding of TM protein folds. This improved understanding has gone hand-in-hand with more sophisticated prediction and modeling attempts. Undoubtedly, the new structures and structure–function analyses that will be conducted over the next few years will teach us many lessons on the possible architectures of TM proteins and their governing thermodynamic principles, further increasing our predictive capabilities.

Update

Recently, the Rosetta membrane methodology [36] was adapted and applied to study the voltage-induced conformational changes in voltage-dependent potassium (Kv) channels [73]. Open and closed conformations were computed for the eukaryotic Kv1.2 channel and for the bacterial KvAP on the basis of the published methodology, the homology to X-ray structures of these channels and several experimental constraints. The computed open conformation of Kv1.2 was close to its crystal structure, thus serving as partial validation for the approach. Interestingly, the results suggest that the conformational changes in the voltage-sensor domain of the bacterial protein are larger than the changes in Kv1.2, which could explain the large inconsistencies between functional studies of the bacterial and eukaryotic channels.
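Both the constraint-guided modeling discussed under Future directions and the Kv-channel study described in the Update score candidate conformations against experimental constraints. The following is a minimal sketch of that idea only, not the scoring function of any of the cited methods: it assumes that each model is reduced to Cα coordinates keyed by residue number, and that each constraint is a hypothetical tuple (residue i, residue j, lower bound, upper bound) in ångströms. The flat-bottom quadratic penalty, the residue numbers and the toy coordinates are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
# Hypothetical restraint format: (residue_i, residue_j, lower_A, upper_A)
Restraint = Tuple[int, int, float, float]


def restraint_penalty(ca: Dict[int, Coord], restraints: List[Restraint]) -> float:
    """Flat-bottom quadratic penalty: zero inside [lower, upper], quadratic outside."""
    total = 0.0
    for i, j, lower, upper in restraints:
        d = math.dist(ca[i], ca[j])  # Euclidean Calpha-Calpha distance
        if d < lower:
            total += (lower - d) ** 2
        elif d > upper:
            total += (d - upper) ** 2
    return total


def rank_models(models: Dict[str, Dict[int, Coord]],
                restraints: List[Restraint]) -> List[str]:
    """Return model names ordered from lowest to highest penalty."""
    return sorted(models, key=lambda name: restraint_penalty(models[name], restraints))


# Toy data (invented for illustration): two coarse-grained models, two restraints.
restraints = [(10, 50, 4.0, 10.0), (10, 90, 4.0, 10.0)]
models = {
    "model_a": {10: (0.0, 0.0, 0.0), 50: (8.0, 0.0, 0.0), 90: (0.0, 6.0, 0.0)},
    "model_b": {10: (0.0, 0.0, 0.0), 50: (20.0, 0.0, 0.0), 90: (0.0, 6.0, 0.0)},
}
ranking = rank_models(models, restraints)
```

In this toy case model_a satisfies both restraints, whereas model_b violates the first by 10 Å, so model_a ranks first. A practical scheme would combine such a term with packing and membrane-environment terms, and would weight each restraint by its experimental reliability.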
Acknowledgements

The authors thank SE Harrington, JU Bowie, J Skolnick, O Kalid and CG Tate for critical reading, and B Honig, LR Forrest, L Adamian, J Liang, CG Tate and DT Jones for providing manuscripts before publication. This study was supported by grant 222/04 from the Israel Science Foundation to N B-T. SJF was supported by a doctoral fellowship from the Clore Israel Foundation.

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as: of special interest; of outstanding interest.

1. Rost B, Fariselli P, Casadio R: Topology prediction for helical transmembrane proteins at 86% accuracy. Protein Sci 1996, 5:1704-1718.
2. Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001, 305:567-580.
3. White SH: The progress of membrane protein structure determination. Protein Sci 2004, 13:1948-1949.
4. Petrey D, Honig B: Protein structure prediction: inroads to biology. Mol Cell 2005, 20:811-819.
5. Forrest LR, Tang CL, Honig B: On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins. Biophys J 2006, in press.
An evaluation of homology modeling applied to TM proteins that segregated the available TM protein structures into families according to sequence homology, and attempted to predict the structures of proteins using their homologues as templates. It was concluded that those methods that were shown to work well for soluble proteins work equally well for TM proteins.
6. Fanelli F, De Benedetti PG: Computational modeling approaches to structure–function analysis of G protein-coupled receptors. Chem Rev 2005, 105:3297-3351.
7. Oliveira L, Hulsen T, Lutje Hulsik D, Paiva AC, Vriend G: Heavier-than-air flying machines are impossible. FEBS Lett 2004, 564:269-273.
An extensive evaluation of modeling approaches applied to GPCRs, particularly to the use of rhodopsin’s structure as a template.
8. Popot JL, Engelman DM: Membrane protein folding and oligomerization: the two-stage model. Biochemistry 1990, 29:4031-4037.
9. Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982, 157:105-132.
10. White SH, Wimley WC: Membrane protein folding and stability: physical principles. Annu Rev Biophys Biomol Struct 1999, 28:319-365.
11. Chen CP, Kernytsky A, Rost B: Transmembrane helix predictions revisited. Protein Sci 2002, 11:2774-2791.
12. Cuthbertson JM, Doyle DA, Sansom MS: Transmembrane helix prediction: a comparative evaluation and analysis. Protein Eng Des Sel 2005, 18:295-308.
13. Doyle DA, Morais Cabral J, Pfuetzner RA, Kuo A, Gulbis JM, Cohen SL, Chait BT, MacKinnon R: The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science 1998, 280:69-77.
14. Fu D, Libson A, Miercke LJ, Weitzman C, Nollert P, Krucinski J, Stroud RM: Structure of a glycerol-conducting channel and the basis for its selectivity. Science 2000, 290:481-486.
15. Hessa T, Kim H, Bihlmaier K, Lundin C, Boekel J, Andersson H, Nilsson I, White SH, von Heijne G: Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature 2005, 433:377-381.
This article reports the use of an experimental system to probe the energetics of the transfer of peptides between translocated and membrane-inserted forms, using an experimental setup very close to physiological conditions. Thus, the authors derive a hydrophobicity scale.
16. Shental-Bechor D, Fleishman SJ, Ben-Tal N: Has the code of protein translocation been broken? Trends Biochem Sci 2006, 31:192-196.
A critique of the thermodynamic quantities obtained by Hessa et al. [15] in their analysis of peptide insertion into the membrane. It is argued that the more polar peptides might be stabilized by other protein components in the experiment, causing the energetic penalty on the transfer for polar amino acid residues to appear lower than it actually is.
17. von Heijne G, Gavel Y: Topogenic signals in integral membrane proteins. Eur J Biochem 1988, 174:671-678.
18. Goder V, Junne T, Spiess M: Sec61p contributes to signal sequence orientation according to the positive-inside rule. Mol Biol Cell 2004, 15:1470-1478.
19. van Klompenburg W, Nilsson I, von Heijne G, de Kruijff B: Anionic phospholipids are determinants of membrane protein topology. EMBO J 1997, 16:4261-4266.
20. Nilsson J, Persson B, von Heijne G: Comparative analysis of amino acid distributions in integral membrane proteins from 107 genomes. Proteins 2005, 60:606-616.
21. Daley DO, Rapp M, Granseth E, Melen K, Drew D, von Heijne G: Global topology analysis of the Escherichia coli inner membrane proteome. Science 2005, 308:1321-1323.
A whole-proteome analysis of the topology of proteins in E. coli that are predicted to be transmembrane. The data could serve as a benchmark for future studies and evaluations of topology prediction algorithms. Five out of 601 proteins were identified as having putative dual topology, with some of the protein copies inserting into the membrane with one topology and others with the reverse topology.
22. Rapp M, Granseth E, Seppala S, von Heijne G, Daley DO, Melen K, Drew D: Identification and evolution of dual-topology membrane proteins. Nat Struct Mol Biol 2006, 13:112-116.
23. Ubarretxena-Belandia I, Baldwin JM, Schuldiner S, Tate CG: Three-dimensional structure of the bacterial multidrug transporter EmrE shows it is an asymmetric homodimer. EMBO J 2003, 22:6175-6181.
24. Ninio S, Elbaz Y, Schuldiner S: The membrane topology of EmrE – a small multidrug transporter from Escherichia coli. FEBS Lett 2004, 562:193-196.
25. Bowie JU: Solving the membrane protein folding problem. Nature 2005, 438:581-589.
26. Schueler-Furman O, Wang C, Bradley P, Misura K, Baker D: Progress in modeling of protein structures and interactions. Science 2005, 310:638-642.
27. Bradley P, Misura KM, Baker D: Toward high-resolution de novo structure prediction for small proteins. Science 2005, 309:1868-1871.
28. Kalani MY, Vaidehi N, Hall SE, Trabanino RJ, Freddolino PL, Kalani MA, Floriano WB, Kam VW, Goddard WA III: The predicted 3D structure of the human D2 dopamine receptor and the binding site and binding affinities for agonists and antagonists. Proc Natl Acad Sci USA 2004, 101:3815-3820.
29. Trabanino RJ, Hall SE, Vaidehi N, Floriano WB, Kam VW, Goddard WA III: First principles predictions of the structure and function of G-protein-coupled receptors: validation for bovine rhodopsin. Biophys J 2004, 86:1904-1921.
30. Freddolino PL, Kalani MY, Vaidehi N, Floriano WB, Hall SE, Trabanino RJ, Kam VW, Goddard WA III: Predicted 3D structure for the human β2 adrenergic receptor and its binding site for agonists and antagonists. Proc Natl Acad Sci USA 2004, 101:2736-2741.
31. Vaidehi N, Floriano WB, Trabanino R, Hall SE, Freddolino P, Choi EJ, Zamanakos G, Goddard WA III: Prediction of structure and function of G protein-coupled receptors. Proc Natl Acad Sci USA 2002, 99:12622-12627.
32. Becker OM, Marantz Y, Shacham S, Inbal B, Heifetz A, Kalid O, Bar-Haim S, Warshaviak D, Fichman M, Noiman S: G protein-coupled receptors: in silico drug discovery in 3D. Proc Natl Acad Sci USA 2004, 101:11304-11309.
33. Shacham S, Marantz Y, Bar-Haim S, Kalid O, Warshaviak D, Avisar N, Inbal B, Heifetz A, Fichman M, Topf M et al.: Predict modeling and in-silico screening for G-protein coupled receptors. Proteins 2004, 57:51-86.
34. Zhang Y, Devries ME, Skolnick J: Structure modeling of all identified G protein-coupled receptors in the human genome. PLoS Comput Biol 2006, 2:e13.
An adaptation of the TASSER algorithm for threading and refinement of protein structures to membrane proteins. The algorithm was validated on several proteins of solved structure, and then applied to predicting the structure of most human GPCRs. The resource of predicted structures is available at http://cssb.biology.gatech.edu/skolnick/files/gpcr/gpcr.html.
35. Bradley P, Malmstrom L, Qian B, Schonbrun J, Chivian D, Kim DE, Meiler J, Misura KM, Baker D: Free modeling with Rosetta in CASP6. Proteins 2005, 61:128-134.
36. Yarov-Yarovoy V, Schonbrun J, Baker D: Multipass membrane protein structure prediction using Rosetta. Proteins 2006, 62:1010-1025.
An adaptation of the Rosetta algorithm for ab initio protein structure prediction to membrane proteins. The quality of the predicted models was similar to that obtained for soluble proteins. Full-atom prediction was not attempted because of the computational cost of such implementations in large proteins.
37. Fleishman SJ, Unger VM, Ben-Tal N: Transmembrane protein structures without X-rays. Trends Biochem Sci 2006, 31:106-113.
A review of approaches for modeling TM protein structures based on intermediate resolution data. Some experimental data, particularly from crosslinking, are sometimes found to bias models away from the native state structures.
38. Herzyk P, Hubbard RE: Automated method for modeling seven-helix transmembrane receptors from experimental data. Biophys J 1995, 69:2419-2442.
39. Sale K, Faulon JL, Gray GA, Schoeniger JS, Young MM: Optimal bundling of transmembrane helices using sparse distance constraints. Protein Sci 2004, 13:2613-2627.
40. Unger VM, Kumar NM, Gilula NB, Yeager M: Three-dimensional structure of a recombinant gap junction membrane channel. Science 1999, 283:1176-1180.
41. Hirai T, Heymann JA, Maloney PC, Subramaniam S: Structural model for 12-helix transporters belonging to the major facilitator superfamily. J Bacteriol 2003, 185:1712-1718.
42. Baldwin JM: The probable arrangement of the helices in G protein-coupled receptors. EMBO J 1993, 12:1693-1703.
43. Enosh A, Fleishman SJ, Ben-Tal N, Halperin D: Assigning transmembrane segments to helices in intermediate-resolution structures. Bioinformatics 2004, 20:I122-I129.
44. Adamian L, Liang J: Prediction of buried helices in multispan α helical membrane proteins. Proteins 2006, 63:1-5.
45. Baldwin JM, Schertler GF, Unger VM: An α-carbon template for the transmembrane helices in the rhodopsin family of G-protein-coupled receptors. J Mol Biol 1997, 272:144-164.
46. Beuming T, Weinstein H: Modeling membrane proteins based on low-resolution electron microscopy maps: a template for the TM domains of the oxalate transporter OxlT. Protein Eng Des Sel 2005, 18:119-125.
47. Fleishman SJ, Harrington S, Friesner RA, Honig B, Ben-Tal N: An automatic method for predicting the structures of transmembrane proteins using cryo-EM and evolutionary data. Biophys J 2004, 87:3448-3459.
48. Hurwitz N, Pellegrini-Calace M, Jones DT: Towards genome-scale structure prediction for transmembrane proteins. Philos Trans R Soc Lond B Biol Sci 2006, 361:465-475.
49. Fleishman SJ, Unger VM, Yeager M, Ben-Tal N: A Cα model for the transmembrane α-helices of gap-junction intercellular channels. Mol Cell 2004, 15:879-888.
A cryo-EM map of the gap junction was used together with evolutionary-conservation and correlated-mutations analyses to predict a model structure of the TM domain. The model puts disease-causing point mutations in structurally packed regions of the model.
50. Fleishman SJ, Yifrach O, Ben-Tal N: An evolutionarily conserved network of amino acids mediates gating in voltage-dependent potassium channels. J Mol Biol 2004, 340:307-318.
51. Ubarretxena-Belandia I, Engelman DM: Helical membrane proteins: diversity of functions in the context of simple architecture. Curr Opin Struct Biol 2001, 11:370-376.
52. Abramson J, Smirnova I, Kasho V, Verner G, Kaback HR, Iwata S: Structure and mechanism of the lactose permease of Escherichia coli. Science 2003, 301:610-615.
53. Yohannan S, Faham S, Yang D, Whitelegge JP, Bowie JU: The evolution of transmembrane helix kinks and the structural diversity of G protein-coupled receptors. Proc Natl Acad Sci USA 2004, 101:959-963.
This analysis finds that in most cases where a proline is not observed in a kinked region of a TM protein structure, the multiple-sequence alignment exhibits a proline in several sequence homologues. This observation provides an approach for predicting the locations of kinks in protein structures.
54. Deupi X, Olivella M, Govaerts C, Ballesteros JA, Campillo M, Pardo L: Ser and Thr residues modulate the conformation of pro-kinked transmembrane α-helices. Biophys J 2004, 86:105-115.
55. Lee SY, Lee A, Chen J, MacKinnon R: Structure of the KvAP voltage-dependent K+ channel and its dependence on the lipid membrane. Proc Natl Acad Sci USA 2005, 102:15441-15446.
56. Davidson AL, Chen J: Structural biology. Flipping lipids: is the third time the charm? Science 2005, 308:963-965.
57. Ma C, Chang G: Structure of the multidrug resistance efflux transporter EmrE from Escherichia coli. Proc Natl Acad Sci USA 2004, 101:2852-2857.
58. Pornillos O, Chen YJ, Chen AP, Chang G: X-ray structure of the EmrE multidrug transporter in complex with a substrate. Science 2005, 310:1950-1953.
59. Donnelly D, Overington JP, Ruffle SV, Nugent JH, Blundell TL: Modeling α-helical transmembrane domains: the calculation and use of substitution tables for lipid-facing residues. Protein Sci 1993, 2:55-70.
60. Briggs JA, Torres J, Arkin IT: A new method to model membrane protein structure based on silent amino acid substitutions. Proteins 2001, 44:370-375.
61. Tate CG: Comparison of three structures of the multidrug transporter EmrE. Curr Opin Struct Biol 2006, 16: this issue.
62. Mottamal M, Zhang J, Lazaridis T: Energetics of the native and non-native states of the glycophorin transmembrane helix dimer. Proteins 2006, 62:996-1009.
63. Zhou FX, Cocco MJ, Russ WP, Brunger AT, Engelman DM: Interhelical hydrogen bonding drives strong interactions in membrane proteins. Nat Struct Biol 2000, 7:154-160.
64. Choma C, Gratkowski H, Lear JD, DeGrado WF: Asparagine-mediated self-association of a model transmembrane helix. Nat Struct Biol 2000, 7:161-166.
65. Arbely E, Arkin IT: Experimental measurement of the strength of a Cα–H···O bond in a lipid bilayer. J Am Chem Soc 2004, 126:5362-5363.
66. Yohannan S, Faham S, Yang D, Grosfeld D, Chamberlain AK, Bowie JU: A Cα–H···O hydrogen bond in a membrane protein is not stabilizing. J Am Chem Soc 2004, 126:2284-2285.
67. Vargas R, Garza J, Dixon D, Hay B: How strong is the Cα–H···O=C hydrogen bond? J Am Chem Soc 2000, 122:4750-4755.
68. Scheiner S, Kar T, Gu Y: Strength of the Cα–H···O hydrogen bond of amino acid residues. J Biol Chem 2001, 276:9832-9837.
69. Faham S, Yang D, Bare E, Yohannan S, Whitelegge JP, Bowie JU: Side-chain contributions to membrane protein structure and stability. J Mol Biol 2004, 335:297-305.
An analysis of the contributions to stability of individual amino acid residues on helix B from bacteriorhodopsin. It is found that the contribution correlates with the amount of buried surface area rather than the ability to provide hydrogen-bonding interactions, roughly as seen for soluble proteins. Surprisingly, a mutation of a kink-inducing proline to alanine did not decrease stability significantly, and only elicited minor changes in secondary structure.
70. Kraulis PJ: MolScript: a program to produce both detailed and schematic plots of protein structures. J Appl Cryst 1991, 24:946-950.
71. Merritt EA, Bacon DJ: Raster3D: photorealistic molecular graphics. Methods Enzymol 1997, 277:505-524.
72. Glaser F, Pupko T, Paz I, Bell RE, Bechor-Shental D, Martz E, Ben-Tal N: ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 2003, 19:163-164.
73. Yarov-Yarovoy V, Baker D, Catterall WA: Voltage sensor conformations in the open and closed states in structural models of K+ channels. Proc Natl Acad Sci USA 2006, 103:7292-7297.

Structure prediction of α-helical membrane proteins. Fleishman and Ben-Tal. Current Opinion in Structural Biology 2006, 16:496–504.
