Algorithm to assess packing quality of α-helical transmembrane domains
Fig. 1. The scheme of residues surface area division. The total residue surface area is subdivided into the following parts: ASA is the surface area accessible to membrane; Sp and Snp are the areas covered by polar and non-polar protein atoms, respectively. The two latters are further subdivided into Sp0 and Snp0 the contact areas with residues of the same TM segment, and Sp1 and Snp1 the contact areas with other helices.
Chugunov et al., Biochemistry (Moscow), 2007; Chugunov et al., 2007a; Chugunov et al., 2007b.
Integral membrane proteins (MPs) are objects of exceptional biological importance. Information on the structure and functioning of TM domains is highly required for pharmaceutical applications, such as structure-based drug design (SBDD). Modern experimental techniques of three-dimensional (3D) structure determination, such as X-ray crystallography and NMR spectroscopy, on the other hand, often fail to solve the problem due to technical difficulties related to protein purification and crystallization. Just a little more then one hundred of MPs structures have been determined to date, making up insignificant part of the total number of known structures in the Protein Data Bank, although the genomes sequenced so far encode at least 15–30% of MPs.
We have developed a novel method for assessment of the packing quality of TM α-helical proteins based on the analysis of a non-redundant set of 26 high-resolution X-ray structures of MPs (training set). Amino acid residues in the structures of the training set were characterized using the concept of the protein–lipid environment classes. It permitted determination of statistical preferences of residues to five pre-defined classes of environment. These preferences describe an interaction of each type of residue with the membrane and a polarity of the proximal protein groups.
The environment of each residue in the 3D model was characterized by two parameters the fractions of its side chain surface area that are covered by polar (nitrogen or oxygen) and non-polar atoms of neighboring helices in the TM domain, Fp1 and Fnp1, respectively (fig. 1).
Distributions of Leu and Arg residues from the training set over the values of Fp1 and Fnp1 and the graphical definition of environment classes are presented on fig. 2.
Fig. 2. Environment classes for residues in MPs. └. Distributions of residues Leu and Arg over the values of Fp1 and Fnp1 fractions of the surface area covered by polar and non-polar atoms of neighbouring helices, respectively. Each data point corresponds to a single residue of a given type in the database. Gray and black dots denote residues on the interface (|Z|≥15 Å) and in the membrane core (|Z|<15 Å), respectively (Z is the coordinate along the membrane normal). B. Depending on the values of Fp1 and Fnp1, each residue is attributed to one of five environment classes. Class 1 corresponds to residues preferentially exposed to the membrane, classes 2, 3 and 4, 5 are composed of residues partly and completely buried inside the bundle, respectively. Classes 2 and 4 denote non-polar, while classes 3 and 5 polar environment.
Prominent correlation between the length of a protein sequence and the corresponding score values obtained for the training set of X-ray structures of MPs allows quantitative estimation of the packing quality for a particular protein model and its comparison with an X-ray structure of such a length. This will hopefully be used in an automated criterion for delineation of correct MPs structures. Moreover, the method allows identification of the native-like structures among a large set of decoys (fig. 3).
Fig. 3. Membrane scores (Smem) for membrane proteins, as a function of TM domain sequence length. Structures from the training set are shown with black circles, and computer models of rhodopsin with white triangles. The solid line represents the least squares fits obtained for the training set. Inset: Dependence of the values Smem on the root-mean-square deviation from the crystal structure (empty circle) for theoretical models of rhodopsin. The regression trend-line is also given.
The proposed method of the membrane score is conceptually similar to the 3D environment profiles (3D-1D) technique, although there are some important differences. Firstly, the membrane score is optimized for TM α-helical proteins and takes into account only one secondary structure type (α-helix), since random coil conformation is highly unfavorable in a membrane milieu. Secondly, definitions of the environment classes are drastically different from those suggested for water-soluble proteins by Bowie and Eisenberg. The proposed method noticeably outstrip the 3D-1D approach in ability to discover close-to-native structures among a large set of artificial models.
The proposed method will be useful in optimization of TM domains of MPs, first of all G-protein coupled receptors.
(Authors: Anton Chugunov, Valery Novoseletsky, Roman Efremov.)
1. Selection residues for the training set
The data sets of MPs used in this study were constructed based on the list of 191 structures published on the web-site Membrane proteins of known 3D structure. All the non-helical proteins, low-resolution structures (with resolution >3.5 Å), NMR-derived models, and homologous proteins were omitted. In each family of closely related proteins only one member solved to the highest resolution was selected for the final set. This procedure yielded 26 X-ray structures of non-homologous α-helical TM proteins, the training set.
The TM domain structures of the following proteins were included in the training set: bacteriorhodopsin (PDB code: 1C3W), halorhodopsin (1E12), sensory rhodopsin I and II (1XIO and 1H68, respectively), visual rhodopsin (1U19), KcsA H+ gated potassium channel (1K4C), MscL mechanosensitive ˝hannel (1MSL), H+/Cl- exchange transporter (1KPL), aquaporin water channel (1J4N), aquaporin Z channel (1RC2), glycerol facilitator channel (1FX8), ammonia channel (1XQE), bacterial multi-drug efflux transporter (1IWG), glycerol-3-phosphate transporter (1PW4), vitamin B12 transporter (1L7V), calcium ATPase (1SU4), rotors of V- and F-type Na+-ATPases (2BL2 and 1YCE, respectively), fumarate reductase complex (1QLA), succinate dehydrogenase (1NEK), cytochrome C oxidase (1EHK), cytochrome bc1 complexes (1EZV and 1BCC), formate dehydrogenase (1KQF), nitrate reductase A (1Q16) and mitochondrial ADP/ATP carrier (1OKC).
Each model was arranged in such a way that the principal moment of inertia of its TM domain was aligned along Z-axis corresponding to the membrane normal. Directionality of the axis was chosen as follows: smaller values of Z corresponded to the cytoplasmic side of the plasma membrane or the matrix side of inner mitochondrial membrane. The model was then scanned along the Z-axis with a hydrophobic slab mimicking lipid bilayer, and the energy of protein–membrane interaction (Esolv) was calculated at each step. The slab was represented by two parallel planes XY separated with a distance D (the membrane thickness; varied in the range 20÷40 Å with 2 Å increment) and described by an effective potential based on the atomic solvation parameters (ASP) formalism.
Only residues in the delineated TM regions were further used to assess their environmental characteristics in different data sets. To prevent boundary effects, the calculations were done taking into account the atoms located within 5 Å along the Z-axis, outside the terminal residues of a given TM segment. In total, the training set comprised 217 TM α-helices containing 4991 residues.
2. Parameterization of the method
The environment of each residue in the 3D model was characterized by two parameters the fractions of its side chain surface area that are covered by polar (nitrogen or oxygen) and non-polar atoms of neighboring helices in the TM domain, Fp1 and Fnp1, respectively:
where Sp and Snp are polar and non-polar side chain areas of the residue, respectively (Fig. 1). Sp0 and Snp0 are the corresponding values calculated for the isolated TM α-helix containing the residue under consideration. S0 is the solvent-accessible surface area of the side chain in a Gly4–X–Gly5 polypeptide.
Parameters Fp1 and Fnp1 were chosen due to the following reasons. First, they are non-dimensional, and a universal approach to all types of residues is applicable. Second, there are some common algorithms of residue area calculation. Depending on its values of Fp1 and Fnp1, each residue in the training data set was assigned to one of five environment classes. The latters were defined by three parameters the values a, b, and tgα (Fig. 2B).
The compatibility of each of the twenty amino acids with each of five environment classes was expressed in terms of the membrane score values calculated as Sijmem=ln(Pij/Pj). Here Pij is the probability of finding residue of type i in environment j. Pj is the probability of finding any residue in environment j. The borderline values for the environment classes (Fig. 2B) were adjusted iteratively, the matrix Sijmem being recalculated each time, so as to maximize the total membrane score for the whole training set.
3. Limitations of the method
Due to the statistical nature of the present method, there might not be absolute correlation between RMS deviation of the model from the native structure and the membrane score value in all RMSD values ranges. The membrane score value is defined mainly by mutual orientations of residues sidechains, so the dependence of this scoring function on the backbone conformation (i.e., ability of low-resolution optimization) is mostly mediated by sidechain conformations. Conceivably, the methods capability to assess packing quality might be different in distinct resolution ranges and be not straightforward in the case of only Cα models since theres no clear information on the orientations of the sidechains.