Loading

JSM Chemistry

Predicting Protein-Protein Binding Affinity by In silico Docking

Review Article | Open Access | Volume 4 | Issue 1

  • 1. Department of Biophysics and Engineering of Cell, Institute of Biophysics and Cell Engineering of NAS of Belarus, Belarus
+ Show More - Show Less
Corresponding Authors
Valery Veresov, Department of Biophysics and Engineering of Cell, Institute of Biophysics and Cell Engineering of NAS of Belarus, Ackademicheskaya Str. 27, Minsk, Belarus, Tel: 375-172842252; Fax: 375172842359
Abstract

Protein-protein interactions, which result in either transient or long-lived complexes, play a central role in the processes happening in the cells. Perturbations in those interaction networks can lead to disease. Understanding the mechanisms of proteinprotein binding is inherently difficult due to the range of potential interactions, the molecular rearrangement associated with binding, and the time and length scales involved. In silico protein docking is a valuable tool in determination of the structures of protein–protein complexes that complements experimental methods with a level of details that are unattainable in experiment. Although the precise prediction of the structure of protein-protein complex is sufficient to describe the interaction in atomic details, it is the knowledge of affinity of the components for each other is necessary for the prediction of the existence of this assembly and whether it is transient or permanent. Recently, a structure-based binding affinity benchmark Version 1.0, which includes a non redundant set of 144 protein–protein complexes that have highresolution structures available for both the complexes and their unbound components, and for which dissociation constants have been measured by biophysical methods, and the updated version, which contains 179 entries, have been presented. Because for all protein complexes in this benchmark the unbound structures of component proteins are available, these benchmarks allow for the assessment of conformational changes upon protein-protein binding. We use this assessment together with experimental data to explain failures and successes in prediction structures and affinity by in silico docking using the combination of rigid docking and RosettaDock refinements.

Keywords

 Protein-protein docking;  Structure;  Modeling

Citation

Veresov VG (2016) Predicting Protein-Protein Binding Affinity by In silico Docking. JSM Chem 4(1): 1019.

ABBREVIATIONS

FFT: Fast Fourier Transform; PRE: Paramagnetic Relaxation Enhancement; NMR: Nuclear Magnetic Resonance; PLS: Partial Least Squares; SVM: Support-Vector Machine; GP: Gaussian Process; RMSD: Root-Mean- Square-Deviation

INTRODUCTION

Relating structure to function is a major objective of structural biology. Although current structural biology tools have broaden essentially our knowledge in structures, dynamics and functions of individual proteins, the situation differs significantly in the case of protein-protein complexes while protein-protein interactions that result in either transient or long-lived complexes play a fundamental role in many biological functions and sometimes also result in diseases. Understanding the mechanisms of protein-protein binding is inherently difficult due to the range of potential interactions, the molecular rearrangement associated with binding, and the time and length scales involved. Physical– chemical and structural studies of molecular recognition between proteins are essential to understand protein function and hence life processes. Given the experimental difficulties for having information on protein–protein interactions at atomic level, computational docking is increasingly used as a complementary tool to predict the structure of a specific complex formed by two given interacting proteins. Although in recent years the field has experienced a rapid improvement, major challenges remain, such as the treatment of flexibility and reliable scoring (see recent reviews on protein–protein docking [1-8]). Most of the available docking methods treat the interacting proteins as rigid bodies during the whole process, or at least in a first stage. The majority of rigid-body docking methods are based on an exhaustive sampling of the rotational and translational space in search for geometric surface correlation mainly through FFT (Fast Fourier Transform), spherical polar Fourier, or geometric hashing algorithms. Although the precise prediction of the structure of protein-protein complex is sufficient to describe the interaction in atomic details, it is the knowledge of affinity of the components for each other is necessary to predict whether the assembly actually exists under a given condition of temperature, pH, and protein concentration, and whether it is transient or permanent. Even if docking methods are highly promising tools for modeling protein-protein complexes, they do not yet allow a reliable estimation of the binding affinity of the complex and successful docking procedures often give equally good scores for proteins that do not interact experimentally [9]. Therefore, the design of ideal computational tools for protein-protein complex modeling that would also predict the binding affinity of a complex is one of the challenges in structural bioinformatics. Such computational tools would open the route to in silico, large-scale annotation and prediction of complete interactomes.

The process of protein-protein complex formation comprises at least two steps [10]. During an initial diffusive approach possibly guided by electrostatic interactions the proteins encounter each other many times before an intermediate loosely bound state near the native binding geometry is reached [10] (also termed ‘encounter complex’). Upon formation of additional interactions and conformational changes at the interface regions the partners form the native complex with often (but not always) a high steric interface complementarity. The nature of the encounter state has been elusive for a long time, due to a lack of experimental methods to probe it [11]. Recently, new tools to study this preliminary step in complex formation have been reported that changed the view of this state and of proceeding from the encounter complex towards the final structure. The major progress is due to the application of NMR paramagnetic relaxation enhancement (PRE), a technique that is exquisitely sensitive to the presence of lowly populated states in the fast exchange regime [12-14]

Recently, a structure-based affinity benchmark of Kastritis et al (version 1.0), which includes a non redundant set of 144 protein–protein complexes that have high-resolution structures available for both the complexes and their unbound components, and for which dissociation constants have been measured by biophysical methods [15], has been presented and very recently its updates by Vreven et al. (structure-based affinity benchmark version 2.0) has been reported [16]. Because for all these complexes the unbound structures of component proteins are available, these two benchmarks allow for the assessment of conformational changes upon protein-protein binding. We use the data from these benchmarks together with other experimental and theoretical data to gain structural insights into failures and successes in a structure and affinity prediction by in silico docking.

Protein-protein association. What rigid-body docking models describe?

The structure of a protein complex together with information about its affinity and other thermodynamic characteristics provide a “frozen” view of the complex and ignores the kinetic nature of protein-protein association. Since all interactions are of relatively short range, proteins, before their association, diffuse randomly in solution performing Brownian motion until they reach an area, known as “the steering region” or “longrange electrostatic steering region”, where mutual electrostatic attraction leads them to a ‘macrocollision’, resulting sometimes in the formation of a specific low-free energy encounter complexes [10,17-19]. The nature of these encounter complexes has been a matter of considerable research and debates [11, 20]. However, the encounter complex structures and configurations were elusive over many years to conventional structural and biophysical methods because their populations are low, their lifetimes are short, and they are difficult to trap. Considerable progress has been achieved with application of novel experimental and improved computational methods, developed during the last decade. One of the most promising approaches for studying the formation of encounter complexes is ‘paramagnetic relaxation enhancement’ (PRE) [12-14]. In this technique certain areas in one of the proteins are labeled with magnetic particles, which produce signals when the two proteins are close to each other. Repeating the measurement several times with the magnetic particles in different positions provides information about the overall structure of the complex. Computational modeling can then be used to work out the fine details of the structure, including the shapes of the intermediate structures made by the proteins as they interact.

Encounter complexes are best described as low free energy ensembles of states that are close to the final complex and in which the two molecules can rotationally diffuse along each other, or participate in a series of ‘microcollisions’ that properly accommodate the reactive surface groups (Figure 1)


Figure 1: Successive steps in a possible mechanism for protein–protein association, compatible with PRE data. A:BES stands for ensembles of encounter complexes between proteins A and B. Formation of transient encounter complexes is executed by nonspecific collisions, guided mostly by electrostatic interactions. Many encounter complexes separate rapidly. Some productive encounter complexes (A:BES,prod) reorient and come closer to the final, specific orientation with final fitting of interacting surfaces (AB), guided mostly by desolvation, as water molecules move away from the protein surfaces.

[11,20]. The encounter complexes are stabilized by a combination of electrostatic forces and desolvation, and are destabilized by unfavorable entropy. Specific short-range interactions do not seem to play an important role at this stage [10]. The formation of the final complex from the encounter complex requires desolvation and structural rearrangement of the side chains, thus lowering further the free energy of interaction (Figure 1). The macrocollisions that result in the formation of encounter complex seem at first glance as not requiring interaction forces between the macromolecules. However, the lifetime of such a macrocollision will often not be sufficient for a macromolecule to find a small binding site on the partner and the fraction of macrocollisions that results in a productive complex will be small. Electrostatic interactions have long been recognized as the dominant factor used by Nature in order to enhance protein association beyond Encounter complexes are best described as low free energy ensembles of states that are close to the final complex and in which the two molecules can rotationally diffuse along each other, or participate in a series of ‘microcollisions’ that properly accommodate the reactive surface groups (Figure 1) [11,20]. The encounter complexes are stabilized by a combination of electrostatic forces and desolvation, and are destabilized by unfavorable entropy. Specific short-range interactions do not seem to play an important role at this stage [10]. The formation of the final complex from the encounter complex requires desolvation and structural rearrangement of the side chains, thus lowering further the free energy of interaction (Figure 1). The macrocollisions that result in the formation of encounter complex seem at first glance as not requiring interaction forces between the macromolecules. However, the lifetime of such a macrocollision will often not be sufficient for a macromolecule to find a small binding site on the partner and the fraction of macrocollisions that results in a productive complex will be small. Electrostatic interactions have long been recognized as the dominant factor used by Nature in order to enhance protein association beyond

Where A and B are the free proteins, A:B is the encounter complex, and AB is the final complex. According to this scheme, the first step includes Brownian diffusion, macrocolision and the formation of the encounter complex (complexes). The second step of association consists of conformational rearrangements to the final complex. The macroscopic rate constant for formation of the productive complex is kon=k1 k2 /(k-1+k2 ), and for dissociation, koff=k-1k-2/(k-1+k2 ) [10]. In (Figure 2)


Figure 2: Energy diagrams. A: Final complex which is reached without favorable electrostatic forces and without formation of transient (encounter) complexes between the proteins. B: Final complex formed due electrostatic attraction without formation of encounter complexes; C: Proceeding to a stable final complex (AB) through formation of encounter complex (A:B); D: Like C, with a high-energy barrier between encounter complex and the final complex. The proteins can get trapped in a non-productive local energy minimum (non-productive encounter complex).

energy diagrams are shown to illustrate various possibilities. The encounter complex is often thought of as an ensemble of conformations (‘encounter complexes’) in which the two molecules can rotationally diffuse along each other, or participate in a series of ‘microcollisions’ that properly accommodate the reactive surface groups.

It is well known that, for most protein a global search via rigid docking gives low energy structures in several regions of the conformational space, some of which are far from the structure of the native complex. Physics-based energy functions are expected yet to be globally valid for modeling interactions between proteins, including the non-native states. Thus, one can assume that the rigid-body solutions with the energy values that are low relative to the average energy but still exceed the energy at the global minimum may represent relatively short-lived encounter complexes along the association pathways. Recently, exhaustive sampling of the conformational space in proteinprotein association using a physics-based energy function has been carried out [21]. The study has shown the agreement between experimental PRE data and theoretical PRE profiles calculated from the ensemble of structures generated by docking thus confirming the hypothesis that the structures of encounter complexes can be obtained simply as byproducts of docking. While this result seems not unexpected, in view of the limited structural information available on encounter complexes it is of great practical significance.

Predicting protein–protein affinity

Protein docking is generally applied to individual pairs of proteins that are known to interact, to model the threedimensional structure of the complexes they form. Docking programs usually comprise two standard steps: generation of thousands of alternative poses to sample all possible interaction modes, followed by scoring these poses using a ‘pseudo-energy’ function. A set of solutions is generated by the first step, containing often near-native solutions, but scoring functions often fail to rank them properly and pick out truly best poses [9]. Yet, the potential to use protein docking algorithms to infer protein–protein binding affinities has long been used and a number of algorithms have been proposed to relate the binding affinity of two proteins to the structure of their complex. These algorithms vary dramatically from one another in terms of their physical relevance, accuracy and computational cost. Exact methods such as free energy perturbation and thermodynamic integration [22], and free energy pathway methods such as the linear interaction energy (LIE) and molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) methods [23] are in principle highly accurate. However, because they use extensive molecular dynamics or Monte Carlo simulations their application is extremely limited and is usually only applicable where the bound and unbound states have significant overlap. Empirical energy functions are much faster, and either take into account the physical chemical properties of the interface between subunits, and assess the free energy of their interaction, or they use statistical potentials, or they rely on empirical scoring functions developed for protein–protein docking and use linear regression to obtain a weighted function of physical or knowledge-based terms, although machine-learning algorithms have been applied as well [24-30]. Such models can also use biological information on the conservation of the protein sequence or the effect of point mutations. Horton and Lewis obtained a high correlation coefficient of 0.96 with experimental measurements, using only three terms in a linear combination [24]. Later works using different methods and datasets also reported high correlation coefficients, ranging up to 0.95 values [25-30]. However, as outlined in several recent papers, these high accuracy predictions are due to limitations in the data sets used for training and testing the algorithms [9]. When a large number of approaches were tested on a larger and more reliable dataset no correlations higher than 0.53 was observed. A community-wide effort on the computational discrimination between binding and nonbinding protein pairs highlighted the critical role that the dataset plays for a balanced assessment of the methods [9,31]. Docking algorithms compute the structure of protein complexes, so in principle, they should be able to distinguish pairs of proteins by their binding affinities. However, given their poor performance in ranking correct conformations, it has widely been thought that docking programs are not yet accurate enough to predict binding affinities [9]. The main molecular origin why current models are limited in predicting binding affinities is thought to be a conformational flexibility of binding proteins [32]

In recent years several new affinity prediction algorithms have been proposed. The pyDock [33] uses a simple approach to scoring of rigid-body docking poses, which is based on Coulombic electrostatics with distance dependent dielectric constant, and implicit desolvation energy with atomic solvation parameters previously adjusted for rigid-body protein-protein docking. This scoring function is not highly dependent on specific geometry of the docking poses and therefore can be used in rigid-body docking sets generated by a variety of methods. The protocol has been tested in a large benchmark set of 80 unbound docking cases. The method was able to detect a near-native solution from 12,000 docking poses and place it within the 100 lowest-energy docking solutions in 56% of the cases, in a completely unrestricted manner and without any other additional information. More specifically, a near-native solution will lie within the top 20 solutions in 37% of the cases.

In recent years several new affinity prediction algorithms have been proposed. The pyDock [33] uses a simple approach to scoring of rigid-body docking poses, which is based on Coulombic electrostatics with distance dependent dielectric constant, and implicit desolvation energy with atomic solvation parameters previously adjusted for rigid-body protein-protein docking. This scoring function is not highly dependent on specific geometry of the docking poses and therefore can be used in rigid-body docking sets generated by a variety of methods. The protocol has been tested in a large benchmark set of 80 unbound docking cases. The method was able to detect a near-native solution from 12,000 docking poses and place it within the 100 lowest-energy docking solutions in 56% of the cases, in a completely unrestricted manner and without any other additional information. More specifically, a near-native solution will lie within the top 20 solutions in 37% of the cases.

The SIPPER [35] (scoring by intermolecular pairwise propensities of exposed residues) scoring uses statistical potentials extracted from intermolecular pairs of exposed residues in known complex structures, which were then used to score protein-protein docking poses. SIPPER combines the value of residue desolvation based on solvent-exposed area with the propensity-based contribution of intermolecular residue pairs. This scoring function has found a near-native orientation within the top 10 predictions in nearly one-third of the cases of a standard docking benchmark and proved to be also useful as a filtering step, drastically reducing the number of docking candidates needed by energy-based methods like pyDock

Moal et al have proposed a QSAR-like approach MARS (Multivariate Adaptive Regression Splines) [36] based on a large set of molecular descriptors using commonly available tools, introducing the use of energetic factors associated with conformational changes and disorder to order transitions, as well as features calculated on structural ensembles. The descriptors were used to train and test a binding free energy model using a consensus of four machine learning algorithms, whose performance constitutes a significant improvement over the other state of the art empirical free energy functions tested.

ZAPP (Zlab Affinity for Protein–Protein interaction) predicts protein–protein binding free energies using a linear combination of nine energy terms and a constant [37]. Only one term uses the unbound structures in addition to the complex structures, while the other eight terms only require the complex structure.

SPA-PP (specificity and affinity of the protein–protein interactions) has been developed by incorporating both the specificity and affinity into the optimization strategy [38]. The authors of the method report testing results and comparisons with other scoring functions showing fine performance of the method on both predictions of binding poses and affinity

Zhou et al have used a biomacromolecular QSAR (Bio QSAR) scheme with 101 structural descriptors categorized into five groups to extract abundant information from a protein– protein interface: (1) constitutional descriptors (such as the numbers of different amino acids, groups and atoms present at the interface, the ratio of the residues at the interface to all residues in the complex, the quantity of hotspot residues, etc.) (2) Contacting descriptors (such as atomic contact vectors, residue pair numbers, residue interaction indices, empirical contact potentials, etc.) (3) Geometrical descriptors (such as the accessible surface area, interfacial connectivity index, interface volume, etc.) (4) physicochemical descriptors (such as electrostatic potential, interfacial polarity, molar refractivity, hydrophilicity/hydrophilicity, etc.) and (5) non bonded descriptors(such as the number of hydrogen bonds, hydrophobic forces, salt bridges, water-mediated hydrogen bonds, etc.) [39]. Three sophisticated regression methods, i.e. partial least squares (PLS), support-vector machine (SVM) and Gaussian process (GP), were employed to establish the linear and nonlinear correlations between the interface descriptors and binding affinity of the 144 studied protein complexes from the structure affinity benchmark [15]. The modeling performance and predictive power of Bio QSAR were reported to be comparable to or even better than that of traditional knowledge-based strategies, mechanismtype methods and empirical scoring algorithms, while Bio QSAR possesses certain additional features compared to the traditional methods, such as adaptability, interpretability, deep-validation and high-efficiency

The minimal affinity model of Janin takes into account only two structural features of the complex, the size of its interface, and the amplitude of the conformation change between the free and bound subunits [40].

Comparison of the Pearson coefficient R among several methods, where the same data set [15] was used is given in (Table 1).

Table 1: Comparison of the Pearson coefficient R among various methods, where the same data set [15] is used.

Method R Feature Ref.
ZAPP 0.63 Regression with nine 
terms
[37]
MARS 0.52 Machine learning [36]
BIOQSAR 0.82 Machine learning [39]
SPA-PP 0.39 Statistical potential [38]
ROSETTADOCK 0.42 Regression with nine 
terms
[53]
DFIRE 0.35 Statistical potential [52]
Minimalistic
Solvation-based 
Predictor
0.48 Solvation [43]
Minimal Model 
of Janin
0.63 for small 
I_rmsd
BSA and I_rmsd [40]
BSA stands for ‘Buried Surface Area’
I_rmsd stands for Interface Root- Mean- Square - Deviation

Luo et al have carried out a systematic physico-chemical and conformational studies with interfaces involved in different PPIs, including crystal packing, weak transient heterodimers, weak transient homodimers, strong transient heterodimers and homodimers. The comparative analysis has shown that the interfaces tend to be larger, less planar, and more tightly packed with the increase of the interaction strength. Meanwhile the strong interactions undergo greater conformational changes than the weak ones involving main chains as well as side chains. Finally, using 18 features derived from such an analysis, a support vector regression model was developed to predict the binding affinity [41].

SolveBind is a binding affinity prediction method based on the global surface model of Kastritis et al. [42], using information on the number of atoms in the interface (NAtomsINT) and the percentages of charged and polar residues in the non-interacting surface (%AAcharNIS and %AApolNIS). Properties of the noninteracting surface were found to correlate with affinity [42] and may regulate solvation and electrostatic contributions to binding affinity [42].

hakhnovich et al have proposed a minimalistic solvationbased model for predicting protein binding energy using the solvation factor described by a simple linear combination of buried surface areas according to amino-acid types [43]. The authors report that their minimalistic model demonstrates a predictive power comparable to more complex methods, making the proposed approach the basis for high throughput applications.

Moal and Fernandez-Recio [44] have used atomic and residue contact potentials derived directly from experimental binding free energy changes following mutation. The first set of potentials is obtained by unweighted least-squares fitting and bootstrap aggregating. The second set is calculated using a weighting scheme optimized against absolute binding affinity data, so as to account for the overrepresentation of certain complexes, residues, and families of interactions. More recently FernandezRecio and coworkers proposed a simplified model (‘surface energy model’) also parameterized using empirical changes in free energy upon mutations [45]. The model correlates well with empirical binding free energies of a functionally diverse set of rigid-body interactions (r = 0.66). When used to rerank docking poses, it could place near-native solutions in the top 10 for 37% of the complexes evaluated, and 82% in the top 100. The method shows that hydrophobic burial is the driving force for protein association, accounting for 50-95% of the cohesive energy.

FireDock [46] is an efficient method for the refinement and rescoring of rigid-body docking solutions. The refinement process consists of two main steps: (1) rearrangement of the interface side-chains and (2) adjustment of the relative orientation of the molecules. The method accounts for the observation that most interface residues that are important in recognition and binding do not change their conformation significantly upon complex formation. While allowing full side-chain flexibility is a common procedure in many refinement methods, often causing excessive conformational changes and distorting preformed structural patterns, which have been shown to be important for binding recognition, FireDock restricts side-chain movements, and thus manage to reduce the false-positive rate noticeably. In the later stages of FireDock procedure (orientation adjustments and scoring) it smooths the atomic radii. This allows for the minor backbone and side-chain movements and increases the sensitivity of the algorithm. FireDock succeeded in ranking a near-native structure within the top 15 predictions for 83% of the 30 enzyme-inhibitor test cases, and for 78% of the 18 semi unbound antibody-antigen complexes.

Very recently Vangone and Bonvin have demonstrated that the network of inter-residue contacts (IRCs) between two interacting proteins is a good descriptor for the binding affinity [47]. Using the structure-based binding affinity benchmark of Kastritis et al. [15] and removing the cases with ambiguity in the exact Kd values and all complexes with missing or unresolved residues (>2) at the interface it has been observed a good correlation between IRCs and experimentally determined binding affinity data (Kd or ΔG) for protein complexes (bound forms). The most relevant contributions to BA are the number of IRCs made by charged and polar residues (IRCs_charged/charged and IRCs_ polar/polar), while the apolar residues are only counted when interacting with charged and polar ones (IRCs_charged/apolar and IRCs_polar/apolar). A model combining IRCs contribution and that of non-interacting surfaces (IRCs/NIS-based model) has been reported to possess the best performance of any model developed so far, with R=−0.73 and RMSE=1.89 kcal mol-1 [47]. The correlation between the number of interfacial contacts and binding affinity is consistent both with the previously reported evidence that interfaces with a significant interaction strength are large and tightly packed [48] and with the simple BSA models introduced by Chothia and Janin [49] and Horton and Lewis [24]. Although BSA and the number of contacts at the interface are of course somewhat related, Vangone and Bonvin have shown that the number of interface contacts shows much better correlations with binding strength than the BSA. The comparison of the performance of 16 different predictors given in this work is presented in (Figure 3)

Figure 3: Comparison of the performance of different predictor models reported by Vangone and Bonvin [47], Moal et al [36]., and the CCHarPPI web server [50]. The performance is expressed as Pearson’s Correlation coefficient between experimental and predicted binding affinities. Predictions were made on the common set of 79 complexes. The considered models include: (1) the IRCs/NIS model of Vangone and Bonvin [47]; (2) the ‘global surface model’ of Kastritis et al [42]. (3) The BSA-based model of Horton and Lewis [24]; (4-6) the top three best performing models reported by Moal et al [36]. (their consensus model, DFIRE [51] and PMF [29]); the models based on composite scoring functions reported by the CCHarPPI web server [50], such as (7) ZRANK [34]; (8) ZRANK2 [34]; (9) RosettaDock [52]; (10) PyDock [33]; (11) SIPPER [35]; (12) PISA [53]; (13-15) FireDock, FireDock _AB and FireDock_EI, respectively [46]; (16) PIE [54]

Recently Erijman et, basing on the same benchmark of Kastritis et al [15], have carried out the exploration how different biophysical features derived from a protein-protein structure correlate with protein-protein binding affinity [55]. The features that showed the highest correlation with binding affinity were the total number of H bonds, geometric complementarity measured by the van der Waals energy, and side chain conformational changes for high-resolution X-ray structures [55].

Recently Erijman et, basing on the same benchmark of Kastritis et al [15],

have carried out the exploration how different biophysical features derived from a protein-protein structure correlate with protein-protein binding affinity [55]. The features that showed the highest correlation with binding affinity were the total number of H bonds, geometric complementarity measured by the van der Waals energy, and side chain conformational changes for high-resolution X-ray structures [55].

in six dimensions, utilizing a fast Fourier transform (FFT) for efficiency and softness for small overlaps [32]. Subsequently, one or more refinement and scoring steps of a set of preselected rigid docking solutions are added to achieve closer agreement with the native geometry and to recognize near-native docking solutions preferentially either as the best or among the best scoring complexes. The accuracy and speed of flexible refinement and rescoring of preselected docked protein structures are important for the success of the multistage docking protocol. A rescoring of docking decoys using more elaborate energy functions and filters has been shown to improve the selection of models [32]. Several strategies for scoring and structural refinement of docked complexes have been developed [56- 67]. In earlier works structural flexibility was treated by soft potentials [56], multicopy representations of side chains [57], or flexible loops [58,59], by using conjugate gradient minimization [60] followed by clustering to identify near-native structures. In an approach to refine a large number of complexes obtained by the FFT-based ZDOCK program [68] a combination of several descriptors showed significant improvement [65]. Often a single descriptor (e.g. surface complementarity) or a single binding energy component (e.g. vdWaals or electrostatic energy) is insufficient to distinguish near-native and non-native complexes. A combination of different surface and interface descriptors was found to be able to enrich the number of near-native solutions in the pool of best scoring docking solutions [36,47,55,65]. The increase in the number of experimentally solved protein–protein complex structures in recent years allows extraction of more data on the statistical residue and atom contact preferences at protein–protein interfaces. New effective knowledge-based scoring functions have been developed that are based on contact preferences of amino acids at known interfaces compared to interfaces of non-native decoy complexes [47, 64,65]. An iterative approach has been designed for the derivation of knowledge-based scoring functions that optimally distinguish between native binding modes and possible decoy poses [66]. The resulting distance dependent pair-potentials significantly improved scoring of near-native complexes for bound and unbound docking.

One more possibility to directly use computationally rapid rigid docking algorithms is to indirectly account for receptor flexibility by representing the receptor target as an ensemble of structures. The structural ensemble can, for example, be a set of structures obtained experimentally (e.g. from nuclear magnetic resonance (NMR) spectroscopy) or can be formed by several structural models of a protein. Docking to an ensemble increases the computational demand and due to the large number of protein conformations may also increase the number of false positive docking solutions. A variety of ensemble-based approaches have been developed in recent years in the field of small-molecule docking (reviewed in [70]). Cross docking to ensembles from MD simulations have also been used to implicitly account for conformational flexibility in protein docking [71]. Mustard and Ritchie generated protein structures deformed along directions compatible with a set of distance constraints reflecting largescale sterically allowed deformations [72]. Subsequently, the structures were used in rigid body docking searches to identify putative complex structures. Conformer selection and induced-fit mechanism of protein–protein association have been compared by ensemble docking methods using the RosettaDock approach [73] that allows side chain and limited backbone relaxation during docking refinement [74]. The method was able to successfully select binding-competent conformers out of the ensemble based on favorable interaction energy with the binding partner [73].

In a recent paper of Kozakov et al it has been shown using Principal Component Analysis that the energy landscape of 42 interacting proteins, at least within the 10 Å IRMSD neighborhood of the native state, always includes a permissive subspace (‘tunnel’) along which the conformation of the complex can substantially change without crossing significant energy barriers and that the energy landscape is smooth funnel in a two dimensional permissive subspace [21]. In all cases this subspace captured at least 75% of the total motion as the two molecules approach the native state. For each of the 42 complexes a high energy subspace has been detected, which reduces the dimensionality of the space available to encounter complexes along the association pathways. This suggests that methods such as molecular dynamics (MD) or Monte Carlo (MC) simulations that start from productive encounter complexes should fairly quickly converge to native structures (or near-native ones because of some inaccuracy of scoring functions) making these strategies as promising tools of the efficient refinement. The Monte Carlo approach is especially attractive as being much less computationally expensive as compared with MD. Instead of screening all possible conformations with a Fouriertransformable energy function, random starting decoys are refined by applying random translational and rotational moves and deciding on their acceptance using the Metropolis criterion [75,76]. While FFT methods have the advantage of great speed and complete sampling of the conformational space, Monte Carlo methods are able to generate more physical decoy distributions, can involve arbitrary energy functions, and might allow for structural flexibility.

Several docking protocols including rigid-body moves and Monte Carlo refinement have been proposed [75-80]. In ICM-DISCO [77], the simulations start with a rigid-body docking searched by a pseudo-Brownian Monte Carlo simulation. The second step is the Monte Carlo refinement of ligand side-chain torsion angles. In RosettaDock [75], the simulations start from random ligandreceptor orientations, followed by a low-resolution rigid-body docking. In a second step, RosettaDock optimizes side chains and rigid-body orientations simultaneously, based on the simulated annealing Monte Carlo simulation. In both approaches, the rigidbody docking is performed by Monte Carlo searches. In the first stage of RosettaDock the proteins are represented as backbones plus side-chain centroids, and the search is guided by a residuescale interaction potential. Benefiting from the simplified protein representation, the method was later extended to account for loop flexibility [80]. Lorenzen and Zhang [78] refined initial docking estimates of protein complex structures, generated by an FFT-based method, using a Monte Carlo approach including rigid body moves and side-chain optimization. During the simulation they gradually shifted from a smoothed van der Waals potential, which prevented trapping in local energy minima, to the standard Lennard–Jones potential. Following the simulation, the conformations were clustered to obtain the final predictions. The refinement procedure was able to generate near-native structures (interface rmsd<2.5 Å) as first model in 14 of 59 cases in the benchmark set. Pierce and Weng [79] refined global docking predictions from ZDOCK using RosettaDock, and selected the best models based on their ZRANK score. Refining docking benchmark predictions from ZDOCK led to improved structures of top ranked hits in 20 of 27 cases, and an increase from 23 to 27 cases with hits in the top 20 predictions. In addition, the ZRANK energy function was optimized using the refined models. With the new energy function, the numbers of cases with hits ranked at number one increased from 12 to 19 and from 7 to 15 for two different ZDOCK versions. These results show that combinations of independently developed docking protocols (ZDOCK/ZRANK and RosettaDock) can substantially improve protein-docking results. Several studies show convincingly that the accuracy and reliability of docking results can be often significantly improved by combining different classes of methods. For example, Kozakov et al have studied the 30 clusters generated by FFT-based docking by starting RosettaDock runs from random points around the cluster centers, and observing whether a certain fraction of trajectories converge to a small region within the cluster [81]. A cluster was considered stable if such a strong attractor existed and contained a low-energy structure. It was shown that all clusters close to the native structure are stable, and that restricting considerations to stable clusters eliminate around half of the false positives. More generally, improving model accuracy using Monte Carlo methods enables the use of potentials that are more accurate, but also more sensitive to structural errors.

The analysis of docking predictions by the community-wide experiment on the Critical Assessment of Predicted Interactions (CAPRI) shows that Monte Carlo based methods can yield highly accurate models, but the search is restricted to a neighborhood of the starting structures [31]. In principle, a fairly accurate structure of the complex contains information about binding energy/affinity and a number of structural features have been shown to correlate with affinity [47]. We asked whether it is possible to predict affinity by rigid-body approaches followed by one or several RosettaDock refinements if productive encounter complex structure is known. The Version 1 structure-affinity Protein Docking Benchmark of four laboratories [15], which is a nonredundant set of 144 protein–protein complexes that have high-resolution structures available for both the complexes and their unbound components, and for which dissociation constants have been measured by biophysical methods, and updates by Vreven et al [16] was used to assess the performance of RosettaDock refinements. The set is diverse in terms of the biological functions it represents, with complexes that involve G-proteins and receptor extracellular domains, as well as antigen/ antibody, enzyme/inhibitor, and enzyme/substrate complexes. It is also diverse in terms of the partners’ affinity for each other, with Kd ranging between 10-5 and 10-14 M. Nine pairs of entries represent closely related complexes that have a similar structure, but a very different affinity, each pair comprising a cognate and a noncognate assembly. For sake of simplicity we considered only the cases either of small conformational changes (I_rmsd 2.5 Å).

With the aim of assessment the performance of RosettaDock refinements the unbound structures were superimposed over the bound complex and the resulting superposed structure was used as the starting structure for local docking. We first prepared each docking partner in isolation, optimizing their side-chain conformations prior to docking using docking_local_refine option. The same procedure was applied to the structures of complexes. –11.782 For the assessment of RosettaDock performance we used the convergence of the starting structures to the structure of the bound complex. To do that we calculated the number of refinement runs resulting in convergence (NRDcnv). We judged the RosettaDock refinements as convergent if I_rms deviation from the bound state was <0.5Å and the distinct funnel took place. If such convergence was not reached during 10 refinement runs and the deviation from the bound state remained stable during last five steps the refinement process was judged as nonconvergent. Our results also show a good correlation between binding affinity and interface energy score values (I_sc), which are the total score differences between the components together and the components pulled far apart from each other without their relaxation (repacking). While in our recent paper [82] we used the RosettaDock binding score (RDBS), which is the total score difference between the components together and the components pulled far apart after their relaxation (repacking), to rank different docking solutions by affinities, affinity prediction based on I_sc values seems to be more convenient because I_sc calculations are free from the problems associated with global minimization of pulled apart individual components; these problems are especially grave when membrane proteins or protein complexes are docked. The results of analysis of RosettaDock refinements performance are given in (Table 2)

Table 2: Performance of Rosetta Dock refinements on 25 Version 1 Structure-affinity Benchmark [15] complexes.

ref(a) Complex PDB Unbound 
component_1 PDB
Unbound 
component_1 PDB
Kd
(M)
I_rmsd
[7]
I_sc bound
and funnel (f) 
presence
nRDcnv(b) I_sc
unbound
superimposed on 
bound
after Ncnvb 
refinement runs
and funnel (f) 
presence
7 1AVX_A:B 1QQU_A 1BA7_B 4.8E-10 0.47 -10.3 (f) 3 -8.2 (f)
8 1AVZ_B:C 1AVV_A 1FYN_A 1.6E-05 0.73 -7.0 (f) 3 -6.2(f)
9 1AY7_A:B 1RGH_B 1A19_B 2.0E-10 0.54 -7.4 (mf) 8 -6.5(mf)
12 1BRS_A:D 1A2P_A 1A19_B 2.0E-13 0.42 -12.4(f) 4 -8.9(f)
13 1BUH_A:B 1HCL_A 1DKS_A 7.7E-08 0.75 -6.9(mf) 6 -7.5(f)
15 1BVN_P:T 1PIG_A 1HOE_A 9.2E-12 0.87 -12.5(f) 2 -8.0(f)
23 1E96_A:B 1MH1_A 1HH8_A 2.7E-06 0.71 -7.5(mf) 7 -6.6(mf)
24 1EAW_A:B 1EAX_A 9PTI_A 5.0E-11 0.54 -6.4(mf) 2 -6.6(mf)
26 1EFN_B:A 1AVV_A 1FYN_A 3.8E-08 0.90 -7.8(mf) 5 -5.8(f)
28 1EWY_A:C 1GJR_A 1CZP_A 3.6E-06 0.80 -7.5(mf) 5 -5.2(mf)
30 1F34_A:B 4PEP_A 1F32_A 1.0E-10 0.93 -8.6(f) 3 -6.3(f)
31 1F6M_A:C 1CL0_A 2TIR_A 2.7E-06 4.90 -6.1(bf) Ncnv10  
35 1FQJ_A:B 1TND_A 1FQI_A 6.7E-08 0.91 -10.2(f) 4 -6.7(f)
37 1GCQ_B:C 1GRI_B 1GCP_B 1.7E-05 0.92 -8.3(f) 7 -5.8(f)
40 1GPW_A:B 1THF_D 1K9V_F 5.0E-9 0.65 -10.0(f) 4 -9.0(f)
46 1HE8_B:A 821P_A 1E8Z_A 3.2E-06 0.92 -5.8(bf) 3 -6.0(f)
48 1I2M_A:B 1QG4_A 1A12_A 2.5E-12 2.12 -13.4(f) Ncnv10  
51 1IBR_A:B 1QG4_A 1F59_A 1.0E-09 2.54 -16.6(f) 7 Ncnv10
54 1J2J_A:B 1O3Y_A 1OXZ_A 1.1.E-6 0.63 -6.2(f) 4 -9.8(f)
58 1JTG_B:A 3GMU_B 1ZG4_A 4.0E-10 0.49 -12.8(f) 3 -6.2(bf)
64 1KTZ_A:B 1TGK_A 1M9Z_A 2.0E-07 0.39 -7.3(f) 3 -7.1(f)
66 1KXQ_H:A 1KXQ_H 1PPI_A 3.5E-09 0.72 -9.3(f) 4  
68 1M10_A:B 1AUQ_A 1M0Z_B 5.8E-09 2.10 -8.0(f) 3  
69 1MAH_A:F 1J06_B 1FSC_A 2.5E-11 0.61 -8.3(f) 3  
75 1NVU_R:S 1LF0_A 2II0_B 1.9E-06 3.09 -16.4 (f) Ncnv10  
80 1PPE_E:I 2PTN_A 1LU0_A 3.0E-12 0.34 -10.8(f) 2 -6.5(f)
81 1PXV_A:B 1BQU_A 1EMR_A 8.0E-06 0.34 -7.2(f) 3 -6.9(f)
82 1PXV_A:C 1X9Y_A 1NYC_A 3.1E-10 2.63 -14.8 (mf(d)) 7 -8.4 (f)
83 1QA9_A:B 1X9Y_A 1CCZ_A 9.0E-06 0.73 -4.6(bf) 3 -4.9(mf)
84 1R0R_E:I 1SCN_E 2GKR_I 2.9E-11 0.45 -11.1(mf) 7 -6.0(f)
89 1T6B_X:Y 1ACC_A 1SHU_X 1.7E-10 0.62 -8.9(f) 7 -7.3(f)
91 1UUG_A:B 3EUG_A 2UGI_B <1E-13 0.77 -10.9(f) 7 -6.9(f)
99 1YVB_A:I 2GHU_A 1CEW_I 6.5E-09 0.51 -9.8(f 7 -8.1(f)
99 2OUL_A:B 3BPF_A 2NNR_A 1.7E-09 0.53 -8.7(f) 4 -7.7(f)
100 1Z0K_A:B 2BME_A 1YZM_A 7.7E-06 0.53 -9.6(mf) 7 -7.3(f)
101 1ZHI_A:B 1M4Z_A 1Z1A_A 2.0E-07 0.68 -7.8(f) 5 -7.2(f)
102 1ZLI_A:B 1KWM_A 2JTO_A 1.3E-09 2.53 -13.8 Ncnv10  
103 1ZM4_A:B 1N0V_C 1XK9_A 1.3E-06 2.94 -6.7(bf) 4 -7.2(f)
104 2A9K_A:B 1U8Z_A 2C8B_X 6.0E-08 0.85 -11.6(f) 7 -7.2
106 2AJF_A:E 1R42_A 2GHV_E 1.6E-08 0.65 -5.8(bf) 9 -6.7(f)
108 2B42_A:B 2DCY_A 1T6E_X 1.1E-09 0.72 -14.5(f) 7 -9.1(f)
110 2BTF_A:P 1IJJ_B 1PNE_A 2.3E-06 0.75 -6.2(f) 5 -5.3(f)
111 2COL_A:B 1FCH_A 1C44_A 1.1E-07 2.62 -6.7 5 -7.0
113 2GOX_A:B 1C3D_A 2GOM_A 1.4E-09 0.60 -8.2(f) 4 -7.9(f)
118 2I9B_E:A 1YWH_A 2I9A_A 3.3E-10 3.79 -12.9(bf) Ncnv10 -5.7(bf)
123 2O3B_A:B 1ZM8_A 1J57_A 3.2E-12 3.13 -7.5(f) Ncnv10 -5.6(bf)
124 2OOB_A:B 2OOA_A 1VJ1_A 6.0E-05 0.85 -6.7(f) 8 -5.2(f)
127 2PCB_A:B 1CCP_A 1HRC_A 1.0E-05 0.45 -4.9(bf) 5 -5.1(f)
128 2PCC_A:B 1CCP_A 1YCC_A 1.6E-06 0.39 -4.9(bf) 3 -4.9(f)
129 2PTC_E:I 2PTN_A 9PTI_A 6.0E-14 0.28 -9.7 (f) Ncnv10 -5.5 (f)
130 2SIC_E:I 1SUP_A 3SSI_A 1.7E-11 0.36 -12.9 (f) 2 -9.9(f)
131 2SNI_E:I 1UBN_A 2CI2_I 2.0E-12 0.35 -9.4 (f) 4 8.3 (f)
132 2TGP_Z:I 1TGB_A 9PTI_A 2.4E-06 0.57 -8.9 (f) 5 -5.5 (f)
133 2UUY_A:B 2PTN_A 2UUX_A 5.6E-09 0.44 -8.9 (f) 5 -7.0 (f)
134 2VDB_A:B 3CX9_A 2J5Y_A 1.5E-10 0.47 -10.4(f) 3 -7.4 (f)
139 3SGB_E:I 2QA9_E 2OVO_A 1.8E-11 0.36 -10.4(f) 5 -7.7
(a) Stands for number in structure-affinity benchmark Version 1.0 [15] (b) Ncnvb stands for number of Rosetta Dock refinements runs resulting in I_rms- deviation from bound conformation < 0.5 Å (c) (f) denotes the presence of funnel (d) (mf) stands for moderate funnel
(e) (bf) stands for bad funnel
(f) Complexes with I_rmsd >2.0 are presented in bold

 It is seen from the Table that the convergence was not achieved for five complexes that display large conformation changes, and also for one complex with small conformational changes.

In the initial global search some unspecific sterical overlap between docking partners is typically tolerated to implicitly account for conformational adjustment of binding partners. For the success of a multistep docking strategy it is necessary that the set of selected rigidly docked structures would contain solutions sufficiently close to the native structure in order to allow for further improvement during the refinement process. Overall, good discrimination of near-native docking poses from the very early stages of rigid-body protein docking is essential step before applying more costly interface refinement to the correct docking solutions. Hence, the initial scoring has to be powerful enough to recognize and preselect a binding mode sufficiently close to the native position and it has to simultaneously tolerate possible inaccuracies, such as atomic overlaps at the interface. An ideal scoring function should recognize favorable native contacts as found in the bound complex and discriminate those from nonnative contacts with lower scores. However, even a best possible arrangement of rigid unbound partner structures (superimposed on the corresponding bound structures in the native complex) can result in small number of native contacts in case of significant backbone and side changes at the interface [5]. This indicates that in case of significant backbone and side changes at the interface the contacting residue or atom pairs of rigidly docked proteins may show little correspondence to the native interface structure. Thus, near-native poses might be overlooked during the rigid docking search even with a powerful scoring function. Recently, the limitations of rigid docking strategies combined with a rescoring step have been systematically investigated by Pons et al., [83]. The authors applied a combination of rigid FFT-correlation based docking and rescoring using the pyDock approach. The protocol showed very good performance for most proteins that undergo minor conformational changes upon complex formation (<1A? Rmsd between unbound and bound structures) but unsatisfactory results for cases in which binding induced significant conformational changes or applications that involved homology modeled proteins. These data and considerations of precedent section suggest that if conformational changes upon binding are moderate and if the productive encounter complex is reliably determined by rigid-body global docking it is possible to predict affinity with a fairly good accuracy. One may expect that the use of several successive refinements taking into account flexibility of side chains may be a promising strategy to predict structure of protein-protein complex and assess affinity. This raises at least two questions: (i) how to determine that some preselected pose determined by rigid docking represents a nearnative productive encounter complex; and (ii) how to assess the conformational flexibility in the absence of experimental data on the structures of both unbound proteins and their complexes. As to the latter question it is appropriate either to use direct experimental data on protein flexibility of proteins, or use implicit reasons based on protein selectivity [84], or alternatively make theoretical estimates [5]. The protein selectivity is a promising feature because it has been determined for a broad number of proteins and strongly correlates with the protein rigidity upon protein-protein association

The most difficult task is the establishment of the structure and affinity of a protein-protein complex when the transfer from encounter complex to final one is coupled with significant changes of backbone structure. Several current programs possess the capacity to take into account backbone flexibility. For example, in Fiber Dock, backbone mobility is modeled by an unlimited number of low and frequency normal modes [85], while in ATTRACT, it is modeled by the first few lowest frequency modes [86].

Classification of transient protein-protein interactions by RosettaDock I_sc scores

Nearly a decade ago, Nooren and Thornton have proposed a two-class classification of transient protein-protein interactions based on the lifetime (or stability) of the protein-protein complex [48]. In this classification weak transient interactions that feature a dynamic oligomeric equilibrium in solution, where the interaction is broken and formed continuously, and strong transient interactions that require a molecular trigger to shift the oligomeric equilibrium have been distinguished. More recently, another classification, based on the binding affinities, has been reported [9]. In accordance with this classification the binding affinities have been categorized into five classes: Very High (Kd 10-5M). Because in accordance with the classification of Nooren and Thornton and experimental data available (reviewed in [9]) Very High and Very Low affinities correspond to permanent binding and weak transient interactions, respectively, whereas the interactions of the remaining three classes can be assigned to Strong Transient ones as requiring a trigger for protein-protein complex dissociation, we have recently categorized all interactions into five classes (Permanent (Kd 10-5M)) [82]. Although a good correlation between binding affinity and scores is absent for most computational docking algorithms, we used an approximate correlation between RosettaDock binding score (RDBS), which was calculated by subtracting the corresponding score of complexes from those of the individual chains, and binding affinity [9] to assign transient interactions to the above four transient interaction classes. RDBS values were used for the qualitative ranking between different complexes by affinities also because we have revealed that for a number of protein-protein complexes from the ProteinProtein Docking Benchmark 3.0 [88], affinity data of which have been measured with isothermal titration calorimetry , the interactions with (RDBS > -5), (-5> RDBS >-30), (-30> RDBS >-80), (-80> RDBS >-180) might be assigned, to a fairly good approximation, to Weak transient (Kd >10-5M), Low – Strong transient (10-6M< Kd 10- 5 M)). The data given in (Table 2) support a quite satisfactory and fairly universal interrelation between RosettaDock interface energy score (I_sc) and the above five classes of transient interaction, whereby for a significant number of protein-protein complexes the interactions with (I_sc > -4), (-4> I_sc >-5.5), (-6.5> I_sc >-8.5), (-8.5> I_sc >-12), (-1610-5M), Low – Strong transient (10-6M< Kd <10-5M), Medium - Strong transient (10-8M< Kd <10-6M), High - Strong transient (10-10M< Kd <10-8M) and Very High-Strong transient interactions (10-14M

 

 

DISCUSSION & CONCLUSION

Protein– protein docking aims to predict the threedimensional structure of a complex from the knowledge of the structure of the individual proteins in aqueous solution. Many docking methods are now able to predict the three-dimensional structure of binary assemblies if the protein partners do not display important conformational changes between their bound and unbound forms. However, large-scale motion upon binding is still a major and unresolved problem in the absence of experimental constraints. This motion generally involves the displacement or the internal rearrangement of loops or domains and can also be characterized by the simultaneous movement of several flexible parts. However, even though the motion upon binding is moderate protein docking often requires the effective usage of several steps to produce accurate predictions. The refinements using approaches sampling conformation space more thoroughly as compared with rigid docking methods, even though in ‘semi flexible manner’, such as this does Monte Carlo minimization method being applied to side chains in RosettaDock protocols [75], present efficient approach to produce accurate predictions of 3D-structure of the protein-protein complexes and affinity. However when the conformational changes upon protein binding are large there are no universally reliable tools to assure accurate predictions. In order to avoid futile attempts to dock flexible proteins using rigid methods, even though with semi flexible refinements, it is necessary to develop and use methods that can predict protein flexibility [89–91] and such approaches allowing the assessment of the flexibility should be used prior to docking

ACKNOWLEDGEMENTS

This work was supported by the Grant M15-055 of Belarusian Basic Research Foundation.

REFERENCES

1. Smith GR, Sternberg MJ. Prediction of protein-protein interactions by docking methods. Curr Opin Struct Biol. 2002; 12: 28-35.

2. Gray JJ. High-resolution protein-protein docking. Curr Opin Struct Biol. 2006; 16: 183-193.

3. Ritchie DW. Recent progress and future directions in protein-protein docking. Curr Protein Pept Sci. 2008; 9: 1-15.

4. Andrusier N, Mashiach E, Nussinov R, Wolfson HJ. Principles of flexible protein-protein docking. Proteins. 2008; 73: 271-289.

5. Zacharias M. Accounting for conformational changes during proteinprotein docking. Curr Opin Struct Biol. 2010; 20: 180-186.

6. Kastritis PL, Bonvin AM. Molecular origins of binding affinity: seeking the Archimedean point. Curr Opin Struct Biol. 2013; 23: 868-877.

7. Tuffery P, Derreumaux P. Flexibility and binding affinity in proteinligand, protein-protein and multi-component protein interactions: limitations of current computational approaches. J R Soc Interface 2012; 9: 20-33.

8. Vakser IA. Protein-protein docking: from interaction to interactome. Biophys J. 2014; 107: 1785-1793.

9. Kastritis PL, Bonvin AM. Are scoring functions in protein–protein docking ready to predict interactomes? Clues from a novel binding affinity benchmark. J Proteome Res. 2010; 9: 2216-2225.

10. Schreiber G, Haran G, Zhou HX. Fundamental aspects of proteinprotein association kinetics. Chem Rev. 2009; 109: 839-860.

11. Ubbink M. The courtship of proteins: understanding the encounter complex. FEBS Lett. 2009; 583: 1060-1066.

12. Tang C, Iwahara J, Clore GM. Visualization of transient encounter complexes in protein-protein association. Nature. 2006; 444: 383- 386.

13. Fawzi NL, Doucleff M, Suh JY, Clore GM. Mechanistic details of a protein-protein association pathway revealed by paramagnetic relaxation enhancement titration measurements. Proc Natl Acad Sci. 2010; 107: 1379–1384.

14. Clore GM. Exploring sparsely populated states of macromolecules by diamagnetic and paramagnetic NMR relaxation. Protein Sci. 2011; 20: 229-246.

15. Kastritis PL, Moal IH, Hwang H, Weng Z, Bates PA, Bonvin AM, et al. A structure-based benchmark for protein-protein binding affinity. Protein Sci. 2011; 20: 482-491.

16. Vreven T, Moal IH, Vangone A, Pierce BG, Kastritis PL, Torchala M, et al. Updates to the Integrated Protein-Protein Interaction Benchmarks: Docking Benchmark Version 5 and Affinity Benchmark Version 2. J Mol Biol. 2015; 427: 3031-30341.

17. Berg OG, von Hippel PH. Diffusion-controlled macromolecular interactions. Annu Rev Biophys Biophys Chem. 1985; 14: 131-160.

18. von Hippel PH, Berg OG. Facilitated target location in biological systems. J Biol Chem. 1989; 264: 675-678.

19. Zhou HX, Bates PA. Modeling protein association mechanisms and kinetics. Curr Opin Struct Biol. 2013; 23: 887-893.

20. Schilder J, Ubbink M. Formation of transient protein complexes. Curr Opin Struct Biol. 2013; 23: 911-918.

21. Kozakov D, Li K, Hall DR, Beglov D, Zheng J, Vakili P, et al. Encounter complexes and dimensionality reduction in protein-protein association. Elife. 2014; 3.

22. Kollman P. Free energy calculations: Applications to chemical and biochemical phenomena. Chem Rev 1993; 93: 2395–2417.

23. .Gilson MK, Zhou HX. Calculation of protein-ligand binding affinities. Annu Rev Biophys Biomol Struct. 2007; 36: 21-42.

24. Horton N, Lewis M. Calculation of the free energy of association for protein complexes. Protein Sci. 1992; 1: 169-181.

25. Jiang L, Gao Y, Mao F, Liu Z, Lai L. Potential of mean force for proteinprotein interaction studies. Proteins. 2002; 46: 190-196.

26. Ma XH, Wang CX, Li CH, Chen WZ. A fast empirical approach to binding free energy calculations based on protein interface information. Protein Eng. 2002; 15: 677-681.

27. Audie J, Scarlata S. A novel empirical free energy function that explains and predicts protein-protein binding affinities. Biophys Chem. 2007; 129: 198-211.

28. Li YC, Zeng ZH. Empirical parameters for estimating protein-protein binding energies: number of short- and long-distance atom-atom contacts. Protein Pept Lett. 2008; 15: 223–231.

29. Su Y, Zhou A, Xia X, Li W, Sun Z. Quantitative prediction of proteinprotein binding affinity with a potential of mean force considering volume correction. Protein Sci. 2009; 18: 2550-2558.

30. Bai H, Yang K, Yu D, Zhang C, Chen F, Lai L. Predicting kinetic constants of protein-protein interactions based on structural properties. Proteins. 2011; 79: 720-734.

31. Lensink MF, Wodak SJ. Docking, scoring, and affinity prediction in CAPRI. Proteins. 2013; 81: 2082-2095.

32. Vajda S, Kozakov D. Convergence and combination of methods in protein-protein docking. Curr Opin Struct Biol. 2009; 19: 164-170.

33. Cheng TM, Blundell TL, Fernandez-Recio J. pyDock: electrostatics and desolvation for effective scoring of rigid-body protein-protein docking. Proteins. 2007; 68: 503-515.

34. Pierce B, Weng Z. A combination of rescoring and refinement significantly improves protein docking performance. Proteins. 2008; 72: 270-279.

35. Pons C, Talavera D, de la Cruz X, Orozco M, Fernandez-Recio J. Scoring by intermolecular pairwise propensities of exposed residues (SIPPER): a new efficient potential for protein-protein docking. J Chem Inf Model. 2011; 51: 370-377. 

36. Moal IH, Agius R, Bates PA. Protein-protein binding affinity prediction on a diverse set of structures. Bioinformatics. 2011; 27: 3002-3009.

37. Vreven T, Hwang H, Pierce BG, Weng Z. Prediction of protein-protein binding free energies. Protein Sci. 2012; 21: 396-404.

38. Yan Z, Guo L, Hu L, Wang J. Specificity and affinity quantification of protein-protein interactions. Bioinformatics. 2013; 29: 1127-1133.

39. Zhou P, Wang C, Tian F, Ren Y, Yang C, Huang J. Bio macromolecular quantitative structure-activity relationship (BioQSAR): a proof-ofconcept study on the modeling, prediction and interpretation of protein-protein binding affinity. J Comput Aided Mol Des. 2013; 27: 67-78.

40. Janin J. A minimal model of protein-protein binding affinities. Protein Sci. 2014; 23: 1813-1817.

41. Luo J, Guo Y, Zhong Y, Ma D, Li W, Li M. A functional feature analysis on diverse protein-protein interactions: application for the prediction of binding affinity. J Comput Aided Mol Des. 2014; 28: 619-629.

42. Kastritis PL, Rodrigues JP, Folkers GE, Boelens R, Bonvin AM. Proteins feel more than they see: fine-tuning of binding affinity by properties of the non-interacting surface. J Mol Biol. 2014; 426: 2632-2652.

43. Choi JM, Serohijos AW, Murphy S, Lucarelli D, Lofranco LL, Feldman A, et al. Minimalistic predictor of protein binding energy: contribution of solvation factor to protein binding. Biophys J. 2015; 108: 795-798.

44. Moal IH, Fernandez-Recio J. Intermolecular Contact Potentials for Protein-Protein Interactions Extracted from Binding Free Energy Changes upon Mutation. J Chem Theory Comput. 2013; 9: 3715-3727.

45. Moal IH, Dapk?nas J, Fernández-Recio J. Inferring the microscopic surface energy of protein-protein interfaces from mutation data. Proteins. 2015; 83: 640-650.

46. Andrusier N, Nussinov R, Wolfson HJ. Fire Dock: fast interaction refinement in molecular docking. Proteins. 2007; 69: 139-159.

47. Vangone A, Bonvin AM. Contacts-based prediction of binding affinity in protein-protein complexes. Elife. 2015; 4.

48. Nooren IM, Thornton JM. Structural characterisation and functional significance of transient protein-protein interactions. J Mol Biol. 2003; 325: 991-1018.

49. Chothia C, Janin J. Principles of protein-protein recognition. Nature. 1975; 256: 705-708.

50. Moal IH, Jiménez-García B, Fernández-Recio J. CCharPPI web server: computational characterization of protein-protein interactions from structure. Bioinformatics. 2015; 31: 123-125.

51. Liu S, Zhang C, Zhou H, Zhou Y. A physical reference state unifies the structure-derived potential of mean force for protein folding and binding. Proteins. 2004; 56: 93-101.

52. Chaudhury S, Lyskov S, Gray JJ. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics. 2010; 26: 689-691.

53. Viswanath S, Ravikant DV, Elber R. Improving ranking of models for protein complexes with side chain modeling and atomic potentials. Proteins. 2013; 81: 592-606.

54. Ravikant DV, Elber R. PIE-efficient filters and coarse grained potentials for unbound protein-protein docking. Proteins. 2010; 78: 400-419.

55. Erijman A, Rosenthal E, Shifman JM. How structure defines affinity in protein-protein interactions. PLoS One. 2014; 9: 110085.

56. Vakser IA. Low-resolution docking: prediction of complexes for underdetermined structures. Biopolymers. 1996; 39: 455-464.

57. Lorber DM, Udo MK, Shoichet BK. Protein-protein docking with multiple residue conformations and residue substitutions. Protein Sci. 2002; 11: 1393-1408.

58. Zacharias M. Protein-protein docking with a reduced protein model accounting for side-chain flexibility. Protein Sci. 2003; 12: 1271-1282.

59. Bastard K, Prévost C, Zacharias M. Accounting for loop flexibility during protein-protein docking. Proteins. 2006; 62: 956-969.

60. Tovchigrechko A, Vakser IA. Development and testing of an automated approach to protein docking. Proteins. 2005; 60: 296-301.

61. Zhang Q, Sanner M, Olson AJ. Shape complementarity of proteinprotein complexes at multiple resolutions. Proteins. 2009; 75: 453- 467.

62. Audie J. Development and validation of an empirical free energy function for calculating protein-protein binding free energy surfaces. Biophys Chem. 2009; 139: 84-91.

63. Martin O, Schomburg D. Efficient comprehensive scoring of docked protein complexes using probabilistic support vector machines. Proteins. 2008; 70: 1367-1378.

64. Pierce B, Weng Z. A combination of rescoring and refinement significantly improves protein docking performance. Proteins. 2008; 72: 270-279.

65. Liang S, Meroueh SO, Wang G, Qiu C, Zhou Y. Consensus scoring for enriching near-native structures from protein-protein docking decoys. Proteins. 2009; 75: 397-403.

66. Huang SY, Zou X. An iterative knowledge-based scoring function for protein-protein recognition. Proteins. 2008; 72: 557-579.

67. Ravikant DV, Elber R. PIE-efficient filters and coarse grained potentials for unbound protein-protein docking. Proteins. 2010; 78: 400-419.

68. Chen R, Li L, Weng Z. ZDOCK: an initial-stage protein-docking algorithm. Proteins. 2003; 52: 80-87.

69. Kowalsman N, Eisenstein M. Combining interface core and whole interface descriptors in posts can processing of protein-protein docking models. Proteins. 2009; 77: 297-318.

70. Totrov M, Abagyan R. Flexible ligand docking to multiple receptor conformations: a practical alternative. Curr Opin Struct Biol. 2008; 18: 178-184.

71. Król M, Chaleil RA, Tournier AL, Bates PA. Implicit flexibility in protein docking: cross-docking and local refinement. Proteins. 2007; 69: 750- 757.

72. Mustard D, Ritchie DW. Docking essential dynamics eigen structures. Proteins. 2005; 60: 269-274.

73. Chaudhury S, Gray JJ. Conformer selection and induced fit in flexible backbone protein-protein docking using computational and NMR ensembles. J Mol Biol. 2008; 381: 1068-1087.

74. Wang C, Bradley P, Baker D. Protein-protein docking with backbone flexibility. J Mol Biol. 2007; 373: 503-519.

75. Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, et al. Protein-protein docking with simultaneous optimization of rigidbody displacement and side-chain conformations. J Mol Biol. 2003; 331: 281-299.

76. Schueler-Furman O, Wang C, Baker D. Progress in protein-protein docking: atomic resolution predictions in the CAPRI experiment using Rosetta Dock with an improved treatment of side-chain flexibility. Proteins. 2005; 60: 187-194.

77. Fernández-Recio J, Totrov M, Abagyan R. ICM-DISCO docking by global energy optimization with fully flexible side-chains. Proteins. 2003; 52: 113-117. 

78. Lorenzen S, Zhang Y. Monte Carlo refinement of rigid-body protein docking structures with backbone displacement and side-chain optimization. Protein Sci. 2007; 16: 2716-2725.

79. Pierce B, Weng Z. A combination of rescoring and refinement significantly improves protein docking performance. Proteins. 2008; 72: 270-279.

80. Wang C, Bradley P, Baker D. Protein-protein docking with backbone flexibility. J Mol Biol. 2007; 373: 503-519.

81. Kozakov D, Schueler-Furman O, Vajda S. Discrimination of near-native structures in protein-protein docking by testing the stability of local minima. Proteins. 2008; 72: 993-1004.

82. Veresov VG, Davidovskii AI. Structural insights into proapoptotic signaling mediated by MTCH2, VDAC2, TOM40 and TOM22. Cell Signal. 2014; 26: 370-382.

83. Pons C, Grosdidier S, Solernou A, Pérez-Cano L, Fernández-Recio J. Present and future challenges and limitations in protein-protein docking. Proteins. 2010; 78: 95-108.

84. Schreiber G, Keating AE. Protein binding specificity versus promiscuity. Curr Opin Struct Biol. 2011; 21: 50-61.

85. Mashiach E, Nussinov R, Wolfson HJ. Fiber-Dock: a web server for flexible induced-fit backbone refinement in molecular docking. Nucleic Acids Res. 2010; 38.

86. Jeetendra E, Ghosh K, Odell D, Li J, Ghosh HP, Whitt MA. The membrane-proximal region of vesicular stomatitis virus glycoprotein G ectodomain is critical for fusion and virus infectivity. J Virol. 2003; 77: 12807-12818.

87. May A, Zacharias M. 2008 Energy minimization in low-frequency normal modes to efficiently allow for global flexibility during systematic protein– protein docking. Proteins. 2008; 70: 794 –809.

88. Hwang H, Pierce B, Mintseris J, Janin J, Weng Z. Protein-protein docking benchmark version 3.0. Proteins 2008; 73: 705-709.

89. Cavasotto CN, Kovacs JA, Abagyan RA. Representing receptor flexibility in ligand docking through relevant normal modes. J Am Chem Soc. 2005; 127: 9632–9640.

90. Garzón JI, Kovacs J, Abagyan R, Chacón P. DFprot: a webtool for predicting local chain deformability. Bioinformatics. 2007; 23: 901- 902.

91. Dobbins SE, Lesk VI, Sternberg MJ. Insights into protein flexibility: The relationship between normal modes and conformational change upon protein-protein docking. Proc Natl Acad Sci U S A. 2008; 105: 10390- 10395.

Veresov VG (2016) Predicting Protein-Protein Binding Affinity by In silico Docking. JSM Chem 4(1): 1019

Received : 19 Mar 2016
Accepted : 11 Apr 2016
Published : 13 Apr 2016
Journals
Annals of Otolaryngology and Rhinology
ISSN : 2379-948X
Launched : 2014
JSM Schizophrenia
Launched : 2016
Journal of Nausea
Launched : 2020
JSM Internal Medicine
Launched : 2016
JSM Hepatitis
Launched : 2016
JSM Oro Facial Surgeries
ISSN : 2578-3211
Launched : 2016
Journal of Human Nutrition and Food Science
ISSN : 2333-6706
Launched : 2013
JSM Regenerative Medicine and Bioengineering
ISSN : 2379-0490
Launched : 2013
JSM Spine
ISSN : 2578-3181
Launched : 2016
Archives of Palliative Care
ISSN : 2573-1165
Launched : 2016
JSM Nutritional Disorders
ISSN : 2578-3203
Launched : 2017
Annals of Neurodegenerative Disorders
ISSN : 2476-2032
Launched : 2016
Journal of Fever
ISSN : 2641-7782
Launched : 2017
JSM Bone Marrow Research
ISSN : 2578-3351
Launched : 2016
JSM Mathematics and Statistics
ISSN : 2578-3173
Launched : 2014
Journal of Autoimmunity and Research
ISSN : 2573-1173
Launched : 2014
JSM Arthritis
ISSN : 2475-9155
Launched : 2016
JSM Head and Neck Cancer-Cases and Reviews
ISSN : 2573-1610
Launched : 2016
JSM General Surgery Cases and Images
ISSN : 2573-1564
Launched : 2016
JSM Anatomy and Physiology
ISSN : 2573-1262
Launched : 2016
JSM Dental Surgery
ISSN : 2573-1548
Launched : 2016
Annals of Emergency Surgery
ISSN : 2573-1017
Launched : 2016
Annals of Mens Health and Wellness
ISSN : 2641-7707
Launched : 2017
Journal of Preventive Medicine and Health Care
ISSN : 2576-0084
Launched : 2018
Journal of Chronic Diseases and Management
ISSN : 2573-1300
Launched : 2016
Annals of Vaccines and Immunization
ISSN : 2378-9379
Launched : 2014
JSM Heart Surgery Cases and Images
ISSN : 2578-3157
Launched : 2016
Annals of Reproductive Medicine and Treatment
ISSN : 2573-1092
Launched : 2016
JSM Brain Science
ISSN : 2573-1289
Launched : 2016
JSM Biomarkers
ISSN : 2578-3815
Launched : 2014
JSM Biology
ISSN : 2475-9392
Launched : 2016
Archives of Stem Cell and Research
ISSN : 2578-3580
Launched : 2014
Annals of Clinical and Medical Microbiology
ISSN : 2578-3629
Launched : 2014
JSM Pediatric Surgery
ISSN : 2578-3149
Launched : 2017
Journal of Memory Disorder and Rehabilitation
ISSN : 2578-319X
Launched : 2016
JSM Tropical Medicine and Research
ISSN : 2578-3165
Launched : 2016
JSM Head and Face Medicine
ISSN : 2578-3793
Launched : 2016
JSM Cardiothoracic Surgery
ISSN : 2573-1297
Launched : 2016
JSM Bone and Joint Diseases
ISSN : 2578-3351
Launched : 2017
JSM Bioavailability and Bioequivalence
ISSN : 2641-7812
Launched : 2017
JSM Atherosclerosis
ISSN : 2573-1270
Launched : 2016
Journal of Genitourinary Disorders
ISSN : 2641-7790
Launched : 2017
Journal of Fractures and Sprains
ISSN : 2578-3831
Launched : 2016
Journal of Autism and Epilepsy
ISSN : 2641-7774
Launched : 2016
Annals of Marine Biology and Research
ISSN : 2573-105X
Launched : 2014
JSM Health Education & Primary Health Care
ISSN : 2578-3777
Launched : 2016
JSM Communication Disorders
ISSN : 2578-3807
Launched : 2016
Annals of Musculoskeletal Disorders
ISSN : 2578-3599
Launched : 2016
Annals of Virology and Research
ISSN : 2573-1122
Launched : 2014
JSM Renal Medicine
ISSN : 2573-1637
Launched : 2016
Journal of Muscle Health
ISSN : 2578-3823
Launched : 2016
JSM Genetics and Genomics
ISSN : 2334-1823
Launched : 2013
JSM Anxiety and Depression
ISSN : 2475-9139
Launched : 2016
Clinical Journal of Heart Diseases
ISSN : 2641-7766
Launched : 2016
Annals of Medicinal Chemistry and Research
ISSN : 2378-9336
Launched : 2014
JSM Pain and Management
ISSN : 2578-3378
Launched : 2016
JSM Women's Health
ISSN : 2578-3696
Launched : 2016
Clinical Research in HIV or AIDS
ISSN : 2374-0094
Launched : 2013
Journal of Endocrinology, Diabetes and Obesity
ISSN : 2333-6692
Launched : 2013
Journal of Substance Abuse and Alcoholism
ISSN : 2373-9363
Launched : 2013
JSM Neurosurgery and Spine
ISSN : 2373-9479
Launched : 2013
Journal of Liver and Clinical Research
ISSN : 2379-0830
Launched : 2014
Journal of Drug Design and Research
ISSN : 2379-089X
Launched : 2014
JSM Clinical Oncology and Research
ISSN : 2373-938X
Launched : 2013
JSM Bioinformatics, Genomics and Proteomics
ISSN : 2576-1102
Launched : 2014
Journal of Trauma and Care
ISSN : 2573-1246
Launched : 2014
JSM Surgical Oncology and Research
ISSN : 2578-3688
Launched : 2016
Annals of Food Processing and Preservation
ISSN : 2573-1033
Launched : 2016
Journal of Radiology and Radiation Therapy
ISSN : 2333-7095
Launched : 2013
JSM Physical Medicine and Rehabilitation
ISSN : 2578-3572
Launched : 2016
Annals of Clinical Pathology
ISSN : 2373-9282
Launched : 2013
Annals of Cardiovascular Diseases
ISSN : 2641-7731
Launched : 2016
Journal of Behavior
ISSN : 2576-0076
Launched : 2016
Annals of Clinical and Experimental Metabolism
ISSN : 2572-2492
Launched : 2016
Clinical Research in Infectious Diseases
ISSN : 2379-0636
Launched : 2013
JSM Microbiology
ISSN : 2333-6455
Launched : 2013
Journal of Urology and Research
ISSN : 2379-951X
Launched : 2014
Journal of Family Medicine and Community Health
ISSN : 2379-0547
Launched : 2013
Annals of Pregnancy and Care
ISSN : 2578-336X
Launched : 2017
JSM Cell and Developmental Biology
ISSN : 2379-061X
Launched : 2013
Annals of Aquaculture and Research
ISSN : 2379-0881
Launched : 2014
Clinical Research in Pulmonology
ISSN : 2333-6625
Launched : 2013
Journal of Immunology and Clinical Research
ISSN : 2333-6714
Launched : 2013
Annals of Forensic Research and Analysis
ISSN : 2378-9476
Launched : 2014
JSM Biochemistry and Molecular Biology
ISSN : 2333-7109
Launched : 2013
Annals of Breast Cancer Research
ISSN : 2641-7685
Launched : 2016
Annals of Gerontology and Geriatric Research
ISSN : 2378-9409
Launched : 2014
Journal of Sleep Medicine and Disorders
ISSN : 2379-0822
Launched : 2014
JSM Burns and Trauma
ISSN : 2475-9406
Launched : 2016
Chemical Engineering and Process Techniques
ISSN : 2333-6633
Launched : 2013
Annals of Clinical Cytology and Pathology
ISSN : 2475-9430
Launched : 2014
JSM Allergy and Asthma
ISSN : 2573-1254
Launched : 2016
Journal of Neurological Disorders and Stroke
ISSN : 2334-2307
Launched : 2013
Annals of Sports Medicine and Research
ISSN : 2379-0571
Launched : 2014
JSM Sexual Medicine
ISSN : 2578-3718
Launched : 2016
Annals of Vascular Medicine and Research
ISSN : 2378-9344
Launched : 2014
JSM Biotechnology and Biomedical Engineering
ISSN : 2333-7117
Launched : 2013
Journal of Hematology and Transfusion
ISSN : 2333-6684
Launched : 2013
JSM Environmental Science and Ecology
ISSN : 2333-7141
Launched : 2013
Journal of Cardiology and Clinical Research
ISSN : 2333-6676
Launched : 2013
JSM Nanotechnology and Nanomedicine
ISSN : 2334-1815
Launched : 2013
Journal of Ear, Nose and Throat Disorders
ISSN : 2475-9473
Launched : 2016
JSM Ophthalmology
ISSN : 2333-6447
Launched : 2013
Journal of Pharmacology and Clinical Toxicology
ISSN : 2333-7079
Launched : 2013
Annals of Psychiatry and Mental Health
ISSN : 2374-0124
Launched : 2013
Medical Journal of Obstetrics and Gynecology
ISSN : 2333-6439
Launched : 2013
Annals of Pediatrics and Child Health
ISSN : 2373-9312
Launched : 2013
JSM Clinical Pharmaceutics
ISSN : 2379-9498
Launched : 2014
JSM Foot and Ankle
ISSN : 2475-9112
Launched : 2016
JSM Alzheimer's Disease and Related Dementia
ISSN : 2378-9565
Launched : 2014
Journal of Addiction Medicine and Therapy
ISSN : 2333-665X
Launched : 2013
Journal of Veterinary Medicine and Research
ISSN : 2378-931X
Launched : 2013
Annals of Public Health and Research
ISSN : 2378-9328
Launched : 2014
Annals of Orthopedics and Rheumatology
ISSN : 2373-9290
Launched : 2013
Journal of Clinical Nephrology and Research
ISSN : 2379-0652
Launched : 2014
Annals of Community Medicine and Practice
ISSN : 2475-9465
Launched : 2014
Annals of Biometrics and Biostatistics
ISSN : 2374-0116
Launched : 2013
JSM Clinical Case Reports
ISSN : 2373-9819
Launched : 2013
Journal of Cancer Biology and Research
ISSN : 2373-9436
Launched : 2013
Journal of Surgery and Transplantation Science
ISSN : 2379-0911
Launched : 2013
Journal of Dermatology and Clinical Research
ISSN : 2373-9371
Launched : 2013
JSM Gastroenterology and Hepatology
ISSN : 2373-9487
Launched : 2013
Annals of Nursing and Practice
ISSN : 2379-9501
Launched : 2014
JSM Dentistry
ISSN : 2333-7133
Launched : 2013
Author Information X