WANG Lu ZHANG Yn-Min LU Shui TANG Wei-Fng CHEN Y-Dong LU To, b LIU Hi-Chun
?
Protein Flexibility and Multiple Docking in Ligand Docking and Virtual Screening to the BRAF (Type I1/2) Inhibitors①
WANG LuaZHANG Yan-MinaLU ShuaiaTANG Wei-FangaCHEN Ya-DongaLU Taoa, b②LIU Hai-Chuna②
a(211198)b(210009)
BRAF has been recognized as a promising target for cancer therapy. A number of crystal structures have been published. Molecular docking is one of the most effective techniques in the field of computer-aided drug design (CADD). Appropriate protein conformation and docking method are essential for the successful virtual screening experiments. One approach considering protein flexibility and multiple docking methods was proposed in this study. Six DFG-in/C-helix-out crystal structures of BRAF, three docking programs (Glide, GOLD and LigandFit) and 12 scoring functions were applied for the best combination by judging from the results of pose prediction and retrospective virtual screening (VS). The most accurate results (mean RMSD of about 0.6?) of pose prediction were obtained with two complex structures (PDB: 3C4C and 3SKC) using Glide SP. From the retrospective VS, the most active compounds were identified by using the complex structure of 3SKC, indicated by a ROC/AUC score of 0.998 and an EF of 20.6 at 5% of the database screen with Glide-SP. On the whole, PDB 3SKC could achieve a higher rate of correct reproduction, a better enrichment and more diverse compounds. A comparison of 3SKC and the other X-ray crystal structures led to a rationale for the docking results. PDB 3SKC could achieve a broad range of sulfonamide substitutions through an expanded hydrophobic pocket formed by a further shift of theC-helix. Our study emphasized the necessity and significance of protein flexibility and scoring functions in both ligand docking and virtual screening.
BRAF type I1/2 inhibitors, protein flexibility, multiple docking methods, pose prediction, virtual screening;
The Ras/Raf/MEK/ERK (mitogen activated protein kinase, MAPK) signaling pathway trans- duces signals from cell surface receptors to the nucleus, leading to cellular proliferation, differentia- tion, and survival[1]. The Raf serine/threonine protein kinases are key components of the MAPK cascade. Mutations of BRAF gene lead to MAPK pathway amplification via constitutive activation of BRAF kinase and are detected in approximately 7% of all cancers[2]. Over 90% of the mutations in BRAF are glutamate for valine substitution at residue 600 (V600E). This mutation leads to the elevation of constitutive kinase activity for about 500~700 fold greater than BRAFWT[3-5]. On the basis, BRAF has been recognized as an excellent target for cancer therapy. Development of small- molecule inhibitors of BRAFV600Eis an attractive strategy for cancer therapy, especially melanoma[6-8]. In fact, many classes of BRAF inhibitors have been discovered[9, 10]. The selective BRAF inhibitors, vemurafenib (PLX4032)[11]and dabrafenib (GSK2118436)[12]have shown impressive results against metastatic and unresectable melanoma, and received FDA’s approval (Vemurafenib in 2011; dabrafenib in 2013). These two BRAF inhibitors which recognize the BRAF in an engaging DFG-in/C-helix-out inactive conformation are classified as Type I1/2 inhibitors. They can confer significant enhancements in potency and selectivity.
Computational techniques have become routine in drug discovery and development.Molecular docking is one of the most powerful in silico techniques in the field of molecular modeling and CADD[13-20].Particularly with more and more crystal structures solved for proteins, molecular docking is increasingly used for lead discovery and optimization. Two major challenges of molecular docking are protein flexibility and scoring function. Appropriate protein conformation and docking method (or scoring function) are essential for successful virtual screening.
In this study, our approach was evaluated on the basis of virtual screening for BRAF (Type I1/2) inhibitors by utilizing six crystal complex structures and three different docking algorithms, to account for several protein structural variations and scoring functions. Molecular docking was performed by three different programs: Glide[21], GOLD[22], and LigandFit[23], along with 12 different representative force-filed based, empirical, and knowledge based scoring functions. In particular, GOLD was used with the GOLD scoring function for three different speed modes, while Glide was used with the default scoring functions for three different levels of docking accuracy, and LigandFit with six different scoring functions, including LigScore1, LigScore2, PLP1, PLP2, Jain, and PMF. The results underscore the necessity and significance of protein flexibility and scoring functions in both pose prediction and virtual screening. A comparison of the six X-ray crystal structures of BRAF led to a rationale for the final docking results. The effects of protein flexibility were analyzed.
The six PDB files 3C4C, 3C4D, 3OG7, 3SKC, 3TV4 and 3TV6 (resolution 2.45~3.40 ?) for the BRAF kinase adopted in this study were obtained from the Protein Data Bank[24-27]. All the structures (Table 1) were prepared by Discovery Studio 2.5. All these complex structures were superimposed onto the reference structure 3OG7 (resolution 2.45 ?). The corresponding ligands (Fig. 1) were extracted and atom types were assigned using CHARMm force field and partial charges were added by Gasteiger algorithm.
Table 1. Protein Structures Used in This Work
Fig. 1. BRAF type I1/2 ligand structures in PDBs
Decoy database containing BRAF kinase’s DUD of Type I1/2 inhibitors collected from litera- tures[24-28]and deceptive decoys was generated using the online automated tool, DUD-E (http://de- coys.docking.org)[29]. The active inhibitors collected almost covered all types of reported Type I1/2 BRAF kinase inhibitors to increase scaffold diver- sity. Decoys to each inhibitor were obtained from the ZINC database (http://zinc.docking.org)[30]. Physical properties of decoys were similar to inhibi- tors, while chemical structures were dissimilar.
2. 3. 1 Glide
Glide 5.5[21]is the first docking program used in this study. Glide grids were generated with ligand scaling of 0.8 for the van der Waals (vdW) radii. The force field used for grid generation was OPLS_2005. Inhibitors were docked into their respective binding pockets and ranked using high throughput virtual screening (HTVS), standard precision (SP) and extra precision (XP) with dif- ferent levels of accuracy in the docking and scoring processes. Docking experiments were performed with default settings and the top ranked pose per ligand was saved for further consideration.
2. 3. 2 GOLD
The second docking program used is GOLD 4.0[22]. GOLD score was chosen as a fitness func- tion, and the three different speed modes-default settings (default), 2~3 times speed up settings (3) and 7~8 times speedup settings (7~8) were set in all calculations. Default cutoff values of 2.5 ? for hydrogen-bonds and 4.0 ? for van der Waals interactions were employed. All single bonds were treated as rotatable and a 10.0 ? radius from the original position of the co-crystallized reference ligand was used to define the active site. The top ranked docking pose was saved for each ligand.
2. 3. 3 LigandFit
The third docking program used is LigandFit[23]implemented in Discovery Studio 2.5. The binding sites were defined by collecting a cavity within a 10 ? radius from the ligand atoms. Conformational sampling of the ligands was performed employing Monte Carlo simulations (10,000 trials). Each docked pose was further fitted into the binding pocket through a number of rigid body minimi- zation. A maximum of 10 poses was saved for each molecule to subsequently be scored by LigScore1, LigScore2, PLP1, PLP2, Jain and PMF. Shape similarity with the binding pocket was used as a metric for pose selection, followed by a calculation of the interaction energies for those poses that do not fit with the shape criteria. Both van der Waals and electrostatic energy terms were considered in the calculations.
2. 4. 1 Pose prediction accuracy
Cognate docking and cross-docking experiments were performed using different crystal structures with Glide, GOLD and LigandFit. It is more objective to assess the accuracy of cognate and crossing docking with pose similarity between docking and crystal conformations. Docking accura- cy is considered higher if the RMSD between the docked and co-crystal pose is lower. To cognate docking, the common threshold to be considered docked correctly is 2.0 ?. However, to cross- docking, due to variations in the position of the backbone when overlaying the structures, we pre- ferred to use a threshold value of 2.5 ?.
2. 4. 2 Virtual screening accuracy
Capability of identifying the bioactive ones was evaluated by virtual screening a compound database containing 83 known BRAF inhibitors and 4650 drug-like decoys. ROC/AUC score and enrichment factor (EF) values were applied as the criteria to judge the influence on accuracy made by different protein structures and scoring functions[31, 32].
Active ligands are usually top ranked in a successful docking-based virtual screening model. Receiver Operating Curves (ROC) is a useful tool to characterize the ability of a model to distinguish active compounds from inactive ones[33-35]. A ROC curve plots the percentage rank of true positives n (actives found) on the Y axis against the percentage rank of false positives (decoys found) on the X axis. The area under the ROC curve is used to measure the overall performance of each model. If the plot is a diagonal line, the AUC has a value of 0.5. This curve results from a model that cannot discriminate active and decoy compounds. The ideal perfor- mance, where all active compounds are identified before any decoys, gives a curve that rises vertically from (0, 0) to (0, 100) and then runs to (100, 100). This curve has an AUC of 1.0.
EF is used as another measure method to assess a model’s performance. EFs were calculated at 0.5, 1, 2 and 5 % of the virtual screening results using the following equation[36]:
where Hits refers to the number of actives and N refers to the number of compounds. EF is employed for recovering a fraction of the known actives after a fraction of the database has been screened. EF(x%) is defined as the fraction of the actives recovered after x% of the database with decoys and actives have been screened. The maxima possible enrich- ment factor (EFmax) at 0.5, 1, 2 and 5% of the screen were calculated respectively using the equation:
In this study, an extensive comparison of different crystal structures and docking protocols was carried out to identify a reliable model. The available crystal structures of BRAF kinase in complex with various inhibitors were selected and prepared. The performance of reproducing crystal conformation and distinguishing inhibitors from decoys was evaluated. These data, coupled crystal structures were used to analyze the effects of protein flexibility.
A suitable X-ray complex structure is a critical determinant for the ligand pose reproduced. First, the accuracy in pose prediction was evaluated by cognate-docking the BRAF PDBs containing the cognate ligands using afore mentioned docking/scoring protocols.Results indicated that many disadvantages still exist, in particular, related to protein flexibility. The use of a single rigid protein conformation could yield significant docking errors. Cross-docking to ensembleprotein conformations is one solution to protein flexibility. The accuracy of cognate- and cross-docking and the impact of protein flexibility and docking/scoring protocols in ligand docking were reported.
3. 1. 1 Accuracy of cognate-docking
The six different BRAF co-crystallized ligands were self-docked into their respective protein structures using the three docking programs. Table 2 and Fig. 2 show the RMSDs returned for each PDB and each docking function with the highest score pose.
Table 2. RMSD Values between Crystal Structure and Top Scoring Pose (Cognate Docking)
The native ligands were docked to the BRAF structures with remarkable accuracy, especially Type I1/2 BRAF inhibitors. The rates of correct pose reproduction were different for the individual dif-ferent docking methods. As shown in Table 2, Glide, GOLD and LigandFit docking had the 94% (except HTVS-3TV4 RMSD 2.36 ?), 100% and 75% accurate docking of Type I1/2 native ligands within 2.0 ?. For the individual docking modes, the following rates were achieved: Glide-SP, Glide-XP, GOLD default settings (default), 2~3 times speed up settings (3) and 7~8 times speedup settings (7~8) algorithms have similar correct rates 100%. The Glide-HTVS and LigandFit with LigScore1, LigScore2, PLP1, PLP2 and Jain scoring functions also have similar correct rates of 5/6. LigandFit with PMF score has the worst hit rate, only 2 correct docking in cognate-docking.
Fig. 2. RMSD values between crystal structure and top scoring pose (cognate docking)
When docking the ligands into the target, a program that is good for docking into one protein structure is not necessarily good for docking into another protein structure. The accuracy for ligands docked with the correct pose depends on the protein structure and the docking program. All crystal structures had correct docking poses within 2.0 ? RMSD using all GOLD speedup settings and Glide SP, XP modes. 3TV4 had RMSD 2.36 ? with Glide HTVS docking mode. For LigandFit method, the native 3OG7 ligand was docked without 2.0 ? RMSD with LigScore1, LigScore2, PLP1, PLP2 and Jain score. Only the 3C4C and 3TV4 ligands were docked within 2.0 ? RMSD with PMF. Asawhole, Glide and GOLD docking score functions are superior to LigandFit docking score functions, and all of their scoring functions achieved satisfactory results for all crystal structures. All crystal structures except 3OG7 achieved satisfactory results for all scoring functions. It is hard to judge which protein structure or docking program is better by cognate ligand docking.
3. 1. 2 Accuracy of cross-docking
The cross-docking strategy makes use of an ensemble of N crystal structures of a target protein bound with different ligands. Each ligand is then extracted from its native crystal structure and docked to each of the protein structures of the ensemble yielding a matrix of N × N docked complexes. The accuracy of cross-docking and the impact of receptor flexibility in ligand docking were reported in Table S1 (Table 3 and Fig. 3 shown cross-docking RMSD values with PDB 3SKC). As shown in Table S1, depending on the protein structure and docking program, the ratio of the ligands docked with the correct pose (RMSD < 2.5 ?) was from 0.00% to 100%. On average, only 74.3% of the ligands were cross-docked correctly.
As a result, the cross-docking ability of the GOLD 7-8, 3and default docking-scoring method was similar, and they had the same 100% rate of correct pose reproduction with the Type I1/2 ligands. Mean-while, these three modes obtainedsimilar average RMSD, about 1 ?. Glide-SP and Glide-XP modes havelower average RMSD (SP-0.65 ?, XP-0.72 ?). However, Glide-HTVS and the LigandFit with six scores had the worst results: 66.7% and 36.1%~61.1%. One notable exception in the performance was with the 3C4D and 3OG7 con-formers with which LigandFit with six scores could only recover one or two ligands within 2.5 ? RMSD while Glide-HTVS recovered all the ligands with the 3C4D conformer and did not recoverone ligand with the 3OG7 conformer within 2.5 ? RMSD.
Table 3. Cross Docking RMSD Values between Crystal Structure and Top Scoring Pose (3SKC)
Comparing the results for PDBs, it can be observed that PDB 3C4C resulted in the lowest average RMSD value (1.88 ?). In addition, PDB 3C4C was the receptor structure with the highest crystallographic resolution (2.45 ?). It can be also observed that PDB 3C4D and 3OG7 resulted in the highest average RMSD value (2.69 ? and 3.98 ?). The large error pose predictions were from LigandFit docking. For LigandFit, using RMSD to evaluate the pose prediction is related to the molecular size. It is well noted for the larger ligands that the RMSD values obtained were clearly higher than the results obtained for the small organic ligand co-crystallized in the rest of PDBs. The 3SKC, 3TV4 and 3TV6 structures had the greatest number of correct pose reproductions below 2.5 ? forall the docking modes (83.3%, 84.7% and 83.3%).
However, if regardless of this LgandFit docking circumstances, only when comparing the results of Glide and GOLD, the cross-docking results were similar with cognate-docking. The rate of correct pose reproduction with the six structures of DFG-in/C-out conformation was the highest achieved (100%) with all three GOLD modes and Glide SP, XP modes. Meanwhile, PDB 3OG7 and 3SKC using the GOLD three docking modes hadlower average RMSD, about 1 ?. While by Glide-SP, Glide-XP docking modes, PDB 3C4C and 3SKC had the lowest average RMSD. PDB 3C4C had the average RMSD 0.5696 ? (SP) and 0.6528 ? (XP); PDB 3SKC had the average RMSD 0.6043 ? (SP) and 0.7051 ? (XP). The most rigorous computational docking method XP performed not better than the faster SP in recovering the native poses within 2.5 ? RMSD.
In terms of conformations for all cases, SP performed better than XP although not by a remarkable margin. In general, we observed that if a particular protein conformation was used in the docking, SP was slightly better than XP. The results further illustrated the importance of protein flexi-bility. As expected, we can get more rapid and better docking results if good structure was selected. As expected, the fastest and most crude docking method, Glide HTVS, was clearly inferior to the docking performance of SP and XP. Meanwhile, SP and XP were much better than GOLD which was much better than LgandFit, Glide HTVS. Overall, PDB 3C4C, 3SKC with Glide and PDB 3OG7, 3SKC with GOLD had the satisfactory results and PDB 3C4C with LigandFit obtained the acceptable result. In all cases, flexible receptor docking with the Glide/SP protocol achieved very accurate RMSD values.
Fig. 3. Cross docking RMSD values with crystal structure (PDB 3SKC)
Indeed, the results showed that some ligands could dock to alternate protein conformations with lower RMSD than when docked to their cognate receptor conformation.
The choice of the most appropriate receptor structure and screening method are the key for successful virtual screening. The ideal structure should have a high success rate in the prediction of binding mode and in recovery of known ligands. Prediction of binding modes considers the most successful area in docking and scoring, whereas scoring functions remain too primitive to rank a docking hit-list correctly. In reality, possible lead compounds are generally chosen according to scoring rank. Therefore, we determined the optimal structures for virtual screening mainly by the enrichment in recovery of known ligands (retrospective virtual screening). The virtual screenings were conducted using a library of 4,650 drug-like decoy compounds, which enriched with 83 known BRAF inhibitors. In addition, we also performed flexible docking with Glide, GOLD and LigandFit to assess 12 score functions.ROC-AUC values were shown for each PDB and each scoring function. We also showed the early recovery performance in terms of EF and true Actives at 0.5%, 1%, 2% and 5% of the screened database, respectively. All the results were presented in Figs. 4~6 and 7~9, summarized in Tables S2~4.
Fig. 4. Results of Glide docking model for BRAF structures based on the enrichment factor obtained at 0.5%, 1%, 2% and 5% of the virtual screening
Fig. 5. Results of GOLD docking model for BRAF structures based on the enrichment factor obtained at 0.5%, 1%, 2% and 5% of the virtual screening
Fig. 6. Results of LigandFit docking model for BRAF structures based on the enrichment factor obtained at 0.5%, 1%, 2% and 5% of the virtual screening
3. 2. 1 Enrichment factor evaluation
For the six protein structures tested, the Glide score functions provided very good results for all protein structures. The 96% active compounds were retrieved in the top 5% of the database, and 50% were present in the top 1%, indicating that Glide was an efficient program. Comparing HTVS, SP, XP modes, SP modes displayed the best performance at 0.5%, 1%, 2% and 5% of the screened database on all structures. HTVS performed worse than SP and XP modes at 2% and 5% of the screened database on all structures. But it was interesting to note that HTVS had equal and even better performance than Glide two advanced modes within the top 0.5% and 1%. According to the results, we suggested that Glide HTVS docking mode first was used to reduce screened database compound number. Comparing SP and XP modes, SP and XP modes displayed similar performance at 5% of the screened database. However, at 0.5%, 1% and 2% of the screened database, enrichment factors with SP modes were more successful than XP modes. Overall, Glide-SP appeared to be the best scoring function.
GOLDscore was poor since the results were unable to retrieve at least 20% of the actives in the top 5% of the database with most protein structures. For LigandFit docking method, Jain and PMF scoring functions did not perform well on all the six protein structures. LigScore1 and LigScore2 displayed similar performance on all structures, with a slight advantage for LigScore2 (more efficient on PDB 3SKC and PDB 3TV6). Both PLP scoring functions succeeded in retrieving all actives within the top 0.5% and 1%, retrieving most of the actives within the top 5% on PDB 3SKC and 3TV6 (i.e., about 200 compounds). Yet, we noted that PLP1 tended to be better than PLP2 in most cases.
Comparing early recovery values obtained for all scores, it can be observed that Glide-SP/Glidescore obtains the highest average EF (54.65 at 0.5%, 54.65 at 1%, 45.62 at 2% and 19.57 at 5% of the screened database) taking into account all PDBs used. Indeed, our results showed that Glide-SP docking mode and Glidescore were successful with regard to the enrichment of actives for the proteins investigated. Glide-SP docking mode and Glidescore scoring outperformed the other scoring functions. The Glide-SP/Glidescore was the best combination.
Then, we searched for each structure how many actives were present in the top 0.5%, 1%, 2% and 5% molecules with Glide-SP/Glidescore. After Glide- SP/Glidescore protocol, PDB 3C4C and 3TV6 had 100% actives at 0.5% of the screened database. While at 1% and 2% of the screened database, PDB 3TV6 had the highest actives. But at 5% of the screened database, PDB 3SKC had the highest 100% actives in the top about 200 molecules.
Taking into account other scoring functions used, PDB 3SKC and 3TV6 had 100% actives at 0.5% and 1% of database screened, and 3TV4 also had 100% actives at 0.5% of the screened database after LigandFit and PLP1, PLP2 redocking/scoring. PDB 3SKC had 100% actives at 0.5% and 1% of database screened, and 3TV6 had 100% actives at 1% of the screened database after LigandFit and LigScore1, LigScore2 redocking/scoring. In the top 5% about 200 molecules, PDB 3SKC also had about 94% actives, EF 18.85. All PDB cases were not treated better by GOLD at all level of selection.
We thus observed for PDB 3SKC and 3TV6 an improved enrichment as compared to our computa- tions while the results for other PDBs were not only slightly better. Concretely, comparing early recovery values obtained for all PDBs, it can be observed that PDB 3SKC and 3TV6 obtained the highest top EF values (values shown in red bold in Table S2-4) and also the highest average EF by all scoring functions used, even when scoring functions with poorest results.
3. 2. 2 ROC/AUC values evaluation
It can be interesting to use other representations than enrichment graphs to show virtual screening results. Several authors have suggested the impor- tance of using Receiver Operating Curves (ROC) analysis and/or to compute the area under the ROC curve. We applied this approach to our screening experiments and the results are presented in Table S2~4 and Fig. 7~9. Fig. 7~9 showed ROC/AUC values for the virtual screening performed on BRAF binding site conformations. Comparing AUC values of screened decoys for all PDBs taking into account all scoring functions used, ROC/AUC values ranged from 0.574 to 0.998.
By considering each scoring function for all PDBs, we found that Glide scoring function was very successful. Glidescores from three levels of Glide docking had ROC/AUC values from 0.795 to 0.998. Almost all ROC/AUC values were above 0.8 except PDB 3OG7/Glide-HTVS. 2/3 of all ROC/AUC values were larger than 0.9. All ROC/AUC values calculated by Glide-SP and Glide-XP were above 0.9.
Fig. 7. A 3D graph showing the ROC/AUC values obtained with three differentlevels glide docking protocols: HTVS, SP and XP
For GOLD scores from three GOLD speed modes, and ROC/AUC values ranged from 0.615 to 0.922. 1/3 of all ROC/AUC values were below 0.8, 1/2 of all ROC/AUC values were within 0.8~0.9, and only 1/6 of all ROC/AUC values were above 0.9. GOLD docking and score methods did not get relatively good results.
Fig. 8. A 3D graph showing the ROC/AUC values obtained with three different speed GOLD docking protocols of 7~8 times speed up (7~8) and three times speed-up (3) and the gold standard mode (default) speed-up settings
After analysis of the present LigandFit docking results on the virtual screening, 75% and 36% of all ROC/AUC values were higher than 0.8 and 0.9, respectively. LigandFit docking and scores had relatively satisfactory performance. Particularly, LigScore and PLP scores all gave good performances with ROC/AUCs of 0.948 to 0.989 for PDB 3SKC and 3TV6.
Fig. 9. A 3D graph showing the ROC/AUC values obtained for various scoring functions with LigandFit docking: Ligscore1, Ligscore2, PLP1, PLP2, Jain, and PMF
Overall analyses of all PDBs, PDB 3SKC and 3TV6 obtained 60% of ROC/AUCs above 0.9. Clearly, for some structures, some scoring functions performed better, but in most cases, when initiating a screening project on a new target, there were generally no evidences that one scoring function and receptor structure performed better than the others. As such, we believe that when a few conformations and docking methods are available, it is reasonable to use the best conformation and scoring function that are known to perform reasonably well. For example, the structure 3SKC identified the most active compounds with a ROC/AUC of 0.998 and an enrichment factor of 20.6 at 5% of the database screen by Glide-SP docking method (Table S2).
Overall, the best docking scoring function after docking by Glide, GOLD and LigandFit was Glidescore with Glide-SP mode (Figs. 7~9, Table S2~S4). These data suggested that Glidecore with Glide-SP mode was a stable and efficient scoring function. It can be observed that PDB 3SKC and 3TV6 obtained the highest top ROC/AUC values (values are shown in bold in Table S2~S4), the highest average AUC value (0.82) among all scoring functions.
As shown in the above data, only some PDB structures obtained the greatest number of correct pose reproductions, while the others identified the most active compounds from all the docking modes. But on the whole, PDB 3SKC could achieve a higher rate of correct reproduction, a better enrichment and more diverse compounds. The results suggested that PDB 3SKC was the most suitable structure of receptor that performed best in both prediction of binding modes and recovery of known ligands.
Fig. 10. An overlay of the six BRAF crystal structures. The conformations aligned by the binding site residues shows the flexibility of the Hinge region, DFG motif, αC-helix, especially for glycine rich loop (circular red circle), which are essential in determining the binding pocket shape (purple for 3SKC)
A comparison of the X-ray crystal structures of BRAF led to a reasonable explanation for the above unforeseen docking results. As shown in Fig. 10, a newly hydrophobic pocket between theC-helix and the N-lobesheets was formed by an outward shift of theC-helix. TheC-helix is similarly positioned in all crystal structures, but the orientation of Phe595 and Gly596 from the DFG sequence is shifted. The substituent group of 3SKC occupying the newly hydrophobic pocket is the aryl sulfonamide, while substituent groups from other crystal complex structures are the propyl. Phe595 and Gly596 are further moved by the aryl group resides. This movement can expand the hydrophobic pocket, enough to accommodate a propyl group as large as a substituted phenyl. The results of protein flexibility analysis effectively supported the above docking data. The above docking data, coupled with the 3SKC crystal structure analysis, suggested that PDB 3SKC could achieve a broad range of sulfonamide substitutions through an expanded hydrophobic pocket formed by a further shift of theC-helix.
A number of crystal structures have been public- shed for drug target. These structures of the same protein have enough conformational diversity. The accuracy of docking is sensitive to different crystal structures and scoring functions. Using multiple structures of the same protein with enough con- formational diversity and various scores according to the different principlemay enhance the accuracy of molecular docking. In this study, docking evaluation of six BRAF crystal structures with 12 scoring functions was carried out in terms of pose prediction and retrospective docking-based virtual screening. With respect to pose prediction, the most accurate results (mean RMSD of about 0.6 ?) were obtained by using PDB 3C4C, 3SKC and Glide-SP/Glidescore scoring function. Analyzing the data obtained from the retrospective docking-based virtual screenings, PDB 3SKC and 3TV6 obtains the highest top ROC/AUC values, also the highest average AUC value (0.82) among all scoring functions. The structure 3SKC identified the most active compounds with a ROC/AUC of 0.998 and an enrichment factor of 20.6 at 5% of the screened database with Glide-SP. Glide was well suited to deal with docking-based virtual screening for BRAF inhibitors. On the whole, PDB 3SKC could achieve a higher rate of correct reproduction, a better enrichment and more diverse compounds. Protein flexibility was analyzed by a comparison of the X-ray crystal structures of BRAF. The above docking data, coupled with protein flexibility analysis, suggested that PDB 3SKC via a further expand of the hydrophobic pocket can be achieved for a broad range of sulfonamide substitu- tions.
When these programs were applied to other targets, or with parameter changes, the balance between computational speed and accuracy will be broken. It is interesting to note that HTVS had equal and even better performance than the two advanced modes (Glide-SP and Glide-XP) within the top 1% of the screened database. The results suggested Glide- HTVS could be firstly used to reduce the compound number.
(1) Peyssonnaux, C.; Eychene, A. The Raf/MEK/ERK pathway: new concepts of activation.2001, 93, 53–62.
(2) Davies, H.; Bignell, G. R.; Cox, C.; Stephens, P.; Edkins, S.; Clegg, S.; Teague, J.; Woffendin, H.; Garnett, M. J.; Bottomley, W.; Davis, N.; Dicks, E.; Ewing, R.; Floyd, Y.; Gray, K.; Hall, S.; Hawes, R.; Hughes, J.; Kosmidou, V.; Menzies, A.; Mould, C.; Parker, A.; Stevens, C.; Watt, S.; Hooper, S.; Wilson, R.; Jayatilake, H.; Gusterson, B. A.; Cooper, C.; Shipley, J.; Hargrave, D.; Pritchard-Jones, K.; Maitland, N.; Chenevix-Trench, G.; Riggins, G. J.; Bigner, D. D.; Palmieri, G.; Cossu, A.; Flanagan, A.; Nicholson, A.; Ho, J. W. C.; Leung, S. Y.; Yuen, S. T.; Weber, B. L.; Seigler, H. F.; Darrow, T. L.; Paterson, H.; Marais, R.; Marshall, C. J.; Wooster, R.; Stratton, M. R.; Futreal, P. A. Mutations of the BRAF gene in human cancer.2002, 417, 949–954.
(3) Wan, P. T.; Garnett, M. J.; Roe, S. M.; Lee, S.; Niculescu-Duvaz, D.; Good, V. M.; Jones, C. M.; Marshall, C. J.; Springer, C. J.; Barford, D.; Marais, R. Mechanism of activation of the RAF-ERK signaling pathway by oncogenic mutations of B-RAF.. 2004, 116, 855–867.
(4) Samowitz, W. S.; Sweeney, C.; Herrick, J.; Albertsen, H.; Levin, T. R.; Murtaugh, M. A.; Wolff, R. K.; Slattery, M. L. Poor survival associated with the BRAF V600E mutation in microsatellite-stable colon cancers.. 2005, 65, 6063–6069.
(5) Houben, R.; Becker, J. C.; Kappel, A.; Terheyden, P.; Br?cker, E. B.; Goetz, R.; Rapp, U. R. Constitutive activation of the Ras-Raf signaling pathway in metastatic melanoma is associated with poor prognosis.2004, 3, 6–18.
(6) Kuman, R.; Angelini, S.; Czene, K.; Sauroja, I.; Hahka-Kemppinen, M.; Pyrhonen, S.; Hemminki, K. B-Raf mutations in metastatic melanoma: a possible association with clinical outcome.2003, 9, 3362–3368.
(7) Shepherd, C.; Puzanov, I.; Sosman, J. A. B-RAF inhibitors: an evolving role in the therapy of malignant melanoma.2010, 12, 146?152.
(8) Puzanov, I.; Flaherty, K. T. Targeted molecular therapy in melanoma.2010, 29, 196?201.
(9) Zambon, A.; Niculescu-Duvaz, I.; Niculescu-Duvaz, D.; Marais, R.; Springer, C. J. Small molecule inhibitors of BRAF in clinical trials.2012, 22, 789?792.
(10) Kim, D. H.; Sim, T. Novel small molecule Raf kinase inhibitors for targeted cancer therapeutics.2012, 35, 605–615.
(11) Bollag, G.; Tsai, J.; Zhang, J.; Zhang, C.; Ibrahim, P.; Nolop, K.; Hirth, P. Vemurafenib: the first drug approved for BRAF-mutant cancer.2012, 11, 873–886.
(12) Rheault, T. R.; Stellwagen, J. C.; Adjabeng, G. M.; Hornberger, K. R.; Petro, K. G.; Waterson, A. G.; Dickerson, S. H.; Mook, R. A. Jr.; Laquerre, S. G.; King, A. J.; Rossanese, O. W.; Arnone, M. R.; Smitheman, K. N.; Kane-Carson, L. S.; Han, C.; Moorthy, G. S.; Moss, K. G.; Uehling, D. E. Discovery of dabrafenib: a selective inhibitor of Raf Kinases with antitumor activity against B-Raf-driven tumors.2013, 4, 358–362.
(13) Sousa, S. F.; Ribeiro, A. J.; Coimbra, J. T. S.; Neves, R. P. P.; Martins, S. A.; Moorthy, H. N. S.; Fernandes, P. A.; Ramos, M. J. Protein-ligand docking in the new millennium - a retrospective of 10 years in the field.2013, 20, 2296–314.
(14) Meng, X. Y.; Zhang, H. X.; Mezei, M.; Cui, M. Molecular docking: a powerful approach for structure-based drug discovery.2011, 7, 146–157.
(15) Xiong, D.; Ma, Y. Z.; Zhao, Z. X.; Liu, Y. X.; Xiang, Y. Docking and 3D-QSAR analysis on a series of pyridone-based EZH2 inhibitors.2017, 36, 575–588.
(16) Wang, L.; Zhang, Y. M.; Lu, S.; Chen, Y. D.; Lu, T.; Liu, H. C. Molecular docking and 3D-QSAR studies on a series of fused heterocyclic amides as B-Raf inhibitors.2017, 36, 1568–1585.
(17) Huang, C. S.; Tu, W. T.; Luo, M.; Shi, J. C. Molecular docking and design of novel heterodimers of donepezil and huperzine fragments as acetylcholinesterase inhibitors.2016, 35, 839–848.
(18) Qi, C. Y.; Zhang, R.; Huang, G. D.; Wu, W. J. Studies on the conformations and hydrogen bonding of ACE inhibitory tripeptide VEF by all-atom molecular dynamics simulations and molecular docking.. 2017, 36, 189–196.
(19) Ding, K. K.; Kong, X. T.; Wang, J. P.; Lu, L. P.; Zhou, W. F.; Zhan, T. J.; Zhang,C. L.; Zhuang, S. L. Side chains of parabens modulate antiandrogenic activity: in vitro and molecular docking studies.. 2017, 51, 6452–6460.
(20) Zhuang, S. L.; Wang, H. F.; Ding, K. K.; Wang, J. Y.; Pan, L. M.; Lu, Y. L.; Liu, Q. J.; Zhang, C. L. Interactions of benzotriazole UV stabilizers with human serum albumin: atomic insights revealed by biosensors, spectroscopies and molecular dynamics simulations.2016, 144, 1050–1059.
(21) Maestro. V9.0. Schr?dinger, LLC. New York 2011.
(22) GOLD. Version 4.0. Astex Technology, Cambridge 2001.
(23) Discovery Studio Client. v2.5. Accelrys Inc., San Diego 2008.
(24) Tsai, J.; Lee, J. T.; Wang, W.; Zhang, J.; Cho, H.; Mamo, S.; Bremer, R.; Gillette, S.; Kong, J.; Haass, N. K.; Sproesser, K.; Li, L.; Smalley, K. S.; Fong, D.; Zhu, Y. L.; Marimuthu, A.; Nguyen, H.; Lam, B.; Liu, J.; Cheung, I.; Rice, J.; Suzuki, Y.; Luu, C.; Settachatgul, C.; Shellooe, R.; Cantwell, J.; Kim, S. H.; Schlessinger, J.; Zhang, K. Y.; West, B. L.; Powell, B.; Habets, G.; Zhang, C.; Ibrahim, P. N.; Hirth, P.; Artis, D. R.; Herlyn, M.; Bollag, G. Discovery of a selective inhibitor of oncogenic B-Raf kinase with potent antimelanoma activity.2008, 105, 3041-3046.
(25) Bollag, G.; Hirth, P.; Tsai, J.; Zhang, J.; Ibrahim, P. N.; Cho, H.; Spevak, W.; Zhang, C.; Zhang, Y.; Habets, G.; Burton, E. A.; Wong, B.; Tsang, G.; West, B. L.; Powell, B.; Shellooe, R.; Marimuthu, A.; Nguyen, H.; Zhang, K. Y. J.; Artis, D. R.; Schlessinger, J.; Su, F.; Higgins, B.; Iyer, R.; D’Andrea, K.; Koehler, A.; Stumm, M.; Lin, P. S.; Lee, R. J.; Grippo, J.; Puzanov, I.; Kim, K. B.; Ribas, A.; McArthur, G. A.; Sosman, J. A.; Chapman, P. B.; Flaherty, K. T.; Xu, X.; Nathanson, K. L.; Nolop, K. Clinical efficacy of a RAF inhibitor needs broad target blockade in BRAF-mutant melanoma.2010, 467, 596?599.
(26) Wenglowsky, S.; Ahrendt, K. A.; Buckmelter, A. J.; Feng, B.; Gloor, S. L.; Gradl, S.; Grina, J.; Hansen, J. D.; Laird, E. R.; Lunghofer, P.; Mathieu, S.; Moreno, D.; Newhouse, B.; Ren, L.; Risom, T.; Rudolph, J.; Seo, J.; Sturgis, H. L.; Voegtli, W. C.; Wen, Z. Pyrazolopyridine inhibitors of B-Raf V600E. Part 2: Structure-activity relationships.2011, 21, 5533–5537.
(27) Wenglowsky, S.; Ren, L.; Ahrendt, K. A.; Laird, E. R.; Aliagas, I.; Alicke, B.; Buckmelter, A. J.; Choo, E. F.; Dinkel, V.; Feng, B.; Gloor, S. L.; Gould, S. E.; Gross, S.; Gunzner-Toste, J.; Hansen, J. D.; Hatzivassiliou, G.; Liu, B.; Malesky, K.; Mathieu, S.; Newhouse, B.; Raddatz, N. J.; Ran, Y.; Rana, S.; Randolph, N.; Risom, T.; Rudolph, J.; Savage, S.; Selby, L. T.; Shrag, M.; Song, K.; Sturgis, H. L.; Voegtli, W. C.; Wen, Z.; Willis, B. S.; Woessner, R. D.; Wu, W.; Young, W. B.; Grina, J. Pyrazolopyridine inhibitors of B-RafV600E. Part 1: the development of selective, orally bioavailable, and efficacious inhibitors.2011, 2, 342–347.
(28) Wenglowsky, S.; Moreno, D.; Rudolph, J.; Ran, Y.; Ahrendt, K. A.; Arrigo, A.; Colson, B.; Gloor, S. L.; Hastings, G. Pyrazolopyridine inhibitors of B-RafV600E. Part 3: an increase in aqueous solubility via the disruption of crystal packing.2012, 22, 912-915.
(29) Mysinger, M. M.; Carchia, M.; Irwin, J. J.; Shoichet, B. K. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better bench marking.2012, 55, 6582–6594.
(30) Irwin, J. J.; Shoichet, B. K. ZINC-a free database of commercially available compounds for virtual screening.2005, 45, 177–182.
(31) McGann, M.; Nicholls, A.; Enyedy, I. The statistics of virtual screening and lead optimization.2015, 29, 923–936.
(32) Swift, R. V.; Jusoh, S. A.; Offutt, T. L.; Li, E. S.; Amaro, R. E. Knowledge-based methods to train and optimize virtual screening ensembles.2016, 56, 830–842.
(33) Fawcett, T. An introduction to ROC analysis.2006, 27, 861–874.
(34) Wang, X.; Ma, J.; George, S. L. ROC curve estimation under test-result-dependent sampling.2013, 14, 160–172
(35) Yavuz, B. M.; Jones, R. M.; DeFlorio-Barker, S.; Vannoy, E.; Dorevitch, S. Receiver-operating characteristics analysis: a new approach to predicting the presence of pathogens in surface waters.2014, 48, 5628–5635.
(36) Anighoro, A.; Rastelli, G. Enrichment factor analyses on G-protein coupled receptors with known crystal structure.2013, 53, 739–743.
Table of contents
1. Table S1. RMSD values between crystal structure and top scoring pose(Cross Docking)
2. Table S2. EF values obtained at 0.5% , 1%, 2% and 5% of the virtual screen with Glide docking: HTVS, SP and XP.
3. Table S3. EF values obtained at 0.5%, 1%, 2% and 5% of the virtual screen with Gold docking: 7-8 times speed up (7-8x), three times speed-up (3x) and the gold standard mode (default) speed-up settings.
4. Table S4. EF values obtained at 0.5%, 1%, 2% and 5% of the virtual screen with LigandFit docking: Ligscore1, Ligscore2, PLP1, PLP2, Jain and PMF.
Table S1. RMSD values between crystal structure and top scoring pose(Cross Docking)
Table S2. EF values obtained at 0.5% , 1%, 2% and 5% of the virtual screen with Glide docking: HTVS, SP and XP.
PDBModes0.5% (24)Actives&EF1% (48)Actives&EF2% (95)Actives&EF5% (236)Actives&EFAUC 3C4CHTVS2457.023845.143923.414210.150.828 SP2457.024755.847746.228019.330.982 XP1842.773541.586036.027818.850.984 3C4D*HTVS2354.654452.274627.614611.110.840 SP2252.274351.087343.828019.330.981 XP921.382124.954426.416816.430.973 30G7*HTVS2354.653238.023319.81368.6990.795 SP2354.654654.657746.228119.570.990 XP1740.393440.396639.627818.850.988 3SKCHTVS2354.654553.465130.615312.810.891 SP2149.904249.907444.428320.060.998 XP2149.904249.907142.628219.810.997 3TV4HTVS2354.653946.334124.614410.630.803 SP2354.654755.847645.628019.330.980 XP1126.142327.325030.017217.400.977 3TV6HTVS2457.024755.845231.215413.050.894 SP2457.024857.028148.628219.810.995 XP2354.654553.467343.828320.060.998 AverageHTVS2354.654148.714426.414611.11 SP2354.654654.657645.628119.57 XP1740.393339.206136.627718.61 EFmax57.0257.0249.8220.06
*Data in bold represents the best values obtained for each docking fuction and protein structure.
Table S3. EF values obtained at 0.5%, 1%, 2% and 5% of the virtual screen with Gold docking: 7-8 times speed up (7-8x), three times speed-up (3x) and the gold standard mode (default) speed-up settings.
PDBModes0.5% (24)Actives&EF1% (48)Actives&EF2% (95)Actives&EF5% (236)Actives&EFAUC 3C4C7-8x00000010.240.615 3x00000010.240.670 default000000000.657 3C4D*7-8x0011.1931.80112.660.849 3x000031.80102.420.848 default000010.6061.450.812 30G7*7-8x12.3822.38116.60317.490.859 3x12.3833.5684.80276.520.870 default0011.1921.20143.380.833 3SKC7-8x0022.3842.40102.420.756 3x000010.6071.690.731 default000010.6040.970.675 3TV47-8x1023.761821.383118.614310.390.891 3x614.261315.442917.415413.050.922 default819.011821.383118.615312.810.922 3TV67-8x37.13910.692213.21419.910.902 3x24.751011.882213.21409.670.896 default24.7578.32159.00307.250.879 Average7-8x2.35.475.36.3011.87.0822.85.51 3x1.53.564.35.1110.56.3023.25.61 default1.74.044.35.118.34.9817.84.30 EFmax57.0257.0249.8220.06
*Data in bold represents the best values obtained for each docking fuction and protein structure.
Table S4. EF values obtained at 0.5%, 1%, 2% and 5% of the virtual screen with LigandFit docking: Ligscore1, Ligscore2, PLP1, PLP2, Jain and PMF
PDBModes0.5% (24)Actives&EF1% (48)Actives&EF2% (95)Actives&EF5% (236)Actives&EFAUC 3C4CLigscore1819.011011.88148.404266.2820.898 Ligscore2511.8867.12895.402194.5910.833 -PLP137.1367.128106.003225.3160.804 -PLP237.1355.9495.402174.1080.786 Jain0011.18810.630.7250.625 -PMF12.3822.37631.801174.1080.829 3C4D*Ligscore11433.262327.323722.214911.840.886 Ligscore21740.392529.72917.41368.6990.919 -PLP11535.641922.572615.61358.4570.901 -PLP21433.261922.572414.41358.4570.889 Jain511.8867.12884.802184.3490.744 -PMF511.881517.822917.414811.60.934 30G7*Ligscore1819.01910.69116.603194.5910.760 Ligscore21023.761214.26148.404194.5910.852 -PLP11330.891315.44169.604204.8330.820 -PLP21126.141315.44148.404194.5910.836 Jain511.8855.9453.00171.6910.594 -PMF12.37633.564106.003256.0410.862 3SKCLigscore12457.024857.027444.427718.610.980 Ligscore22457.024857.027545.027818.850.989 -PLP12457.024857.027243.227818.850.982 -PLP22457.024857.027444.427818.850.981 Jain2149.93642.775935.417217.40.952 -PMF0011.18831.801143.3830.887 3TV4Ligscore12354.653035.643319.81419.9070.895 Ligscore21945.142630.893118.61399.4230.894 -PLP12457.023035.643219.21348.2150.764 -PLP22457.023035.643018.01327.7320.713 Jain1638.022732.083018.01307.2490.574 -PMF0022.37642.401153.6240.746 3TV6Ligscore12457.024654.655231.215814.010.948 Ligscore22457.024553.465432.416214.980.959 -PLP12457.024857.025834.816816.430.955 -PLP22457.024857.025935.416716.190.957 Jain1945.143238.024828.815312.810.876 -PMF24.75233.56484.802358.4570.903 AverageLigscore116.839.9227.732.9136.822.094510.87 Ligscore216.539.22732.0835.321.1942.210.2 -PLP117.240.8727.332.4335.721.4342.810.34 -PLP216.739.6827.232.313521.0141.39.979 Jain1126.1417.821.1525.215.1330.57.37 -PMF1.53.5644.335.1449.55.70225.76.21 EFmax57.0257.0249.8220.06
*Data in bold represents the best values obtained for each docking fuction and protein structure.
17 November 2017;
8 March 2018
① This project was supported by the National Natural Science Foundation of China (21102181, 81302634 and 21572273)
Lu Tao. E-mail: lutao@cpu.edu.cn; Liu Hai-Chun. E-mail: tuna888@126.com
10.14102/j.cnki.0254-5861.2011-1893