Scoring functions for docking explained

In the fields of computational chemistry and molecular modelling, scoring functions are mathematical functions used to approximately predict the binding affinity between two molecules after they have been docked. Most commonly one of the molecules is a small organic compound such as a drug and the second is the drug's biological target such as a protein receptor.[1] Scoring functions have also been developed to predict the strength of intermolecular interactions between two proteins[2] or between protein and DNA.[3]


Scoring functions are widely used in drug discovery and other molecular modelling applications. These include:[4]

Free energy perturbation calculations are a potentially more reliable but much more computationally demanding alternative to scoring functions.[8]


Scoring functions are normally parameterized (or trained) against a data set consisting of experimentally determined binding affinities between molecular species similar to the species that one wishes to predict.
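As a minimal sketch of what such parameterization can look like, the Python example below fits the coefficients of a linear scoring function to a training set by ordinary least squares. The linear form, the three descriptors (hydrogen-bond count, lipophilic contact area in units of 100 Å², rotatable-bond count), the function names and all numbers are assumptions made for illustration. The "experimental" affinities are synthetic, generated from known coefficients so that the fit can be checked.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are not modified.
    m = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_linear_scoring_function(descriptors, affinities):
    """Least-squares fit via the normal equations (X^T X) c = X^T y.

    Returns one coefficient per descriptor followed by the intercept.
    """
    rows = [list(d) + [1.0] for d in descriptors]  # trailing 1.0 fits the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, affinities)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_affinity(coefficients, descriptor):
    """Weighted sum of descriptor values plus the intercept (last coefficient)."""
    return sum(c * d for c, d in zip(coefficients, descriptor)) + coefficients[-1]


# Hypothetical training complexes: [h-bond count, lipophilic area / 100 A^2, rotatable bonds]
training_descriptors = [
    [2.0, 1.5, 3.0],
    [4.0, 2.1, 5.0],
    [1.0, 0.9, 1.0],
    [3.0, 3.0, 7.0],
    [5.0, 1.2, 2.0],
    [0.0, 2.0, 4.0],
]

# Synthetic stand-in for measured affinities (kcal/mol), NOT real data.
reference_coefficients = [-1.0, -2.0, 0.5, -0.5]
training_affinities = [
    predict_affinity(reference_coefficients, d) for d in training_descriptors
]

fitted = fit_linear_scoring_function(training_descriptors, training_affinities)
new_complex = [3.0, 1.8, 4.0]  # hypothetical complex not in the training set
predicted = predict_affinity(fitted, new_complex)
print(fitted, predicted)
```

With real measurements the fit would not be exact, and the quality of the predictions would depend on how closely the training complexes resemble the complexes being scored.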

For currently used methods aiming to predict affinities of ligands for proteins the following must first be known or predicted:

The above information yields the three-dimensional structure of the complex. Based on this structure, the scoring function can then estimate the strength of the association between the two molecules in the complex using one of the methods outlined below. Finally, the scoring function itself may be used to help predict both the binding mode and the active conformation of the small molecule in the complex; alternatively, a simpler and computationally faster function may be used within the docking run.


There are four general classes of scoring functions:[9][10][11]

The first three types (force-field, empirical and knowledge-based) are commonly referred to as classical scoring functions and are characterized by the assumption that their contributions to binding combine linearly. Because of this constraint, classical scoring functions are unable to take advantage of large amounts of training data.[35]
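The linear assumption can be written schematically as follows. The symbols are illustrative rather than taken from any particular scoring function: each f_i is one contribution computed from the structure of the complex (for example a hydrogen-bonding or contact term), each w_i is its weight, and w_0 is a constant offset.

```latex
\Delta G_{\text{bind}} \;\approx\; w_0 + \sum_{i=1}^{n} w_i \, f_i(\text{complex})
```

Under this form, each term contributes independently of the others, so the model cannot represent interactions between terms however much training data is supplied.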


Because different scoring functions are relatively co-linear, consensus scoring functions may not improve accuracy significantly.[36] This claim went somewhat against the prevailing view in the field, since previous studies had suggested that consensus scoring was beneficial.[37]
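The following Python sketch illustrates one simple form of consensus scoring, rank averaging, together with a Pearson correlation to quantify how co-linear two scoring functions are. The scoring-function names (`sf_a`, `sf_b`, `sf_c`), the scores and the convention that lower scores are better are all hypothetical. Rank averaging is only one of several consensus strategies and is not necessarily the one used in the cited studies.

```python
import math


def ranks(scores):
    """Return 1-based ranks, best (lowest) score first; ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def consensus_ranks(score_table):
    """Average each ligand's rank over all scoring functions in the table."""
    per_function = [ranks(column) for column in score_table.values()]
    n_ligands = len(per_function[0])
    return [
        sum(r[i] for r in per_function) / len(per_function)
        for i in range(n_ligands)
    ]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five ligands from three scoring functions (lower is better).
score_table = {
    "sf_a": [-9.1, -7.4, -8.2, -5.0, -6.3],
    "sf_b": [-8.5, -7.9, -8.8, -4.2, -6.0],
    "sf_c": [-42.0, -35.5, -39.0, -20.1, -30.7],
}

consensus = consensus_ranks(score_table)
correlation_ab = pearson(score_table["sf_a"], score_table["sf_b"])
print(consensus, correlation_ab)
```

In this made-up data the three functions are strongly correlated, so the consensus ordering of the ligands is the same as the ordering given by `sf_a` alone. This is the situation in which consensus scoring has little to add.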

A perfect scoring function would be able to predict the binding free energy between the ligand and its target. In practice, both the computational methods and the available computational resources place limits on this goal, so methods are most often selected to minimize the number of false positive and false negative ligands. Where an experimental training set of binding constants and structures is available, a simple method has been developed to refine the scoring function used in molecular docking.[38]
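To make the false positive and false negative criterion concrete, the Python sketch below counts both for a given score cutoff. It assumes that lower scores indicate stronger predicted binding and that every ligand scoring at or below the cutoff is predicted to be active. The scores, activity labels and cutoff are invented for illustration.

```python
def confusion_counts(scores, is_active, threshold):
    """Count true/false positives and negatives for a score cutoff.

    A ligand is predicted active when its score is at or below the threshold.
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, active in zip(scores, is_active):
        predicted_active = score <= threshold
        if predicted_active and active:
            counts["TP"] += 1
        elif predicted_active and not active:
            counts["FP"] += 1  # scored well but experimentally inactive
        elif not predicted_active and active:
            counts["FN"] += 1  # experimentally active but scored poorly
        else:
            counts["TN"] += 1
    return counts


# Hypothetical docking scores and experimental activity labels for six ligands.
scores = [-9.1, -7.4, -8.2, -5.0, -6.3, -8.9]
is_active = [True, False, True, False, True, False]

counts = confusion_counts(scores, is_active, threshold=-8.0)
print(counts)
```

Moving the cutoff trades one kind of error for the other: a stricter cutoff reduces false positives but tends to miss more true binders.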

Notes and References

  1. Jain AN . Scoring functions for protein-ligand docking . Current Protein & Peptide Science . 7 . 5 . 407–20 . October 2006 . 17073693 . 10.2174/138920306778559395 .
  2. Lensink MF, Méndez R, Wodak SJ . Docking and scoring protein complexes: CAPRI 3rd Edition . Proteins . 69 . 4 . 704–18 . December 2007 . 17918726 . 10.1002/prot.21804 . 25383642 .
  3. Robertson TA, Varani G . An all-atom, distance-dependent scoring function for the prediction of protein-DNA interactions from structure . Proteins . 66 . 2 . 359–74 . February 2007 . 17078093 . 10.1002/prot.21162 . 24437518 .
  4. Rajamani R, Good AC . Ranking poses in structure-based lead discovery and optimization: current trends in scoring function development . Current Opinion in Drug Discovery & Development . 10 . 3 . 308–15 . May 2007 . 17554857 .
  5. Seifert MH, Kraus J, Kramer B . Virtual high-throughput screening of molecular databases . Current Opinion in Drug Discovery & Development . 10 . 3 . 298–307 . May 2007 . 17554856 .
  6. Böhm HJ . Prediction of binding constants of protein ligands: a fast method for the prioritization of hits obtained from de novo design or 3D database search programs . Journal of Computer-Aided Molecular Design . 12 . 4 . 309–23 . July 1998 . 9777490 . 10.1023/A:1007999920146 . 1998JCAMD..12..309B . 7474036 .
  7. Joseph-McCarthy D, Baber JC, Feyfant E, Thompson DC, Humblet C . Lead optimization via high-throughput molecular docking . Current Opinion in Drug Discovery & Development . 10 . 3 . 264–74 . May 2007 . 17554852 .
  8. Foloppe N, Hubbard R . Towards predictive ligand design with free-energy based computational methods? . Current Medicinal Chemistry . 13 . 29 . 3583–608 . 2006 . 17168725 . 10.2174/092986706779026165 .
  9. Fenu LA, Lewis RA, Good AC, Bodkin M, Essex JW . Chapter 9: Scoring Functions: From Free-energies of Binding to Enrichment in Virtual Screening . In: Dhoti H, Leach AR (eds.) . Structure-Based Drug Discovery . 2007 . Springer . Dordrecht . 978-1-4020-4407-6 . 223–246 .
  10. Sotriffer C, Matter H . Chapter 7.3: Classes of Scoring Functions . In: Sotriffer C (ed.) . Virtual Screening: Principles, Challenges, and Practical Guidelines . 48 . 2011 . John Wiley & Sons, Inc. . 978-3-527-63334-0 .
  11. Ain QU, Aleksandrova A, Roessler FD, Ballester PJ . Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening . Wiley Interdisciplinary Reviews: Computational Molecular Science . 5 . 6 . 405–424 . 2015-11-01 . 27110292 . 4832270 . 10.1002/wcms.1225 .
  12. Genheden S, Ryde U . The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities . Expert Opinion on Drug Discovery . 10 . 5 . 449–61 . May 2015 . 25835573 . 4487606 . 10.1517/17460441.2015.1032936 .
  13. Schneider N, Lange G, Hindle S, Klein R, Rarey M . A consistent description of HYdrogen bond and DEhydration energies in protein-ligand complexes: methods behind the HYDE scoring function . Journal of Computer-Aided Molecular Design . 27 . 1 . 15–29 . January 2013 . 23269578 . 10.1007/s10822-012-9626-2 . 2013JCAMD..27...15S . 1545277 .
  14. Lange G, Lesuisse D, Deprez P, Schoot B, Loenze P, Bénard D, Marquette JP, Broto P, Sarubbi E, Mandine E . Requirements for specific binding of low affinity inhibitor fragments to the SH2 domain of (pp60)Src are identical to those for high affinity binding of full length inhibitors . Journal of Medicinal Chemistry . 46 . 24 . 5184–95 . November 2003 . 14613321 . 10.1021/jm020970s .
  15. Muegge I . PMF scoring revisited . Journal of Medicinal Chemistry . 49 . 20 . 5895–902 . October 2006 . 17004705 . 10.1021/jm050038s .
  16. Ballester PJ, Mitchell JB . A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking . Bioinformatics . 26 . 9 . 1169–75 . May 2010 . 20236947 . 3524828 . 10.1093/bioinformatics/btq112 .
  17. Li H, Leung KS, Wong MH, Ballester PJ . Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets . Molecular Informatics . 34 . 2–3 . 115–26 . February 2015 . 27490034 . 10.1002/minf.201400132 . 3444365 .
  18. Ashtawy HM, Mahapatra NR . A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction . IEEE/ACM Transactions on Computational Biology and Bioinformatics . 12 . 2 . 335–47 . 2015-04-01 . 26357221 . 10.1109/TCBB.2014.2351824 .
  19. Zhan W, Li D, Che J, Zhang L, Yang B, Hu Y, Liu T, Dong X . Integrating docking scores, interaction profiles and molecular descriptors to improve the accuracy of molecular docking: toward the discovery of novel Akt1 inhibitors . European Journal of Medicinal Chemistry . 75 . 11–20 . March 2014 . 24508830 . 10.1016/j.ejmech.2014.01.019 .
  20. Kinnings SL, Liu N, Tonge PJ, Jackson RM, Xie L, Bourne PE . A machine learning-based method to improve docking scoring functions and its application to drug repurposing . Journal of Chemical Information and Modeling . 51 . 2 . 408–19 . February 2011 . 21291174 . 3076728 . 10.1021/ci100369f .
  21. Li H, Sze K-H, Lu G, Ballester PJ . Machine-Learning Scoring Functions for Structure-Based Drug Lead Optimization . Wiley Interdisciplinary Reviews: Computational Molecular Science . 2020-02-05 . 10 . 5 . 10.1002/wcms.1465 .
  22. Li L, Wang B, Meroueh SO . Support vector regression scoring of receptor-ligand complexes for rank-ordering and virtual screening of chemical libraries . Journal of Chemical Information and Modeling . 51 . 9 . 2132–8 . September 2011 . 21728360 . 3209528 . 10.1021/ci200078f .
  23. Durrant JD, Friedman AJ, Rogers KE, McCammon JA . Comparing neural-network scoring functions and the state of the art: applications to common library screening . Journal of Chemical Information and Modeling . 53 . 7 . 1726–35 . July 2013 . 23734946 . 3735370 . 10.1021/ci400042y .
  24. Ding B, Wang J, Li N, Wang W . Characterization of small molecule binding. I. Accurate identification of strong inhibitors in virtual screening . Journal of Chemical Information and Modeling . 53 . 1 . 114–22 . January 2013 . 23259763 . 3584174 . 10.1021/ci300508m .
  25. Wójcikowski M, Ballester PJ, Siedlecki P . Performance of machine-learning scoring functions in structure-based virtual screening . Scientific Reports . 7 . 46710 . April 2017 . 28440302 . 5404222 . 10.1038/srep46710 . 2017NatSR...746710W .
  26. Ragoza M, Hochuli J, Idrobo E, Sunseri J, Koes DR . Protein-Ligand Scoring with Convolutional Neural Networks . Journal of Chemical Information and Modeling . 57 . 4 . 942–957 . April 2017 . 28368587 . 5479431 . 10.1021/acs.jcim.6b00740 . 1612.02751 .
  27. Li H, Peng J, Leung Y, Leung KS, Wong MH, Lu G, Ballester PJ . The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction . Biomolecules . 8 . 1 . 12 . March 2018 . 29538331 . 5871981 . 10.3390/biom8010012 .
  28. Imrie F, Bradley AR, Deane CM . Generating Property-Matched Decoy Molecules Using Deep Learning . Bioinformatics . btab080 . February 2021 . 37 . 2134–2141 . 33532838 . 10.1093/bioinformatics/btab080 . 8352508 .
  29. Adeshina YO, Deeds EJ, Karanicolas J . Machine learning classification can reduce false positives in structure-based virtual screening . Proceedings of the National Academy of Sciences of the United States of America . 117 . 31 . 18477–18488 . August 2020 . 32669436 . 7414157 . 10.1073/pnas.2000585117 . 2020PNAS..11718477A .
  30. Xiong GL, Ye WL, Shen C, Lu AP, Hou TJ, Cao DS . Improving structure-based virtual screening performance via learning from scoring function components . Briefings in Bioinformatics . bbaa094 . June 2020 . 22 . 32496540 . 10.1093/bib/bbaa094 .
  31. Shen C, Ding J, Wang Z, Cao D, Ding X, Hou T . From Machine Learning to Deep Learning: Advances in Scoring Functions for Protein–ligand Docking . Wiley Interdisciplinary Reviews: Computational Molecular Science . 2019-06-27 . 10 . 10.1002/wcms.1429 . 198336898 .
  32. Yang X, Wang Y, Byrne R, Schneider G, Yang S . Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery . Chemical Reviews . 119 . 18 . 10520–10594 . 2019-07-11 . 10.1021/acs.chemrev.8b00728 . 31294972 .
  33. Li H, Sze K-H, Lu G, Ballester PJ . Machine-Learning Scoring Functions for Structure-Based Virtual Screening . Wiley Interdisciplinary Reviews: Computational Molecular Science . 2020-04-22 . 11 . 10.1002/wcms.1478 . 219089637 .
  34. Ballester PJ . Selecting machine-learning scoring functions for structure-based virtual screening . Drug Discovery Today: Technologies . 32-33 . 81–87 . December 2019 . 33386098 . 10.1016/j.ddtec.2020.09.001 . 224968364 .
  35. Li H, Peng J, Sidorov P, Leung Y, Leung KS, Wong MH, Lu G, Ballester PJ . Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data . Bioinformatics . Oxford, England . 35 . 20 . 3989–3995 . March 2019 . 30873528 . 10.1093/bioinformatics/btz183 .
  36. Englebienne P, Moitessier N . Docking ligands into flexible and solvated macromolecules. 4. Are popular scoring functions accurate for this class of proteins? . Journal of Chemical Information and Modeling . 49 . 6 . 1568–80 . June 2009 . 19445499 . 10.1021/ci8004308 .
  37. Oda A, Tsuchida K, Takakura T, Yamaotsu N, Hirono S . Comparison of consensus scoring strategies for evaluating computational models of protein-ligand complexes . Journal of Chemical Information and Modeling . 46 . 1 . 380–91 . 2006 . 16426072 . 10.1021/ci050283k .
  38. Hellgren M, Carlsson J, Ostberg LJ, Staab CA, Persson B, Höög JO . Enrichment of ligands with molecular dockings and subsequent characterization for human alcohol dehydrogenase 3 . Cellular and Molecular Life Sciences . 67 . 17 . 3005–15 . September 2010 . 20405162 . 10.1007/s00018-010-0370-2 . 2391130 . 11115504 .