We discuss a philosophy for defining what constitutes a virtual high-throughput screen and investigate the choices that influence decisions at each stage of the computational funnel, including an in-depth discussion of how molecular libraries are generated. We also offer advice on data storage, analysis, and visualization, drawing on extensive experience in our research group.
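The idea of a computational funnel can be made concrete with a small sketch. The sketch below is a toy illustration, not the authors' method. It assumes the library is built by combinatorially joining fragments, which is one common way to generate a virtual library. It also assumes each stage is a pass/fail filter with a fixed per-molecule cost. The fragment names, the `TOY_SCORE` values, the stage thresholds and the cost units are all hypothetical placeholders. In a real screen, the final stage would be something like a quantum-chemical calculation rather than a lookup table.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Molecule = Tuple[str, ...]  # toy stand-in: a "molecule" is an ordered chain of fragment labels

# Hypothetical building blocks; a real library would use chemically valid
# fragments and bonding rules rather than bare labels.
FRAGMENTS = ("thiophene", "benzene", "pyrrole", "furan")

# Hypothetical per-fragment contributions standing in for an expensive
# property calculation in the final stage of the funnel.
TOY_SCORE = {"thiophene": 10, "benzene": 5, "pyrrole": 8, "furan": 2}


def enumerate_library(fragments: Sequence[str], max_units: int) -> Iterator[Molecule]:
    """Yield every ordered chain of 1..max_units fragments."""
    for n in range(1, max_units + 1):
        yield from product(fragments, repeat=n)


@dataclass
class Stage:
    name: str
    cost: float                       # relative cost per molecule (arbitrary units)
    keep: Callable[[Molecule], bool]  # True if the molecule survives this stage


def run_funnel(library: Iterable[Molecule], stages: Sequence[Stage]):
    """Run stages cheapest-first; each stage only sees the previous stage's survivors."""
    survivors: List[Molecule] = list(library)
    total_cost = 0.0
    report = []
    for stage in sorted(stages, key=lambda s: s.cost):
        total_cost += stage.cost * len(survivors)  # pay for every molecule evaluated
        survivors = [m for m in survivors if stage.keep(m)]
        report.append((stage.name, len(survivors)))
    return survivors, total_cost, report


stages = [
    Stage("size filter", 1, lambda m: len(m) >= 2),
    Stage("substructure filter", 10, lambda m: "thiophene" in m),
    Stage("toy property estimate", 1000, lambda m: sum(TOY_SCORE[f] for f in m) >= 20),
]

library = list(enumerate_library(FRAGMENTS, max_units=3))
hits, cost, report = run_funnel(library, stages)
print(len(library), report, cost)
```

The library holds 84 candidates. The two cheap filters leave 44 survivors, so the expensive stage runs only on those 44. The funnel's total is therefore 44,884 units in this toy accounting. Running the expensive stage on every candidate would cost 84,000. The savings come entirely from ordering stages by cost and discarding candidates early. How much is saved in practice depends on how well the cheap filters correlate with the expensive property.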






Article Type: Review Article