
Abstract

Chemical engineering is being rapidly transformed by the tools of data science. On the horizon, artificial intelligence (AI) applications will affect a broad swath of our work, from the discovery and design of new molecules to operations and manufacturing, and many areas in between. The early adoption of data science and machine learning in chemical engineering, together with the first applications of AI, has produced many examples of molecular data science: the application of these tools to molecular discovery and property optimization at the atomic scale. We summarize key advances in this nascent subfield while introducing molecular data science to a broad chemical engineering readership. We introduce the field through the concept of a molecular data science life cycle and discuss relevant aspects of its five distinct phases: creation of curated data sets, molecular representations, data-driven property prediction, generation of new molecules, and feasibility and synthesizability considerations.
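As a minimal illustration of the molecular representation phase, the sketch below turns a SMILES string into a fixed-length vector of element counts, one of the simplest descriptors that can feed a data-driven property-prediction model. The tokenizer regex, the element vocabulary `ELEMENTS`, and the function names are hypothetical choices made for illustration, not the review's method; a production workflow would typically rely on a cheminformatics toolkit such as RDKit for parsing and fingerprinting. Implicit hydrogens are not counted, and elements outside the vocabulary are ignored.

```python
import re
from collections import Counter
from typing import Optional

# Simplified SMILES tokenizer (an assumption for illustration). Two-letter
# halogens are listed before single letters so "Cl" is not split into "C" + "l".
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|B|C|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|@|%\d{2}|\d)"
)

# Element symbol inside a bracket atom such as [NH4+], [13CH4], [nH] or [se];
# an optional isotope number may precede the symbol.
BRACKET_ELEMENT = re.compile(r"^\[\d*([A-Z][a-z]?|se|as|[bcnops])")

# Hypothetical fixed vocabulary that sets the order of the feature vector.
ELEMENTS = ["B", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into atom, bond, branch and ring-closure tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips characters it cannot match, so verify coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def atom_symbol(token: str) -> Optional[str]:
    """Return the element symbol of an atom token, or None for non-atom tokens."""
    if token.startswith("["):
        match = BRACKET_ELEMENT.match(token)
        symbol = match.group(1) if match else None
    elif token[0].isalpha():
        symbol = token
    else:
        return None  # bonds, branches and ring-closure digits carry no atom
    if symbol is None:
        return None
    # Aromatic atoms are written in lowercase (c, n, se, ...); map to elements.
    return symbol[0].upper() + symbol[1:]


def atom_count_vector(smiles: str) -> list[int]:
    """Count heavy atoms per element, in the order given by ELEMENTS."""
    symbols = (atom_symbol(t) for t in tokenize(smiles))
    counts = Counter(s for s in symbols if s is not None)
    return [counts[element] for element in ELEMENTS]


if __name__ == "__main__":
    # Acetic acid: two carbons and two oxygens.
    print(atom_count_vector("CC(=O)O"))
```

Running the script prints `[0, 2, 0, 2, 0, 0, 0, 0, 0, 0]` for acetic acid. Count vectors like this discard connectivity entirely, which is why graph-based and fingerprint representations are generally preferred for property prediction; the sketch only shows the basic step of mapping a string representation onto a fixed-length numerical input.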


DOI: 10.1146/annurev-chembioeng-101220-102232
2021-06-07
2024-04-18


  • Article Type: Review Article