Abstract

In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have taken machine learning to a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity prediction, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains.
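As a minimal illustration of the emphasis on baseline comparisons and validation, the sketch below compares a random forest activity classifier trained on Morgan fingerprints against a trivial majority-class baseline under cross-validation. This is an illustrative example assuming RDKit and scikit-learn are available; the SMILES strings and activity labels are hypothetical placeholders, not code or data from the review.

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical compounds and activity labels (placeholders, not assay data)
smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC",
          "CC(C)Cc1ccc(cc1)C(C)C(=O)O", "C1CCCCC1", "O=C(O)c1ccccc1", "CCCCCC"]
labels = np.array([0, 1, 1, 0, 1, 0, 1, 0])

# Encode each molecule as a 2048-bit Morgan (ECFP4-like) fingerprint
mols = [Chem.MolFromSmiles(s) for s in smiles]
X = np.array([list(AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048))
              for m in mols])

# Compare a random forest against a majority-class baseline with 4-fold CV
rf = RandomForestClassifier(n_estimators=100, random_state=0)
baseline = DummyClassifier(strategy="most_frequent")
print("Random forest accuracy:", cross_val_score(rf, X, labels, cv=4).mean())
print("Baseline accuracy:     ", cross_val_score(baseline, X, labels, cv=4).mean())

In practice, time-based or scaffold-based splits are generally preferred over random cross-validation for estimating prospective performance.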
