Identifying Predictive Features in Drug Response Using Machine Learning: Opportunities and Challenges

Mathukumalli Vidyasagar

doi:10.1146/annurev-pharmtox-010814-124502

Annual Review of Pharmacology and Toxicology

Volume 55, 2015

Review Article


Affiliation: Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, Texas 75080; email: [email protected]
Vol. 55:15–34 (Volume publication date January 2015) https://doi.org/10.1146/annurev-pharmtox-010814-124502
First published as a Review in Advance on December 12, 2014
© Annual Reviews

Abstract

This article reviews several machine learning techniques for identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression, the parameter is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification and various algorithms such as ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not directly fall into machine learning theory are also reviewed, including neural networks, PAM (prediction analysis for microarrays), SAM (significance analysis of microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Representative references that apply these methods to cancer biology are discussed.

Keywords: cancer biology, EN algorithm, GSEA, k-means clustering, LASSO, machine learning, neural networks, PAM, precision medicine, prediction in pharmacology, regression, SAM, SVMs
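
To make the sparse regression setting in the abstract concrete, the following is a minimal sketch, not taken from the article, of LASSO fitted by cyclic coordinate descent. It assumes the common objective (1/(2n))·||y − Xb||² + λ·||b||₁. The function names, the synthetic data (30 samples, 50 features, of which only features 0 and 1 drive the response), and the value λ = 0.5 are all illustrative assumptions. The point is only that the ℓ1 penalty sets most coefficients exactly to zero, so the nonzero ones act as the selected features.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; return 0.0 when |value| <= threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X b, and b starts at zero
    # Mean squared value of each column, used to rescale the coordinate update.
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of feature j with the residual after adding back
            # feature j's own current contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)) / n
            new_coef = soft_threshold(rho, lam) / col_scale[j]
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new_coef
    return coef


# Hypothetical synthetic data: more features than samples, two true predictors.
rng = random.Random(0)
n_samples, n_features = 30, 50
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]

coef = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("selected features:", selected)
```

In practice λ would be chosen by cross-validation rather than fixed by hand, and the surviving coefficients are shrunk toward zero relative to the true values, which is the bias that ridge regression and the elastic net trade off differently.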




