This article reviews several machine learning techniques for identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression, it is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification and algorithms such as ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not fall directly within machine learning theory are also reviewed, including neural networks, PAM (prediction analysis for microarrays), SAM (significance analysis for microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Several references that illustrate the application of these methods to cancer biology are discussed.
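To make the idea of sparse regression concrete, the following is a minimal sketch of LASSO fitted by cyclic coordinate descent on synthetic data with more features than samples. It is an illustration only, not the procedure used in the article: the function names (`soft_threshold`, `lasso_cd`, `make_toy_data`), the toy data, the choice of coordinate descent as the solver, and the penalty value `lam=0.3` are all assumptions made for this example.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Approximately minimise (1/2n)*||y - X b||^2 + lam*||b||_1
    by cyclic coordinate descent (a fixed number of sweeps, no convergence check)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b; equals y while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature carries no information
            # Correlation of feature j with the residual, with b[j]'s own
            # contribution added back in.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                b[j] = new_bj
    return b


def make_toy_data(n=30, p=60, seed=0):
    """Hypothetical data: 30 samples, 60 features, only 3 of which matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    true_b = [0.0] * p
    true_b[3], true_b[17], true_b[42] = 2.0, -3.0, 1.5
    y = [sum(X[i][j] * true_b[j] for j in range(p)) + rng.gauss(0.0, 0.1)
         for i in range(n)]
    return X, y, true_b


X, y, true_b = make_toy_data()
coef = lasso_cd(X, y, lam=0.3)
selected = [j for j, bj in enumerate(coef) if bj != 0.0]
print("nonzero coefficients:", len(selected), "of", len(coef))
print("selected feature indices:", selected)
```

The soft-thresholding step is what sets coefficients exactly to zero, which is why the L1 penalty performs feature selection. Ridge regression, by contrast, shrinks coefficients without zeroing them, and the elastic net combines both penalties. In practice the penalty weight would be chosen by cross-validation rather than fixed by hand.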






  • Article Type: Review Article