Developments in biotechnology have enabled the design of whole-genome assays for nucleic acid sequencing, gene expression monitoring, gene copy number evaluation, and epigenetic silencing that have had major effects on biology, cancer drug development, and clinical trial design. Because cancer is a disease of DNA alteration, these developments have had a particularly important effect on the development of personalized oncology. Facilitating this transition has been the development of statistical methods for transforming these high-dimensional data into useful biological information, classification methods, and new designs for clinical trials. In this article we review some of the key statistical developments in this area.




  • Article Type: Review Article