Abstract

Statistical models that involve a two-part mixture distribution are applicable in a variety of situations. Frequently, the two parts are a model for the binary response variable and a model for the outcome variable that is conditioned on the binary response. Two common examples are zero-inflated or hurdle models for count data and two-part models for semicontinuous data. Recently, there has been particular interest in the use of these models for the analysis of repeated measures of an outcome variable over time. The aim of this review is to consider motivations for the use of such models in this context and to highlight the central issues that arise with their use. We examine two-part models for semicontinuous and zero-heavy count data, and we also consider models for count data with a two-part random effects distribution.
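As a minimal sketch of the distinction between the zero-inflated and hurdle formulations mentioned above, the following Python functions evaluate the two probability mass functions for a single count. The Poisson choice, the function names, and the parameter names (`pi`, `p_pos`, `lam`) are illustrative assumptions, not notation taken from the review; the models discussed there also involve covariates and random effects, which are omitted here.

```python
import math


def poisson_pmf(y, lam):
    """Ordinary Poisson probability P(Y = y) with mean lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)


def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson (illustrative parameterization).

    With probability pi the count is a structural zero; otherwise it is
    drawn from Poisson(lam), which can itself produce zeros.
    """
    return pi * (y == 0) + (1 - pi) * poisson_pmf(y, lam)


def hurdle_pmf(y, p_pos, lam):
    """Hurdle Poisson (illustrative parameterization).

    The binary part gives P(Y > 0) = p_pos; conditional on Y > 0, the
    count follows a zero-truncated Poisson(lam), so all zeros come from
    the binary part.
    """
    if y == 0:
        return 1 - p_pos
    return p_pos * poisson_pmf(y, lam) / (1 - math.exp(-lam))
```

Under these assumptions, the zero-inflated form mixes two sources of zeros, whereas the hurdle form attributes every zero to the binary component, which is why the two can lead to different interpretations of the same data.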

/content/journals/10.1146/annurev-statistics-060116-054131
2017-03-07
2024-03-28

  • Article Type: Review Article