The Reliability of Clinical Diagnoses: State of the Art

Helena Chmura Kraemer

doi:10.1146/annurev-clinpsy-032813-153739

Annual Review of Clinical Psychology

Volume 10, 2014

Review Article

Free

Affiliations: Department of Psychiatry and Behavioral Sciences, Stanford University (Emerita), Palo Alto, California 94301; and Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania 15213; email: [email protected]
Vol. 10:111-130 (Volume publication date March 2014) https://doi.org/10.1146/annurev-clinpsy-032813-153739
First published as a Review in Advance on January 02, 2014
© Annual Reviews

Abstract

Reliability of clinical diagnosis is essential for good clinical decision making as well as productive clinical research. The current review emphasizes the distinction between a disorder and a diagnosis and between validity and reliability of diagnoses, and the relationships that exist between them. What is crucial is that reliable diagnoses are essential to establishing valid diagnoses. The present review discusses the theoretical background underlying the evaluation of diagnoses, possible designs of reliability studies, estimation of the reliability coefficient, the standards for assessment of reliability, and strategies for improving reliability without compromising validity.

Keywords: design, disorder, kappa, validity
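As an illustration of what estimating a reliability coefficient can involve, the sketch below computes Cohen's kappa for two sets of categorical diagnoses of the same subjects. It is a minimal example under stated assumptions, not the estimator recommended in the review: the function name `cohen_kappa` and the ratings data are hypothetical, and the review may favor other forms of kappa (for example, an intraclass kappa for test-retest designs).

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical ratings.

    Hypothetical helper for illustration only; not taken from the review.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: share of subjects given the same label both times.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each label, multiply the two marginal
    # proportions, then sum over labels.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_chance = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical diagnoses (1 = disorder present, 0 = absent) for ten subjects,
# each evaluated twice.
first_diagnosis = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
second_diagnosis = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
kappa = cohen_kappa(first_diagnosis, second_diagnosis)
print(f"kappa = {kappa:.3f}")
```

In this made-up data the two evaluations agree on 8 of 10 subjects, but about half of that agreement would be expected by chance given the marginal rates, which is why kappa is noticeably lower than the raw agreement rate.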
