
Abstract

Psychological measurement is at the heart of organizational research. I review recent practices in measurement development and evaluation, detailing best-practice recommendations in both areas. Throughout the article, I stress that theory and discovery should guide scale development and that statistical tools, although they play a crucial role, should be chosen to best evaluate the theoretical underpinnings of scales and to best promote discovery. I review all stages of scale development and evaluation, from construct specification and item writing to scale revision. Different statistical frameworks are considered, including classical test theory, exploratory factor analysis, confirmatory factor analysis, and item response theory, and I encourage readers to consider how best to use each of these tools so as to capitalize on its particular strengths.
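As a minimal illustration of the classical test theory framework the abstract mentions, the Python sketch below computes Cronbach's alpha, a standard internal-consistency index, for simulated Likert-type responses. The function, the simulated data, and all names are illustrative assumptions on my part; they do not come from the article itself.

    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """Cronbach's alpha for an (n_respondents, n_items) score matrix.

        Classical-test-theory internal consistency:
        alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)).
        """
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
        total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale score
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical data: 200 respondents answer a 5-item Likert scale (1-5)
    # driven by a single latent trait plus item-level noise.
    rng = np.random.default_rng(0)
    trait = rng.normal(size=(200, 1))
    noise = rng.normal(scale=0.8, size=(200, 5))
    responses = np.clip(np.rint(3 + trait + noise), 1, 5)
    print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")

Consistent with the article's argument, an index like this is a starting point rather than a verdict: a high alpha can coexist with multidimensionality, which is why the factor-analytic and item response theory tools the review covers remain necessary.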

DOI: 10.1146/annurev-orgpsych-012119-044957
