
Abstract

Psychological measurement is at the heart of organizational research. I review recent practices in measurement development and evaluation, detailing best-practice recommendations in each area. Throughout the article, I stress that theory and discovery should guide scale development and that statistical tools, although they play a crucial role, should be chosen both to evaluate the theoretical underpinnings of scales and to promote discovery. I review all stages of scale development and evaluation, from construct specification and item writing to scale revision. Different statistical frameworks are considered, including classical test theory, exploratory factor analysis, confirmatory factor analysis, and item response theory, and I encourage readers to consider how best to use each of these tools so as to capitalize on its particular strengths.
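As a concrete illustration of one exploratory-factor-analysis decision the review touches on, the sketch below implements Horn's parallel analysis, a widely used factor-retention procedure: a factor is retained only when its observed eigenvalue exceeds the eigenvalues obtained from random data of the same dimensions. This is a minimal NumPy sketch under my own assumptions; the function name, defaults, and toy data are illustrative and are not taken from the article.

```python
import numpy as np

def parallel_analysis(data, n_iter=100, percentile=95, seed=0):
    """Horn's parallel analysis for deciding how many factors to retain.

    Keeps a factor only if its observed eigenvalue exceeds the chosen
    percentile of eigenvalues computed from random normal data with the
    same number of respondents (rows) and items (columns).
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed inter-item correlation matrix, descending.
    obs_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    # Reference eigenvalues from n_iter random datasets of the same shape.
    rand_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        rand = rng.standard_normal((n, p))
        rand_eigs[i] = np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False))[::-1]
    threshold = np.percentile(rand_eigs, percentile, axis=0)
    return int(np.sum(obs_eigs > threshold)), obs_eigs, threshold

# Toy check: 300 respondents, 6 items generated from 2 latent factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((300, 2))
loadings = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
items = factors @ loadings.T + 0.5 * rng.standard_normal((300, 6))
n_retained, _, _ = parallel_analysis(items)
print(n_retained)  # should typically recover the 2 simulated factors
```

Comparing observed eigenvalues against a high percentile of the random eigenvalues, rather than their mean, makes the retention rule slightly more conservative, which helps guard against over-factoring in small samples.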

