Construct validity

Construct validity concerns how well a set of indicators represents or reflects a concept that is not directly measurable.[1][2][3] Construct validation is the accumulation of evidence to support the interpretation of what a measure reflects.[1][4][5][6] Modern validity theory defines construct validity as the overarching concern of validity research, subsuming all other types of validity evidence,[7][8] such as content validity and criterion validity.[9][10]

Construct validity is the appropriateness of inferences made on the basis of observations or measurements (often test scores), specifically whether a test can reasonably be considered to reflect the intended construct. Constructs are abstractions that researchers deliberately create to conceptualize a latent variable, which is not directly observable but is correlated with scores on a given measure. Construct validity examines the question: Does the measure behave as the theory says a measure of that construct should behave?
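One common way to probe whether a measure behaves as theory predicts is to compare its correlations with other measures. The following Python sketch is illustrative only: the data are simulated, and the names `simulate_scores`, `new_measure`, `established` and `unrelated` are hypothetical. It assumes that theory predicts a high correlation with an established measure of the same construct (often called convergent evidence) and a near-zero correlation with a measure of an unrelated construct (often called discriminant evidence).

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_scores(n=500, seed=0):
    """Simulate test scores (hypothetical data, not from any real study)."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n)]  # latent construct of interest
    other = [rng.gauss(0, 1) for _ in range(n)]   # unrelated latent construct
    # Observed scores = latent value + measurement error (assumed model).
    new_measure = [t + rng.gauss(0, 0.5) for t in target]
    established = [t + rng.gauss(0, 0.5) for t in target]
    unrelated = [o + rng.gauss(0, 0.5) for o in other]
    return new_measure, established, unrelated


new_measure, established, unrelated = simulate_scores()
convergent_r = pearson(new_measure, established)
discriminant_r = pearson(new_measure, unrelated)
print(f"convergent r   = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

A pattern of high convergent and low discriminant correlations is consistent with the theoretical expectation, but it is only one piece of evidence; construct validation accumulates such evidence rather than resting on a single statistic.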

Construct validity is essential to the perceived overall validity of a test, and it is particularly important in the social sciences, psychology, psychometrics and language studies.

Psychologists such as Samuel Messick (1998) have pushed for a unified view of construct validity "...as an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores..."[11] Messick's views were popularized in educational measurement and originated in a career spent explaining validity in the context of the testing industry. A definition more in line with foundational psychological research, supported by data-driven empirical studies that emphasize statistical and causal reasoning, was given by Borsboom et al. (2004).[12]

Key to construct validity are the theoretical ideas behind the trait under consideration, i.e. the concepts that organize how aspects of personality, intelligence, etc. are viewed.[13] Paul Meehl states that "The best construct is the one around which we can build the greatest number of inferences, in the most direct fashion."[1]

Scale purification, i.e. "the process of eliminating items from multi-item scales" (Wieland et al., 2017), can influence construct validity. A framework presented by Wieland et al. (2017) highlights that both statistical and judgmental criteria need to be taken into consideration when making scale purification decisions.[14]
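As a rough illustration of what a statistical criterion for scale purification might look like, the Python sketch below computes corrected item-total correlations (each item correlated with the sum of the remaining items) and lists items that fall below a cutoff. This is not the framework of Wieland et al.; the simulated data, the function names and the cutoff of 0.3 are all assumptions chosen for the example. A listed item is only a candidate for removal, since judgmental criteria also need to be weighed before any item is dropped.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def corrected_item_total(items):
    """Correlate each item with the summed score of all *other* items.

    `items` is a list of item-score lists, one list per item.
    """
    n_respondents = len(items[0])
    correlations = []
    for i, item in enumerate(items):
        rest = [
            sum(other[r] for j, other in enumerate(items) if j != i)
            for r in range(n_respondents)
        ]
        correlations.append(pearson(item, rest))
    return correlations


def simulate_items(n=500, seed=1):
    """Simulate a 5-item scale (hypothetical data): items 0-3 reflect one
    latent variable, item 4 is pure noise."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    items = [[value + rng.gauss(0, 1) for value in latent] for _ in range(4)]
    items.append([rng.gauss(0, 1) for _ in range(n)])
    return items


THRESHOLD = 0.3  # illustrative cutoff, an assumption rather than a fixed rule

items = simulate_items()
correlations = corrected_item_total(items)
candidates = [i for i, r in enumerate(correlations) if r < THRESHOLD]

for i, r in enumerate(correlations):
    print(f"item {i}: corrected item-total r = {r:.2f}")
print("candidates for review:", candidates)
```

The output is a list of items to review, not a list of items to delete: removing an item on statistical grounds alone can narrow the content the scale covers and thereby change the construct it measures.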

  1. ^ a b c Cronbach, L. J.; Meehl, P. E. (1955). "Construct validity in psychological tests". Psychological Bulletin. 52 (4): 281–302. doi:10.1037/h0040957. hdl:11299/184279. PMID 13245896. S2CID 5312179.
  2. ^ Cook, T. D.; Campbell, D. T. (1979). Quasi-experimentation. Boston: Houghton Mifflin.
  3. ^ Sjøberg, D. I. K.; Bergersen, G. R. (2022). "Construct validity in software engineering". IEEE Transactions on Software Engineering. 49 (3): 1374–1396. doi:10.1109/TSE.2022.3176725.
  4. ^ Kelley, Truman Lee (1927). Interpretation of educational measurements. New York: World Book.
  5. ^ Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice Hall Regents.
  6. ^ Polit, D. F.; Beck, C. T. (2012). Nursing Research: Generating and Assessing Evidence for Nursing Practice (9th ed.). Philadelphia: Wolters Kluwer Health, Lippincott Williams & Wilkins.
  7. ^ Messick, S. (1995). "Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning". American Psychologist. 50 (9): 741–749. doi:10.1037/0003-066x.50.9.741.
  8. ^ Schotte, C. K. W.; Maes, M.; Cluydts, R.; De Doncker, D.; Cosyns, P. (1997). "Construct validity of the Beck Depression Inventory in a depressive population". Journal of Affective Disorders. 46 (2): 115–125. doi:10.1016/s0165-0327(97)00094-3. PMID 9479615.
  9. ^ Guion, R. M. (1980). "On trinitarian doctrines of validity". Professional Psychology. 11 (3): 385–398. doi:10.1037/0735-7028.11.3.385.
  10. ^ Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice Hall Regents.
  11. ^ Messick, Samuel (1998). "Test validity: A matter of consequence". Social Indicators Research. 45 (1–3): 35–44. doi:10.1023/a:1006964925094. S2CID 142684085.
  12. ^ Borsboom, D.; Mellenbergh, G. J.; van Heerden, J. (2004). "The concept of validity". Psychological Review. 111 (4): 1061–1071. doi:10.1037/0033-295X.111.4.1061.
  13. ^ Pennington, Donald (2003). Essential Personality. Arnold. ISBN 978-0-340-76118-2.
  14. ^ Wieland, A.; Durach, C. F.; Kembro, J.; Treiblmaier, H. (2017). "Statistical and judgmental criteria for scale purification". Supply Chain Management. 22 (4). doi:10.1108/SCM-07-2016-0230.