Construct validity concerns how well a set of indicators represents or reflects a concept that is not directly measurable.[1][2][3] Construct validation is the accumulation of evidence to support the interpretation of what a measure reflects.[1][4][5][6] Modern validity theory defines construct validity as the overarching concern of validity research, subsuming all other types of validity evidence[7][8] such as content validity and criterion validity.[9][10]
Construct validity is the appropriateness of inferences made on the basis of observations or measurements (often test scores), specifically whether a test can reasonably be considered to reflect the intended construct. Constructs are abstractions deliberately created by researchers to conceptualize a latent variable, which is not directly observable but is correlated with scores on a given measure. Construct validity examines the question: Does the measure behave as the theory says a measure of that construct should behave?
Construct validity is essential to the perceived overall validity of the test. Construct validity is particularly important in the social sciences, psychology, psychometrics and language studies.
Psychologists such as Samuel Messick (1998) have pushed for a unified view of construct validity "...as an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores..."[11] Messick's views are popularized in educational measurement and originated in a career around explaining validity in the context of the testing industry. A definition more in line with foundational psychological research, supported by data-driven empirical studies that emphasize statistical and causal reasoning, was given by Borsboom et al. (2004).[12]
Key to construct validity are the theoretical ideas behind the trait under consideration, i.e., the concepts that organize how aspects of personality, intelligence, etc. are viewed.[13] Paul Meehl states that "The best construct is the one around which we can build the greatest number of inferences, in the most direct fashion."[1]
Scale purification, i.e. "the process of eliminating items from multi-item scales" (Wieland et al., 2017), can influence construct validity. A framework presented by Wieland et al. (2017) highlights that both statistical and judgmental criteria need to be taken into consideration when making scale purification decisions.[14]
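The statistical side of a purification decision can be illustrated with a minimal Python sketch. The cited framework is not quoted here as prescribing any particular statistic, so the two diagnostics used (the corrected item-rest correlation and Cronbach's alpha if an item is deleted) are assumed for illustration as commonly used choices. The response data and the function names are hypothetical.

```python
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of item columns (one list per item)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def purification_report(items):
    """For each item, correlate it with the sum of the remaining items
    and recompute alpha with that item removed."""
    report = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_total = [sum(row) for row in zip(*rest)]
        report.append({
            "item": i,
            "item_rest_r": pearson(col, rest_total),
            "alpha_if_deleted": cronbach_alpha(rest),
        })
    return report

# Hypothetical responses: 6 respondents, 4 items on a 1-5 scale.
# The last item is deliberately unrelated to the other three.
items = [
    [1, 2, 3, 4, 5, 4],
    [2, 2, 3, 5, 5, 4],
    [1, 3, 3, 4, 4, 5],
    [3, 5, 1, 2, 4, 2],
]

print(f"alpha (all items): {cronbach_alpha(items):.3f}")
for row in purification_report(items):
    print(f"item {row['item']}: item-rest r = {row['item_rest_r']:.3f}, "
          f"alpha if deleted = {row['alpha_if_deleted']:.3f}")
```

In this toy data the last item correlates negatively with the sum of the others, and alpha rises when it is removed, so it is flagged as a statistical candidate for deletion. Whether it should actually be dropped remains a judgmental question, for example whether removing it narrows the content the scale is meant to cover.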