Controlling for a variable

In causal models, controlling for a variable means binning (stratifying) the data according to the measured values of that variable, so that comparisons are made only among observations that share the same value. This is typically done so that the variable can no longer act as a confounder in, for example, an observational study or experiment.
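As an illustration of binning, the following minimal sketch simulates data in which a binary variable `z` influences both a binary treatment `x` and an outcome `y`. The data-generating process, the true effect of 2.0, and all function names are assumptions made up for this example; they do not come from any cited source. The sketch compares a naive difference in means with one computed inside each bin of `z` and then averaged by bin size.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Hypothetical data: z confounds the relation between x and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        # Treatment is more likely when z == 1 (assumed probabilities).
        x = 1 if rng.random() < (0.8 if z else 0.2) else 0
        # Assumed true causal effect of x on y is 2.0; z adds 3.0.
        y = 2.0 * x + 3.0 * z + rng.gauss(0, 1)
        rows.append((z, x, y))
    return rows

def naive_effect(rows):
    """Difference in mean outcome between treated and untreated rows."""
    treated = [y for _, x, y in rows if x == 1]
    control = [y for _, x, y in rows if x == 0]
    return mean(treated) - mean(control)

def stratified_effect(rows):
    """Average the within-bin differences, weighted by bin size."""
    total = len(rows)
    effect = 0.0
    for level in sorted({z for z, _, _ in rows}):
        stratum = [r for r in rows if r[0] == level]
        effect += naive_effect(stratum) * len(stratum) / total
    return effect

rows = simulate()
naive = naive_effect(rows)
adjusted = stratified_effect(rows)
print(f"naive: {naive:.2f}, controlled for z: {adjusted:.2f}")
```

Under these assumptions the naive comparison mixes the effect of `z` into the estimate, while the stratified estimate should land close to the simulated effect of 2.0.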

When estimating the effect of explanatory variables on an outcome by regression, controlled-for variables are included as inputs in order to separate their effects from those of the explanatory variables.[1]
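A regression version of the same idea can be sketched as follows. The simulated coefficients (1.5 for `x`, 2.0 for `z`) and the helper names `slope` and `residuals` are hypothetical choices for this example. Rather than fitting a multiple regression directly, the sketch relies on the Frisch-Waugh-Lovell result: the coefficient on `x` in a regression of `y` on both `x` and `z` equals the slope obtained after removing the linear influence of `z` from both `x` and `y`.

```python
import random

def slope(u, v):
    """OLS slope of v regressed on u (with an intercept)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var = sum((a - mu) ** 2 for a in u)
    return cov / var

def residuals(u, v):
    """Residuals of v after regressing it on u (with an intercept)."""
    b = slope(u, v)
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return [(vi - mv) - b * (ui - mu) for ui, vi in zip(u, v)]

# Hypothetical data: z drives both x and y; the assumed effect of x is 1.5.
rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
y = [1.5 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Regression of y on x alone: the estimate absorbs part of z's effect.
unadjusted = slope(x, y)

# Regression of y on x with z included as an input, computed by
# partialling z out of both x and y.
adjusted = slope(residuals(z, x), residuals(z, y))

print(f"without z: {unadjusted:.2f}, with z as an input: {adjusted:.2f}")
```

In this simulation the coefficient estimated without `z` is inflated, whereas including `z` as an input should bring the estimate close to the assumed value of 1.5.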

A limitation of controlling for variables is that a causal model is needed to identify the important confounders; the backdoor criterion can be used for this identification. Without such a model, a possible confounder might go unnoticed. A related problem is that controlling for a variable that is not a true confounder may turn other variables (possibly ones not taken into account) into confounders when they were not confounders before. In other cases, controlling for a non-confounding variable may lead to underestimation of the true causal effect of the explanatory variables on the outcome (for example, when controlling for a mediator or its descendant).[2][3] Counterfactual reasoning mitigates the influence of confounders without this drawback.[3]
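The underestimation that results from controlling for a mediator can be illustrated with a small simulation. The causal structure here (x affects y directly and also through a mediator m) and all coefficients are assumptions chosen for the example, giving a total effect of 0.5 + 0.7 × 1.0 = 1.2. The helper names are hypothetical.

```python
import random

def slope(u, v):
    """OLS slope of v regressed on u (with an intercept)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var = sum((a - mu) ** 2 for a in u)
    return cov / var

def residuals(u, v):
    """Residuals of v after regressing it on u (with an intercept)."""
    b = slope(u, v)
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return [(vi - mv) - b * (ui - mu) for ui, vi in zip(u, v)]

# Hypothetical structure: x -> m -> y, plus a direct path x -> y.
rng = random.Random(2)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.7 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * xi + 1.0 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

# No confounder is present, so the plain regression slope estimates
# the total effect of x on y (direct plus mediated).
total_effect = slope(x, y)

# Controlling for the mediator blocks the path through m and leaves
# only the direct part of the effect.
controlled_for_mediator = slope(residuals(m, x), residuals(m, y))

print(f"total: {total_effect:.2f}, "
      f"controlling for m: {controlled_for_mediator:.2f}")
```

Under these assumptions the unadjusted slope should be near the total effect of 1.2, while the estimate that controls for `m` should be near the direct effect of 0.5 and therefore understates the total.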

  1. Frost, Jim. "A Tribute to Regression Analysis | Minitab". Retrieved 2015-08-04.
  2. Streiner, David L (February 2016). "Control or overcontrol for covariates?". Evid Based Ment Health. 19 (1): 4–5. doi:10.1136/eb-2015-102294. PMC 10699339. PMID 26755716. S2CID 11155639.
  3. Pearl, Judea; Mackenzie, Dana (2018). The Book of Why: The New Science of Cause and Effect. London: Allen Lane. ISBN 978-0-241-24263-6.