Causal graph

In statistics, econometrics, epidemiology, genetics and related disciplines, causal graphs (also known as path diagrams, causal Bayesian networks, or DAGs, short for directed acyclic graphs) are probabilistic graphical models used to encode assumptions about the data-generating process.
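As a minimal sketch, assume binary variables and a hypothetical three-node graph in which Z affects both X and Y, and X affects Y. The Python below stores the graph as a map from each node to its parents, checks that the graph is acyclic, and evaluates the joint distribution with the Markov factorization that causal Bayesian networks assume, P(v_1, ..., v_n) = prod_i P(v_i | pa_i). The names PARENTS, CPT, topological_order and joint_probability, and all probability values, are illustrative and do not come from any library.

```python
from itertools import product

# Hypothetical three-node causal DAG: Z confounds X and Y, and X causes Y.
# Each node maps to the tuple of its parents.
PARENTS = {"Z": (), "X": ("Z",), "Y": ("Z", "X")}

# Illustrative conditional probability tables for binary variables:
# CPT[node][parent_values] = P(node = 1 | parents = parent_values).
CPT = {
    "Z": {(): 0.4},
    "X": {(0,): 0.2, (1,): 0.7},
    "Y": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.8},
}


def topological_order(parents):
    """Return nodes so that parents precede children; raise ValueError if the graph has a cycle."""
    order, visiting, done = [], set(), set()

    def visit(node):
        if node in done:
            return
        if node in visiting:
            raise ValueError(f"cycle through {node!r}: not a DAG")
        visiting.add(node)
        for parent in parents[node]:
            visit(parent)
        visiting.remove(node)
        done.add(node)
        order.append(node)

    for node in parents:
        visit(node)
    return order


def joint_probability(assignment, parents=PARENTS, cpt=CPT):
    """Markov factorization: multiply P(node | its parents) over all nodes."""
    prob = 1.0
    for node in topological_order(parents):
        parent_values = tuple(assignment[p] for p in parents[node])
        p_one = cpt[node][parent_values]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


if __name__ == "__main__":
    nodes = topological_order(PARENTS)
    # Summing the factorized joint over all 2**3 assignments should give 1.
    total = sum(
        joint_probability(dict(zip(nodes, values)))
        for values in product((0, 1), repeat=len(nodes))
    )
    print(nodes, round(total, 10))
```

Running it prints ['Z', 'X', 'Y'] 1.0: the graph is acyclic, and the factorized terms sum to one over the eight possible assignments. Any assumption the graph encodes (here, which arrows are absent) shows up directly as which parents each table is allowed to depend on.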

Causal graphs can be used for communication and for inference. They are complementary to other forms of causal reasoning, for instance using causal equality notation. As communication devices, the graphs provide a formal and transparent representation of the causal assumptions that researchers may wish to convey and defend. As inference tools, the graphs enable researchers to estimate effect sizes from non-experimental data,[1][2][3][4][5] derive testable implications of the assumptions encoded,[1][6][7][8] test for external validity,[9] and manage missing data[10] and selection bias.[11]
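To illustrate estimating an effect from non-experimental data, the sketch below applies Pearl's back-door adjustment, one standard identification result that a graph can license, to a hypothetical confounded graph (Z → X, Z → Y, X → Y). Because {Z} blocks the only back-door path from X to Y, the graph implies P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) P(Z=z). The data-generating probabilities and the function names simulate, naive_contrast and backdoor_contrast are assumptions made for this example.

```python
import random
from collections import Counter


def simulate(n, seed=0):
    """Draw n observational samples (z, x, y) from the hypothetical DAG Z -> X -> Y, Z -> Y."""
    rng = random.Random(seed)
    p_x = {0: 0.2, 1: 0.7}                                      # P(X=1 | Z=z)
    p_y = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.8}  # P(Y=1 | Z=z, X=x)
    samples = []
    for _ in range(n):
        z = int(rng.random() < 0.4)
        x = int(rng.random() < p_x[z])
        y = int(rng.random() < p_y[(z, x)])
        samples.append((z, x, y))
    return samples


def naive_contrast(samples):
    """P(Y=1 | X=1) - P(Y=1 | X=0): plain conditioning, biased by the confounder Z."""
    n_x = Counter(x for z, x, y in samples)
    hits = Counter(x for z, x, y in samples if y == 1)
    return hits[1] / n_x[1] - hits[0] / n_x[0]


def backdoor_contrast(samples):
    """Sum over z of [P(Y=1 | X=1, Z=z) - P(Y=1 | X=0, Z=z)] * P(Z=z), adjusting for {Z}."""
    total = len(samples)
    n_z = Counter(z for z, x, y in samples)
    n_zx = Counter((z, x) for z, x, y in samples)
    hits = Counter((z, x) for z, x, y in samples if y == 1)
    effect = 0.0
    for z in (0, 1):
        p_treated = hits[(z, 1)] / n_zx[(z, 1)]
        p_control = hits[(z, 0)] / n_zx[(z, 0)]
        effect += (p_treated - p_control) * n_z[z] / total
    return effect
```

With the assumed probabilities, the naive contrast converges to 0.71 − 0.14 = 0.57, while the adjusted contrast converges to the interventional difference 0.62 − 0.18 = 0.44. On a large sample such as simulate(200_000), the two estimates should therefore differ by roughly 0.13; the gap is the confounding bias that the graph lets the analyst remove without running an experiment.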

Causal graphs were first used by the geneticist Sewall Wright[12] under the rubric "path diagrams". They were later adopted by social scientists[13][14][15][16][17] and, to a lesser extent, by economists.[18] These models were initially confined to linear equations with fixed parameters. Modern developments have extended graphical models to non-parametric analysis, thereby achieving a generality and flexibility that have transformed causal analysis in computer science, epidemiology,[19] and social science.[20] Recent advances include the development of large-scale causality graphs, such as CauseNet, which compiles over 11 million causal relations extracted from web sources to support causal question answering and reasoning.[21]
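Wright's path diagrams were linear models with fixed coefficients, and their central tool was the tracing rule: for standardized variables, the correlation between two nodes equals the sum, over the admissible paths connecting them (those that do not pass head-to-head through a collider), of the product of the path coefficients along each path. The sketch below checks this by simulation for a hypothetical three-variable model with coefficients a (Z → X), b (X → Y) and c (Z → Y); the coefficient values and function names are illustrative assumptions.

```python
import math
import random


def simulate_standardized(n, a=0.5, b=0.3, c=0.4, seed=1):
    """Simulate the linear path model Z -> X (a), X -> Y (b), Z -> Y (c) with unit-variance variables."""
    rng = random.Random(seed)
    sd_ex = math.sqrt(1 - a * a)                            # keeps Var(X) = 1
    sd_ey = math.sqrt(1 - (b * b + c * c + 2 * a * b * c))  # keeps Var(Y) = 1
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = a * z + rng.gauss(0, sd_ex)
        y = b * x + c * z + rng.gauss(0, sd_ey)
        rows.append((z, x, y))
    return rows


def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)


def traced_correlations(a=0.5, b=0.3, c=0.4):
    """Tracing rule: each correlation is the sum of coefficient products along connecting paths."""
    return {
        "ZX": a,          # Z -> X
        "XY": b + a * c,  # X -> Y, plus X <- Z -> Y
        "ZY": c + a * b,  # Z -> Y, plus Z -> X -> Y
    }
```

With a = 0.5, b = 0.3 and c = 0.4, the rule predicts corr(X, Y) = b + ac = 0.5 and corr(Z, Y) = c + ab = 0.55, and the sample correlations from simulate_standardized(100_000) should come out close to those values. Read in reverse, these equations are what let early path analysts solve for the coefficients from observed correlations, which is why the approach was tied to linear, fixed-parameter models.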

  1. Pearl, Judea (2000). Causality. Cambridge: Cambridge University Press. ISBN 9780521773621.
  2. Tian, Jin; Pearl, Judea (2002). "A general identification condition for causal effects". Proceedings of the Eighteenth National Conference on Artificial Intelligence. ISBN 978-0-262-51129-2.
  3. Shpitser, Ilya; Pearl, Judea (2008). "Complete Identification Methods for the Causal Hierarchy". Journal of Machine Learning Research. 9: 1941–1979.
  4. Huang, Y.; Valtorta, M. (2006). Identifiability in Causal Bayesian Networks: A Sound and Complete Algorithm.
  5. Bareinboim, Elias; Pearl, Judea (2012). "Causal Inference by Surrogate Experiments: z-Identifiability". Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence. arXiv:1210.4842. Bibcode:2012arXiv1210.4842B. ISBN 978-0-9749039-8-9.
  6. Tian, Jin; Pearl, Judea (2002). "On the Testable Implications of Causal Models with Hidden Variables". Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence. pp. 519–527. arXiv:1301.0608. Bibcode:2013arXiv1301.0608T. ISBN 978-1-55860-897-9.
  7. Shpitser, Ilya; Pearl, Judea (2008). "Complete Identification Methods for the Causal Hierarchy". Journal of Machine Learning Research. 9 (64): 1941–1979. ISSN 1533-7928. Retrieved 2024-08-11.
  8. Chen, Bryant; Pearl, Judea (2014). "Testable Implications of Linear Structural Equation Models". Proceedings of the AAAI Conference on Artificial Intelligence. 28. doi:10.1609/aaai.v28i1.9065. S2CID 1612893.
  9. Bareinboim, Elias; Pearl, Judea (2014). "External Validity: From do-calculus to Transportability across Populations". Statistical Science. 29 (4): 579–595. arXiv:1503.01603. doi:10.1214/14-sts486. S2CID 5586184.
  10. Mohan, Karthika; Pearl, Judea; Tian, Jin (2013). "Graphical Models for Inference with Missing Data". Advances in Neural Information Processing Systems.
  11. Bareinboim, Elias; Tian, Jin; Pearl, Judea (2014). "Recovering from Selection Bias in Causal and Statistical Inference". Proceedings of the AAAI Conference on Artificial Intelligence. 28. doi:10.1609/aaai.v28i1.9074.
  12. Wright, S. (1921). "Correlation and causation". Journal of Agricultural Research. 20: 557–585.
  13. Blalock, H. M. (1960). "Correlational analysis and causal inferences". American Anthropologist. 62 (4): 624–631. doi:10.1525/aa.1960.62.4.02a00060.
  14. Duncan, O. D. (1966). "Path analysis: Sociological examples". American Journal of Sociology. 72: 1–16. doi:10.1086/224256. S2CID 59428866.
  15. Duncan, O. D. (1976). "Introduction to structural equation models". American Journal of Sociology. 82 (3): 731–733. doi:10.1086/226377.
  16. Jöreskog, K. G. (1969). "A general approach to confirmatory maximum likelihood factor analysis". Psychometrika. 34 (2): 183–202. doi:10.1007/bf02289343. S2CID 186236320.
  17. Goldberger, A. S. (1972). "Structural equation models in the social sciences". Econometrica. 40 (6): 979–1001. doi:10.2307/1913851. JSTOR 1913851.
  18. White, Halbert; Chalak, Karim; Lu, Xun (2011). "Linking Granger Causality and the Pearl Causal Model with Settable Systems". Causality in Time Series: Challenges in Machine Learning. 5.
  19. Rothman, Kenneth J.; Greenland, Sander; Lash, Timothy (2008). Modern Epidemiology. Lippincott Williams & Wilkins. ISBN 978-0-7817-5564-1.
  20. Morgan, S. L.; Winship, C. (2007). Counterfactuals and Causal Inference: Methods and Principles for Social Research. New York: Cambridge University Press. doi:10.1017/cbo9781107587991. ISBN 978-1-107-06507-9.
  21. Heindorf, Stefan; Scholten, Yan; Wachsmuth, Henning; Ngonga Ngomo, Axel-Cyrille; Potthast, Martin (2020). "CauseNet: Towards a Causality Graph Extracted from the Web". Proceedings of the 29th ACM International Conference on Information & Knowledge Management. CIKM. ACM.