Bias (statistics)

Statistical bias, in the mathematical field of statistics, is a systematic tendency in which the methods used to gather data and generate statistics present an inaccurate or skewed depiction of reality. Statistical bias can arise at many stages of the data collection and analysis process, including the source of the data, the methods used to collect the data, the estimator chosen, and the methods used to analyze the data. Data analysts can take various measures at each stage of the process to reduce the impact of statistical bias in their work. Understanding the source of statistical bias can help to assess whether the observed results are close to actuality. Issues of statistical bias have been argued to be closely linked to issues of statistical validity.^[1]
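As an illustration of bias that comes from the estimator chosen, the following minimal Python sketch compares two standard variance estimators on simulated data. The function name, sample size, number of trials, and true standard deviation are all assumptions chosen for illustration and do not come from the cited sources. The estimator that divides by n underestimates the true variance on average, while the one that divides by n − 1 does not.

```python
import random
import statistics


def average_variance_estimates(n=5, trials=10_000, sigma=2.0, seed=0):
    """Average two variance estimators over many simulated samples.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    divide_by_n_total = 0.0
    divide_by_n_minus_1_total = 0.0
    for _ in range(trials):
        # Draw a small sample from a normal distribution with variance sigma**2.
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        # pvariance divides the sum of squared deviations by n.
        divide_by_n_total += statistics.pvariance(sample)
        # variance divides the sum of squared deviations by n - 1.
        divide_by_n_minus_1_total += statistics.variance(sample)
    return divide_by_n_total / trials, divide_by_n_minus_1_total / trials


biased_avg, unbiased_avg = average_variance_estimates()
true_variance = 2.0 ** 2

# In expectation the divide-by-n estimator equals (n - 1) / n * true_variance,
# so its average should fall below the true variance of 4.0.
print(f"true variance:          {true_variance:.2f}")
print(f"divide by n (avg):      {biased_avg:.2f}")
print(f"divide by n - 1 (avg):  {unbiased_avg:.2f}")
```

The gap between the two averages does not come from noise in any single sample. It persists however many trials are run, which is what makes it a systematic tendency rather than a random fluctuation.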

Statistical bias can have significant real-world implications, as data is used to inform decision making across a wide variety of processes in society. Data is used to inform lawmaking, industry regulation, corporate marketing and distribution tactics, and institutional policies in organizations and workplaces. Therefore, there can be significant implications if statistical bias is not accounted for and controlled. For example, if a pharmaceutical company wishes to explore the effect of a medication on the common cold but the data sample only includes men, any conclusions drawn from that data will be biased towards how the medication affects men rather than people in general. The information would therefore be incomplete and not useful for deciding whether the medication is ready for release to the general public. In this scenario, the bias can be addressed by broadening the sample. This sampling bias is only one of the ways in which data can be biased.
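The medication example can be sketched as a small simulation. Everything numeric here is a hypothetical assumption made for illustration: the effect sizes (days of symptoms avoided), the equal split between men and women, the population and sample sizes, and the function names. The sketch only shows that a men-only sample estimates the effect in men, not the effect in the whole population, and that drawing from the whole population removes the discrepancy.

```python
import random


def simulate_population(size=20_000, seed=0):
    """Build a hypothetical population of (sex, treatment_effect) records.

    The assumed average effect is 2.0 for men and 1.0 for women; these
    values are invented purely to make the bias visible.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        sex = "M" if rng.random() < 0.5 else "F"
        average_effect = 2.0 if sex == "M" else 1.0
        population.append((sex, rng.gauss(average_effect, 0.5)))
    return population


def mean_effect(records):
    """Average treatment effect over a list of (sex, effect) records."""
    return sum(effect for _, effect in records) / len(records)


population = simulate_population()
men_only = [record for record in population if record[0] == "M"]

sampler = random.Random(1)
men_only_sample = sampler.sample(men_only, 1000)  # biased sampling frame
broad_sample = sampler.sample(population, 1000)   # drawn from everyone

print(f"population mean effect:  {mean_effect(population):.2f}")
print(f"men-only sample mean:    {mean_effect(men_only_sample):.2f}")
print(f"broad sample mean:       {mean_effect(broad_sample):.2f}")
```

Increasing the size of the men-only sample would not help, because every additional participant comes from the same restricted group. Only changing who can be sampled moves the estimate toward the population value.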

Bias can be differentiated from other statistical mistakes such as inaccuracy (instrument failure or inadequacy), lack of data, or mistakes in transcription (typos). Bias implies that the data selection may have been skewed by the collection criteria. Other forms of human-based bias also emerge in data collection, such as response bias, in which participants give inaccurate responses to a question. Bias does not preclude the existence of other mistakes: a study may simultaneously have a poorly designed sample, an inaccurate measurement device, and typos in the recorded data. Ideally, all of these factors are controlled and accounted for.

It is also useful to recognize that the term “error” refers specifically to the outcome rather than the process: either an error of rejection or acceptance of the hypothesis being tested, or the phenomenon of random error.^[2] The terms “flaw” or “mistake” are recommended for procedural errors, to differentiate them from these specifically defined outcome-based terms.

[1] Cole, Nancy S. (October 1981). "Bias in testing". American Psychologist. 36 (10): 1067–1077. doi:10.1037/0003-066X.36.10.1067. ISSN 1935-990X.
[2] Popovic, Aleksandar; Huecker, Martin R. (June 23, 2023). "Study Bias". StatPearls. PMID 34662027.
