Truth discovery

Truth discovery (also known as truth finding) is the process of selecting the true value of a data item when different data sources provide conflicting information about it.

Several algorithms have been proposed to tackle this problem, ranging from simple methods such as majority voting to more complex ones that also estimate the trustworthiness of the data sources.[1]
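The difference between majority voting and trust-aware methods can be illustrated with a minimal Python sketch. It is a simplified iterative scheme written for illustration, not a specific published algorithm: the claims, the source names (`s1`, `s2`, `s3`), the smoothed accuracy estimate and the log-odds weighting are all assumptions made for this example. Source `s1` is always right, while `s2` and `s3` share one error on item `A` and each make three further independent errors.

```python
import math
from collections import defaultdict

# Toy data (hypothetical): the labels "right"/"wrong" are only there to make
# the outcome easy to read; the algorithm never inspects them.
WRONG_ITEMS = {"s1": set(), "s2": set("ACEG"), "s3": set("ADFH")}
CLAIMS = [
    (source, item, "wrong" if item in bad else "right")
    for source, bad in WRONG_ITEMS.items()
    for item in "ABCDEFGHI"
]


def weighted_vote(claims, weights):
    """For each item, pick the value with the largest total source weight."""
    scores = defaultdict(lambda: defaultdict(float))
    for source, item, value in claims:
        scores[item][value] += weights[source]
    return {item: max(vals, key=vals.get) for item, vals in scores.items()}


def majority_vote(claims):
    """Majority voting is weighted voting with every source weighted equally."""
    sources = {source for source, _, _ in claims}
    return weighted_vote(claims, {source: 1.0 for source in sources})


def iterative_truth_discovery(claims, rounds=10):
    """Alternate between choosing truths and re-estimating source weights."""
    sources = {source for source, _, _ in claims}
    weights = {source: 1.0 for source in sources}
    truths = {}
    for _ in range(rounds):
        new_truths = weighted_vote(claims, weights)
        if new_truths == truths:  # converged: no chosen value changed
            break
        truths = new_truths
        hits, totals = defaultdict(int), defaultdict(int)
        for source, item, value in claims:
            totals[source] += 1
            hits[source] += truths[item] == value
        for source in sources:
            # Smoothed accuracy keeps the log-odds finite (assumed choice).
            accuracy = (hits[source] + 1) / (totals[source] + 2)
            weights[source] = math.log(accuracy / (1 - accuracy))
    return truths, weights


truths, weights = iterative_truth_discovery(CLAIMS)
print(majority_vote(CLAIMS)["A"])  # wrong: two sources outvote s1
print(truths["A"])                 # right: s1 has earned a higher weight
```

On this toy data, majority voting follows the two sources that agree on item `A`, whereas the iterative scheme learns from the other items that `s1` is more reliable and lets its claim prevail.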

Truth discovery problems can be divided into two sub-classes: single-truth and multi-truth. In the first case, only one true value is allowed for a data item (e.g. the birthday of a person or the capital city of a country), while in the second case multiple true values are allowed (e.g. the cast of a movie or the authors of a book).[2][3]
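The two sub-classes differ in the shape of the answer, which the following Python sketch tries to make concrete. It uses naive voting only; the function names, the `min_support` threshold and the sample claims (including the spurious author "Smith") are hypothetical choices for illustration, and real multi-truth methods are considerably more elaborate.

```python
from collections import Counter


def single_truth(claimed_values):
    """Single-truth: return the one value claimed by the most sources."""
    return Counter(claimed_values).most_common(1)[0][0]


def multi_truth(claimed_sets, min_support=0.5):
    """Multi-truth: return every value claimed by at least `min_support`
    of the sources (a naive per-value vote, assumed for illustration)."""
    counts = Counter(value for claimed in claimed_sets for value in set(claimed))
    return {
        value
        for value, count in counts.items()
        if count / len(claimed_sets) >= min_support
    }


# Capital city of a country: exactly one value can be true.
capital = single_truth(["Canberra", "Canberra", "Sydney"])

# Authors of a book: several values can be true at once.
authors = multi_truth([
    {"Dong", "Srivastava"},
    {"Dong"},
    {"Dong", "Srivastava", "Smith"},  # "Smith" is a made-up spurious claim
])

print(capital)          # Canberra
print(sorted(authors))  # ['Dong', 'Srivastava']
```

In the single-truth case the sources compete for one slot, so a claim for "Sydney" is implicitly a vote against "Canberra". In the multi-truth case a source that lists only "Dong" does not necessarily deny the other authors, which is why each value is judged on its own support.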

Typically, truth discovery is the last step of a data integration pipeline, performed once the schemas of the different data sources have been unified and the records referring to the same data item have been identified.[4]

  1. Li, Yaliang; Gao, Jing; Meng, Chuishi; Li, Qi; Su, Lu; Zhao, Bo; Fan, Wei; Han, Jiawei (2016-02-25). "A Survey on Truth Discovery". ACM SIGKDD Explorations Newsletter. 17 (2): 1–16. doi:10.1145/2897350.2897352. S2CID 9060471.
  2. Wang, Xianzhi; Sheng, Quan Z.; Fang, Xiu Susie; Yao, Lina; Xu, Xiaofei; Li, Xue (2015). "An Integrated Bayesian Approach for Effective Multi-Truth Discovery". Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. Melbourne, Australia: ACM Press. pp. 493–502. doi:10.1145/2806416.2806443. hdl:2440/110033. ISBN 9781450337946. S2CID 16207808.
  3. Lin, Xueling; Chen, Lei (2018). "Domain-aware Multi-truth Discovery from Conflicting Sources". Proceedings of the VLDB Endowment. 11 (5): 635–647. doi:10.1145/3187009.3177739.
  4. Dong, Xin Luna; Srivastava, Divesh (2015-02-15). "Big Data Integration". Synthesis Lectures on Data Management. 7 (1): 1–198. doi:10.2200/S00578ED1V01Y201404DTM040. ISSN 2153-5418.