Quasi-identifiers are pieces of information that are not of themselves unique identifiers, but are sufficiently well correlated with an entity that they can be combined with other quasi-identifiers to create a unique identifier.[1]
The term was introduced by Tore Dalenius in 1986.[3] Since then, quasi-identifiers have been the basis of several attacks on released data. For instance, Sweeney linked health records to publicly available information to locate the then-governor of Massachusetts' hospital records using uniquely identifying quasi-identifiers,[4][5] and Sweeney, Abu and Winn used public voter records to re-identify participants in the Personal Genome Project.[6] Additionally, Arvind Narayanan and Vitaly Shmatikov discussed on quasi-identifiers to indicate statistical conditions for de-anonymizing data released by Netflix.[7]
Motwani and Ying warn about potential privacy breaches being enabled by publication of large volumes of government and business data containing quasi-identifiers.[8]
^Barth-Jones, Daniel C. The're-identification'of Governor William Weld's medical information: a critical re-examination of health data identification risks and privacy protections, then and now. Then and Now (June 4, 2012) (2012).
^ Sweeney, Latanya, Akua Abu, and Julia Winn. "Identifying participants in the personal genome project by name." Available at SSRN 2257732 (2013).