Because the Google Ngram data set is not an unbiased sample[5] and does not include metadata,[6] there are several pitfalls in using it to study language or the popularity of terms.[7] Medical literature accounts for a large but shifting share of the corpus,[8] and the corpus does not take into account how often a given work is printed or read.
Koplenig, Alexander (April 2017). "The impact of lacking metadata for the measurement of cultural and linguistic change using the Google Ngram data sets—Reconstructing the composition of the German corpus in times of WWII". Digital Scholarship in the Humanities. 32 (1): 169–188. doi:10.1093/llc/fqv037. ISSN 2055-7671.