Data preparation

Data preparation is the act of manipulating (or pre-processing) raw data, which may come from disparate data sources, into a form that can be analysed readily and accurately, e.g. for business purposes.[1]

Data preparation is the first step in data analytics projects and can include many discrete tasks, such as data loading (ingestion), data fusion, data cleaning, data augmentation, and data delivery.[2]
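
As a rough illustration of how these tasks can be chained, the following minimal sketch runs two small in-memory CSV sources through each stage using only the Python standard library. The sample data, field names, and function names (`ingest`, `fuse`, `clean`, `augment`, `deliver`) are assumptions made for this example; real projects typically use dedicated tools and far richer rules.

```python
import csv
import io

# Hypothetical raw inputs standing in for two separate data sources.
CUSTOMERS_CSV = "id,name\n1, Ada \n2,Grace\n"
ORDERS_CSV = "customer_id,amount\n1,10.50\n2,\n1,4.50\n"

def ingest(text):
    """Loading / ingestion: parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def fuse(customers, orders):
    """Data fusion: attach the customer's name to each order via the shared id."""
    names = {c["id"]: c["name"] for c in customers}
    return [{"name": names[o["customer_id"]], "amount": o["amount"]} for o in orders]

def clean(records):
    """Data cleaning: trim whitespace and drop records with a missing amount."""
    return [
        {"name": r["name"].strip(), "amount": float(r["amount"])}
        for r in records
        if r["amount"].strip()
    ]

def augment(records):
    """Data augmentation: add a derived field (here, a flag for large orders)."""
    return [{**r, "large": r["amount"] >= 10} for r in records]

def deliver(records):
    """Data delivery: serialise the prepared records as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["name", "amount", "large"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

prepared = augment(clean(fuse(ingest(CUSTOMERS_CSV), ingest(ORDERS_CSV))))
print(deliver(prepared))
```

In this sketch the order with an empty amount is dropped during cleaning, so only two of the three order records reach the delivery stage.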

The issues to be dealt with fall into two main categories:

  • systematic errors involving large numbers of data records, probably because they have come from different sources;
  • individual errors affecting small numbers of data records, probably due to errors in the original data entry.
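
The two categories tend to call for different handling, which the following minimal sketch tries to show. It assumes a hypothetical dataset in which source "B" reports weights in grams while source "A" uses kilograms (a systematic discrepancy that one rule can correct), plus a single implausible value that is flagged for review rather than corrected automatically. The records, field names, and function names are all illustrative.

```python
# Hypothetical records from two sources; source "B" reports weight in grams
# while source "A" uses kilograms (a systematic, source-wide discrepancy).
records = [
    {"id": 1, "source": "A", "weight": 2.5},
    {"id": 2, "source": "B", "weight": 1800.0},
    {"id": 3, "source": "B", "weight": 950.0},
    {"id": 4, "source": "A", "weight": -3.1},  # likely a data-entry slip
]

def fix_systematic(record):
    """Apply one conversion rule to every record from the affected source."""
    if record["source"] == "B":
        return {**record, "weight": record["weight"] / 1000.0}
    return record

def find_individual_errors(rows):
    """Return ids of isolated records that fail a simple plausibility check."""
    return [r["id"] for r in rows if r["weight"] <= 0]

normalised = [fix_systematic(r) for r in records]
suspect_ids = find_individual_errors(normalised)
print(normalised)
print(suspect_ids)
```

The systematic error is repaired in bulk because its cause is known, whereas the individual error is only reported: the correct value cannot be inferred from the data alone.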

References

  1. Friedland, David (September 7, 2016). "A Fresh Look at Data Preparation". IRI (Blog Article). IRI, The CoSort Company.
  2. Pyle, Dorian (April 5, 1999). Data Preparation for Data Mining. Morgan Kaufmann. ISBN 9781558605299 – via Google Books.