Data lake

Example of a database that can be used by a data lake (in this case structured data)

A data lake is a system or repository of data stored in its natural, raw format,[1] usually as object blobs or files. It is typically a single store that holds both raw copies of source system data, sensor data, social data, and the like,[2] and transformed data used for tasks such as reporting, visualization, advanced analytics, and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and binary data (images, audio, video).[3] It can be established on premises (within an organization's data centers) or in the cloud (using cloud services).
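The idea of keeping raw copies and transformed data in one store can be illustrated with a minimal sketch. It assumes a local directory stands in for the lake; the `raw/` and `curated/` folder names, the helper functions `land_raw` and `build_report`, and the sample payloads are all hypothetical and chosen only for illustration. Real deployments commonly use an object store or distributed file system instead.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store the payload byte-for-byte under raw/<source>/; nothing is parsed."""
    target = lake_root / "raw" / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def build_report(lake_root: Path) -> Path:
    """Derive a transformed dataset from raw sensor files, kept in the same store."""
    readings = []
    for path in sorted((lake_root / "raw" / "sensors").glob("*.json")):
        readings.append(json.loads(path.read_text(encoding="utf-8"))["temp_c"])
    summary = {"count": len(readings), "mean_temp_c": sum(readings) / len(readings)}
    out = lake_root / "curated" / "sensor_summary.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(summary), encoding="utf-8")
    return out


# A temporary directory plays the role of the lake (assumption for the demo).
lake = Path(tempfile.mkdtemp())

# Semi-structured, structured-style, and binary inputs are all landed unchanged.
land_raw(lake, "sensors", "r1.json", b'{"temp_c": 20.0}')
land_raw(lake, "sensors", "r2.json", b'{"temp_c": 22.0}')
land_raw(lake, "crm", "customers.csv", b"id,name\n1,Ada\n")
land_raw(lake, "media", "logo.png", b"\x89PNG\r\n\x1a\n")

# The transformed data sits alongside the raw copies it was derived from.
report = build_report(lake)
summary = json.loads(report.read_text(encoding="utf-8"))
```

In this sketch no schema is imposed when data is written; structure is applied only when `build_report` reads the sensor files, and the raw copies remain available for other uses.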

  1. "The growing importance of big data quality". The Data Roundtable. 21 November 2016. Retrieved 1 June 2020.
  2. "What is a data lake?". aws.amazon.com. Retrieved 12 October 2020.
  3. Campbell, Chris. "Top Five Differences between DataWarehouses and Data Lakes". Blue-Granite.com. Archived from the original on 14 March 2016.