Synthetic data

Synthetic data is information that is artificially generated rather than produced by real-world events. Typically created using algorithms, synthetic data can be deployed to validate mathematical models and to train machine learning models.[1]
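
As a minimal sketch of validating a model against algorithmically generated data, the example below draws points from a known linear relationship with Gaussian noise and checks that an ordinary least-squares fit recovers the chosen parameters. The function names, parameter values, and noise model are illustrative assumptions, not taken from the cited sources.

```python
import random


def make_linear_dataset(n, slope, intercept, noise_sd, seed=0):
    """Generate n synthetic (x, y) pairs from y = slope*x + intercept + noise."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# The "ground truth" (slope 2, intercept 1) is known because we chose it,
# so the fitted values can be compared against it directly.
xs, ys = make_linear_dataset(n=1000, slope=2.0, intercept=1.0, noise_sd=0.5)
est_slope, est_intercept = fit_line(xs, ys)
print(est_slope, est_intercept)
```

Because the generating parameters are known in advance, any large gap between them and the fitted values points to a problem in the model or the fitting code rather than in the data.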

Data generated by a computer simulation can be seen as synthetic data. This encompasses most applications of physical modeling, such as music synthesizers or flight simulators. The output of such systems approximates the real thing, but is fully algorithmically generated.
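
A very small illustration of simulation output as synthetic data is a pure tone computed sample by sample, which is roughly what the simplest software synthesizer does. The sketch below is an assumption-laden toy (the function name, the 8000 Hz sample rate, and the 440 Hz pitch are chosen for illustration) and not a description of any particular synthesizer.

```python
import math


def sine_tone(frequency_hz, duration_s, sample_rate=8000, amplitude=1.0):
    """Return a list of audio samples for a sine wave of the given frequency."""
    n_samples = round(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate)
        for i in range(n_samples)
    ]


# 10 ms of a 440 Hz tone: no microphone or instrument is involved,
# every sample comes from the formula above.
samples = sine_tone(440.0, 0.01)
print(len(samples))
```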

Synthetic data is used in a variety of fields to filter out information that would otherwise compromise the confidentiality of particular aspects of the data. In many sensitive applications, datasets exist but cannot be released to the general public;[2] synthetic data sidesteps the privacy issues that arise from using real consumer information without permission or compensation.
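
One simple way to picture this is to release values sampled from a distribution fitted to the confidential records instead of the records themselves. The sketch below assumes a single numeric column and a normal distribution; the `real_ages` values are hypothetical placeholders, and the approach is only an illustration: matching summary statistics does not by itself guarantee that no information about individuals leaks.

```python
import random
import statistics

# Hypothetical placeholder records standing in for a confidential dataset.
real_ages = [23, 35, 41, 29, 52, 47, 38, 31, 60, 44]


def synthesize(values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to `values`."""
    rng = random.Random(seed)
    mu = statistics.mean(values)      # sample mean of the real column
    sigma = statistics.stdev(values)  # sample standard deviation
    return [rng.gauss(mu, sigma) for _ in range(n)]


# The synthetic column mimics the mean and spread of the original,
# but none of its values is copied from a real record.
synthetic_ages = synthesize(real_ages, n=1000)
print(statistics.mean(synthetic_ages), statistics.stdev(synthetic_ages))
```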

  1. "What is synthetic data? - Definition from WhatIs.com". SearchCIO. Retrieved 2022-09-08.
  2. Nikolenko, Sergey I. (2021). Synthetic Data for Deep Learning. Springer Optimization and Its Applications. Vol. 174. doi:10.1007/978-3-030-75178-4. ISBN 978-3-030-75177-7. S2CID 202750227.