Data publishing (also data publication) is the act of releasing research data in published form for use by others. It consists of preparing data or datasets for public use so that they are available for anyone to use as they wish.
This practice is an integral part of the open science movement.
There is broad, multidisciplinary consensus on the benefits of this practice.[1][2][3]
The main goal is to elevate data to first-class research outputs.[4] A number of initiatives are underway, and while there are points of consensus, some issues remain in contention.[5]
There are several distinct ways to make research data available, including:
publishing data as supplemental material associated with a research article, typically with the data files hosted by the publisher of the article
hosting data on a publicly available website, with files available for download
hosting data in a repository that has been developed to support data publication, e.g. figshare, Dryad, Dataverse, Zenodo. A large number of general-purpose and specialty (e.g. subject-specific) data repositories exist.[6] For example, the UK Data Service enables users to deposit data collections and re-share them for research purposes.
publishing a data paper about the dataset, which may be published as a preprint, in a regular journal, or in a data journal that is dedicated to supporting data papers. The data may be hosted by the journal or hosted separately in a data repository.
Publishing data both makes it available for others to use and enables datasets to be cited in the same way as other research publication types (such as articles or books), allowing dataset producers to gain academic credit for their work.
The motivations for publishing data range from a desire to make research more accessible, to enabling the citation of datasets, to complying with research funder or publisher mandates that require open data publishing. The UK Data Service is one key organisation working with others to raise awareness of the importance of citing data correctly[7] and to help researchers do so.
Solutions to preserve privacy in data publishing have been proposed, including privacy protection algorithms, data "masking" methods, and a regional privacy-level calculation algorithm.[8]
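The general idea of masking can be illustrated with a minimal Python sketch. It is not the method of the cited work: the field names (name, age, postcode) and the rules used (replacing a direct identifier, generalising age into ten-year bands, truncating a postcode) are hypothetical assumptions chosen only to show what masking a record before publication might look like.

```python
def mask_record(record, keep_postcode_chars=3):
    """Return a masked copy of a record; field names are hypothetical."""
    masked = dict(record)  # copy so the original record is not modified

    # Replace the direct identifier entirely.
    masked["name"] = "***"

    # Generalise exact age into a ten-year band, e.g. 37 -> "30-39".
    decade = (record["age"] // 10) * 10
    masked["age"] = f"{decade}-{decade + 9}"

    # Keep only the first few characters of the postcode, masking the rest.
    postcode = record["postcode"]
    hidden = max(len(postcode) - keep_postcode_chars, 0)
    masked["postcode"] = postcode[:keep_postcode_chars] + "*" * hidden

    return masked


def mask_dataset(records):
    """Apply mask_record to every record in a dataset."""
    return [mask_record(r) for r in records]


if __name__ == "__main__":
    raw = [{"name": "Alice", "age": 37, "postcode": "SW1A 1AA", "score": 12.5}]
    print(mask_dataset(raw))
```

Non-identifying fields such as the hypothetical score are passed through unchanged, so the published data remain usable for analysis. Simple masking of this kind reduces, but does not by itself eliminate, the risk of re-identification.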