Apache Parquet

Initial release: 13 March 2013
Stable release: 2.9.0 (6 October 2021)[1]
Written in: Java (reference implementation)[2]
Operating system: Cross-platform
Type: Column-oriented data storage format
License: Apache License 2.0
Website: parquet.apache.org

Apache Parquet is a free and open-source, column-oriented data storage format from the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other columnar file formats in Hadoop, and is compatible with most data processing frameworks in the Hadoop ecosystem. Because values from the same column are stored together, Parquet can apply efficient compression and encoding schemes column by column, which improves performance when processing complex data in bulk.
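The toy sketch below illustrates the two ideas in the paragraph above: storing values column by column instead of row by row, and then compressing a single column with dictionary encoding followed by run-length encoding. It is not Parquet's actual file layout. Parquet does define dictionary and run-length encodings, but its real format (row groups, column chunks, pages, bit-packing and file metadata) is considerably more involved. The sample records and all function names here are hypothetical, chosen only for illustration.

```python
from typing import Any

# Hypothetical sample records in row-oriented form: one dict per record.
rows = [
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "DE", "amount": 25},
    {"id": 3, "country": "DE", "amount": 12},
    {"id": 4, "country": "FR", "amount": 7},
]


def to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Transpose row-oriented records into one list of values per column."""
    columns: dict[str, list[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def dictionary_encode(values: list[Any]) -> tuple[list[Any], list[int]]:
    """Replace each value with an index into a list of its distinct values."""
    dictionary: list[Any] = []
    positions: dict[Any, int] = {}
    indices: list[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def run_length_encode(values: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


columns = to_columns(rows)

# A query that only needs "amount" touches one column, not every record.
total = sum(columns["amount"])

# Low-cardinality columns shrink well: distinct values plus compact runs.
dictionary, indices = dictionary_encode(columns["country"])
runs = run_length_encode(indices)

print(total)                # 54
print(dictionary, indices)  # ['DE', 'FR'] [0, 0, 0, 1]
print(runs)                 # [(0, 3), (1, 1)]
```

Grouping values by column places similar data next to each other. That is why simple schemes like these can shrink a column far more than they could shrink interleaved row data.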

References
1. "Apache Parquet – Releases". Apache.org. Archived from the original on 22 February 2023. Retrieved 22 February 2023.
2. "Parquet-MR source code". GitHub. Archived from the original on 11 June 2018. Retrieved 2 July 2019.