Data orientation

Data orientation refers to how tabular data is laid out in a linear memory model, whether on disk or in memory. The two most common representations are column-oriented (columnar format) and row-oriented (row format).[1][2]
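
As a minimal illustration of the two layouts, the following Python sketch flattens a small table into a linear sequence in each orientation. The table contents and the function names `row_oriented` and `column_oriented` are hypothetical, chosen only for this example; real formats add encoding, compression, and metadata on top of this basic ordering.

```python
# Hypothetical 3-row table with columns (id, name, price).
rows = [
    (1, "apple", 0.5),
    (2, "banana", 0.25),
    (3, "cherry", 4.0),
]


def row_oriented(table):
    """Flatten record by record: all fields of row 0, then row 1, and so on."""
    return [value for record in table for value in record]


def column_oriented(table):
    """Flatten column by column: all ids, then all names, then all prices."""
    return [value for column in zip(*table) for value in column]


row_layout = row_oriented(rows)
column_layout = column_oriented(rows)

print(row_layout)     # fields of each record are adjacent
print(column_layout)  # values of each column are adjacent
```

In the row-oriented sequence the three fields of a record sit next to each other, while in the column-oriented sequence all values of one column sit next to each other.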

The choice of data orientation is an architectural decision that involves trade-offs in databases, query engines, and numerical simulations.[1] As a result of these trade-offs, row-oriented formats are more commonly used in online transaction processing (OLTP) and column-oriented formats are more commonly used in online analytical processing (OLAP).[2]
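
One commonly cited aspect of this trade-off is locality of access. The Python sketch below computes which linear positions are touched by a whole-record lookup (typical of OLTP) and by a single-column scan (typical of OLAP). It assumes fixed-width values and a table of 4 rows by 3 columns; the helper names `row_layout_index` and `column_layout_index` are hypothetical, and the sketch ignores compression, paging, and caching effects that matter in real systems.

```python
def row_layout_index(row, col, n_rows, n_cols):
    """Linear position of cell (row, col) when records are stored one after another."""
    return row * n_cols + col


def column_layout_index(row, col, n_rows, n_cols):
    """Linear position of cell (row, col) when columns are stored one after another."""
    return col * n_rows + row


n_rows, n_cols = 4, 3  # assumed table shape for this sketch

# OLTP-style access: read every field of record 2.
record_in_row_layout = [row_layout_index(2, c, n_rows, n_cols) for c in range(n_cols)]
record_in_col_layout = [column_layout_index(2, c, n_rows, n_cols) for c in range(n_cols)]

# OLAP-style access: scan column 1 across all records.
scan_in_row_layout = [row_layout_index(r, 1, n_rows, n_cols) for r in range(n_rows)]
scan_in_col_layout = [column_layout_index(r, 1, n_rows, n_cols) for r in range(n_rows)]

print(record_in_row_layout, record_in_col_layout)
print(scan_in_row_layout, scan_in_col_layout)
```

Under these assumptions, the record lookup touches adjacent positions only in the row layout, and the column scan touches adjacent positions only in the column layout.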

Examples of column-oriented formats include Apache ORC,[3] Apache Parquet,[4] Apache Arrow,[5] and the formats used by BigQuery, Amazon Redshift, and Snowflake. Prominent examples of row-oriented formats include CSV, the formats used in most relational databases, the in-memory format of Apache Spark, and Apache Avro.[6]

  1. ^ a b Abadi, Daniel J.; Madden, Samuel R.; Hachem, Nabil (2008). "Column-stores vs. Row-stores: How different are they really?". Proceedings of the 2008 ACM SIGMOD international conference on Management of data. pp. 967–980. doi:10.1145/1376616.1376712. ISBN 978-1-60558-102-6.
  2. ^ a b Funke, Florian; Kemper, Alfons; Neumann, Thomas (2012). "Compacting Transactional Data in Hybrid OLTP&OLAP Databases". Proceedings of the VLDB Endowment. 5 (11): 1424–1435. doi:10.14778/2350229.2350258.
  3. ^ "Apache ORC". Retrieved 2024-05-21.
  4. ^ "Apache Parquet". Retrieved 2024-05-21.
  5. ^ "Apache Arrow". Retrieved 2024-05-21.
  6. ^ "Apache Avro". Retrieved 2024-05-21.