Zarr (data format)

Zarr is an open standard for storing large multidimensional array data. It specifies a protocol and data format, and is designed to be "cloud ready": data is divided into subsets referred to as chunks, which allows random access to parts of an array.[1][2] Zarr can be used from many programming languages, including Python, Java, JavaScript, C++ and Julia.[3] It has been used by organisations such as Google and Microsoft to publish large datasets.[4][5]
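The way chunking enables random access can be illustrated with a little index arithmetic. The following is a minimal sketch in plain Python, not the API of any Zarr implementation: the helper names (`chunk_grid_shape`, `locate`, `chunk_key`), the array and chunk shapes, and the dot-separated key are assumptions chosen for illustration, and the actual key layout depends on the Zarr format version and its configuration.

```python
def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-s // c) for s, c in zip(shape, chunks))


def locate(index, chunks):
    """Map an element index to (chunk coordinates, offset inside that chunk)."""
    chunk_coords = tuple(i // c for i, c in zip(index, chunks))
    offset = tuple(i % c for i, c in zip(index, chunks))
    return chunk_coords, offset


def chunk_key(chunk_coords, separator="."):
    """Build an illustrative storage key from chunk coordinates."""
    return separator.join(str(c) for c in chunk_coords)


# Hypothetical 10000 x 10000 array stored as 1000 x 1000 chunks.
shape = (10000, 10000)
chunks = (1000, 1000)

grid = chunk_grid_shape(shape, chunks)
coords, offset = locate((2500, 1999), chunks)
key = chunk_key(coords)

print(grid)            # chunk grid shape
print(coords, offset)  # which chunk holds the element, and where inside it
print(key)             # key under which that chunk would be looked up
```

Under these assumptions, reading a single element only requires fetching the one chunk that contains it, rather than the whole array.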

Zarr is designed to support high-throughput distributed I/O on different storage systems, which is a common requirement in cloud computing. Multiple read operations on a Zarr array can proceed efficiently in parallel, as can multiple write operations.[6]
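Because each chunk is stored as an independent entry, operations that touch different chunks do not need to coordinate with one another. The sketch below illustrates that idea using only the Python standard library. It is not Zarr code: the in-memory dictionary `store`, the functions `read_chunk` and `write_chunk`, and the key names are hypothetical stand-ins for a real storage backend such as an object store.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "store": one independent entry per chunk of a 4 x 4 chunk grid.
store = {f"{i}.{j}": bytes([i * 4 + j]) * 8 for i in range(4) for j in range(4)}


def read_chunk(key):
    """Fetch one chunk; with a real backend this would be a single request."""
    return key, store[key]


def write_chunk(item):
    """Replace one chunk; each call touches exactly one key."""
    key, payload = item
    store[key] = payload
    return key


# Parallel reads: every worker fetches a different chunk.
keys = sorted(store)
with ThreadPoolExecutor(max_workers=4) as pool:
    fetched = dict(pool.map(read_chunk, keys))

# Parallel writes: the two updates target distinct chunks, so they do not overlap.
updates = [(k, b"\x00" * 8) for k in ("0.0", "3.3")]
with ThreadPoolExecutor(max_workers=2) as pool:
    written = list(pool.map(write_chunk, updates))

print(len(fetched), written)
```

The sketch deliberately writes to distinct chunks only; how concurrent writes to the same chunk are handled is left to the implementation and is not modelled here.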

  1. ^ "Zarr - chunked, compressed, N-dimensional arrays". zarr.dev. Retrieved 2024-09-12.
  2. ^ "Cloud-Optimized Geospatial Formats Guide: Zarr". guide.cloudnativegeo.org. Retrieved 2024-09-12.
  3. ^ "Zarr Implementations". github.com. Retrieved 2024-09-12.
  4. ^ "Google Cloud: ERA5 data". cloud.google.com. Retrieved 2024-09-12.
  5. ^ "Microsoft Planetary Computer: Reading Zarr Data". planetarycomputer.microsoft.com. Retrieved 2024-09-12.
  6. ^ "Zarr - Tutorial". zarr.readthedocs.io. Retrieved 2024-09-12.