Apache Avro

Developer(s)	Apache Software Foundation
Initial release	November 2, 2009
Stable release	1.11.3 / September 23, 2023
Repository	Avro Repository
Written in	Java, C, C++, C#, Perl, Python, PHP, Ruby
Type	Remote procedure call framework
License	Apache License 2.0
Website	avro.apache.org

Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON to define data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data and a wire format for communication between Hadoop nodes and between client programs and the Hadoop services. Avro uses a schema to structure the data being encoded. It has two schema languages: one intended for human editing (Avro IDL) and another, based on JSON, that is more machine-readable.^[3]
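
As a rough illustration of a JSON schema paired with a compact binary encoding, the following Python sketch hand-rolls the encoding for a two-field record. It does not use the official Avro library and covers only the `int`, `long`, and `string` types; the schema, the record name `Test`, and the helper names (`encode_long`, `encode_string`, `encode_record`) are assumptions made for this example.

```python
import json

# Hypothetical schema, written in Avro's JSON schema language.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Test",
  "fields": [
    {"name": "a", "type": "long"},
    {"name": "b", "type": "string"}
  ]
}
""")


def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zigzag, variable-length value."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes map to small codes
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its byte length (a long) followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Only the primitive types needed for this sketch.
ENCODERS = {"int": encode_long, "long": encode_long, "string": encode_string}


def encode_record(schema: dict, datum: dict) -> bytes:
    """Concatenate the field encodings in schema order.

    No field names or type tags are written, which keeps the output compact.
    """
    return b"".join(
        ENCODERS[field["type"]](datum[field["name"]])
        for field in schema["fields"]
    )


encoded = encode_record(SCHEMA, {"a": 27, "b": "foo"})
print(encoded.hex())  # 3606666f6f
```

The five output bytes carry only the values: `36` is the zigzag encoding of 27, `06` is the string length 3, and `66 6f 6f` is `foo`. A reader needs the schema to interpret them, which is why the schema always accompanies or precedes the data in practice.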

Avro is similar to Thrift and Protocol Buffers, but does not require running a code-generation program when a schema changes. Code generation remains available as an option, for example for statically typed languages.
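
To suggest why generated code is optional, the Python sketch below decodes records by interpreting a JSON schema at run time, so a changed schema is just a different JSON string passed to the same function. It is a simplified, hand-written reader rather than the official Avro library, it handles only `int`, `long`, and `string` fields, and it does not implement Avro's resolution between writer and reader schemas; the schemas and helper names are assumptions for illustration.

```python
import io
import json


def read_long(buf) -> int:
    """Read a zigzag, variable-length encoded integer from a byte stream."""
    result, shift = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("unexpected end of data")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: last byte
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)  # undo zigzag


def read_string(buf) -> str:
    """Read a length-prefixed UTF-8 string."""
    length = read_long(buf)
    return buf.read(length).decode("utf-8")


# Only the primitive types needed for this sketch.
READERS = {"int": read_long, "long": read_long, "string": read_string}


def read_record(schema_json: str, payload: bytes) -> dict:
    """Decode one record by walking the schema's fields in order."""
    schema = json.loads(schema_json)
    buf = io.BytesIO(payload)
    return {f["name"]: READERS[f["type"]](buf) for f in schema["fields"]}


# Hypothetical schema, and a later version of it with one added field.
SCHEMA_V1 = """{"type": "record", "name": "Test", "fields": [
    {"name": "a", "type": "long"}, {"name": "b", "type": "string"}]}"""
SCHEMA_V2 = """{"type": "record", "name": "Test", "fields": [
    {"name": "a", "type": "long"}, {"name": "b", "type": "string"},
    {"name": "c", "type": "long"}]}"""

record_v1 = read_record(SCHEMA_V1, bytes.fromhex("3606666f6f"))
record_v2 = read_record(SCHEMA_V2, bytes.fromhex("3606666f6f02"))
print(record_v1)  # {'a': 27, 'b': 'foo'}
print(record_v2)  # {'a': 27, 'b': 'foo', 'c': 1}
```

Nothing was regenerated or recompiled between the two calls; only the schema text differs. A statically typed language could still generate classes from the same schemas to gain compile-time checking.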

Apache Spark SQL can access Avro as a data source.^[4]

[1] "Apache Avro: a New Format for Data Interchange". blog.cloudera.com. Retrieved March 10, 2019.
[2] "Apache Avro Releases". avro.apache.org. Retrieved September 23, 2023.
[3] Kleppmann, Martin (2017). Designing Data-Intensive Applications (First ed.). O'Reilly. p. 122.
[4] "3 Reasons Why In-Hadoop Analytics are a Big Deal - Dataconomy". dataconomy.com. April 21, 2016.