Apache Spark

Original author(s): Matei Zaharia
Developer(s): Apache Spark
Initial release: May 26, 2014
Stable release: 3.5.2 (Scala 2.13) / August 10, 2024
Repository: Spark Repository
Written in: Scala[1]
Operating system: Microsoft Windows, macOS, Linux
Available in: Scala, Java, SQL, Python, R, C#, F#
Type: Data analytics, machine learning algorithms
License: Apache License 2.0
Website: spark.apache.org

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.

  1. "Spark Release 2.0.0". "MLlib in R: SparkR now offers MLlib APIs [..] Python: PySpark now offers many more MLlib algorithms"