Apache Flink

Developer(s)	Apache Software Foundation
Initial release	May 2011
Stable release	1.20.0 / 1 August 2024
Repository	github.com/apache/flink
Written in	Java and Scala
Operating system	Cross-platform
Type	Data analytics; machine learning algorithms
License	Apache License 2.0
Website	flink.apache.org

Apache Flink is an open-source, unified stream-processing and batch-processing framework developed by the Apache Software Foundation. The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala.^[3]^[4] Flink executes arbitrary dataflow programs in a data-parallel and pipelined (hence task parallel) manner.^[5] Flink's pipelined runtime system enables the execution of bulk/batch and stream processing programs.^[6]^[7] Furthermore, Flink's runtime supports the execution of iterative algorithms natively.^[8]
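The idea of a pipelined, data-parallel dataflow can be illustrated with a small sketch in plain Python. This is not Flink's API: the names `source`, `flat_map_words`, and `run_word_count` are hypothetical, the "subtasks" are simulated sequentially in a single process, and CRC32 is assumed as the partitioning hash only because it is deterministic. Each record flows through the whole operator chain before the next one is read (pipelining), and records with the same key are always routed to the same subtask (data parallelism).

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def source(lines: Iterable[str]) -> Iterator[str]:
    """Emit records one at a time (stand-in for a streaming source)."""
    for line in lines:
        yield line


def flat_map_words(records: Iterator[str]) -> Iterator[Tuple[str, int]]:
    """Pipelined operator: consumes and emits records lazily."""
    for line in records:
        for word in line.lower().split():
            yield (word, 1)


def run_word_count(lines: Iterable[str], parallelism: int = 2) -> List[Dict[str, int]]:
    """Route each (word, 1) record to one of `parallelism` simulated subtasks.

    Every subtask keeps its own running counts, so no two subtasks
    ever hold state for the same key.
    """
    subtask_state = [defaultdict(int) for _ in range(parallelism)]
    for word, count in flat_map_words(source(lines)):
        # Deterministic hash partitioning by key (an assumption of this sketch).
        index = zlib.crc32(word.encode("utf-8")) % parallelism
        subtask_state[index][word] += count
    return [dict(state) for state in subtask_state]


if __name__ == "__main__":
    for i, counts in enumerate(run_word_count(["to be or not to be", "be quick"], 3)):
        print(f"subtask {i}: {counts}")
```

Because the operators are generators, no intermediate result is fully materialized between stages. A real engine would additionally run the subtasks concurrently on different machines and exchange records over the network.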

Flink provides a high-throughput, low-latency streaming engine^[9] as well as support for event-time processing and state management. Flink applications are fault-tolerant in the event of machine failure and support exactly-once semantics.^[10] Programs can be written in Java, Scala,^[11] Python,^[12] and SQL^[13] and are automatically compiled and optimized^[14] into dataflow programs that are executed in a cluster or cloud environment.^[15]
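Event-time processing with managed state can be sketched in a few lines of plain Python. The code below is a conceptual illustration under simplifying assumptions, not Flink's API: the function name `tumbling_window_counts` and its parameters are hypothetical, the watermark is taken to be the largest timestamp seen so far minus a fixed out-of-orderness bound, late records are simply dropped, and fault tolerance (checkpointing, exactly-once recovery) is not modeled at all.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]],
    window_size: int,
    max_out_of_orderness: int,
) -> List[Tuple[int, str, int]]:
    """Count events per key in tumbling event-time windows.

    `events` yields (event_time, key) pairs in arrival order, which may
    differ from event-time order. Returns (window_start, key, count)
    tuples in the order the windows fire.
    """
    state: Dict[Tuple[int, str], int] = defaultdict(int)  # per-window, per-key state
    emitted: List[Tuple[int, str, int]] = []
    max_ts = float("-inf")
    watermark = float("-inf")

    for ts, key in events:
        window_start = ts - ts % window_size
        if window_start + window_size <= watermark:
            continue  # window already fired: the record is late and dropped
        state[(window_start, key)] += 1

        # Advance the watermark: no earlier timestamps are expected from here on.
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_out_of_orderness

        # Fire every window whose end is at or before the watermark.
        for k in sorted(k for k in state if k[0] + window_size <= watermark):
            emitted.append((k[0], k[1], state.pop(k)))

    # End of input: flush the windows that are still open.
    for k in sorted(state):
        emitted.append((k[0], k[1], state.pop(k)))
    return emitted


if __name__ == "__main__":
    arrivals = [(1, "a"), (4, "a"), (12, "a"), (8, "a"), (15, "b"), (3, "a"), (21, "a")]
    print(tumbling_window_counts(arrivals, window_size=10, max_out_of_orderness=3))
    # [(0, 'a', 3), (10, 'a', 1), (10, 'b', 1), (20, 'a', 1)]
```

In the sample input, the record with timestamp 8 arrives after the one with timestamp 12 but is still counted in window [0, 10), because the watermark has not yet passed that window's end. The record with timestamp 3 arrives after the window has fired and is discarded.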

Flink does not provide its own data-storage system; instead, it offers data-source and sink connectors to systems such as Apache Doris, Amazon Kinesis, Apache Kafka, HDFS, Apache Cassandra, and Elasticsearch.^[16]

[1] "Release 1.20.0". 1 August 2024. Retrieved 20 August 2024.

[2] "All stable Flink releases". flink.apache.org. Apache Software Foundation. Retrieved 2021-12-20.

[3] "Apache Flink: Scalable Batch and Stream Data Processing". apache.org.

[4] "apache/flink". GitHub. 29 January 2022.

[5] Alexander Alexandrov, Rico Bergmann, Stephan Ewen, Johann-Christoph Freytag, Fabian Hueske, Arvid Heise, Odej Kao, Marcus Leich, Ulf Leser, Volker Markl, Felix Naumann, Mathias Peters, Astrid Rheinländer, Matthias J. Sax, Sebastian Schelter, Mareike Höger, Kostas Tzoumas, and Daniel Warneke. 2014. The Stratosphere platform for big data analytics. The VLDB Journal 23, 6 (December 2014), 939-964. DOI

[6] Ian Pointer (7 May 2015). "Apache Flink: New Hadoop contender squares off against Spark". InfoWorld.

[7] "On Apache Flink. Interview with Volker Markl". odbms.org.

[8] Stephan Ewen, Kostas Tzoumas, Moritz Kaufmann, and Volker Markl. 2012. Spinning fast iterative data flows. Proc. VLDB Endow. 5, 11 (July 2012), 1268-1279. DOI

[9] "Benchmarking Streaming Computation Engines at Yahoo!". Yahoo Engineering. Retrieved 2017-02-23.

[10] Carbone, Paris; Fóra, Gyula; Ewen, Stephan; Haridi, Seif; Tzoumas, Kostas (2015-06-29). "Lightweight Asynchronous Snapshots for Distributed Dataflows". arXiv:1506.08603 [cs.DC].

[11] "Apache Flink 1.2.0 Documentation: Flink DataStream API Programming Guide". ci.apache.org. Retrieved 2017-02-23.

[12] "Apache Flink 1.2.0 Documentation: Python Programming Guide". ci.apache.org. Retrieved 2017-02-23.

[13] "Apache Flink 1.2.0 Documentation: Table and SQL". ci.apache.org. Retrieved 2017-02-23.

[14] Fabian Hueske, Mathias Peters, Matthias J. Sax, Astrid Rheinländer, Rico Bergmann, Aljoscha Krettek, and Kostas Tzoumas. 2012. Opening the black boxes in data flow optimization. Proc. VLDB Endow. 5, 11 (July 2012), 1256-1267. DOI

[15] Daniel Warneke and Odej Kao. 2009. Nephele: efficient parallel data processing in the cloud. In Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS '09). ACM, New York, NY, USA, Article 8, 10 pages. DOI

[16] "Apache Flink 1.2.0 Documentation: Streaming Connectors". ci.apache.org. Retrieved 2017-02-23.
