Databricks Announces Major Contributions to Flagship Open Source Projects at Data + AI Summit
SAN FRANCISCO, June 28, 2022 /CNW/ -- Databricks, the data and AI company and pioneer of the data lakehouse paradigm, today announced several contributions to popular data and AI open source projects including Delta Lake, MLflow, and Apache Spark.
At the Data + AI Summit, the largest gathering of the open source data and AI community, Databricks announced that the company will contribute all features and enhancements it has made to Delta Lake to the Linux Foundation and open source all Delta Lake APIs as part of the Delta Lake 2.0 release. In addition, the company announced MLflow 2.0, which includes MLflow Pipelines, a new feature to accelerate and simplify ML model deployments. Finally, the company introduced Spark Connect, to enable the use of Spark on virtually any device, and Project Lightspeed, a next generation Spark Structured Streaming engine for data streaming on the lakehouse.
"From the beginning, Databricks has been committed to open standards and the open source community. We have created, contributed to, fostered the growth of, and donated some of the most impactful innovations in modern open source technology," said Ali Ghodsi, Co-Founder and CEO of Databricks. "Open data lakehouses are quickly becoming the standard for how the most innovative companies handle their data and AI. Delta Lake, MLflow and Spark are all core to this architectural transformation, and we're proud to do our part in accelerating their innovation and adoption."
Delta Lake 2.0 Brings the Lakehouse to Everyone
"Databricks provides Akamai with a table storage format that is open and battle-tested for demanding workloads such as ours. The lakehouse powers interactive analytics at scale so that our customers can have near real-time analysis of security events within our Edge platform," said Aryeh Sivan, VP Engineering at Akamai. "We are very excited about the rapid innovation that Databricks, along with the rapidly growing community, is bringing to Delta Lake. We are also looking forward to collaborating with other developers on the project to move the data community to greater heights."
"The Delta Lake project is seeing phenomenal activity and growth trends indicating the developer community wants to be a part of the project. Contributor strength has increased by 60% during the last year and the growth in total commits is up 95% and the average lines of code per commit is up 900%. We are seeing this upward velocity from contributing organizations like Uber Technologies, Walmart and CloudBees, Inc., among others," said Executive Director of the Linux Foundation, Jim Zemlin.
MLflow 2.0 Introduces MLflow Pipelines to Templatize and Automate MLOps
Next Generation Streaming Engine and Spark Whenever and Wherever
In collaboration with the Spark community, Databricks also announced Project Lightspeed, the next generation of the Spark streaming engine. As the diversity of applications moving into streaming data has increased, new requirements have emerged to support data streaming, one of the most in-demand workloads on the lakehouse. Spark Structured Streaming has been widely adopted since the early days of streaming because of its ease of use, performance, large ecosystem, and developer communities. With that in mind, Databricks will collaborate with the community and encourage participation in Project Lightspeed to improve performance, expand ecosystem support for connectors, enhance functionality for processing data with new operators and APIs, and simplify deployment, operations, monitoring and troubleshooting.
To learn more about Databricks' commitment to the open source community, visit: https://databricks.com/product/open-source.
Safe Harbor Statement
Contact: [email protected]