Starburst and Databricks Collaborate on the Trino Delta Lake Connector

This blog was co-authored by Claudius Li, Product Manager at Starburst, and Joe Lodin, Information Engineer at Starburst.

Starburst recently donated the Delta Lake connector to Trino. We released the initial Delta Lake connector for Starburst Enterprise users in April 2020. The connector started out with read capabilities, but we’ve consistently expanded functionality to add write capabilities, data management capabilities, and significant performance enhancements. 

Over the last couple of weeks, we ported the Delta Lake connector and its documentation to Trino, and both shipped in Trino 373. We showcased the new connector in episode 34 of the Trino Community Broadcast. You can watch the demo video to see the connector in action, and you can also follow along with our instructions. Now, with Trino and Delta Lake, anyone can create and query a lake house with 100% open source software.

However, all of this is just the beginning. We have a lot more plans, as Starburst and Databricks are teaming up to make the connector even better.

Galactic lake house and a stronger community

One of our plans is to add Delta Lake support to Starburst Galaxy. Starburst Galaxy is based on Trino, just like Starburst Enterprise. Adding the Delta Lake connector means that customers can set up a fully functional lake house in minutes with just a few clicks. Starburst Galaxy users will be able to either connect to an existing lake house or point at any cloud-native object storage to create a new lake house instantly.

From open source support to enterprise-grade support

With the Delta Lake connector in open source Trino, and open source Delta Lake available, anybody can get started running their own lake house. Starburst and Databricks are working with both open source communities to gather feedback, fix issues, improve performance, and add new features. 

“The decision to contribute the Delta Lake connector to Trino is an important milestone for Starburst. It reinforces our commitment to the large open source community around Trino. Together with other open source communities like Delta Lake, we can bring tremendous features to our users. We look forward to learning about their usage and problems, and making Trino even better for everyone,” said Matt Fuller, VP of Product at Starburst.

Databricks is embracing and supporting the Delta Lake open source community in a similar fashion, as Michael Armbrust, Distinguished Engineer at Databricks, notes: "The Starburst team shares our commitment to the open-source community, and this is an amazing starting point for future collaborations between Trino and Delta Lake! We're excited to see the Delta Lake connector flourish with the Trino community."

All of this work flows back to both communities, as Starburst and Databricks contribute fully supported enterprise features back to the core open source projects.

Starburst Enterprise and Databricks customers will automatically get the upgrade to the new connector in an upcoming release.

The road ahead

Starburst is continuing to iterate on the Delta Lake connector and everything related to it. We work constantly to make all of our connectors faster, more reliable, and more flexible. This commitment holds for any code in the core Trino engine, in the Trino connectors, and for any additional connectors available to Starburst Enterprise and Starburst Galaxy users.

Databricks is building a standalone Delta Lake reader library. Because the library is maintained as part of Delta Lake, it keeps pace with any interface, protocol, or semantic changes, which improves performance and reliability. Starburst and Databricks are working closely to make sure the reader library meets the needs of Trino and other Delta Lake users.

The secret to reliable software is testing. To collaborate on ensuring the reliability of these complex systems, Databricks has graciously offered to donate a test environment to the Trino project. Starburst will use this environment in our continuous integration setup to make sure that Trino and Databricks can each correctly read data written by the other engine.

What next?

We are really excited to bring all the benefits of Trino and Delta Lake to our communities, and to create a healthy ecosystem of collaboration with our users, customers, and contributors.

Starburst and Databricks are eager to hear about enhancements you think we should add to our connectors. Chat with us on the Trino Slack, and consider sending a pull request with your improvements.

Manfred Moser

Manfred Moser is the Director of Information Engineering at Starburst.
