The year 2021 has been an incredible year of action, growth, and innovation at Starburst. From launching Starburst Galaxy, the first cross-cloud analytics solution, to key executive hires like Adrian Estala, David Freeman, Javier Molina, and Toni Adams, from achieving unicorn status to tripling revenue and customers, and from winning top industry accolades to owning the share of voice on Data Mesh, 2021 has been a landmark year. All of these milestones were captured in the 125 blog posts published throughout the year.
Earlier in the year, we launched an All-Stars Editorial Program with elements of gamification to up-level our content game by adding more internal voices to the Starburst blog. The idea was to break down functional silos and unlock the tribal knowledge we possess as a team in order to build a thriving content engine. I’m happy to report that the program was a huge success, with contributions from nearly all departments, including alliances, engineering, sales, customer success, and of course marketing.
Across culture, events, people, product, technical, and thought leadership posts, deep technical blogs logged the most views, ranking just behind the funding announcement. The top five blogs published in 2021, in order of popularity as measured by unique page views, are:
It is no surprise that our Series C blog announcement made it to the top. An important milestone in the company’s history, the funding took Starburst to a valuation of $1.2 billion. After bootstrapping the company to profitability for the first two years without any outside capital, we have raised $164 million to date.
Wondering why we’ve raised so much money? “Simply put, we believe we are solving the biggest problem that the big data era couldn’t: Offering fast access to data, regardless of where it lives,” explains CEO Justin Borgman. “Stated another way, we provide data warehousing-style analytics, without the data warehouse.”
Starburst is an enterprise-grade edition of the open source Trino that is used by hundreds of customers today, including data-driven companies like Comcast, Tesla, FINRA, and Zalando. It includes additional features that improve performance, security, connectivity, and manageability, along with 24x7 support from the Presto (and Trino!) experts.
Since the onset of big data, the idea of a single source of reference or a single point of truth has been the goal of data management. However, the approach is hypocritical at its very core.
Anybody who has been involved in the deployment of an enterprise-wide data warehouse knows that the endeavor is weighed down by coordination overhead. “Organizationally, it is a centralized behemoth… Scalability requires independent units working in parallel, while centralization introduces coordination, resistance, and inertia,” notes Daniel Abadi, an award-winning professor of computer science.
Daniel describes how the Data Mesh solves this age-old problem: It does for human teams what the parallel database systems do for “teams” of computers/servers.
“Given that the technology [like Starburst] that enables the Data Mesh to be implemented without ‘the silo catch’ now exists, the remaining justifications for the continued existence of the centralized data warehouse are starting to disappear,” he concludes.
Data Fabric and Data Mesh are both emerging paradigms designed to solve a prevalent problem in modern data management: Undue reliance on a slow, central human team to make new datasets available.
They are fundamentally different techniques that make vastly different technical assumptions. A Data Mesh aims to increase the efficiency of human effort by emphasizing domain ownership and distributed data teams, while a Data Fabric aims to automate the work with artificial intelligence.
“The Data Fabric fundamentally is about eliminating human effort, while the Data Mesh is about smarter and more efficient use of human effort,” notes Daniel Abadi in his blog that compares and contrasts the two architectural paradigms.
The first installment in a four-part blog series, Trino on Ice offers a gentle introduction to the Apache Iceberg Connector, as the title suggests. It explores Iceberg’s table properties, cloud compatibility, concurrency model, the Iceberg specification, and partitioning.
“Iceberg maps and indexes the files in order to provide a higher level abstraction that handles the relational table format for data lakes,” shares Brian Olsen, US Marine turned software engineer and developer advocate.
He sets the tone by discussing the pain points of Hive and how Iceberg attempts to solve some of them, and he provides a general guide for moving from Hive to Iceberg. All this through the lens of a Trino user!
As Microsoft’s Azure platform continues to grow and enterprise companies create and migrate data, Starburst provides an easy way to query that data wherever it lives.
In this blog, Director of Customer Solutions Tom Nats uses a diagram to explain how Starburst provides a SQL query engine for Data Mesh: a query consumption layer on top of a Data Mesh in which different organizational domains store their data in different locations, such as ADLS, Synapse, SQL databases, NoSQL, and even queuing systems like Kafka.
“Starburst has helped many customers transition from their on-premises data lake to Azure’s ADLS Gen 2 storage,” notes Tom. “Adding a high performant, high concurrent query engine on top of this storage allows a company to provide an easy-to-use SQL-based tool to query data in a variety of locations.”
We hope you enjoyed our Top 5. We’ll conclude with a fun update from our All-Stars Editorial Program: Colleen Tartow and Tom Nats are in a tight race for the grand prize of a ‘Dinner for Two with CEO Justin Borgman’ :) Will it be one of them or someone else? The winner will be crowned on Mon, Jan 31, which marks the end of the company’s fiscal year. Stay tuned!