Tom Nats

Director of Customer Solutions

Starburst

Moving On-Premise Data to Azure Cloud ADLS

Last Updated: December 15, 2023

Data Lake, Data Mesh, Partners, Starburst Enterprise

Microsoft has migrated thousands of customers to its Azure cloud platform, and Azure has quickly become the second most popular cloud provider. Companies have easily transitioned their Windows and non-Windows infrastructure, including their analytics and operational platforms.

Several services within the Azure cloud can support analytics, such as Azure Data Lake Storage (ADLS), Azure Synapse and Microsoft SQL Server. Ideally, most of a company’s analytical data would be stored in ADLS, because of its low cost and high performance, in an open columnar format such as Apache ORC or Apache Parquet. This allows true separation of compute and storage without “data lock-in” as we like to call it.

Moving from an on-premises data lake to Azure’s ADLS Gen 2 storage

Starburst has helped many customers transition from their on-premises data lake to Azure’s ADLS Gen 2 storage. Adding a high-performance, high-concurrency query engine on top of this storage gives a company an easy-to-use, SQL-based tool for querying data in a variety of locations.

In the diagram below, we illustrate a query consumption layer on top of a data mesh in which different organizational domains store their data in different data storage locations such as ADLS, Synapse, SQL databases such as SQL Server, and even NoSQL and queuing systems such as Kafka.

Figure: query consumption layer on top of a data mesh

Instead of moving data from one source to another just to query it, Starburst allows you to query the data where it lives with simple SQL such as:

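-- region_id is an assumed join key; substitute the column your two tables share
select r.region, sum(s.tot_sales) from adls.sales s join sqlserver.regions r on s.region_id = r.region_id group by r.region;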

Choosing the storage location for your data depends mostly on what the consumers of the data expect in terms of uptime, performance, and the quality and volume of the data. Whether your data is in ADLS, Synapse, Cosmos DB or even Event Hubs, Starburst can join this data together using standard SQL queries, leaving the headaches of ETL and data migration in the past. See all of our connectors here.
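For a rough feel of how such a cross-source join behaves, the sketch below uses Python's built-in sqlite3 module as a stand-in for a Starburst cluster: two attached in-memory databases named adls and sqlserver play the role of the two catalogs. The region_id join key, the column types and the sample rows are assumptions made for illustration, and SQLite is not Starburst; the point is only the shape of the query, which needs an explicit join condition and a GROUP BY.

```python
import sqlite3

# One connection with two attached databases, standing in for two catalogs.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS adls")
conn.execute("ATTACH DATABASE ':memory:' AS sqlserver")

# Hypothetical schemas: region_id is an assumed join key, not from the article.
conn.execute("CREATE TABLE adls.sales (region_id INTEGER, tot_sales REAL)")
conn.execute("CREATE TABLE sqlserver.regions (region_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO adls.sales VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)
conn.executemany(
    "INSERT INTO sqlserver.regions VALUES (?, ?)",
    [(1, "East"), (2, "West")],
)

# Join the two "sources" in place and total sales per region.
query = """
SELECT r.region, SUM(s.tot_sales) AS total_sales
FROM adls.sales AS s
JOIN sqlserver.regions AS r ON s.region_id = r.region_id
GROUP BY r.region
ORDER BY r.region
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Against a real cluster the same SELECT would be submitted through a Starburst client instead, with catalog, schema and table names matching your deployment.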

Aggregations/Rollups and Sandboxes

Additionally, Starburst’s ANSI SQL language support makes it an excellent choice for creating and managing tables in your data lake, as well as in many of the other data sources reachable through its 40+ connectors. These can be integration, rollup, aggregation and even sandbox tables that users experiment with and use to create sample data sets. Combining and persisting data from multiple sources is table stakes for any modern analytics architecture, and Starburst makes it easy using industry-standard SQL.
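As an illustration of a rollup table, the sketch below persists a monthly aggregate with CREATE TABLE ... AS SELECT, reading from one source and writing to another. It again uses Python's sqlite3 with attached in-memory databases as a stand-in for Starburst catalogs; the orders table, its columns, the sales_rollup name and the sample rows are all hypothetical.

```python
import sqlite3

# Two attached databases stand in for a source system and the data lake.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS adls")
conn.execute("ATTACH DATABASE ':memory:' AS sqlserver")

# Hypothetical source table holding individual orders.
conn.execute(
    "CREATE TABLE sqlserver.orders (region TEXT, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sqlserver.orders VALUES (?, ?, ?)",
    [
        ("East", "2023-01-05", 100.0),
        ("East", "2023-01-20", 50.0),
        ("East", "2023-02-02", 30.0),
        ("West", "2023-01-11", 75.0),
    ],
)

# Persist a per-region, per-month rollup in the "data lake" database.
conn.execute("""
CREATE TABLE adls.sales_rollup AS
SELECT region,
       substr(order_date, 1, 7) AS order_month,
       SUM(amount) AS total_amount,
       COUNT(*) AS order_count
FROM sqlserver.orders
GROUP BY region, substr(order_date, 1, 7)
""")

rollup = conn.execute(
    "SELECT region, order_month, total_amount, order_count "
    "FROM adls.sales_rollup ORDER BY region, order_month"
).fetchall()
print(rollup)
```

Users can then query the small rollup table instead of re-aggregating the detailed orders each time; a sandbox table would follow the same pattern with a filtered or sampled SELECT.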

Starburst Cached Views

Cached views allow tables from many different data sources to be cached in other connectors such as ADLS Gen 2. This allows for much higher performance through parallel reads by the Starburst cluster workers. It can also reduce the load on the source system when many users query it at the same time.

In the diagram above, the customer table is replicated to ADLS every hour, and queries against the SQL Server table “customer” are automatically redirected to ADLS, which greatly improves performance and reduces the impact on the database. The source database can be located anywhere, including an on-premises data center.

Starburst Delta Lake Connector

Modifying data in a data lake has always been a challenge. Delta Lake provides a transactional layer on top of ADLS that brings benefits such as ACID transactions (update, delete, etc.) and large performance improvements. With our latest Starburst Enterprise release, we have introduced update, insert and delete of Delta Lake data in ADLS. This is tremendously important not only for updating existing data in your data lake but also for compliance with regulations such as GDPR.

Starburst Stargate

Large enterprises often have data spread across different locations such as on-premises, cloud, and even multi-cloud. Starburst’s Stargate allows clusters in different locations to work together so that each part of a query is processed locally, where the data lives. This reduces egress costs and greatly improves performance.

Figure: Starburst Stargate diagram

The best part of Starburst Stargate is that users don’t have to worry about where the data lives; they just write normal SQL queries to get the answers they are looking for.

Conclusion

As Microsoft’s Azure platform continues to grow and enterprise companies create and migrate data, Starburst provides an easy way to query that data wherever it lives. This provides a one-stop solution for querying data in a variety of sources and locations. With the addition of Stargate and cached views, we help you manage the complexity of providing relevant and timely data to all of your users using industry-standard SQL.
