Harrison Johnson

Head of Technology Partnerships

Starburst

How Data Access Helps with ML/AI Projects

Last Updated: April 6, 2023

Today in the data space, when you peruse technology solutions, it's very difficult to put your finger on exactly what each firm's product or service does. Two solutions that provide very different capabilities often have nearly identical messaging. Moreover, buzzwords drift to the front of the line, ahead of hard realities like data gravity, data silos, data recency, and other very real limiting factors standing in the way of data-driven digital transformation. Data science is no exception. The hairy challenges of getting models, algorithms, and other key data assets into production often lurk in the shadows, while we praise the amazing tools available and the business value they will enable. When it comes to data science, I have been most puzzled by the omission of data access as a very real blocker to success. In this blog, I detail just how significant data access really is to a successful data science strategy.

Benefits of Data Access in ML/AI Projects

Data access is one of the most overlooked hardships of ML/AI, a towering obstacle standing between brilliant data minds, their tools, and the promise of data-driven decisions. Meanwhile, the benefits of ML/AI, such as faster time to market, operational efficiency, risk mitigation, a 360-degree view of the customer, and increased profits, are regularly on display in marketing efforts. The potential upside and business transformation seem limitless.

For instance, in healthcare, Nina Schwalbe, MPH, adjunct professor in the Heilbrunn Department of Population, and Brian Wahl, PhD, assistant scientist in the Department of International Health at the Johns Hopkins Bloomberg School of Public Health, said: "Enabling access across borders will require new types of data sharing protocols and standards on interoperability and data labeling. This global movement could be facilitated by an international collaboration so that data are rapidly and equitably available for the development and testing of AI-driven health interventions."

ML/AI Models Are Only as Good as Your Data

When we unpack the process of getting ML/AI into production, we learn that these solutions are only as valuable as the data used to create them and to continually enrich them. This puts data exploration, data discovery, and unimpeded, iterative questioning front and center. This exploratory work is rapidly becoming the majority of the data workload. In essence, your data team's ability to make a difference will rely heavily on how quickly and efficiently it can bring key data into its model creation workflows.

When we look at market leaders and market challengers alike, they lay out a path to success for ML/AI that starts with a curious assumption: that all of the data you need is at your fingertips. The unfortunate reality is that this is rarely the case, and the data needed to shape and reshape ML/AI solutions is constantly evolving. This seldom-recognized "elephant in the room" (no pun intended with Hadoop… or is there?) is data access, one of the most prohibitive blockers to getting data science frameworks up, running, and out the door adding value.

Data Access for Data Consumers

The volume of data (and its growth) creates an opportunity for everyone within the organization to be a data consumer. However, the seemingly endless stream of questions has created acute pain around data access and around the need to centralize data through constant movement and copying. This pain is compounded further when you consider the importance of data discovery and data exploration.

Let's think about how data discovery has traditionally been done: data scientists or analysts request that large volumes of data from all corners of the data estate be moved into a lake or warehouse. Seems straightforward, right? Wrong. This is a messy, laborious, arduous, and all-around painful process. For all of that pain, it typically takes a lot of time and introduces data quality and recency issues. Imagine if archaeologists scoured the Earth for dig sites, but instead of setting up shop and excavating on site, carved out however many square miles or tons of earth and had it airlifted back to their labs and museums. Then, hoping they had grabbed the right plots of land, they began the process of uncovering artifacts to share with the public. How impractical would that be?

How Starburst Helps With Data Access

Good news! There is a way to provide secure, performant access to data without moving or copying it. Without any lifting, shifting, ripping, replacing, or migrating, your data consumers, and data scientists in particular, can begin to use SQL as a common language to explore any data, in any system! Starburst can be deployed anywhere (on all major clouds, Kubernetes (K8s), VMs, or bare metal), connects with the data consumers' BI tool, SQL editor, or custom application of choice, and provides near real-time access for data discovery and exploration. We are not a data science tool, but a friend of ML/AI: we help accelerate key assets (models, algorithms, etc.) to production by streamlining the discovery process.
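To make the idea of querying data where it lives a little more concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It is not Starburst's API and not how Starburst works internally: two separately attached in-memory SQLite databases play the role of two independent systems, and a single SQL statement joins across them without copying either table into the other. The aliases crm and sales, the tables, and the sample rows are all hypothetical. In Starburst (as in Trino), the analogous query would name each table as catalog.schema.table, with each catalog backed by a connector to a different source system.

```python
import sqlite3


def revenue_by_region() -> list[tuple[str, float]]:
    """Join two independent (hypothetical) data sources with a single SQL query."""
    conn = sqlite3.connect(":memory:")
    try:
        # Each ATTACH of ':memory:' creates a separate, independent in-memory database.
        # "crm" and "sales" are hypothetical stand-ins for two distinct source systems.
        conn.execute("ATTACH DATABASE ':memory:' AS crm")
        conn.execute("ATTACH DATABASE ':memory:' AS sales")

        conn.execute(
            "CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
        )
        conn.execute(
            "CREATE TABLE sales.orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
        )
        # Hypothetical sample rows, for illustration only.
        conn.executemany(
            "INSERT INTO crm.customers VALUES (?, ?, ?)",
            [(1, "Acme", "EMEA"), (2, "Globex", "AMER"), (3, "Initech", "AMER")],
        )
        conn.executemany(
            "INSERT INTO sales.orders VALUES (?, ?, ?)",
            [(1, 1, 100.0), (2, 2, 250.0), (3, 3, 50.0), (4, 1, 25.0)],
        )

        # One query spans both databases; neither table is copied into the other.
        query = """
            SELECT c.region, SUM(o.amount) AS total
            FROM crm.customers AS c
            JOIN sales.orders AS o ON o.customer_id = c.id
            GROUP BY c.region
            ORDER BY c.region
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, total in revenue_by_region():
        print(region, total)
```

Running the script prints AMER 300.0 and then EMEA 125.0. The point of the sketch is the shape of the query rather than the engine: SQLite runs everything in one local process, whereas a distributed engine such as Starburst reaches each remote source through its own connector.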
