Data Mesh: 6 Myths and Misconceptions

According to Gartner’s recent research [1], IT budgets are on the rise as companies invest in technologies aimed at composability. Mastering the risk of accelerating change and addressing new business opportunities and challenges are paramount to competing in data and analytics. Central to handling the pressures of an always-on, always-changing business climate is Data Mesh.

Data Mesh encourages data democratization and enables business users and data scientists alike to access, analyze, and operationalize business insights from data products created by the domain engineers who know that data best. Like everyone else, we’re excited about the promise of Data Mesh. However, we keep running into wide-ranging myths and misconceptions about what it is, so we put together this cheat sheet to clear them up.

Myth #1: Data Mesh is a technology-focused solution

Truth: Data Mesh is not a technology-focused solution or even a subset of technologies. Rather, Data Mesh is an organizational, cultural, and technological paradigm shift in how we gather, manage, and share analytical data. 

Of course, Data Mesh requires technology, but there isn’t just one solution. It requires a combination of technologies as part of a modern data stack. The good news is that Data Mesh puts the onus on the various technologies that support each domain (e.g. compute, storage, and a self-serve BI tool) to produce a set of easily consumable data products.

Myth #2: Data Mesh means everything must be decentralized

Truth: Different business domains can share infrastructure as long as that infrastructure is self-serviced and doesn’t require centralized human teams. 

For context, data warehouses and data lakes tend to rely on data centralization. They often fall short on scale and speed because of extensive data movement and copying, which delays time-to-insight and increases the complexity of the data management architecture.

Support for decentralization and democratization of data emerges from a growing awareness that one’s current data architecture may not be meeting the organization’s needs. That’s why Data Mesh proposes a move away from the centralization paradigm toward an architecture optimized for distributed and decentralized data.

Of course, you may well have a centralized data infrastructure built around complex data pipelines, and you can work incrementally toward a decentralized data architecture that produces data products for each domain. The more important point is to find a way to move from centralization toward decentralization as you decide which tools belong in your toolbelt. In practice, you’ll have to find an equilibrium on the spectrum between a centralized and a decentralized approach.

Centralized Architectures (e.g. Data Lakes and Data Warehouses)

A centralized data architecture means the data from each domain is copied to one location, and the data from multiple domains is combined to create centralized data models and unified views.

Data pipelines create bottlenecks:

  • A large central ETL pipeline means reduced flexibility for end consumers
  • Backlogged central data team
  • Isolation between ingestion, transformation and delivery

Disconnection between data producers and consumers:

  • Data source owners, data engineers and data consumers work at cross purposes
  • Loss of the ability to map analytics back to business fundamentals

Decentralized Architectures (e.g. SQL-based MPP query engine)

A decentralized data architecture means the data from each domain is not copied but stays where it already resides, and each domain has its own data products, data product lifecycle, and data product owners.

A single point of access:

  • Removes analytics bottlenecks
  • Simplifies complexity by reducing data movement
  • Takes advantage of data lake cost efficiencies
  • Significantly reduces delays regarding time-to-insight
  • Speeds critical decisions that drive competitive advantage

Data owners, producers and consumers:

  • Gives data consumers greater control of their data products, via a self-serve data platform

 

Myth #3: Current and Data Mesh self-serve data platforms are the same thing

Truth: Large organizations have implemented self-serve platforms, but the self-serve aspect of Data Mesh differs in a few important ways. The majority of data platforms built today are designed for centralized data teams.

Data Mesh, by contrast, extends self-service from storage through the data product and all the way up to the infrastructure that supports each business domain. Data platforms built to support Data Mesh should be optimized to give autonomy to domain teams and to give generalists the ability to create data products. From a strategy perspective, it’s important for the organization to identify its own place on the self-service spectrum, which lies somewhere between a fully centralized data infrastructure and a fully decentralized one.
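To make the idea of domain autonomy a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `DataProduct` and `SelfServeCatalog` names, the fields, and the example domains and locations are hypothetical and do not describe any particular product. It only shows the shape of the idea, where each domain team registers and owns its data products without filing a request with a central team.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor that a domain team publishes for one data product."""
    name: str
    domain: str
    owner: str
    location: str  # where the data already lives; nothing is copied


@dataclass
class SelfServeCatalog:
    """Hypothetical shared catalog: domains register products themselves."""
    products: dict = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # Each (domain, name) pair may be registered only once.
        key = (product.domain, product.name)
        if key in self.products:
            raise ValueError(f"{product.domain}.{product.name} is already registered")
        self.products[key] = product

    def by_domain(self, domain: str) -> list:
        # Return one domain's products, sorted by name for stable output.
        return sorted(
            (p for (d, _), p in self.products.items() if d == domain),
            key=lambda p: p.name,
        )


catalog = SelfServeCatalog()
catalog.register(
    DataProduct("orders_daily", "sales", "sales-team@example.com", "s3://sales/orders_daily")
)
catalog.register(
    DataProduct("churn_scores", "marketing", "marketing-team@example.com", "postgres://marketing/churn")
)
sales_products = [p.name for p in catalog.by_domain("sales")]
```

The shared piece here is only the catalog; ownership, naming, and location stay with the domain, which is the balance between shared infrastructure and domain autonomy described above.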

Myth #4: Data fabric and Data Mesh are similar solutions

Truth: Data fabric is technology-centric, while Data Mesh focuses not only on technology but also on organization, people, and process.

A data fabric provides an architecture to access data across centralized technologies, which requires data movement. Once data is collected, it’s available via an API, where individuals can share important data between devices, applications, and colleagues or partners through a dashboard. 

Meanwhile, Data Mesh is a decentralized approach to distributed data: users query data from a single point of access, which significantly reduces the need for data movement. Data Mesh is also more about using data to build data products that are owned and used by the business domain.
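A small sketch can illustrate what querying from a single point of access means. The example below uses Python's built-in `sqlite3` module purely as a stand-in, which is an assumption of this sketch and not how a distributed query engine works internally. Two separate in-memory databases play the role of two domains' stores, and the `sales` and `support` schema names, tables, and rows are all hypothetical.

```python
import sqlite3

# One connection plays the role of the single point of access.
conn = sqlite3.connect(":memory:")

# Each domain keeps its own store. Here they are two separate in-memory
# databases attached under hypothetical schema names.
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("ATTACH DATABASE ':memory:' AS support")

conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE support.tickets (customer_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO sales.orders VALUES (?, ?)",
    [(1, 120.0), (2, 35.5), (1, 80.0)],
)
conn.executemany(
    "INSERT INTO support.tickets VALUES (?, ?)",
    [(1, "open"), (2, "closed")],
)

# One query joins data owned by two domains, without first copying
# either table into a shared central store.
rows = conn.execute(
    """
    SELECT o.customer_id, SUM(o.amount) AS total_spend
    FROM sales.orders AS o
    JOIN support.tickets AS t ON t.customer_id = o.customer_id
    WHERE t.status = 'open'
    GROUP BY o.customer_id
    """
).fetchall()
```

The point of the sketch is the access pattern: the consumer writes one query against one endpoint, while each table stays in the store its domain owns.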

Both are important and ultimately companies choose what's right for them, based on the company’s data, security policies, talent availability, performance needs, and financial constraints.

Myth #5: You can nest Data Mesh into any technological solution such as data virtualization

Truth: Data Mesh is not a synonym for data virtualization. However, we can certainly see why data virtualization complements Data Mesh. Data virtualization has been a great tool for getting a high-level overview of an organization’s data, and it is well suited to smaller-scale needs. Once a larger scope is needed, though, the requirements expand. That’s why it’s worth taking a closer look at how Data Mesh scales data virtualization. As in the comparison with data fabric, Data Mesh is more than just a technical solution; it is also a cultural and organizational one, whereas data virtualization is not.

Myth #6: Data Mesh eliminates data engineers and/or causes friction with data analysts

Truth: Data Mesh is not about removing data engineers but about better data engineering management and creating new career paths and opportunities. 

Right now, data engineering skills are centralized, meaning data engineers are expected to be multi-domain experts who handle ETL across all domains and data. They are expected to clean, aggregate, and transform the data, all of which requires deep technical expertise in the technology but comes without any real connection to the business. That creates friction with data analysts, because the dataset they receive from engineers might miss the mark on what the analysts really needed. All of this creates an environment for potential burnout among data engineers, in teams that are often already stretched thin.

Data engineers within a Data Mesh architecture are no longer a bottleneck: they support an ecosystem of data products within a single business domain rather than for the entire organization. When data engineers work within the mesh, they understand the data in their business domain. They also build the necessary expertise around that data and its uses, and are able to react quickly to changing market conditions or internal requirements.

With a decentralized approach, data engineers in the domains can access and manipulate data easily. Data Mesh gives data engineers and consumers frictionless access to data across cloud providers and on-premises systems. It makes it easier to view data in different formats, eliminates copying data from one technology stack to another, and connects to data wherever it is.

Data consumers, and not only data engineers, can be responsible for all aspects of the data product lifecycle, including correcting missing or incorrect data.

[1] Gartner Survey of Over 2,000 CIOs Reveals the Need for Enterprises to Embrace Business Composability in 2022

 

Cindy Ng

Technical Content Marketing Manager at Starburst
