This blog was co-authored by Mayank Mehra, Head of Product Management at Modak.
Data Fabric and Data Mesh concepts are front and center for many data-driven organizations and are routinely compared in data management and engineering circles. If you want some practical ideas to accelerate your data strategy, look for opportunities to learn from both approaches and leverage the best for your design.
A simpler and faster pathway to decentralized data sources
There are numerous articles and videos on mesh vs fabric, and many of them offer useful opinions on the pros and cons. While most present the two as competing ideas, we propose that they can work together. Both are sound concepts, and although their approaches differ, they share some key principles:
- Eliminating data silos and enabling data democratization across the enterprise.
- Enabling access to decentralized data sources, on-prem or in the cloud, with the agility and scale that business teams demand. Centralization is not a requirement, and for many organizations, it is not effective.
- Simplifying the ETL process to eliminate the bottleneck that centralized teams currently present.
In this article, we are going to focus on three capabilities: artificial intelligence, domains and data products, and governance. Certainly, there is a lot more to discuss and more opportunities to leverage the best of both worlds, but let this be our first step towards a more enriching conversation in the near future.
How a Data Fabric Leverages Artificial Intelligence
A Data Fabric uses artificial intelligence to integrate data sets across different data sources. The fabric relies on active metadata, knowledge graphs, and machine learning to drive recommendations for integration and analytics. This approach automates your discovery of new logical groupings to create virtual data domains. If you have good metadata and are working across large data sets, this is a sensible approach.
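As a rough illustration of metadata-driven recommendations, the sketch below scores pairs of data sets by how many column names they share and suggests the strongest overlaps as candidate groupings. It is a deliberately simple stand-in for the knowledge graphs and machine learning described above, not how any particular fabric product works. The catalog contents, the `recommend_links` function, and the `threshold` value are all hypothetical.

```python
from itertools import combinations

# Hypothetical metadata catalog: data set name -> set of column names.
CATALOG = {
    "crm.customers": {"customer_id", "name", "email", "region"},
    "billing.invoices": {"invoice_id", "customer_id", "amount", "region"},
    "web.sessions": {"session_id", "customer_id", "started_at"},
    "hr.employees": {"employee_id", "name", "department"},
}


def jaccard(a, b):
    """Share of column names two data sets have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


def recommend_links(catalog, threshold=0.2):
    """Return (score, left, right) tuples for pairs whose columns overlap enough."""
    links = []
    for left, right in combinations(sorted(catalog), 2):
        score = jaccard(catalog[left], catalog[right])
        if score >= threshold:
            links.append((round(score, 2), left, right))
    # Strongest candidates first.
    return sorted(links, reverse=True)


if __name__ == "__main__":
    for score, left, right in recommend_links(CATALOG):
        print(f"{left} <-> {right}: overlap {score}")
```

The quality of the suggestions depends entirely on the metadata: inconsistent or missing column names would hide real relationships, which is why good metadata matters so much to this approach.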
Whether you are building a fabric or a mesh, look for ways to leverage AI to automate data discovery and integration. The effectiveness of the AI engine will depend greatly on the quality of your metadata and your knowledge of the data sets; you need to ‘teach’ the engine and keep an eye on data quality. If you have implemented a Data Mesh and are looking for new ways to analyze, categorize, or improve the quality of your data sets, look into AI capabilities.
Data Mesh Domains Serve Up Data Products
The biggest difference between a Data Fabric and a Data Mesh is how they each address the concept of domains and data products. The fabric creates a virtual management layer that sits on top of the data sources to create logical domains. Whether it is recommended by AI or designed by an engineer, in a fabric, the domain is managed within a central virtual layer.
A mesh can also rely on a virtual layer to create logical domains and products, but it moves management and delivery closer to the consumer. The Data Mesh adds people and processes to the domain and product concepts. In a mesh, distributed domains are managed in a self-service manner by autonomous domain teams. Each domain team designs and builds data products for its consumers, with the primary purpose of simplifying reuse and incentivizing sharing. The teams closest to the business problem and the business data manage the domain.
If you are building a fabric or a mesh, empower your data consumers. Data products should be curated and offered in a manner that enables consumers to quickly find them, use them, and share them. Self-service capabilities enable domain teams to build their own data products, and some autonomy allows them to make rapid governance decisions. Finally, if you have already built a Data Fabric and are looking for ways to accelerate consumer adoption, consider empowering consumers to manage their own domains and products.
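To make the idea of a findable, domain-owned data product concrete, here is a minimal sketch of a product descriptor and a searchable catalog. The `DataProduct` fields, the `ProductCatalog` class, and the example products are assumptions made for illustration; a real mesh or fabric platform would define its own product contract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor that a domain team publishes for consumers."""
    name: str
    domain: str  # the owning domain team
    description: str
    tags: frozenset = field(default_factory=frozenset)


class ProductCatalog:
    """Hypothetical shared catalog where consumers discover data products."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        # Each domain manages its own namespace of product names.
        self._products[(product.domain, product.name)] = product

    def search(self, term):
        """Return names of products whose description or tags match the term."""
        term = term.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if term in p.description.lower() or term in p.tags
        )


if __name__ == "__main__":
    catalog = ProductCatalog()
    catalog.publish(DataProduct(
        "churn_scores", "marketing", "Weekly customer churn scores",
        frozenset({"customer", "ml"}),
    ))
    catalog.publish(DataProduct(
        "invoice_facts", "finance", "Cleaned invoice line items",
        frozenset({"billing"}),
    ))
    print(catalog.search("customer"))
```

Keying products by domain and name reflects the ownership model described above: each domain team publishes and maintains its own products, while consumers search across all of them from one place.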
Governance for a Data Fabric and Data Mesh
A Data Fabric can be described as employing a top-down approach to governance: the metadata and virtual layers are centrally managed. A Data Mesh more closely resembles a bottom-up approach, in which distributed domain teams each manage their own data governance. This autonomy empowers domain teams to govern their own areas, so a domain with higher-risk data may employ strict controls, whereas another domain may choose an open-access approach. Whether you are implementing a fabric or a mesh, adapt your governance approach to the risk-versus-value profile that best fits the use case.
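One way to picture risk-based governance is as a per-domain policy that each team sets for itself. The sketch below assumes two hypothetical domains, a two-level `Risk` scale, and a `can_read` check; none of these come from a specific product, and real policies would be far richer.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


# Hypothetical policies, each owned and maintained by its domain team.
DOMAIN_POLICIES = {
    "marketing": {"risk": Risk.LOW, "approved_roles": set()},
    "patient_records": {
        "risk": Risk.HIGH,
        "approved_roles": {"clinician", "auditor"},
    },
}


def can_read(domain, role):
    """Open access for low-risk domains; approved roles only for high-risk ones."""
    policy = DOMAIN_POLICIES[domain]
    if policy["risk"] is Risk.LOW:
        return True
    return role in policy["approved_roles"]


if __name__ == "__main__":
    print(can_read("marketing", "analyst"))        # low-risk domain, open access
    print(can_read("patient_records", "analyst"))  # high-risk domain, role not approved
```

In a top-down model, a central team would own the whole `DOMAIN_POLICIES` table; in a bottom-up model, each domain team would own and update only its own entry.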
Data Fabric and Data Mesh: Learn from Collective Experiences
Whether you have started your mesh or fabric or are still thinking about how to get started, you have an opportunity to drive continuous improvement and consumer value by learning from the collective experiences and capabilities of both concepts.
Modak is a solutions company that enables enterprises to manage and utilize their data landscape effectively. We provide technology-agnostic software and services to accelerate data migration initiatives. We use machine learning (ML) techniques to transform how structured and unstructured data is prepared, consumed, and shared.
Modak’s Data Engineering Studio provides best-in-class delivery services, managed data operations, enterprise data lake, data mesh, augmented data preparation, data quality, and governed data lake solutions.
Modak Nabu™ enables enterprises to automate data ingestion, curation, and consumption processes at petabyte scale. Modak Nabu™ empowers tomorrow's smart enterprises to create repeatable and scalable business data domain products that help business users, data scientists, and BI analysts find the appropriate data at the right time and in the right context.
Starburst is the fastest, most efficient query engine for your data warehouse, data lake, or data mesh. We unlock the value of distributed data by making it fast and easy to access, no matter where it lives. Starburst queries data across any database, making it instantly actionable for data-driven organizations. With Starburst, teams can lower the total cost of their infrastructure and analytics investments, prevent vendor lock-in, and use the existing tools that work for their business. Trusted by companies like Comcast, FINRA, and Condé Nast, Starburst helps companies make better decisions faster on all data.