This blog was co-authored by Starburst Solutions Architect Andy Mott.
Insane in the domain!
Insane in the brain!
Crazy insane, got no domain!
- Cypress Hill, sort of
Data Mesh is based on four central concepts, the first of which is domain-oriented ownership and architecture. In this blog, we’ll explore what that means and delve into the details of what makes this a fundamental shift supporting a decentralized data ecosystem.
What is a domain?
A domain is simply a collection of people typically organized around a common business purpose. Domains will typically start by mirroring the organization and then iterate from there. Examples of a domain for an ecommerce site might include users, merchants, products, marketing, etc. From a functional perspective, the domain could serve several purposes: for example, the merchants domain may own partner relationships with merchants, track products, organize payments for merchants, and so on. Ideally, every domain drives data production (ingestion), transformation, and serving of data products to downstream analytics - this is how data ultimately provides business value.
The challenges with centralized data ownership
As countless data teams have seen over time, any disconnect between data producers and data consumers makes it harder to derive business value from data. There is an inherent loss of signal whenever ownership of data changes hands, and that loss reduces the value of the data itself. In a centralized data environment, it is frequently unclear who ultimately owns and is responsible for data produced by a domain. Those responsibilities include data production, ingestion, transformation, quality assurance, and serving. Laura Smith, CIO of healthcare provider UnityPoint Health, says, “One of the biggest challenges for organizations is not the collection of the data itself but developing a team that will apply the data and drive change throughout the organization.”
It’s common these days for companies to play a game of hot potato when trying to understand who is responsible for datasets. The engineering team producing the data is solely focused on the operational system and business function of their product development, so the data they produce is an afterthought for them. That the data ultimately drives business value and can provide context for business decisions is a bonus, but it is outside the development team’s scope of concern - they are being judged on the product they create, not the data.
According to Forrester research, between 60 percent and 73 percent of all data within an enterprise goes unused for analytics. Meanwhile, in a recent Accenture survey, only 32 percent of companies reported being able to realize tangible and measurable value from data, while only 27 percent said data and analytics projects produce insights and recommendations that are highly actionable.
In the centralized model, data is ultimately passed to a data team external to the operational function, with data engineers and analysts then trying to understand and derive value from that data as well as the data from all other functions. This is, of course, problematic because the data producers are the ones with the breadth and depth of context surrounding the data - they are the most knowledgeable about it. In this model, the analysts are far removed from the data production and the people who know it best. Moreover, the data engineering efforts become a clear bottleneck between the data producers and the analysts. This inefficiency results in a “cycle of doom” wherein any changes or additional information required by analytics take far too long to produce; by the time the data is updated to the specifications of the analysts, it is often no longer needed or additional changes have been identified. Without a clear link between the data producers and the data consumers, there is a loss of feedback and loss of value in the data.
Domain-driven data ownership
Data Mesh hinges on a shift in ownership of data from an external data team back into the operational domain. Without this shift, it can be argued that you will continue to repeat the challenges detailed above, where value continues to be lost as data ownership changes hands. At its core, Data Mesh applies domain-oriented decomposition and ownership to an organization’s data. The domains are responsible for the data they produce - for ingesting, transforming, and serving that data to end users. By shifting ownership and liability of data back into the domain, there is no transfer of data ownership and therefore no value lost - the people who are most knowledgeable about the data are the people preparing and providing the data for analysis. The data becomes another product that the domain produces and is responsible for, and the data engineers focus on data within a single domain, working closely with other domain SMEs to produce valuable data products.
Specifically, domain data product ownership means that product owners and developers have both responsibility and accountability for:
- Creating and serving those data products to other domains and end users
- Ensuring the data is accessible, usable, available, and meets the quality criteria defined
- Evolving the data product based on user feedback, and retiring the data product when it is no longer used or relevant
- Evangelizing and “marketing” those data products to the rest of the organization
Domain-driven technology capabilities
While the social aspect of responsibility for the data is important, producing a data product also requires specific technology capabilities. These capabilities will be determined by the domain, so the domain drives technology capability adoption. For example, a domain may need a more secure upstream environment for PII or financial data, or it could be pulling in data from third party partners. The domains should use the data ingestion, transformation, and serving tools that make sense for their specific data. That said, the data product format should be standardized, and data products should be served in a consistent way across the analytical plane (aka the Data Mesh experience plane), which enables data product consumers to work seamlessly. Within the domain environment, the domain should decide upon the data technologies that enable its data product development.
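One way to picture a standardized data product format is a small descriptor that every domain publishes alongside its data, whatever tools it uses internally. The following is a minimal sketch in Python, not a Data Mesh or Starburst specification: the `DataProduct` class, its fields, the `descriptor_problems` helper, and the example values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain publishes with each data product."""
    name: str
    domain: str
    owner: str
    description: str
    schema: dict[str, str] = field(default_factory=dict)  # column name -> type


def descriptor_problems(product: DataProduct) -> list[str]:
    """Return the reasons a descriptor is not ready to publish (empty if none)."""
    problems = []
    # Every descriptive field must be filled in so consumers outside
    # the domain can understand what they are looking at.
    for attr in ("name", "domain", "owner", "description"):
        if not getattr(product, attr).strip():
            problems.append(f"missing {attr}")
    if not product.schema:
        problems.append("schema has no columns")
    return problems


# Illustrative example only: a data product from the merchants domain.
payouts = DataProduct(
    name="merchant_payouts",
    domain="merchants",
    owner="merchants-data@example.com",
    description="Daily payout totals per merchant.",
    schema={"merchant_id": "varchar", "payout_date": "date", "amount": "decimal(12,2)"},
)
print(descriptor_problems(payouts))
```

The point of the sketch is the split in responsibility: the shape of the descriptor is shared across the organization, while everything behind it (ingestion, transformation, storage) stays the domain's choice.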
What does this look like in practice?
In practice, domains must include people and processes that can ingest data from operational and analytical planes and produce data products informed by expert knowledge and business experience. The data products from each domain need to be served into the analytical plane for use by analysts and other domains - this means the data should be described by the domain in a way that can be understood and easily used by users outside the domain.
This shift of ownership of domain data ultimately means there is additional breadth to the domain’s responsibility - and additional work for the employees within that domain. This leads to a need for data engineers to be released into the domains from their previous position in a centralized data organization. The reorganization of data engineering under a CTO or CIO’s purview is a familiar leadership challenge to many companies that have been struggling to adequately produce value out of a centralized data organization. To reinforce this shift, domains should be incentivized to take ownership of their data products. In my experience, this ultimately is a good career and organization move for data engineers, as they are able to focus more on data modeling and production of high-quality data products, rather than spreading themselves too thin across too many domains.
There is also an opportunity for software engineers to become “citizen data engineers” within their domain, which is fantastic for career growth and for spreading domain knowledge as data products are built out. On the flip side, there is also an opportunity for driven analysts to become more data engineer-like as they develop more domain-specific knowledge. The non-trivial skills overlap (e.g. SQL) gives analysts and data engineers a common language and provides career mobility for both.
How does this enable Data Mesh?
Domain-driven data ownership and architecture is key to enabling and driving the other three principles governing Data Mesh:
- Domains are the clear owners and producers of data products
- When domains include data products from other domains (whether in the course of product development or in producing additional data products), there should be a contract governing the collaborative relationship between the domains involved
- Data products, including combinations of various data products, accelerate time-to-insight, thereby increasing the overall business value and shortening the data-value gap
- Domains control aspects of governance including authorization, which is specific to each data product
- Domains operate within the framework of security, compliance, and regulation defined and enforced by the central IT organization
- Domains create data products on a self-service infrastructure provided by the central IT organization.
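The contract between domains mentioned in the list above can be made concrete as a check that a producing domain runs before it changes a data product. The sketch below is one hypothetical way to do that in Python: the `Contract` class, the `contract_violations` function, and the column names and types are assumptions for illustration, not part of any Data Mesh standard or Starburst API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """Hypothetical agreement: what a consuming domain needs from a data product."""
    consumer_domain: str
    product_name: str
    required_columns: dict[str, str]  # column name -> expected type


def contract_violations(contract: Contract, published_schema: dict[str, str]) -> list[str]:
    """List every way the published schema fails the consumer's contract."""
    violations = []
    for column, expected in contract.required_columns.items():
        actual = published_schema.get(column)
        if actual is None:
            violations.append(f"{column}: missing")
        elif actual != expected:
            violations.append(f"{column}: expected {expected}, got {actual}")
    return violations


# Illustrative example: the marketing domain consumes a merchants data product.
marketing_contract = Contract(
    consumer_domain="marketing",
    product_name="merchant_payouts",
    required_columns={"merchant_id": "varchar", "amount": "decimal(12,2)"},
)

current_schema = {"merchant_id": "varchar", "payout_date": "date", "amount": "decimal(12,2)"}
proposed_schema = {"merchant_id": "bigint", "payout_date": "date"}

print(contract_violations(marketing_contract, current_schema))
print(contract_violations(marketing_contract, proposed_schema))
```

Because the contract lists only the columns the consumer depends on, the producing domain remains free to add or change anything else without coordinating across domains.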
How Starburst supports domain-oriented ownership
At its core, Starburst shortens the path between the data and the business value derived from the data. In the context of producing data products, this means a domain can rely on Starburst so that its data engineers spend less time building infrastructure and pipelines to support data engineering efforts. Data engineers can instead focus more on using simple tools they already know, such as SQL, to prepare high-quality, low-latency data products for end users. Starburst is also used at the cross-domain analytical layer as the query engine that streamlines and simplifies data product access by analysts and data scientists.
Starburst also reduces the overall number of vendors (and vendor-specific knowledge) required, and with its large set of connectors allows each domain to connect to data wherever and in whatever format it may live. With its SQL-based interface, Starburst enables “citizen data engineers” as well as analysts across the organization by providing a consistent and familiar interface using the lingua franca of data. Further, no matter where you are in your cloud or microservices architecture journey, Starburst not only supports data across disparate architectures, but our software is also flexible enough to move with you along that journey - adding new data sources or adjusting existing ones is simple.
Curious about how Starburst can help you achieve a more domain-oriented data architecture? Contact us to discuss!