We’re really excited to see the Data Mesh concept emerge as one of the more popular topics from our Datanova events. Pioneered by Zhamak Dehghani of ThoughtWorks, Data Mesh proposes architectural and organizational changes that transform the way large enterprises analyze data. We’ve already explored the idea in this blog, so I’ll skip the overview. Instead, I’d like to look at how this approach is helping data scientists and data analysts. The two on-demand sessions listed below are packed with insights and details from industry experts, and this post covers the highlights:
What Data Mesh Means for Data Analysts
What Data Mesh Means for Data Scientists
Here are 4 ways Data Mesh impacts data scientists and data analysts:
1. Easily Access More Data
If you want to explore data, you have to find it first, and one of the challenges facing data scientists today is that they can’t always track down the data they need. One of the panelists at our Data Mesh for Data Scientists session, Max Schultze of Zalando, shared an anecdote about a team that typically worked with data in a narrow, well-defined space. When they wanted to build a new model that incorporated data from new sources, they needed a month just to find the data. The team that was supposed to be responsible for it didn’t realize that managing this data was their job: the person who had been doing the work had left the company long before and had neglected to reassign the task.
Unfortunately, this is normal. As Max notes, it’s the kind of thing data scientists have to deal with all the time. One of the foundational principles of Data Mesh is that datasets like this are treated as products, and that specific groups are in charge of making that product easily and readily available to others within the organization. For data scientists – and data analysts, for that matter – that means no more waiting months, weeks, or even days. You’re treated like a customer that needs and deserves to be served well. If you want to explore new datasets, you can reach out to the appropriate group and get to work immediately.
2. Access Higher Quality Data
Both data scientists and data analysts suffer when data is incomplete, incompatible, or outdated. Traditionally, organizations have tried to centralize data to glean insights, but as Zhamak often points out, the process of centralizing data leads to decay. The data is out of date by the time it’s ready.
Data Mesh addresses this problem by keeping data with the groups or teams that know it best. Once they become responsible for the data, and for making it easily and readily available to other groups within the organization, data analysts and data scientists end up working with higher quality data. This, in turn, yields more accurate results.
3. Focus on What You Do Best
We’ve all heard the statistic: data scientists reportedly spend 80% to 90% of their time wrangling data instead of doing the high-level work they were hired to do. Often, these are individuals with PhDs, and they’re forced to spend the majority of their time cleansing data. No one wants it to be this way. Not the data scientists. Not the company that hired them. By shifting responsibility to data product teams, Data Mesh takes this work off your plate as a data scientist. Data is ready for you to explore immediately, so you can focus on what you do best.
The same holds true for data analysts. When data is treated as a product, and maintained by those who know it best, then data analysts don’t have to worry about where it’s stored or how to access it. Bank of America Senior Quantitative Analyst Gareth Stevenson summarizes it perfectly in his session:
“Thinking of data this way allows data analysts to focus on what their likely training was… doing maths on the data or thinking about how to present that data in a meaningful way. So if we allow this abstraction layer between the data analysts and the data itself, where the data is left to live where it should live, and we can offer technology that enables that virtualization in a meaningful way, they can focus on the things they actually care about.”
4. Reduce Time-to-Insight
What this all adds up to for data analysts and data scientists is the ability to turn out higher-quality results and insights in less time. If you don’t have to spend a month finding a dataset, wait multiple months for an ETL job to centralize disparate datasets, or dedicate extra time to cleansing and preparing that data, you’re going to generate results sooner.
Plus, these are going to be higher-quality or more accurate insights, since they’ll be based on higher-quality, more recent data.
There’s a lot more to learn about the Data Mesh concept and how it impacts organizations. We’ve sponsored Zhamak’s O’Reilly book, Data Mesh: Delivering Data-Driven Value at Scale, and you can learn more at our Data Mesh Resource Center, which includes customer use cases.