In the beginning
In 2021, the journey from a centralized data organization to a data mesh had only just begun for HelloFresh.
Less than a decade ago, HelloFresh, like so many other companies, had a small, specialized team that focused on warehousing data and producing reports for analysts and executives. They fashioned themselves a siloed existence, growing separate from the rest of the organization as the only ones who understood their data, how to manage it, and how much it could be trusted.
But as the volume of data and demand for access to it increased, the team struggled to keep up.
Their specialized data management skills and processes kept most of the organization from accessing or using data. As the team fell behind on demand, data quality suffered, and the lack of reliable, quickly accessible data began to stifle innovation.
In early 2020, they decided to stop the constant firefighting that their Data Engineers were doing due to a lack of data quality standardization and uncertain ownership of data in the organization. To unlock analytical data at scale, the central, specialized team decided to pivot from data warehousing to a data mesh construct in which they built the tools and programs that would enable everyone to help themselves to data they could trust. The journey had begun.
Their ultimate goal is to build data products that have a purpose and that people can trust, high-quality assets that the rest of the organization can quickly discover, understand, and securely access. As part of their phased approach to implementing this new organizational mindset, they elected to use Soda to tackle data quality testing and monitoring.
This case study on HelloFresh illustrates where Soda sits in an organization, how it fits into the data mesh landscape, and the value it provides in expanding internal adoption and ownership of data quality.
Data domain teams
In an effort to decentralize data ownership, the data warehousing group started by building teams of data specialists that operate autonomously and are not part of a centralized “data ownership” construct. These Data Domain Teams, as they came to be called, are composed of Data Engineers, Data Analysts, Data Scientists, and Data Product Managers. These teams take ownership of the data in their domain and provide the support and services that come with that responsibility. They build and maintain data products while adhering to compliance standards and ensuring their data products are accessible to everyone in the organization via a self-serve data platform.
Data Domain Teams do not report to a centralized body that controls access to the data or dictates mandates about data products. However, they do have access to federated data governance standards and data quality and management tools to facilitate their objectives and meet the needs of their customers, the consumers of data in the organization.
The data warehousing group took these necessary steps to pivot towards the Data Domain Team model:
- Maintain the existing data warehouse to meet established commitments.
- Hire people to build a new cloud data infrastructure.
- Enable the people asking for reports to serve themselves via a data platform.
- Slowly decommission the data warehouses, disassemble old commitments, and set up new Service Level Objectives (SLOs).