Which Comes First: Data Quality or Data Governance?

Both data quality management and data governance are critical data management activities. After all, there’s not much point managing your data if you aren’t sure if it is the right data, or if it is of good enough quality to use. But which should come first, data quality or data governance? Learn what you need to start with and how to overcome the obstacles.

Getting started

Much like the proverbial question of which came first, the chicken or the egg, many people in the data management community ponder which should come first: data quality or data governance.

It's not an easy question to answer. Most of us understand what data quality is – “data that is good enough to use” is a simple enough concept to grasp. Data governance, on the other hand, sounds like something that will prevent you from doing your work and, let’s be honest, it also sounds rather boring! But that’s not true; data governance is about understanding the data your organization has and having a structured framework of roles and responsibilities to manage it. This ensures that the right people can make consistent decisions about the quality and use of your data.

Both data quality management and data governance are critical data management activities. After all, there’s not much point managing your data if you aren’t sure if it is the right data, or if it is of good enough quality to use. Done well, data quality management and governance ensure that the right data of the right quality is available to the people and systems that need it. Add other data management activities to the mix and we can ensure that the data is secure, stored logically on systems, and made available for analysis and insights.

But back to our original question and why it is not easy to answer. While in theory they are two separate data management disciplines, they are deeply interrelated and you should not be doing one without the other. I often describe them as symbiotic: they support each other and, ideally, both activities will be undertaken by the same team. With that in mind, you can see that it is difficult to decide where to start.

In an ideal world, we would get our Data Governance Framework designed and implemented before focussing on data quality. Unfortunately, we do not live in an ideal world, and I have never come across an organization which has done data governance before commencing any data quality activities.

There’s a reason for this. There are always people in parts of your organization who understand when data is not good enough for them to do their job, and they take action to fix it. This results in ad hoc data cleansing activities, and maybe some basic data quality reporting, with varying degrees of success. Sadly, without data governance in place these activities tend to be tactical at best and, in the case of data quality reporting, often short-lived.

I would never want to criticize someone who has taken the initiative to improve data quality, but doing so without the foundation of a solid Data Governance Framework leads to duplication of effort and can even make matters worse. Over the years, I have come across countless instances where multiple teams were manually cleansing or fixing the same data set, but not in exactly the same way. This resulted in a variety of different answers to basic questions like, “How many customers do we have?” or, “How many sales did we make last month?” Significant amounts of time then get wasted as the teams argue over why their own answer is the right one!
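
To make this concrete, here is a minimal sketch of how two reasonable but uncoordinated cleansing rules can give different answers to “How many customers do we have?” Everything in it is hypothetical: the records, the field names, and the two teams’ rules are invented for illustration and are not taken from any real organization or tool.

```python
# Hypothetical raw customer records; all values are invented for illustration.
raw_customers = [
    {"id": 1, "email": "ann@example.com", "status": "active"},
    {"id": 2, "email": "ANN@example.com", "status": "active"},  # same person, different casing
    {"id": 3, "email": "bob@example.com", "status": "inactive"},
    {"id": 4, "email": "", "status": "active"},  # missing email
    {"id": 5, "email": "cy@example.com", "status": "active"},
]


def count_customers_team_a(records):
    """Team A's assumed rule: a customer is any distinct, non-empty email (case-insensitive)."""
    emails = {r["email"].lower() for r in records if r["email"]}
    return len(emails)


def count_customers_team_b(records):
    """Team B's assumed rule: a customer is any record whose status is 'active'."""
    return sum(1 for r in records if r["status"] == "active")


team_a_count = count_customers_team_a(raw_customers)
team_b_count = count_customers_team_b(raw_customers)
print(f"Team A: {team_a_count} customers, Team B: {team_b_count} customers")
# prints "Team A: 3 customers, Team B: 4 customers"
```

Neither rule is wrong in isolation. The disagreement comes from the missing shared definition of “customer”, which is exactly the kind of decision a data owner would make once for everyone.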

Whilst data governance supports all the other data management disciplines, one of the primary reasons it is implemented is to support improved data quality. Having data governance in place enables a proactive approach to data quality issue resolution, fixing the source of the issue once. This stops the endless cycle of data cleansing and the debates about which data is truly correct.

But why start with data governance?

Well, a key part of data governance is a data catalog. A catalog helps us understand and define the data we have. After all, how can we define the data quality rules if no one can agree on what that data is in the first place? Everyone thinks that they know what the data is, but it isn’t until you start documenting data definitions that you realize that people have different views. These different views can cause a host of issues and inefficiencies and need to be exposed and resolved.

Having data governance in place first ensures that the right people are making decisions about the data quality rules. This is especially important for data used by multiple people across an organization. Without a data owner, many teams make conflicting decisions about what is acceptable for the quality of a data product.

It's easy to say we should do data governance first, but it’s not always that simple. 

If you work in a distributed organization, it may not be easy to agree on one data owner for widely-used data. It may seem easier to let everyone do their own thing, but it does not lead to positive, long-term results. It’s important to design a structure of Data Owners and Data Stewards that can work across organizational silos and encourage communication, so that any decisions about data quality account for everyone’s requirements.

Unfortunately, not many organizations have the luxury of being able to implement data governance before they commence data quality activities. The one exception is start-ups. I've spoken with people in several start-ups over the years and whilst they are starting from a clean slate, they are usually constrained by the need to be as cost-efficient as possible. This means that they are not always open to defining data before they start using it.

So how can you overcome these obstacles?

You need to tie the implementation of data governance and data quality activities to overall company objectives. Explain that if you implement solid data management practices now, you can avoid future issues and, more importantly, increase the organization’s chances of success.

Remember, data governance may sound like a compliance activity, but in reality, when done properly, it delivers real business value. If you are working at a start-up, get it done early; if you are not, get it implemented as soon as possible. Implementation benefits include enabling easier automation, reducing inefficiencies and costs, streamlining customer experiences, and improving risk management. Further, we must not forget that data governance and good data quality are essential for the successful adoption of AI, but more on that another time.

Whatever the size and age of your organization, and whether or not you have the budget for tools to support your efforts, you can, and should, implement data governance. With all the benefits to be had, it would be foolish not to!

Here are some examples of the pitfalls you should avoid during implementation:

  • Leaving it too late: Too often, data governance only makes it onto the agenda when data quality issues have caused significant harm, or when new systems have failed due to poor data quality.
  • Siloing responsibility: Many business users believe (mistakenly) that IT owns all data. Use your data governance roles to make it clear who is responsible for what. Done well, data governance breaks down silos and improves collaboration within the organization.
  • Applying short-term fixes over long-term strategy: Data governance can take a long time to implement and deliver value, so the tendency to opt for quick fixes is understandable. However, short-term tactical fixes often have to be repeated regularly, leading to a culture where “data wrangling” becomes the norm. We need to find ways of delivering value while developing and implementing a comprehensive Data Governance Framework to ensure proactive and ongoing data quality.

If your organization has a Data Governance Framework and you have already begun monitoring and testing data quality, there are different pitfalls to avoid:

  • Not involving the Data Governance Team: This team helps ensure that your data quality activities adhere to holistic practices and involve all the correct stakeholders.
  • Leaving it all to the Data Governance Team: They do not know everything about all the data. It’s important that multiple people in the wider business, including Data Stewards and Data Analysts, are involved.
  • Communicating poorly: A lack of communication between those running data quality initiatives, the Data Governance Team, and Data Stewards can result in misunderstanding and conflicting priorities. 

It can be difficult to convince your organization of the value of adopting both data governance and data quality practices. Adrian Smith, Head of Data at Clearspring, has faced this challenge and shares his experience and advice.

Case Study: Clearspring

“In every organization I've encountered, there seems to be a universal truth about data quality: it's often overlooked until a problem arises. Whether it's discrepancies in reports, processes failing, or data output resembling more gibberish than valuable information, the issue remains the same. Initially, in start-ups or scale-ups, the data pool is small enough that anomalies and outliers can be manually corrected. However, as the organization expands, the volume of data balloons, and the complexities of managing it multiply at a pace that outstrips both the implementation of Data Governance Frameworks and the expansion of data management resources.

This growth phase often sees resources stretched thin, as the focus is squarely on supporting the burgeoning business—why worry about tomorrow when today's challenges are pressing enough? Yet, this approach sows the seeds for future problems. Larger, more mature organizations have learned this lesson the hard way and have responded by establishing roles like data champions or data stewards, who assume ownership and responsibility for data, treating it as the critical asset it is. In contrast, in many start-ups, data is initially seen as a by-product of operational processes, not an asset in its own right. Ownership falls by default to the IT department, who are expected to 'fix' data quality issues with temporary solutions that don't address the underlying problems, leading to a cycle of recurring issues.

Here lies a common pitfall: neglecting the establishment of a robust Data Governance Framework until data quality issues become unmanageable. The lack of clear data ownership and a holistic, business-wide strategy for data management means that when problems arise, solutions are often reactive rather than proactive, short-term rather than strategic. Another pitfall is viewing data as an IT issue rather than an organizational asset, leading to decisions that fail to leverage data's full potential to drive business growth and success.

As data managers, we embody the spirit of the Roman God Janus, looking in two directions at once. We must manage the current data landscape, addressing immediate data quality issues, while also laying the groundwork for robust data governance that will prevent such issues in the future. This dual focus is critical in environments where resources are limited but ambitions are high. We are tasked with maximizing what can be done today while planning for a future where data governance not only supports but enhances business operations.

Adopting a proactive stance on data governance from the outset can transform data from a potential liability into a significant asset. It enables scalability, supports informed decision-making, and fosters an organizational culture that values data quality as a cornerstone of success. Thus, while it may be tempting to postpone data governance initiatives in favour of more immediate concerns, the most significant challenge—and opportunity—lies in recognizing that investing in data governance is not just about avoiding future problems; it's about enabling future successes. By embracing this challenge, we ensure that our organizations can grow and innovate without being held back by the burdens of technical debt and missed opportunities.”

Think strategically and sustainably

The dilemma of prioritizing data quality or data governance will always cause friction. While it is preferable to establish data governance before addressing data quality, in practice, the opposite usually happens.  

If you don’t have any data management practices in place yet, I would encourage you to think strategically and get started with data governance first to provide a strong foundation for your data quality activities. 

If you are already testing for data quality but have not initiated any data governance, take a hard look at how data governance can add value and sustainability to your existing data quality activities.

And if you are lucky enough to have data governance in place already, make sure that your data quality activities are aligned with your Data Governance Framework and that you avoid the pitfalls listed above. 

If you’re ready to dive in, access a checklist of steps to ensure alignment between your data governance and data quality activities.

The Data Governance & Data Quality Checklist

  • Identify why your organization needs data governance and good-quality data. I’ve seen several drivers for organizations, such as: improved decision-making, better compliance, improved risk management, increased efficiency and productivity, improved customer satisfaction, reduced costs and data-driven innovation. Overall, data governance and good-quality data are essential for organizations to manage data assets, minimize risk, comply with regulations, and unlock the full potential of data for decision-making and innovation.
  • Assess corporate strategies to identify where data is not good enough to meet key objectives. Look, and you may find discrepancies in operational processes and customer interactions due to inaccurate data. Addressing these issues by improving data quality will, in turn, drive growth through optimized operations and improved customer experience.
  • Interview stakeholders at all levels of seniority across the business to identify data quality horror stories. A horror story I see time and time again is when a critical financial report is sent to the C-suite with outdated figures due to a bad data integration process, leading to a disastrous decision based on incorrect information.
  • Determine any regulatory requirements for data governance in your industry. In the healthcare sector, study regulations like HIPAA (Health Insurance Portability and Accountability Act); financial institutions will surely need to navigate through the complexities of Basel III; and, as a cross-industry regulation, carefully consider GDPR (General Data Protection Regulation) requirements for data governance and quality.
  • Identify key drivers for starting data governance and data quality initiatives within your organization. For example, in response to compliance requirements such as GDPR or any other industry regulation, aim to avoid penalties and reputational damage.
  • Interview a variety of stakeholders across your organization to identify any existing data quality or data governance activities. I’ve often found multiple instances of ad hoc data cleaning processes compensating for systemic issues or data silos.
  • Review any existing data governance, data quality, and data architecture documents to identify existing or previous data governance-related activities. Examine policy documents that may show previous attempts to establish data quality standards or Data Governance Frameworks. Study audits of historical data quality reports that may shed light on recurring issues or trends.
  • Determine whether existing and previous activities are/were successful and the reasons why or why not. Use quantitative metrics like data accuracy rates or compliance levels as tangible indicators of success, and combine them with qualitative feedback gathered from stakeholders through good old-fashioned conversations.
  • Use the knowledge you gain from the above-listed actions to design a Data Governance Framework and an approach to data quality best practices that incorporates and aligns existing activities to produce a plan with sustainable value. I always make sure that the framework incorporates a clear policy and defined roles and responsibilities. The goal is to align with the strategic objectives of your organization while fostering a culture of data ownership. Remember that achieving success isn’t a one-time task: you’ll need to implement regular monitoring, feedback loops, and continuous improvement mechanisms to ensure its ongoing relevance and value to the organization.
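
The quantitative metrics mentioned in the checklist can start out very simply. The sketch below computes two pass rates over a handful of records: completeness (is a value present?) and validity (does it match an agreed rule?). It is only an illustration under stated assumptions: the sample records, the `rate` helper, and the email pattern are hypothetical, and in practice the rule itself would be agreed with the relevant data owner. A true accuracy rate would also need a trusted reference to compare against, which this sketch does not attempt.

```python
import re

# Hypothetical rule: a deliberately simple email pattern, assumed for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def rate(records, field, is_valid):
    """Return the share of records (0.0 to 1.0) whose field passes the given check."""
    if not records:
        return 0.0
    passed = sum(1 for r in records if is_valid(r.get(field)))
    return passed / len(records)


# Invented sample data.
sample = [
    {"email": "ann@example.com"},
    {"email": "bob@example"},  # present but fails the assumed pattern
    {"email": None},           # missing
    {"email": "cy@example.com"},
]

completeness = rate(sample, "email", lambda v: v is not None and v != "")
validity = rate(sample, "email", lambda v: bool(v) and EMAIL_PATTERN.match(v) is not None)

print(f"Completeness: {completeness:.0%}, validity: {validity:.0%}")
# prints "Completeness: 75%, validity: 50%"
```

Tracking figures like these over time, alongside stakeholder feedback, gives you a baseline for judging whether existing or previous activities actually worked.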

What’s next? 

  1. Discover Soda’s data quality platform to operationalize your data governance program.
  2. Learn more about Nicola’s data governance training program. Quote ‘Soda’ to receive a discount.

Good luck!