The Datamasters: Data Owners vs. Data Stewards vs. Data Custodians
By Ben Herzberg
Chief Scientist
November 11, 2021
There is no such thing as a one-size-fits-all data governance framework that works for all
organizations. However, one idea applies universally, regardless of an organization’s scale
or industry: having well-defined roles and ensuring that all stakeholders understand the
overlaps and differences between those roles is crucial for the success of any data
governance initiative. Let’s simplify your path through the data governance maze. In this
post, we will clear up the confusion surrounding the different roles central to data
governance and look at examples of how these roles may look in practice across varied
organizations. Most importantly, we will examine why this information is so essential and
why you should care. In this article, we will discuss:
Why Does Data Governance Even Matter?
The Three “Data Masters”
The Data Owner
The Data Steward
The Data Custodian
The Differences Between Data Governance Roles
Data Owner vs. Data Steward
Data Owner vs. Data Custodian
Real-World Examples of Data Steward Roles
Does It Really Matter What They Are Called?
Takeaways
Why Does Data Governance Even Matter?
First, let’s provide some context: Most people think of data becoming a valuable resource
for organizations (and being widely viewed in that light) as a relatively recent
development. After all, the term “big data” was only coined in 2005. So, it is easy to
forget that the implicit recognition of the value of data is at least as old as civilization itself,
if not older. As early as 40,000 years ago, ancient tribes maintained “tally sticks” to store
and analyze data about food and harvests in order to predict how long their supplies would
last. Hammurabi is known to have collected detailed statistics about enemy troop
movements and the strength of their forces, often refusing to deploy his troops without
having this data. That said, the sheer scale of the data we work with today dwarfs
everything that came before by several orders of magnitude. Consider an example
from the realm of medicine—specifically cancer. One of the most promising breakthroughs
today in treating cancer is the ability to map the genomes of cancer cells to identify
mutations and determine the most appropriate treatments. As Dr. Heath explains in her
brilliant TEDx Talk, the entire genetic profile for a single cancer patient can amount to
about one petabyte (a little over one billion megabytes) of data. To put this number in
perspective, a typical photo you take on your phone is about six megabytes in size. To get
to one petabyte worth of photos, you would need to take about 179 million photos, or
roughly 9,800 photos per day for the next 50 years. And that’s just the data for one patient.
Annually, there are an estimated 19.3 million cases of cancer worldwide on which we
collect data today. Data of this magnitude can be astonishing, but data is only useful to
the extent that it can be managed effectively: professionals must be able to store, access,
and analyze it as needed, in a way that takes full advantage of the high processing speeds
possible with modern technology. In other words, the method used to manage the data
should not limit the data’s potential, which, in this example, carries literal life-or-death
stakes.
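The photo comparison above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original talk: it assumes a binary petabyte (2**30 megabytes, which matches "a little over one billion megabytes") and 365-day years, and the variable names are purely illustrative.

```python
# Sanity check of the photo comparison.
# Assumptions (not from the source): a binary petabyte of 2**30 MB
# and 365-day years with leap days ignored.
MB_PER_PETABYTE = 2**30   # 1,073,741,824 MB
PHOTO_SIZE_MB = 6         # typical phone photo, per the text
YEARS = 50
DAYS_PER_YEAR = 365

photos_needed = MB_PER_PETABYTE / PHOTO_SIZE_MB
photos_per_day = photos_needed / (YEARS * DAYS_PER_YEAR)

print(f"Photos to fill one petabyte: {photos_needed:,.0f}")
print(f"Photos per day over {YEARS} years: {photos_per_day:,.0f}")
```

Under these assumptions, the result is just under 179 million photos, or a little over 9,800 photos per day.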
Is Your Organization’s Data an Asset or a Liability?
Data may play a very different role in your organization than in the example above, but
regardless of its specific purpose, most business executives now agree that data is among
their organization’s most valuable resources and that it is important, if not essential, to
business success. Even so, it remains relatively common for organizations to operate
without good Data Governance practices. Organizations often have a lot of data, but it is
not well documented or standardized, so they lack knowledge about the information they
have. Or, even when they do know, they encounter barriers to finding or accessing the
appropriate data when they need it (which is worse than not having the data in the first
place because you pay the cost for data collection and storage but do not reap the
benefits of it). Further, when organizations can find their data, they are often not entirely
sure whether it is reliable enough to use. We have all laughed at anecdotes where an 85-
year-old retiree receives promotional flyers inviting them to explore the back-to-college
collection because the sender could not get their data right. However, things are not as
amusing when the wrong person gets a traffic ticket or a summons to appear in court
because of biased data. They are definitely not funny when they involve data exposure,
especially of a private nature. For organizations in the modern regulatory environment,
poor data governance can transform data from an asset into a liability, exposing the
business to severe, even crippling, privacy penalties. The moral of this story is
that data can only be valuable when we know how to use it, manage it properly, and give it
the respect it deserves. As an organization, adhering to this standard involves having a
comprehensive and well-established protocol in place on how to manage data and, equally
importantly, having a team of people who understand their specific roles and
responsibilities in implementing these practices. Read More:
Blog: Why Data Ownership is Hard!
Blog: When Does RBAC for Data Access Stop Making Sense?
How Satori helps control access to sensitive data
The Three Key Roles in Data Governance
If you have researched data governance implementation in the past, you have surely
already come across many roles, ranging from the mundane-sounding managers and
librarians to the exotic “ambassadors” and “champions.” Here are the three most important
roles that any organization needs to understand in the context of data governance:
Data Owner
Data Steward
Data Custodian
It’s worth noting that these Data Governance roles seldom correspond to a distinct,
exclusive job title. In most cases, you are not going to be hiring a person into a new
position. Rather, your existing team members will take on various data governance
responsibilities, and these terms describe those different sets of responsibilities. Here is
a quick overview of each role before we examine what they look like in practice across
organizations of various sizes.
What Is a Data Owner?
A Data Owner is the person accountable for the classification, protection, use, and quality
of one or more data sets within an organization. This responsibility involves activities
including, but not limited to, ensuring that:
The organization’s Data Glossary is comprehensive and agreed upon by all
stakeholders
A system is in place for auditing and reporting data quality
An escalation matrix is in place for data quality issues
Actions are taken to resolve data quality issues within a defined timeframe
Most Data Governance experts maintain the view that there should only be one Data
Owner for a given data set. In cases where multiple stakeholders are concerned with the
same set of data, it is important to designate one individual who will assume the Data
Owner role, and then they may consult and collaborate with other stakeholders as closely
as necessary. To fulfill the obligations listed above, a Data Owner needs:
The authority to make any changes required in terms of workflows, practices, and
infrastructure to ensure data quality
The resources to initiate actions for ensuring data quality, such as data cleansing and
data audits
In practice, this means that the Data Owner role has to be assigned to someone relatively
senior, typically in upper management. Without adequate authority and access to
resources, a Data Owner will be ineffective at fulfilling their role, and this shortcoming
cascades down the entire Data Governance chain, defeating the whole initiative. However,
most senior management figures do not necessarily understand the finer technical details
about a data set or its management. They are also almost always constrained for time,
meaning that they cannot realistically implement all of the processes required for a Data
Governance framework to be effective. That’s where Data Stewards come in.
What Is a Data Steward?
A Data Steward is a subject expert with a thorough understanding of a particular data set.
The Data Steward is responsible for ensuring the classification, protection, use, and
quality of that data, in line with the Data Governance standards set by the Data Owner.
Note that “subject expert” does not necessarily mean someone with an IT background.
Depending on the nature of an organization’s data and business, a subject expert might
have experience in business, operations, IT, or a project-specific function. Typically, the
Data Owner appoints a Data Steward.
Depending on the scale of an organization and its data, one or more Data Stewards may
be appointed to assist the Data Owner in implementing the organization’s Data
Governance policies.
What Is the Role of a Data Steward?
Data Stewards play a crucial part in ensuring that the data in their care is of high quality
and is fit for use by all data stakeholders in the organization who are concerned with that
set of data. Some organizations also describe this role as a “Data Quality Steward.” A good
Data Steward must have the ability to see beyond silos and implement rules and
processes for the data under their care. Although they do not own the data, they must
thoroughly understand how that data needs to be documented, stored, and protected. As
David Plotkin explains in his book, Data Stewardship: An Actionable Guide to Effective
Data Management and Data Governance, there are four distinct types of Data Stewards:
Business Data Stewards
Operational Data Stewards
Technical Data Stewards
Project Data Stewards
As mentioned earlier, each of these types reflects the steward’s functional background
in the organization. When an organization requires different types of Data Stewards, or
multiple Data Stewards of the same type, for a common set of data, those stewards
must often work together to ensure effective Data Governance. In many cases, a Data
Steward may not necessarily have the expertise to manage the data’s storage, retrieval,
and formatting. This brings us to our next role: the Data Custodian.
What Is a Data Custodian?
A Data Custodian is responsible for implementing and maintaining security controls for a
given data set in order to meet the requirements specified by the Data Owner in the Data
Governance Framework.
The Differences Between Data Governance Roles
Role titles are useful because they allow individuals both within and outside an
organization to quickly get a sense of the role’s responsibilities. Unfortunately, because
data can be quite abstract, there is a lot of confusion surrounding the titles of the different
roles associated with Data Governance. Let’s uncomplicate it.
Data Owner vs. Data Steward
Given that Data Stewards are appointed to assist a Data Owner in implementing the Data
Governance policies, there is a fair bit of overlap between their profile descriptions.
So What Is the Difference Between a Data Owner and a Data Steward?
A Data Owner is accountable for Data Governance outcomes, whereas a Data Steward is
responsible for the Data Governance tasks required to achieve those outcomes. In other
words, the Data Owner role is results-focused, while the Data Steward role is
task-focused. For instance, a Data Owner might be accountable for data excellence
metrics, such as audit findings and quality scores. They may also be accountable for
business metrics, like the impact of Data Governance on strategic goals (for example, the
effect that the quality of customer data has on the success of a direct mail campaign).
By contrast, a Data Steward might be responsible for ensuring that all items on a Data
Governance checklist are implemented and that problems in implementation are
prevented and/or resolved in a timely manner.
Does Your Organization Need Both Data Owners and Data Stewards?
Whether your organization needs both roles depends on the scale and scope of your Data
Governance program. Large organizations most likely need both roles, while, in smaller
businesses, the Data Owner and Data Steward can be one and the same person.
Data Owner vs. Data Custodian
A lot of people confuse Data Custodians with Data Owners. This misconception probably
arises because Data Custodians are often the ones physically or directly handling the
storage and security of a data set. But the fact that data is stored on a device someone
controls does not make that person the Data Owner. The data may be in their drawer, but
that doesn’t make it theirs. A good way to think about this is in terms of money in a bank:
when you deposit your money, the fact that it is stored in the bank’s vault does not make
the bank the owner of that money!
So What Is the Difference Between a Data Owner and a Data Custodian?
A Data Owner is an individual, usually in a senior business role, who is accountable for the
classification, protection, use, and quality of one or more sets of data. A Data Custodian is
typically someone in an IT role who is responsible for maintaining the storage and security
infrastructure for one or more data sets in a manner that meets the requirements of the
organization’s Data Governance policy. In small organizations where the roles of Data
Owner and Data Steward may be held by a single individual, the Data Owner is likely to
directly delegate day-to-day tasks (e.g. backups) to Data Custodians.
Real-World Examples of Data Steward Roles
To understand how Data Stewardship plays out in practice, let’s look at a couple of real-
world examples of these roles in different organizations.
Data Stewardship in a Retail Chain
A high-end retail chain lets customers enter a sweepstakes by dropping their business
cards in the contest boxes located in each store. By providing their personal data and
participating in the contest, customers consent to receive the chain’s promotional
marketing emails. Starting from the bottom up, in this scenario:
A back-office employee collects and manually records each customer’s data in the
company’s database. This individual is not a Data Owner, Steward, or Custodian, but
rather they are simply a Data Creator.
The customer data is stored on a cloud server, and an IT administrator is the Data
Custodian who must ensure the data is secure and accessible only to authorized
personnel.
A person on the digital marketing team is responsible for cleaning and validating the
data set before using it in email marketing campaigns. They are appointed the Data
Steward, responsible for ensuring the quality of email marketing data through
systematic formatting, cleaning, and enriching procedures as specified by the Data
Governance policy.
The Head of Sales is accountable for sales targets and is very invested in the success
of marketing campaigns. They are designated the Data Owner for this data set
because they are in a senior position with insight into the organization’s goals and have
the authority and resources to make decisions to improve data quality and security
(e.g., by investing in technology to automate data capture and digitization or by
enforcing authentication safeguards that control access to the data).
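The role map in this retail example can be written down explicitly, which also makes it easy to enforce the one-owner-per-data-set rule discussed earlier. The following is a minimal sketch under stated assumptions: the `DataSetRoles` class, the `contest_contacts` name, and the use of job titles in place of named individuals are all hypothetical and do not come from any particular governance tool. The Data Creator is left out because it is not one of the three governance roles.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    OWNER = "Data Owner"
    STEWARD = "Data Steward"
    CUSTODIAN = "Data Custodian"


@dataclass
class DataSetRoles:
    """Role assignments for one data set (hypothetical structure)."""
    name: str
    assignments: dict[Role, list[str]] = field(default_factory=dict)

    def assign(self, role: Role, person: str) -> None:
        holders = self.assignments.setdefault(role, [])
        # Accountability is not shared: allow only one Data Owner per data set.
        if role is Role.OWNER and holders and person not in holders:
            raise ValueError(f"{self.name} already has a Data Owner: {holders[0]}")
        if person not in holders:
            holders.append(person)

    def owner(self) -> str:
        holders = self.assignments.get(Role.OWNER, [])
        if not holders:
            raise LookupError(f"{self.name} has no Data Owner")
        return holders[0]


# Role map for the retail contest example (job titles stand in for people).
contest_contacts = DataSetRoles("contest_contacts")
contest_contacts.assign(Role.OWNER, "Head of Sales")
contest_contacts.assign(Role.STEWARD, "Digital marketing team member")
contest_contacts.assign(Role.CUSTODIAN, "IT administrator")

print(contest_contacts.owner())
```

Stewards and custodians can be added freely, while an attempt to assign a second Data Owner raises an error, so the conflict is visible as soon as someone tries to record it.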
Data Stewardship in a Manufacturing Business
In a contract manufacturing company, the Production Manager is designated as the Data
Owner for all production data. In turn, the Data Owner appoints several Data Stewards as
follows:
Production Shift Supervisors are Data Stewards for material usage, cycle time, and
part output data
Maintenance Engineers are Data Stewards for machine performance, availability,
breakdown, and time-to-repair data
Production Planners are Data Stewards for utilization and efficiency data
The Quality Lead is the Data Steward for defect and rejection data
Each of these Data Stewards is responsible for the quality of the data in their care,
including its capture, storage, security, and availability for concerned stakeholders. It’s
important to note that this structure will not necessarily work for all manufacturing
companies. Even when companies are competitors engaged in the same activities, their
business goals and internal processes are likely to be quite different, which may require
a significantly different map of Data Governance roles. Also, in this example,
the Production Data Stewards, Planning Data Stewards and Maintenance Data Stewards
all need access to data that is generated by the same set of machinery. But this data is
captured and stored in a local server, which is operated and managed by the
organization’s IT department. An individual in that department is appointed as Data
Custodian.
Does It Really Matter What They Are Called?
In some organizations, people still find themselves confused by role titles, despite
having clear definitions in place for each Data Governance role and its respective
responsibilities. There may even be resistance within an organization to some titles. In
such cases, it may be more productive to change the role title to whatever people find less
confusing and more acceptable. Ultimately, it doesn't really matter what each person on
the Data Governance team is called — as long as there is clarity across the organization
on what needs to be done and who is supposed to do it.
Takeaways
Data on its own does not solve problems or add value; effective management and
application of data do.
Unsystematic approaches to managing data can quickly turn data into a liability for an
organization, rather than an asset.
Properly leveraging data as an asset and implementing measures that benefit the
enterprise requires support, buy-in, and involvement at the executive level.
To fulfill their job functions well, many employees who use a data set in an organization
are dependent on others further upstream to process the data correctly, which cannot
be ensured without well-established Data Governance practices.
A key requirement for effective Data Governance is to implement a system with
transparent roles and responsibilities and clear definitions about:
Who is allowed or obliged to take which actions
What specific data sets they are allowed or obliged to act on
When (i.e., in which specific situations) they are allowed or obliged to take such
actions, and
What methods they are allowed to use
While data can be a resource shared by several stakeholders, accountability for Data
Governance is never shared: it is solely the Data Owner’s responsibility. Data
Stewards may have some overlap in responsibilities, but these need to be defined with
clear matrices for escalation in the event of problems.
High data accuracy and strong data management are a team effort. Managing data with
an inclusive approach and distributing responsibilities across traditional boundaries
allows for superior data quality.
Better data quality presents opportunities for improved analytics and increased
business exploration.
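The takeaway about defining who may act, on which data sets, in which situations, and by which methods can be made concrete as a small rule table. This is a minimal sketch only: the `Rule` fields mirror those four questions, and every role, action, data set, situation, and method name below is invented for illustration rather than taken from a real policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One governance rule: who may do what, on which data set, when, and how."""
    who: str        # role allowed (or obliged) to act
    action: str     # what they may do
    data_set: str   # which data set the rule covers
    when: str       # situation in which the rule applies
    method: str     # permitted method


# Illustrative rules only; all names and values are made up.
RULES = [
    Rule("Data Steward", "correct_records", "customer_emails",
         "quality_issue_reported", "approved_cleansing_script"),
    Rule("Data Custodian", "restore_backup", "customer_emails",
         "data_loss_incident", "documented_restore_procedure"),
]


def is_allowed(who: str, action: str, data_set: str, when: str, method: str) -> bool:
    """Return True only if some rule matches all five fields exactly."""
    return Rule(who, action, data_set, when, method) in RULES


# A steward fixing records through the approved method is covered by a rule.
print(is_allowed("Data Steward", "correct_records", "customer_emails",
                 "quality_issue_reported", "approved_cleansing_script"))
# A custodian attempting the same correction is not.
print(is_allowed("Data Custodian", "correct_records", "customer_emails",
                 "quality_issue_reported", "approved_cleansing_script"))
```

Matching on every field at once is a deliberately strict design choice: anything not spelled out in a rule is treated as not permitted, which keeps gaps in the policy visible instead of silently allowed.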
How Satori Helps Data Owners, Data Stewards, & Data Custodians
Satori allows the “data masters” of an organization to grant access to data without
requiring any help from IT or Data Engineering teams. In addition, they can each tag and
describe their data sets, even when those data sets are scattered across several data
platforms. Further, Satori provides continuous sensitive data discovery so that these
professionals know exactly when new sensitive data is introduced. Finally, Satori lets
them create security policies, including fine-grained security policies, without the need for
implementation by data engineers.
About the author
Ben Herzberg | Chief Scientist
Ben is an experienced tech leader and book author with a background in endpoint security,
analytics, and application & data security. Ben has held roles such as CTO of Cynet and
Director of Threat Research at Imperva. Ben is the Chief Scientist for Satori, the
DataSecOps platform.