The Importance of Policies and Processes for Governing Your Data
The work of defining and implementing policies and processes could be a book all on its own. Often, organizations we work with are overwhelmed or confused about “policies and processes.” So in this blog post we will talk about some common categories for organizing your thoughts about data governance policies and processes, and provide some examples to get you thinking about where you might start.
We often find ourselves using home building analogies for some reason. Maybe it’s because we build things, whether it’s a database, a data warehouse, or a dashboard or other visualization. Maybe home building is just a handy reference no matter what you are doing. Building data things is like doing plumbing and electrical work. The analogy fits the hands-on work of creating data models and pipelines, connecting APIs to data stores, and connecting data stores to BI tools.
You can be the best plumber, build solid leak-proof connections, or similarly, be a wizard at rewiring a house. Now imagine that you have a lot of plumbers and electricians working on a house. You need a blueprint; that’s your data strategy. But you also need an agreed-upon understanding of what the best practices are for all the plumbers and electricians, so that you know they are following a plan, that they understand why that’s the plan, and that they all act in similar ways. Not to extend the metaphor too far, but they should agree and act on standards for when to ground a wire and when not to, when to use PVC and when to use copper piping, and when to tear out a wall and when you can avoid that.
Ok, enough with the building, but I suspect this analogy will come back to us time and again. There are brilliant architects and home designers, but the work we tend to focus on is the nuts and bolts of what, why and how beyond that.
We can think about processes in the planning phase, the building phase, and the sustaining phase. These phases aren’t entirely distinct from each other, but it’s a way to think about the work. For this blog post we will focus on the planning phase. What should you do to get ready for making those decisions, and to prepare for creating and implementing policies related to data governance?
Policies and Processes in the Planning Phase
You could just start making or improving policies at any point. And that’s sometimes what happens. But given the number and types of tasks, departments, and skill sets that are required to establish and improve data governance, and the importance of collaboration and communication, there are some things to think through before you begin that will make the process go more smoothly.
Planning often gets a bad rap for not being the "real" work of handling data. But if we continue with the construction metaphor, the initial steps outlined below are like getting an inspection of your critical infrastructure before you start changing things.
Determine compliance and privacy requirements
One place to start with data governance planning is looking at your compliance and privacy requirements. Once you know these requirements, and in particular how they relate to your sensitive data, you can determine how they will impact your policies. You want to begin by identifying the data that most needs protection. The need for protection could be imposed externally, such as with HIPAA requirements, or internally, based on what your organization considers sensitive or valuable data. Of course we recognize that this can be VERY important (and stressful) for organizations who worry about being fined. We encourage you to consider compliance requirements related to HIPAA, FERPA and the handling of financial data as simply entry points for implementing good data governance practices.
Identifying what your sensitive data is (whether it’s credit card numbers, health records, or the home addresses of domestic violence survivors) is an important component of cataloging and tagging data. Identifying this class of data prepares you, in the implementation phase, to incorporate it into your metadata tagging policies. It will also inform how you inventory and catalog data sets, and it will inform policies on access and roles.
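To make the idea of tagging sensitive data a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sensitivity levels, the `FieldTag` structure, and the data set and field names are hypothetical, and your organization would define its own classes and reasons.

```python
from dataclasses import dataclass

# Hypothetical sensitivity classes; an organization would define its own.
SENSITIVITY_LEVELS = ("public", "internal", "restricted")


@dataclass(frozen=True)
class FieldTag:
    """One tag recording how sensitive a field is and why."""
    dataset: str
    field: str
    sensitivity: str  # one of SENSITIVITY_LEVELS
    reason: str       # e.g. "HIPAA" or "internal policy"

    def __post_init__(self):
        # Reject tags that use a level nobody has agreed on.
        if self.sensitivity not in SENSITIVITY_LEVELS:
            raise ValueError(f"unknown sensitivity: {self.sensitivity}")


def restricted_fields(tags):
    """Return (dataset, field) pairs that need the strongest protection."""
    return [(t.dataset, t.field) for t in tags if t.sensitivity == "restricted"]


# Illustrative tags only; these data sets and fields are made up.
tags = [
    FieldTag("clients", "home_address", "restricted", "internal policy"),
    FieldTag("clients", "health_record_id", "restricted", "HIPAA"),
    FieldTag("events", "event_name", "public", "none"),
]
```

Calling `restricted_fields(tags)` gives you a starting list for access and role discussions. A spreadsheet with the same four columns would serve the same purpose.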
Knowing what sensitive data you have will also help you make decisions about training staff on how to handle sensitive data. Later you will want to develop policies and communications around not having people print or download data they shouldn’t and other such activities. These activities support HIPAA and other types of compliance but may also be good general data governance policies for your organization anyway.
In the planning phase, you want to have discussions with leadership and data stewards to make sure you’ve identified all the sensitive data and its location. Keep an eye out for any shadow data, where someone might have a spreadsheet or other location for data that might not be incorporated into your primary systems. This is also a good time to think about sharing (receiving or providing) sensitive data with partners or collaborators.
Assess current risks to security and compliance
In identifying the data in the previous step, it’s important to also talk with people in the organization who understand the type of data or the compliance requirements (this could be your Chief Data Officer, your lawyer, or any person with knowledge of the compliance or regulatory requirements involved with that type of data) to understand the risks of NOT appropriately protecting your sensitive data. Before creating policies and processes, it’s good to understand the impact of not complying, or of not having good data governance practices around the type of data in question. For some organizations that might mean fines, a PR disaster, or a loss of trust. For others it is more of an ethical or values-based decision: no one may ever know that the data was handled inappropriately, but you have a duty to your members, clients, etc. to protect their privacy. Identifying the level of risk allows you to prioritize not only the sensitive data as a whole, but also subsets of it. This will allow you to set an appropriate level of access control for those data sets, and also to determine accountability measures for those who do not follow the data protection policies. Sanctions for staff and data partners may be more severe when the risks to the organization are higher for a particular data type.
Determine data quality requirements
Another clear and common-sense place to focus is on your data quality requirements. In the most basic sense, high quality data is data that you can analyze and trust. You won’t want to analyze or aggregate data if it is not measuring what’s important, is not accurate, has many different formats, or is missing. Remember that the goal of data governance is to have trustworthy data. Clean, high quality data is the first and arguably most important step toward that goal, and dealing with missing data is right up there, too.
In order to ensure data quality you must:
- Create standards around data for your organization
- Decide who is responsible for the data (overall, by business unit, by data set)
- Document and communicate the data standards across the organization
In the planning phase, you decide what data quality looks like and why, before you move on to creating policies and creating accountability around those policies.
Start by thinking about what your metrics are for success. What does it look like in your organization if you have the level of data quality you need? What is the impact of clean data in your organization and how much does it matter? (Remember that no organization has or needs “perfect” data, or 100% data quality.) If you have the level of data quality you need, it can facilitate sharing data and collaborating with different departments (or other organizations). High quality data helps people do their jobs, reduces the amount of time people spend hunting for information or manually cleaning data, and means that your dashboards and visualizations make sense.
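One way to turn “the level of data quality you need” into something you can check is to pick a simple metric and an agreed target for each important field. The sketch below measures completeness (the share of records where a field is filled in). The field names, sample records, and target values are all hypothetical, and completeness is only one of several quality measures you might choose.

```python
def completeness(records, field):
    """Share of records where `field` is present and not blank."""
    if not records:
        return 0.0
    filled = 0
    for record in records:
        value = record.get(field)
        if value is not None and str(value).strip() != "":
            filled += 1
    return filled / len(records)


def meets_target(records, targets):
    """Compare each field's completeness with its agreed target."""
    return {name: completeness(records, name) >= target
            for name, target in targets.items()}


# Made-up records and targets, for illustration only.
records = [
    {"client_id": "A1", "zip_code": "97201"},
    {"client_id": "A2", "zip_code": ""},
    {"client_id": "A3", "zip_code": None},
    {"client_id": "A4", "zip_code": "97210"},
]
targets = {"client_id": 1.0, "zip_code": 0.9}
```

Here `meets_target(records, targets)` reports that `client_id` meets its target and `zip_code` does not, since only two of the four records have a zip code. Note that the targets are below 100% where that is good enough, in keeping with the point that no organization needs perfect data.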
Decide who will execute processes (and how they will be held accountable for those processes)
The planning phase is a good time to decide who will decide what the policies and processes will be around data quality, and who will execute those processes once created. Deciding “who” is part of the roles and responsibilities work we discussed in our blog post on people and data governance, as is the work of holding people accountable. In the planning phase you want to actively acknowledge that you are creating a culture of data governance by making the work something other than an add-on to existing job duties, and that consistent practices are the only way to have consistent data quality.
Inventory and assess your data
You can’t figure out how to manage your data if you don’t know what you have and where it is. Tagging and cataloging your data is a process that will take place throughout the data governance lifecycle, and it’s often the place you start with planning and making decisions about how you will handle your data. Each of these tools for assessing and managing your data will be covered in separate posts, but we want to identify them here as an essential component in the data governance framework.
A data catalog is a detailed inventory of ALL the data in your organization. The purpose of the data catalog is to help data folks quickly find the most appropriate data for any analytical or organizational purpose.
A data catalog is also where you store your metadata (data that describes your data). It tells you where the data is stored, can document what systems are feeding into what other systems, and can also include the class of data, the owner of the data, and the source of data. More technical data catalogs might include information regarding table schemas and names or other information relevant to the database build. Most data catalogs are not so technical that they can’t effectively be used by non-technical but data savvy organizational users.
This is also a good time to plan for inventorying and archiving unused data at least once per year, so that you keep what you need but nothing more.
Using a data catalog you can:
- Search for data by context and description
- Find relevant data stories
- Request access (with a justification)
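As a rough illustration of what a catalog entry and a simple search might look like, here is a minimal Python sketch. The `CatalogEntry` fields mirror the metadata described above (location, owner, source, class of data, and what feeds into what), but the structure, the names, and the sample entries are assumptions made for this example, not a reference to any particular catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one data set in the catalog."""
    name: str
    description: str
    location: str     # where the data is stored
    owner: str        # who is responsible for it
    source: str       # where the data comes from
    data_class: str   # e.g. "restricted" or "internal"
    feeds_into: list = field(default_factory=list)  # downstream systems


def search(catalog, term):
    """Return names of entries whose name or description mentions `term`."""
    term = term.lower()
    return [entry.name for entry in catalog
            if term in entry.name.lower() or term in entry.description.lower()]


# Hypothetical entries, for illustration only.
catalog = [
    CatalogEntry("client_intake", "Intake forms for new clients",
                 "CRM database", "Program team", "Web form", "restricted",
                 feeds_into=["reporting_warehouse"]),
    CatalogEntry("donations", "Gifts and pledges by donor",
                 "Fundraising system", "Development team", "Staff entry",
                 "internal"),
]
```

With this in place, `search(catalog, "intake")` finds the `client_intake` entry by its name and description. Many organizations start with a shared spreadsheet that has the same columns before moving to a dedicated tool.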
A data dictionary is another way to organize and standardize your data. It documents information about a specific database and the fields within it. It can help with creating standard data formats across sources, and is also important in making sure that all the people working with the data are defining fields in the same way across the organization.
Data standardization can be thought of as a workflow that converts the structure of different data sets into a common format. Sometimes this takes place by hand, and sometimes by data cleaning tools.
For example, you might have people across the organization recording January 1 in all these ways:
- January 1, 2022
- Jan 1, 2022
When you go to organize and/or analyze the data, this causes problems.
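When standardization is done with a tool rather than by hand, it can be as simple as the sketch below, which converts the two spellings above into a single ISO 8601 format (YYYY-MM-DD) using Python’s standard library. The list of accepted formats and the choice of ISO 8601 as the target are assumptions for this example. Your own standard should come from your data dictionary.

```python
from datetime import datetime

# Formats we expect to see in the raw data; extend as new ones turn up.
KNOWN_FORMATS = (
    "%B %d, %Y",  # January 1, 2022
    "%b %d, %Y",  # Jan 1, 2022
)


def to_iso_date(text):
    """Convert a date string in a known format to YYYY-MM-DD."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    # Fail loudly so unexpected formats get reviewed, not silently guessed.
    raise ValueError(f"unrecognized date format: {text!r}")
```

Both `to_iso_date("January 1, 2022")` and `to_iso_date("Jan 1, 2022")` return `"2022-01-01"`. Anything that does not match a known format raises an error, which gives you a list of values for a person to review.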
A data dictionary can be as complex (like the one a database analyst or data engineer would use) or as simple as you need. We usually create a spreadsheet that includes information like the field name, whether it should contain numbers or letters, how long it can be, and what the valid values are. You can then share the data dictionary with others so everyone is on the same page, and update as decisions are made about new fields or changes to data formats.
Your data dictionary might also identify important fields that allow you to connect to other data sets such as:
- Unique IDs
- Social Security Numbers
- Data domains
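If your data dictionary lives in a spreadsheet, the same information can also be used to check records automatically. The sketch below is one possible way to do that in Python. The field names, length limits, and valid values are hypothetical, and the rules cover only the basics mentioned above (numbers or letters, maximum length, valid values).

```python
# A tiny, made-up data dictionary: one rule set per field.
DATA_DICTIONARY = {
    "client_id": {"type": "text", "max_length": 8, "valid_values": None},
    "state": {"type": "text", "max_length": 2, "valid_values": {"OR", "WA"}},
    "household_size": {"type": "number", "max_length": 2, "valid_values": None},
}


def validate(record):
    """Return a list of problems found when checking `record` against the dictionary."""
    problems = []
    for name, rule in DATA_DICTIONARY.items():
        value = str(record.get(name, ""))
        if rule["type"] == "number" and not value.isdigit():
            problems.append(f"{name}: expected a number")
        if len(value) > rule["max_length"]:
            problems.append(f"{name}: longer than {rule['max_length']} characters")
        if rule["valid_values"] is not None and value not in rule["valid_values"]:
            problems.append(f"{name}: not a valid value")
    return problems
```

A record such as `{"client_id": "A1", "state": "OR", "household_size": "3"}` passes with no problems, while one that spells out the state name or writes the household size as a word comes back with a list of issues to fix.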
Data Modeling: If Necessary/Helpful/Appropriate
A data model organizes data elements and standardizes how the data elements relate to one another. Data models aid communication between the people defining the requirements for a system and the people defining the design in response to those requirements. They are used to show the data needed and created by organizational processes.
A data model is fairly granular and often used as part of architecting for database development. Data analysts and engineers would use data models in a different way from decision-makers. Consider a data model if you are working to identify which data you need to pull from a data source, how you might integrate data across sources, or how your systems might need to relate or produce reports.
Also, keep in mind that you may need a less granular data model that outlines how your systems connect. Don’t get too bogged down in the difference between a data model, an entity-relationship model, and an architectural diagram. Find the level of detail that you need for the work you are doing, and map out the most important components. Data modeling is documentation but also communication, so there’s a balance.
Creating policies and processes is the heart of data governance
Proper preparation is the key to keeping people engaged and making sure you are focusing on the most important improvements. Determining your compliance requirements (and how you want to treat your sensitive data), determining your data quality standards, and conducting an inventory of your data are a few of the most important aspects to focus on as you are planning and getting ready to tackle the work of changing how your organization operates and creating more consistent practices around your data.
Look out for our next post where we talk about the implementation phase, where you implement policies and practices and build systems to have more trustworthy data.
And sign up for our newsletter to receive the next post in your inbox!