Data Governance Processes and Policies in the Execute and Build Phase
In the planning phase, you put a lot of thought into what policies and processes you need most and what you want them to look like. Now it’s time to put them to the test in the real world. That means subjecting all of your careful planning to the complex network of technical and social systems already operating in your organization and all of the contingencies therein. For established organizations, this is going to look a lot more like a home renovation than a new build. You want to knock out that wall? Sorry, it’s load-bearing. And by the way, the previous owner was a hobbyist who did their own wiring so you’ll really need to prioritize hiring an electrician so nothing catches on fire. But do not despair! You may need to make some compromises, but the result will enhance everything you loved about the house in the first place.
Create Data Policies, but Don’t Go Overboard
So, during the policy planning phase you cataloged your data, decided what was important, created a data dictionary, and modeled your data. (You did read our last posts, right?) So you know what data is important, how it all connects, what data security you need to address, and who should have access to what. Now it’s time to actually create policies with the people you engaged (those with technical and domain knowledge), drawing on what you learned in the planning phase.
Just so we are clear, creating and implementing policies is essentially about deciding and documenting. You will need to WRITE THINGS DOWN. Don’t let things live in someone’s head, or let each policy hang out with a different person. When turnover happens (or memories aren’t great) this will create chaos. Create some kind of workflow for identifying decisions, writing them down, and storing the information in a shared location.
Furthermore, remember that this is a “renovation”. DO NOT begin without first documenting what you already do. Then build on the documentation or policies that have already been created.
Below we will discuss some of the most common and useful policies that you should create for your organization. Remember to start where you are.
Data Sharing Policies
To use data effectively across the organization you must have clear policies and practices on data sharing that are consistent with security and privacy standards in your organization. This can be broken down into two general categories: sharing data within your organization and sharing data outside of it.
Start by creating documentation that describes how you share data within your organization. Things you might address in your data sharing policy include:
- How does a user request access to data? Is it automatically given to all employees? How about sensitive data?
- How do you track who has access to sensitive data?
- How will you revoke access to data, either because the user no longer needs it, or because the project has ended?
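If you don’t have a dedicated tool for this, even a simple log of access decisions can answer the questions above. Here is a minimal sketch in Python. The `AccessGrant` fields, the user names, and the dataset name are all hypothetical, and a shared spreadsheet with the same columns would serve the same purpose.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AccessGrant:
    """One documented decision: who may use which dataset, why, and until when."""
    user: str
    dataset: str
    reason: str
    approved_by: str
    expires: date                      # e.g. the planned project end date
    revoked_on: Optional[date] = None  # filled in when access is taken away

    def is_active(self, today: date) -> bool:
        # A grant stops counting once it is revoked or past its expiry date.
        return self.revoked_on is None and today <= self.expires


def active_grants(grants, dataset, today):
    """Answer 'who currently has access to this dataset?'"""
    return [g for g in grants if g.dataset == dataset and g.is_active(today)]


def revoke(grants, user, dataset, today):
    """Record a revocation instead of deleting the entry, so the history survives."""
    for g in grants:
        if g.user == user and g.dataset == dataset and g.is_active(today):
            g.revoked_on = today


# Hypothetical example data.
grants = [
    AccessGrant("ana", "client_intake", "Evaluation project", "data_owner", date(2025, 6, 30)),
    AccessGrant("raj", "client_intake", "Quarterly reporting", "data_owner", date(2025, 12, 31)),
]

today = date(2025, 7, 15)
# Ana's grant expired with her project, so only Raj is listed.
print([g.user for g in active_grants(grants, "client_intake", today)])
```

Giving every grant an expiry date means access ends by default when the project does, and someone has to make an active decision to extend it.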
You will also want to create policies for how you share data outside of your organization. Depending on the type of data, you may consider NDAs (non-disclosure agreements) at a minimum, or a Data Sharing Agreement or Memorandum of Understanding that spells out what data the other organization will receive, how it may use that data, how it should protect that data, and how and when it will destroy the data.
In all cases you will want to decide on roles and responsibilities for data sharing: for example, who decides who gets access and to what data, and who provides or revokes access. You should also decide and be clear about if and when people in or outside your organization may download data. We don’t recommend storing data on individual computers unless there is an express need to do so. If data is stored there (for statistical analysis, for example), it should be deleted as soon as is feasible.
Identity and Access Management (IAM) Policies
Identity and Access Management is the development of processes and policies to make sure the right people or machines get access to the right data, at the right time, for the right reasons. For example, staff can access sensitive data if they are working on that project, but only for the length of the project. Or individuals who are responsible for processing and delivering the data to BI tools can access all of the data, but end users of the BI tool can only access de-identified or aggregated data. IAM also involves making sure that people who should not have access are kept away from the data. This often involves revoking access from the appropriate users at the appropriate time.
There are two major components of IAM to consider.
Authentication - This is the component of IAM where you make sure that users are who they say they are, so that no one can impersonate the users who SHOULD have access to your data. Authentication relies on one or more of the following factors:
- Something you know (password)
- Something you have (cell phone or token)
- Something you are (biometrics, such as a fingerprint)
So, if you think about a typical two-factor authentication (2FA) process, you might log in with something you know, like your password, or something you are, like your fingerprint, and then be required (at what frequency is another decision) to authenticate with something you have, for example through a tool like Google Authenticator or the Microsoft equivalent. Identify what authentication is currently required for access to your data, document it, and decide if you want to make it more stringent or granular.
Authorization - This is the component of IAM where you determine what resources (data) a user may access, and how they are allowed to engage with the data. For example, once authenticated, a user may be able to only read but not alter data, to update data, or even to alter, copy, and delete data. You may not need granular authorization categories. For your organization, it might be appropriate that all users have all access, or that everyone except for the analyst has read-only access. Don’t overcomplicate it.
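To make the idea concrete, an authorization policy can be as small as a table of roles and the actions each role may take. The sketch below is one possible way to write that down in Python. The role names and action names are hypothetical placeholders, not a recommendation for your organization.

```python
# Hypothetical roles and the actions each one is allowed to take.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "update"},
    "analyst": {"read", "update", "copy", "delete"},
}


def is_authorized(role: str, action: str) -> bool:
    """Return True only if the role is known and the action is listed for it."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_authorized("viewer", "read"))    # True
print(is_authorized("viewer", "delete"))  # False
print(is_authorized("intern", "read"))    # False: unknown roles get no access
```

Note the default: a role that nobody has documented gets nothing. If your policy really is “everyone can do everything,” the table shrinks to a single role, and that is fine.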
Care and Feeding of Metadata
Metadata plays a role across many different aspects of data governance. You may want to create policies about who can tag and maintain Metadata, as well as standards for creating and storing Metadata, especially related to sensitive data. For example, you would use Metadata in the creation of a data catalog or data dictionary. But tagging data as sensitive will also impact how policies related to protecting data are applied.
Many organizations are passively collecting Metadata thanks to enterprise business tools. These same tools also have seemingly endless self-service knowledge management features that can enable inconsistent and highly customized Metadata at the team or even individual level. If you have ever witnessed a brief but intense period of crowd-sourced file tagging in your organization, Metadata may not seem very useful, or may even seem like a waste of time. One person's "report" is another person's "brief" is another person's "data set" (or "dataset"). That is why we recommend putting a policy in place to set intentions around when and how to create and use Metadata. Developing and communicating a policy also helps everyone filter out the noise and only spend their time curating Metadata that is actually valuable to the organization. For example:
- Good Metadata often plays a critical role in IAM. Consider policies for how data are tagged and who has access based on those Metadata tags.
- You may also choose to create policies here regarding when Metadata is captured (usually when data is captured or imported into your systems).
- Decide and document where Metadata is stored. This doesn’t have to be fancy; it could be within your data catalog, in a spreadsheet, or as part of your BI tool.
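As a rough illustration of how Metadata tags can feed into IAM, here is a small Python sketch. The catalog entries, tag names, and the rule itself (a user needs a matching clearance for every restricted tag on a dataset) are assumptions made for the example. Your own policy may look quite different.

```python
# Hypothetical Metadata records, e.g. rows exported from a data catalog or spreadsheet.
CATALOG = {
    "donations_2024": {"owner": "finance", "tags": {"financial"}},
    "client_intake": {"owner": "programs", "tags": {"sensitive", "pii"}},
}

# Assumed policy: tags that require an explicit clearance before access.
RESTRICTED_TAGS = {"sensitive", "pii"}


def can_view(dataset: str, user_clearances: set) -> bool:
    """Allow access when the user holds a clearance for every restricted tag."""
    entry = CATALOG.get(dataset)
    if entry is None:
        # Data with no Metadata record is denied until someone documents it.
        return False
    required = entry["tags"] & RESTRICTED_TAGS
    return required <= user_clearances


print(can_view("donations_2024", set()))               # True: no restricted tags
print(can_view("client_intake", {"sensitive"}))        # False: missing "pii"
print(can_view("client_intake", {"sensitive", "pii"})) # True
```

A rule like this only works if tags are applied consistently, which is exactly why the tagging standards above are worth writing down.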
Data Quality Policies
Data sharing and IAM are essential in terms of keeping data safe. But safe data that is poor quality is not valuable. Data quality is a huge topic, and the element of data governance that is most likely to contribute to trustworthy data. It has many aspects, and each one could use a policy. Consider the following:
- Accuracy - is the data correct?
- Completeness - do we have all the data?
- Integrity - does the data stay complete and accurate as it is stored, moved, and transformed?
- Timeliness - do we have the most recent data, or the data needed for longitudinal analysis?
- Compliance - does the data meet the appropriate legal and regulatory requirements for our organization?
Remember that creating data policies involves deciding, documenting, and holding people accountable. Decide what an appropriate level of accuracy is (is it ok if we know what year they were born, but not the day of the month?), how complete you need the data to be (you may or may not need physical addresses for example), how timely it needs to be (is it good quality if it’s six months old?), and what compliance looks like for your organization.
Write those decisions down, store them, share them. While you are at it, identify roles and responsibilities. Who is responsible for which aspects of quality? When and how do data quality checks take place? How will you know that there is a problem, and who should be alerted when data quality falls below your set standards? Very few people will determine and implement access management, but anyone who creates, cleans, or transforms data will impact data quality.
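Once your standards are written down, checking data against them can be automated. The Python sketch below checks completeness and timeliness against documented thresholds. The records, field names, and threshold values are invented for illustration; substitute the standards your organization actually decides on.

```python
from datetime import date

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "birth_year": 1980, "email": "a@example.org", "updated": date(2025, 5, 1)},
    {"id": 2, "birth_year": None, "email": "b@example.org", "updated": date(2024, 9, 1)},
    {"id": 3, "birth_year": 1975, "email": None, "updated": date(2025, 4, 15)},
    {"id": 4, "birth_year": 1990, "email": "d@example.org", "updated": date(2025, 6, 1)},
]

# Assumed standards an organization might document.
REQUIRED_FIELDS = ["birth_year", "email"]
MIN_COMPLETENESS = 0.90  # share of required values that must be present
MAX_AGE_DAYS = 180       # roughly the "six months old" question


def completeness(rows, fields):
    """Share of required field values that are filled in."""
    total = len(rows) * len(fields)
    filled = sum(1 for r in rows for f in fields if r.get(f) is not None)
    return filled / total if total else 1.0


def stale_ids(rows, today, max_age_days):
    """IDs of records last updated more than max_age_days ago."""
    return [r["id"] for r in rows if (today - r["updated"]).days > max_age_days]


def quality_report(rows, today):
    """Compare the data to the documented standards and flag any shortfall."""
    score = completeness(rows, REQUIRED_FIELDS)
    stale = stale_ids(rows, today, MAX_AGE_DAYS)
    return {
        "completeness": score,
        "stale_ids": stale,
        "needs_attention": score < MIN_COMPLETENESS or bool(stale),
    }


report = quality_report(records, today=date(2025, 6, 30))
print(report)
```

Here 6 of the 8 required values are present (a completeness of 0.75) and record 2 is older than the limit, so the report is flagged. Who receives that flag, and what they do about it, is the roles-and-responsibilities decision described above.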
Hold the people who collect, access, and analyze data accountable in ways that are effective for your organizational structure and culture. This doesn’t have to be finger wagging, and can even be turned into a game or competition between colleagues or departments. Measure or estimate a baseline of your data quality, change your practices, and then measure or re-estimate your data quality. Keep the people who have an effect on data quality informed of how it changes. Be sure to acknowledge and even reward them as your data quality improves, and ask them for input on how to make further improvements or address challenges. While you may have assigned data owners or stewards, everyone must understand their own part in the responsibility for your organization’s data.
You Thought You Wouldn’t Have to Deal with People in This Phase? Sorry…
If you only walk away with one thing from this series, let it be this. No matter what anyone (including vendors) might tell you, data governance, like any other business process, involves a lot of human effort. What that means is you need to give people the space to do the thinking, the deciding, the documenting, the communicating, and the training, in addition to the work itself. If you’ve ever worked with a perfectly built but poorly designed (and therefore useless) database, you know what we are talking about.
Successful data governance initiatives hinge on the ability to assimilate these policies into an organization’s established ways of interacting with data. The execute and build phase of data governance is therefore going to involve data culture shifts. Culture shifts can take time and may not be linear, so be patient with the people part of this. Transparency and trust-building are paramount, as is accountability. If your organizational units don’t trust each other, sharing data is going to take on a different dimension. Listen, answer questions, and make sure you understand people’s concerns, especially if they have been through failed efforts before. Reinforce the new practices, and remember that habit formation takes time.
We will keep telling you this: communication and training are key to making policies effective. Don’t make them a secret. While you are at it, telling people why these are the policies is more powerful than you think. Also, humans being humans, communicate early and often. One time is not enough. Reinforce these policies regularly.
The roles you decided on in the planning phase need to be formalized and reinforced. For leadership, this may mean dedicating financial resources to ensure the people responsible for those roles have the training and support they need. “Support” includes ensuring that new data governance responsibilities aren’t piled onto someone’s already overflowing plate. Let people know that you are aware of the data governance tasks they are ALREADY doing, and that you know it will take effort to do more, or do things differently. Let them know that this is important, and explain why it’s important. If leadership treats data governance as an afterthought, everyone else will too.
Look out for our next Data Governance post where we talk about how to sustain the efforts you have put into creating and executing policies. Sign up for our newsletter to receive the next post in your inbox!