Posted on July 14, 2023

Data Governance

Amelia Ayoob

A Deep Dive into Data Cataloging


A data catalog is one of the more accessible outputs of a data governance project. It is valuable to anyone in your organization who needs data and likes to spend as little time tracking it down as possible.

Remember a couple of blog posts ago when we said data problems are people problems? Developing a data catalog can also be something of a diplomatic mission, one that produces a resource that saves your coworkers’ time and demonstrates some of the benefits of good data governance.

Because data governance and its related terminology are multifaceted at best (and slippery at worst), let’s pause to establish some definitions used in this post. For our purposes, a data catalog is a central source of basic information about an organization’s data resources; you could also think of it as an inventory or a directory. By “resources,” we mean any assets containing data that an organization creates or collects and that people need to do their jobs. While data catalogs often focus on individual data sets formatted as tables, you can (and, we would argue, should) also catalog the resources created with that data, such as dashboards, presentations, and written documents, which we will collectively call “reports.”

At Inciter, data cataloging is central to how we create an Impact Blueprint because we need to understand what data is important and how it’s being used across the organization.

We approach cataloging as an iterative process: collecting examples of reports and data, consulting the people who work with the data and develop the reports, documenting what we’ve heard from data stewards and observed in the examples, and validating that we’ve accurately represented what we learned. The difference between the cataloging we do for an Impact Blueprint and the cataloging an organization would do as part of a data governance project is that ours is a snapshot. A data catalog maintained by an organization should be a living document with (and here’s where the governance part comes in) defined roles, accountability structures, and processes for maintaining it.

Getting Started

To develop a catalog, you first need to determine which resources need cataloging. How you identify the resources your organization creates or uses will vary depending on its culture and on your familiarity with, and level of access to, its systems and data. If you feel confident that you generally know what’s out there, you may opt to start by listing every data set you can think of, along with the views or configurations built from one or more of those data sets. If you are unsure what data people are using, ask around to identify which reports people need and who puts them together. Why focus on reports? Typically, more people in an organization interact with parts of a data set (such as exports from a database) than with the entire “raw” data set with all columns and rows visible. While you will eventually want to catalog both, starting with the “data products” can help:

Engage more people in your data governance project by talking about data in formats they recognize and value.
Create a resource that is useful to anyone at your organization who interacts with data, whether they are trying to find the appropriate data to analyze or simply trying to track down the link to that quarterly financial report their boss misplaced again.
Identify organizational trends and outliers in the way data is accessed and used, which will come in handy for developing standards around data preparation and quality.

Talking to the people who put these reports together about what the report is for, where the data comes from, and where the report is stored will help you capture the basic information needed for cataloging and help you build relationships with data stewards in your organization. For any reports you can review and catalog without help, you can always engage data stewards by asking them to check your work (while also making them aware that they now have a central location to see what other reports are out there and where to find them!).

Creating the Catalog

Once you know what to start cataloging, you have to decide how to catalog it. At a minimum, each data catalog entry should briefly explain what the report or data set is, where to find it, and who is responsible for maintaining it. The best design for a data catalog is whatever people in your organization will actually use and keep using. This can be as complex as dedicated software (stay tuned for our next blog post on technology) or as simple as a bulleted list. We are partial to spreadsheets because they are quick to set up and hold the metadata you create in a format that is easy to filter and can even be moved into a more complex database later.
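To make the “easy to filter, easy to move later” point concrete, here is a minimal sketch that assumes the catalog spreadsheet has been exported to CSV. The column names follow the minimum fields described in this post; the two rows, the owner names (“Grants Team,” “Data Team”), and the SQLite table name `catalog` are hypothetical placeholders, not part of our template.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of a catalog spreadsheet; rows and owners are placeholders.
CATALOG_CSV = """Resource Title,Description,Resource Owner,Access Instructions
Grant Distribution by State,Grant funding distributed by state,Grants Team,Shared drive
Grant Dashboard,Dashboard of grant distribution by state,Data Team,Dashboard link
"""


def load_catalog(csv_text: str) -> list[dict[str, str]]:
    """Parse the exported spreadsheet into one dict per catalog entry."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_by_owner(rows: list[dict[str, str]], owner: str) -> list[dict[str, str]]:
    """The spreadsheet-style filter: keep only entries maintained by one owner."""
    return [row for row in rows if row["Resource Owner"] == owner]


def store_in_sqlite(rows: list[dict[str, str]], conn: sqlite3.Connection) -> None:
    """Copy the same flat rows into a database table without reshaping them."""
    conn.execute(
        "CREATE TABLE catalog (title TEXT, description TEXT, owner TEXT, access TEXT)"
    )
    conn.executemany(
        "INSERT INTO catalog VALUES (?, ?, ?, ?)",
        [
            (r["Resource Title"], r["Description"], r["Resource Owner"], r["Access Instructions"])
            for r in rows
        ],
    )
    conn.commit()


rows = load_catalog(CATALOG_CSV)
print([r["Resource Title"] for r in filter_by_owner(rows, "Data Team")])  # ['Grant Dashboard']

conn = sqlite3.connect(":memory:")  # in-memory database, just for the demo
store_in_sqlite(rows, conn)
print(conn.execute("SELECT COUNT(*) FROM catalog").fetchone()[0])  # 2
```

Because each row is already one flat record, moving from the spreadsheet to a database is a straight copy rather than a redesign.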

Let’s take a look at an example, made in Google Sheets, that catalogs two resources developed using fake data about the distribution of grant funding for workplace injuries. The first contains data about grant money distributed by state, and the second uses that funding data as part of a dashboard.

Resource #1: A spreadsheet of fake data detailing the distribution of grant funding by a hypothetical organization.


Resource #2: A dashboard displaying grant distribution data by state.


First, establish resource classifications: Using the terms from this post, ours would be “table” and “report.” These terms can vary between organizations and could also be expanded to include, for example, a separate category called “tool” for interactive resources like applications. Including standard classifications allows you to list all of your resources in a flat format that can be sorted or filtered.
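As a rough illustration of why standard classifications make a flat list sortable and filterable, the sketch below uses the “table” and “report” terms from this post plus the optional “tool” category. The class names (`ResourceType`, `Entry`) and the three titles are hypothetical examples, not entries from our template.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceType(Enum):
    """Standard classifications: "table" and "report" from this post, plus an optional "tool"."""
    TABLE = "table"
    REPORT = "report"
    TOOL = "tool"  # optional category for interactive resources such as applications


@dataclass
class Entry:
    title: str
    resource_type: ResourceType


# Hypothetical flat catalog: every resource sits in one list, whatever its type.
catalog = [
    Entry("Grant Dashboard", ResourceType.REPORT),
    Entry("Grant Distribution by State", ResourceType.TABLE),
    Entry("Grant Request Form", ResourceType.TOOL),
]

# Filter: show only tables.
tables = [e.title for e in catalog if e.resource_type is ResourceType.TABLE]

# Sort: group by classification, then alphabetically by title within each group.
by_type = sorted(catalog, key=lambda e: (e.resource_type.value, e.title))

print(tables)  # ['Grant Distribution by State']
print([(e.resource_type.value, e.title) for e in by_type])
```

Keeping the classification as a controlled set of values, rather than free text, is what lets a single column drive both the filter and the sort.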


Next, decide what information about your resources (metadata) will be required and how to break it down. We would suggest:

Resource Title: What do most people call it? If it’s an acronym, include the full name as well, for example: JSR (Job Status Report).
Description: One to two sentences on what the resource is and what it is for.
Resource Owner: Which unit, department, or role is responsible for maintaining it?
Owner Contact Information: This will likely be an email address.
Access Instructions: This could be a link to a document or website for anything public-facing or widely accessible within the organization. For anything involving more sensitive data, additional information about how to request access may be included here.
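If your catalog ever lives outside a spreadsheet, or you simply want to audit it, the required fields above translate directly into a record type. This is a minimal sketch: the field names mirror the list, while the `missing_fields` helper and the sample entry (its owner and email address) are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class CatalogEntry:
    """One catalog row holding the required metadata suggested above."""
    resource_title: str
    description: str
    resource_owner: str
    owner_contact: str
    access_instructions: str


def missing_fields(entry: CatalogEntry) -> list[str]:
    """Return the names of required fields that are empty or only whitespace."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


# Hypothetical entry; the owner and contact address are placeholders.
entry = CatalogEntry(
    resource_title="Grant Distribution Dashboard",
    description="Dashboard displaying grant distribution data by state.",
    resource_owner="Data Team",
    owner_contact="data-team@example.org",
    access_instructions="",
)
print(missing_fields(entry))  # ['access_instructions']
```

Running a check like this over every row is a quick way to spot entries that still need a conversation with their data steward.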


From there, determine whether you want to leave space for other information that is useful for understanding the scope and use of the resource but not essential for finding it, such as links to other documentation or more technical details about data quality and maintenance standards.
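One way to leave room for that extra information is to give it optional, defaulted fields. The sketch below assumes a hypothetical `source_tables` field that records which tables a report draws on (as the dashboard in our example draws on the grant funding table); it is one possible design, not a field in our template, and the documentation URL is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    title: str
    resource_type: str  # "table" or "report"
    # Optional fields: useful for understanding scope and use, not essential for finding it.
    documentation_links: list[str] = field(default_factory=list)
    source_tables: list[str] = field(default_factory=list)  # hypothetical field
    notes: str = ""


catalog = [
    CatalogRecord(
        "Grant Distribution by State",
        "table",
        documentation_links=["https://example.org/grant-data-dictionary"],  # placeholder URL
    ),
    CatalogRecord("Grant Dashboard", "report", source_tables=["Grant Distribution by State"]),
]


def reports_using(table_title: str, records: list[CatalogRecord]) -> list[str]:
    """List reports whose optional source_tables field names the given table."""
    return [
        r.title
        for r in records
        if r.resource_type == "report" and table_title in r.source_tables
    ]


print(reports_using("Grant Distribution by State", catalog))  # ['Grant Dashboard']
```

Because the optional fields have defaults, entries that lack the extra detail remain valid, which keeps the bar for adding a new resource low.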

Finally, store the data catalog somewhere people can find it and tell everyone - multiple times - that it exists!

Be sure to establish roles and a process for regular updates to the catalog so that it remains a relevant resource. Our example has a tab for scheduling and monitoring updates.
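An update tab can be as simple as a last-reviewed date and a review interval per resource. This minimal sketch assumes those two columns (our template’s actual layout may differ); the dates and the 90-day interval are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical update tab: (resource title, last reviewed, review interval in days).
update_log = [
    ("Grant Distribution by State", date(2023, 4, 1), 90),
    ("Grant Dashboard", date(2023, 6, 15), 90),
]


def overdue(log: list[tuple[str, date, int]], today: date) -> list[str]:
    """Return titles whose next scheduled review date has already passed."""
    return [
        title
        for title, last_reviewed, interval_days in log
        if last_reviewed + timedelta(days=interval_days) < today
    ]


print(overdue(update_log, today=date(2023, 7, 14)))  # ['Grant Distribution by State']
```

The first resource was due for review on June 30, 2023, so it is flagged; the second is not due until September.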

(By the way, if you would like a copy of this template, send an email to amelia@inciter.io.)


Additional data catalog examples

Although the Consumer Financial Protection Bureau’s Public Data Inventory is a web-based resource, it uses a simple tabular format similar to our Google Sheets example. Note the different data type options and the addition of update frequency information.
Data.CMS.gov is a great example of a highly detailed catalog that includes more technical information such as links to data dictionaries and related resources.
Take a look at the left sidebar of the Urban Institute’s public data catalog to see a breakdown of how they use categories, content types, and tags to sort their data.

