What’s Better, Faster, and Cheaper than a New Data System?
“We need a new data system.” The stuff of dreams and nightmares. Whether you aren’t storing the right data in your system or you can’t get what you want out of it, you have either considered a new data system or were in the process of implementing one.
So you got everyone on board, figured out what you needed out of a data system, did an inventory of your current systems, lined up the funding, and then… coronavirus.
There’s a lot of uncertainty going around. We don’t know whether we need to disinfect our mail. We’re not sure how often (or whether) we should be going to the grocery store. No one can say for sure whether this pandemic has peaked or not or when the second wave will be.
Similarly, decisions about allocating your technology resources are up in the air, and a big investment might seem reckless right now. If you aren’t feeling confident about moving forward with that big data project that was (or was about to be) in the works, then you might consider a data integration project instead. Data integration with right-sized data infrastructure allows you to get more out of the data that you have in one or more systems without having to move it all to a new system. And the best part: this can be a great transition step that makes the big data project you were considering easier to implement later, if and when you decide to move ahead with it. Beyond just repurposing the funds, you can also reduce the risk, the time required, and the stress on your staff that is so common during major migrations.
Data Integration Benefits
Data integrations can usually be implemented for less than a third of the cost and effort of a data migration. An integration also makes your data more usable and accessible in the process, so it can reduce the cost and effort required during a future migration as well. When you do decide to migrate, you can continue to use your data infrastructure with the new system so that you don’t have to migrate all of your data to the new system. Instead, by piping data from your old system and new system into the same data infrastructure, you can seamlessly analyze both together!
Since you’re using repeatable steps (data pipeline pieces can be reused), and you’re only moving data into a common, easy-to-use format rather than into a system-specific format like you would in a migration, you save quite a lot of time. With a migration, it’s like you have to learn two foreign languages well enough to translate between them. With a data integration, you only have to learn one well enough to understand it in your native language.
Whenever you’re working with data, you’re bound to come across the unexpected. A data integration is a smaller effort than a full migration, so it’s easier to address anything unexpected you come across in the process, and you have fewer systems to contend with overall. You are also not stuck with a particular system. Each of the components in the data integration is modular and can be switched out if needed.
You aren’t changing anything about your current processes. So your staff can keep using whatever system they are used to, but you can still get the reporting you need and answer the questions that your board or leadership is asking. No training required.
If this sounds like something that you can use, check out our overview of a data integration process and the benefits. Generally speaking, data integration is just moving data from one or more sources into a common format or a place that is easily accessible.
NOTE: This approach works best when you have collected the basic data that you need. It may go without saying, but you can’t use data that you don’t have. Drop us a line at email@example.com. We are always happy to talk to people about their data strategy, and how to get started.
Modern Data Integration Concepts
All my data is in one place and ready for analysis. How do I use it?
Connect your data to analysis, business intelligence (BI), and reporting tools. These can be used for:
- Creating dashboards for stakeholders and leadership
- Periodic reporting to answer specific questions
- Ad-hoc reporting to answer general questions or look for trends
- Collecting and organizing data for a future migration
You can use general tools or specialized tools depending on what type of output you want and what type of data you have. These tools often offer advanced visualizations and in-depth analysis that out-of-the-box reporting does not provide.
If your data is:
- Stored in a data lake or data warehouse, and you’ve got data pipelines to keep it up to date
- Or in a system that directly integrates with your preferred tools for analysis, BI, and reporting
Then:
- Setting up BI and reporting tools is relatively simple because you’re using a common data source or easily integrated data sources
- You can access, analyze, and report on data from one or multiple systems in the same place
- Your data has already been processed and cleaned with your data pipelines or your data cleaning tools and processes, so it’s ready to use
That sounds great, but what if my data needs to be cleaned, isn’t easy to connect to, or is in different places or systems?
If your data is cleaned and structured the way that you need for reporting, then a data warehouse is probably a good fit for you. Use a data warehouse to provide access to your data.
What is a data warehouse?
A data warehouse is a location or service that stores your data in a common format and provides standard connectors for data analysis, visualization, and reporting tools. Data warehouses are good for frequent and consistent access to data, and you only need to add data that you plan to use.
So how do I get my data into the data warehouse?
Create data pipelines to move your data from one place to another, translate between formats, or even clean your data.
What is a data pipeline?
A data pipeline is just a process or tool that moves data from one place to another and can make small changes to the data if needed (like decompressing a compressed file or adding a date and time to the end of a file name).
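To make those two “small changes” concrete, here is a minimal Python sketch of each one. The function names and the file-name pattern are our own illustration (assumptions, not part of any particular tool), and it uses only Python’s standard library.

```python
import gzip
import shutil
from datetime import datetime
from pathlib import Path


def decompress(src: Path, dest_dir: Path) -> Path:
    """Decompress a .gz file into dest_dir and return the new file's path."""
    # "visits.csv.gz" becomes "visits.csv"
    dest = dest_dir / src.with_suffix("").name
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest


def add_timestamp(path: Path, when: datetime) -> Path:
    """Rename the file so a date and time appear at the end of its name."""
    # "visits.csv" becomes e.g. "visits_2020-04-23_1825.csv"
    stamped = path.with_name(f"{path.stem}_{when:%Y-%m-%d_%H%M}{path.suffix}")
    path.rename(stamped)
    return stamped
```

Each function does one small job and hands back the path of the file it produced, which is what lets steps like these be chained together.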
Data pipelines can get a little technical, especially if you have a tricky data source. At Inciter, we specialize in dealing with tricky data sources. Reach out to us at firstname.lastname@example.org if you have challenges getting data out of your system.
Each step in a data pipeline is like one piece of pipe that just does a simple task, and you connect the different pipe pieces together to make a pipeline. This way you can reuse the different pieces of the pipeline. For instance, if you created a pipeline that retrieved data from your data system, the pieces might look like this:
- Connect to data source or data system
- Download specific data set
- Create a CSV file from the data set
- Compress the CSV file
- Save the compressed CSV file in the right place
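As a rough illustration, the five pieces above could be sketched in Python like this. Everything here is a stand-in: the “data system” is a small in-memory dictionary with a made-up “donors” data set, and the step names are our own. A real pipeline would connect to your actual system, but the shape (small steps, connected in order) would be similar.

```python
import csv
import gzip
import io
from pathlib import Path


def connect(_):
    # Stand-in for connecting to a real data system (hypothetical sample data).
    return {"donors": [{"name": "Ada", "gift": 50}, {"name": "Grace", "gift": 75}]}


def download(dataset):
    # Returns a step that pulls one named data set out of the system.
    def step(system):
        return system[dataset]
    return step


def to_csv(rows):
    # Turns a list of records into CSV bytes (assumes at least one row).
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def compress(data):
    # Compresses the CSV bytes with gzip.
    return gzip.compress(data)


def save_to(path):
    # Returns a step that writes the bytes to the given location.
    def step(data):
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        return target
    return step


def run_pipeline(steps, start=None):
    # Connects the pieces: each step receives the previous step's output.
    result = start
    for step in steps:
        result = step(result)
    return result
```

A call such as `run_pipeline([connect, download("donors"), to_csv, compress, save_to("raw/donors.csv.gz")])` runs the five pieces in order. Swapping `compress` out, or pointing `save_to` somewhere else, changes the pipeline without touching the other steps.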
You might have another pipeline that watches a folder called “cleaned” and, when it sees a new file, imports it into your data warehouse.
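Here is one way that folder-watching pipeline might be sketched in Python. To keep it runnable anywhere, it uses SQLite (built into Python) as a stand-in for a data warehouse, and it assumes the cleaned files are CSVs with hypothetical columns named visit_date, page, and visits. It checks the folder each time it is called rather than reacting to file events.

```python
import csv
import sqlite3
from pathlib import Path


def import_new_files(cleaned_dir, conn, seen):
    """Import CSV files in cleaned_dir that have not been imported yet.

    `seen` is a set of file names already handled; it is updated in place.
    Returns the names of the files imported on this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS website_visits "
        "(visit_date TEXT, page TEXT, visits INTEGER)"
    )
    loaded = []
    for path in sorted(Path(cleaned_dir).glob("*.csv")):
        if path.name in seen:
            continue  # already imported on an earlier pass
        with open(path, newline="", encoding="utf-8") as f:
            rows = [
                (r["visit_date"], r["page"], int(r["visits"]))
                for r in csv.DictReader(f)
            ]
        conn.executemany("INSERT INTO website_visits VALUES (?, ?, ?)", rows)
        conn.commit()
        seen.add(path.name)
        loaded.append(path.name)
    return loaded
```

Running this function on a schedule (every few minutes, say) gives you the “watching” behavior. A production version would also need to remember which files it has seen between runs, which this sketch leaves out.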
What’s so useful about this approach is that each of the pieces can be reused as needed. Instead of designing a single process for each combination of data source and end result, you just connect the pieces you need. It’s like an erector set for data.
How do I create a data pipeline?
There are multiple ways to make data pipelines. Specialized “low-code” or “no-code” tools give you the pieces you need, and you tell them which data or system to act on. Other tools are programming-based and require coding. There are also many tools that offer pre-built pipelines for common data sources. Some BI tools even include some very basic data pipelines for the most common data sources.
Data pipelines can be manual or partially manual, like exporting a file from your CRM and uploading the file into your business intelligence (BI) tool, but most data pipelines are automatic, like fetching data from your data system periodically or as it changes and saving that data somewhere.
You can even have data pipelines that “flow” through tools that clean or restructure your data automatically!
What if my data needs a lot of cleaning or restructuring? What if my data is difficult to get into my data warehouse or I don’t want to load everything into my data warehouse?
If your data requires a good bit of cleaning, restructuring or reformatting to get into your reporting or BI tool or into your data warehouse, then a data lake might be a good option. Data lakes are also great for archival storage for infrequently used data and storage of data that arrives faster than it can be prepared for use.
What is a data lake?
A data lake is just a place where you can store and organize lots of data without having to define what the data look like or what format they are in. It could be a Google Drive folder or a shared drive. Cloud data lakes are data lakes stored on redundant servers managed by someone else. They are more reliable and available than locally stored data lakes and have a significantly reduced maintenance and overhead cost.
With data stored in a data lake, it’s a good idea to store your data “as-is” prior to any modification, reformatting, etc. This makes it easy to start again with the original data if you need to use it differently later on or need a different “slice” of the data.
How do I access data stored in a data lake?
Once you’ve gotten your data stored in your data lake, you’ll probably want to access it! A data warehouse allows you to access the data that’s stored in your data lake by putting it in a common format and providing connectors for data analysis, visualization, and reporting tools.
How do I organize the data I put in the data lake?
Segmentation of data is very important to create a usable data lake, but it’s also pretty simple! You just need to organize your data into folders or segments that correspond to things like:
- Data processing stage or category (raw, cleaned, formatted, published, etc.)
- Data source or destination (fromFinanceSystem, fromGoogleAnalytics, forMembershipDirector, etc.)
- Data set (WebsiteVisits, DonorResponses, Clients, etc.)
- Date and time (2020/04/23, 2020-04-03-18:25, etc.)
Each segment should nest within the previous segment. So you might have multiple spreadsheets saved in a folder called “cleaned/fromGoogleAnalytics/WebsiteVisits/2020/04/23” that would contain all of the cleaned website visit data from Google Analytics on 4/23/2020.
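If a pipeline is writing files into your data lake, it can build these nested folder names for you. The small Python sketch below is one way to do that; the function name `lake_path` is our own, and the segment order simply follows the list above.

```python
from datetime import date
from pathlib import PurePosixPath


def lake_path(stage, source, dataset, day):
    """Build a nested folder path: stage / source / data set / year / month / day."""
    return PurePosixPath(
        stage, source, dataset, f"{day:%Y}", f"{day:%m}", f"{day:%d}"
    )


folder = lake_path("cleaned", "fromGoogleAnalytics", "WebsiteVisits", date(2020, 4, 23))
print(folder)  # cleaned/fromGoogleAnalytics/WebsiteVisits/2020/04/23
```

Because every pipeline builds paths the same way, files always land where people (and other pipelines) expect to find them.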
Good segmentation should allow you or a computer to easily navigate and find your data. It prevents your data lake from becoming a data swamp!
How do I use data in a data lake?
- You can add a data pipeline to load only the data that you will need into your data warehouse
- Your data warehouse may connect directly to a data lake if files are in common formats (CSV for instance)
- There are special data warehouses that allow you to ask questions about your data without having to load it
- Some data warehouses can actually “crawl” through your data lake and look for data and add it automatically to your data warehouse
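For the first option, a pipeline needs a way to pick out only the files it should load. As a hedged sketch, assuming the nested folder layout described earlier (stage, then source, then data set, then year/month/day) and CSV files, a Python step could select one data set for a date range like this. The function name `files_for` is hypothetical.

```python
from datetime import date, timedelta
from pathlib import Path


def files_for(lake_root, stage, source, dataset, start, end):
    """Return the CSV files for one data set from start to end (inclusive)."""
    found = []
    day = start
    while day <= end:
        folder = Path(
            lake_root, stage, source, dataset,
            f"{day:%Y}", f"{day:%m}", f"{day:%d}",
        )
        if folder.is_dir():
            found.extend(sorted(folder.glob("*.csv")))
        day += timedelta(days=1)  # move on to the next day's folder
    return found
```

The list it returns can then be handed to whatever step loads files into your data warehouse, so everything else in the lake stays where it is until you need it.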
To Sum It Up…
Together, the above pieces of data integration make up your data infrastructure. They can build upon each other to solve complex challenges or be used alone for simpler ones. A reporting or BI tool can be used to gain and share insights from your data. A data warehouse can make your data readily available. A data lake can store and archive your data “as-is,” which helps when you don’t want to load it all at once or don’t need all of it frequently. Data pipelines modularly connect all the pieces so that you can reuse the processes and tasks for similar challenges.