Data Cleaning in Excel 101, Part 8: Final Summary & Data Cleaning Guide
This is the eighth and final post in our Data Cleaning in Excel blog post series. As the series comes to a close, we’d like to take an opportunity to organize and recap material we’ve covered. We’ll review some of the useful Excel functions we’ve used, and provide some guidance for incorporating the techniques into your data cleaning routine.
Every dataset has its own issues. Our posts in this series have been written to address ad-hoc common problems. We advise referencing this post as you go through your data with the understanding that not everything here will be applicable to your dataset. Also, keep in mind that you’ll always run into unique challenges, and this isn’t an exhaustive list of commands for squeaky-clean data.
When you’re presented with a raw, messy sheet of data, it’s often difficult to determine where you should start. There is no right or wrong process for data cleaning, but there are stages you should get out of the way in the beginning before employing some advanced techniques.
Before messing with your data, become familiar with your dataset to the best of your ability. If you’re familiar with what you’re collecting, and how you’re doing it, you’ll be able to better understand why you’re running into problems. Next time, you can modify your collection to prevent them.
Get rid of blank rows
First, you’ll want to get rid of unnecessary white space. This is simple in the case of removing fully blank rows. In other situations, you’ll need to find out why your rows have consistently blank cells or columns before removing them. Check out our post on removing blank rows to get caught up.
Remove duplicate entries
Before you start any advanced data cleaning, you’ll want to take out accidental repeats in your data. Our post on removing duplicate entries details a few techniques for efficiently discovering duplicates in large datasets.
Manage blank cells
Next, get your blank cells straightened out. You might need to figure out why you’re consistently seeing blanks in cells before you can fix the problem. See our post on managing blank cells.
Format your cells
Take a look at how your cells are formatted, and do a couple of tests to make sure your dates and numbers are formatted correctly, and to weed out issues surrounding numerical data being stored as text. Check out our post on fixing numbers that don’t act like numbers.
Combine and separate columns when necessary
We showcased many methods in our posts on combining and separating data stored in columns. Again, it is important to be familiar with your data so you can know for sure that you’re placing data where it’s supposed to be. In both of those posts, we used the function “Flash Fill”. We even wrote a post dedicated to Flash Fill and its variety of uses for data cleaning.
The process of cleaning and analyzing data in Excel from start to finish is one long sequence of analytical problem-solving exercises. Consider each function you learn to be a tool in your toolkit. The more you work with Excel, the easier it gets to identify which tool to utilize when you run into trouble. We encourage you to think creatively about the techniques you could use to maximize efficiency, and to take your time exploring possibilities before tackling your next obstacle in Excel. Let us know if you have any questions or additional topics you’d like us to cover by emailing firstname.lastname@example.org.