This is the eighth and final post in our Data Cleaning in Excel blog post series. As the series comes to a close, wed like to take an opportunity to organize and recap material weve covered. Well review some of the useful Excel functions weve used, and provide some guidance for incorporating the techniques into your data cleaning routine.
Every dataset has its own issues. Our posts in this series have been written to address ad-hoc common problems. We advise referencing this post as you go through your data with the understanding that not everything here will be applicable to your dataset. Also, keep in mind that youll always run into unique challenges, and this isnt an exhaustive list of commands for squeaky-clean data.
When youre presented with a raw, messy sheet of data, its often difficult to determine where you should start. There is no right or wrong process for data cleaning, but there are stages you should get out of the way in the beginning before employing some advanced techniques.
Before messing with your data, become familiar with your dataset to the best of your ability. If youre familiar with what youre collecting, and how youre doing it, youll be able to better understand why youre running into problems. Next time, you can modify your collection to prevent them.
Get rid of blank rows
First, youll want to get rid of unnecessary white space. This is simple in the case of removing fully blank rows. In other situations, youll need to find out why your rows have consistently blank cells or columns before removing them. Check out our post on removing blank rows to get caught up.
Remove duplicate entries
Before you start any advanced data cleaning, you’ll want to take out accidental repeats in your data. Our post on removing duplicate entries details a few techniques for efficiently discovering duplicates in large datasets.
Manage blank cells
Next, get your blank cells straightened out. You might need to figure out why youre consistently seeing blanks in cells before you can fix the problem. See our post on managing blank cells.
Format your cells
Take a look at how your cells are formatted, and do a couple of tests to make sure your dates and numbers are formatted correctly, and to weed out issues surrounding numerical data being stored as text. Check out our post on fixing numbers that dont act like numbers.
Combine and separate columns when necessary
We showcased many methods in our posts on combining and separating data stored in columns. Again, it is important to be familiar with your data so you can know for sure that youre placing data where its supposed to be. In both of those posts, we used the function Flash Fill. We even wrote a post dedicated to Flash Fill and its variety of uses for data cleaning.
The process of cleaning and analyzing data in Excel from start to finish is one long sequence of analytical problem-solving exercises. Consider each function you learn to be a tool in your toolkit. The more you work with Excel, the easier it gets to identify which tool to utilize when you run into trouble. We encourage you to think creatively about the techniques you could use to maximize efficiency, and to take your time exploring possibilities before tackling your next obstacle in Excel. Let us know if you have any questions or additional topics you’d like us to cover by emailing firstname.lastname@example.org.