In this part of our data cleaning series, well help you find and remove duplicate entries in Excel. Repeats are a very common data entry mistake or error from a data pull. Duplicates in your data can create a variety of unfortunate consequences in administrative duties and analysis. Worst of all, they lead to a real misrepresentation of your results.
This is Part 6 in our Data Cleaning in Excel 101 series. Part 1 showed methods of removing blank rows from your data. Part 2 focuses on splitting data from one cell to multiple cells. Part 3 covers the opposite: combining data from multiple columns into one column. In Part 4, we discussed some incredibly useful and time-saving tricks with Excels Flash Fill function. Part 5 covers solutions for when your numbers aren’t acting like numbers.
Removing Duplicate Entries
There are many ways to remove duplicates in Excel. We decided that rather than list 7 ways to accomplish the same thing, lets just walk you through 2 versatile techniques to choose from to fit your needs.
Method 1: “Remove Duplicates” Command
Removing duplicate values is routine for us data folks. In fact, its so common that Excel has a dedicated command for it. It’s located here in the ribbon, under the “Data” tab.
- Select any cell within your dataset. Then, select Remove Duplicates. Excel will intuitively highlight all of the data in your dataset, and youll be directed to a Remove Duplicates window
- Keep selected all checked if you want Excel to remove duplicates of full rows that are the same.
We typically would use Select All. However, there are other scenarios in which youd need to delete repeats that technically contain different values in some columns. For example, when working with data entered on different days, data in a time or date entry might be different because of when each was logged. In this case, here is when you would need to check the appropriate columns. For instance, you might want to just check a variable like a user or product ID.
- Select OK. Excel will remove duplicates from your dataset, and a window will appear telling you how many duplicates were deleted and how many unique values remain.
Method 2: Formulas and Conditional Formatting
If you want to carefully examine the duplicates before deleting them, conditional formatting is a great tool. To execute this, youll first need to set up your data by creating a new column, and adding filters, as pictured below.
- Create a new column and name it combined, or another term that will help you remember the contents.
2. Apply filters to your headers. Select any cell within the parameters of your dataset and select the filter button under the data tab.
3. In your new combined column, use =CONCAT to combine all of each rows data into the cells in this column.
Refer to our blog post on combining data from multiple cells into one column to learn more about this formula, and a few alternatives you could use instead.
You new column should look like this.
- Under the Home tab, select Conditional Formatting, Highlight Cell Rules and Duplicate Values.
5. You can leave the default format as-is, but here, we’ve customized the format to have a yellow “fill color”. This will make it easier to filter in the next step.
6. Select enter. Your duplicate values will be highlighted.
7. Present a list of just your duplicates by filtering by the color you chose to apply to conditional formatting.
Now, you can go through and examine your duplicate values and see for yourself if there are any outstanding reasons for these errors. If you like, go back and apply method one.
Now you know how to remove duplicates in excel! We encourage you to try out these examples and let us know if you have any questions. We hope these methods help you as you clean your data! Please let us know if there are other topics or data cleaning problems you would like us to tackle by emailing firstname.lastname@example.org.