Home Case Studies MS Office Tips and Tricks Sample Code Helpful Videos Custom Development  Custom Training Join Us Contact Us
Search Our Site:  

Remove Duplicate MS Excel Data

Excel is frequently used for storing data like mailing lists.  While most of this information is better kept in a database because of data integrity and data security issues, we can't help but notice that many more people are familiar and comfortable with Excel than Access or SQL Server. So while we at I Get It! Development, Inc. would emphatically recommend moving such data into a database, we recognize that much of it will stay in Excel and so needs to be worked on there.

One of the most common questions we find on our website is how to remove duplicate rows or de-dupe data. The very first step in deleting duplicates is figuring out what constitutes a duplicate.  The same street address, city and zip code may have different apartment or suite numbers.

A catalog like Land's End may send catalogs to both a husband and wife living at the same address.  Merely finding two addresses that are the same does not constitute a duplicate data record in that case.

A catalog of fishing equipment may send separate catalogs to younger customers and older customers; so even the same names and addresses – James (dad) and James (son) – may also not constitute duplicates but legitimately separate addressees!

For our purposes, let's define a duplicate as a line that contains the same ZIP code and the same street address including apartment or any secondary street address information.  We only want to send a one catalog to one residence.  Our catalog doesn't differentiate between male or female,  parents or children.

We use the & operator (concatenation) to put fields together. We append zip, street, and any secondary street information in that order. See the formulas in the first graphic below, and the results in the graphic beneath that.






A Word From Our Sponsor

Barnes And Noble
Computer Books


The zip code will represent city and state so it isn’t necessary to include them, but if you feel that your zip code information may be compromised, use the city and state as well.

Select this entire column and copy it. Then immediately go to Edit > Paste Special… and choose Values. We should get rid of the formulas before we start sorting the data. Now sort your data on this column but be sure that you are sorting all of your columns! It’s just too easy to corrupt your data in Excel by sorting on only one column instead of all of them!

On row 2 of the first blank column after the dupe data, insert the formula =EXACT(H2,H1)

This formula checks the dupe column on the same row with the dupe column on the previous row. Copy the formula all the way down the column and you’ll see a series of TRUE and FALSE values. The TRUE values indicate a duplicate has been found, so you can examine the multiple records and delete the appropriate ones.
Another technique you can use is to use COUNTIF. The COUNTIF function will show not only that duplicates exist, but how many there are! On row 2 of the first blank column after the EXACT formulas type =COUNTIF(H:H,H2). This says ‘look in all of column H and tell me how many times the value in H2 appears’. Copy this formula all the way down the column. You can sort on this formula (though you should copy, paste special…values first!) in descending order and the duplicates will float up to the top for easy manipulation.

The most effective and the least-liked step is a hard visual search of the data.  This method is 100% manual and 100% disagreeable.  Nobody likes to do it.  But it is the only way to recognize that James Wilson, Jim Wilson, and Jimmy Wilson are all the same person.  Visual inspection is the only way to recognize that One Washington Place is the same as 1 Washington Place. You’ll never get your computer to do that for you!

Hopefully, the steps above will help you get rid of the easy 90% so that you can concentrate your human eye and human brain on the difficult 10%.

Remember that data scrubbing is an ongoing process and you should never go very long without examining your business critical data.
 

References

Journal of Accountancy Articles

Tweaking The Numbers

Block That Spreadsheet Error

Excel Security Issues


 

Clients

Do you need help with Microsoft Office? Contact us

See how we've helped companies like yours develop effective business processes

Client Success Stories


 

Consultants

I Get It! Consultants set their own hours, work with their own clients, and choose their own projects.

If you're a Microsoft Office guru and would like to explore being a full-time consultant, please see our consultant information pages.