Home Case Studies MS Office Tips and Tricks Sample Code Helpful Videos Custom Development  Custom Training Join Us Contact Us
Search Our Site:  

Scrubbing Your Data

Our example database from the prior article is quite simple. We need to break the transactional data shown into three separate tables, each table containing only one type of information. For instance, we need a place we can store Customer data and only customer data—name, address, credit limit, etc. Each order placed by that customer includes a customer code that links to the customer data.

To get a list of our customers we’ll use the Advanced Filter and check the data in Excel before the database import . You’ll often hear this process called ‘scrubbing the data’. It is critically important and cannot be automated:

DO NOT UNDERESTIMATE THE AMOUNT OF TIME THIS PROCESS TAKES.

Data which has been left in an unsecured environment for any length of time may take many days to clean. This process will be easier to do in Excel if that is the environment you’re comfortable in.

In Excel, highlight the column labels that describe uniqueness for a customer. In other words, don’t just select the name; select the name, full address and phone number(s). This way, if a company has the same name but different contact information, it will show up twice in our unique list. That’s what we want! We can then do the research to find out what information is correct. Copy the labels to an empty place in the workbook. Go to Data, Advanced Filter—you’ll see this dialog box.


Click the picture to see the demonstration video!


A Word From Our Sponsor

Online Courses For CPE Credit
CPE Link is a new provider of web-based continuing professional education (CPE) for accounting professionals.


Click the box ‘Copy to another location’. Indicate the full range of your existing data for the List Range. For the Criteria Range, highlight the labels you pasted into an empty place in your workbook. For the Copy To, highlight the criteria range and more than enough rows to hold the unique list of customer data. This is critical, as Excel will simply chop off any data which does not fit into the copy to range! Finally, click unique records only and OK.

Sort the resulting list of company data by name to force company names to appear next to each other. This will make companies with inaccurate data stand out. If you are concerned that you may have wildly different company spellings, you may wish to sort by telephone number or by address to see if there are duplicate addresses with different company names.

Perform this process for all data to be imported into a relational database. When you’re sure that the data is clean, you may import each type of data into its own table with a unique code for each row. Your company may already assign unique codes for each company (like a D&B number) and product (like a SKU). These are the best codes to use as they already have meaning within your company. Should the time come to integrate your database with another in your company, you will have the ability to tie together data with the same company or product code.
 

References

Journal of Accountancy Articles

Tweaking The Numbers

Block That Spreadsheet Error

Excel Security Issues


 

Clients

Do you need help with Microsoft Office? Contact us

See how we've helped companies like yours develop effective business processes

Client Success Stories


 

Consultants

I Get It! Consultants set their own hours, work with their own clients, and choose their own projects.

If you're a Microsoft Office guru and would like to explore being a full-time consultant, please see our consultant information pages.