Automatic correction of contractor data
Project Description

This project was made for Nestle Polska S.A. from Torun. My task was to create a script for automatic correction of contractor's data with SAP, MS Access, MS Excel and VBA. I needed to analyze existing big dataset (over 300 thousands rows), to find and correct details or remove duplicated companies from it.

The problem was that every day sales representatives add tens of new companies into the database. Unfortunately while doing so they sometimes make mistakes, i.e. adding company that already exists in the database under a slightly different name, misspelling the name of a city, street, company (e.g. the Polish city Białystok was written in more than 40 different ways), do not providing all required information.

The final solution was made in two steps:

  • Solution #1 - automatic correction of address data
    Because of misspellings issues in companies' data, it was hard to indicate duplicated companies.
    Misspellings and lack of crucial data like VAT identification number, names of cities, streets etc. were the main problems in finding duplicated companies.
    To solve those problems I created a script that corrects and fills in the gaps automatically. The main part of it were tables which contained known errors and mistakes and validating the data based on them.
  • Step #2 - finding duplicated company names
    The corrected data was check for finding duplicates. This part was based on one-to-one database relationships on a variety of columns such as VAT identification number, address and name.

The results of such approach were more than satisfying. Sometimes system required a user attention to correct the data or create new rules in tables mentioned above. But the longer system worked, the more rules it contained and the less user attention was required.

Using the created application it was shown that ~300 000 companies written into the database were in fact around 110 000 unique companies.
In first 1 month, more than 5 000 correction rules were created resulting in just a few beging added each day and just 10% of the newly created companies required a user intervention.

Project Details