Results 1 to 2 of 2
  1. #1
    tigers is offline Novice
    Windows XP Access 2003
    Join Date
    Mar 2006
    Posts
    28

    Can Access do this?

    We have a list of vendors that is 100,000 records long. We just need the vendor names so I was able to eliminate the duplicates (due to multiple vendor addresses) so now it is down to 50,000 records.



    But, there is still duplication. For example. If there is a vendor name called "John's Business Inc.", there is a vendor number for that. But then, there is also a "Johns Business Inc" record where the apostrophe and period after "inc" is not in the name. By creating a new name with these differences, a new vendor number was created. But again, all we need are the names.

    So, at this point, I am thinking the only way to eliminate these duplicates is for a human to go through the list. But the list is 50,000 records!

    I just need to ask if there is any other way to find similar records somehow and eliminate them.

    The end use for this list will be that we do searches of all of the vendor names but we were trying to quantify just how many vendor names we need to search and since there are duplicates, the list may be a lot shorter than 50,000 records which will reduce the amount of time we will be doing with the searches.

    Any ideas?

  2. #2
    dcrake's Avatar
    dcrake is offline Competent Performer
    Windows XP Access 2003
    Join Date
    Aug 2009
    Posts
    435
    One possible solution would be to create a query that lists the company name with all the grammatical characters removed. These are removed dynamically not physically.

    First Create a function that the full name with passed to
    Code:
     
     
    Public Function StripDown(ComanyName As String) As Sting
     
    Dim tmpString As String
     
    tmpString = Replace(CompanyName," ","") ' remove spaces
    tmpString = Replace(tmpString,".","") ' remove dots
    tmpString = Replace(tmpString,"'","") ' remove apostrophes
    tmpString = Replace(tmpString,",","") ' remove commas
    'Repeat for any other characters to strip
     
    StripDown = tmpString
     
    End Function
    Then in a new query create the column

    Company:StripDown([CompanyName])

    Add another column
    Cnt:1

    Group by Company

    Count Cnt

    Sort Cnt Descending

    Filter Cnt>1

    This will then display all the comany names without the offending characters giving a list of all duplicate names that appear more than once with the worst offender at the top.

    David

Please reply to this thread with any new information or opinions.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums