Results 1 to 8 of 8
  1. #1
    BassettProvidentia is offline Novice
    Windows 7 32bit Access 2010 32bit
    Join Date
    Feb 2011
    Location
    Moray, United Kingdom
    Posts
    4

    De-duplicating new data

    Good morning,

    I have a prospective client that is interested in having a User Defined Function written that will prevent duplicated data from being appended to his database.

    The option I am proposing is that each of the new rows is appended to the master database where the UDF returns false.

    The UDF, each interation, will go through each row in the master database and apply another UDF to field in question. If the result is 'false', the outer UDF will return false to query.



    The database is quite large and I am keen to ensure that the solution I offer is as efficient as possible. Is there a better way to achieve this?

  2. #2
    orange's Avatar
    orange is offline Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,848
    What exactly constitutes duplicate data in this case?
    Why not have a PrimaryKey on the Table involved - Access will not allow duplicates then.
    How would an "append" occur? Hopefully access to tables is controlled by Forms in the application.

    You could use DCount (in a BeforeUpdate event) and check for 0 records then allow append, if >0 a potential dup so bypass an append

  3. #3
    BassettProvidentia is offline Novice
    Windows 7 32bit Access 2010 32bit
    Join Date
    Feb 2011
    Location
    Moray, United Kingdom
    Posts
    4
    Thanks for replying orange.

    The fields are a list of organisation names that come from an external source. Their is not a unique identifier within the new incoming data, so I am writing a VBA function that searches the "master table" for a similar name and reports back what the matching record is.

    There are 500,000 master records and 1000 new rocords, so my function is, which searches 500,000 records is being run 1000 times. Although it is working fairly well at this stage, it is quite slow and I was wondering if there was an easier (for the machine's resources) was of doing things.

    The append is being run from an append query, as the data is not input manually but somes from an external datasource. Basically, the row from the list of new companies will be added if the UDF DeDupe() returns null.

    Can you think of anything quicker?

  4. #4
    orange's Avatar
    orange is offline Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,848
    So, not knowing anything about your business/application, I have to ask what exactly is a Master table? Master of what and to whom? If you maintain some sort of authoritative list of 500,000 records, I don't see why you can't assign a unique number within your scope.
    In any case, I would build a Table of Master names, and using whatever criteria you have, Name/PostalCode or whatever, establish a list of unique records. I would hold that table -almost sacred. Then, using whatever algorithm you have for the uniqueness of names, compare incoming records with the current Master.

    You could use a Dcount to check for duplicates, such that 0 means we don't have it and a 1 indicates - already present.

    What do you use these Companies for? Don't you need addresses, PostalCode, etc for your needs?

    A little more background on the business needs would help in any advice. I would think a list of 500,000 unique companies would be a significant list that others may have an interest in - if it is maintained and somewhat comprehensive.

  5. #5
    BassettProvidentia is offline Novice
    Windows 7 32bit Access 2010 32bit
    Join Date
    Feb 2011
    Location
    Moray, United Kingdom
    Posts
    4
    Thanks for the reply.

    I am writing the function for a client.

    They have a table within their database that contains a list of company names (and I am sure they have primary keys, etc).

    Every month or so, they receive a list of new company names to add to the database. The list comes from external sources, and often contains names that are simular, but not an exact match, to names in the database.

    Example:

    JimBob's Widgets (Ltd)
    JimBobs Widgets Limited

    I am writing a VBA function that is used within an append query, to search through the database using a supplied string for "fuzzy" matches.

    The concept is working well now, but about 40% of the matches that are found are not matches at all. I want to reduce this.

    I am using an SQL query from a DAO object, which uses a like statement.

    At the moment I have it like this:

    SELECT name FROM table WHERE name LIKE "*Jimbobs* Widgets*";

    An example like that works well, but if the name is like:

    A J H Ltd

    The result set returns all sorts of unrelated records. Is there an wildcard that I can use within the LIKE clause to represent a space or nothing?

  6. #6
    orange's Avatar
    orange is offline Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,848
    From my previous post
    What do you use these Companies for? Don't you need addresses, PostalCode, etc for your needs?

    A little more background on the business needs would help in any advice. I would think a list of 500,000 unique companies would be a significant list that others may have an interest in - if it is maintained and somewhat comprehensive.
    I'm not sure I understand your "fuzzy" match approach.
    You may want to investigate the following algorithms/links
    to refine your test for "fuzzily similar names".

    1) SOUNDEX http://en.wikipedia.org/wiki/Soundex
    2) Levenstein distance http://en.wikipedia.org/wiki/Levenshtein_distance
    3) metaphone http://en.wikipedia.org/wiki/Metaphone

    As for wildcards and other string approaches, you could investigate
    "regular expressions". http://en.wikipedia.org/wiki/Regular_expression

    Good luck with whatever you decide.

  7. #7
    BassettProvidentia is offline Novice
    Windows 7 32bit Access 2010 32bit
    Join Date
    Feb 2011
    Location
    Moray, United Kingdom
    Posts
    4
    Thanks for the links. They have certainly given me something to work with.

  8. #8
    dskulman is offline Novice
    Windows Vista Access 2007
    Join Date
    Mar 2011
    Posts
    4

    Talking Appending non-duplicated records

    In the table that is being appended to, make all the fields you don't want duplicated primary keys (you can have multiple keys).

    Then, an append query will NOT append anything that is duplicated.



    Quote Originally Posted by BassettProvidentia View Post
    Good morning,

    I have a prospective client that is interested in having a User Defined Function written that will prevent duplicated data from being appended to his database.

    The option I am proposing is that each of the new rows is appended to the master database where the UDF returns false.

    The UDF, each interation, will go through each row in the master database and apply another UDF to field in question. If the result is 'false', the outer UDF will return false to query.

    The database is quite large and I am keen to ensure that the solution I offer is as efficient as possible. Is there a better way to achieve this?

Please reply to this thread with any new information or opinions.

Similar Threads

  1. Duplicating numbers....why??
    By jgelpi16 in forum Programming
    Replies: 2
    Last Post: 01-21-2011, 01:32 PM
  2. duplicating records
    By kstyles in forum Queries
    Replies: 7
    Last Post: 12-31-2010, 02:31 PM
  3. labels duplicating themselves
    By RedGoneWILD in forum Forms
    Replies: 2
    Last Post: 08-17-2010, 07:42 AM
  4. Report duplicating itself many many times
    By TrudyD1474 in forum Reports
    Replies: 1
    Last Post: 06-24-2010, 11:59 AM
  5. duplicating and editing the record
    By Airis in forum Forms
    Replies: 1
    Last Post: 04-12-2010, 07:41 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums