View Poll Results: who is the highest paid CEO of world

Voters
0. You may not vote on this poll
  • Larry page

    0 0%
  • Tim cook

    0 0%
  • Bill gates

    0 0%
Results 1 to 11 of 11
  1. #1
    nikhil007 is offline Novice
    Windows 7 64bit Access 2010 64bit
    Join Date
    Apr 2012
    Posts
    3

    Fuzzy matchingforIdentifying Multiple Records Of Users With certain resemblace %age

    Hi,
    i have different fields in access database which have data, which are identical in a sense, but due to wrong data entry in those fields, it is not getting extracted as duplicate entries with total count. Sample I have attached already below along with attachment with sample database containing fields:

    sr.no. 1,3,4 in database being same record of one customer and sr. no. 2,6,7 also being same record of other customer. But due to error in punching in name, fathername, address field, basic queries are not picking it as multiple record of one and same customer.

    I want to compare Ist record containing name, father name, address with next record containing same and then with next till the last record and output from access to be generated like as mentioned below. It is actually fuzzy matching

    "wherever Ist record has >80% resemblance with any other record in database, query should generate all those records in order. Similary for 2nd record, it should compare with all other records and produce the output in query for records which has >80% resemblance. Sample records I have attached above for reference

    For information, database is in access and has 16-17 lacs of records.

    Access based vba code or sql query is highly appreciated

    Can anyone help me in this regard in this forum.
    Rgds
    Nikhil

    # Name Father Name Address
    1 Mohd. Sulaiman Habba Kadal, rajouri
    2 Ram Mohan Santosh Ram Nagar, Udhampur


    3 Mohd. Sulaiman Habba Kadal, rajouri
    4 Mohammad Sulaiman Habba Kadal, rajouri
    5 Ramsharan Rupinder Talabtillo
    6 Ram Santosh Ram Nagar, Udhampur
    7 Ram Mohan Santosh RamNagarudhampur
    Attached Files Attached Files

  2. #2
    June7's Avatar
    June7 is offline VIP
    Windows XP Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,929
    Never seen anything like this and sounds very difficult.

    The root of the problem is poor database design. Instead of repeatedly entering name, fathername, address, make choice from a combobox. Names should not be a pk/fk, use a unique identifier like a customer number.
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  3. #3
    Join Date
    Apr 2012
    Location
    Los Angeles
    Posts
    75
    How many records does the database have? Is it 16 to 17 thousand? Not sure what unit of measure Lacs represents -sorry. This will be a long running routine as it will have to take the first record and loop through every other record, then move to the next record and loop through every other record, etc. First pass would be based on first x letters of first name and first x letters of father name. It can be done but will be a time consuming process.

    Are you using SQL Server?
    Last edited by AccessMSSQL; 04-17-2012 at 12:12 AM. Reason: ask another question

  4. #4
    recyan's Avatar
    recyan is offline Expert
    Windows 2K Access 2000
    Join Date
    Dec 2011
    Posts
    662

  5. #5
    nikhil007 is offline Novice
    Windows 7 64bit Access 2010 64bit
    Join Date
    Apr 2012
    Posts
    3

    Fuzzy Matching-- Help

    Database has 16-17 lakhs of records. Currently using MS access 2007. You are dead right in how to traverse through these records to find the fuzzy match. Yeah it will be a time consuming process but it is worth it. Looking forward to VBA code from your side for resolving this issue. It is highly urgent. Any help from your side will be highly appreciated.

    Rgds
    Nikhil

  6. #6
    June7's Avatar
    June7 is offline VIP
    Windows XP Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,929
    recyan has offered you code sources. Suggest you review them. Allen Browne's looks promising.
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  7. #7
    recyan's Avatar
    recyan is offline Expert
    Windows 2K Access 2000
    Join Date
    Dec 2011
    Posts
    662
    Frankly speaking, I posted those links for the benefit of others & not the OP.
    It appears from the OP's first post that he / she is already aware of fuzzy matching, but perhaps is pressed for time to put in the effort & hence wants a ready made solution.

    Thanks

  8. #8
    orange's Avatar
    orange is offline Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,726
    What exactly is the data entry issue?
    due to wrong data entry in those fields, it is not getting extracted as duplicate entries with total count
    Is it misspelling/inconsistent spelling; lack of comma separator on some addresses???

    It appears that the names are similar; and father names could be repeated and not necessarily duplicates; and the addresses are not consistently separated by a comma. As others have said - some unique identifiers and structural changes may be a good start.

    Do you have a clear statement of WHAT you want to do with the final data?
    You have said "Yeah it will be a time consuming process but it is worth it."

    How exactly do you determine that 1,3, 4 are same record and 2,6, 7 are same record?

    See http://en.wikipedia.org/wiki/South_A...mbering_system for lakhs definition

  9. #9
    RagJose is offline Seasoned user
    Windows XP Access 2002
    Join Date
    Apr 2012
    Location
    Brazil
    Posts
    42
    Approximate string matching has been a side interest of mine since highschool. Have been to Soundex, but found it best suited for the original purpose of detecting phonetic similarity, with a strong bias towards English. It is not as good in other languages, and a simple inversion of letters caused by a common typo can compromise a detection.

    From recyan's suggestions, three take to the same source: George Brown, aka "opnseason", who posted related articles in several forums between 2006 and 2007, reporting a 99% score for correct matches with the combination of four different algorithms he tested and optimized. He made part of his VBA coding available at the time, but the files are no longer online. If anyone knows of his whereabouts, he might still be of help in the subject, and I would be interested in having the code for the project I'm currently working on.

    FYI, 17 "lakhs" mean 1,700,000. In one article, opnseason mentioned his tests on a 17,000-record base took like 7 hours. Even with nowadays' hardware, processing a 100-fold larger set like the OP's looks daunting.

  10. #10
    orange's Avatar
    orange is offline Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,726
    RagJose,

    I agree with your comments on Soundex and its bias. I have tried things like Levenstein in the past and found extremely long running times.

    info re levenstein http://www.merriampark.com/ld.htm#VB

    Other info
    http://stackoverflow.com/questions/5...859823#5859823

  11. #11
    RagJose is offline Seasoned user
    Windows XP Access 2002
    Join Date
    Apr 2012
    Location
    Brazil
    Posts
    42
    Thanks, orange. I was already familiar with Levenshtein Distance at the first link, but the second one provides an interesting discussion, besides proposing a simpler VBA code. Like opnseason, the author combines more than one criterion to yield a final result. Curiously, although the thread dates back to May, 2011, it was edited just a couple of weeks ago.

    nikhil007, orange's second link might be a possibility in your case. It provides some insight into (and VBA code for) treating also mulltiple-word strings, even if the reader prefers to stop before proceeding to AI territory. You will anyway have to tailor the code to your particular application.

Please reply to this thread with any new information or opinions.

Similar Threads

  1. multiple users
    By slimjen in forum Security
    Replies: 8
    Last Post: 02-09-2012, 10:50 PM
  2. Multiple Users
    By jo15765 in forum Forms
    Replies: 2
    Last Post: 12-03-2010, 09:51 PM
  3. Multiple Users
    By Nettie in forum Access
    Replies: 5
    Last Post: 10-15-2010, 02:47 PM
  4. multiple users
    By ramkitty in forum Access
    Replies: 1
    Last Post: 03-22-2010, 07:15 PM
  5. How to have multiple users
    By Eric1066 in forum Access
    Replies: 5
    Last Post: 11-19-2009, 05:14 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums