Results 1 to 8 of 8
  1. #1
    mkallover is offline Novice
    Windows XP Access 2007
    Join Date
    Sep 2010
    Posts
    25

    Identify "duplicated" rows

    This might be a pretty common query but I'm not really sure how to go about doing it.



    My company has a table that identifies doctors and their forms of ID. Because of some faulty load programs, we have an issue where a single doctor may have multiple rows on the table. I'm trying to identify and correct those situations. Here is what the issue looks like:

    InternalID
    ID1
    ID2 Name
    1 12345 AA1234 Mike Smith, MD
    2 12345 12345 Mike Smith, MD

    That second row should not be there but because of the load program, we have millions of rows on our table like that.

    So what I would like my query to do is identify all of the rows on the table where there is one row that is correct (InternalID = 1) and one row that is incorrect (InternalID = 2). The eventual goal being to run a job that would delete the incorrect rows. There are millions of rows on the table so optimally the query should probably only return the InternalID of the incorrect rows.

    I'd appreciate any ideas you all might have.

  2. #2
    alansidman's Avatar
    alansidman is offline VIP
    Windows 7 32bit Access 2010 32bit
    Join Date
    Apr 2010
    Location
    Steamboat Springs
    Posts
    2,529
    Have you attempted to resolve this using the Duplicates Query Wizard built into Access? That would be my first step in this project.

  3. #3
    orange's Avatar
    orange is offline Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,722
    I agree with Alan, and hope you have a backup in a safe place.
    One of the things that jumps out to me is WHY/HOW did duplicates get added? There appears to be a major glitch in some processes or procedures.
    Do you have a data model? Have you NORMALIZED your tables?

    There are millions of rows on the table
    Really? How do you get millions of rows into this situation?

  4. #4
    alansidman's Avatar
    alansidman is offline VIP
    Windows 7 32bit Access 2010 32bit
    Join Date
    Apr 2010
    Location
    Steamboat Springs
    Posts
    2,529
    Just recalled a video that will also work for you.

    http://www.datapigtechnologies.com/f...teproblem.html

  5. #5
    mkallover is offline Novice
    Windows XP Access 2007
    Join Date
    Sep 2010
    Posts
    25
    I agree with Alan, and hope you have a backup in a safe place. - It is backed up. I am querying a table that sits in a data warehouse.


    One of the things that jumps out to me is WHY/HOW did duplicates get added? There appears to be a major glitch in some processes or procedures. - I think I addressed this in my original post by to clarify: We have programs that load the table based on multiple sources files we receive. There are faults in the load programs that are being corrected in another effort but I am also trying to identify the impacted rows.


    Do you have a data model? Have you NORMALIZED your tables? - No, the table is not normalized and unfortunately I have no power to do anything about that.

    There are millions of rows on the table


    Really? How do you get millions of rows into this situation? - When you're dealing with a table that has records for every single doctor that practices in the US, you can build up a lot of rows quickly when you have a flaw. I was not involved with the creation of the table or the load programs but now I am part of a process to try and clean it up.

  6. #6
    mkallover is offline Novice
    Windows XP Access 2007
    Join Date
    Sep 2010
    Posts
    25
    To Alan:

    I've tried the duplicate wizard but I'm still having a tough time identifying only the incorrect rows and not returning the "good" rows.

  7. #7
    orange's Avatar
    orange is offline Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,722
    Can you tell us in plain English what makes the "good" rows? Do you have a definition of what a unique/good record is?

    If you are dealing with millions of records, multiple sources, I hope someone is looking at Table design and has a plan.
    It sounds like there may be a lot of independent work efforts underway.

  8. #8
    mkallover is offline Novice
    Windows XP Access 2007
    Join Date
    Sep 2010
    Posts
    25
    A "good" row would be one where ID1 <> ID2. Those are the rows that are caused by the faulty load programs that are being corrected.

    I think I can first identify all of the duplicate rows using the query wizard and then use an additional query to pull out the rows where ID1 = ID2 and that will give me my population of incorrect rows.

Please reply to this thread with any new information or opinions.

Similar Threads

  1. Replies: 5
    Last Post: 06-19-2012, 03:16 AM
  2. Replies: 11
    Last Post: 03-29-2012, 02:32 PM
  3. Identify " " in String
    By jgelpi16 in forum Programming
    Replies: 7
    Last Post: 08-31-2010, 10:12 AM
  4. Identify " " and ","
    By jgelpi16 in forum Queries
    Replies: 1
    Last Post: 08-30-2010, 01:43 PM
  5. "Internal" timestamp on table rows?
    By quahtrader in forum Access
    Replies: 1
    Last Post: 07-13-2010, 10:43 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums