Page 1 of 3 123 LastLast
Results 1 to 15 of 45
  1. #1
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,038

    VBA routine to match legacy source values with target values (challenging!!!)

    Experts:

    Over the past few weeks, I have posted multiple threads in support of "legacy system" to "target system" data transformation.

    For reference purposes only, please see below just three (3) posts where forum experts "orange" (Jack), "accesstos" (John), moke123 provided some superb help. Again, including these posts are merely for references purposes, but while this new thread is related it is an independent problem from the other 3 questions. So, feel free to ignore the URLs.
    https://www.accessforums.net/showthread.php?t=83240
    https://www.accessforums.net/showthread.php?t=83270
    https://www.accessforums.net/showthread.php?t=83287

    Ok, to have a baseline, let me just dive into the weeds and provide some background on what I would like to accomplish in this post.



    Background (Source Data):

    a. I'm working on a project that requires the migration of legacy data (multiple independent but yet related organizations) into a single new *overarching* target system.
    b. As we often can find, the old legacy systems were lacking structure (i.e., systems/spreadsheets allowed for non-standardized data entry vs. utilizing drop-down menus with standard menu options).
    c. Based on b., you only can imagine the variety of data values that can be found in a single field/column.
    ** At this point, I suggest you review the attached DB, specifically table [LK_INJURY_TYPE] which contains 160 "injury types".
    ** When reviewing some of the first records (e.g., 12, 13, 14) the term "abrasion" is used in three (3) different contexts... i.e., "Abrasion to right knee"; "abrasion on the corneal...", "abrasion to the right...". Also, record # 15 utilizes the adjective "abrasive".
    d. And, it pretty much goes w/o saying (but let me indicate it anyhow), I have no control over how the legacy data was entered... I merely have to deal w/ what I'm given now.

    Background (Target Data):
    a. This sample DB also includes table [tbl_TARGET_INJURY_TYPE] which ultimately is a reference table from the (new) target system.
    b. [tbl_TARGET_INJURY_TYPE] includes 52 distinct injury types. While we do have some influence of adding new values in the long run, let's assume for now that these 52 values make up the full record set and cannot be changed (for now).

    Here's what I need some help with (background):

    - I think you already guessed it... the question is: "How can I link the source data (160 records) to the target data (52 records)?" If that's what you thought, I have one word for you.... BINGO!!!
    - I could go through this manually and succeed (at least to some degree) and associate source record #67 ("Hearing loss") to target record #20 ("Hearing Loss"). Another example would be where source #46 "cut/infection" could be linked to target #21 "Infection". And, arguably, a few/many more links could be drawn when manually scanning through these two tables.
    - Now, while manual review may be doable in this relatively short sample data (1 source table and 1 target table), I ultimately have nearly 100 source tables (w/ different fields/context) which would have to be linked to their associated target tables. And in some cases, my source values exceed 1,000 records. So, doing this manually would not only give me a major headache, it would also take a tremendous amount of time (and even worse, result in mistakes).

    Here's what I need some help with (using VBA in Access):
    - I would love to have some algorithm (VBA) which would allow me to do the following:

    1. Indicate which source and target tables I want to compare. For instance, compare table [LK_INJURY_TYPE] with [tbl_TARGET_INJURY_TYPE]. Given that I have 100 source tables and may 100 target tables, I probably need to have the source table names stored in one reference tables while having the target table names stored in a 2nd reference table. But that's part of the programming logic. As long as I have the ability to specific compare table "X" with table "Y" and then pick field "X1" with "Y1" (in this case, compare field [INJURY TYPE] with [tbl_TARGET_INJURY_TYPE].[TARGET_FIELD]).

    2. Next is the "apples to apples" data value comparison. In this case, however, it's more like a "green apple (Granny Smith)" to "Apples" comparison. So, based on certain text criteria, I would find they keyword "apple" (in the source data) and match it to "Apples" in the target data. I fully understand the latter could get really complex.

    3. Based on #2, I am 99% confident that NOT all values will be matched. However, if I could reach e.g., a 75% or even 80% match rate, I'd be thrilled. And then, for any record where no match could be found in the target source data, I merely would like to display "No match found!".

    4. Lastly, and this is again, a programming "thingy", previous experts (accesstos & orange) provided a solution where -- amongst other -- the field [TARGET_FIELD] is already added to all of my lookup (LK) source tables. So, ideally, [LK_INJURY_TYPE].[TARGET_FIELD] would be populated with any matching values from [tbl_TARGET_INJURY_TYPE].[TARGET_FIELD].

    I fully understand this is post/question falls into the BIG KAHUUNA category. I know it won't be easy to develop. However, any assistance w/ this "puzzle" would be greatly appreciated.

    Thousand thanks in advance,
    Tom
    Attached Files Attached Files

  2. #2
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,716
    Tom,

    Do you have definitions/descriptions for each of your 52 proposed injury types? Has your list been deduced from some standard/authoritative list? The reason I ask is because they do not seem to be at a common level. That is, some are quite general, others more specific. May not be an issue for your purposes,but you may need some breakout/leveling.
    Always helpful if you have some description/guidelines as to what is included in a particular item/selection.

    These are 3 where identifying the "correct pigeon hole" may be difficult. Again it depends on the ultimate usage.

    ID TARGET_FIELD
    27 Mental Disorder, Amnesia
    28 Mental Disorder, Psychosis
    29 Mental Disorders or Syndromes


    ??Where do these map?? (ignore spelling variants)

    Code:
    SELECT LK_INJURY_TYPE.INJURY_TYPE
    FROM LK_INJURY_TYPE
    WHERE (((LK_INJURY_TYPE.INJURY_TYPE) Like "*twist*")) OR 
    (((LK_INJURY_TYPE.INJURY_TYPE) Like "*sprain*")) OR 
    (((LK_INJURY_TYPE.INJURY_TYPE) Like "Strain*"));
    INJURY_TYPE
    back of left knee pulled when twisted during fall
    knee sprain or strain
    Pain in hand (possible strain or sprain)
    Right ankle twist
    Severe sprain
    sprain
    sprain of left foot
    Sprain/strain
    sprained knee
    Strain
    strain to left thigh/buttock and abraision to righ
    Strain to right arm
    Strained / Tweaked back
    strained back
    twisted ankle
    Twisted right knee.
    Last edited by orange; 03-24-2021 at 09:37 AM. Reason: spelling/add'l

  3. #3
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,038
    Jack:

    As always, your questions are laser-focused and spot-on. One can tell you have extensive knowledge in the field.

    Here are my responses to your questions:

    Q1. "Do you have definitions/descriptions for each of your 52 proposed injury types?"
    A1. Unfortunately, the answer is 'no'. While we have requested that information, we have yet to receive it. I understand this makes it more challenging but we're dealing w/ a number of SMEs and not everyone is as dedicated to the project as we'd like them to be.

    Q2. "Has your list been deduced from some standard/authoritative list?"
    A2. This reference list (target) was developed by another government agency. That entity falls outside our area of influence, we our program is merely tapping into the existing source. Can't go into all details due to proprierty reasons. I'm sure you understand. So, again, I can see that #27, #28, #29 all utilize "Mental Disorder". For now, the only/best answer that I can provide is to find the best/first possible match. That is, if the source data does not specify "amnesia" or "psychosis" or "syndromes", the default should be the 1st value from the target. Naturally, I'll have to double-check it but I don't want to hold up any progress.

    Q3. "Where do these map?? (ignore spelling variants)"
    A3. Hopefully I'm understanding the question correctly... if none of the keywords (e.g., "back" and/or "knee" can be found in the target, the match will be "No match found!" That said, I can later run a query on the "No match" and put together a consolidated question catelog (for each table) that we'll have the customer/SME review. Then they can come back and either indicate "Other" or have them create a new category (in the target reference tables).

    So, I hope I provided some valid/helpful answers to your preliminary questions. I wish I had more/better answers, but honestly, this is a learning experience for most/all of us. We'll plan to the best of our abilities but sometimes the 80% solution is good enough in order to move on to the next activity.

    Thank you for any help w/ this in advance!!

    Cheers,
    Tom

  4. #4
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,038
    Oh, here's one "by the way"... yesterday's routine added four (4) new fields into the LK tables. [TARGET_TABLE], [TARGET_FIELD], [TARGET_PK_FK] and [COMMENT].

    While I'm getting started on the manual mapping (hopefully only for now), I realized that I needed a 5th field "TARGET_VALUE". So, rather putting a match value into column [TARGET_FIELD], all matching values should go into the new 5th field (to be inserted between [TARGET_FIELD] and [TARGET_PK_FK).

    As in so many cases, we often need to get into the subject before we realize something else was left out.

    See attached updated "Sample DB".
    Attached Thumbnails Attached Thumbnails ModifiedLK_Table.JPG  
    Attached Files Attached Files

  5. #5
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,716
    1. Do you need a revision to the LK* table Create routine for the new Field?

    2. I would advise against lumping a lot of things into "the match will be "No match found!"" And then "I can later run a query on the "No match" and put together a consolidated question catalog (for each table) that we'll have the customer/SME review."

    My experience says do whatever to reduce the overall amount of work and adjustments. Doing the mapping over and over will eventually get there, but some review and planning will reduce the timeframe.

    3. I have set up a few queries to attempt to get some broad groupings for review, but I'm not familiar with your organization nor the level of detail/precision needed.

    Try these: To get some groupings that you may be able handle programatically. I would look for patterns/similarities and get them reviewed with someone knowledgeable of the categories and the intent of the values. Refine the groups to reduce the manual mapping effort.

    A: CutsAbrasionNotEye
    Code:
    SELECT LK_INJURY_TYPE.INJURY_TYPE
    FROM LK_INJURY_TYPE
    WHERE (((LK_INJURY_TYPE.INJURY_TYPE) Like "*cut*")) OR
     (((LK_INJURY_TYPE.INJURY_TYPE) Like "*scrape*")) OR 
    (((LK_INJURY_TYPE.INJURY_TYPE) Like "abr*")) OR
    (LK_INJURY_TYPE.INJURY_TYPE) Like "*lacer*"
    AND NOT
    (LK_INJURY_TYPE.INJURY_TYPE) Like "*eye*";
    B: Sore

    Code:
    SELECT LK_INJURY_TYPE.INJURY_TYPE
    FROM LK_INJURY_TYPE
    WHERE (((LK_INJURY_TYPE.INJURY_TYPE) Like "*sor*"));
    C: SprainStrainTwist

    Code:
    SELECT LK_INJURY_TYPE.INJURY_TYPE
    FROM LK_INJURY_TYPE
    WHERE (((LK_INJURY_TYPE.INJURY_TYPE) Like "*twist*")) OR
     (((LK_INJURY_TYPE.INJURY_TYPE) Like "*sprain*")) OR
     (((LK_INJURY_TYPE.INJURY_TYPE) Like "Strain*"));

  6. #6
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,038
    I modified the following line:


    Code:
    20        newInfo1 = "  ,Target_Table text(40),Target_Field, Target_Value text(40) "                   'part1
    ... it throws an error presently. See JPG below.

    //

    I'll run the 3 queries in a moment and see what it "spits out" (sorry for the laments term... this is NOT meant to be negative in any shape or form).

    And, I agree w/ you, if I can minimize the number of "Not found...!" that would be the ultimately goal. I just have the constraint that I can neither change the source value nor can I append new categories to the target reference table. However, if multiple looping would -- somehow -- flag records as "candidate values", that they be ok.

    I hope this helps a bit...
    Attached Thumbnails Attached Thumbnails error.JPG  

  7. #7
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,038
    Jack:

    As originally mentioned, I consider this problem as a BIG KAHUUNA one, so I completely understand that design of a solid logic will take some time.

    That said (mentioned), I ran the 3 queries... their output (as you know) is illustrated in the attached JPG.

    For my own curiosity, does the mapping for Source_Injury_Value to Target_Injury_Value require a "data dictionary" for each single lookup table. So, here we have "CutsAbrasionNotEye", "Sore", and "SprainStrainTwist", but ultimately there could be another 100 (or more/less) keywords/queries just for the injury table mapping.

    So, again, I'm merely curious how this could work out. I certainly do NOT want to get ahead of myself and think about this in the wrong manner... I am simply hopeful that I would **NOT** end up with, e.g., 70 LK tables multiplied by 100 query terms... resulting in thousands of queries.

    I only can imagine that's NOT the case, right?

    Thanks,
    Tom
    Attached Thumbnails Attached Thumbnails QueryOutput.JPG  

  8. #8
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,716
    These were 3 quick queries to show "potential groupings".

    Many others are possible. The approach I would suggest is to:
    get the query groups
    then test to see if all injury_types in LK_Injury_Type were accounted for.

    ALSO, I'll look at the code for the new field in each LK table.

    Update a field say Comment in LK_INJURY_TYPE with an "X" for each max to a value in a group query.
    then review LK_INJURY_TYPE to see if all entries have X's.
    If not then "suggest another grouping"
    Repeat until all values in the table have an X. This means you can focus on a smaller number of groups rather than individual 160+ entries.

  9. #9
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,716
    Tom,

    These lines should be 20 and 30 in the CreateLKTables.

    NOTE: I ADDED IT AS TEXT(40) AS PER YOUR SAMPLE.

    20 newInfo1 = " ,Target_table text(40), Target_Field " 'part1 40 char default for target_table name
    30 newInfo2 = " ,Target_Value text(40), Target_PK_FK Number, Comments Text(250))" 'part2

    Click image for larger version. 

Name:	Heres2TablesWithNewField.PNG 
Views:	35 
Size:	28.2 KB 
ID:	44798

  10. #10
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,038
    Jack -- those 3 groupings make sense to me.

    Does that help w/ establishing a VBA function which would then automate the process? Do you require additional info or is the existing info sufficient for the "1st loop" design?

  11. #11
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,716
    I think it's just another tool to help with automating some parts of the project. In the end it takes someone knowledgeable of the business usage to make decisions.

    Do the items in group 1 fit into mapping field X? All/some?? As always the devil is in the details.

    You see the hierarchy:
    Injury name
    generic Injury Type
    specific injury to "body part(s) left/right/front/back"

  12. #12
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,038
    "It takes some knowledge of the business usage to make decisions". I 100% concur but the information posted in the original thread is pretty much what I have to work with.

    Thus, if the limited information is NOT sufficient to embark into automating it (or at least some portion), please let me know. In that case, I'll roll up my sleeves and get ready for consuming plenty of Ibuprofen.

    //

    However, based those 19 records presented in the 1st query, there are definitely some matches. See attached JPG. The mapping occurred based on the **19** records for query "Cuts" (not the original LK_Injury_Type).
    Attached Thumbnails Attached Thumbnails Example.JPG  

  13. #13
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,038
    And going back to the other thread... "LK_" table creation. Would it be wise to include an AUTONUMBER (as primary key) when creating the LK_ tables? If so, is that difficult to add?

  14. #14
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,716
    My view is that every table should have a Primary Key. Not necessarily an autonumber, but I usually start with autonumber PK.

    It can be done. Do you have an identified use/need for an autonumber pk vs the PK currently used in the tables?

    Also, regarding the mapping of the Injury_types.
    My current testing uses 13 query groups and handles 115 values out of the 160.
    Finished run using 13 queries
    115 Assigned out of 160

  15. #15
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,038
    Q. "Do you have an identified use/need for an autonumber pk vs the PK currently used in the tables?"
    A. No, I just thought it might be easier to link "Source.ID" to "Target.ID" vs. Source."SomeTextValue" to "Target.ID". If not required for this process, I'm good.

    Continuing w/ the injury_type queries. Allow me to follow up w/ a few questions myself (just for clarification purposes).

    1. Did you manually define the 13 query criteria or did you (somehow) auto-generate the criteria/conditions?
    2. The 115 (out of 160) catches more than 70%. That's great! Naturally, some of the existing values (0, 1, ..., 7) are some coded/masked values in one of the legacy systems, so I can exclude them for now. Thus, the success rate goes up even further (115/153), let's say.
    3. Now, here's the "million dollar" question... once you ran the query, did it copy the matching target (T) values and inserted them into the appropriate records in the source (S)'s LK table? If not, could that be automated so that I don't have to manually (one-by-one) copy/paste values from "T" into "S"?

    To recap:
    - Although tedious, it may be ok if I had to create the 13 queries (+/- n) for each of the ~100 LK tables. Not ideal but doable.
    - Once the queries are run and, let's say, find the majority of matches, it would be extremely helpful to have some VBA copy/paste the T values into the appropriate S records (into the newly inserted field TARGET_VALUE in LK table). Is that doable?

    Thank you,
    Tom

Page 1 of 3 123 LastLast
Please reply to this thread with any new information or opinions.

Similar Threads

  1. Replies: 53
    Last Post: 06-24-2021, 10:49 AM
  2. Replies: 18
    Last Post: 04-16-2014, 11:45 AM
  3. Replies: 9
    Last Post: 07-24-2013, 02:56 PM
  4. Replies: 0
    Last Post: 07-26-2011, 02:03 PM
  5. Replies: 0
    Last Post: 10-24-2010, 02:27 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums