Experts:
Over the past few weeks, I have posted multiple threads in support of "legacy system" to "target system" data transformation.
For reference purposes only, please see below just three (3) posts where forum experts "orange" (Jack), "accesstos" (John), moke123 provided some superb help. Again, including these posts are merely for references purposes, but while this new thread is related it is an independent problem from the other 3 questions. So, feel free to ignore the URLs.
https://www.accessforums.net/showthread.php?t=83240
https://www.accessforums.net/showthread.php?t=83270
https://www.accessforums.net/showthread.php?t=83287
Ok, to have a baseline, let me just dive into the weeds and provide some background on what I would like to accomplish in this post.
Background (Source Data):
a. I'm working on a project that requires the migration of legacy data (multiple independent but yet related organizations) into a single new *overarching* target system.
b. As we often can find, the old legacy systems were lacking structure (i.e., systems/spreadsheets allowed for non-standardized data entry vs. utilizing drop-down menus with standard menu options).
c. Based on b., you only can imagine the variety of data values that can be found in a single field/column.
** At this point, I suggest you review the attached DB, specifically table [LK_INJURY_TYPE] which contains 160 "injury types".
** When reviewing some of the first records (e.g., 12, 13, 14) the term "abrasion" is used in three (3) different contexts... i.e., "Abrasion to right knee"; "abrasion on the corneal...", "abrasion to the right...". Also, record # 15 utilizes the adjective "abrasive".
d. And, it pretty much goes w/o saying (but let me indicate it anyhow), I have no control over how the legacy data was entered... I merely have to deal w/ what I'm given now.
Background (Target Data):
a. This sample DB also includes table [tbl_TARGET_INJURY_TYPE] which ultimately is a reference table from the (new) target system.
b. [tbl_TARGET_INJURY_TYPE] includes 52 distinct injury types. While we do have some influence of adding new values in the long run, let's assume for now that these 52 values make up the full record set and cannot be changed (for now).
Here's what I need some help with (background):
- I think you already guessed it... the question is: "How can I link the source data (160 records) to the target data (52 records)?" If that's what you thought, I have one word for you.... BINGO!!!
- I could go through this manually and succeed (at least to some degree) and associate source record #67 ("Hearing loss") to target record #20 ("Hearing Loss"). Another example would be where source #46 "cut/infection" could be linked to target #21 "Infection". And, arguably, a few/many more links could be drawn when manually scanning through these two tables.
- Now, while manual review may be doable in this relatively short sample data (1 source table and 1 target table), I ultimately have nearly 100 source tables (w/ different fields/context) which would have to be linked to their associated target tables. And in some cases, my source values exceed 1,000 records. So, doing this manually would not only give me a major headache, it would also take a tremendous amount of time (and even worse, result in mistakes).
Here's what I need some help with (using VBA in Access):
- I would love to have some algorithm (VBA) which would allow me to do the following:
1. Indicate which source and target tables I want to compare. For instance, compare table [LK_INJURY_TYPE] with [tbl_TARGET_INJURY_TYPE]. Given that I have 100 source tables and may 100 target tables, I probably need to have the source table names stored in one reference tables while having the target table names stored in a 2nd reference table. But that's part of the programming logic. As long as I have the ability to specific compare table "X" with table "Y" and then pick field "X1" with "Y1" (in this case, compare field [INJURY TYPE] with [tbl_TARGET_INJURY_TYPE].[TARGET_FIELD]).
2. Next is the "apples to apples" data value comparison. In this case, however, it's more like a "green apple (Granny Smith)" to "Apples" comparison. So, based on certain text criteria, I would find they keyword "apple" (in the source data) and match it to "Apples" in the target data. I fully understand the latter could get really complex.
3. Based on #2, I am 99% confident that NOT all values will be matched. However, if I could reach e.g., a 75% or even 80% match rate, I'd be thrilled. And then, for any record where no match could be found in the target source data, I merely would like to display "No match found!".
4. Lastly, and this is again, a programming "thingy", previous experts (accesstos & orange) provided a solution where -- amongst other -- the field [TARGET_FIELD] is already added to all of my lookup (LK) source tables. So, ideally, [LK_INJURY_TYPE].[TARGET_FIELD] would be populated with any matching values from [tbl_TARGET_INJURY_TYPE].[TARGET_FIELD].
I fully understand this is post/question falls into the BIG KAHUUNA category. I know it won't be easy to develop. However, any assistance w/ this "puzzle" would be greatly appreciated.
Thousand thanks in advance,
Tom