Experts:
I need some general recommendation for *screening* AND *validating* data values across multiple tables.
Allow me to provide some background first:
- First of all, the attached database includes only **EXAMPLE DATA**. However, I attempted to mimic the essence of the actual data.
- The sample DB includes only 4 source tables (Source1, Source2, Source3, Source4). My actual data set includes dozens of tables.
- Finally, the sample tables includes at the most up to 6 fields per table. My actual data tables *may* include up to 150 fields each.
Additional background:
- The 5th table (00_tblFields) includes the total # of fields based on the 4 source tables.
- Also, 00_tblFields contains a boolean field "PotentialDuplicate" as well as a field referencing the source table (for that particular field).
Existing Process:
- First, crosstab query "qry00_Transformation_AllFields" transposes all DISTINCT field names into a matrix and places a "●" under each source column that includes that particular field name.
- Next, crosstab query "qry02_Transformation_PotentialDuplicates" only displays those fields where the analyst placed a checkmark into field "PotentialDuplicate" (in table 00_tblFields).
- So, ultimately, based on the intial screening of **fieldnames**, it appears that "Gender" (from Source1) & "Gender1" (from Source2) may be include the same type of data. At this point no actual data screening has occurred yet.
- Similarly, there are 4 source tables which include a fieldname for last name. 3 of the 4 source tables, however, have a different spelling of last name field (Last Name, Last_Name, LastName). As a result, all 4 fields were checked [00_tblFields].[PotentialDuplicate].
I hope all above makes sense thus far.
Data Screening (Background):
a. Based on the initial hunch a) 2 gender-related fields and b) 4 last name fields may hold similar data (or data types).
b. However, based on a., I do need to validate/verify that the latter is actually true. Now, based on the example data, it would be extremely easy to quickly open up 4 tables and glance at those up to 6 fields.
c. Reality is though that step b. is difficult when looking at the actual organizational data (again, each source table may include 100+ fields and 1000s of records).
d. Also, given that each table has a different structure, it doesn't make sense to attempt to put all of the fields into a single table. Besides, Access (I think) cannot have more than ~250 fields in a single table.
Data Screening (Manual Process) -- when performing manual data screening the following should be realized:
- Table/field [Source1].[Gender] includes values = "M" and "F"; alternatively, table/field [Source2].[Gender1] includes values equal to "Male" and "Female". Ultimately though, both source tables describe the same type of data. Thus, the user/analyst should synchronize both *fieldname* and *underlying data*.
- As part of the manual review, all four source tables include actual last names EXCEPT table Source2. Here -- for whatever reason -- date values are stored in the field [Last Name].
Now, here's what I need some help with:
- I would like to use the *most efficient* way to quickly screen data of the potential duplicate fields across the 4 source tables.
- Based on the latter, I'm not entire certain if I should have 4 queries open at the same time and then quickly look/screen for the underlying data values.
- So, my question is as follows: Given the different tables structures, is there an way to somehow join/union the four tables and then quickly drag/down the candidate fields into one (1) query so that I can realize that "M/F" = "Male/Female" but that "Lastname" (date) does NOT equal "LastName" (string)?
So, any ideas as to how I could quickly identify any a) equivalent data types OR b) data type mismatches would be greatly appreciated.
Thank you in advance!
Tom