Page 1 of 4 1234 LastLast
Results 1 to 15 of 54
  1. #1
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,046

    VBA Data Import Routine (multipe XLS into Access) with duplicate fields/values into DISTINCT tables

    Hello Experts:

    I would like to develop an automated *data import routine* (from Excel into Access). Please find attached 3 sample XLS. I used http://www.randat.com/ to generate these *random* data.

    Allow me to recap the information of the (sample) spreadsheets first:
    - I have three organizations (Organization_A; Organization_B; Organization_C)
    - WRT to field names (XLS columns) there is some overlap; however, some organizations may have additional fields/columns that the other two don't have. And, in actuality, I will have more than just 3 XLS... but for sake of arguments, XLS = 3.
    - That is, Org_A has fields: FirstName; LastName; Gender; Age; Email; Phone; Education; Occupation; MaritalStatus
    - Org_B has fields: FirstName; LastName; Gender; Age; Salary; NumberOfChildren


    - Org_C has fields: LastName; Gender; Age
    - Also, across the same fields (e.g., "FirstName") there is a good chance that multiple values such as "Marcus" may exist in one or more (potentially all) spreadsheets. Duplicates may/will exist in other fields, too.

    Here's what I'd like to achieve:
    - Place all 3 (or more if available) spreadsheets into a folder
    - Open up MS-Access and use a VBA function (e.g., called from a form) which imports all DISTINCT data into # (n) of separate tables (one table for each field across n number of spreadsheets).
    - So, let's take the example of [FirstName].
    - Field [FirstName] exists in Org_A (50 records); Field [FirstName] also exists in Org_B (30) records; Field [FirstName], however, does NOT exist in Org_C
    - So, **without the DISTINCT**, there's a total of 80 records that need to be imported into a table. I'd like the table to have the lookup prefix "LK_". So, ultimately, I want these 80 records to be imported into "LK_FirstName".
    - However, given that there are some duplicate first names across the two spreadsheets, I only want to have the DISTINCT value set. In this case, only 66 records should be imported into "LK_FirstName".
    - Finally, and maybe I should have mentioned this before, the total # of fields/columns for all 3 spreadsheets = 20. However, given that I want to merge all similar fields/columns, the DISTINCT fields/columns is only eleven (11).

    Final product:
    * Once the import routine has been executed, I should see 11 tables:
    1. LK_Age (with 23 distinct values)
    2. LK_Education (with 6 distinct values)
    3. LK_Email (with 30 distinct values)
    4. LK_FirstName (with 66 distinct values)
    5. LK_Gender (with 2 distinct values)
    6. LK_LastName (with 63 distinct values)
    7. LK_MaritalStatus (with 3 distinct values)
    8. LK_NumberOfChildren (with 6 distint values)
    9. LK_Occupation (with 24 distinct values)
    10. LK_Phone (with 30 distinct values)
    11. LK_Salary (with 50 distinct values)

    Last, but not least, my actual data set *may* also have some NULL values or " " values (they appear to be empty but are not truly NULL). Naturally, neither one needs (NULL or " ") needs to be imported into the lookup tables.

    My question:
    What would the, e.g., VBA look like to import all # (n) number of spreadsheets and end up with the appropriate number of "LK_..." tables (one for each distinct field)?

    Thank you for your help in advance!

    Cheers,
    Tom
    Attached Files Attached Files

  2. #2
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,725
    Tom,

    Thanks for getting back. I see you're in the analysis stage. Do you have a data model for the "new and improved database"?
    You may get some ideas from info in the Database Planning and Design link in my signature. "Stump the model" with people familiar with any/all of the 3 organizations you mentioned may get everyone on the same wavelength.
    The youtube by Alec that I suggested recently was intended to help with the variations in the names and definitions of things when merging disparate systems.

  3. #3
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,046
    Orange -- thank you for the prompt response. We're currently working w/ the designer to get the model of the new system.

    The above example, however, pertains to the old legacy systems. Ultimately, we'll pull multiple flat files (from Excel). Once we utilize a "VLOOKUP" function to standardize all field names (e.g., original XLS of Org_A may show "First Name" while Org_B uses "NameFirst" and Org_C uses "FirstName" w/o the space). But again, the the preliminary cleanup. Upon doing so, however, we're ready to bring all legacy data into a single file in order to begin -- where required -- data value transformation).

    But I'm getting ahead of myself... at the present time, I'm merely trying to automate (semi-automate) -- in Access -- the merging of the same fields into one lookup table. If you have any feedback whether or not such VBA code has been developed, I'd welcome any pointers.

  4. #4
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,725
    I don't think there is a readily available function/routine to do what you're asking. Is this all the data involved in the legacy systems?
    I would/did import the Excel files into separate Access tables and work within Access. (but I'm not an excel person)

    You can work from this code set up to create your tables
    Code:
    SELECT * INTO LK_FirstName from
    (SELECT distinct firstname,"orgA" as Org from orga
    Union 
    SELECT distinct firstname,"orgB" as Org from orgb)
    You could make it a function, and use the Distinct fieldname list below to create your LK tables.

    I added the table name from where the data originated-in case you need it for reconciliation. But with less than 100 records in each of the tables, I don't think you need it. If you had 100s or 1000s of records, then additional attributes eg email, phone.. can be used to separate "apparent duplicates".

    So with the above, drop the Org stuff and get your LK_firstname table.

    Also, if you use the data dictionary "thingy" I provided previously, you can use this set up to get distinct field names across your 3 tables.

    Code:
    SELECT Distinct data_dictionary.field_name
    FROM data_dictionary
    WHERE (((data_dictionary.field_name)<>"ID") AND 
    (Not 
    ((data_dictionary.table_name)="data_dictionary" Or 
    (data_dictionary.table_name) Like "LK*"))
    );
    To give:
    field_name
    Age
    Education
    Email
    FirstName
    Gender
    LastName
    MaritalStatus
    NumberOfChildren
    Occupation
    Phone
    Salary

  5. #5
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,046
    Orange -- I think your solution will work. Allow me to ask a follow-up question pertaining the existing code:

    Code:
    SELECT * INTO LK_FirstName from
    (SELECT distinct firstname,"orgA" as Org from orga
    Union 
    SELECT distinct firstname,"orgB" as Org from orgb)

    The above SQL would suggest that the analyst had the opportunity to identify that "FirstName" does NOT exist in OrgC.

    Given the volume of fields, however, it would be much more convenient to include all organizations in the query (regardless of the field existing or not). In other words, I would create a *TEMPLATE SQL* and then merely change the fieldname from, e.g., "FirstName" to "Age".

    Code:
    SELECT * INTO LK_FirstName from (
    
    SELECT distinct firstname from orga
    Union 
    
    SELECT distinct firstname from orgb
    Union 
    
    SELECT distinct firstname from orgc
    )
    When I tried executing this code, the query opened a dialog box 'firstname'. I would presume that is due to OrgC NOT having this field.

    So, my question... is there a way I could tweak the modified SQL AND (somehow) include OrgC in the code with the understanding that first names only from OrgA and OrgB would be added to the table? Again, it would make the management of the various MakeTable queries so much easier.

    Thank you,
    Tom

  6. #6
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,725
    I think this would work

    Code:
    SELECT * INTO LK_FirstName from
    (SELECT distinct firstname  from orga
    Union 
    SELECT distinct firstname  from orgb
    union 
    select distinct firstname from Orgc);
    Maybe???
    Hit ok when you get the prompt.
    Note: You may end up with a Null valued record
    If so, then Manually delete the Null record.

    How much data do you really have?

  7. #7
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,046
    Ok... that is definitely an option.

    Now, I just created the 11 SQL queries and added them to a form's listbox.

    Code:
    Private Sub Listbox_Forms_AfterUpdate()
            
        'View Table FIELDS
        If Me.Listbox_Forms = "Create Lookup Tables" Then
        
            DoCmd.SetWarnings False
            
            DoCmd.OpenQuery "qry_Age"
            DoCmd.OpenQuery "qry_Education"
            DoCmd.OpenQuery "qry_Email"
            DoCmd.OpenQuery "qry_FirstName"
            DoCmd.OpenQuery "qry_Gender"
            DoCmd.OpenQuery "qry_LastName"
            DoCmd.OpenQuery "qry_MaritalStatus"
            DoCmd.OpenQuery "qry_NumberOfChildren"
            DoCmd.OpenQuery "qry_Occupation"
            DoCmd.OpenQuery "qry_Phone"
            DoCmd.OpenQuery "qry_Salary"
            
            DoCmd.SetWarnings True
            
        End If
        
    End Sub

    How can I prevent the need to click on "Ok" for each of the dialogue boxes (parameter value box)? Also, is there a way to get rid of the single NULL value?

    # of fields: Nearly 200
    # of records: Several thousand (depending on the source)

  8. #8
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,725
    Since there are only 11 queries, seems reasonable to look at each of the tables and modify the query(s) accordingly.
    If the field involved is not in TableA, then remove the table A reference -similarly for B and C.

    You could create some more extensive vba to check if the field was in tableA if so then build the sql, then check tableB, then tableC. But that may be more effort than required.

    I see this activity more of vba to create the code that would eventually be executed. More like building a tool to address a specific problem. Just trying to quantify how big the "problem" is.

    With 200 fields across multiple tables, you may want adjust your strategy. Sometimes when removing replicates you want more attributes of specific record to identify this field value/record from others.

    Can you give us some more info /details re the 200 fields? What tables would these span?
    There may be ways to use the data dictionary stuff to simplify some coding, but more info needed.

    Update:

    Just ran this query to see which table(s) contain each field. Could use this sort of thing to adjust your query definitions.

    Code:
    SELECT FieldNamesQ.field_name, data_dictionary.table_name
    FROM FieldNamesQ INNER JOIN data_dictionary ON
     FieldNamesQ.field_name = data_dictionary.field_name
    WHERE ((Not (([data_dictionary].[table_name])="data_dictionary" Or ([data_dictionary].[table_name]) Like "LK*")));
    Returns:

    field_name table_name
    Age OrgA
    Age OrgB
    Age OrgC
    Education OrgB
    Education OrgA
    Email OrgA
    FirstName OrgB
    FirstName OrgA
    Gender OrgC
    Gender OrgA
    Gender OrgB
    LastName OrgB
    LastName OrgA
    LastName OrgC
    MaritalStatus OrgA
    NumberOfChildren OrgB
    Occupation OrgA
    Phone OrgA
    Salary OrgB


    Almost forgot:

    Here is query FieldNamesQ

    Code:
    SELECT DISTINCT data_dictionary.field_name
    FROM data_dictionary
    WHERE (((data_dictionary.field_name)<>"ID") AND
     ((([data_dictionary].[table_name])="data_dictionary" Or 
    ([data_dictionary].[table_name]) Like "LK*")=False));
    Last edited by orange; 03-17-2021 at 04:05 PM. Reason: added missing query

  9. #9
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,046
    Ok... yes, maybe the juice is not worth the squeeze; thus, manual manipulation/identification of which fields exist/do not exist in a source should be done manually. Just thought there might be a way over overwriting the parameter value box.

    I'll do some more digging on this process... your initially proposed SQL statement are definitely helpful!

  10. #10
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,046
    Orange -- I'm getting a few errors (referencing missing table names and field names) when running the Data Dictionary "thingy". Do I need to set up a table for the data dictionary first? Not sure exactly how to implement this recommendation?

  11. #11
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,725
    No need to set up a table.

    You run DocumentTables sub from the module. The routine creates a table and fills it based on the tables and fields in the database.

    I was thinking of another way to do this, but it's mostly vba.

    Create a table for each fieldname, make a unique index of the fieldname to prevent duplicates.
    Using the Field usage info create and run sql to fill the table with the fieldname values. Since there's a unique index, it won't allow duplicates, you get distinct values.

    If you run the document tables and have an issue, send me a note and we can work it out.

  12. #12
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,046
    Orange -- forgive me please, but I'm not fully understanding the proposed logic.

    Let's recap the existing process:
    1. For this example, I have 3 XLSX. Copy them into a subfolder (e.g., "C:\MyDocuments\Spreadsheets").
    2. Open up the attached DB and -- in form "F001_MainMenu" -- change the strPath reference to your folder location (see above).
    3. In form "F001_MainMenu", click on "Import Source Data". This should replace the 3 XLSX files.
    4. Click on "Create Lookup Tables". In this example, 11 LK tables will be created.

    Additional information:
    - Based on the actual data, I may have 200 fields across multiple spreadsheets.
    - Ultimately, out of the 200 fields/columns, there will be some overlap so the distinct numbers of fields maybe (let's say) 150 fields though.
    - Thus, I will end up with 150 "make table" queries.
    - Based on the latter, running the "Create Lookup Tables" command (in form) will then create 150 Lookup tables... .. so far so good.

    What I wanted to achieve:
    - Given your recommended SQL code, it automatically creates the Lookup tables... great.
    - However, depending on the source file (Excel) I will have different fields/columns in each Excel.
    - So, I was hoping to use a generic SQL template where I merely have to modify the fieldname. However, when including all source files (e.g., Org_C in the FirstName query), I will get prompted with the parameter dialogue box + (as you correctly indicated) I will also have an additional NULL value for each lookup table.
    - Thus, the latter is not entirely desirable.

    Data Dictionary:
    - I like the idea of having the "Data Dictionary" which you proposed.
    - You suggest to "Create a table for each fieldname"... does that mean, I would end up with 300 tables (150 for the lookup tables and 150 for the data dictionary fields)? If so, that's too many tables in my view.

    In the event I misunderstood you, could you please provide additional pointers. If all possible, you could please use the existing sample DB to include the required code? Again, my goal is to have one lookup table for each distnct field. Duplicating the # of tables would not be ideal though.

    Thank you,
    Tom
    Attached Files Attached Files

  13. #13
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,725
    Tom,

    I'm attaching a zip with a database.
    I imported your xls files into tables OrgA, B and C.
    Then in module1 I created 2 routines.
    createLK creates the LK_ tables and populates them with data from the OrgA, B and C.

    The Deletes routine removes the LK- tables structure and data when testing.

    The only issue is that I made all fields short text to simplify the sql.

    With the database, run the code from the module1
    -create, then if you need to
    -deletes

    It's fairly quick and dirty, but I don't know the details of the requirement.

    Let me know if it's useful or not.
    Attached Files Attached Files

  14. #14
    skydivetom is offline VIP
    Windows 8 Access 2010 64bit
    Join Date
    Feb 2019
    Posts
    1,046
    Wow... this is great... lots of code to digest. I'll see how I can translate it the actual requirements.

    For starters, I opened up table OrgC and added a new field [Nickname]. Then, I populated the new field w/ 20 values.

    Next, I executed the procedure in Module1. At this time, I had thought/hoped that -- besides the 11 LK tables -- I would also generate LK_Nickname. Unfortunately, that wasn't the case.

    So, in the event of new fields being added to the "mix", do I first need to include that field in table "data_dictionary". If not, where would I have to modify something to automatically include any new fields?

    Also, what's the purpose of "Query2"... can that one be deleted?

  15. #15
    orange's Avatar
    orange is offline Moderator
    Windows 10 Office 365
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,725
    If you add tables or fields you have to rerun the data dictionary. Because module1 stuff uses the latest info.
    Query2 was my test stuff and can be deleted. I adjusted the approach so you can simply insert/append the fields to the lk_ table based on
    query FieldNameUsageQ.

    I'm still thinking about the numeric values for age, Salary and NumberOfChildren.
    My thinking is (untested) if I included the datatype with fieldnameUsageQ, then I could see if the fieldname was text or numeric, then run the appropriate SQL.

    Something like CreateTableSqlNumeric = " create table LK_XXX ( XXX Number );" for those fields that are numeric.
    I'll take a further look in the morning.

Page 1 of 4 1234 LastLast
Please reply to this thread with any new information or opinions.

Similar Threads

  1. Replies: 9
    Last Post: 08-01-2019, 06:09 PM
  2. Replies: 3
    Last Post: 05-06-2015, 12:19 PM
  3. Relating Multipe Tables With Similar Data To A Master Table
    By TotalChaos in forum Database Design
    Replies: 1
    Last Post: 04-13-2013, 12:57 AM
  4. Replies: 2
    Last Post: 03-28-2013, 06:21 PM
  5. Replies: 7
    Last Post: 03-17-2013, 07:12 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums