  1. #1
    TerraEarth is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Mar 2018
    Posts
    86

    Suggestions on Mass Import of Data

I have a database that I'll have to update each day, with record counts running into the thousands. Old records will be updated accordingly and new ones will be added. The data will arrive as either an Excel sheet or a .csv, possibly tab-delimited.



My question is: is there a way to set up Access to enforce checks on the integrity of the data? If a user accidentally imported an older file, it could throw the database out of whack. Do you have any suggestions on how to move forward?

  2. #2
    isladogs is offline MVP / VIP
    Windows 10 Access 2010 32bit
    Join Date
    Jan 2014
    Location
    Somerset, UK
    Posts
    5,954
The best solution is to automate the process so that the latest file is always the one used, which avoids the risk of error.
Failing that, you need a reference datetime field somewhere in the import which can be compared with the equivalent value stored in the database.
Only allow the import if that datetime is newer.
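A minimal sketch of that check in VBA, with assumed table and field names (tblStaging, tblMaster and RecordDate are placeholders):

Code:
Dim newestIncoming As Date, newestStored As Date
' Newest datetime in the freshly staged import data
newestIncoming = Nz(DMax("RecordDate", "tblStaging"), 0)
' Newest datetime already held in the database
newestStored = Nz(DMax("RecordDate", "tblMaster"), 0)
If newestIncoming <= newestStored Then
    MsgBox "This file looks older than the data already imported - aborting.", vbExclamation
Else
    ' Safe to run the append/update queries here
End If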
Colin, Access MVP
    The more I learn, the more I know I don't know. When I don't know, I keep quiet!
    If I don't know that I don't know, I don't know whether to answer

  3. #3
    TerraEarth is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Mar 2018
    Posts
    86
Interesting, I can see how that would work. How would you approach it, then, if there were no date/time stamp on the import file?

  4. #4
    isladogs is offline MVP / VIP
    Windows 10 Access 2010 32bit
    Join Date
    Jan 2014
    Location
    Somerset, UK
    Posts
    5,954
Use the Date Created or the Date Modified property that all Windows files have.
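For example, a hedged sketch using VBA's built-in FileDateTime function (the path and the import log table are assumptions):

Code:
Dim fileStamp As Date, lastImport As Date
' Last-modified stamp of the incoming file
fileStamp = FileDateTime("C:\Imports\daily.csv")
' Most recent import recorded in a hypothetical log table
lastImport = Nz(DMax("ImportedOn", "tblImportLog"), 0)
If fileStamp > lastImport Then
    ' Proceed with the import, then log fileStamp in tblImportLog
End If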
Colin, Access MVP
    The more I learn, the more I know I don't know. When I don't know, I keep quiet!
    If I don't know that I don't know, I don't know whether to answer

  5. #5
    TerraEarth is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Mar 2018
    Posts
    86
A little update. I've been cleaning the data given to me (the data is really messy and bad) and then funneling it into Access. I use R to clean the data and form tables from it, after which I skim through to catch any blatantly obvious errors (there are usually at least one or two) and correct them before importing into Access.

Thing is, I was wondering whether it would be better to have this done in Access instead. Currently I'm given data from two sources: one is quite clean and accurate, the other is all over the place (multiple objects per field box, typos and errors galore). For example, an entry would look something like:

    Object Column
    Entry 1: Object 1-version0, Object 2-version1, Object3-version0 & Object4-version1 and Object5-version2

I run the data through a script that parses out the Objects, separates out the versions, and copies all the other columns onto each extracted Object. I then keep only the latest and greatest version and exclude the older ones.
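Purely to illustrate the split step, here is a hypothetical VBA version of that parse for a single entry (the delimiters are guessed from the example above):

Code:
Public Sub ParseEntry(ByVal raw As String)
    Dim parts() As String, p As Variant, pair() As String
    ' Normalise the ad-hoc separators to a single comma
    raw = Replace(raw, " & ", ", ")
    raw = Replace(raw, " and ", ", ")
    parts = Split(raw, ",")
    For Each p In parts
        ' Each piece looks like "Object N-versionX"
        pair = Split(Trim$(p), "-version")
        Debug.Print "Object: " & Trim$(pair(0)) & "   Version: " & Val(pair(1))
    Next p
End Sub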

  6. #6
    CJ_London is online now VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,399
I don't know what R is, but pretty much anything can be done in Access, either using queries or VBA or a combination of both. The usual approach would be to link to the file, or import it to a temporary table, then use append and update queries as required to move the data to your destination tables.
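A minimal sketch of that pattern, assuming a staging table and two saved action queries (all names are placeholders):

Code:
' Pull the csv into a staging table, then push it onward
DoCmd.TransferText acImportDelim, , "tblStaging", "C:\Imports\daily.csv", True
DoCmd.SetWarnings False
DoCmd.OpenQuery "qryUpdateExisting"   ' update records that already exist
DoCmd.OpenQuery "qryAppendNew"        ' append records not yet present
DoCmd.SetWarnings True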

  7. #7
    turbofish is offline Novice
    Windows 10 Access 2016
    Join Date
    Nov 2017
    Posts
    25
Using R [a statistical language] to clean data is kind of like using a chainsaw to hammer in a nail. Any chance that you have access to a SQL Server instance with SSIS? This task screams SSIS, since ETL is exactly what it was built for.

  8. #8
    TerraEarth is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Mar 2018
    Posts
    86
Quote Originally Posted by turbofish
Using R [a statistical language] to clean data is kind of like using a chainsaw to hammer in a nail. Any chance that you have access to a SQL Server instance with SSIS? This task screams SSIS, since ETL is exactly what it was built for.
    I don't believe I do. This is what I'm currently doing:

1) I start out with two .csv files, a financial and an administrative import from an outside source. Both require cleaning; the administrative import is extremely messy.
2) I feed the two .csv files into R, where I clean and split the financial data into two tables, FiscalYears and Financial. Likewise, the administrative data is split into two tables: Document and POCInfo.
3) I feed the cleaned tables into Access.
4) I created a master table for each, so four in total: one for FiscalYears, one for Financial, one for Document and another for POCInfo.
5) I use action queries to update each table in sequence, depending on the table's relational position in the database.

I would use other tools to accomplish what I'm doing here; the only problem is that the only language I've ever worked with before is R.
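As an aside on step 5, here is a hedged VBA sketch of running those action queries in dependency order (the query names are assumptions):

Code:
Dim q As Variant
' Parents first, children after, so foreign keys always have a target
For Each q In Array("qryUpdFiscalYears", "qryUpdFinancial", _
                    "qryUpdDocument", "qryUpdPOCInfo")
    CurrentDb.Execute CStr(q), dbFailOnError
Next q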

While I'm on the subject, would anyone care to tell me whether it's useful at all to run the update query against a master table from a linked table, versus just importing the data as a table and then deleting it after the query has used it?

  9. #9
    CJ_London is online now VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,399
Either method is OK. Importing to a temporary table and then deleting it will cause bloat (solved by regular compacting), but the usual practice would be to import to a temporary db. Bear in mind that all dbs should be split anyway as a precaution against corruption, unless they are just being used as a 'quick and dirty' one-off. The benefit of importing is that you can add indexing, which will improve performance for larger files.
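One hedged way to stage an import in a throwaway side database from VBA (the path, file and table names are all assumptions):

Code:
Dim tmpPath As String
tmpPath = Environ$("TEMP") & "\staging.accdb"
If Dir$(tmpPath) <> vbNullString Then Kill tmpPath
' Create the side database, then let Jet SQL read the csv straight into it
DBEngine.CreateDatabase tmpPath, dbLangGeneral
CurrentDb.Execute _
    "SELECT * INTO tblStaging IN '" & tmpPath & "' " & _
    "FROM [Text;FMT=Delimited;HDR=Yes;DATABASE=C:\Imports].[daily#csv]", _
    dbFailOnError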

  10. #10
    TerraEarth is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Mar 2018
    Posts
    86
Quote Originally Posted by CJ_London
Either method is OK. Importing to a temporary table and then deleting it will cause bloat (solved by regular compacting), but the usual practice would be to import to a temporary db. Bear in mind that all dbs should be split anyway as a precaution against corruption, unless they are just being used as a 'quick and dirty' one-off. The benefit of importing is that you can add indexing, which will improve performance for larger files.
Oh, that's something to think about. I'm pretty much brand new to all this: I've played around with Access, watched videos, and read a book on VBA, but I'm still fairly green when it comes to standard practices. I'm really just making a lot of this up as I go along. If you or anyone else could suggest some texts or topics to research, I'd be glad to hear it.

Another question related to this: is splitting right now necessary to save work later, or would the transition be fairly simple? That is, if I built all the code and features up in the database and then split it down the road, would that pose any issues compared to splitting from the start?

  11. #11
    CJ_London is online now VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,399
Whilst in development you can leave it unsplit until deployment. However, make sure you take plenty of backups. My preference is to split once the basic table design and relationships have been determined, so typically before much code has been developed.

  12. #12
    turbofish is offline Novice
    Windows 10 Access 2016
    Join Date
    Nov 2017
    Posts
    25
More questions:
- How are you receiving the data? Email, FTP, a shared directory?
- How consistent is the data? You mentioned csv, Excel, tab-delimited. Different formats for different data?
- Are you looking for automation?
- How big is the database at this point?
- How big do you expect it to get?
- Is this for reporting? If so, are the reports sent to others?

A note on indexes: indexes will slow imports down, not speed them up. When I'm doing a fairly large import [360+ million rows], I will create a new database without indexes, copy the data over and then recreate the indexes. You have to remember that when inserting new data you aren't just writing the new rows; you are also writing to every index on the table.
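The drop-load-recreate pattern looks roughly like this in Access DDL (the table and index names are made up for illustration):

Code:
' Drop the index, run the big append, then rebuild the index
CurrentDb.Execute "DROP INDEX idxDocNum ON tblMaster", dbFailOnError
' ... large append runs here ...
CurrentDb.Execute "CREATE INDEX idxDocNum ON tblMaster (DocNumber)", dbFailOnError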

ETL is one of those things that I get to work with quite a bit in my job. Sometimes I even do it just for nerd giggles. Right now my home machine is churning away, inserting over a trillion rows from thousands of CSV files [Gaia spacecraft data]. Before you get too deep into it, be sure to use the proper tool for the job.


