Page 1 of 2 12 LastLast
Results 1 to 15 of 21
  1. #1
    Tom Carp is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Sep 2020
    Posts
    35

    How does Access determine field formatting when all tables are linked Excel files?

    Originally posted in Excel, but think it fits here instead.

    I have Access combining data records from a bunch of Excel files into a full table, adding fields containing data from a couple lookup tables. That query is linked to in a couple Excel files where the table data is used to for analysis and reporting, mainly using Excel functions like Index/Match and Sumproduct.

    Some of the field formatting in the final query doesn't seem to match whats in the Excel transaction files. Specifically, some of the fields containing numeric data are shown as numbers stored as text. I'm assuming Access tables use the field formatting in the linked Excel files, so somewhere between the tables and the analysis and reporting Excel files, some field formatting is changing.

    All the Access tables are linked Excel files, and include the transaction data tables, a few tables containing lookup data, and a table that has some constants used to control the date range of the final query.

    There are five queries, four of which are Unions. I think there are limits on the number of Select statements that can be in a single Union, so there are three that collectively include all the tables, and one that is the union of those three.

    The final query simple selects every record from the last Union, and adds field data from the lookups.

    There's no field formatting or any data changes in Access. The only field use, other than just inclusion, is a date range selection and a single field sort in the final query.

    I have pretty high confidence that the linked Excel files are all formatted correctly, but haven't yet ruled that out completely. I did validate the files listed first in each of the three unions, assuming that queries might decide on field formatting based on the format of the first record they encounter.

    I've been looking at the first record encountered by each of the three Unions that collectively select all the linked Excel tables to see the actual field data. Again, I'm assuming Access queries might do field formatting based on the format of field data in the first record.



    Am I on the right path? Assuming there is no explicit field formatting in Access in any of the tables or queries, does Access decide field formatting for linked Excel tables based on the formatting of the fields in the Excel files? Do queries decide on field formatting based on the data in the first record encountered?

    I have encountered some differences in the order that the Unions are extracting the data, so have been sidetracked a little figuring things out. What I expected was that records from the query would be in the same order as the Select statements in each Union, but one of the Unions didn't sequence that way. For ease of maintenance, my practice had been to order the Union Select statements in the alphabetically by file name, but I noticed that wasn't the case in one of the Unions which appeared as if it sorted the Selects into file alphabetic order before executing the query. I'll do some more on this today, but it left some doubt about what data record the Union is actually first encountering.

    Tom

  2. #2
    isladogs's Avatar
    isladogs is offline MVP / VIP
    Windows 10 Access 2010 32bit
    Join Date
    Jan 2014
    Location
    Somerset, UK
    Posts
    5,954
    Hopefully I'm interpreting your post correctly ....

    When Excel files are imported to Access, it decides the datatype of each field and the field size based on the first few records. About 5 or 6 if I recall correctly.
    That can cause problems where a text field has number values in the initial few rows or where the field size is much larger in later records.

    For reasons like that, I always import to a 'temporary' buffer or staging table then do any necessary processing of that data before importing to the final tables.
    After final importing of the data, empty the buffer tables.

    Using that approach ALWAYS works
    Colin, Access MVP, Website, email
    The more I learn, the more I know I don't know. When I don't know, I keep quiet!
    If I don't know that I don't know, I don't know whether to answer

  3. #3
    CJ_London is offline VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,397
    In addition to Colin's comments if the first few rows of a numeric column are blank, it will default to text.

    Excel is often problematic in that the cell types are variants (whereas databases need typed fields), not to mention users doing things like add a ? or comment to a date or number value, changing names, swapping columns round etc. much better to treat everything as text, then when you import, you know where you are starting from. Either save the excel file as a .csv, or set hasheaders=false which means the header row gets imported as well which forces it to be text. The header row can then be deleted or ignored depending on what you are doing - field names will be F1, F2, F3 etc which need to be 'translated' but users shouldn't be seeing that

  4. #4
    Tom Carp is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Sep 2020
    Posts
    35
    Thanks both of you for the posts. It's helpful to know to look at more than just the first records in the tables.

    I've been using this setup for a long time, so am confident all the transaction Excel files have the same fields in the same order. There are half dozen fields that contain dollar data, and only a couple are ending up as numbers stored as text. I think I can avoid putting in protections, like temporary files, or doing any formatting in Access since there should be no non-numeric data in the records.

    There are blanks, but they occur in every record in at least one of the dollar fields (these are financial transactions and have either a debit or credit, and those are separate fields in the tables. One or the other is always blank).

    Sounds like at least one of the linked Excel files actually has some text on a couple of the dollar fields in the first few records.

    I'll post back testing results.

    I'll also post the question of how the Unions sequence the data in another thread.

  5. #5
    CJ_London is offline VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,397
    don't confuse formatting with data types. Just because it looks like a duck, doesn't make it a duck.

    Sounds like at least one of the linked Excel files actually has some text on a couple of the dollar fields in the first few records.
    the dollar sign, if used, can often cause access to think it is text - plus the fields might contain trailing spaces or other hidden characters.

    I think I can avoid putting in protections, like temporary files, or doing any formatting in Access since there should be no non-numeric data in the records.
    maybe you can, maybe you can't, but why take the risk? (you do say 'should') a couple of hours spent now will potentially save hours and hours down the line.

    But it is your app, good luck

  6. #6
    Tom Carp is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Sep 2020
    Posts
    35
    Thanks for hanging in there, and the caution of formatting vs. data types. I think I understand the differences but am probably using the wrong term. BTW, I don't display the dollar signs, so it's probably something else.

    Although originally intending to do everything in Access, I already had experience with using Excel so ended up in the current design to avoided the learning curve. I did try the design without any Access, but found Excel a bit cumbersome with the steps to bring all the transaction data into one table.

    Maybe it is time to put in some protections against these numeric stored as text situations.

    Would you or Colin expand a little on how to do that? Just a reminder, the current design has almost 70 linked Excel tables. Although willing, I'm reluctant to do much consolidation of those files partially because I want to limit the number of records in each. While adding transaction data, usually when statements for the accounts are generated, I do some editing to enter lookup keys in a couple fields (account, category) using Data Validation to select from lists. During the process, it's not unusual to sort the data which, with larger volumes, can take a bit of time.

    With a few exceptions, all of those linked Excel file have only one financial "account", and the files are named for those accounts. This makes it convenient to open the file with the account transactions I'm going to work on. The exceptions that contain multiple accounts have the suffix "_group" added to the file name, and are used for sets of accounts that have some similarity, but with low transaction volumes. One example is an "assets" group that has transactions for appliance, furniture, jewelry, etc. accounts that rarely require updates.

    Colin's approach seems to use a buffer, but could use some help understand how to do that in the current design.

    Tom

  7. #7
    CJ_London is offline VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,397
    I have a number of different ways - it really depends on how many fields and volume of data. But all rely on interrogating the excel file as text (the no header route) for anomalies before importing. Also has the benefit that the same code can be applied to text files. typically you would use transferspreadsheet to link to the excel file or transfertext to text files, but there are other ways.

    If the final destination is an access table I can reference the tabledef to ascertain the field properties. Typically these will be number type (integer/long/etc), and string length. If access is just a staging post for another db such as sql server or there are other considerations such as number should be positive, dates must be later than a certain date, whatever, I would have a meta table that maps source data to final table.

    the file is interrogated and any values out of spec are reported and typically the import is aborted for the user to fix the file.

    Interrogation is generally automated because validation is pretty straight forward, just one query generated as a recordset in vba. For a file with a number column and a text column you might build a query that looks something like this

    SELECT *
    FROM tblXLtxt
    WHERE [F1]<>"numfield" AND (not isnumeric([F1]) OR len([F2])>50)


    in vba if the recordset count is 0 then there are no data errors so ok to continue (otherwise generate an error report for the user)

    to continue, you run an append query


    INSERT INTO destTable (numfieldname, txtfieldname)
    SELECT F1, F2 FROM tblXLtxt WHERE [F1]<>"numfield"

    In this way you do not have to fix anything once it is in the db, everything is checked before import - and you have a useful report that the user can use to navigate to the problem rows.

    Part of the process can also be a log to record what has been imported, what has been rejected.

    With regards sorting, I wouldn't bother. Access will store the data randomly in the db anyway. Often it will appear to be in order, but I would never want to rely on it.

  8. #8
    Tom Carp is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Sep 2020
    Posts
    35
    Thanks. I think that's what I'm doing (i.e. checking the Excel files for the anomalies). There are 19 fields, 9 are numeric, 8 of them are for dollar values (formatted as numeric with 2 decimal places) and one is a price with has 3 decimal places. All the rest are text. In total there are over 62,000 records spread across 68 files.

    Because of the number of Excel files (68), I'm testing groups of them by only including them in the Unions, and then looking at the results in the reporting Excel files. So far I've been able to eliminate all in the files in the "archive" and all the "inactive" Unions (24 files), and am currently checking a group of about half of the files in the "current" Union. Once I cut the list down some, I'll look in each of the Excel files as you describe. My suspicion is there maybe as few as one Excel file that needs to be corrected.

    Colin seemed to imply there was a buffer in Access that somehow corrected any of these possible situations. I may not be understanding yet.

  9. #9
    CJ_London is offline VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,397
    the buffer is where you identify issues, not sure how Colin does it, but I've explained how I do it. It doesn't fix them per se although you can do things like convert a 'text' number to a 'number' number. without it being reported as bad data. But I come from the school of thought that if something is wrong 'send it back'. If you fix it and fix it 'wrong' you've lost the link between data coming in and data going out. It's more work for you and whoever sent it in doesn't learn to do it right.

  10. #10
    Tom Carp is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Sep 2020
    Posts
    35
    Thanks Alex

    I may not be doing a good job of explaining. Not surprising since my comments are getting more focused as I do some testing.

    I know how to convert the numbers stored as text to actual numbers, but I don't bother with it when routinely working with the data because it doesn't appear to impact the analysis and reporting.

    So this post, and another about Union sequencing, are just because this shouldn't be happening, and, as you point out, should be corrected. This is a personal finance system so the "whoever sent it", along with all other roles, is me.

    Decided to open all of the 68 Excel transaction files, but didn't find anything obvious that would cause Access to format only 3 of 9 numeric fields in a way where the data would be stored as text numbers. I'm assuming I must have missed something, and would rather not open them all again, so am looking for a next step that would narrow the number of possible files that need to be checked again.

    I learned here that the first few records (not just the first record) are what Access uses to decide on data formatting, so is it true that the sequence of records in a Union are the same as the actual sequence of records it encountered? In other words, will the formatting decision be from the first few records in the Union that is later used in a Query. Note, that the Union does not have an ORDER BY clause or any other controls or selection criteria.

    [I do have another post about how Access sequences Union tables. My initial testing showed inconsistencies.]

    Please hang in there. Once we find the cause, I'll make sure to post a summary.

  11. #11
    isladogs's Avatar
    isladogs is offline MVP / VIP
    Windows 10 Access 2010 32bit
    Join Date
    Jan 2014
    Location
    Somerset, UK
    Posts
    5,954
    To expand on my original suggestion, for many years I used to import and process a large amount of student/staff & school data from an external database.
    As this was needed to be done when the network was quiet, the process was automated so it ran without any human intervention using a scheduled task each night - therefore it had to work without fail.

    The process was as follows:
    1. The scheduled task exported 30 CSV files each night to a specified location then opened the Access app that did the rest of the work. CSV files were used to ensure ALL fields were text at this point
    2. The CSV files were linked to to the Access app and were then imported to 'temporary' Import tables (AKA buffer or staging tables).
    3. A lengthy routine then went through each table in turn and ran a series of append/update/delete SQL statements and/or recordset equivalents to update the main tables (which were in SQL Server)
    Non text fields were explicitly specified where necessary using e.g. CLng, CDate, CInt etc to ensure the data was in the correct format when imported.
    4. After all this was done, the temporary tables were emptied and the Access app (effectively a 'side-end app') was then closed.
    To prevent 'bloat' a new copy of the side end app was used each night
    5. During processing each stage was logged and on completion an email was sent automatically to the system admin to check for any issues the next morning.
    This ensured the the main Access FE (which was also linked to SQL Server) would work perfectly and with the latest imported data each morning when staff arrived.

    There was a lot more to it than that but that's the parts that may be relevant to you.
    There was never any reason to use UNION queries in this sequence.
    Where data was needed from several sources, each was appended/updated in turn e.g. the latest student attendance marks for each year group

    Hope all that makes sense. It took a while to develop but it was just about foolproof and avoided any of the issues you are describing

    EDIT: if you want to know what the various limits are in Access queries etc, see https://support.microsoft.com/en-us/...8-98c1025bb47c
    Colin, Access MVP, Website, email
    The more I learn, the more I know I don't know. When I don't know, I keep quiet!
    If I don't know that I don't know, I don't know whether to answer

  12. #12
    Tom Carp is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Sep 2020
    Posts
    35
    Thanks Colin,

    Good example. I understand the key parts.

    Turns out I've been able to isolate the linked Excel file that contains the numeric field(s) that Access formats as text when queried, and want to go back to the original question to fine-tune the understanding of when Access does this.

    Your response was:

    When Excel files are imported to Access, it decides the datatype of each field and the field size based on the first few records. About 5 or 6 if I recall correctly.
    That can cause problems where a text field has number values in the initial few rows or where the field size is much larger in later records.
    This scenario wasn't numbers in text fields; it was fields defined as numeric in the linked Excel table, ending up as text (i.e. numbers stored as text), but the question's the same.

    To simplify, I was able to narrow it down to two linked Excel tables, and focus on two of the numeric fields. Only one of the tables formatted only one of the fields as text.

    Looking at the first six records, there is no text (including blanks). Cells either have a number, or are empty.

    These are the values in the first six records in each table:





    Other than that actual number values, both start out with empty cells in the first record. The only difference appears to be that in one there's at least one record with a number in the first few records. The first example formatted the Credit column as text.

    Is it accurate to say that Access will format a numeric field as text if there isn't a number in one of first few records?

    It was easy to correct the situation by just putting a zero in the first cell of the Credit column in the first table. Looking at your approach to the school data, it may make sense in my case to force some formatting in Access. Given the number of input tables I have, I'd rather not go back to re-save them as text (csv).

    I'll do some homework on how I might force Access to format fields during a query.

    Thanks again.

  13. #13
    Tom Carp is offline Advanced Beginner
    Windows 10 Access 2016
    Join Date
    Sep 2020
    Posts
    35
    Doesn't appear that the pasted screen shots are showing up.

    The difference between the first and second tables were that one has a number in each of the columns in the first six records. The other did not. The latter formatted the field as text.
    Last edited by Tom Carp; 09-05-2020 at 08:19 AM. Reason: typo

  14. #14
    CJ_London is offline VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,397
    Cells either have a number, or are empty.
    not quite clear what this is describing. if they are all empty in the first few rows, the column is treated as text
    Is it accurate to say that Access will format a numeric field as text if there isn't a number in one of first few records?
    a good enough description of the situation

  15. #15
    isladogs's Avatar
    isladogs is offline MVP / VIP
    Windows 10 Access 2010 32bit
    Join Date
    Jan 2014
    Location
    Somerset, UK
    Posts
    5,954
    I agree with Ajax.
    There is a good reason why the fields default as text if there is no data. Text fields will accept numbers/dates etc but the converse isn't true.

    Although you are trying not to 'reinvent the wheel', in the end I think it will be less work for you to do so.
    As it is you are spending a lot of time investigating a situation rather than managing it with a better approach.

    Good luck whatever you decide to do.
    Colin, Access MVP, Website, email
    The more I learn, the more I know I don't know. When I don't know, I keep quiet!
    If I don't know that I don't know, I don't know whether to answer

Page 1 of 2 12 LastLast
Please reply to this thread with any new information or opinions.

Similar Threads

  1. Replies: 6
    Last Post: 04-18-2018, 06:56 AM
  2. Replies: 11
    Last Post: 09-07-2017, 09:20 AM
  3. Replies: 3
    Last Post: 04-07-2016, 04:36 AM
  4. Excel files on sharepoint linked to access
    By Bertrand82 in forum Access
    Replies: 3
    Last Post: 01-31-2013, 11:57 PM
  5. Replies: 10
    Last Post: 12-28-2012, 02:06 PM

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums