Results 1 to 4 of 4
  1. #1
    EvanRosenlieb is offline Advanced Beginner
    Windows 7 64bit Access 2010 64bit
    Join Date
    Jun 2011
    Location
    Fort Collins, CO
    Posts
    37

    What is the most normal way to store datasets repeatedly measured over time?

    Hello all,



    I am currently working on project that is dealing with census data. We received a table from our research associates, which has data from three census years. It is currently organized in an awkward fashion with the following columns:

    FIPS, a numeric interval field, which is a unique identifier of county

    The "Item" field, a text field that describes the data that row contains, for instance "number of Households 1990", "Number of Households 2000", "Population 1990", "Median Family income 1980" et cetera.

    The "Data" field, which is a numeric field that contains a number that relates to whatever category the item field describes.

    Obviously, at the very least, this table needs to be decomposed so that there is a table for each category, i.e. a table for Population by county, a table for median family income per county. This solved many problems, not the least of which is that the "Data" column no longer contains numbers that represent different units.

    However, I am little more confused about how the year by year data should be saved. It seems that there are multiple ways of organizing the data that do not explicitly violate normality.

    1.) There could be one "Population by county by year" table, that would contain the columns "County Code", "Year", and "Population". The key would be the combination of county and year, with one additional informational column which is "population". This would be the easiest to do initially.

    2.) There could be another "Population by county by year" table, but instead would have columns, "County Code", "Pop 1980", "Pop 1990", and "Pop 2000". I don't think that this violates normality, as far as I can figure, but it still makes a little bit nervous because if we wanted to add 2010 census data later then we would have to add to data to fields that already exist instead of adding new fields.

    3.) There could be three Population tables: "Population by county 1980", "Population by county 1990", "Population by county 2000", each one simply having the columns "County Code" and "Population". My general instincts tell me that this may be the most proper way, as I want to be able to decompose when there isn't an explicit reason not too. However, functionally, I can't really think of how this would actually operate differently than option one.

    What do you more-experienced-than-me people think? Am I correct that none of these violate normality? If none of them do, is there a reason why one of them would be the most proper/robust/flexible?

    Thank you for your help ahead of time!

  2. #2
    pbaldy's Avatar
    pbaldy is offline Who is John Galt?
    Windows XP Access 2007
    Join Date
    Feb 2010
    Location
    Nevada, USA
    Posts
    22,652
    Deep subject, but I'll dive in and start. 2 & 3 are definitely not normalized:

    http://www.r937.com/Relational.html

    The discussion around Figure 7 is relevant, particularly to 2. 3 is basically the same but with tables for each year instead of fields. What are you going to do when you want to compare across years (population growth type of thing)? With the data in one table, you can isolate a particular year or compare years easily. The fact that a new batch of data would require design changes to the database is a tip-off that it's not normalized, and both 2 & 3 would require them.
    Paul (wino moderator)
    MS Access MVP 2007-2019
    www.BaldyWeb.com

  3. #3
    EvanRosenlieb is offline Advanced Beginner
    Windows 7 64bit Access 2010 64bit
    Join Date
    Jun 2011
    Location
    Fort Collins, CO
    Posts
    37
    Pbaldy,

    Thanks for the help, once again. The explanation that the DB design would have to change in order to enter new data makes a lot of sense.

    Let me ask this question then: would it be even better not to split up the different "items" into different tables? Obviously the fact that the item field is a text field that combines multiple data (the year and the category) violates 1NF. However, would it be better design to have one table that that had the columns: "County Code", "Category Code" (which would link to another table that contained the categories and the units that they are in), "Year", and "Data" that would give the number that corresponds the superkey of county, year, and category? To me, this seems to follow a similar principle that the different years should not be split up into different tables of columns.

    Once again, thank you, you are a lifesaver

    --Evan

  4. #4
    pbaldy's Avatar
    pbaldy is offline Who is John Galt?
    Windows XP Access 2007
    Join Date
    Feb 2010
    Location
    Nevada, USA
    Posts
    22,652
    Yes, I agree. The original table wasn't that bad except for burying the year in the category text. A categories table would also be a good idea.
    Paul (wino moderator)
    MS Access MVP 2007-2019
    www.BaldyWeb.com

Please reply to this thread with any new information or opinions.

Similar Threads

  1. Replies: 4
    Last Post: 08-31-2012, 12:55 PM
  2. How do I store a period of time
    By OscarCat in forum Access
    Replies: 2
    Last Post: 06-10-2012, 05:08 PM
  3. Replies: 1
    Last Post: 04-25-2012, 12:36 PM
  4. 0 normal form, 1st normal form....
    By dict in forum Access
    Replies: 2
    Last Post: 03-21-2011, 04:04 PM
  5. capturing the address repeatedly
    By simba in forum Forms
    Replies: 12
    Last Post: 03-24-2010, 11:05 AM

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums