What is the most normal way to store datasets repeatedly measured over time?

**EvanRosenlieb** · 12-21-2012, 02:04 PM

Hello all,

I am currently working on a project that deals with census data. We received a table from our research associates that has data from three census years. It is currently organized in an awkward fashion, with the following columns:

The "FIPS" field, a numeric interval field that is the unique identifier of the county.

The "Item" field, a text field that describes the data that row contains, for instance "number of Households 1990", "Number of Households 2000", "Population 1990", "Median Family income 1980" et cetera.

The "Data" field, which is a numeric field that contains a number that relates to whatever category the item field describes.

Obviously, at the very least, this table needs to be decomposed so that there is a table for each category, i.e. a table for population by county, a table for median family income by county, and so on. This solves many problems, not the least of which is that the "Data" column no longer contains numbers that represent different units.

However, I am a little more confused about how the year-by-year data should be stored. It seems that there are multiple ways of organizing the data that do not explicitly violate normality.

1.) There could be one "Population by county by year" table, that would contain the columns "County Code", "Year", and "Population". The key would be the combination of county and year, with one additional informational column which is "population". This would be the easiest to do initially.

2.) There could be another "Population by county by year" table, but it would instead have the columns "County Code", "Pop 1980", "Pop 1990", and "Pop 2000". I don't think that this violates normality, as far as I can figure, but it still makes me a little bit nervous, because if we wanted to add 2010 census data later we would have to add new fields instead of adding data to fields that already exist.

3.) There could be three Population tables: "Population by county 1980", "Population by county 1990", and "Population by county 2000", each one simply having the columns "County Code" and "Population". My general instincts tell me that this may be the most proper way, as I want to decompose whenever there isn't an explicit reason not to. However, functionally, I can't really think of how this would operate any differently from option one.

What do you more-experienced-than-me people think? Am I correct that none of these violate normality? If none of them do, is there a reason why one of them would be the most proper/robust/flexible?

Thank you for your help ahead of time!

**pbaldy** · 12-21-2012, 03:03 PM

Deep subject, but I'll dive in and start. 2 & 3 are definitely not normalized:

http://www.r937.com/Relational.html

The discussion around Figure 7 is relevant, particularly to 2. 3 is basically the same but with tables for each year instead of fields. What are you going to do when you want to compare across years (population growth type of thing)? With the data in one table, you can isolate a particular year or compare years easily. The fact that a new batch of data would require design changes to the database is a tip-off that it's not normalized, and both 2 & 3 would require them.
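To make the cross-year comparison point concrete, here is a minimal sketch of option 1, assuming SQLite through Python's `sqlite3` module. The table name, column names, county codes, and every population figure are made up for illustration; none of them come from the actual census table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: one row per county per year, keyed on (county_code, year).
conn.execute("""
    CREATE TABLE population (
        county_code INTEGER NOT NULL,
        year        INTEGER NOT NULL,
        population  INTEGER NOT NULL,
        PRIMARY KEY (county_code, year)
    )
""")

# Made-up figures, for illustration only.
rows = [
    (1001, 1980, 32000),
    (1001, 1990, 34000),
    (1001, 2000, 43000),
    (1003, 1980, 78000),
    (1003, 1990, 98000),
    (1003, 2000, 140000),
]
conn.executemany("INSERT INTO population VALUES (?, ?, ?)", rows)


def growth(conn, start_year, end_year):
    """Return {county_code: population change} between two census years."""
    cur = conn.execute(
        """
        SELECT a.county_code, b.population - a.population
        FROM population AS a
        JOIN population AS b ON b.county_code = a.county_code
        WHERE a.year = ? AND b.year = ?
        ORDER BY a.county_code
        """,
        (start_year, end_year),
    )
    return dict(cur.fetchall())


growth_1990_2000 = growth(conn, 1990, 2000)

# A new census is just more rows: no ALTER TABLE and no new table.
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?)",
    [(1001, 2010, 54000), (1003, 2010, 182000)],
)
years = [r[0] for r in conn.execute(
    "SELECT DISTINCT year FROM population ORDER BY year"
)]
```

The same `growth` query works for any pair of years because the year is data rather than part of a column or table name. With option 2 or 3, the query text itself would have to change for each new census.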

**EvanRosenlieb** · 12-21-2012, 03:12 PM

Pbaldy,

Thanks for the help, once again. The explanation that the DB design would have to change in order to enter new data makes a lot of sense.

Let me ask this question then: would it be even better not to split up the different "items" into different tables? Obviously the fact that the item field is a text field that combines multiple data (the year and the category) violates 1NF. However, would it be better design to have one table with the columns "County Code", "Category Code" (which would link to another table containing the categories and the units that they are in), "Year", and "Data", where "Data" gives the number that corresponds to the superkey of county, year, and category? To me, this seems to follow the same principle as not splitting the different years into different tables or columns.

Once again, thank you. You are a lifesaver.

--Evan

**pbaldy** · 12-21-2012, 03:27 PM

Yes, I agree. The original table wasn't that bad except for burying the year in the category text. A categories table would also be a good idea.
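A rough sketch of that single-table design, again assuming SQLite through Python's `sqlite3` module. The table names (`category`, `observation`), the column names, the `split_item` helper, and the sample rows are all hypothetical. The regular expression assumes every "Item" label ends in a four-digit year, as the examples quoted earlier in the thread do; labels that don't follow that pattern would need separate handling.

```python
import re
import sqlite3

# Assumption: each Item label is "<category text> <four-digit year>".
ITEM_PATTERN = re.compile(r"^(?P<category>.+?)\s+(?P<year>\d{4})$")


def split_item(item):
    """Split e.g. 'Population 1990' into ('population', 1990)."""
    match = ITEM_PATTERN.match(item.strip())
    if match is None:
        raise ValueError(f"no trailing year in item: {item!r}")
    # Lower-case so inconsistently capitalized labels collapse to one category.
    return match.group("category").lower(), int(match.group("year"))


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE,
        unit        TEXT  -- entered by hand; not derivable from the Item text
    );
    CREATE TABLE observation (
        county_code INTEGER NOT NULL,
        category_id INTEGER NOT NULL REFERENCES category(category_id),
        year        INTEGER NOT NULL,
        value       REAL NOT NULL,
        PRIMARY KEY (county_code, category_id, year)
    );
""")

# Made-up rows in the original (FIPS, Item, Data) layout.
raw_rows = [
    (1001, "Population 1990", 34000),
    (1001, "Population 2000", 43000),
    (1001, "number of Households 1990", 12000),
    (1001, "Number of Households 2000", 16000),
    (1001, "Median Family income 1980", 17000),
]

for fips, item, data in raw_rows:
    name, year = split_item(item)
    # Create the category on first sight; later rows reuse it.
    conn.execute("INSERT OR IGNORE INTO category (name) VALUES (?)", (name,))
    (category_id,) = conn.execute(
        "SELECT category_id FROM category WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO observation VALUES (?, ?, ?, ?)",
        (fips, category_id, year, data),
    )

categories = [r[0] for r in conn.execute(
    "SELECT name FROM category ORDER BY name"
)]
households = conn.execute(
    """
    SELECT o.year, o.value
    FROM observation AS o
    JOIN category AS c ON c.category_id = o.category_id
    WHERE o.county_code = ? AND c.name = ?
    ORDER BY o.year
    """,
    (1001, "number of households"),
).fetchall()
```

Keeping the unit on the `category` row means the "Data" numbers can share one column without losing track of what they measure, and a new category or a new census year is only new rows.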
