Hello all,
I am currently working on project that is dealing with census data. We received a table from our research associates, which has data from three census years. It is currently organized in an awkward fashion with the following columns:
FIPS, a numeric interval field, which is a unique identifier of county
The "Item" field, a text field that describes the data that row contains, for instance "number of Households 1990", "Number of Households 2000", "Population 1990", "Median Family income 1980" et cetera.
The "Data" field, which is a numeric field that contains a number that relates to whatever category the item field describes.
Obviously, at the very least, this table needs to be decomposed so that there is a table for each category, i.e. a table for Population by county, a table for median family income per county. This solved many problems, not the least of which is that the "Data" column no longer contains numbers that represent different units.
However, I am little more confused about how the year by year data should be saved. It seems that there are multiple ways of organizing the data that do not explicitly violate normality.
1.) There could be one "Population by county by year" table, that would contain the columns "County Code", "Year", and "Population". The key would be the combination of county and year, with one additional informational column which is "population". This would be the easiest to do initially.
2.) There could be another "Population by county by year" table, but instead would have columns, "County Code", "Pop 1980", "Pop 1990", and "Pop 2000". I don't think that this violates normality, as far as I can figure, but it still makes a little bit nervous because if we wanted to add 2010 census data later then we would have to add to data to fields that already exist instead of adding new fields.
3.) There could be three Population tables: "Population by county 1980", "Population by county 1990", "Population by county 2000", each one simply having the columns "County Code" and "Population". My general instincts tell me that this may be the most proper way, as I want to be able to decompose when there isn't an explicit reason not too. However, functionally, I can't really think of how this would actually operate differently than option one.
What do you more-experienced-than-me people think? Am I correct that none of these violate normality? If none of them do, is there a reason why one of them would be the most proper/robust/flexible?
Thank you for your help ahead of time!