Fish Biologist

**Nick F** · 10-05-2011, 03:21 PM

Hello,

Newbie here stumbling my way through building a database to store our data. We have a lot of stream data that we have collected. I have set up table to enter general data for our area streams. I am building a form to populate that table. Each stream has a unique identifier known as its "Kittle Number" which is an alpha numeric code. I have set this as my primary key and have it as a relationship to my fish sampling table. On the form, I would like to place a command button that will automatically open the fish sampling form and fill in the Kittle Number from the active record.

Thanks for the help,
Nick

**boblarson** · 10-05-2011, 03:29 PM

I know someone on another forum who has done something similar in the past and I think it would be great to get them here to help. Anyone else can toss in a suggestion while I go PM them and see if they'll come assist.

**CraigDolphin** · 10-05-2011, 04:02 PM

Hi Nick,

I guess I must be the person Bob meant since I got his PM

I'm a biologist working in WA, USA. I have some experience with fish sampling databases, however I am unfamiliar with 'kittle' numbers. Provided its a unique identifier for streams then we can work with it. However, previous experience has taught me to be highly sceptical of 'unique stream identification codes'. I have found that the 'uniqueness' almost always end up breaking down when the river/stream changes course during winter floods etc. Or because there was insufficient control over allocation of new codes and codes ended up being accidentally issued twice for a stream, or some such. Or because some unforeseen combination of circumstance ends up breaking the way the codes are generated for some streams. My feeling is that it's almost always a better idea to list those kinds of codes as attribute information, while using an autonumber primary key to act as the 'true' unique identifier used within the database table structure itself.

I guess I have a few questions. Most of which would be best answered by showing me your table structure and table relationships. If you can attach a zipped copy of the database that would be best.

The first question is do you only ever sample one location in each stream, or is it possible that there will be more than one sampling location in a given stream? If so, do you need to report on your data at the specific site level, or only at the stream level of detail?

Do you only sample a site on one occassion, or will you be re-sampling each site over time? Will you always resample the same site in a stream over time, or might you choose to sample a different location on the stream for some reason?

What kind of data are you collecting specifically? i.e., fish counts, fish sizes, fish diversity, fish capture cpue, .... do you collect data at the individual fish level, at the level of groups of fish individuals (e.g., summary data for each species), or both?

Getting the table design and relationships right is the absolutely most important part of designing your database.

Once we ensure your table design is correct, then that will help us figure out which form you will be placing the 'command button' on, and whether that is, in fact, the correct approach to achieve what you want. Off hand, I suspect that a form/subform master/child key field arrangement might be a better approach. Or perhaps a cascading combo box. Or perhaps storing the sampling location identifier that is linked to the kittle number through the table structure will end up being the best solution, but this all depends on the specifics of your sampling program.

**Nick F** · 10-06-2011, 09:03 AM

Thanks for the quick reply. I need to do a little cleaning up and additions/editing from things that you said and that I have read. I will post a zipped copy as soon as I can.

Thanks again for the help
Nick

**Nick F** · 10-06-2011, 02:31 PM

Hello,

I am having problems attaching file as it is still 4mb after zipping. It is becoming bigger by the hour it seems as our biologists and myself remember things to add. I will keep trying but will try to answer your questions now.

To answer your questions.

The first question is do you only ever sample one location in each stream, or is it possible that there will be more than one sampling location in a given stream? If so, do you need to report on your data at the specific site level, or only at the stream level of detail?
We typically sample more than one station in a stream. Though we do have index stations on some streams that are the only sampling area on that stream. Most of our reports are written at a stream level. Though there are times that we are looking at sit specific detail. For example, fish populations at a site located above a culvert.

Do you only sample a site on one occassion, or will you be re-sampling each site over time? Will you always resample the same site in a stream over time, or might you choose to sample a different location on the stream for some reason?
There is a high probability that we would sample a site again in the future. Again there are times where we add new sites that may or may not be visited again in the future. We might add another site if we don't feel the past survey was comprehensive enough. Or we might sample a site at the request of another agency.

What kind of data are you collecting specifically? i.e., fish counts, fish sizes, fish diversity, fish capture cpue, .... do you collect data at the individual fish level, at the level of groups of fish individuals (e.g., summary data for each species), or both?
All kinds. Individual data on gamefish species or species of interest. Though if there is a lot, we do subsample. We do counts on non-game species. We typically calculate a CPUE for reports. Lengths. Weights. Scale samples. Observe clips. Give clips.

Kittle Number.
This should absolutely be a unique identifier as it is the accepted way to refer to a stream between agencies. Example. S01-06-01. Read from end to beginning. This would be tributary 01 on tributary 06. Tributary 06 is a tributary to stream 01. Stream 01 ends in S which in this case is Lake Superior. They go through a detailed process to be assigned a Kittle Number since it is the common stream language between agencies.

If you open the relationship. I start at the stream level. Each sample is connected to the stream via a station # which in this case is River Mile to the nearest 0.1. We realize and have experienced streams changing. Therefore we are going to decide on a year which all river miles will be calculated. Future surveys will have to decide location based on this map. Shouldn't be a problem since most surveys are tied to a permanent landscape feature (road, trail,etc.)

From this, we have a multitude of data entry options. Two basic categories. Fish and Temperature. I have made a table for each fish sampling technique. Off to the right I have a bunch of data tables that can be used with each sampling technique but don't know relate them to each sampling technique. The temperature date refers to our temp loggers. We place loggers at a station for three years and try to do at least one fish survey during that time period. This data will be imported from excel.

If you have any other questions. Shoot them my way. If I think of further useful info, I will post.

Thanks
Nick

**Nick F** · 10-07-2011, 12:22 PM

Hello,
I still have my previous question but you are correct in that I was getting ahead of myself. As I examined my database closer, I realized I had a lot of work to do on the tables and relationships. The database is too big to upload even zipped so I am posting a screenshot of the relationship table so far. I am a bit stuck at this point on how to relate the datasheets to each sampling technique. For example, the individual fish data sheet can be used with each of nine sampling techniques. It is important to know what specific sampling technique (which includes date and specfic site) was used to catch each individual fish. Also a bit nervous about the temperature sampling. The data will be imported from excel. The excel file has three fields (Date, Time, Temp.) so when I bring them into Access, I want a unique ID to be assigned to each record and each record to be assigned the correct logger ID automatically. Any advice/guidance would be much appreciated.

Cheers
Nick

**CraigDolphin** · 10-10-2011, 05:06 PM

Sorry. Got caught up at work and haven't had a chance to reply until now.

See the attached table relationship diagram and check it out. I'm heading home for the day right now but I'll be back tomorrow and try to explain the concepts a bit more. This is just a quick example of how you should be trying to organize your data. I haven't looked at your temp data but that would likely be linked to the SamplingEvents table if they occur at the same time, or to the SamplingSites table if they exist as a continuous record throughout the year, say.

**CraigDolphin** · 10-11-2011, 11:28 AM

Ok, here's another example for you to look at. It's a variant on the first with the addition of a table to contain logged continuous temperature data linked to a sampling site.

With either situation, you are the best arbiter as to which is most appropriate for your program. And it could well be that neither is 100%appropriate. But in my experience, something pretty close to this is likely going to be what you need.

Streams, Reaches, and Sampling Sites

First up: the 'Streams' table was named based on your statement that you will be sampling streams. But I imagine that potentially you could be sampling lakes, rivers, etc. That's why there's a difference in my first example and the second in how the Streams, Reaches, and SamplingSites tables are related to each other.

In the first example, the structure assumes that you will always and only be sampling in Streams. Streams always have at least one 'reach', and sites can always be linked to a particular reach.

The advantage to this is that you only need to store the reach for each site, and you will know which stream the site is in by querying which stream the reach is associated with whenever you need the information.

The disadvantage is that you are constrained by the table structure if any of the assumptions prove to be false.

The table relationships in the second example assumes that each sampling site occurs in a stream or a reach or both. I have set the StreamID field in the SamplingSites table to be required, but not the ReachID. This means that all sites must be related to a Stream, but that a Reach is optional. Thus, if it turns out that some of your 'streams' turn out to be lakes or ponds where a 'reach' is not an applicable concept, you can leave this out.

Now, if you are absolutely 100% certain you will never sample a lake, pond, river, etc, then the naming of the 'Streams' table is fine. But if there's any chance that this could happen, then it would be better to rename the 'Streams' table as something like 'Waterbodies' or somesuch, with a WaterbodyID instead of StreamID etc.

The reasons for this are to prevent confusion for anyone else who may end up working with your database.

Temperature

The temperature data I've linked to a SamplingSiteID from the SamplingSite table. This is because I am assuming that the temperature data is collected over a longer period of time than the fish sampling events themselves, but it always collected at the site where you are collecting the fish data too. Usually temperature loggers are deployed for weeks/months.

If it turns out that you are only collecting the temperature data while you are actively fishing, then it would be more appropriate to link the data to the SamplingEvent table.

Or, if you are collecting temperature data at an entirely different location than you sample for fish, then it may be more appropriate to link the data to the Streams or Reaches table, depending on whichever is most appropriate.

Kittle Numbers

Ideally you would imagine that defining a stream identifier by reference to it's order as a tributary would be infallible. But experience has taught me to be sceptical. Since I know nothing of your location or work environment, let me give you a couple of examples that could confound your system.

Consider a tributary of a tributary of a main river. The smaller trib feeds into the larger trib about 50 yards upstream from its own confluence with the main river. During winter, a rock slide from the surrounding hills blocks the smaller tribs' path to the larger trib, and as a consequence, the smaller trib undergoes an avulsion and finds a new path that happens to end up in it discharging directly into the main river: bypassing the larger trib altogether.

Now, which kittle number should the small trib be known by? The order of the trib has changed!

I should note that this has actually happened more than once in our local watershed.

Also, how does your kittle number deal with distributary channels of rivers/streams?

My advice, store the kittle number, make use of it....but never ever depend on it to maintain the integrity of your data structure. That's why I always use an autonumber field as the primary key of all my tables.

SamplingEvents, Methods, Gear, Fishes and Samples

In both examples this bit is the same.

Each sampling event can use a different gear type, and a different sampling method. For example, you could use a 10-ft seine net (gear type) to conduct a 'beach seine' (method). Or you could use a 20-ft seine net to conduct the same sampling method.

In your example, you showed lots and lots of tables: one for each method. Notice how all the table list essentially the same types of information over and over again? That's often a big clue that you aren't organising your data correctly (normalizing your tables).

In my examples, each different method is just a row in the SamplingMethods table. The specifics of the date of the sample is stored in the related row of the SamplingEvent table. In cases where you are passively sampling using a scretrap/incline plane trap/fyke net etc, the duration of the event is how you determine the effort portion of your sampling event CPUE calculations.

The Fishes table stores a row for each individual and/or group of fishes that you collect data for. By definition, a count of 1 indicates the record pertains to an individual fish, while counts higher than 1 indicates the record represents a group of fish. Some of the attribute fields I've included in the fishes table are really intended to be applicable to individuals, not group. A more correct but slightly more abstract method would be to have another couple of tables to store attribute information for the fishes.

In that situation, you would have a table Metric names (e.g., Individual_FishLength_mm, Gender, GroupMaxLength_mm, GroupMinLength_mm, etc etc)

Metrics
MetricID (auto, pk)
MetricName (text)

And the the actual values and applicable metrics for each record in the Fishes table would be strored in a table named FishMetrics
FishMetricID (auto, pk)
FishID
MetricID
MetricValue

Depending on the diversity of metrics that you gather about each fish, though, I've found that denormalizing the data by keeping the metrics as additional fields in the Fishes table, instead of the tables described above, can be preferable.

On the other hand, I have never regretted keeping biological samples (DNA, otolith, CWT, etc) in a separate table. This is one situation where I find screwups happen frequently enough to have you pulling your hair out if you insist on providing a field for each type of sample that you think will be taken. Invariably you end up with duplicated samples, or samples not taken that should be, or sample vials re-used by accident, or any number of other problems. The structure I've shown allows you to check for duplicate sample identifiers across all types of samples very easily, and to notify the user about the problem during data entry. The alternative approach makes it necessary to make changes to field forms and to your code each time a new sample type field is added to the table. My approach just means you add the new type of sample as a new row in the Sample types table and that's all you need to do.

Note, I haven't shown the SampleTypes table in either example but it would simply be:
SampleTypes
SampleTypeID (auto,pk)
SampleType (text)

And the pk from that table would link to the same-named fk field in the FishSamples table.

Other Thoughts

Once you have a normalized table structure, you will be in good shape to create a functional and easy to maintain database.

Tables Generally

If you're a heavy/power excel user will you need to change the way you think about data. Tables should generally be narrow and tall (few fields, many rows) instead of short and wide (many fields and few records). Read up on the topic of data normalization and get to grips with it (dry reading, but critical for success).

Queries in Access

Queries are amazing for combining information from several tables whenever you need it arranged in a more familiar flat-file/spreadsheet arrangement. To make queries work the way you want: learn about the difference between innner joins and outer joins when constructing a query in Access. Outer joins (also called left or right joins) permit you to retrieve all records from one table, and to show matching records from the other table if they exist, or nulls if they don't. Inner joins only show you the records from both tables where there is a match. Thus an inner join would not show a sampling event with no fish caught, while an outer join would show all events, whether a fish was caught or not.

This is probably the most frustrating thing for excel users to experience. And by default, all joins in access are created as innner joins. You need to double click on the join line while viewing the query in design view to change the join type.

Forms

There are two major things to learn about forms that were not intuitive for me.

First, forms can have subforms as 'windows' on them (but only when they are in single form view). A form is bound to a table or a query. A subform is usually bound to a different but related table/query. A subform is linked to a form by a field that the two bound tables have in common. Thus, if the main form is bound to the SamplingEvents table, and the subform is bound to the Fishes table, the form and subform are linked by the SamplingEventID field. The beauty of this is that whenever you input a new fish into the subform, the database will automatically populate the SamplingEventID field value to be the same as the current SamplingEventID in the main form. And whenever you open the main form to a particular event, the subform will automatically show only those records from the fishes table that have the correct SamplingEventID.

(I just noticed I mis-typed the SamplingEventID field in the Fishes table, that field name should be the same as in the SamplingEvents table)

The other thing to learn is that there are triggers called events built-in to many form objects. Many of these events occur in a predictable order. You can make use of these triggers to perform tasks in code when a user does something on the form.

For example, if you wish for something to happen when a user selects a particular value from a drop-down/combo control, then you could use the after_update event of that control to run some vba code or a macro that you've created to handle that situation.

Or, you could use the on_click event of a command button to run code to allow the user to import your water temperature data from an excel spreadsheet file and then automatically add it to the TemperatureLog table using a vba-generated append query along with the current SamplingSiteID.

VBA Coding

This is, essentially, programming using the VBA coding language. It's not that hard to pick up and making judicious use of it can take the pain out the user-experience with your database.

One thing to learn is that the code is stroed in a database obkject called a module. Some modules are linked to specific forms, and some are not linked to forms at all (public modules). The difference is that code functions/subs in public forms are available to be run/called from anywhere else in the database, while code in form-linked modules can only be called from within that same module.

Hopefully this information is helpful.

**Nick F** · 10-12-2011, 07:13 AM

Hi Craig,

Thanks for the reply!! A huge help. I now should be able to solve my relationship questions. Also thanks for the added information on forms, queries, and VBA. Before I start building the forms, I am going to do the reading/research you suggested. I am going to mark this thread as solved though I am sure I will have questions in the future. I hope you won't mind me firing a question in the future? Thanks again.

Cheers
Nick

**CraigDolphin** · 10-12-2011, 11:52 AM

Glad I could help. I know I found Access to be a very foreign and non-intuitive program when I first started using it after years of Excel. There's a steep learning curve, but there's an amazing ability to handle vast amounts of data that you just can't achieve in Excel and it's well worth the effort to learn. And of course, if you have any other specific questions, feel free to ask.

Fish Biologist

Thread Tools

Fish Biologist

Fish Database

My Relationship Table

Posting Permissions