Struggling with analyzing some data and I'm looking for assistance from those that may have traveled this road before. My problem is always taking a complex idea and conveying what I desire clearly to those from whom I seek assistance. So let me do my best.
I have two sets of dependent variables I wish to analyze. I'll first tell you what they are and then draw analogies to hopefully clarify:
Variable One: A collection of 63 medicinal drugs
Variable Two: A collection of 243 different cell lines
Each drug is tested against each of the 243 cell lines to determine how well it works. The resulting output is a number that ranges between 0.01 and 10,000, the lower this number is the better the drug works.
Here's the analogy:
Variable One: A collection of 63 keys.
Variable Two: A collection of 243 locks.
One key can open many locks and is a master key, but is hard to work with. The other can only open a few and works excellently on one or two locks. A low number means the key fits into the lock and turns so easily you cannot even sense any resistance - like butter. The other extreme is a key that cannot even be inserted into the lock. Butter gets a value of 0.01 and doesn't fit is 10,000.
With me so far?
Now, what I want to do is rank how selective each key is for the collection of locks to find the one key that opens the fewest locks (is most specific). I will do so using a series of calculations.
First I want to exclude anything that doesn't fit (10,000). Then I want to take the highest value below that number to represent a key that fits but doesn't turn. My model then will rank where the keys fall between two extremes:
1. I fit into all the locks and open most off them, but for every one the key has to be jiggled and bumped to work. This is the low specificity set and scores close to (but always under) 10,000.
2. I only fit into a few locks and one of them I turn like butter. Very specific key. Fit score close to 0.01.
To accomplish this I have a multi-component formula that goes like this:
A. For each key, find the lock that is the poorest fit. It's value is a cut-off for my fit metric - as mentioned it is typically around 9,000 (really difficult to open).
B. Count the total number of locks that score below this fit metric.
C. Establish gating criteria for keys that work with increasing ease. For this metric I arbitrarily define cut-off values that are 10% or 1% of overall fit. In other words, if a key that fits but doesn't turn has a score of 9,459, then I'm going to create two criteria that have cut-off values of 945.9 and 94.59. A key that scores 945.9 isn't as selective and is more difficult to use on my favorite lock than one that scores 94.59
D. Count the number of locks that fall under these two gating criteria.
E. Finally divide the number of locks that fall within each criteria by the total number of locks that the key fits into. A score of 1.0 means the key fits all locks and can open every single one (a master key). A key that scores 0.01 would only fit into one or two locks and open only one.
If you have remained patiently with me, thank you. Now comes the question - I have my list of keys, I have my list of locks, and I have measured how well each works. I want to group my keys so that I can apply my metric to each key and rank them. I have already done this with a form, but that form can only show me the metric for one key at a time and it looks like this:
But now what I want is to have all the keys listed as a single object so I can run additional calculations. I've run into a brick wall as far as my knowledge is concerned and have no idea if I want to use a form, report or query to do this. I've tried a query and couldn't get beyond counting the max value. That query looks like this:
I know I'm not making friends with such a long post and deserve no compassion. But I would be so very very thankful if you could help me right this sinking ship.
Gracias!