Page 2 of 3
Results 16 to 30 of 34
  1. #16
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419


    Quote Originally Posted by John_G View Post
    Hi -

    Referring to your post #12 - ("more like this!")

    That makes life a whole lot simpler! A million rows sounds like a lot, but isn't really in the overall scheme of things. One crosstab query could generate your diagram as above for the whole dataset. (Not as pretty of course, and how long it would take is another issue entirely).

    But with one ID - Event per row, it's easy to come up with counts for various analyses and decisions.

    John

    Do you mean generating the diagram I had in a pivot table? I presume it will take very long if it is done in a query.
    Actually, it is not just one ID and one Event; one ID can have many events (two, three, or more).

  2. #17
    John_G is offline VIP
    Windows 7 32bit Access 2010 32bit
    Join Date
    Oct 2011
    Location
    Ottawa, ON (area)
    Posts
    2,615
    What data is in each row of the input (source) table?

    Is it ONE ID and ONLY ONE event per row, or is it ONE ID and MORE THAN ONE event? You have given different answers in different places.
    If you have MORE THAN ONE event in each row, then a lot of extra work (= processing time) may be required to reformat the data first.

    Which is it?

    You have described the process you would need reasonably well in the previous post, and it is more or less what I would do to select a sample, given the percentage required.

    It would help if you could post a screenshot of the table description, showing what the fields are, and we can go from there.

    John

  3. #18
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    What data is in each row of the input (source) table?
    Hi John,

    Nice to see your message again. We are only focusing on two columns:
    one is ID and the other is Event. The other columns are in the table and can be
    anything, but I don't think they will affect the calculation of the final result.


    Is it ONE ID and ONLY ONE event per row, or is it ONE ID and MORE THAN ONE event?


    We are only focusing on two columns, so it should be ONE ID and ONLY ONE event per row.

    and it is more or less what I would do to select a sample, given the percentage required.


    Yes, you are correct, but it also has to meet all the rules.

    Let me know if any other clarification would help! Thanks! :-)



  4. #19
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    then a lot of extra work (= processing time) may be required to reformat the data first.
    The Event is in a single column and the ID is in a single column,
    so that is one event per row.

    I believe that even with one event per row it would still take a lot of processing time, because you could have duplicated events
    in different rows for the same ID.

    For example, suppose you have a total of ten rows:
    in row 2 you have ID A with event 1a, in row 5 you have ID A with event 1a,
    and in row 10 you have ID A with event 1a.

    So out of 10 rows, ID A has the same event appearing three times.
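    The duplicate situation described above can be sketched like this (the row data is hypothetical, invented only to mirror the ten-row example):

```python
# Hypothetical ten-row table mirroring the example above: ID "A"
# carries event "1a" in rows 2, 5, and 10 (1-based positions).
rows = [
    ("B", "2a"), ("A", "1a"), ("B", "3a"), ("C", "1a"), ("A", "1a"),
    ("C", "2a"), ("D", "4a"), ("B", "1a"), ("D", "2a"), ("A", "1a"),
]

# Collapsing duplicate ID-event combinations shows how much repetition
# a sampler has to deal with before weighting the selection.
unique_pairs = set(rows)
print(len(rows))          # 10
print(len(unique_pairs))  # 8 -- ("A", "1a") occurred three times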

  5. #20
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    Actually, it is not just one ID and one Event; one ID can have many events (two, three, or more).
    Let me make it clear: what I am saying here is about different rows, not columns. One ID can have many events, including duplicates, in different rows. We are only focusing on two columns.

  6. #21
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    I will give you a screenshot of the table as an example. I'll do it later today. Thanks for your patience. :-)

  7. #22
    John_G is offline VIP
    Windows 7 32bit Access 2010 32bit
    Join Date
    Oct 2011
    Location
    Ottawa, ON (area)
    Posts
    2,615
    Hi -

    Glad we got that all straightened out. Access queries will do a lot of what you need, including duplicate removal if the query is set up properly, and if the tables are properly indexed they can be quite efficient time-wise.

    The difficulty you will encounter will be in meeting the criteria of All ID and All Events being represented. The ID is easy enough - either it is in the source table or it isn't, but ensuring that you get at least one of each event is probably not easy.
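    One way to tackle the "at least one of each event" requirement (a rough sketch of one possible strategy, not necessarily what John_G has in mind, shown in Python with invented data for brevity): seed the sample with one random row per event, then top it up to the requested quota.

```python
import math
import random

def sample_with_event_coverage(rows, pct, seed=0):
    """Sample ceil(pct% of rows), guaranteeing every event appears.

    rows is a list of (id, event) tuples; pct is a percentage such
    as 21.24. Assumes the quota is at least the number of distinct
    events; otherwise the sample overshoots the quota slightly.
    """
    rng = random.Random(seed)
    quota = math.ceil(len(rows) * pct / 100)

    # Pass 1: pick one random row index for each distinct event.
    by_event = {}
    for i, (_id, event) in enumerate(rows):
        by_event.setdefault(event, []).append(i)
    chosen = {rng.choice(indexes) for indexes in by_event.values()}

    # Pass 2: top up to the quota from the remaining rows at random.
    rest = [i for i in range(len(rows)) if i not in chosen]
    rng.shuffle(rest)
    chosen.update(rest[: max(0, quota - len(chosen))])
    return [rows[i] for i in sorted(chosen)]

# Invented demo data: 20 rows, 4 distinct events, with duplicates.
demo = [("A", "e1"), ("A", "e1"), ("B", "e2"), ("C", "e3"), ("A", "e4")] * 4
picked = sample_with_event_coverage(demo, 30.0)
print(len(picked))                     # 6 == ceil(20 * 30 / 100)
print(sorted({e for _, e in picked}))  # ['e1', 'e2', 'e3', 'e4']
```

    Pass 1 guarantees coverage; pass 2 restores randomness for the bulk of the selection. Covering every ID as well would need a similar first pass over IDs.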

    How many different IDs and different events do you have? You might be able to do something with that.

    John

  8. #23
    orange's Avatar
    orange is offline Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,870
    johnseito,

    See if this helps or "fits" your situation. I'm finding your posts deal with intangibles and/or theory -- nothing concrete.
    If you can show us in real terms what you're trying to do, then more focused responses are possible.

    The link I'm suggesting deals with Random Top N from a group.

    Hope the link is helpful.

  9. #24
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    Hi Orange,

    Thanks for answering.

    How do you know that this is not concrete? I provided examples. I will look at your link, but it covers Top N %; my problem is more than that, because the rules need to be met.

  10. #25
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    Orange

    I thought I had provided well-detailed and concrete examples. If you think somewhere I didn't, let me know and I will be as clear, detailed, and concrete as possible.

    I think your link on the randomized Top N is good, but the rule is to select a randomized Top N weighted for every ID and its associated events.

    The end goal is: if a user enters a percentage, in this case 21.24%, then that 21.24% of the entire pool from the table (rounded up to an exact number of rows) must include every ID and every event, weighted according to how many rows each ID and event has. All IDs and all events must be in the final pool when the export is given to the user.

    Again, thanks for the link; it is helpful and I am going to look into it further. I think it provides a partial solution, with more work needed, and there may be obstacles, but that is just my opinion.

    Thanks. :-)
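    The "exact number, rounding up" part of the requirement is just ceiling arithmetic; a small sketch using the 21.24% figure (the row totals are made up, and Decimal avoids floating-point surprises right at the boundary):

```python
import math
from decimal import Decimal

def export_size(total_rows, pct):
    """Rows to export for a given percentage, rounded up to a whole row."""
    return math.ceil(Decimal(total_rows) * Decimal(str(pct)) / 100)

print(export_size(419, 21.24))        # 89 (88.9956 rounded up)
print(export_size(1_000_000, 21.24))  # 212400 (exact, no rounding needed)
```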

  11. #26
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    I made this thread to see how difficult it would be to do this in Access, and how efficient (speed in minutes) it would be. I believe a query by itself can't do this.

    I haven't implemented this in Access at all, but I would like to. I think it will be difficult and there will be challenges as I build it, as I am not an expert in Access, although I wish I were. :-)

    Not sure what you guys think about how efficient and how difficult it would be to do this in Access. Is it even possible? My concern is that you may test with one example, say a pool of 500 rows from a table, and it works fine; but when the program is given 5,000 or 50,000, it may not work, even though it worked perfectly for the tested 500-row example.

  12. #27
    John_G is offline VIP
    Windows 7 32bit Access 2010 32bit
    Join Date
    Oct 2011
    Location
    Ottawa, ON (area)
    Posts
    2,615
    Hi -

    Some more thought on what you want to do. First, I think Access will do what you want to do. Queries are actually quite efficient if written properly. I did a quick test on a table I had of almost 600,000 records - to do a count of the number of occurrences of each value in a field took about 5 seconds, and that was over a network. So it's not bad.
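    The count described above is a plain GROUP BY aggregate; a stand-in sketch using SQLite in place of Access (the table and field names here are invented):

```python
import sqlite3

# In-memory SQLite stand-in for the Access table; the table and field
# names are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl_data (ID TEXT, Event TEXT)")
con.executemany(
    "INSERT INTO tbl_data VALUES (?, ?)",
    [("A", "e1"), ("A", "e1"), ("A", "e2"), ("B", "e1"), ("B", "e3")],
)

# Count the occurrences of each value in a field -- the same shape of
# query as the five-second count over 600,000 records.
counts = con.execute(
    "SELECT ID, COUNT(*) FROM tbl_data GROUP BY ID ORDER BY ID"
).fetchall()
print(counts)  # [('A', 3), ('B', 2)]
```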

    A question about your sampling requirement. In the data tables you want to sample, is every ID associated with every possible event at least once, or will some events be associated with only a few ID's? In terms of how to solve your problem, it's not a major issue, but the answer to that question will help clarify your criteria.

    Above, in post #19, you stated that you can't have duplicates in the sample, i.e. no duplicate ID - event combinations. Did I understand that correctly? If that is correct, it will make a difference in how to proceed.

    John

  13. #28
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    is every ID associated with every possible event at least once, or will some events be associated with only a few ID's?
    No, not every ID will have every possible event (referring to all unique IDs). For example, if there are 5 unique events, some IDs will have only 3 of those events,
    other IDs might have all the events, and some might have just 2.

    Above, in post #19, you stated that you can't have duplicates


    It looks like I said you could have duplicate events per ID, but in different rows of the table.
    You can also have the same event for one ID that other IDs have too.

    i.e. no duplicate ID - event combinations.


    It can have duplicate ID - event combinations. An ID can appear many times in the table in different rows,
    and the events for that ID can also appear many times (same or different events) in different rows.

  14. #29
    John_G is offline VIP
    Windows 7 32bit Access 2010 32bit
    Join Date
    Oct 2011
    Location
    Ottawa, ON (area)
    Posts
    2,615
    Hi -

    It's a quiet Sunday afternoon, so I did some experimenting with this.

    I made a simple table, with three fields:

    ID - Text Type
    Issue - Numeric Integer
    Sample_Data - Numeric Single
    Created Indexes on ID and Issue.

    For the test, I assumed 26 different IDs (A-Z) and 50 different events (1-50).
    Using Visual Basic (VBA) I populated this table with 1,000,000 rows, using the random number function to generate the ID, Issue and Sample_Data values.

    Then using queries and VBA, I created a sample data set in another table, using 32.257% as the selection percentage, and adhering to the requirements of all ID's, and all events as you outlined in post #15. But I added to that by taking 32.257% of all the records in each combination of ID + Issue. That automatically looks after the weighting requirement.

    The time required to create the 32.257% sample data set? About 1 minute, 15 seconds!

    My conclusion is that Access will meet your requirements, better than I expected it might.
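    The per-group approach described above, taking the sample percentage within each ID + Issue combination, can be sketched roughly as follows (a Python stand-in for the queries-plus-VBA version; the data is generated, not the actual test table):

```python
import math
import random
from collections import defaultdict

def stratified_sample(rows, pct, seed=0):
    """Take ceil(pct%) of the rows within every (ID, Issue) group.

    Sampling per group means every ID-Issue combination present in
    the source appears in the sample, and larger groups contribute
    proportionally more rows, so the weighting requirement falls out
    automatically.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[(row[0], row[1])].append(row)  # key on (ID, Issue)

    sample = []
    for members in groups.values():
        k = math.ceil(len(members) * pct / 100)
        sample.extend(rng.sample(members, k))
    return sample

# Generated miniature of the test: 26 IDs (A-Z), issues 1-50, plus a
# third data field, over 10,000 rows.
rows = [(chr(65 + i % 26), 1 + i % 50, i * 0.5) for i in range(10_000)]
sample = stratified_sample(rows, 32.257)

source_groups = {(r[0], r[1]) for r in rows}
sample_groups = {(r[0], r[1]) for r in sample}
print(sample_groups == source_groups)  # True: every combination survives
```

    Because every group rounds up, the overall sample comes out slightly above the requested percentage; that bias shrinks as the groups get larger.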

    HTH

    John

  15. #30
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    Cool, that is pretty good.

    However, could I see the program and test other percentages too? Thanks!!
