Page 1 of 2 12 LastLast
Results 1 to 15 of 25
  1. #1
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419

    HAVING id IN (SELECT TOP 10 PERCENT ... Question !!

    Hi I have a SQL query that I don't quite understand why is producing what is producing.



    Here is the example of the query statement:

    Code:
    SELECT tableEXAMPLE.ID, tableEXAMPLE.B, tableEXAMPLE.C, tableEXAMPLE.D, tableEXAMPLE.E
    FROM tableEXAMPLE
    GROUP BY tableEXAMPLE.ID, tableEXAMPLE.B, tableEXAMPLE.C, tableEXAMPLE.D, tableEXAMPLE.E
    HAVING  ((
                        (tableEXAMPLE.ID) IN (SELECT TOP 10 PERCENT ID FROM tableEXAMPLE as q
                        WHERE q.C = tableEXAMPLE.C and t.D = tableEXAMPLE.D ORDER BY id)));

    tableEXAMPLE has 1304 records, ID is a unique identifier for all 1304 records. So GROUP BY is
    not condensing any of the 1304 records. Field C and D can have duplicated records. After running
    the SQL statement I got a total of 195 records as it's result.



    My question is why and how the SQL statement on top is different than this one below,
    I get a result of 131 records which is 10% of 1304.

    How is the top one giving a very different result (195 records) than the bottom, I don't understand even
    after I did my own investigation. Could someone help explain this to me. Thanks, appreciate it !!! :-)

    Code:
    SELECT TOP 10 PERCENT tableEXAMPLE.ID, tableEXAMPLE.B, tableEXAMPLE.C, tableEXAMPLE.D, tableEXAMPLE.E
    FROM tableEXAMPLE
    GROUP BY tableEXAMPLE.ID, tableEXAMPLE.B, tableEXAMPLE.C, tableEXAMPLE.D, tableEXAMPLE.E

  2. #2
    June7's Avatar
    June7 is online now VIP
    Windows 7 64bit Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,931
    Why use GROUP BY if no aggregation?

    Not really understanding the nested SELECT. Baffled how it even works at all. What is the alias t table? Why using table aliases?
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  3. #3
    John_G is offline VIP
    Windows XP Access 2003
    Join Date
    Oct 2011
    Location
    Ottawa, ON (area)
    Posts
    2,615
    Hi -

    Your first example shouldn't work at all - it should give you an error; what is "t" in t.D?

    Like June7, I don't quite see the logic there, but the higher-than-expected number of resulting rows is probaly due to the fact that there are duplications in the data.

    John

  4. #4
    lfpm062010 is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Oct 2013
    Location
    US
    Posts
    415
    It like what you say. The field C and D could have duplicate records. So, when you use top 10 on match those criteria. You are selecting up to 10 % duplicate records for one of C and D. My guess those extra 64 records will be the top 10 matching C and D (for each unique ID).

  5. #5
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    Quote Originally Posted by June7 View Post
    Why use GROUP BY if no aggregation?
    Not mine, I am just trying to understand someone's logic in their code.

    I guess GROUP BY so they could then using having and nested SELECT statement.



    Not really understanding the nested SELECT. Baffled how it even works at all. What is the alias t table? Why using table aliases?
    I would say the t alias, one is calling one of the same tables, that is the SELECT TOP 10 PERCENT ID FROM, and one that is from the outer SELECT
    FROM.

    That is why there is the alias t. using the table twice.


    I actually made a minor mistake with the code.

    Code:
    WHERE q.C = tableEXAMPLE.C and t.D = tableEXAMPLE.D ORDER BY id)));
    
    should be q.D instead

    Code:
    WHERE q.C = tableEXAMPLE.C and q.D = tableEXAMPLE.D ORDER BY id)));

  6. #6
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    Quote Originally Posted by John_G View Post
    Hi -

    Your first example shouldn't work at all - it should give you an error; what is "t" in t.D?

    Like June7, I don't quite see the logic there, but the higher-than-expected number of resulting rows is probaly due to the fact that there are duplications in the data.

    John

    Hi John,

    Thanks for your response and help in looking at this.

    But I actually made a minor mistake with the code.

    Code:
    WHERE q.C = tableEXAMPLE.C and t.D = tableEXAMPLE.D ORDER BY id)));
    
    should be q.D instead

    Code:
    WHERE q.C = tableEXAMPLE.C and q.D = tableEXAMPLE.D ORDER BY id)));
    Thanks !!!

    I don't understand what that code is trying to achieve.

  7. #7
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    Quote Originally Posted by lfpm062010 View Post
    It like what you say. The field C and D could have duplicate records. So, when you use top 10 on match those criteria. You are selecting up to 10 % duplicate records for one of C and D. My guess those extra 64 records will be the top 10 matching C and D (for each unique ID).

    Ok, I put the information on excel and tried to replicate what the SQL statement is doing.

    1304 records - 10% 1304 = about 131.

    When I replicated it in EXCEL of the same logic behind the SQL statement, I got 138 records.
    I don't know what is the logic in the SQL statement to give a result of 195.

  8. #8
    June7's Avatar
    June7 is online now VIP
    Windows 7 64bit Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,931
    Can use nested subqueries without having to make GROUP BY query.

    If ID is truly unique in tableEXAMPLE, then I also do not understand the results of the nested query. If you want to provide db for analysis, follow instructions at bottom of my post.

    However, if the TOP 10 PERCENT gives what you want, why bother with the nesting?
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  9. #9
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419

    Having Id in SELECT

    Quote Originally Posted by June7 View Post
    Can use nested subqueries without having to make GROUP BY query.
    Ok, good to know.

    If ID is truly unique in tableEXAMPLE, then I also do not understand the results of the nested query. If you want to provide db for analysis, follow instructions at bottom of my post.
    exDATABASE.zip
    I attached an example. A is unique, B and C have duplicate. I am not sure how it gets 195 items and
    without the having ID in, just select the % (Query - 131) has 131 items). Not sure what is the difference.
    Hope we could figure it out. thanks !!! :-)


    However, if the TOP 10 PERCENT gives what you want, why bother with the nesting?
    I don't think is just 10%, is 10% with a condition.

    Thanks !!! :-)

  10. #10
    June7's Avatar
    June7 is online now VIP
    Windows 7 64bit Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,931
    The difference in results is due to the WHERE clause in the nested query - which is not in the Query-131 object. As already pointed out, this causes duplications of data. Remove the table alias and WHERE clause from the inner query and the results will be same as Query-131. Why bother with nested query anyway? What are you really trying to accomplish with it?

    Note the same result in Query-131 without GROUP BY.

    SELECT TOP 10 PERCENT * FROM exampleTABLE;
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  11. #11
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    What are you really trying to accomplish with it?


    I am not sure what is trying to accomplish too, that is what I am trying to find out, and hence this thread.
    I am puzzle why another SELECT statement is used in the HAVING CLAUSE, why it needs that
    and that our items are more (you are correct, our result is more because of the where clause)
    but why is that even necessary, maybe the SELECT is necessary to get what we want the 195 items and without it, just the 10%
    is just wrong.


    Do you know in the code
    exampleTABLE as t and when WHERE t.B = exampleTABLE.B and t.C = exampleTABLE.C order by id

    is the t table (exampleTABLE) within the SELECT statement matching to the first table from the select statement (exampleTABLE) ?
    not sure how that logic with the SQL statement is flow.

    So we have the first SELECT, it GROUP BY all the fields however because id is unique, the group by doesn't take effect.
    so we still have 1304 items.

    these items from this table are then matched to the same table (alias) in field B, and C.

    My questions is how is the logic flow:
    From the alias table exampleTABLE as t, top 10% of it is retrieved, and then it matched it to
    the first exampleTABLE of the 1304 records ?

    Let's just say that field B is for example the title of a book, and C are the count of that book and id is unique, and field A is unique as
    well as it is the identity of each of the books (1304 of them).

  12. #12
    June7's Avatar
    June7 is online now VIP
    Windows 7 64bit Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,931
    I certainly don't know what the developer's intent was. Several of us have attempted to interpret the results and identify how they are generated. Why the query was built, we can't know. Is it used in a report? If it isn't used anywhere, then perhaps the developer was just exploring and analyzing and left the query in place even after the research was completed.

    If you don't know the reason the query was built, ignore it or delete. Especially if it returns dataset you have no use for.
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  13. #13
    johnseito is offline Competent Performer
    Windows 7 64bit Access 2010 64bit
    Join Date
    Aug 2013
    Posts
    419
    If you don't know the reason the query was built, ignore it or delete. Especially if it returns dataset you have no use for.


    Hi June, I like to follow your suggestion and I think is a great advice. However, the thing is that when I can't figure out something, because I hope to learn from it,
    it really bugs me and then I just don't take SQL or the next problem as seriously and I am afraid I can become a slack off.

    actually this is not just the sole SQL statement for the project, is just one piece of it. I know what the general goal is,
    I am just looking at it step by step to see how and what the developer is doing to achieve the goal. This is the statement that I got stuck in.

    one thing that really puzzle me is how it get 195 when I think the most it should have gotten is 138.

    The reason I came up with 138 is because after I took out the WHERE clause below:

    WHERE t.B =exampleTABLE.B and t.C = exampleTABLE.C Order by id


    yes I got a total line item of 131. Then I looked at the table "exampleTABLE" (total of 1304 line items)
    and compare it to the 131 result, I only see 7 result that matches to the 131 result of field B and field C
    for the 131 (result) and 1304 (original). And we are using the WHERE CLAUSE where t.B = exampleTABLE.B and t.C = exampleTABLE.C.
    So if we are doing these matches, I think the result should only be 138 instead.

    Thanks and I appreciate you looking at it.
    If we can't find the what, how, and why, then I will just move on.
    Thanks for your advice. :-)






  14. #14
    June7's Avatar
    June7 is online now VIP
    Windows 7 64bit Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,931
    It's really a very weird query and I am not able to fully understand it. If the nested query is run independently it fails because it can't find exampleTABLE, it only has the alias [t] table. But embedded as nested query it will execute. I have no idea why the developer want to do a self-join, which is what the WHERE clause accomplishes.

    The WHERE clause is a filter criteria that wants all records where t.B=exampleTABLE.B AND t.C=exampleTABLE.B. This returns 195 records. If you filter on only one of the criteria the results are:

    t.B=exampleTABLE.B returns 195 records

    t.C=exampleTABLE.C returns 142 records

    The same 195 records of the first filter meet the combined filter.

    Look what happens with a self-join query on the B and C fields:

    SELECT exampleTABLE.ID, exampleTABLE_1.ID
    FROM exampleTABLE AS exampleTABLE_1 INNER JOIN exampleTABLE ON (exampleTABLE_1.C = exampleTABLE.C) AND (exampleTABLE_1.B = exampleTABLE.B);

    The result is 35612 records.

    The same result with WHERE clause instead of INNER JOIN.

    SELECT exampleTABLE.ID
    FROM exampleTABLE, exampleTABLE AS exampleTABLE_1
    WHERE (((exampleTABLE.B)=[exampleTABLE_1].[B]) AND ((exampleTABLE.C)=[exampleTABLE_1].[C])) ORDER BY exampleTable.ID;

    Now add TOP 10 PERCENT and GROUP BY clause and 131 records return.

    SELECT TOP 10 PERCENT exampleTABLE.ID
    FROM exampleTABLE, exampleTABLE AS exampleTABLE_1
    WHERE (((exampleTABLE.B)=[exampleTABLE_1].[B]) AND ((exampleTABLE.C)=[exampleTABLE_1].[C]))
    GROUP BY exampleTABLE.ID
    ORDER BY exampleTABLE.ID;

    So why the self-join? Why the nested? What did developer want to learn from this? No idea. Makes no sense to me.
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  15. #15
    hapm is offline Competent Performer
    Windows 7 64bit Access 2010 32bit
    Join Date
    May 2014
    Posts
    197
    The nested query will be evaluated for every record in the normal query. If the record with given B and C is not in the first 10% of all records with the given B and C it will be removed from the query. Only a first guess without executing the query.

Page 1 of 2 12 LastLast
Please reply to this thread with any new information or opinions.

Similar Threads

  1. percent
    By blackstatic42 in forum Access
    Replies: 4
    Last Post: 06-02-2013, 10:28 PM
  2. Question about a select query.
    By 03lances in forum Queries
    Replies: 6
    Last Post: 11-02-2012, 11:19 AM
  3. Replies: 4
    Last Post: 08-25-2012, 07:19 PM
  4. Simple Combo Box Multiple Select Question
    By ahamilton in forum Access
    Replies: 7
    Last Post: 03-17-2011, 01:38 PM
  5. Select query question
    By CoachBarker in forum Queries
    Replies: 2
    Last Post: 10-31-2010, 11:15 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums