Results 1 to 12 of 12
  1. #1
    Bruce Covey is offline Novice
    Windows 10 Access 2016
    Join Date
    Nov 2020
    Location
    Cartersville, Ga.
    Posts
    6

    Trouble w/ a DISTINCT SQL

    I'm doing some work with Windows IIS server logs to capture valid hits to particular video playback URLs. After importing the data, I want to eliminate hits by the same IP address within a short time window (double-clicks, and users just rapidly clicking links, but not watching ).

    I'll try to break this down to minimal pieces. I have the following table with 2 columns:



    Expr1 ( DATE & TIMESTAMP ( HH:M ) & IP ), Field 7 (URL)approx 385K records

    Using: SELECT DISTINCT Working2_IBMCV_Step3a.Expr1 FROM Working2_IBMCV_Step3a I get the 314K records I want ( but I don't have Field 7 URL data for the report )

    Trying: SELECT DISTINCT Working2_IBMCV_Step3a.Expr1, Working2_IBMCV_Step3a.Field7 FROM Working2_IBMCV_Step3a I get approx 325K records, BUT it breaks the DISTINCT Expr1 data.

    How can I get the query to apply the DISTINCT to ONLY Expr1 and present the matching Field 7 values ( the 314K records I want for the report ) ???

    Sub-query?? If so, syntax?? I tried a two table approach with a JOIN on Expr1, but ended up with essentially the same problem.

    Any assistance would be greatly appreciated.

  2. #2
    June7's Avatar
    June7 is online now VIP
    Windows 10 Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,816
    So are there possibly multiple URL for some Expr1? If there are, how should query know which one to pull? Both produce a valid DISTINCT record.
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  3. #3
    Bruce Covey is offline Novice
    Windows 10 Access 2016
    Join Date
    Nov 2020
    Location
    Cartersville, Ga.
    Posts
    6
    Quote Originally Posted by June7 View Post
    So are there possibly multiple URL for some Expr1? If there are, how should query know which one to pull? Both produce a valid DISTINCT record.
    Stats_minimal.zip

    I think I did the attachment correctly...

    Yes, multiple URLs, because Expr1 has intentional duplicate vales due to my TRIM of the Timestamp. I really only want the first occurrence of any duplicate Expr1 ( along with the associated URL ).
    I would like a more sophisticated method that only allows an IP to have one hit to a URL within any exact 10min window, but that seems to be a pretty hairy prospect considering my current SQL skills.

    In my database, look at pared down table "Working2_IBMCV_Step3a" and note records 7 thru 11. I want my Make Table query to only select the 1st, 3rd, and 5th records based on Expr1 data.

    I understand why the DISTINCT in my SQL doesn't work, but I don't know how to get what I'm after.

    I appreciate you looking at this.

  4. #4
    CJ_London is offline VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,397
    suggest explain simply what it is you are trying to achieve.

    You say you want 1st, 3rd and 5th record from that group - but on what basis? what is the rule that defines which one you want? 'short time window' is how long? Looks like you are saying anything in the same 10 minute period so something with times of

    06:59:32
    07:00:32

    would be included even tho' it is only a minute later

    but you would exclude this pair even tho they are 9 minutes apart.

    06:10:32
    06:19:32

    And 10 minutes seems a long time for a 'short period of time' when you are talking about double clicks.

    And having identified your 'pairs' - do you really want the first one? the last one? or it doesn't matter? Reason for asking is if it matters then the solution may be different

  5. #5
    Bruce Covey is offline Novice
    Windows 10 Access 2016
    Join Date
    Nov 2020
    Location
    Cartersville, Ga.
    Posts
    6
    You are corrected that simply trimming down the timestamp to HH:M is imperfect at best regardless of the query process. My predecessor had used this method to get rid of double-clicks, but also to eliminate people rapidly selecting multiple videos and never substantially playing them. Most of our shorter videos are in the 5-7min range, so we settled on a 10min window.

    My source IIS Log file is already imported "ordered by" IP, then DATA, then TIMESTAMP, so I have things in the order I want them to be in.

    What I'd really like to do is select the first occurrence of an IP address, ( insert into the new table ), then test if the next occurrence of the IP is on the same date, and if so, at least 10min later. I knew how to pull this off when I used to use Visual FoxPro and Xcode, but I never advanced to VB. I suspect that is what is needed here??

    Any thoughts would be greatly appreciated. Maybe even a suggestion of a VB programmer??

    Thanks

  6. #6
    Micron is offline Virtually Inert Person
    Windows 10 Access 2016
    Join Date
    Jun 2014
    Location
    Ontario, Canada
    Posts
    12,737
    First, I think you will need a primary key in your tables, otherwise Access will have issues treating anything as unique and in some cases, won't be able to run certain queries at all. One type it usually has such issues with is subqueries, which I think you're going to need. However I'm not really making sense of how to operate the db or exactly what I'm looking at. I have to drop out until maybe later tonight but in the meantime, I'm thinking you need to sort by date/time and use a subquery to get the next date/time that is <10 minutes where a url is the same - the 10 minutes being derived by DateDiff. Perhaps this will provide some insight as to what I'm saying

    http://www.allenbrowne.com/subquery-01.html
    The more we hear silence, the more we begin to think about our value in this universe.
    Paraphrase of Professor Brian Cox.

  7. #7
    CJ_London is offline VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,397
    if the next occurrence of the IP is on the same date, and if so,
    a 10 minute window is just the time+10 minutes. Otherwise you will miss one if the first time is just before midnight and the second one just after

    So what do you want to happen if you have the following times for a given IP

    07:30
    09:30
    09:39
    09:41
    10:22

    do you want to ignore 09:39 since it is within 10 minutes of 09:30 (I presume yes)
    but then do you want to ignore 09:41 because that is within 10 minutes of 09:39, or include it since it is over 10 minutes since 09:30

    you haven't said which time to return i.e. per above, ignore 09:39 or ignore 09:30 because there is a subsequent time withing 10 minutes

    this does not require VB, can all be done in sql, providing you are clear about what you actually want. Your current method is clumsy at best and certainly not the best way to solve the problem. You certainly make life difficult for yourself by not giving fields proper names, storing dates and times separately and as text.

    for example, this might work

    Code:
    SELECT * 
    FROM workingTable A
    WHERE (SELECT count(*) FROM workingTable T WHERE workingDate+workingTime BETWEEN DateAdd("n",-10,A.workingDate+A.workingTime) AND A.workingDate+A.workingTime AND IP=A.IP)=0
    This will only include records where there is not an earlier record within 10 minutes. So from the above example, that would be 07:30, 09:30 and 10:22

    I'll leave you to work out which field is which

    My source IIS Log file is already imported "ordered by" IP, then DATA, then TIMESTAMP, so I have things in the order I want them to be in.
    actually that is completely irrelevant. In databases you need to specify order, records are actually stored randomly.

    edit: see Micron has also responded with a similar suggestion

  8. #8
    Bruce Covey is offline Novice
    Windows 10 Access 2016
    Join Date
    Nov 2020
    Location
    Cartersville, Ga.
    Posts
    6
    I would want to return

    07:30
    09:30
    09:41
    10:22

    I get the points you and Micron have made regarding clumsy field names, text for all, and lack of primary key. I didn't set up the original database, but should have cleaned things up...

    Let me re-import the IIS text data, set proper field types & names, add a primary key and I'll re-attach a better sample to evaluate. I will do this early tomorrow morning.

    I had only applied the sort so I could inspect what URL a specific IP selected on a particular DATE and TIME to better understand what the user was doing. This just put things in sequence.

    I greatly appreciate the feedback.

  9. #9
    CJ_London is offline VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,397
    I would want to return

    07:30
    09:30
    09:41
    10:22
    that will be difficult because of the 09:39. You are getting into what is called recursion. Let me give you another example

    what would you want returned from these times?
    07:30
    09:30
    09:39
    09:41
    09:48
    10:22

    That would return the records as before for a first pass

    so you would need to do a second pass looking at the 10 minute interval from the first pass - but that will bring back both 09:41 and 09:48.

    The point is what are you actually trying to achieve? So far you have only told us how you are trying to achieve something, not what that something is. OK - to reduce the number of clicks - but to what end? what is the relevance? Is the reason you want to include 09:41 simply because that is what your current system does? (right or wrong)?

    It may be that grouping is the way to go. If you have a PK then you can simply choose the first (or minimum PK) for that particular group of IP's and date/time in 10 minute 'slots' and live with that definition - knowing it does not meet the requirement of 'within' 10 minutes.



  10. #10
    Bruce Covey is offline Novice
    Windows 10 Access 2016
    Join Date
    Nov 2020
    Location
    Cartersville, Ga.
    Posts
    6
    I need to generate a hit count report for each published video URL for the month. HOWEVER, I have specific reasons to only select an IP address ( with associated URL ) once within any 10min window on any date.
    I see where this causes SQL problems.

    Wouldn't it be easier to simply have VB code that steps through my 1st table ( ordered by IP, then DATE, TIMESTAMP ), tests each record for criteria, and if met, INSERTS that record into a new table? It seems to me that the specific 10min requirement may dictate this approach?

    Can SQL pull off this function by uniquely testing each DATE/TIMESTAMP and making sure it is at least 10min or greater, and then evaluating the next to see how it meets that criteria?

  11. #11
    CJ_London is offline VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,397
    you can do what you want just using a group by query and utilising the partition function on the time element

    you need to convert say

    06:59

    to a number that can be used by the partition function - I would suggest the number of minutes

    e,g,

    ?(hour(#06:59#)*60)+minute(#06:59#)
    419


    so

    Partition((hour(#06:59#)*60)+minute(#06:59#),0,144 0,10)

    (1440 is the number of minutes in a day)

    so you can then group your table by IP, date and the partition function and select first PK (once you have this in your table)

    use this query to link back to your table on PK to return the full record

    based on one of your tables (with a PK added)
    Code:
    SELECT Working2_IBMCV_Step3a.*
    FROM (SELECT First(Working2_IBMCV_Step3a.PK) AS FirstOfPK
    FROM Working2_IBMCV_Step3a
    GROUP BY Working2_IBMCV_Step3a.Field1, Working2_IBMCV_Step3a.Field10, Partition((Hour([field2])*60)+Minute([field2]),0,1440,10))  AS Fst INNER JOIN Working2_IBMCV_Step3a ON Fst.FirstOfPK = Working2_IBMCV_Step3a.PK

  12. #12
    Bruce Covey is offline Novice
    Windows 10 Access 2016
    Join Date
    Nov 2020
    Location
    Cartersville, Ga.
    Posts
    6
    Perfect! This seems to accomplish what I'm after.

Please reply to this thread with any new information or opinions.

Similar Threads

  1. Replies: 6
    Last Post: 05-05-2020, 11:21 PM
  2. Replies: 0
    Last Post: 02-11-2020, 08:12 PM
  3. Trouble Shooting Union and Distinct Count Queries
    By EcologyHelp in forum Queries
    Replies: 5
    Last Post: 04-21-2016, 09:45 AM
  4. distinct
    By frustratedwithaccess in forum Access
    Replies: 1
    Last Post: 12-04-2014, 03:47 PM
  5. Need help for distinct...
    By gunapriyan in forum Queries
    Replies: 2
    Last Post: 05-28-2010, 12:18 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums