This post is the result of assisting a poster with a requirement to select the Top N records per group. This was a learning exercise in which the speed was improved tremendously by using a Ranking Query.
The original solution was a traditional:
Code:
traditional Top 5
SELECT [running lines].horse, [running lines].rctrack, [running lines].RCDate, [running lines].rcRace, [running lines].[Date]
FROM [running lines]
WHERE ((([running lines].[date]) In
(
select top 5 XX.[Date] from
[running lines] as XX
WHERE XX.horse = [running lines].horse
Order By XX.[Date] desc ))
)
GROUP BY [running lines].horse, [running lines].rctrack, [running lines].RCDate, [running lines].rcRace, [running lines].[Date]
ORDER BY [running lines].horse, [running lines].[Date] DESC;
which worked reasonably well with 2000 records. But the poster had several thousand(30,000) and indicated the full query took 30+ minutes to process, and over 45 min for 60000+ records.
Another poster suggested to limit the records by Date and suggested this sql
Code:
with Date cut off
SELECT [running lines].rctrack, [running lines].RCDate, [running lines].rcRace, [running lines].horse, [running lines].Date, [running lines].Race
FROM [running lines]
WHERE [running lines].date In (SELECT TOP 5 XX.[Date]
FROM [running lines] as XX
WHERE XX.horse = [running lines].horse AND XX.[Date]
ORDER BY XX.[Date] DESC) AND [running lines].date)>=#6/1/2014#
GROUP BY [running lines].rctrack, [running lines].RCDate, [running lines].rcRace, [running lines].horse, [running lines].Date, [running lines].Race
ORDER BY [running lines].rctrack, [running lines].RCDate, [running lines].rcRace, [running lines].horse, [running lines].Date DESC;
I did some searching, trial and error which resulted in this query which was about 10 to 25 times quicker than the traditional Top N on the sample data.
Code:
with Ranking query
SELECT *
FROM (SELECT a1.rctrack, a1.RCDate, a1.rcRace, a1.horse, a1.Date, a1.Race, COUNT(*) AS CategoryRank
FROM [Running Lines Id] AS a1 INNER JOIN [Running Lines Id] AS a2
ON (a1.horse = a2.horse) AND (a1.date<= a2.date)
GROUP BY a1.rctrack, a1.rcdate, a1.rcrace, a1.horse, a1.date, a1.race
) AS RankingQuery
WHERE (((RankingQuery.[CategoryRank])<=5) )
ORDER BY RankingQuery.rctrack, RankingQuery.RCrace, RankingQuery.horse, RankingQuery.CategoryRank;
Here are some relative times:
Code:
LatestQueryByOP <------------------traditional Top 5
Start 31/12/2014 2:56:28 PM
End 31/12/2014 2:56:32 PM
4453 ~millisecs
RawbRevised '<----------------------------the revised Top 5 with Date cut off
Start 31/12/2014 5:48:52 PM
End 31/12/2014 5:48:54 PM
1828 ~millisecs
QuickerQueryJED <------------------the quicker query
Start 31/12/2014 2:56:32 PM
End 31/12/2014 2:56:32 PM
169 ~millisecs
Bottom line: Consider a ranking query approach when trying to speed up a Top N query.