Page 1 of 2 · Results 1 to 15 of 18
  1. #1
    keviny04 (Competent Performer) | Windows 7 64bit, Access 2007 | Joined: Apr 2015 | Posts: 128

    Query to find substring in table with half million rows = SLOW

    I have a table with a text field of 100 characters and about half a million rows (and counting). The text field contains people's names, and no field is empty. The field is also indexed. Whenever I run a SELECT query that searches for a substring in this field, it typically returns a few thousand rows and takes 1.5 to 2 seconds on my quad-core Windows 7 PC:

    Code:
    SELECT * FROM Table1 WHERE ContactName Like "*james*";
    Obviously I want to minimize this time. Any ideas? Or does the high row count make it very difficult if not impossible? (For tables with only a few thousand rows, this kind of query finishes almost instantaneously on my PC.)

    I tried using the InStr() function, but it actually took MORE time (by a fraction of a second to a full second) than the Like operator:

    Code:
    SELECT * FROM Table1 WHERE InStr(1, ContactName, "james") <> 0;
    I tried converting the text field to a memo field (also indexed), but there was virtually no difference in time.

    Then I upsized the table to SQL Server Express 2008, created an index on the text field there as well, and ran the same query both as a pass-through query (i.e., in SQL Server syntax) and in Access query syntax. To my surprise, the pass-through query on the upsized table was NOT faster than the Access query on the Access table. In fact, running an Access query against the upsized table was about A SECOND SLOWER than running it against the native Access table.

    Many years ago, I used SQL Server 6.5 with Access 2.0 (and later Access 97), and there was a MARKED increase in speed when running this kind of substring query on SQL Server compared to doing it in Access. But now, perhaps because later Access versions have greatly improved (I'm using 2007), SQL Server doesn't seem to offer a speed advantage for this type of query.

    If a table is small (a few thousand rows), Access finishes the query nearly instantaneously. When I run a query on an address table (2000+ rows) that finds a substring in all the address fields (street, city, state, postal code, country), and then UNION it with a similar query on another address table (8000+ rows), Access produces near-instant results. This was not what I experienced with the older Access versions.

    But for a table with half a million rows (and counting), is there any way to speed up such a query? Is it realistic to hope for near-instant results, just as with tables of only a few thousand rows? What concerns me is that my table is growing. Should I just use a faster PC, or a different database product? Are there database products that perform better on this particular type of query?
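    To illustrate why a leading-wildcard search cannot be helped by the index, here is a minimal sketch. It uses Python's sqlite3 as a stand-in for the Jet/ACE engine (an assumption: the engines differ, but the B-tree indexing principle is the same); the table and column names follow this thread.

```python
# Minimal sketch (NOT Access itself): sqlite3 stands in for Jet/ACE.
# A B-tree index orders entries left-to-right, so only a known prefix
# can narrow the search; a leading wildcard forces a full scan.
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE approximates Access's case-insensitive text comparisons.
conn.execute("CREATE TABLE Table1 (ContactName TEXT COLLATE NOCASE NOT NULL)")
conn.execute("CREATE INDEX idx_contact ON Table1 (ContactName)")
conn.executemany("INSERT INTO Table1 VALUES (?)",
                 [("James Smith",), ("Ann James",), ("Mary Jones",)])

def plan(sql):
    # First detail line of the query plan, e.g. "SCAN ..." or "SEARCH ...".
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

# Substring search (leading wildcard): every entry must be examined.
print(plan("SELECT ContactName FROM Table1 WHERE ContactName LIKE '%james%'"))
# Prefix search: the index narrows the range.
print(plan("SELECT ContactName FROM Table1 WHERE ContactName LIKE 'james%'"))
```

    In Access syntax the two patterns would be Like "*james*" and Like "james*"; the same scan-versus-seek distinction applies.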

  2. #2
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    I did not read your entire post, but you could try adding another column of type Integer and indexing it. Place a number in the new column that corresponds to the first letter of the name (for every record). Then use VBA to run two queries.
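    A rough sketch of this derived-column idea (sqlite3 as a stand-in for Access, and the column name FirstLetter is my assumption, not from the thread): the integer is computed from the first letter when the record is saved, and an indexed equality on it narrows the candidate rows before a second query checks the name itself.

```python
# Sketch of the suggested derived column (assumed name: FirstLetter).
# sqlite3 stands in for Access/Jet; in Access the column would be
# populated by VBA when records are inserted or updated.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ContactName TEXT, FirstLetter INTEGER)")
conn.execute("CREATE INDEX idx_first ON Table1 (FirstLetter)")
for name in ["James Smith", "Ann James", "Mary Jones"]:
    conn.execute("INSERT INTO Table1 VALUES (?, ?)",
                 (name, ord(name[0].upper()) - ord("A")))

# Query 1: an indexed equality lookup narrows the candidates.
rows = conn.execute("SELECT ContactName FROM Table1 WHERE FirstLetter = ?",
                    (ord("J") - ord("A"),)).fetchall()
print([r[0] for r in rows])  # only names whose first letter is "J"
```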

  3. #3
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    After further tests, I realized I was mistaken in my original post about the type of query I ran. The query takes 1.5 to 2 seconds only when AGGREGATE functions are used, e.g.:

    Code:
    SELECT * FROM Table1 GROUP BY ContactName HAVING ContactName Like "*james*";
    ItsMe, I tried your suggestion, but adding an integer field did not change the processing time of such a query.

    Without aggregate functions, however, I get near-instant results (with or without the integer field):

    Code:
    SELECT * FROM Table1 WHERE ContactName Like "*james*";
    So obviously I need to avoid aggregate functions when I query this type of table. I'm marking this thread as solved.
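    The difference makes sense if the engine filters rows with WHERE before grouping, but builds every group before applying HAVING. A small sketch of the equivalence (sqlite3 stand-in, not Access; names as in the thread):

```python
# Sketch: WHERE filters rows before grouping; HAVING filters the
# groups after they are all built. The results match, the work doesn't.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ContactName TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?)",
                 [("James Smith",), ("James Smith",),
                  ("Ann James",), ("Mary Jones",)])

# Filter first, then collapse duplicates.
fast = conn.execute(
    "SELECT ContactName FROM Table1 "
    "WHERE ContactName LIKE '%james%' GROUP BY ContactName").fetchall()

# Build every group first, then filter the groups.
slow = conn.execute(
    "SELECT ContactName FROM Table1 "
    "GROUP BY ContactName HAVING ContactName LIKE '%james%'").fetchall()

print(sorted(fast) == sorted(slow))
```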

  4. #4
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    If all you are using is GROUP BY, you should be able to use a WHERE clause.

    If an aggregate is required, use WHERE to retrieve a smaller dataset and then apply the aggregate in a subquery over that dataset.

  5. #5
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    Quote Originally Posted by ItsMe View Post
    If all you are using is GROUP BY, you should be able to use a WHERE clause.

    If an aggregate is required, use WHERE to retrieve a smaller dataset and then apply the aggregate in a subquery over that dataset.
    Do you mean like this?

    Code:
    SELECT * FROM Table1 WHERE ContactName Like "*james*" GROUP BY ContactName;
    That also did not change the processing time.

    I also tried multiple queries. A query named Query1 had:

    Code:
    SELECT * FROM Table1 WHERE ContactName Like "*james*";
    ...which, as I mentioned, ran instantaneously and returned only about 5000 records.

    But this query:

    Code:
    SELECT * FROM Query1 GROUP BY ContactName;
    ... took 1.5 to 2 seconds.

    Then I tried creating a new table, then running the aggregate query on this new table:

    Code:
    SELECT * INTO NewTable1 FROM Table1 WHERE ContactName Like "*james*";
    
    SELECT * FROM NewTable1 GROUP BY ContactName;
    And it finished instantly. For my project, however, creating new tables is not a desirable solution.

    So it does seem that if aggregate functions are used, whether directly or indirectly, on a half-million-row table, the delay always occurs.

    I'm going to upload my database later so you and others can see if I missed something.
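    For reference, the temp-table pattern above can be sketched outside Access too (sqlite3 as a stand-in; CREATE TEMP TABLE ... AS plays the role of Access's SELECT ... INTO):

```python
# Sketch of the temp-table pattern: materialize the filtered rows once,
# then aggregate over the much smaller result. sqlite3 stands in for
# Access, where this is SELECT * INTO NewTable1.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ContactName TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?)",
                 [("James Smith",), ("James Smith",), ("Ann James",)])

# Step 1: materialize the filtered subset.
conn.execute("CREATE TEMP TABLE NewTable1 AS "
             "SELECT ContactName FROM Table1 "
             "WHERE ContactName LIKE '%james%'")

# Step 2: aggregate over the small temporary table only.
names = conn.execute(
    "SELECT ContactName FROM NewTable1 GROUP BY ContactName").fetchall()
print(sorted(n[0] for n in names))
```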

  6. #6
    June7's Avatar
    June7 is online now VIP
    Windows 7 64bit Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,960
    1.5 to 2 seconds is unacceptable?

    Not much can be done about this. More data naturally means more time to process.

    Indexing might help. Do you have any indexes set?
    How to attach a file: http://www.accessforums.net/showthread.php?t=70301
    To provide a db: copy it, remove confidential data, run compact & repair, and zip with Windows compression.

  7. #7
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    Quote Originally Posted by June7 View Post
    1.5 to 2 seconds is unacceptable?
    Yes, due to the frequency at which this type of query has to be run.

    Quote Originally Posted by June7 View Post
    Not much can be done about this. More data naturally means more time to process.

    Indexing might help. Do you have any indexes set?
    Yes. In the queries mentioned in my previous posts, which were simple ones used only to demonstrate the delay, having an index did not make a big difference in processing time. But I have had more complex queries where an index made a huge difference (from 30 seconds to a minute all the way down to near-instant).

  8. #8
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    Hmmm, I do not think GROUP BY is evaluated before WHERE, so I did not expect that. Maybe the issue is with the Access Query Object as a subquery. As June mentioned, there may not be anything that can be done.

    I would be interested to know whether doing the entire thing in SQL with an inner join would be of benefit.


    SELECT a.ContactName
    FROM Table1 AS a
    INNER JOIN (
        SELECT ContactName
        FROM Table1
        WHERE ContactName Like "*James*"
    ) AS qryInnerJoin
    ON a.ContactName = qryInnerJoin.ContactName
    GROUP BY a.ContactName;

  9. #9
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    Quote Originally Posted by ItsMe View Post
    Hmmm, I do not think GROUP BY is evaluated before WHERE, so I did not expect that. Maybe the issue is with the Access Query Object as a subquery. As June mentioned, there may not be anything that can be done.

    I would be interested to know whether doing the entire thing in SQL with an inner join would be of benefit.


    SELECT a.ContactName
    FROM Table1 AS a
    INNER JOIN (
        SELECT ContactName
        FROM Table1
        WHERE ContactName Like "*James*"
    ) AS qryInnerJoin
    ON a.ContactName = qryInnerJoin.ContactName
    GROUP BY a.ContactName;
    I just tried this, but it also yielded a 1.5- to 2-second processing time. I'm attaching my database here, which includes all of the aforementioned queries: QueryTests.zip (6MB)

  10. #10
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    I have heard people complain about Access queries doing a table dump. I never understood it, because I always had success with the WHERE clause. It seems that adding the GROUP BY clause overrides the rest of the SQL, parentheses and all.

    I guess GROUP BY is evaluated first, kinda. At least it seems to bring the entire table into memory before applying the WHERE criteria. This is very interesting and might help explain phenomena I previously did not understand. Note to self: for large tables and the GROUP BY statement, use temp tables.

  11. #11
    Perceptus's Avatar
    Perceptus is offline Expert
    Windows 7 64bit Access 2007
    Join Date
    Nov 2012
    Location
    Knoxville, Tennessee
    Posts
    659
    Can you add a job to the SQL Server to crunch the data periodically into a table that Access could then reference? This could decrease the visible display time.

  12. #12
    orange's Avatar
    orange is online now Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,742
    I don't think indexes are used with the Like "*SearchTerm" pattern. I think you are doing a sequential read when your search term starts with a wildcard.
    An index would be used with Like "PartOfSearchTerm*" or a comparison operator (=, >, <, ...).

    What sort of speed would be considered "acceptable"? How many of these queries/searches are done (per minute/per hour/per day)?

    Can you tell us more about what is being searched and the context?

    This may not apply to your situation, but may be useful.

    We once had to provide "instant/rapid" search for specific records among about 120,000.
    This was for companies with English and French names (Operating, Legal, and aliases) entered by external users. The approach that worked for us was to break out each word/term (keyword) in these names and associate it with the assigned CompanyIdentifier. This gave us a search table with which we could find unique companies based on keyword(s). There were also indexes on phone numbers, addresses (mailing, physical, shipping, billing), and SIC codes. When terms were used in a search, the process identified the qualifying record(s), and because these were keywords, indexes were used in the search.
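    A minimal sketch of this keyword approach (sqlite3 as a stand-in, and the table and column names here are illustrative inventions, not the original schema): each word of a name becomes a row in an indexed search table, so a keyword lookup is an indexed equality rather than a wildcard scan.

```python
# Sketch of a keyword search table (all names here are illustrative).
# Each word of a company name is indexed separately, so looking up
# "james" is an index seek instead of a Like "*james*" scan.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (CompanyId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Keyword (Word TEXT COLLATE NOCASE, CompanyId INTEGER);
CREATE INDEX idx_word ON Keyword (Word);
""")
companies = [(1, "James Trading Ltd"), (2, "Jones et Fils"), (3, "Acme James Co")]
conn.executemany("INSERT INTO Company VALUES (?, ?)", companies)
for cid, name in companies:
    # Break each name into keywords tied to the company identifier.
    conn.executemany("INSERT INTO Keyword VALUES (?, ?)",
                     [(word, cid) for word in name.split()])

# Indexed equality lookup on the keyword, joined back to the company.
hits = conn.execute(
    "SELECT DISTINCT c.Name FROM Keyword k "
    "JOIN Company c ON c.CompanyId = k.CompanyId "
    "WHERE k.Word = 'james'").fetchall()
print(sorted(h[0] for h in hits))
```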

  13. #13
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    Quote Originally Posted by Perceptus View Post
    Can you add a job to the SQL Server to crunch the data periodically into a table that Access could then reference? This could decrease the visible display time.
    I'm thinking of a redesign of the table. Those who downloaded my database will notice that there are a lot of repeating names in it. I'm thinking of creating a NORMALIZED table with non-repeating names, thus eliminating the need for aggregate queries. The downside, of course, is that I would have to modify a lot of the queries and modules related to that table. This just shows how important database design is at the onset of a project. This past week I've advised several posters on the potential pitfalls of ill-designed tables, and here I am, falling victim to that myself. I believe that query is the ONLY aggregate query in the entire project!

    I have another project with tables that have 6-figure row counts, and there I resort to temporary tables to reduce the processing time of some of the more complex queries, which also involve aggregate functions.
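    The normalization idea can be sketched like this (sqlite3 as a stand-in for Access, and the side-table name ContactNames is my assumption): with one row per distinct name, the substring search needs no GROUP BY at all.

```python
# Sketch: a maintained table of distinct names removes the need for
# aggregation at query time. sqlite3 stands in for Access; the table
# name ContactNames is assumed, not from the original database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ContactName TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?)",
                 [("James Smith",), ("James Smith",),
                  ("Ann James",), ("Mary Jones",)])

# One row per distinct name; in practice this table would be kept in
# sync as records are added to Table1.
conn.execute("CREATE TABLE ContactNames (ContactName TEXT PRIMARY KEY)")
conn.execute("INSERT INTO ContactNames "
             "SELECT DISTINCT ContactName FROM Table1")

# The search itself no longer needs any GROUP BY.
names = conn.execute(
    "SELECT ContactName FROM ContactNames "
    "WHERE ContactName LIKE '%james%'").fetchall()
print(sorted(n[0] for n in names))
```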

  14. #14
    orange's Avatar
    orange is online now Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,742
    Perhaps you should tell us what makes the queries so complex.

    In your database, Table1 has 418,619 records, but only 149,700 of them are unique.
    Last edited by orange; 05-02-2015 at 05:00 AM.

  15. #15
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    orange, the keyword approach is interesting, so I'm going to put it in my back pocket in case a future project needs it, thanks. But for my current project, I'm going to stick with my initial "solution": avoiding aggregate queries by normalizing the table. Even without your keyword approach, queries for finding substrings already produce near-instant results, as I said, provided they don't use aggregate functions.

    As for complex queries, I haven't really had much trouble with their processing speed in general (the ones I mentioned earlier being among the few exceptions). Maybe the speed of modern CPUs and improvements in Access have something to do with it. The one query where I had to create temporary tables was written many years ago under Access 97 on ancient PCs. Maybe if I go back and rewrite it without temporary tables, it will run just fine.


