Page 1 of 2 · Results 1 to 15 of 18
  1. #1
    keviny04 (Competent Performer) | Windows 7 64bit, Access 2007 | Joined: Apr 2015 | Posts: 128

    Query to find substring in table with half million rows = SLOW

    I have a table with a text field of 100 characters and about half a million rows (and counting). The text field contains people's names, and no field is empty. The field is also indexed. Whenever I run a SELECT query that searches for a substring in this field, it typically returns a few thousand rows and takes 1.5 to 2 seconds on my quad-core Windows 7 PC:

    Code:
    SELECT * FROM Table1 WHERE ContactName Like "*james*";
    Obviously I want to minimize this time. Any ideas? Or does the high row count make it very difficult if not impossible? (For tables with only a few thousand rows, this kind of query finishes almost instantaneously on my PC.)

    I tried using the InStr() function, but it actually took MORE time (by a fraction of a second to a full second) than the Like operator:

    Code:
    SELECT * FROM Table1 WHERE InStr(1, ContactName, "james") <> 0;
    I tried converting the text field to a memo field (also indexed), but there was virtually no difference in time.

    Then I upsized the table to SQL Server Express 2008, created an index on the text field there as well, and ran the same query both as a pass-through query (i.e., in SQL Server syntax) and in Access query syntax. To my surprise, the pass-through query on the upsized table was NOT faster than the Access query on the Access table. In fact, running an Access query against the upsized table was about A SECOND SLOWER than running it against the native Access table.

    Many years ago, I used SQL Server 6.5 with Access 2.0 (and later Access 97), and there was a MARKED increase in speed when running this kind of substring query on SQL Server compared to doing it in Access. But now, perhaps because later Access versions have greatly improved (I'm using 2007), SQL Server doesn't seem to offer a speed advantage for this type of query.

    If a table is small (a few thousand rows), Access finishes the query nearly instantaneously. When I run a query on an address table (2000+ rows) that finds a substring in all the address fields (street, city, state, postal code, country), and then UNION it with a similar query on another address table (8000+ rows), Access produces near-instant results. This was not what I experienced with the older Access versions.

    But for a table with half a million rows (and counting), is there any way to speed up such a query? Is it realistic to hope for near-instant results, just as with tables of only a few thousand rows? What concerns me is that my table is growing. Should I just use a faster PC, or a different database product? Are there database products that perform better on this particular type of query?
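    To illustrate why a leading-wildcard search cannot be helped by the index, here is a minimal sketch. It uses Python's sqlite3 as a stand-in for the Jet/ACE engine (an assumption: the engines differ, but the B-tree indexing principle is the same); the table and column names follow this thread.

```python
# Minimal sketch (NOT Access itself): sqlite3 stands in for Jet/ACE.
# A B-tree index orders entries left-to-right, so only a known prefix
# can narrow the search; a leading wildcard forces a full scan.
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE approximates Access's case-insensitive text comparisons.
conn.execute("CREATE TABLE Table1 (ContactName TEXT COLLATE NOCASE NOT NULL)")
conn.execute("CREATE INDEX idx_contact ON Table1 (ContactName)")
conn.executemany("INSERT INTO Table1 VALUES (?)",
                 [("James Smith",), ("Ann James",), ("Mary Jones",)])

def plan(sql):
    # First detail line of the query plan, e.g. "SCAN ..." or "SEARCH ...".
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

# Substring search (leading wildcard): every entry must be examined.
print(plan("SELECT ContactName FROM Table1 WHERE ContactName LIKE '%james%'"))
# Prefix search: the index narrows the range.
print(plan("SELECT ContactName FROM Table1 WHERE ContactName LIKE 'james%'"))
```

    In Access syntax the two patterns would be Like "*james*" and Like "james*"; the same scan-versus-seek distinction applies.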

  2. #2
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    I did not read your entire post, but you could try adding another column of type Integer and indexing it. Place a number in the new column that corresponds to the first letter of the name (for every record). Then use VBA to run two queries.
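    A rough sketch of this derived-column idea (sqlite3 as a stand-in for Access, and the column name FirstLetter is my assumption, not from the thread): the integer is computed from the first letter when the record is saved, and an indexed equality on it narrows the candidate rows before a second query checks the name itself.

```python
# Sketch of the suggested derived column (assumed name: FirstLetter).
# sqlite3 stands in for Access/Jet; in Access the column would be
# populated by VBA when records are inserted or updated.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ContactName TEXT, FirstLetter INTEGER)")
conn.execute("CREATE INDEX idx_first ON Table1 (FirstLetter)")
for name in ["James Smith", "Ann James", "Mary Jones"]:
    conn.execute("INSERT INTO Table1 VALUES (?, ?)",
                 (name, ord(name[0].upper()) - ord("A")))

# Query 1: an indexed equality lookup narrows the candidates.
rows = conn.execute("SELECT ContactName FROM Table1 WHERE FirstLetter = ?",
                    (ord("J") - ord("A"),)).fetchall()
print([r[0] for r in rows])  # only names whose first letter is "J"
```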

  3. #3
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    After further tests, I realized I was mistaken in my original post about the type of query I ran. The query takes 1.5 to 2 seconds only when AGGREGATE functions are used, e.g.:

    Code:
    SELECT * FROM Table1 GROUP BY ContactName HAVING ContactName Like "*james*";
    ItsMe, I tried your suggestion, but adding an integer field did not change the processing time of such a query.

    Without aggregate functions, however, I get near-instant results (with or without the integer field):

    Code:
    SELECT * FROM Table1 WHERE ContactName Like "*james*";
    So obviously I need to avoid aggregate functions when I query this type of table. I'm marking this thread as solved.
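    The difference makes sense if the engine filters rows with WHERE before grouping, but builds every group before applying HAVING. A small sketch of the equivalence (sqlite3 stand-in, not Access; names as in the thread):

```python
# Sketch: WHERE filters rows before grouping; HAVING filters the
# groups after they are all built. The results match, the work doesn't.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ContactName TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?)",
                 [("James Smith",), ("James Smith",),
                  ("Ann James",), ("Mary Jones",)])

# Filter first, then collapse duplicates.
fast = conn.execute(
    "SELECT ContactName FROM Table1 "
    "WHERE ContactName LIKE '%james%' GROUP BY ContactName").fetchall()

# Build every group first, then filter the groups.
slow = conn.execute(
    "SELECT ContactName FROM Table1 "
    "GROUP BY ContactName HAVING ContactName LIKE '%james%'").fetchall()

print(sorted(fast) == sorted(slow))
```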

  4. #4
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    If all you are using is GROUP BY, you should be able to use a WHERE clause.

    If an aggregate is required, use WHERE to retrieve a smaller dataset and then apply the aggregate in a subquery over that dataset.

  5. #5
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    Quote Originally Posted by ItsMe View Post
    If all you are using is GROUP BY, you should be able to use a WHERE clause.

    If an aggregate is required, use WHERE to retrieve a smaller dataset and then apply the aggregate in a subquery over that dataset.
    Do you mean like this?

    Code:
    SELECT * FROM Table1 WHERE ContactName Like "*james*" GROUP BY ContactName;
    That also did not change the processing time.

    I also tried multiple queries. A query named Query1 had:

    Code:
    SELECT * FROM Table1 WHERE ContactName Like "*james*";
    ...which, as I mentioned, ran instantaneously and returned only about 5000 records.

    But this query:

    Code:
    SELECT * FROM Query1 GROUP BY ContactName;
    ... took 1.5 to 2 seconds.

    Then I tried creating a new table, then running the aggregate query on this new table:

    Code:
    SELECT * INTO NewTable1 FROM Table1 WHERE ContactName Like "*james*";
    
    SELECT * FROM NewTable1 GROUP BY ContactName;
    And it finished instantly. For my project, however, creating new tables is not a desirable solution.

    So it does seem that if aggregate functions are used, whether directly or indirectly, on a half-million-row table, the delay always occurs.

    I'm going to upload my database later so you and others can see if I missed something.
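    For reference, the temp-table pattern above can be sketched outside Access too (sqlite3 as a stand-in; CREATE TEMP TABLE ... AS plays the role of Access's SELECT ... INTO):

```python
# Sketch of the temp-table pattern: materialize the filtered rows once,
# then aggregate over the much smaller result. sqlite3 stands in for
# Access, where this is SELECT * INTO NewTable1.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ContactName TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?)",
                 [("James Smith",), ("James Smith",), ("Ann James",)])

# Step 1: materialize the filtered subset.
conn.execute("CREATE TEMP TABLE NewTable1 AS "
             "SELECT ContactName FROM Table1 "
             "WHERE ContactName LIKE '%james%'")

# Step 2: aggregate over the small temporary table only.
names = conn.execute(
    "SELECT ContactName FROM NewTable1 GROUP BY ContactName").fetchall()
print(sorted(n[0] for n in names))
```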

  6. #6
    June7's Avatar
    June7 is online now VIP
    Windows 7 64bit Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,960
    1.5 to 2 seconds is unacceptable?

    Not much can be done about this. More data naturally means more time to process.

    Indexing might help. Do you have any indexes set?
    How to attach a file: http://www.accessforums.net/showthread.php?t=70301
    To provide a db: copy it, remove confidential data, run compact & repair, and zip with Windows compression.

  7. #7
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    Quote Originally Posted by June7 View Post
    1.5 to 2 seconds is unacceptable?
    Yes, due to the frequency at which this type of query has to be run.

    Quote Originally Posted by June7 View Post
    Not much can be done about this. More data naturally means more time to process.

    Indexing might help. Do you have any indexes set?
    Yes. In the queries mentioned in my previous posts, which were simple ones used only to demonstrate the delay, having an index did not make a big difference in processing time. But I have had more complex queries where an index made a huge difference (from 30 seconds to a minute all the way down to near-instant).

  8. #8
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    Hmmm, I do not think GROUP BY is evaluated before WHERE, so I did not expect that. Maybe the issue is with the Access Query Object as a subquery. As June mentioned, there may not be anything that can be done.

    I would be interested to know whether doing the entire thing in SQL with an inner join would be of benefit.


    SELECT a.ContactName
    FROM Table1 AS a
    INNER JOIN (
        SELECT ContactName
        FROM Table1
        WHERE ContactName Like "*James*"
    ) AS qryInnerJoin
    ON a.ContactName = qryInnerJoin.ContactName
    GROUP BY a.ContactName;

  9. #9
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    Quote Originally Posted by ItsMe View Post
    Hmmm, I do not think GROUP BY is evaluated before WHERE, so I did not expect that. Maybe the issue is with the Access Query Object as a subquery. As June mentioned, there may not be anything that can be done.

    I would be interested to know whether doing the entire thing in SQL with an inner join would be of benefit.


    SELECT a.ContactName
    FROM Table1 AS a
    INNER JOIN (
        SELECT ContactName
        FROM Table1
        WHERE ContactName Like "*James*"
    ) AS qryInnerJoin
    ON a.ContactName = qryInnerJoin.ContactName
    GROUP BY a.ContactName;
    I just tried this, but it also yielded a 1.5- to 2-second processing time. I'm attaching my database here, which includes all of the aforementioned queries: QueryTests.zip (6MB)

  10. #10
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    I have heard people complain about Access queries doing a table dump. I never understood it, because I always had success with the WHERE clause. It seems that adding the GROUP BY clause overrides the rest of the SQL, parentheses and all.

    I guess GROUP BY is evaluated first, kinda. At least it seems to bring the entire table into memory before applying the WHERE criteria. This is very interesting and might help explain phenomena I previously did not understand. Note to self: for large tables and the GROUP BY statement, use temp tables.

  11. #11
    Perceptus's Avatar
    Perceptus is offline Expert
    Windows 7 64bit Access 2007
    Join Date
    Nov 2012
    Location
    Knoxville, Tennessee
    Posts
    659
    Can you add a job to the SQL Server to crunch the data periodically into a table that Access could then reference? This could decrease the visible display time.

  12. #12
    orange's Avatar
    orange is online now Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,742
    I don't think indexes are used with the Like "*SearchTerm" pattern. I think you are doing a sequential read when your search term starts with a wildcard.
    An index would be used with Like "PartOfSearchTerm*" or a comparison operator (=, >, <, ...).

    What sort of speed would be considered "acceptable"? How many of these queries/searches are done (per minute/per hour/per day)?

    Can you tell us more about what is being searched and the context?

    This may not apply to your situation, but may be useful.

    We once had to provide "instant/rapid" search for specific records among about 120,000.
    This was for companies with English and French names (Operating, Legal, and aliases) entered by external users. The approach that worked for us was to break out each word/term (keyword) in these names and associate it with the assigned CompanyIdentifier. This gave us a search table with which we could find unique companies based on keyword(s). There were also indexes on phone numbers, addresses (mailing, physical, shipping, billing), and SIC codes. When terms were used in a search, the process identified the qualifying record(s), and because these were keywords, indexes were used in the search.
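    A minimal sketch of this keyword approach (sqlite3 as a stand-in, and the table and column names here are illustrative inventions, not the original schema): each word of a name becomes a row in an indexed search table, so a keyword lookup is an indexed equality rather than a wildcard scan.

```python
# Sketch of a keyword search table (all names here are illustrative).
# Each word of a company name is indexed separately, so looking up
# "james" is an index seek instead of a Like "*james*" scan.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (CompanyId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Keyword (Word TEXT COLLATE NOCASE, CompanyId INTEGER);
CREATE INDEX idx_word ON Keyword (Word);
""")
companies = [(1, "James Trading Ltd"), (2, "Jones et Fils"), (3, "Acme James Co")]
conn.executemany("INSERT INTO Company VALUES (?, ?)", companies)
for cid, name in companies:
    # Break each name into keywords tied to the company identifier.
    conn.executemany("INSERT INTO Keyword VALUES (?, ?)",
                     [(word, cid) for word in name.split()])

# Indexed equality lookup on the keyword, joined back to the company.
hits = conn.execute(
    "SELECT DISTINCT c.Name FROM Keyword k "
    "JOIN Company c ON c.CompanyId = k.CompanyId "
    "WHERE k.Word = 'james'").fetchall()
print(sorted(h[0] for h in hits))
```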

  13. #13
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    Quote Originally Posted by Perceptus View Post
    Can you add a job to the SQL Server to crunch the data periodically into a table that Access could then reference? This could decrease the visible display time.
    I'm thinking of a redesign of the table. Those who downloaded my database will notice that there are a lot of repeating names in it. I'm thinking of creating a NORMALIZED table with non-repeating names, thus eliminating the need for aggregate queries. The downside, of course, is that I would have to modify a lot of the queries and modules related to that table. This just shows how important database design is at the onset of a project. This past week I've advised several posters on the potential pitfalls of ill-designed tables, and here I am, falling victim to that myself. I believe that query is the ONLY aggregate query in the entire project!

    I have another project with tables that have 6-figure row counts, and there I resort to temporary tables to reduce the processing time of some of the more complex queries, which also involve aggregate functions.
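    The normalization idea can be sketched like this (sqlite3 as a stand-in for Access, and the side-table name ContactNames is my assumption): with one row per distinct name, the substring search needs no GROUP BY at all.

```python
# Sketch: a maintained table of distinct names removes the need for
# aggregation at query time. sqlite3 stands in for Access; the table
# name ContactNames is assumed, not from the original database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ContactName TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?)",
                 [("James Smith",), ("James Smith",),
                  ("Ann James",), ("Mary Jones",)])

# One row per distinct name; in practice this table would be kept in
# sync as records are added to Table1.
conn.execute("CREATE TABLE ContactNames (ContactName TEXT PRIMARY KEY)")
conn.execute("INSERT INTO ContactNames "
             "SELECT DISTINCT ContactName FROM Table1")

# The search itself no longer needs any GROUP BY.
names = conn.execute(
    "SELECT ContactName FROM ContactNames "
    "WHERE ContactName LIKE '%james%'").fetchall()
print(sorted(n[0] for n in names))
```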

  14. #14
    orange's Avatar
    orange is online now Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,742
    Perhaps you should tell us what makes the queries so complex.

    In your database, Table1 has 418,619 records, but only 149,700 of them are unique.
    Last edited by orange; 05-02-2015 at 05:00 AM.

  15. #15
    keviny04 is offline Competent Performer
    Windows 7 64bit Access 2007
    Join Date
    Apr 2015
    Posts
    128
    orange, the keyword approach is interesting, so I'm going to put it in my back pocket in case a future project needs it, thanks. But for my current project, I'm going to stick with my initial "solution": avoiding aggregate queries by normalizing the table. Even without your keyword approach, queries for finding substrings already produce near-instant results, as I said, provided they don't use aggregate functions.

    As for complex queries, I haven't really had much trouble with their processing speed in general (the ones I mentioned earlier being among the few exceptions). Maybe the speed of modern CPUs and improvements in Access have something to do with it. The one query where I had to create temporary tables was written many years ago under Access 97 on ancient PCs. Maybe if I go back and rewrite it without temporary tables, it will run just fine.


