help with query returning duplicates

**BrockWade** · 12-05-2013, 11:37 AM

Can anyone advise on what might be causing a query such as the one below to return unwanted duplicates? The problem may be the lack of uniqueness in the data itself. For example, the two files (ExcelData & ExcelDataBook) are Excel uploads with only 2 fields each, "Store" and "Amount" (no record number or key field), and often there will be maybe four sequential records like "Store 250 $100" "Store 250 $100" "Store 250 $100" "Store 250 $100". Thanks! BW

SELECT ExcelData.Store, ExcelData.Amount
FROM ExcelData LEFT JOIN ExcelDataBook ON (ExcelData.Amount = ExcelDataBook.Amount) AND (ExcelData.[Store] = ExcelDataBook.[Store])
WHERE (((ExcelData.Store)<>144) AND ((ExcelDataBook.Store) Is Not Null))
ORDER BY ExcelData.Store, ExcelData.Amount DESC;
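
To see how the data itself can produce the duplicates, here is a minimal sketch. It uses Python's standard-library sqlite3 module as a stand-in for Access (an assumption; Access syntax differs slightly, e.g. square brackets and `Is Not Null`), and the sample rows are made up to match the "Store 250 $100" description. With four identical rows in each table and no key field, every row on the left matches all four rows on the right, so the join returns 4 x 4 = 16 rows.

```python
import sqlite3

# In-memory SQLite database standing in for the two Excel uploads (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExcelData (Store INTEGER, Amount REAL)")
conn.execute("CREATE TABLE ExcelDataBook (Store INTEGER, Amount REAL)")

# Hypothetical sample data: four identical rows in each table, no key field.
rows = [(250, 100.0)] * 4
conn.executemany("INSERT INTO ExcelData VALUES (?, ?)", rows)
conn.executemany("INSERT INTO ExcelDataBook VALUES (?, ?)", rows)

# Same shape as the query in the question, written in SQLite syntax.
joined = conn.execute(
    """
    SELECT ExcelData.Store, ExcelData.Amount
    FROM ExcelData
    LEFT JOIN ExcelDataBook
      ON ExcelData.Amount = ExcelDataBook.Amount
     AND ExcelData.Store = ExcelDataBook.Store
    WHERE ExcelData.Store <> 144
      AND ExcelDataBook.Store IS NOT NULL
    ORDER BY ExcelData.Store, ExcelData.Amount DESC
    """
).fetchall()

# Each of the 4 left rows pairs with all 4 right rows.
print(len(joined))
```

If the real tables behave the same way, the row count of the result for a given Store/Amount pair should equal the product of that pair's counts in the two tables.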

**JoeM** · 12-05-2013, 12:00 PM

If you are getting "duplicate" data, it is due to the nature of your data and/or relationships.
This can happen if you have "one-to-many" or "many-to-many" relationships established, or if your relationship is not written correctly. I am guessing it is probably the former.

One easy way to eliminate duplicates in a "one-to-many" or "many-to-many" scenario is to add the word "DISTINCT" after SELECT, i.e.

Code:

SELECT DISTINCT ExcelData.Store, ExcelData.Amount
FROM ...
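
A small sketch of what DISTINCT does to the joined result, using Python's sqlite3 module in place of Access (an assumption) and hypothetical sample rows. The same join is run with and without DISTINCT so the two results can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Store 250 appears twice in each table, Store 300 once.
conn.executescript(
    """
    CREATE TABLE ExcelData (Store INTEGER, Amount REAL);
    CREATE TABLE ExcelDataBook (Store INTEGER, Amount REAL);
    INSERT INTO ExcelData VALUES (250, 100), (250, 100), (300, 50);
    INSERT INTO ExcelDataBook VALUES (250, 100), (250, 100), (300, 50);
    """
)

# Everything after the SELECT list is shared by both queries.
join_sql = """
    FROM ExcelData
    LEFT JOIN ExcelDataBook
      ON ExcelData.Amount = ExcelDataBook.Amount
     AND ExcelData.Store = ExcelDataBook.Store
    WHERE ExcelData.Store <> 144
      AND ExcelDataBook.Store IS NOT NULL
    ORDER BY ExcelData.Store, ExcelData.Amount DESC
"""

plain = conn.execute(
    "SELECT ExcelData.Store, ExcelData.Amount" + join_sql
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT ExcelData.Store, ExcelData.Amount" + join_sql
).fetchall()

print(len(plain))  # 5 rows: 2 x 2 for Store 250, plus 1 for Store 300
print(distinct)    # [(250, 100.0), (300, 50.0)]
```

Keep in mind that DISTINCT collapses every repeated Store/Amount pair to one row, including repeats that were genuinely present in the upload, so it only fits if the number of occurrences does not matter.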

**Dal Jeanis** · 12-05-2013, 12:41 PM

Okay, so the question is, what values really ARE needed? What is the meaning of the duplicated data in each table/file?

I notice that you are already joining on ExcelDataBook.Store, so you don't need to test it for Null. If you just want to kill the dups, then you can use GROUP BY:

Code:

SELECT 
   ExcelData.Store, 
   ExcelData.Amount
FROM 
   ExcelData 
   LEFT JOIN 
   ExcelDataBook 
   ON (ExcelData.Amount = ExcelDataBook.Amount) 
   AND (ExcelData.Store = ExcelDataBook.Store)
WHERE 
   (ExcelData.Store <> 144)
GROUP BY 
   ExcelData.Store, 
   ExcelData.Amount
ORDER BY 
   ExcelData.Store, 
   ExcelData.Amount DESC;

By the way, JoeM's answer is correct also.
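
One thing worth checking before dropping the Is Not Null test: with a LEFT JOIN, that test is what filters out ExcelData rows that have no match in ExcelDataBook. The sketch below uses Python's sqlite3 module instead of Access (an assumption), and Store 400 is a hypothetical store that appears only in ExcelData. The `grouped` helper is made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Store 400 has no matching row in ExcelDataBook.
conn.executescript(
    """
    CREATE TABLE ExcelData (Store INTEGER, Amount REAL);
    CREATE TABLE ExcelDataBook (Store INTEGER, Amount REAL);
    INSERT INTO ExcelData VALUES (250, 100), (250, 100), (400, 75);
    INSERT INTO ExcelDataBook VALUES (250, 100), (250, 100);
    """
)

def grouped(join_kind, extra_where=""):
    """Run the GROUP BY query with the given join type and optional extra filter."""
    sql = f"""
        SELECT ExcelData.Store, ExcelData.Amount
        FROM ExcelData
        {join_kind} JOIN ExcelDataBook
          ON ExcelData.Amount = ExcelDataBook.Amount
         AND ExcelData.Store = ExcelDataBook.Store
        WHERE ExcelData.Store <> 144 {extra_where}
        GROUP BY ExcelData.Store, ExcelData.Amount
        ORDER BY ExcelData.Store, ExcelData.Amount DESC
    """
    return conn.execute(sql).fetchall()

left_no_test = grouped("LEFT")
left_with_test = grouped("LEFT", "AND ExcelDataBook.Store IS NOT NULL")
inner = grouped("INNER")

print(left_no_test)    # [(250, 100.0), (400, 75.0)]  unmatched Store 400 is kept
print(left_with_test)  # [(250, 100.0)]
print(inner)           # [(250, 100.0)]
```

In this sketch the LEFT JOIN with the Is Not Null test returns the same rows as an INNER JOIN, while the LEFT JOIN without the test also returns the unmatched store. Which one is right depends on whether unmatched ExcelData rows are wanted.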

**JoeM** · 12-05-2013, 01:02 PM

Dal,

I have used both methods also (SELECT DISTINCT vs. GROUP BY). They both seem to work equally well.
I have often wondered if one is preferred over the other (maybe one is more efficient, or one has some potential issues to watch out for).
Do you know if there is a "preferred" way or if it doesn't really matter?

Just one of those things I have always been curious about...

**Dal Jeanis** · 12-05-2013, 01:57 PM

This page says that in the absence of aggregate functions, they are, for all intents and purposes, the same: http://asktom.oracle.com/pls/asktom/...32961403234212

This page agrees, and implies that DISTINCT is preferred: http://sqlmag.com/database-performan...tinct-vs-group

However, over in real life, these pages show that in complex queries there are major differences and the GROUP BY beats DISTINCT all to heck and back.
http://stackoverflow.com/questions/1...ersus-group-by
http://msmvps.com/blogs/robfarley/ar...p-by-wins.aspx

Note that the last listed site says WHY, and offers some advice on ways to improve the structure of a query's execution plan.
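
The "same in the absence of aggregate functions" point can be checked on a small scale. This is a sketch using Python's sqlite3 module rather than Access (an assumption) with hypothetical rows. It compares results only and says nothing about performance, which depends on the engine and its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExcelData (Store INTEGER, Amount REAL)")
# Hypothetical sample data: three identical rows for Store 250, one for Store 300.
conn.executemany(
    "INSERT INTO ExcelData VALUES (?, ?)",
    [(250, 100.0), (250, 100.0), (250, 100.0), (300, 50.0)],
)

distinct_rows = conn.execute(
    "SELECT DISTINCT Store, Amount FROM ExcelData "
    "ORDER BY Store, Amount DESC"
).fetchall()

grouped_rows = conn.execute(
    "SELECT Store, Amount FROM ExcelData "
    "GROUP BY Store, Amount ORDER BY Store, Amount DESC"
).fetchall()

# GROUP BY can also carry an aggregate, which DISTINCT alone cannot.
counted_rows = conn.execute(
    "SELECT Store, Amount, COUNT(*) FROM ExcelData "
    "GROUP BY Store, Amount ORDER BY Store, Amount DESC"
).fetchall()

print(distinct_rows == grouped_rows)  # True
print(counted_rows)                   # [(250, 100.0, 3), (300, 50.0, 1)]
```

The COUNT(*) variant is one practical reason to reach for GROUP BY: it removes the duplicates while still reporting how many times each Store/Amount pair occurred.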

**JoeM** · 12-05-2013, 02:12 PM

Thanks Dal, some interesting stuff there. I guess the answer is "it depends...".

The one nice thing about using DISTINCT is if you are not using an Aggregate Function, it makes your code shorter without the GROUP BY clause and all those fields listed in the GROUP BY clause (I like clear and concise!).

I agree with a comment that someone made that if you have a well-designed database and your relationships are set up properly, you hopefully shouldn't run into this situation too often, but it does come up (especially since we are often dealing with less than ideal data structures that we did not create).

**Dal Jeanis** · 12-05-2013, 02:19 PM

Or less-than-ideal structures that we DID create.

I'm just sayin'.

**JoeM** · 12-05-2013, 02:23 PM

Dal Jeanis wrote: "Or less-than-ideal structures that we DID create."

I have NO idea what you are talking about!
