Should I use distinct in my queries
Ask them why is it a bad practice. A lot of people make up rules or come up with things that they consider bad practice from reading the first page of the book or the first result of a google search. If it does the job and doesn't cause any issues there is no reason to create more work by finding alternatives. From the two options you have posted I would use distinct too because its shorter and easier to read and maintain.
In your example distinct
and group by
do the same thing. I think your colleagues means that your query should not return duplicates in the first instance and that you should be able to write your query without a distinct
or group by
clause. You maybe be able to reduce the duplicates by extending your join
conditions.
The answer provided by @zedfoxus is useful to understand the context.
However, I don't believe your query should require distinct records if the data is designed correctly.
It appears you are selecting the primary key of table spriden
, so all that data should be unique. You're also joining onto the spraddr
table; does that table really contain valid duplicate data? Or is there perhaps an additional join criterium that's required to filter out those duplicates?
This is why I get nervous about use of "distinct
" - the spraddr
table may include additional columns which you should use to filter out data, and "distinct
" may be hiding that.
Also, you may be generating a massive result set which needs to be filtered by the "distinct" clause, which can cause performance issues. For instance, if there are 1 million rows in spraddr
for each row in spriden
, and you should use the "is_current" flag to find the 2 or 3 "real" ones.
Finally, I get nervous when I see "group by" used as a substitute for distinct, not because it's "wrong", but because stylistically, I believe group by should be used for aggregate functions. That's just a personal preference.
Databases are smart to recognize what you mean. I expect both of your queries to perform equally well. It is important for someone else maintaining your query to know what you meant. If you really meant to retrieve distinct records, use DISTINCT
. If your intention was to do aggregation, use GROUP BY
Take a look at this question. There are some nice answers that might help.