Why are aggregate functions not allowed in where clause
The reason you can't use SUM()
in the WHERE
clause is the order of evaluation of clauses.
FROM
tells you where to read rows from. Right as rows are read from disk to memory, they are checked for the WHERE
conditions. (Actually in many cases rows that fail the WHERE
clause will not even be read from disk. "Conditions" are formally known as predicates and some predicates are used - by the query execution engine - to decide which rows are read from the base tables. These are called access predicates.) As you can see, the WHERE
clause is applied to each row as it is presented to the engine.
On the other hand, aggregation is done only after all rows (that verify all the predicates) have been read.
Think about this: SUM()
applies ONLY to the rows that satisfy the WHERE
conditions. If you put SUM()
in the WHERE
clause, you are asking for circular logic. Does a new row pass the WHERE
clause? How would I know? If it will pass, then I must include it in the SUM
, but if not, it should not be included in the SUM
. So how do I even evaluate the SUM
condition?
Why can't we use aggregate function in where clause
Aggregate functions work on sets of data. A WHERE
clause doesn't have access to entire set, but only to the row that it is currently working on.
You can of course use HAVING clause:
select name from employee
group by name having sum(salary) > 1000;
If you must use WHERE
, you can use a subquery:
select name from (
select name, sum(salary) total_salary from employee
group by name
) t where total_salary > 1000;
sum()
is an aggregation function. In general, you would expect it to work with group by
. Hence, your first query is missing a group by
. In a group by
query, having
is used for filtering after the aggregation:
Select name
from employee
group by name
having sum(salary) > 1000 ;