Select which has max date or latest date
Your current query is not giving the desired result because you are using a GROUP BY
clause on the PERSON_ID
column which has a unique value for both entries. As a result you will return both rows.
There are a few ways that you can solve this. You can use a subquery to apply the aggregate function to return the max(LAST_UPDATE_DATE_TIME)
for each SCHOOL_CODE
:
select s1.LAST_UPDATE_DATE_TIME,
s1.SCHOOL_CODE,
s1.PERSON_ID
from SCHOOL_STAFF s1
inner join
(
select max(LAST_UPDATE_DATE_TIME) LAST_UPDATE_DATE_TIME,
SCHOOL_CODE
from SCHOOL_STAFF
group by SCHOOL_CODE
) s2
on s1.SCHOOL_CODE = s2.SCHOOL_CODE
and s1.LAST_UPDATE_DATE_TIME = s2.LAST_UPDATE_DATE_TIME;
See SQL Fiddle with Demo
Or you can use use a windowing function to return the rows of data for each school with the most recent LAST_UPDATE_DATE_TIME
:
select SCHOOL_CODE, PERSON_ID, LAST_UPDATE_DATE_TIME
from
(
select SCHOOL_CODE, PERSON_ID, LAST_UPDATE_DATE_TIME,
row_number() over(partition by SCHOOL_CODE
order by LAST_UPDATE_DATE_TIME desc) seq
from SCHOOL_STAFF
where STAFF_TYPE_NAME='Principal'
) d
where seq = 1;
See SQL Fiddle with Demo
This query implements row_number()
which assigns a unique number to each row in the partition of SCHOOL_CODE
and placed in a descending order based on the LAST_UPDATE_DATE_TIME
.
As a side note, the JOIN with aggregate function is not exactly the same as the row_number()
version. If you have two rows with the same event time the JOIN will return both rows, while the row_number()
will only return one. If you want to return both with a windowing function, then consider using the rank()
windowing function instead as it will return ties:
select SCHOOL_CODE, PERSON_ID, LAST_UPDATE_DATE_TIME
from
(
select SCHOOL_CODE, PERSON_ID, LAST_UPDATE_DATE_TIME,
rank() over(partition by SCHOOL_CODE
order by LAST_UPDATE_DATE_TIME desc) seq
from SCHOOL_STAFF
where STAFF_TYPE_NAME='Principal'
) d
where seq = 1;
See Demo
I'm surprised nobody has taken advantage of window functions beyond row_number()
Here's some data to play with:
CREATE TABLE SCHOOL_STAFF
(
LAST_UPDATE_DATE_TIME VARCHAR(20),
SCHOOL_CODE VARCHAR(20),
PERSON_ID VARCHAR(20),
STAFF_TYPE_NAME VARCHAR(20)
);
INSERT INTO SCHOOL_STAFF VALUES ('24-JAN-13', 'ABE', '111222', 'Principal');
INSERT INTO SCHOOL_STAFF VALUES ('09-FEB-12', 'ABE', '222111', 'Principal');
The OVER() clause creates a window for which you will define your aggregate groups. In this case, I am only partitioning on the SHOOL_CODE, so we will see the FIRST_VALUE, which will come from LAST_UPDATE_DATE_TIME, grouped by SCHOOL_CODE, and in the order of LAST_UPDATE_DATE_TIME by descending order. This value will be applied to the entire column for each SCHOOL_CODE.
It is important to pay close attention to your partitioning and ordering in the over() clause.
SELECT DISTINCT
FIRST_VALUE(LAST_UPDATE_DATE_TIME) OVER (PARTITION BY SCHOOL_CODE ORDER BY LAST_UPDATE_DATE_TIME DESC) AS LAST_UPDATE
,FIRST_VALUE(SCHOOL_CODE) OVER (PARTITION BY SCHOOL_CODE ORDER BY LAST_UPDATE_DATE_TIME DESC) AS SCHOOL_CODE
,FIRST_VALUE(PERSON_ID) OVER (PARTITION BY SCHOOL_CODE ORDER BY LAST_UPDATE_DATE_TIME DESC) AS PERSON_ID
FROM SCHOOL_STAFF
WHERE STAFF_TYPE_NAME = 'Principal'
ORDER BY SCHOOL_CODE
Returns:
24-JAN-13 ABE 111222
This should eliminate your need for GROUP BY and Subqueries for the most part. You will want to make sure to include DISTINCT though.