how to calculate difference of first row and last row fields value in each group
You are not showing the query you are using to obtain the results without diff
. I'm assuming it is something like this:
SELECT
min = MIN(Value),
max = MAX(Value),
avg = AVG(Value), -- or, if Value is an int, like this, perhaps:
-- AVG(CAST(Value AS decimal(10,2))
Date = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;
Also, you are not explaining what first and last mean. In this answer, it is assumed that first stands for earliest in the group (according to the Date
value) and, similarly, last means latest in the group.
One way to throw in diff
could be like this:
First, add two more aggregated columns, minDate
and maxDate
, to the original query:
SELECT
min = MIN(Value),
max = MAX(Value),
avg = AVG(Value),
minDate = MIN(Date),
maxDate = MAX(Date),
Date = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;
Next, join the aggregated result set back to the original table on minDate
and on maxDate
(separately) to access the corresponding Value
s:
SELECT
g.min,
g.max,
g.avg,
diff = last.Value - first.Value,
g.Date
FROM (
SELECT
min = MIN(Value),
max = MAX(Value),
avg = AVG(Value),
minDate = MIN(Date),
maxDate = MAX(Date),
Date = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
) g
INNER JOIN atable first ON first.Date = g.minDate
INNER JOIN atable last ON last .Date = g.maxDate
;
Note that the above assumes the Date
values (at least those that happen to be first or last in their corresponding hours) to be unique, or you would get more than one row for some of the hours in the output.
An alternative, if you are on SQL Server 2005 or later version, could be to use window aggregate functions MIN() OVER (...)
and MAX() OVER (...)
to calculate Value
s corresponding to either minDate
or maxDate
, before aggregating all the results similarly to how you are probably doing it now. Here's what specifically I'm talking about:
WITH partitioned AS (
SELECT
Value,
Date,
GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
)
, firstlast AS (
SELECT
Value,
Date,
GroupDate,
FirstValue = CASE Date WHEN MIN(Date) OVER (PARTITION BY GroupDate) THEN Value END,
LastValue = CASE Date WHEN MAX(Date) OVER (PARTITION BY GroupDate) THEN Value END
FROM partitioned
)
SELECT
min = MIN(Value),
max = MAX(Value),
avg = AVG(Value), -- or, again, if Value is an int, cast it as a decimal or float
diff = MAX(LastValue) - MIN(FirstValue),
Date = GroupDate
FROM firstlast
GROUP BY
GroupDate
;
As you can see, the first common table expression (CTE) merely returns all rows and adds a calculated column GroupDate
, the one subsequently used for grouping/partitioning. So it essentially just assigns a name to the grouping expression, and that is done for better readability/maintainability of the entire query, as the column is later referenced more than once. This is what the first CTE produces:
+-------+------------------+------------------+
| Value | Date | GroupDate |
+-------+------------------+------------------+
| 10 | 10/10/2010 10:00 | 10/10/2010 10:00 |
| 11 | 10/10/2010 10:15 | 10/10/2010 10:00 |
| 15 | 10/10/2010 10:30 | 10/10/2010 10:00 |
| 15 | 10/10/2010 10:45 | 10/10/2010 10:00 |
| 17 | 10/10/2010 11:00 | 10/10/2010 11:00 |
| 18 | 10/10/2010 11:15 | 10/10/2010 11:00 |
| 22 | 10/10/2010 11:30 | 10/10/2010 11:00 |
| 30 | 10/10/2010 11:45 | 10/10/2010 11:00 |
+-------+------------------+------------------+
The second CTE adds two more columns to the above result. It uses window aggregate functions MIN() OVER ...
and MAX() OVER ...
to match against Date
, and where the match takes place, the corresponding Value
is returned in a separate column, either FirstValue
or LastValue
:
+-------+------------------+------------------+------------+-----------+
| Value | Date | GroupDate | FirstValue | LastValue |
+-------+------------------+------------------+------------+-----------+
| 10 | 10/10/2010 10:00 | 10/10/2010 10:00 | 10 | NULL |
| 11 | 10/10/2010 10:15 | 10/10/2010 10:00 | NULL | NULL |
| 15 | 10/10/2010 10:30 | 10/10/2010 10:00 | NULL | NULL |
| 15 | 10/10/2010 10:45 | 10/10/2010 10:00 | NULL | 15 |
| 17 | 10/10/2010 11:00 | 10/10/2010 11:00 | 17 | NULL |
| 18 | 10/10/2010 11:15 | 10/10/2010 11:00 | NULL | NULL |
| 22 | 10/10/2010 11:30 | 10/10/2010 11:00 | NULL | NULL |
| 30 | 10/10/2010 11:45 | 10/10/2010 11:00 | NULL | 30 |
+-------+------------------+------------------+------------+-----------+
At this point, everything is ready for the final aggregation. The min
, max
, and avg
columns will be aggregated same as previously, and diff
can now easily be obtained as the aggregated FirstValue
subtracted from the aggregated LastValue
. As you can see from the above result set, you can use various functions to get FirstValue
and LastValue
for the group: it could be MIN
, MAX
, SUM
, AVG
– any if these would do, because there's just one value in every group.
The main SELECT, however, as you can see, applies specifically MAX()
over LastValue
and MIN()
over FirstValue
. That is intentional. It's because this second suggestion doesn't really require Date
to be unique, like the first one did, but, in case either minDate
or maxDate
does happen to have more than one associated Value
, it would result in FirstValue
or LastValue
containing more than one value per group, something like this:
+-------+------------------+------------------+------------+-----------+
| Value | Date | GroupDate | FirstValue | LastValue |
+-------+------------------+------------------+------------+-----------+
| 9 | 10/10/2010 10:00 | 10/10/2010 10:00 | 9 | NULL |
| 10 | 10/10/2010 10:00 | 10/10/2010 10:00 | 10 | NULL |
| 11 | 10/10/2010 10:15 | 10/10/2010 10:00 | NULL | NULL |
| 15 | 10/10/2010 10:30 | 10/10/2010 10:00 | NULL | NULL |
| 15 | 10/10/2010 10:45 | 10/10/2010 10:00 | NULL | 15 |
| 17 | 10/10/2010 11:00 | 10/10/2010 11:00 | 17 | NULL |
| 18 | 10/10/2010 11:15 | 10/10/2010 11:00 | NULL | NULL |
| 22 | 10/10/2010 11:30 | 10/10/2010 11:00 | NULL | NULL |
| 30 | 10/10/2010 11:45 | 10/10/2010 11:00 | NULL | 30 |
| 33 | 10/10/2010 11:45 | 10/10/2010 11:00 | NULL | 33 |
+-------+------------------+------------------+------------+-----------+
I assumed that in this situation it would be more natural to take the difference between the greatest last value and the least first one. You, however, should know better what rule to apply here, so you'll just change the query accordingly.
You can test both solutions at SQL Fiddle:
#1
#2
Update
Starting from SQL Server 2012, you can also use the FIRST_VALUE and LAST_VALUE functions and substitute them for the CASE expressions in the firstlast
CTE in my last query above, like this:
FirstValue = FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
LastValue = LAST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
In this case it will not matter whether you use MIN or MAX over FirstValue
and LastValue
later (in the main SELECT): each column will have exactly the same value (first or last Value
accordingly) across all rows of the same GroupDate
group, and so MIN()
and MAX()
would return identical results in each case.
In fact, you can get diff
directly in the firstlast
CTE and then, in the main query, just either aggregate it using MIN/MAX or add it to GROUP BY and reference it without aggregation, like this:
WITH partitioned AS (
SELECT
Value,
Date,
GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
)
, firstlast AS (
SELECT
Value,
Date,
GroupDate,
diff = LAST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
- FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM partitioned
)
SELECT
min = MIN(Value),
max = MAX(Value),
avg = AVG(Value),
diff,
Date = GroupDate
FROM firstlast
GROUP BY
GroupDate,
diff
;
Moving one step further, you could obtain min
, max
and avg
in firstlast
as well, instead of the main query – using the counterpart window functions:
min = MIN(Value) OVER (PARTITION BY GroupDate),
max = MAX(Value) OVER (PARTITION BY GroupDate),
avg = AVG(Value) OVER (PARTITION BY GroupDate),
With these three additional columns and the previous change, the firstlast
CTE will be returning a row set like this for your example:
+-------+------------------+------------------+-----+-----+-------+------+
| Value | Date | GroupDate | min | max | avg | diff |
+-------+------------------+------------------+-----+-----+-------+------+
| 10 | 10/10/2010 10:00 | 10/10/2010 10:00 | 10 | 15 | 12.75 | 5 |
| 11 | 10/10/2010 10:15 | 10/10/2010 10:00 | 10 | 15 | 12.75 | 5 |
| 15 | 10/10/2010 10:30 | 10/10/2010 10:00 | 10 | 15 | 12.75 | 5 |
| 15 | 10/10/2010 10:45 | 10/10/2010 10:00 | 10 | 15 | 12.75 | 5 |
| 17 | 10/10/2010 11:00 | 10/10/2010 11:00 | 17 | 30 | 21.75 | 13 |
| 18 | 10/10/2010 11:15 | 10/10/2010 11:00 | 17 | 30 | 21.75 | 13 |
| 22 | 10/10/2010 11:30 | 10/10/2010 11:00 | 17 | 30 | 21.75 | 13 |
| 30 | 10/10/2010 11:45 | 10/10/2010 11:00 | 17 | 30 | 21.75 | 13 |
+-------+------------------+------------------+-----+-----+-------+------+
Note how GroupDate
, min
, max
, avg
and diff
– the columns that you really need for the final set – are simply repeated across all rows that belong to the same group. That means that you can get rid of Value
and Date
, rename GroupDate
to Date
, slightly rearrange the columns, apply DISTINCT to the resulting set – and you have eliminated the last SELECT:
WITH partitioned AS (
SELECT
Value,
Date,
GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM
atable
)
SELECT DISTINCT
min = MIN(Value) OVER (PARTITION BY GroupDate),
max = MAX(Value) OVER (PARTITION BY GroupDate),
avg = AVG(Value) OVER (PARTITION BY GroupDate),
diff = LAST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
- FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
Date = GroupDate
FROM
partitioned
;
Finally, it is also possible to move the GroupDate
calculation into the same scope where min
, max
etc. are calculated. You can use CROSS APPLY for that and thus avoid the need for nesting a query entirely – in other words, this way you can get rid of the partitioned
CTE as well. The entire query would look like this:
SELECT DISTINCT
min = MIN(t.Value) OVER (PARTITION BY x.GroupDate),
max = MAX(t.Value) OVER (PARTITION BY x.GroupDate),
avg = AVG(t.Value) OVER (PARTITION BY x.GroupDate),
diff = LAST_VALUE(t.Value) OVER (PARTITION BY x.GroupDate ORDER BY t.Date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
- FIRST_VALUE(t.Value) OVER (PARTITION BY x.GroupDate ORDER BY t.Date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
Date = x.GroupDate
FROM
atable AS t
CROSS APPLY (SELECT DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)) AS x (GroupDate)
;
and return the same results. You can test it at SQL Fiddle too.