how to calculate difference of first row and last row fields value in each group

You are not showing the query you are using to obtain the results without diff. I'm assuming it is something like this:

SELECT
  min  = MIN(Value),
  max  = MAX(Value),
  avg  = AVG(Value),  -- or, if Value is an int, like this, perhaps:
                      -- AVG(CAST(Value AS decimal(10,2))
  Date = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
  DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;

Also, you are not explaining what first and last mean. In this answer, it is assumed that first stands for earliest in the group (according to the Date value) and, similarly, last means latest in the group.

One way to throw in diff could be like this:

First, add two more aggregated columns, minDate and maxDate, to the original query:

SELECT
  min     = MIN(Value),
  max     = MAX(Value),
  avg     = AVG(Value),
  minDate = MIN(Date),
  maxDate = MAX(Date),
  Date    = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
  DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;

Next, join the aggregated result set back to the original table on minDate and on maxDate (separately) to access the corresponding Values:

SELECT
  g.min,
  g.max,
  g.avg,
  diff = last.Value - first.Value,
  g.Date
FROM (
  SELECT
    min     = MIN(Value),
    max     = MAX(Value),
    avg     = AVG(Value),
    minDate = MIN(Date),
    maxDate = MAX(Date),
    Date    = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM atable
  GROUP BY
    DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
) g
INNER JOIN atable first ON first.Date = g.minDate
INNER JOIN atable last  ON last .Date = g.maxDate
;

Note that the above assumes the Date values (at least those that happen to be first or last in their corresponding hours) to be unique, or you would get more than one row for some of the hours in the output.

An alternative, if you are on SQL Server 2005 or later version, could be to use window aggregate functions MIN() OVER (...) and MAX() OVER (...) to calculate Values corresponding to either minDate or maxDate, before aggregating all the results similarly to how you are probably doing it now. Here's what specifically I'm talking about:

WITH partitioned AS (
  SELECT
    Value,
    Date,
    GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM atable
)
, firstlast AS (
  SELECT
    Value,
    Date,
    GroupDate,
    FirstValue = CASE Date WHEN MIN(Date) OVER (PARTITION BY GroupDate) THEN Value END,
    LastValue  = CASE Date WHEN MAX(Date) OVER (PARTITION BY GroupDate) THEN Value END
  FROM partitioned
)
SELECT
  min  = MIN(Value),
  max  = MAX(Value),
  avg  = AVG(Value),  -- or, again, if Value is an int, cast it as a decimal or float
  diff = MAX(LastValue) - MIN(FirstValue),
  Date = GroupDate
FROM firstlast
GROUP BY
  GroupDate
;

As you can see, the first common table expression (CTE) merely returns all rows and adds a calculated column GroupDate, the one subsequently used for grouping/partitioning. So it essentially just assigns a name to the grouping expression, and that is done for better readability/maintainability of the entire query, as the column is later referenced more than once. This is what the first CTE produces:

+-------+------------------+------------------+
| Value |       Date       |    GroupDate     |
+-------+------------------+------------------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |
+-------+------------------+------------------+

The second CTE adds two more columns to the above result. It uses window aggregate functions MIN() OVER ... and MAX() OVER ... to match against Date, and where the match takes place, the corresponding Value is returned in a separate column, either FirstValue or LastValue:

+-------+------------------+------------------+------------+-----------+
| Value |       Date       |    GroupDate     | FirstValue | LastValue |
+-------+------------------+------------------+------------+-----------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |         10 |      NULL |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |       NULL |        15 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |         17 |      NULL |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |       NULL |      NULL |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |       NULL |      NULL |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |       NULL |        30 |
+-------+------------------+------------------+------------+-----------+

At this point, everything is ready for the final aggregation. The min, max, and avg columns will be aggregated same as previously, and diff can now easily be obtained as the aggregated FirstValue subtracted from the aggregated LastValue. As you can see from the above result set, you can use various functions to get FirstValue and LastValue for the group: it could be MIN, MAX, SUM, AVG – any if these would do, because there's just one value in every group.

The main SELECT, however, as you can see, applies specifically MAX() over LastValue and MIN() over FirstValue. That is intentional. It's because this second suggestion doesn't really require Date to be unique, like the first one did, but, in case either minDate or maxDate does happen to have more than one associated Value, it would result in FirstValue or LastValue containing more than one value per group, something like this:

+-------+------------------+------------------+------------+-----------+
| Value |       Date       |    GroupDate     | FirstValue | LastValue |
+-------+------------------+------------------+------------+-----------+
|     9 | 10/10/2010 10:00 | 10/10/2010 10:00 |          9 |      NULL |
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |         10 |      NULL |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |       NULL |        15 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |         17 |      NULL |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |       NULL |      NULL |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |       NULL |      NULL |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |       NULL |        30 |
|    33 | 10/10/2010 11:45 | 10/10/2010 11:00 |       NULL |        33 |
+-------+------------------+------------------+------------+-----------+

I assumed that in this situation it would be more natural to take the difference between the greatest last value and the least first one. You, however, should know better what rule to apply here, so you'll just change the query accordingly.

You can test both solutions at SQL Fiddle:

Update

Starting from SQL Server 2012, you can also use the FIRST_VALUE and LAST_VALUE functions and substitute them for the CASE expressions in the firstlast CTE in my last query above, like this:

FirstValue = FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
LastValue  = LAST_VALUE(Value)  OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

In this case it will not matter whether you use MIN or MAX over FirstValue and LastValue later (in the main SELECT): each column will have exactly the same value (first or last Value accordingly) across all rows of the same GroupDate group, and so MIN() and MAX() would return identical results in each case.

In fact, you can get diff directly in the firstlast CTE and then, in the main query, just either aggregate it using MIN/MAX or add it to GROUP BY and reference it without aggregation, like this:

WITH partitioned AS (
  SELECT
    Value,
    Date,
    GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM atable
)
, firstlast AS (
  SELECT
    Value,
    Date,
    GroupDate,
    diff = LAST_VALUE(Value)  OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
         - FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
  FROM partitioned
)
SELECT
  min  = MIN(Value),
  max  = MAX(Value),
  avg  = AVG(Value),
  diff,
  Date = GroupDate
FROM firstlast
GROUP BY
  GroupDate,
  diff
;

Moving one step further, you could obtain min, max and avg in firstlast as well, instead of the main query – using the counterpart window functions:

min  = MIN(Value) OVER (PARTITION BY GroupDate),
max  = MAX(Value) OVER (PARTITION BY GroupDate),
avg  = AVG(Value) OVER (PARTITION BY GroupDate),

With these three additional columns and the previous change, the firstlast CTE will be returning a row set like this for your example:

+-------+------------------+------------------+-----+-----+-------+------+
| Value |       Date       |    GroupDate     | min | max |  avg  | diff |
+-------+------------------+------------------+-----+-----+-------+------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
+-------+------------------+------------------+-----+-----+-------+------+

Note how GroupDate, min, max, avg and diff – the columns that you really need for the final set – are simply repeated across all rows that belong to the same group. That means that you can get rid of Value and Date, rename GroupDate to Date, slightly rearrange the columns, apply DISTINCT to the resulting set – and you have eliminated the last SELECT:

WITH partitioned AS (
  SELECT
    Value,
    Date,
    GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM
    atable
)
SELECT DISTINCT
  min  = MIN(Value) OVER (PARTITION BY GroupDate),
  max  = MAX(Value) OVER (PARTITION BY GroupDate),
  avg  = AVG(Value) OVER (PARTITION BY GroupDate),
  diff = LAST_VALUE(Value)  OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
       - FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
  Date = GroupDate
FROM
  partitioned
;

Finally, it is also possible to move the GroupDate calculation into the same scope where min, max etc. are calculated. You can use CROSS APPLY for that and thus avoid the need for nesting a query entirely – in other words, this way you can get rid of the partitioned CTE as well. The entire query would look like this:

SELECT DISTINCT
  min  = MIN(t.Value) OVER (PARTITION BY x.GroupDate),
  max  = MAX(t.Value) OVER (PARTITION BY x.GroupDate),
  avg  = AVG(t.Value) OVER (PARTITION BY x.GroupDate),
  diff = LAST_VALUE(t.Value)  OVER (PARTITION BY x.GroupDate ORDER BY t.Date ASC
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
       - FIRST_VALUE(t.Value) OVER (PARTITION BY x.GroupDate ORDER BY t.Date ASC
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
  Date = x.GroupDate
FROM
  atable AS t
  CROSS APPLY (SELECT DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)) AS x (GroupDate)
;

and return the same results. You can test it at SQL Fiddle too.

how to calculate difference of first row and last row fields value in each group

Tags:

Sql Server

Group By

Aggregate

T Sql

Related

Recent Posts