Linear regression with postgres

This is the combination of Joop's statistics and Denis's window functions:

WITH num AS (
        SELECT id, idstation
        , (udate - '1984-01-01'::date) as idate -- count in days since Jan 1984
        , value AS value
        FROM thedata
        )
        -- id + the ids of the {prev,next} records
        --  within the same idstation group
, drag AS (
        SELECT id AS center
                , LAG(id) OVER www AS prev
                , LEAD(id) OVER www AS next
        FROM thedata
        WINDOW www AS (partition by idstation ORDER BY id)
        )
        -- junction CTE between ID and its three feeders
, tri AS (
                  SELECT center AS this, center AS that FROM drag
        UNION ALL SELECT center AS this , prev AS that FROM drag
        UNION ALL SELECT center AS this , next AS that FROM drag
        )
SELECT  t.this, n.idstation
        , regr_intercept(value,idate) AS intercept
        , regr_slope(value,idate) AS slope
        , regr_r2(value,idate) AS rsq
        , regr_avgx(value,idate) AS avgx
        , regr_avgy(value,idate) AS avgy
FROM num n
JOIN tri t ON t.that = n.id
GROUP BY t.this, n.idstation
        ;

Results:

INSERT 0 7
 this | idstation |     intercept     |       slope       |        rsq        |       avgx       |       avgy       
------+-----------+-------------------+-------------------+-------------------+------------------+------------------
    1 |        12 |               -46 |                 1 |                 1 |               52 |                6
    2 |        12 | -24.2105263157895 | 0.578947368421053 | 0.909774436090226 | 53.3333333333333 | 6.66666666666667
    3 |        12 | -10.6666666666667 | 0.333333333333333 |                 1 |             54.5 |              7.5
    4 |        14 |                   |                   |                   |               51 |                9
    5 |        15 |                   |                   |                   |               51 |               15
    6 |        18 |                   |                   |                   |               51 |               14
    7 |        19 |                   |                   |                   |               51 |              200
(7 rows)

The intercept, slope and rsq columns are NULL for stations 14 to 19 because each of those groups contains only a single row, and a regression line needs at least two points.

The grouping into threes (each row plus its previous and next neighbour) can probably be done more elegantly using rank() or row_number(), which would also allow larger sliding windows.
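
A minimal sketch of the row_number() idea, assuming the same thedata table (id, idstation, udate, value) as in the query above. The column name rn and the aliases c and n are my own choices for illustration. With a half-width of 1 the self-join should pick up the same three-row groups as the LAG/LEAD version:

```sql
WITH num AS (
        SELECT id, idstation
        , (udate - '1984-01-01'::date) AS idate -- days since Jan 1984
        , value
        -- position of the row within its station, ordered by id (rn is an assumed name)
        , row_number() OVER (PARTITION BY idstation ORDER BY id) AS rn
        FROM thedata
        )
SELECT  c.id AS this, c.idstation
        , regr_intercept(n.value, n.idate) AS intercept
        , regr_slope(n.value, n.idate) AS slope
        , regr_r2(n.value, n.idate) AS rsq
        , regr_avgx(n.value, n.idate) AS avgx
        , regr_avgy(n.value, n.idate) AS avgy
FROM num c              -- c: the center row
JOIN num n              -- n: the center row and its neighbours
  ON n.idstation = c.idstation
 AND n.rn BETWEEN c.rn - 1 AND c.rn + 1 -- half-width 1 gives up to three rows
GROUP BY c.id, c.idstation
ORDER BY c.id
        ;
```

To use a larger window, change both occurrences of 1 in the BETWEEN clause; for example, 2 gives up to five rows per group. This replaces the drag and tri CTEs with a single self-join, at the cost of joining num to itself.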
