How to calculate the slope in SQL

If you're defining slope as just the slope from the earliest point to the latest point, and if score only increases with date, then you can get the output above with this:

SELECT *
  FROM scores
  JOIN
    (SELECT foo.keyword,
            (MAX(score)-MIN(score)) / DATEDIFF(MAX(date),MIN(date)) AS score
     FROM scores
     GROUP BY keyword) a
  USING(keyword);

However if you want linear regression, or if scores can decrease as well as increase with time, you'll need something more complex.


The cleanest one I could make:

SELECT
    Scores.Date, Scores.Keyword, Scores.Score,
    (N * Sum_XY - Sum_X * Sum_Y)/(N * Sum_X2 - Sum_X * Sum_X) AS Slope
FROM Scores
INNER JOIN (
    SELECT
        Keyword,
        COUNT(*) AS N,
        SUM(CAST(Date as float)) AS Sum_X,
        SUM(CAST(Date as float) * CAST(Date as float)) AS Sum_X2,
        SUM(Score) AS Sum_Y,
        SUM(CAST(Date as float) * Score) AS Sum_XY
    FROM Scores
    GROUP BY Keyword
) G ON G.Keyword = Scores.Keyword;

It uses Simple Linear Regression to calculate the slope.

Result:

Date         Keyword        Score         Slope
2012-01-22   water bottle   0,010885442   0,000334784345222076
2012-01-23   water bottle   0,011203949   0,000334784345222076
2012-01-24   water bottle   0,008460835   0,000334784345222076
2012-01-25   water bottle   0,010363991   0,000334784345222076
2012-01-26   water bottle   0,011800716   0,000334784345222076
2012-01-27   water bottle   0,012948411   0,000334784345222076
2012-01-28   water bottle   0,012732459   0,000334784345222076
2012-01-29   water bottle   0,011682568   0,000334784345222076

Every database system seems to have a different approach to converting dates to numbers:

  • MySQL: TO_SECONDS(date) or TO_DAYS(date)
  • Oracle: TO_NUMBER(TO_CHAR(date, 'J')) or date - TO_DATE('1','yyyy')
  • MS SQL Server: CAST(date AS float) (or equivalent CONVERT)