Python: Finding a trend in a set of numbers

The Link provided by Keith or probably the answer from Riaz might help you to get the poly fit, but it is always recommended to use libraries if available, and for the problem in your hand, numpy provides a wonderful polynomial fit function called polyfit . You can use polyfit to fit the data over any degree of equation.

Here is an example using numpy to fit the data in a linear equation of the form y=ax+b

>>> data = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
>>> x = np.arange(0,len(data))
>>> y=np.array(data)
>>> z = np.polyfit(x,y,1)
>>> print "{0}x + {1}".format(*z)
4.32527472527x + 17.6
>>> 

similarly a quadratic fit would be

>>> print "{0}x^2 + {1}x + {2}".format(*z)
0.311126373626x^2 + 0.280631868132x + 25.6892857143
>>> 

Possibly you mean you want to plot these numbers on a graph and find a straight line through them where the overall distance between the line and the numbers is minimized? This is called a linear regression

def linreg(X, Y):
    """
    return a,b in solution to y = ax + b such that root mean square distance between trend line and original points is minimized
    """
    N = len(X)
    Sx = Sy = Sxx = Syy = Sxy = 0.0
    for x, y in zip(X, Y):
        Sx = Sx + x
        Sy = Sy + y
        Sxx = Sxx + x*x
        Syy = Syy + y*y
        Sxy = Sxy + x*y
    det = Sxx * N - Sx * Sx
    return (Sxy * N - Sy * Sx)/det, (Sxx * Sy - Sx * Sxy)/det


x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
a,b = linreg(range(len(x)),x)  //your x,y are switched from standard notation

The trend line is unlikely to pass through your original points, but it will be as close as possible to the original points that a straight line can get. Using the gradient and intercept values of this trend line (a,b) you will be able to extrapolate the line past the end of the array:

extrapolatedtrendline=[a*index + b for index in range(20)] //replace 20 with desired trend length

Here is one way to get an increasing/decreasing trend:

>>> x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
>>> trend = [b - a for a, b in zip(x[::1], x[1::1])]
>>> trend
[22, -5, 9, -4, 17, -22, 5, 13, -13, 21, 39, -26, 13]

In the resulting list trend, trend[0] can be interpreted as the increase from x[0] to x[1], trend[1] would be the increase from x[1] to x[2] etc. Negative values in trend mean that value in x decreased from one index to the next.

Tags:

Python

Math