Is there an easy way in Python to extrapolate data points into the future?

A simple way of doing extrapolation is to use interpolating polynomials or splines: there are many routines for this in scipy.interpolate, and they are quite easy to use (just give the (x, y) points, and you get a function [a callable, precisely]).
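A minimal sketch of the "give points, get a callable" idea, using `scipy.interpolate.CubicSpline` as one example routine. The data are hypothetical. `CubicSpline` extrapolates beyond the last point by default (`extrapolate=True`), using the polynomial pieces at the ends.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical sample data, for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.5, 2.5, 4.0, 6.0])

# CubicSpline returns a callable object; with extrapolate=True (the default)
# it can be evaluated outside [x[0], x[-1]] using the end polynomial pieces
spline = CubicSpline(x, y)

future_x = np.array([5.0, 6.0])
print(spline(future_x))  # approximately [8.5, 11.5]: these data happen to be exactly quadratic
```

Real data are rarely this tidy, so the extrapolated values will depend on the routine you pick.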

Now, as has been pointed out in this thread, you cannot expect the extrapolation to always be meaningful (especially far from your data points) if you don't have a model for your data. However, I encourage you to play with the polynomial or spline interpolations from scipy.interpolate to see whether the results you obtain suit you.
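One quick way to see why extrapolation far from the data is risky: fit polynomials of a few degrees to the same points with `numpy.polyfit` and compare them near the data and far away. The data below are hypothetical; in general the fits agree closely just past the last point and drift apart further out.

```python
import numpy as np

# Hypothetical sample data, for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.8, 3.1, 3.9, 5.2, 5.8])

near_preds, far_preds = [], []
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, degree)           # least-squares polynomial fit
    near, far = np.polyval(coeffs, [6.0, 20.0])  # just past the data vs far away
    near_preds.append(near)
    far_preds.append(far)
    print(f"degree {degree}: x=6 -> {near:.2f}, x=20 -> {far:.2f}")

# spread between the three models, near the data and far from it
print(max(near_preds) - min(near_preds), max(far_preds) - min(far_preds))
```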


Mathematical models are the way to go in this case. For instance, if you have only three data points, you have absolutely no indication of how the trend will unfold: exactly one parabola passes through them, but so do infinitely many other curves.
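To make the three-point ambiguity concrete, here is a small sketch using the approximate points quoted in another answer in this thread, (0, 1.2), (400, 1.8), (900, 5.3). Adding any multiple `k` of (x − x0)(x − x1)(x − x2) to the parabola leaves all three points unchanged but changes the extrapolation. The helper `curve` and the values of `k` are illustrative choices.

```python
import numpy as np

# Approximate points quoted elsewhere in this thread (x = days, y = value)
x = np.array([0.0, 400.0, 900.0])
y = np.array([1.2, 1.8, 5.3])

parabola = np.polyfit(x, y, 2)  # the unique parabola through the 3 points

def curve(t, k):
    """Parabola plus k*(t-x0)(t-x1)(t-x2): passes through the same 3 points for any k."""
    return np.polyval(parabola, t) + k * (t - x[0]) * (t - x[1]) * (t - x[2])

for k in (0.0, 1e-8, -1e-8):
    # identical at the data points, different at x = 1200
    print(k, curve(x, k), curve(1200.0, k))
```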

Take some statistics courses and try to implement the algorithms yourself. Wikibooks is a good place to start.


You have to specify which kind of function you want to extrapolate with. Then you can use regression (http://en.wikipedia.org/wiki/Regression_analysis) to find the function's parameters, and extrapolate it into the future.
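As a sketch of "choose a function, then regress its parameters", `scipy.optimize.curve_fit` does least-squares fitting of an arbitrary model. The exponential model and the noiseless data below are assumptions made purely for illustration; substitute whatever model fits your problem.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    """Assumed model for illustration: exponential growth a * exp(b * x)."""
    return a * np.exp(b * x)

# Hypothetical data generated from a=2.0, b=0.3 without noise
x = np.linspace(0.0, 10.0, 20)
y = model(x, 2.0, 0.3)

# Nonlinear least squares; p0 is the starting guess, which matters for nonlinear models
params, covariance = curve_fit(model, x, y, p0=(1.5, 0.25))
a, b = params

future_x = np.array([12.0, 15.0])
print(params, model(future_x, a, b))  # fitted parameters, then the extrapolated values
```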

For instance: translate the dates into x values, using the first day as x=0. For your problem the values should be approximately (0, 1.2), (400, 1.8), (900, 5.3).
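A small sketch of the date-to-x step using only the standard library. The actual dates are not given here, so the three dates below are hypothetical, chosen so that they map to x = 0, 400 and 900. The helper `to_x` is also illustrative.

```python
from datetime import date

# Hypothetical observation dates, chosen to map to x = 0, 400, 900
dates = [date(2008, 1, 1), date(2009, 2, 4), date(2010, 6, 19)]
values = [1.2, 1.8, 5.3]

start = dates[0]
x = [(d - start).days for d in dates]  # days since the first observation

def to_x(d):
    """Convert a (possibly future) date to the same x scale."""
    return (d - start).days

print(x, to_x(date(2011, 1, 1)))  # [0, 400, 900] 1096
```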

Now suppose you decide that these points lie on a function of the form a + bx + cx^2.

Use the method of least squares to find a, b and c: http://en.wikipedia.org/wiki/Linear_least_squares (I will provide full source later, because I do not have time for this now.)
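Until that source appears, here is a minimal sketch of the least-squares step with `numpy.linalg.lstsq`, assuming the approximate points above. With exactly three points and three unknowns the "least-squares" solution is just the parabola through the points; with more points the same code gives a genuine best fit.

```python
import numpy as np

# Approximate points from above: x = days since the first date
x = np.array([0.0, 400.0, 900.0])
y = np.array([1.2, 1.8, 5.3])

# Design matrix for y = a + b*x + c*x**2: one column per unknown
A = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares solution of A @ [a, b, c] ~= y
(a, b, c), residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

def f(t):
    """Fitted quadratic, usable for extrapolation."""
    return a + b * t + c * t**2

print(a, b, c)
print(f(1200.0))  # about 8.87
```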


It's all too easy for extrapolation to generate garbage; try this. Many different extrapolations are of course possible; some produce obvious garbage, some non-obvious garbage, many are ill-defined.


""" extrapolate y,m,d data with scipy UnivariateSpline """
import numpy as np
from scipy.interpolate import UnivariateSpline
    # pydoc scipy.interpolate.UnivariateSpline -- fitpack, unclear
from datetime import date
from pylab import *  # ipython -pylab

__version__ = "denis 23oct"


def daynumber( y,m,d ):
    """ 2005,1,1 -> 0  2006,1,1 -> 365 ... """
    return date( y,m,d ).toordinal() - date( 2005,1,1 ).toordinal()

days, values = np.array([
    (daynumber(2005,1,1), 1.2 ),
    (daynumber(2005,4,1), 1.8 ),
    (daynumber(2005,9,1), 5.3 ),
    (daynumber(2005,10,1), 5.3 )
    ]).T
dayswanted = np.array([ daynumber( year, month, 1 )
        for year in range( 2005, 2006+1 )
        for month in range( 1, 12+1 )])

np.set_printoptions( 1 )  # .1f
print "days:", days
print "values:", values
print "dayswanted:", dayswanted

title( "extrapolation with scipy.interpolate.UnivariateSpline" )
plot( days, values, "o" )
for k in (1,2,3):  # line parabola cubicspline
    extrapolator = UnivariateSpline( days, values, k=k )
    y = extrapolator( dayswanted )
    label = "k=%d" % k
    print label, y
    plot( dayswanted, y, label=label  )  # pylab

legend( loc="lower left" )
grid(True)
savefig( "extrapolate-UnivariateSpline.png", dpi=50 )
show()

Added: a SciPy ticket says, "The behavior of the FITPACK classes in scipy.interpolate is much more complex than the docs would lead one to believe", which IMHO is true of other software documentation too.
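Two `UnivariateSpline` parameters account for much of that surprising behavior. With the default `s=None` a smoothing factor is chosen, so the spline need not pass through your points (`s=0` forces interpolation). The `ext` parameter controls what happens outside the data range: 0 extrapolates (the default), 1 returns zeros, 2 raises an error, 3 returns the boundary value. A sketch on hypothetical data:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical data, for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.8, 3.1, 3.9, 5.2, 5.8])

smooth = UnivariateSpline(x, y, k=3)        # s=None: smoothing, may miss the points
exact = UnivariateSpline(x, y, k=3, s=0)    # s=0: interpolating spline

# Same interpolating spline, different behavior outside [x[0], x[-1]]
zeros = UnivariateSpline(x, y, k=3, s=0, ext=1)   # return 0 outside the data
const = UnivariateSpline(x, y, k=3, s=0, ext=3)   # return the boundary value

print(smooth(x) - y)   # generally nonzero
print(exact(x) - y)    # essentially zero
print(exact(8.0), zeros(8.0), const(8.0))  # extrapolated, 0, boundary value
```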