How to formally tell if one time series affects another?

Alex Isakov has a Granger Causality Test in his Economica Time Series package here:-

Mathematica Package Repository - Economica

I'm not very familiar with the details, but I ran some tests using data from here:-

Dave Giles' Blog - Testing for Granger Causality

I downloaded the example data from the Data page. Here it is stored as QR codes.

qrimage = BarcodeImage[Compress[datalist]];

Dates

enter image description here

Arabica data

enter image description here

Robusta data

enter image description here

{dates, arabica, robusta} = Map[
  Uncompress@BarcodeRecognize@Import[#] &,
  {"http://i.stack.imgur.com/KdrUd.png",
   "http://i.stack.imgur.com/hCbZV.png",
   "http://i.stack.imgur.com/J9ESY.png"}];

arabica = Last /@ 
   Select[Transpose[{DateList /@ dates, arabica}], #[[1, 1]] > 1975 &];
{dates, robusta} = Transpose[
   Select[Transpose[{DateList /@ dates, robusta}], #[[1, 1]] > 1975 &]];

DateListPlot[{Transpose[{dates, arabica}], 
  Transpose[{dates, robusta}]}, Joined -> True]

enter image description here

Alex Isakov's Granger Causality function

getLags[data_, i_] := Drop[data, -i]

GrangerCausalityTest[dat_, lag_: 3] := Module[
   {xx, x, y, laggeddata, le, res1, res2, grangerstat},
   le = Select[dat, (And @@ (NumericQ /@ #)) &]; 
   xx = Flatten[Most /@ le]; y = Last /@ le; 
   laggeddata = Transpose[Map[Take[#, -Length[y] + lag] &,
      Join[Table[getLags[y, i], {i, lag}],
       Table[getLags[xx, i], {i, lag}], {y}]]]; 
   res1 = Total[LinearModelFit[laggeddata,
        Table[Subscript[x, i],
         {i, Length[First[laggeddata]] - 1}],
        Table[Subscript[x, i],
         {i, Length[First[laggeddata]] - 1}]]["FitResiduals"]^2]; 
   res2 = Total[LinearModelFit[laggeddata,
        Table[Subscript[x, i],
         {i, (Length[First[laggeddata]] - 1)/2}],
        Table[Subscript[x, i],
         {i, Length[First[laggeddata]] - 1}]]["FitResiduals"]^2]; 
   grangerstat = ((res2 - res1)/lag)/(res1/(Length[laggeddata] - 2*lag - 1)); 
   1 - CDF[FRatioDistribution[lag, Length[laggeddata] - 2*lag - 1],
     grangerstat]];

Checking a range of lag intervals.

a = Array[GrangerCausalityTest[Transpose[{robusta, arabica}], #] &, 10];
b = Array[GrangerCausalityTest[Transpose[{arabica, robusta}], #] &, 10];
ListLinePlot[{a, b}]

enter image description here

GrangerCausalityTest@Transpose[{arabica, robusta}]

0.127273

indicating low significance of null hypothesis.

GrangerCausalityTest@Transpose[{robusta, arabica}]

0.8631

Quoting from Dave Giles' page:

"In summary, we have reasonable evidence of Granger causality from the price of Arabica coffee to the price of Robusta coffee, but not vice versa."

Check

Using Free Statistics Software with the coffee data, plotting the results for the first ten lags.

ListLinePlot[{
  {0.4541, 0.8248, 0.0696, 0.0454, 0.061,
   0.0445, 0.0201, 0.0052, 0.0039, 0.0307},
  {0.2605, 0.3755, 0.7118, 0.4239, 0.3455,
   0.2779, 0.1233, 0.0361, 0.0055, 0.0172},
  ConstantArray[0.05, 10]}, AxesOrigin -> {0, 0},
 Epilog -> Inset["P-value 0.05", {1.5, 0.08}]]

enter image description here

The results indicate a rejection of the hypothesis that the price of Arabica coffee does not affect the price of Robusta coffee.

The shape of the graph is similar to that produced by Alex's function, but clearly there are differences.

Additional note

The results from Alex's function match exactly the results from here:

MathGroup archive: implementation of Granger causality tests in Mathematica

dated Jan 2006, by Darren Glosemeyer, Wolfram Research.

(Runs in version 7 but not 10.4, requiring legacy Statistics package.)

Further reading

  • Chickens, Eggs, and Causality, or Which Came First?
  • Investigating Causal Relations by Econometric Models and Cross-spectral Methods
  • Granger Causality: Theory and Applications (Guo, Ladroue, Feng)

I've installed the package, ?GrangerCausalityTest' gives

GrangerCausalityTest[dat, lag] returns a p-value for the null hypothesis that the first element in dat does not cause the second element in dat.

We remind the definition of the p-value (from Wikipedia):

the p-value is widely used in statistical hypothesis testing, specifically in null hypothesis significance testing. In this method, as part of experimental design, before performing the experiment, one first chooses a model (the null hypothesis) and a threshold value for p, called the significance level of the test, traditionally 5% or 1% and denoted as α. If the p-value is less than or equal to the chosen significance level (α), the test suggests that the observed data is inconsistent with the null hypothesis, so the null hypothesis must be rejected. However, that does not prove that the tested hypothesis is true. Nonetheless, it helps to clarify that p-values should not be confused with probability on hypothesis (as is done in Bayesian Hypothesis Testing) such as Pr(H|X), the probability of the hypothesis given the data, or Pr(H), the probability of the hypothesis being true, or Pr(X), the probability of observing the given data.

I tested with synthetic data the above function. Let's set α=5%. Therefore, if the output of GrangerCausalityTest[dat, lag] is <0.05, then we reject the hypothesis that the first element in dat does not cause the second element in dat. On the other hand if the output is > 0.05, then the we cannot reject the hypothesis that the first element in dat does not cause the second element in dat.

First I've generated two time-series (ts1,ts2), with unilateral causation ts1 -> ts2, i.e. the first time series causes the second one, but not viceversa.

ts1=Range[1,100]; 
ts2=Table[x + RandomReal[{-x/3, x/3}], {x, ts1}];

Applying GrangerCausality Test on these data, I found:

In[1]:= GrangerCausalityTest[Transpose[{ts2, ts1}], 1]

Out[1]= 1

In[2]:= GrangerCausalityTest[Transpose[{ts1, ts2}], 1]

Out[2]= 1.11022*10^-15

Thus, as the help function explains, the results should be interpreted as

The null hypothesis that ts2 does not cause ts1 is not rejected. Indeed, this is consistent with our generated data, as we have impose an unilateral causation ts1 -> ts2

The null hypothesis that ts1 does not cause ts2 is rejected. Indeed we know that ts1->ts2.

Therefore, with respect the coffee example above, the correct interpretation of the result is

The Granger causality test gives no evidence of causality from the price of Arabica coffee to the price of Robusta coffee, and vice versa.