Simpler way to fill date gaps with zero values

Without having read Leonid's answer (which is probably better), I recommend something like this:

fillDates[dates_] :=
 Module[{f, all},
  all = Part[DateList /@ (Range[##, 24*60^2] & @@ 
       AbsoluteTime /@ dates[[{1, -1}, 1]]), All, {1, 2, 3}];
  (f[#[[1]]] = #) & ~Scan~ dates;
  f[x_] := {x, 0};
  f /@ all
 ]

fillDates @ {{{2012, 1, 1}, 1}, {{2012, 1, 2}, 2}, {{2012, 1, 5}, 3}, {{2012, 1, 8}, 4}}
{{{2012, 1, 1}, 1}, {{2012, 1, 2}, 2}, {{2012, 1, 3}, 0},
 {{2012, 1, 4}, 0}, {{2012, 1, 5}, 3}, {{2012, 1, 6}, 0},
 {{2012, 1, 7}, 0}, {{2012, 1, 8}, 4}}

I believe the method is sound, and should be fast, but I haven't tuned it at all or even compared it with your own. I'll try to refine it later tonight or tomorrow.
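To make the idea behind fillDates easier to see, here is a minimal sketch of the lookup trick on its own. The names g and known are hypothetical, chosen only for this illustration: each known date gets its own specific definition, and a general pattern definition serves as the zero-valued fallback.

```mathematica
(* Minimal illustration of the lookup idea; g and known are hypothetical names *)
ClearAll[g];
known = {{{2012, 1, 1}, 1}, {{2012, 1, 5}, 3}};

(* one specific definition per known date *)
Scan[(g[#[[1]]] = #) &, known];

(* general fallback: any other date gets a zero value *)
g[x_] := {x, 0};

g /@ {{2012, 1, 1}, {2012, 1, 2}, {2012, 1, 5}}
(* {{{2012, 1, 1}, 1}, {{2012, 1, 2}, 0}, {{2012, 1, 5}, 3}} *)
```

The specific definitions win over the g[x_] pattern even though the pattern is defined last, because Mathematica orders definitions by specificity.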


Improved version

fillDates2[dates_] :=
  {#, Replace[#, Dispatch@Append[Rule @@@ dates, _ -> 0], {1}]}\[Transpose] & @
    Part[DateList /@ Range[##, 24*60^2] & @@ AbsoluteTime /@ dates[[{1, -1}, 1]], All, ;; 3]
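For readers who find the one-liner dense, the following sketch unrolls it into steps on a tiny, hypothetical sample (the names sample, days, rules and vals are mine and are not part of fillDates2):

```mathematica
(* Hypothetical sample data for illustration only *)
sample = {{{2012, 1, 1}, 1}, {{2012, 1, 3}, 3}};

(* Step 1: every calendar day between the first and last date, as {y, m, d} *)
days = Part[
   DateList /@ Range[AbsoluteTime[{2012, 1, 1}], AbsoluteTime[{2012, 1, 3}], 24*60^2],
   All, ;; 3]
(* {{2012, 1, 1}, {2012, 1, 2}, {2012, 1, 3}} *)

(* Step 2: one rule per known date, plus a catch-all default of 0 *)
rules = Dispatch @ Append[Rule @@@ sample, _ -> 0];

(* Step 3: replace each day (level 1 only) by its value *)
vals = Replace[days, rules, {1}]
(* {1, 0, 3} *)

(* Step 4: pair the days with their values again *)
Transpose[{days, vals}]
(* {{{2012, 1, 1}, 1}, {{2012, 1, 2}, 0}, {{2012, 1, 3}, 3}} *)
```

The level specification {1} matters: without it, Replace would try to match the whole list of days against the rules, and the catch-all _ -> 0 would turn everything into a single 0.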

Timings versus other methods posted

genDates = {#, RandomInteger[{1, 9}]} & /@ 
    Union @ Part[DateList /@
      RandomInteger[AbsoluteTime /@ {{#, 1, 1}, {2012, 12, 31}}, {#2}],
         All, ;; 3] &;

time100 = Function[, First@AbsoluteTiming@Do[#, {100}]/100, HoldFirst];

dates = genDates[2006, 1500]; (* dense data *)

fillDates2 @ dates // time100

fillDateGapsJ @ dates // time100

fillDatesJM @ dates // AbsoluteTiming // First

fillGapsRM @ dates // AbsoluteTiming // First

0.004970284

0.006330362

1.9541118

1.0810618

dates = genDates[2000, 50]; (* sparse data *)

fillDates2 @ dates // time100

fillDateGapsJ @ dates // time100

fillDatesJM @ dates // AbsoluteTiming // First

fillGapsRM @ dates // AbsoluteTiming // First

0.007540432

0.007910453

1.7681011

1.7300989


Top-level solution based on recursion

I suggest a solution based on linked lists and recursion. It will not be blazing fast, but I think it is conceptually rather simple. Here is the code:

Clear[toLinkedList];
toLinkedList[lst_] := Fold[ll[#2, #1] &, ll[], Reverse@lst]

ClearAll[fillGaps];
fillGaps[dates_] := 
    Block[{$IterationLimit = Infinity}, 
       fillGaps[ll[], toLinkedList[dates]]];

fillGaps[accum_, ll[val : {d_, _}, tail : ll[{dn_, _}, _ll]]] :=
   With[{nxt = DatePlus[d, 1]},
     fillGaps[
       If[nxt === dn, ll[accum, val], accum],
       If[nxt === dn, tail, ll[val, ll[{nxt, 0}, tail]]]
     ]];

fillGaps[accum_, ll[val_, ll[]]] := 
   Append[List @@ Flatten[accum, Infinity, ll], val];

The logic is straightforward: if the first two dates in the remaining list are consecutive, we add the first one to the linked list of accumulated results and remove it from the remaining list. If they are not, we insert an extra date with a zero value right after the first one, and repeat. For those wondering why the comparison is duplicated inside two If statements: this is needed to make the function properly tail-recursive in the Mathematica sense.
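In case the linked-list representation is unfamiliar, here is a small sketch of what the two ends of the computation look like. It assumes a, b and c are symbols without values; the variable accum is only an illustration of the accumulator's shape, not something the code above exposes.

```mathematica
(* Same helper as above, repeated so the snippet runs on its own *)
ClearAll[toLinkedList, ll];
toLinkedList[lst_] := Fold[ll[#2, #1] &, ll[], Reverse @ lst];

(* input side: a flat list becomes nested ll[head, tail] cells *)
toLinkedList[{a, b, c}]
(* ll[a, ll[b, ll[c, ll[]]]] *)

(* output side: the accumulator grows as ll[previous, new] ... *)
accum = ll[ll[ll[], a], b];

(* ... and is flattened back into an ordinary list at the end *)
List @@ Flatten[accum, Infinity, ll]
(* {a, b} *)
```

Flattening with respect to the head ll leaves the individual {date, value} pairs intact, since only ll is spliced and List is left alone.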

Here is the usage:

fillGaps[dateList]

(*
    {{{2012, 1, 1}, 1}, {{2012, 1, 2}, 2}, {{2012, 1, 3}, 0}, {{2012, 1, 4}, 0},
    {{2012, 1, 5}, 3}, {{2012, 1, 6}, 0}, {{2012, 1, 7}, 0}, {{2012, 1, 8}, 4}}
*)

My main message here is that simplicity should not necessarily be measured in lines of code. This is a look-ahead type of problem, and therefore linked lists and recursion seem a natural vehicle for solving it. OTOH, fitting it into the dominant Mathematica execution model, where lists are operated on as a whole, is of course possible, but IMO rather inelegant and indirect.

Java solution

Addressing the speed request in your edit, here is a Java solution (be sure to load the Java reloader first, following the steps described e.g. here):

JCompileLoad@"import java.util.*;

   public class DateGapFiller{  
       public List<int[]> newDates = new ArrayList<int[]>();
       public List<Double>  newVals = new ArrayList<Double>();      

       public DateGapFiller(int[][] dates, double[] values){
         if(dates.length==0) return;
         Calendar c = Calendar.getInstance();       
         c.set(dates[0][0],dates[0][1]-1,dates[0][2]);
         newVals.add(values[0]);
         newDates.add(dates[0]);        
         for(int i = 1; i<dates.length;){  // i advances only when dates[i] is reached
            c.add(Calendar.DATE,1);
            int y = c.get(Calendar.YEAR);
            int m = c.get(Calendar.MONTH)+1;
            int d = c.get(Calendar.DAY_OF_MONTH);
            int[] newDate = new int[]{y,m,d};
            double newVal = 0;
            if(dates[i][0]== y && dates[i][1] == m && dates[i][2] == d){
               newDate = dates[i];
               newVal = values[i];
               i++;
            } 
            newVals.add(newVal);
            newDates.add(newDate);          
         }          
       }
   }"

The top-level code is

ClearAll[fillDateGapsJ];
fillDateGapsJ[dates_List] :=
  Block[{newDates, newVals, toArray},
    JavaBlock[
      With[{res = JavaNew["DateGapFiller", Sequence @@ Transpose[dates]]},
        Transpose[res[#][toArray[]] & /@ {newDates, newVals}]
      ]]];

The usage is the same:

fillDateGapsJ[dateList]

The speed comparison:

dates =
    NestList[
      {DatePlus[#[[1]],RandomInteger[{1,5}]],RandomInteger[10]}&, 
      {{2000,1,1},1},
      1000
    ];

(filled1= fillDateGaps[dates]);//AbsoluteTiming
(filled2= fillDateGapsJ[dates]);//AbsoluteTiming
filled2 == filled1

(*
   {2.4482422,Null}
   {0.0751953,Null}
   True
*)

So you get roughly a 30x speedup.

Remark

As an alternative to all this, you may actually want to develop a custom data structure for "gapped dates" and a matching custom plotting routine. Which way to go depends on what you want to do with your data, of course.


TemporalData & ResamplingMethod

dateList = {{{2012, 1, 1}, 1},{{2012, 1, 2}, 2},{{2012, 1, 5}, 3},{{2012, 1, 8}, 4}};

td = TemporalData[#2, {#}, ResamplingMethod -> {"Constant", 0}]& @@ Transpose[dateList];

daterange = AbsoluteTime /@ DateRange[dateList[[1, 1]], dateList[[-1, 1]], {1, "Day"}];

DateListPlot[td["PathFunction"] /@ daterange, dateList[[1,1]], 
 Epilog -> {PointSize[Large], Red, Point @ dateList},
 FrameTicks -> {{Automatic, Automatic}, {dateList[[All,1]], Automatic}}]


You can also create a new TemporalData and plot it:

td2 = TemporalData[td["PathFunction"]/@daterange, {daterange}];
DateListPlot[td2, Epilog -> {PointSize[Large], Red, Point @ dateList},
 FrameTicks -> {{Automatic, Automatic}, {dateList[[All,1]], Automatic}}]

This gives the same picture.
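If you want the filled list itself rather than a plot, something along these lines should work. This is only a sketch: it reuses the definitions from above, the name filled is mine, and it assumes the path function returns the constant 0 at days without a sample, as the plot suggests.

```mathematica
dateList = {{{2012, 1, 1}, 1}, {{2012, 1, 2}, 2}, {{2012, 1, 5}, 3}, {{2012, 1, 8}, 4}};

(* same construction as above *)
td = TemporalData[#2, {#}, ResamplingMethod -> {"Constant", 0}] & @@ Transpose[dateList];
daterange = AbsoluteTime /@ DateRange[dateList[[1, 1]], dateList[[-1, 1]], {1, "Day"}];

(* pair each day, converted back to {y, m, d}, with the resampled value *)
filled = {DateList[#][[;; 3]], td["PathFunction"][#]} & /@ daterange
```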