Predicting when a SE user will reach a given rep

For what follows, rep contains the reputation observations over the given sample (1.460 days).

Fitting a linear trend on the data eg lmf=LiearModelFit[rep,t,t] produces the following fitted function (evaluating lmf["Function"][t] produces):

250144. + 258.184 t

t here are 'days' but that is not made explicit in the estimation; plugging in t=1460 returns 627092.; that is the estimated reputation for the last point in the sample (the actual value is 622032).

Also, according to this linear model, the reputation target trep=654321 will be obtained on day 1566 or 106 days after the end of our sample. We obtain the point estimate of a reputation value that is equal to or greater to the target figure using the follow snippet of code:

Reap[
  Scan[
    If[lmf[#] >= trep, Sow[#]; Return[]] &, 1460 + Range[365]]]  

According to this, the target figure will be obtained on

DatePlus[{2018, 1, 5}, {106, "Day"}]
{2018, 4, 21}

or as of late April, 2018.

Now, all the above are great and all, with a minor glitch; take a look at the residuals from the estimated model:

enter image description here

This is what a major autocorrelation infestation looks like. The presence of autocorrelation in the residuals can be interpreted in different ways; for the present case we'll go with model mis-specification ie there is a systematic part in the data that (systematically?) eludes our (simple, linear trend) model.

The linear model has persistent positive residual autocorrelation.

enter image description here

In order to find a better specified model, we'll turn to Mathematica's TimeSeriesModelFit. For the rest of this segment, tsrep stands for

d1 = {2018, 1, 5};
delta = 1460;
d0 = DatePlus[d1, {-delta + 1, "Day"}]

tsrep = TimeSeries[rep, {d0, Automatic, "Day"}, 
  MetaInformation -> {"User" -> "TeX.SE@egreg", "Units" -> "Reputation"}]

the TimeSeries representation of rep.

This is what the reputation data look like

xlbl = Row[{delta, " ", "days of data"}];
tspec = {"Year", "/", "Month", "/","Day"};
from = DateString[tsrep["FirstTime"], tspec];
to = DateString[tsrep["LastTime"], tspec];
txt = "data provided by user Mma.SE@anderstood";

DateListPlot[tsrep , 
  Filling -> Axis, 
  FrameLabel -> {
    {tsrep["Units"], Automatic}, 
    {xlbl, Row[{from, "-", to}]}
   }, 
  PlotLegends -> tsrep["User"], 
  Epilog -> {Text[Style[txt, Darker@Red, Bold, Italic], Scaled[{0.5, 0.1}]]}]

enter image description here

The data display a definite trend-which we'll assume is deterministic; this means that the reputation series as it stands is a non-stationary series. This observation is confirmed by the outcome of performing appropriate unit-root tests:

headers = Style[#, Bold] & /@ {"Lags", "Test", "Test statistic", 
  "P-value", "Test conclusion"};
Prepend[Prepend[
  UnitRootTest[rep, 
    {"Drift", #}, 
    {"AutomaticTest", "TestStatistic", "PValue", "ShortTestConclusion"}
   ], #] & /@ {1, 2, 7, 15, 30}, headers] // Grid

enter image description here

The null of the presence of a unit root cannot be rejected for various specifications of the test.

We are going to split our data into two parts: the first one we'll use for estimation purposes and the later part will be set aside for testing purposes. We choose to split the data at {2017, 10, 8} (last 90 days of data).

delta = 90;
d2 = DatePlus[d1, {-delta + 1, "Day"}]
excludeLast90dDays = TimeSeriesWindow[tsrep, {Automatic, d2}];
last90Days = TimeSeriesWindow[tsrep, {d2, Automatic}];

We let Mathematica select an appropriate model for the first part of our data

mx90 = TimeSeriesModelFit[excludeLast90dDays];

The selected model is an ARIMAProcess[-0.034365, {-0.0386807, -0.0263645, 0.0195861}, 2, {-0.964332}, 1528.1]. The residuals obtained from this model as well as their scatter plots at various lags are reported below.

enter image description here

enter image description here

The ACF's, PACF's and Ljung-Box don't seem terribly bad also.

enter image description here

This time, the model predicts that trep will be obtained

Reap[Scan[
 If[mx90[#] >= trep, Sow[#]; Return[]] &, 
  DateRange[d2, DatePlus[d1, {365, "Day"}], "Day"]]
]

on

{2018, 5, 25}.

The predicted date along with appropriate 95% bounds according to the ARIMA model is presented below

enter image description here


code

(* used to obtain lags of residuals *)
pair[list_, lag_] := {Drop[list, -lag], Drop[list, lag]}

Clear[u, t]
(* used to scatter-plot residuals *)
plot[list_, lag_] := ListPlot[
  pair[list, lag] // Transpose, 
  Frame -> True, 
  FrameLabel -> {
    {Subscript[u, t], Automatic}, 
    {Subscript[u, t - lag],Row[{"lag = ", lag, " ", "day(s)"}]}
   }, 
  Axes -> False, 
  ImageSize -> Small, 
  RotateLabel -> False]

Data in the format {date, reputation}...:

timeddata = Transpose[{Range[Length[data]], data}];

A quadratic fit:

mymodel = Fit[timeddata, {1, x, x^2}, x]

(* 245614. + 276.773 x - 0.0127237 x^2 *)

Solve for criterion:

Solve[245614.0277534589` + 276.7728073237194` x - 
   0.012723685322656041` x^2 == 654321, x]

(* {{x -> 1593.41}, {x -> 20159.2}} *)

The first solution is the correct one, of course: 1593 days from the beginning of the data collection.

A graph:

Show[
 ListPlot[hhh, PlotStyle -> Red, 
  PlotRange -> {{0, 2000}, {0, 700000}}],
 Plot[{654321, 
   245614.0277534589` + 276.7728073237194` x - 
    0.012723685322656041` x^2}, {x, 1, 2000}],
 Epilog -> {Dashed, Line[{{1593, 0}, {1593, 654321}}]}]

enter image description here

You can add cubic or quartic terms, but it doesn't change the predicted date enough to matter--the cubit term in the fit is less than $10^{-5}$.