Smart progress bar ETA computation

Here's what I've found works well! For the first 50% of the task, you assume the rate is constant and extrapolate. The time prediction is very stable and doesn't bounce much.

Once you pass 50%, you switch computation strategy. You take the fraction of the job left to do (1-p), then look back in time in a history of your own progress, and find (by binary search and linear interpolation) how long it's taken you to do the last (1-p) percentage and use that as your time estimate completion.

So if you're now 71% done, you have 29% remaining. You look back in your history and find how long ago you were at (71-29=42%) completion. Report that time as your ETA.

This is naturally adaptive. If you have X amount of work to do, it looks only at the time it took to do the X amount of work. At the end when you're at 99% done, it's using only very fresh, very recent data for the estimate.

It's not perfect of course but it smoothly changes and is especially accurate at the very end when it's most useful.


Whilst all the examples are valid, for the specific case of 'time left to download', I thought it would be a good idea to look at existing open source projects to see what they do.

From what I can see, Mozilla Firefox is the best at estimating the time remaining.

Mozilla Firefox

Firefox keeps a track of the last estimate for time remaining, and by using this and the current estimate for time remaining, it performs a smoothing function on the time. See the ETA code here. This uses a 'speed' which is previously caculated here and is a smoothed average of the last 10 readings.

This is a little complex, so to paraphrase:

  • Take a smoothed average of the speed based 90% on the previous speed and 10% on the new speed.
  • With this smoothed average speed work out the estimated time remaining.
  • Use this estimated time remaining, and the previous estimated time remaining to created a new estimated time remaining (in order to avoid jumping)

Google Chrome

Chrome seems to jump about all over the place, and the code shows this.

One thing I do like with Chrome though is how they format time remaining. For > 1 hour it says '1 hrs left' For < 1 hour it says '59 mins left' For < 1 minute it says '52 secs left'

You can see how it's formatted here

DownThemAll! Manager

It doesn't use anything clever, meaning the ETA jumps about all over the place.

See the code here

pySmartDL (a python downloader)

Takes the average ETA of the last 30 ETA calculations. Sounds like a reasonable way to do it.

See the code here/blob/916f2592db326241a2bf4d8f2e0719c58b71e385/pySmartDL/pySmartDL.py#L651)

Transmission

Gives a pretty good ETA in most cases (except when starting off, as might be expected).

Uses a smoothing factor over the past 5 readings, similar to Firefox but not quite as complex. Fundamentally similar to Gooli's answer.

See the code here


Original Answer

The company that created this site apparently makes a scheduling system that answers this question in the context of employees writing code. The way it works is with Monte Carlo simulation of future based on the past.

Appendix: Explanation of Monte Carlo

This is how this algorithm would work in your situation:

You model your task as a sequence of microtasks, say 1000 of them. Suppose an hour later you completed 100 of them. Now you run the simulation for the remaining 900 steps by randomly selecting 90 completed microtasks, adding their times and multiplying by 10. Here you have an estimate; repeat N times and you have N estimates for the time remaining. Note the average between these estimates will be about 9 hours -- no surprises here. But by presenting the resulting distribution to the user you'll honestly communicate to him the odds, e.g. 'with the probability 90% this will take another 3-15 hours'

This algorithm, by definition, produces complete result if the task in question can be modeled as a bunch of independent, random microtasks. You can gain a better answer only if you know how the task deviates from this model: for example, installers typically have a download/unpacking/installing tasklist and the speed for one cannot predict the other.

Appendix: Simplifying Monte Carlo

I'm not a statistics guru, but I think if you look closer into the simulation in this method, it will always return a normal distribution as a sum of large number of independent random variables. Therefore, you don't need to perform it at all. In fact, you don't even need to store all the completed times, since you'll only need their sum and sum of their squares.

In maybe not very standard notation,

sigma = sqrt ( sum_of_times_squared-sum_of_times^2 )
scaling = 900/100          // that is (totalSteps - elapsedSteps) / elapsedSteps
lowerBound = sum_of_times*scaling - 3*sigma*sqrt(scaling)
upperBound = sum_of_times*scaling + 3*sigma*sqrt(scaling)

With this, you can output the message saying that the thing will end between [lowerBound, upperBound] from now with some fixed probability (should be about 95%, but I probably missed some constant factor).