Simple Popularity Algorithm

I think this is a very good approach, given its simplicity. A very interesting result.

I made a quick set of calculations and found that this algorithm does seem to understand what "popularity" means. Its problem is that it has a clear tendency to favor recent votes like this:

Imagine we take the time and break it into discrete timestamp values ranging from 100 to 1000. Assume that at t=100 both A and B (two items) have the same P = 100.

    A gets voted 7 times on 200, 300, 400, 500, 600, 700 and 800
resulting on a final Pa(800) = 700 (aprox).

    B gets voted 4 times on 300, 500, 700 and 900 
resulting on a final Pb(900) = 712 (aprox).

When t=1000 comes, both A and B receive votes, so:

Pa(1000) = 850 with 8 votes
Pb(1000) = 856 with 5 votes

Why? because the algorithm allows an item to quickly beat historical leaders if it receives more recent votes (even if the item has fewer votes in total).

EDIT INCLUDING SIMULATION

The OP created a nice fiddle that I changed to get the following results:

http://jsfiddle.net/wBV2c/6/

Item A receives one vote each day from 1970 till 2012 (15339 votes)
Item B receives one vote each month from Jan to Jul 2012 (7 votes)

The result: B is more popular than A.


The proposed algorithm is a good approach, and is a special case of an Exponential Moving Average where alpha=0.5:

p = alpha*p + (1-alpha)*t = 0.5*p + 0.5*t = (p+t)/2    //(for alpha = 0.5)

A way to tweak the fact that the proposed solution for alpha=0.5 tends to favor recent votes (as noted by daniloquio) is to choose higher values for alpha (e.g. 0.9 or 0.99). Note that applying this to the testcase proposed by daniloquio is not working however, because when alpha increases the algorithm needs more 'time' to settle (so the arrays should be longer, which is often true in real applications).

Thus:

  • for alpha=0.9 the algorithm averages approximately the last 10 values
  • for alpha=0.99 the algorithm averages approximately the last 100 values
  • for alpha=0.999 the algorithm averages approximately the last 1000 values
  • etc.

I see one problem, only the last ~24 votes count.

p_i+1 = (p + t) / 2

For two votes we have

p2 = (p1 + t2) / 2 = ((p0 + t1) /2 + t2 ) / 2 = p0/4 + t1/4 + t2/2

Expanding that for 32 votes gives:

p32 = t*2^-32 + t0*2^-32 + t1*2^-31 + t2*2^-30 + ... + t31*2^-1

So for signed 32 bit values, t0 has no effect on the result. Because t0 gets divided by 2^32, it will contribute nothing to p32.

If we have two items A and B (no matter how big the differences are) if they both get the same 32 votes, they will have the same popularity. So you're history goes back for only 32 votes. There is no difference in 2032 and 32 votes, if the last 32 votes are the same.

If the difference is less than a day, they will be equal after 17 votes.