How do I optimally distribute values over an array of percentages?
Round each value as you assign it, using a rounding scheme that is known to spread the rounding error evenly. The last value is then assigned differently, so that the values add up exactly to the desired total.
Let's start slowly, or things get confusing. First, let's see how to assign the last value so that the total matches the desired value.
// we will need this later on
sum = 0;
// assign all values but the last
for (i = 0; i < output.length - 1; i++)
{
output[i] = input[i] * total;
sum += output[i];
}
// last value must honor the total constraint
output[i] = total - sum;
That last line needs some explanation. When the for(..) loop ends, i is one more than the last value it had inside the loop, so it is:
output.length - 1 // last index
The value we assign is chosen so that the sum of all elements equals total. We already computed sum in a single pass while assigning the values, so we don't need to iterate over the elements a second time to determine it.
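As a minimal runnable sketch of this step (the function name `distribute` and the sample numbers are assumptions for illustration, and a non-empty input is assumed):

```javascript
// Hypothetical helper: scales each share by total and lets the last
// element absorb whatever is left, so the sum is exactly total.
function distribute(input, total) {
  const output = new Array(input.length);
  let sum = 0;
  let i;
  for (i = 0; i < output.length - 1; i++) {
    output[i] = input[i] * total;
    sum += output[i];
  }
  // after the loop, i === output.length - 1 (the last index)
  output[i] = total - sum;
  return output;
}

const shares = distribute([0.5, 0.25, 0.25], 10);
console.log(shares); // [5, 2.5, 2.5]
```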
Next, we will approach the rounding problem. Let's simplify the above code so that it uses a function on which we will elaborate shortly after:
sum = 0;
for (i = 0; i < output.length - 1; i++)
{
output[i] = u(input[i], total);
sum += output[i];
}
output[i] = total - sum;
As you can see, nothing has changed except the introduction of the u() function. Let's concentrate on this now.
There are several ways to implement u().
DEFINITION
u(c, total) ::= c * total
With this definition you get the same result as above. It is exact, but as you asked, you want the values to be natural numbers (i.e. non-negative integers). So while this is already perfect for real numbers, for natural numbers we have to round. Suppose we use the simple rounding rule for integers:
[ 0.0, 0.5 [ => round down
[ 0.5, 1.0 [ => round up
This is achieved with:
function u(c, total)
{
return Math.round(c * total);
}
If you are unlucky, so many values get rounded up (or down) that the errors pile up, and the correction applied to the last value has to absorb all of them at once: the last value, and with it the whole distribution, will seem to be off by too much. This is a well-known problem. A multi-dimensional solution to it exists for drawing lines in 2D and 3D space: the Bresenham algorithm.
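To make the problem concrete, here is a small sketch with made-up data (eight equal shares of a total of 12, chosen only for illustration) where plain Math.round() pushes every value in the same direction:

```javascript
// Sample data (assumed for illustration): eight equal shares of 12.
const input = new Array(8).fill(0.125);
const total = 12;

const output = [];
let sum = 0;
for (let i = 0; i < input.length - 1; i++) {
  // 0.125 * 12 = 1.5, and Math.round(1.5) is 2: every value is rounded up
  const value = Math.round(input[i] * total);
  output.push(value);
  sum += value;
}
// the last value has to absorb all seven rounding errors at once
output.push(total - sum);

console.log(output); // [2, 2, 2, 2, 2, 2, 2, -2]
```

The total is still honored, but only because the last element ends up at -2 instead of something close to 1.5.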
To make things easy I'll show you here how to implement it in 1 dimension (which is your case).
Let's first discuss a term: the remainder. This is what is left after you have rounded your numbers. It is computed as the difference between what you wish and what you really have:
DEFINITION
WISH ::= c * total
HAVE ::= Math.round(WISH)
REMAINDER ::= WISH - HAVE
Now think about it. The remainder is like the leftover paper you discard when you cut a shape out of a sheet. That leftover paper is still there, but you throw it away. Instead, just add it to the next cut-out so it is not wasted:
WISH ::= c * total + REMAINDER_FROM_PREVIOUS_STEP
HAVE ::= Math.round(WISH)
REMAINDER ::= WISH - HAVE
This way you keep the error and carry it over to the next partition in your computation. This is called amortizing the error.
Here is an amortized implementation of u():
// amortized is defined outside u because we need a side effect that persists across calls of u
var amortized = 0;
function u(c, total)
{
var real, natural;
real = c * total + amortized;
natural = Math.round(real);
amortized = real - natural;
return natural;
}
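If a shared outer variable feels fragile, one possible variation is to keep the carried remainder inside a closure. This is only a sketch: `makeAmortizedRounder` is a hypothetical name, and the sample shares are made up.

```javascript
// Hypothetical factory: each rounder keeps its own carried remainder.
function makeAmortizedRounder() {
  let amortized = 0; // remainder carried over from the previous call
  return function u(c, total) {
    const real = c * total + amortized;
    const natural = Math.round(real);
    amortized = real - natural;
    return natural;
  };
}

const u = makeAmortizedRounder();
const values = [0.125, 0.125, 0.125, 0.125].map((c) => u(c, 12));
console.log(values); // [2, 1, 2, 1]
```

Each wish is 1.5: the first call rounds up to 2 and carries -0.5, so the second call sees 1.0 and returns 1, and so on.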
You may of course prefer another rounding rule, such as Math.floor() or Math.ceil().
I would advise using Math.floor(), because it is guaranteed to be consistent with the total constraint: with non-negative inputs the carried remainder stays in [0, 1), so the running sum never exceeds what it should be and the last value cannot become negative. With Math.round() you get smoother amortization, but you risk a negative last value. You might end up with something like this:
[ 1, 0, 0, 1, 1, 0, -1 ]
Only when all values are far away from 0 can you be confident that the last value will also be positive. So, for the general case, the Bresenham algorithm would use flooring, resulting in this final implementation:
function u(c, total)
{
var real, natural;
real = c * total + amortized;
natural = Math.floor(real); // just to be on the safe side
amortized = real - natural;
return natural;
}
sum = 0;
amortized = 0;
for (i = 0; i < output.length - 1; i++)
{
output[i] = u(input[i], total);
sum += output[i];
}
output[i] = total - sum;
Obviously, the input and output arrays must have the same size, and the values in input must form a partition (sum up to 1).
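Putting the pieces together, a self-contained sketch of the flooring version could look like this. The wrapper name `allocate` and the sample data are assumptions for illustration; the carried remainder lives in a local variable rather than a global.

```javascript
// Hypothetical wrapper around the flooring version.
function allocate(input, total) {
  const output = new Array(input.length);
  let amortized = 0; // remainder carried from the previous element
  let sum = 0;
  let i;
  for (i = 0; i < output.length - 1; i++) {
    const real = input[i] * total + amortized;
    const natural = Math.floor(real);
    amortized = real - natural; // always in [0, 1) with flooring
    output[i] = natural;
    sum += natural;
  }
  output[i] = total - sum; // last value honors the total constraint
  return output;
}

const parts = allocate(new Array(8).fill(0.125), 12);
console.log(parts); // [1, 2, 1, 2, 1, 2, 1, 2]
```

Every wish here is 1.5, and the carried remainder makes the result alternate between 1 and 2 instead of drifting in one direction.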
This kind of algorithm is very common in probabilistic and statistical computations.
Alternate implementation: it remembers the index of the value with the largest rounding error and, when the sum differs from 100 by one, increments or decrements the value at that position.
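const items = [1, 2, 3, 5];
const total = items.reduce((acc, x) => acc + x, 0);
let result = [], sum = 0;
// index of the value rounded down the most, and of the value rounded up the most
let maxDiff = -Infinity, minDiff = Infinity, downPointer = 0, upPointer = 0;
items.forEach((votes, index) => {
let value = 100 * votes / total;
let rounded = Math.round(value);
let diff = value - rounded; // > 0: rounded down, < 0: rounded up
if (diff > maxDiff) {
maxDiff = diff;
downPointer = index;
}
if (diff < minDiff) {
minDiff = diff;
upPointer = index;
}
sum += rounded;
result.push(rounded);
});
if (sum === 99) {
result[downPointer] += 1; // give the missing point to the value that lost the most
} else if (sum === 101) {
result[upPointer] -= 1; // take the extra point from the value that gained the most
}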
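With many items the rounded sum can be off by more than one, which the check for 99 and 101 does not cover. A related technique, usually called the largest remainder method, handles that case. The sketch below is not the code above but a swapped-in variant: `toPercentages` is a hypothetical name, and it assumes non-negative items with a positive total.

```javascript
// Hypothetical helper: floors every percentage, then hands the missing
// points to the entries with the largest fractional parts.
function toPercentages(items) {
  const total = items.reduce((acc, x) => acc + x, 0);
  const exact = items.map((x) => (100 * x) / total);
  const result = exact.map((value) => Math.floor(value));
  let missing = 100 - result.reduce((acc, x) => acc + x, 0);
  // indices sorted by descending fractional part
  const order = exact
    .map((value, index) => ({ index, frac: value - Math.floor(value) }))
    .sort((a, b) => b.frac - a.frac);
  for (let k = 0; k < order.length && missing > 0; k++, missing--) {
    result[order[k].index] += 1;
  }
  return result;
}

console.log(toPercentages([1, 2, 3, 5])); // [9, 18, 27, 46]
```

Because every value is floored first, the sum can only fall short of 100, so the correction only ever adds points.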