Java Apache Commons getPercentile() different result that MS Excel percentile
The difference is subtle and due to assumptions. It is easiest to explain with the 3 element case. Suppose you have three elements(N=3) a=x[0] < b=x[1] < c=x[2]
. Both the Apache and the Excel method say that element b is the 50th percentile (the median). However they differ for a
and c
.
The Apache method (and the method referenced by the NIST page) say a
is the 25th percentile and c
is the 75% percentile, because it divides the space up into N+1 blocks, that is, into quarters.
The Excel method says that a
is the 0th percentile and c
the 100th percentile, as the space is divided into N-1 blocks, that is, in half.
Because of this, if you want the Excel method and you don't want to code it yourself, you could just remove the smallest and the largest element from your array, and call the Apache method - it should give you exactly the same result except at percentiles beyond the end points.
If you want to code it yourself, an easy way is given below. Be aware of these issues:
- this sorts the array (so changes it)
- this takes O(N log(N)) time due to the sorting. The Apache method uses a fast selection algorithm so takes O(N) time (google "quickselect" if you want to know more)
Code (not tested or even compiled, but should give you an idea).
// warning - modifies data
double excelPercentile(double [] data, double percentile) { array
Arrays.sort(data);
double index = percentile*(data.length-1);
int lower = (int)Math.floor(index);
if(lower<0) { // should never happen, but be defensive
return data[0];
}
if(lower>=data.length-1) { // only in 100 percentile case, but be defensive
return data[data.length-1);
}
double fraction = index-lower;
// linear interpolation
double result=data[lower] + fraction*(data[lower+1]-data[lower]);
return result;
}
The solution was creating a class PercentileExcel which is almost a copy of percentile from commons method except for a small change on how to caculate the position:
pos=(1+p*(n-1))/100;
Then you need to add this line to the code in order to use the new class for percentile:
setPercentileImpl(PercentileExcel);