Calculate 95th percentile in Ruby?
If you are interested in an existing gem, the descriptive_statistics gem is the best I have found so far for a percentile function.
IRB Session
> require 'descriptive_statistics'
=> true
irb(main):009:0> data = [1, 2, 3, 4]
=> [1, 2, 3, 4]
irb(main):010:0> data.percentile(95)
=> 3.8499999999999996
irb(main):011:0> data.percentile(95).round(2)
=> 3.85
The nice part of the gem is its elegant way of expressing "I want the 95th percentile of data".
If you want to replicate Excel's PERCENTILE function then try the following:
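# percentile is a fraction between 0.0 and 1.0 (0.95 for the 95th percentile)
def percentile(values, percentile)
  values_sorted = values.sort
  rank = percentile * (values_sorted.length - 1) + 1
  k = rank.floor - 1
  f = rank.modulo(1)
  # Clamp the upper index so percentile = 1.0 does not read past the end
  upper = [k + 1, values_sorted.length - 1].min
  values_sorted[k] + (f * (values_sorted[upper] - values_sorted[k]))
end
values = [1, 2, 3, 4]
p = 0.95
# Round for display; the raw float is 3.8499999999999996
puts percentile(values, p).round(2)
#=> 3.85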
The formula is based on Excel's QUARTILE function, whose quartiles are really just specific percentiles - https://support.microsoft.com/en-us/office/quartile-inc-function-1bbacc80-5075-42f1-aed6-47d735c4819d.
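To illustrate the point that quartiles are just the 25th, 50th and 75th percentiles, here is a minimal sketch. The helper name percentile_inc is hypothetical (it is not part of Excel or of any gem), and it assumes a non-empty array and a fraction between 0.0 and 1.0. It uses the same linear interpolation between neighbouring sorted values as the function above.

```ruby
# Hypothetical helper: inclusive percentile with linear interpolation.
# Assumes values is non-empty and fraction is between 0.0 and 1.0.
def percentile_inc(values, fraction)
  sorted = values.sort
  rank = fraction * (sorted.length - 1)         # fractional 0-based position
  lower = rank.floor
  upper = [lower + 1, sorted.length - 1].min    # clamp at the last element
  sorted[lower] + (rank - lower) * (sorted[upper] - sorted[lower])
end

data = [1, 2, 3, 4]
# The three quartiles are the 25th, 50th and 75th percentiles.
quartiles = [0.25, 0.5, 0.75].map { |q| percentile_inc(data, q) }
p quartiles #=> [1.75, 2.5, 3.25]
```

The middle value, 2.5, is the median of the data, which is a quick sanity check on the interpolation.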
Percentile based on count of items
a = [1,2,3,4,5,6,10,11,12,13,14,15,20,30,40,50,60,61,91,99,120]
def percentile_by_count(array,percentile)
  count = (array.length * (1.0-percentile)).floor
  # last(count) returns [] when count is 0; [-count..-1] would return the whole array
  array.sort.last(count)
end
# 80th percentile (21 items*80% == 16.8 items are below; pick the top 4)
p percentile_by_count(a,0.8) #=> [61, 91, 99, 120]
Percentile based on range of values
def percentile_by_value(array,percentile)
  min, max = array.minmax
  range = max - min
  min_value = range*percentile + min
  array.select{ |v| v >= min_value }
end
# 80th percentile (119 * 80% + 1 = 96.2; pick values at or above this)
p percentile_by_value(a,0.8) #=> [99, 120]
Interestingly, Excel's PERCENTILE function returns 60 as the first value for the 80th percentile. If you want this result (that is, if you want an item falling on the cusp of the limit to be included), change the .floor in percentile_by_count above to .ceil.
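As a sketch of that change, the variant below rounds the count up instead of down. The method name percentile_by_count_inclusive is hypothetical, chosen here only to keep it apart from the original.

```ruby
# Hypothetical variant of percentile_by_count: .ceil instead of .floor,
# so an item sitting on the cusp of the limit is included.
def percentile_by_count_inclusive(array, percentile)
  count = (array.length * (1.0 - percentile)).ceil
  array.sort.last(count)
end

a = [1,2,3,4,5,6,10,11,12,13,14,15,20,30,40,50,60,61,91,99,120]
# 21 items * 20% is about 4.2, rounded up to 5 items
p percentile_by_count_inclusive(a, 0.8) #=> [60, 61, 91, 99, 120]
```

Because 1.0 - percentile is a floating-point subtraction, the product can land just above or below a whole number, so the count may shift by one for some inputs.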