Percentage of array between values
Basic Numpy and Pandas solutions
There's no completely prepackaged method in Numpy, but there are plenty of one-liners. Here's how to do it using comparison and logical operators (hat tip to Paul Panzer for suggesting np.count_nonzero):
import numpy as np
arr = np.linspace(-15,15,1000)
np.count_nonzero((arr > -10) & (arr < 10))/arr.size
Output:
0.666
If you're willing to use Pandas, the pandas.Series.between method gets you a little closer to the complete package you want:
import pandas as pd
sr = pd.Series(np.linspace(-15,15,1000))
np.count_nonzero(sr.between(-10,10))/sr.size
Output:
0.666
Pitfalls
Every interval analysis method involves an explicit or implicit definition of the interval that you're considering. Is the interval closed (i.e. inclusive of the extreme values) on both ends, like [-10, 10]? Or is it half-open (i.e. excluding the extreme value on one end), like [-10, 10)? And so forth.
This tends not to be an issue when dealing with arrays of float values taken from data (since it's unlikely any of the data falls exactly on the extremes), but can cause serious problems when working with arrays of int. For example, the two methods I listed above can give different results if the array includes the boundary values of the interval:
arr = np.arange(-15,16)
print(np.count_nonzero((arr > -10) & (arr < 10))/arr.size)
print(np.count_nonzero(pd.Series(arr).between(-10,10))/arr.size)
Output:
0.6129032258064516
0.6774193548387096
The pd.Series.between method defaults to a closed interval on both ends, so to match it in Numpy you'd have to use the inclusive comparison operators:
arr = np.arange(-15,16)
print(np.count_nonzero((arr >= -10) & (arr <= 10))/arr.size)
print(np.count_nonzero(pd.Series(arr).between(-10,10))/arr.size)
Output:
0.6774193548387096
0.6774193548387096
All of this to say: when you pick a method for this kind of interval analysis, be aware of its boundary conventions, and use a consistent convention across all your related analyses.
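One way to keep the convention explicit is to wrap the comparison in a small helper that takes the boundary rule as an argument. This is only a sketch: fraction_between and its inclusive argument are hypothetical names I made up for illustration (the option values are modeled on the ones pd.Series.between accepts in recent Pandas versions), not something Numpy provides.

```python
import numpy as np

def fraction_between(arr, low, high, inclusive="both"):
    """Fraction of elements of arr lying between low and high.

    Hypothetical helper. inclusive is one of "both", "neither",
    "left" or "right" and says which endpoints count as inside.
    """
    arr = np.asarray(arr)
    if inclusive not in ("both", "neither", "left", "right"):
        raise ValueError(f"unknown inclusive option: {inclusive!r}")
    # Pick >= or > for the lower bound, <= or < for the upper bound
    lower = arr >= low if inclusive in ("both", "left") else arr > low
    upper = arr <= high if inclusive in ("both", "right") else arr < high
    return np.count_nonzero(lower & upper) / arr.size

arr = np.arange(-15, 16)
print(fraction_between(arr, -10, 10, inclusive="both"))     # closed [-10, 10]: 21/31
print(fraction_between(arr, -10, 10, inclusive="neither"))  # open (-10, 10): 19/31
print(fraction_between(arr, -10, 10, inclusive="left"))     # half-open [-10, 10): 20/31
```

Passing the convention by name at every call site makes it harder to mix closed and open intervals by accident.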
Other solutions
If you assume the data is sorted (or if you sort it yourself), you can use np.searchsorted:
arr = np.random.uniform(-15,15,100)
arr.sort()
np.diff(arr.searchsorted([-10, 10]))[0]/arr.size
Output (varies from run to run, since the data is random):
0.65
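The boundary convention matters here too. With the default side='left' used for both lookups, the expression above counts the half-open interval [-10, 10). Here's a sketch of how the side argument of np.searchsorted can be used to count the closed interval instead, using integer data so that the boundary values are actually present:

```python
import numpy as np

arr = np.arange(-15, 16)  # already sorted; contains both -10 and 10

# side="left": index of the first element >= -10
lo = arr.searchsorted(-10, side="left")
# side="right": index just past the last element <= 10
hi = arr.searchsorted(10, side="right")
closed = (hi - lo) / arr.size  # counts [-10, 10]

# Default side="left" on both ends counts [-10, 10)
half_open = np.diff(arr.searchsorted([-10, 10]))[0] / arr.size

print(closed)     # 21/31
print(half_open)  # 20/31
```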
A simple solution is to use np.histogram:
import numpy as np
X = np.arange(20)
values = [5, 13] # these are your a and b
freq = np.histogram(X, bins=[-np.inf] + values + [np.inf])[0]/X.size
print(freq)
Output:
[0.25 0.4  0.35]
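Mind the boundary convention here as well: np.histogram treats every bin as half-open, [left, right), except the last one, which is closed on both ends. The middle entry above is therefore the fraction of values in [5, 13). A quick sanity check against the explicit comparison (a sketch, with a and b standing in for your own bounds):

```python
import numpy as np

X = np.arange(20)
a, b = 5, 13  # example bounds

# Middle bin of the histogram: fraction of values in [a, b)
freq = np.histogram(X, bins=[-np.inf, a, b, np.inf])[0] / X.size

# The same interval written as an explicit half-open comparison
direct = np.count_nonzero((X >= a) & (X < b)) / X.size

print(freq[1], direct)  # both equal 8/20
```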