Map list to bin numbers

This is very quick-and-dirty, but it may serve as a simple example.

This creates a piecewise function following the first definition in MATLAB's discretize documentation (every bin includes its left edge, and the last bin also includes its right edge), then applies that function to the data.

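disc[data_, edges_] := Module[{e = Partition[edges, 2, 1], p, l, x},
   l = Length@e;
   (* bins 1 .. l-1 are half-open [lo, hi); the last bin is closed [lo, hi] *)
   p = Piecewise[
      Append[
         Table[{i, e[[i, 1]] <= x < e[[i, 2]]}, {i, l - 1}],
         {l, e[[l, 1]] <= x <= e[[l, 2]]}],
      "NaN"];   (* default for values that fall outside every bin *)
   Table[p, {x, data}]];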
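
To make the construction a little more concrete, here is a small sketch of the intermediate step, using the same edges as the example that follows. It only relies on Partition, which pairs up consecutive edges into bin intervals:

```mathematica
(* consecutive pairs of edges become the lower and upper limits of each bin *)
Partition[{2, 4, 6, 8, 10}, 2, 1]
(* {{2, 4}, {4, 6}, {6, 8}, {8, 10}} *)
```

Four intervals give four bins, which disc numbers 1 through 4 in this order.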

Using the first example from the documentation referenced above:

data={1, 1, 2, 3, 6, 5, 8, 10, 4, 4};
edges={2, 4, 6, 8, 10};

disc[data,edges]

{NaN,NaN,1,1,3,2,4,4,2,2}

I'm sure there are more efficient/elegant solutions, and will revisit as time permits.


Here's a version based on Nearest:

digitize[edges_] := DigitizeFunction[edges, Nearest[edges -> "Index"]]
digitize[data_, edges_] := digitize[edges][data]

DigitizeFunction[edges_, nf_NearestFunction][data_] := With[{init = nf[data][[All, 1]]},
    init + UnitStep[data - edges[[init]]] - 1
]

For example:

SeedRandom[1]
data = RandomReal[10, 10]
digitize[data, {2, 4, 5, 7, 8}]

{8.17389, 1.1142, 7.89526, 1.87803, 2.41361, 0.657388, 5.42247, 2.31155, 3.96006, 7.00474}

{5, 0, 4, 0, 1, 0, 3, 1, 1, 4}

Note that I broke up the definition of digitize into two pieces, so that if you do this for multiple data sets with the same edges list, you only need to compute the nearest function once.
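
As a rough sketch of that reuse, the snippet below builds the digitizer once and applies it to two small data sets. The name df and both data lists are made up for illustration; the definitions are repeated only so the snippet runs on its own.

```mathematica
(* definitions repeated from above so the snippet is self-contained *)
digitize[edges_] := DigitizeFunction[edges, Nearest[edges -> "Index"]]
DigitizeFunction[edges_, nf_NearestFunction][data_] :=
  With[{init = nf[data][[All, 1]]},
    init + UnitStep[data - edges[[init]]] - 1]

(* build the digitizer once; the NearestFunction is stored inside it *)
df = digitize[{2, 4, 5, 7, 8}];

(* reuse it on two hypothetical data sets *)
df[{1.5, 4.2, 9.}]   (* {0, 2, 5} *)
df[{0.3, 6.1}]       (* {0, 3} *)
```

Values below the first edge map to 0 and values at or above the last edge map to the number of edges, as in the example above.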


You may also use Interpolation with InterpolationOrder -> 0. However, employing Nearest as Carl Woll did will usually be much faster.

First, we prepare the interpolating function.

m = 20;
binboundaries = Join[{-1.}, Sort[RandomReal[{-1, 1}, m - 1]], {1.}];

f = Interpolation[Transpose[{binboundaries, Range[0, m]}], InterpolationOrder -> 0];

Now you can apply it to lists of values as follows:

vals = RandomReal[{-1, 1}, 1000];   
Round[f[vals]]
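
Since the random boundaries make the result hard to check by eye, a small deterministic variant may help. This is only a sketch: the boundaries b and the name g are hand-picked for illustration and are not part of the setup above.

```mathematica
(* hand-picked boundaries, assumed for illustration: four bins on [-1, 1] *)
b = {-1., -0.5, 0., 0.5, 1.};
g = Interpolation[Transpose[{b, Range[0, Length[b] - 1]}],
      InterpolationOrder -> 0];

(* one value from the interior of each bin *)
Round[g[{-0.75, -0.25, 0.25, 0.75}]]

(* the boundaries themselves, to see which bin each edge is assigned to *)
Round[g[b]]
```

Evaluating at the boundaries as well as in the interiors is a quick way to confirm which side of each bin is treated as closed before relying on the result.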