How to improve performance of BesselJ to the level of GSL?

Following the advice in the comments, I've made a test library that evaluates the BesselJ[1, #] & function via GSL. I still consider it a workaround, so if you find a way to get good performance out of the built-in Mathematica functions, please do post a new answer.

Needs["CCompilerDriver`"]
besselJ1src = "
  #include \"WolframLibrary.h\"
  DLLEXPORT mint WolframLibrary_getVersion() {return WolframLibraryVersion;}
  DLLEXPORT int WolframLibrary_initialize(WolframLibraryData libData) {return 0;}
  DLLEXPORT void WolframLibrary_uninitialize(WolframLibraryData libData) {}

  #include <gsl/gsl_sf_bessel.h>
  DLLEXPORT int j1(WolframLibraryData libData, mint argc, MArgument* args, MArgument result)
  {
    MArgument_setReal(result,gsl_sf_bessel_J1(MArgument_getReal(args[0])));
    return LIBRARY_NO_ERROR;
  }
  ";
besselJ1lib = CreateLibrary[besselJ1src, "besselj1", 
  "Libraries" -> {"gsl", "gslcblas"}];
j1 = LibraryFunctionLoad[besselJ1lib, "j1", {Real}, Real];

Now I can execute my original code with j1 instead of BesselJ[1, #] &:

zs = N /@ Range[0, 12, 10^-5];
AbsoluteTiming[bessels = j1 /@ zs;]
{0.241519, Null}

And bessels do indeed have numerical values.


I present in this answer a compiled implementation of one of the simpler algorithms for numerically evaluating a Bessel function of (modestly-sized) integer order and (small to medium-sized) real argument. This uses Miller's algorithm:

bessj = With[{bjl = N[Log[1*^16]]},
             Compile[{{n, _Integer}, {x, _Real}},
                     Module[{h, hb, k, l, m, r, s},
                            If[x == 0., Return[If[n == 0, 1., 0.]]];
                            l = r = 1;
                            While[r = 2 l;
                                  r (Log[r] - Log[Abs[x/2]] - 1) + Log[6 r + 1]/2 < bjl,
                                  l = r];
                            While[r - l > 1, m = Quotient[l + r, 2];
                                  If[m (Log[m] - Log[Abs[x/2]] - 1) + Log[6 m + 1]/2 < bjl,
                                     l = m, r = m]];
                            h = 0.; s = 0.; hb = Internal`Bag[Most[{0.}]];
                            Do[h = x/(2 k - h x); Internal`StuffBag[hb, h];
                               s = h (s + 2 Mod[k + 1, 2]), {k, l, 1, -1}];
                            Internal`StuffBag[hb,
                                              If[n >= 0 || Mod[n, 2] == 0, 1, -1]/(1 + s)];
                            Times @@ Take[Internal`BagPart[hb, All], -Abs[n] - 1]],
                     RuntimeAttributes -> {Listable}]];
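For readers unfamiliar with Miller's algorithm, here is the same idea sketched in plain C: recur downward from a large starting order using J_{k-1}(x) = (2k/x) J_k(x) - J_{k+1}(x), then normalize with the identity J_0(x) + 2 J_2(x) + 2 J_4(x) + ... = 1. The starting order below is a crude heuristic of my own, not the carefully derived logarithmic bound that the While loops in the compiled code compute; treat this as a sketch, not production code.

```c
#include <math.h>

/* Miller's backward recurrence for J_n(x), n >= 0, x > 0 (sketch).
   Seed the recurrence with an arbitrary tiny value at a high order m;
   downward recurrence suppresses the unwanted Y_n solution, and the sum
   J_0 + 2*J_2 + 2*J_4 + ... = 1 supplies the normalization. */
double bessj_miller(int n, double x) {
    if (x == 0.0) return n == 0 ? 1.0 : 0.0;
    int m = n + (int)x + 30;            /* crude starting order, m > n */
    double fkp1 = 0.0, fk = 1e-30;      /* arbitrary tiny seed at order m */
    double sum = (m % 2 == 0) ? 2.0 * fk : 0.0;
    double jn = 0.0;
    for (int k = m; k >= 1; k--) {
        double fkm1 = (2.0 * k / x) * fk - fkp1;  /* J_{k-1} = (2k/x) J_k - J_{k+1} */
        fkp1 = fk;
        fk = fkm1;
        if (k - 1 == n) jn = fk;                  /* unnormalized J_n */
        if ((k - 1) % 2 == 0) sum += (k == 1) ? fk : 2.0 * fk;
        if (fabs(fk) > 1e10) {                    /* rescale to avoid overflow */
            fk /= 1e10; fkp1 /= 1e10; sum /= 1e10; jn /= 1e10;
        }
    }
    return jn / sum;
}
```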

Compare:

zs = N[Range[0, 12, 1*^-5]];
AbsoluteTiming[BesselJ[1, #] & /@ zs;]
   {76.729389, Null}
AbsoluteTiming[bessj[1, #] & /@ zs;]
   {39.575264, Null}

However, the listability of the built-in function and of the compiled version is not being exploited here, so here's another test:

AbsoluteTiming[j1 = BesselJ[1, zs];]
   {74.120239, Null}
AbsoluteTiming[j1t = bessj[1, zs];]
   {24.816419, Null}
Max[Abs[j1t - j1]]
   2.94459*10^-13

Note that I did not compile to C in this case; using CompilationTarget -> "C" ought to yield a function with even better performance.


If need be, you can use the trapezoidal rule to evaluate a Bessel function of low integer order and small to moderately sized argument. That should be easy to do within a compiled function, if speed is of the essence.

I mentioned this in a comment months ago. Here is the long-overdue follow through. What follows is a simple routine for $J_1(x)$ for real, moderately sized $x$, using the midpoint rule (since, as it turns out, the expressions look simpler with it; the performance is similar, however):

bj1 = With[{bjl = N[Log[1*^16]], p2 = N[π/2]},
           Compile[{{x, _Real}},
                   Module[{m = 1, k, r, s},
                          If[x != 0., While[(8 m - 2) (Log[4 m - 1] -
                             Log[Abs[0.5 x]] - 1) + Log[25 m - 5] < 2 bjl, m++]];
                          s = 0.;
                          Do[r = Sin[(k + 0.5) p2/(m + 1)];
                             s += r Sin[r x],
                             {k, 0, m}];
                          s/(m + 1)],
                   RuntimeAttributes -> {Listable}]];

Admittedly, it does not have the generality of my previous answer, but it is concise and quite efficient:

AbsoluteTiming[j1 = BesselJ[1, zs];]
   {191.703, Null}

AbsoluteTiming[j1t = bj1[zs];]
   {37.6003, Null}

Max[Abs[j1t - j1]]
   4.44089*10^-16

For an explanation of why the trapezoidal/midpoint rule works so well, see this or this.