Can (compiled) matrix permanent evaluation be further sped-up?
Looking at CompilePrint[compiledGlynnAlgorithm]
, there are some lines with CopyTensor
in it which aren't really needed. There's also a few CoerceTensor
lines in there when it might be faster to just coerce the integer matrix once at the beginning.
By slightly adjusting the function, all instances of CopyTensor
and CoerceTensor
go away, giving a small increase in speed:
compiledGlynnAlgorithmAlt = Compile[{
{d, _Complex, 2}, {a, _Complex, 2}},
Total@Map[Apply[Times, (#.a)*#] &, d],
CompilationTarget -> "C",
RuntimeAttributes -> {Listable},
Parallelization -> True];
n = 20;
rc = RandomComplex[{-I - 1, I + 1}, {n, n}];
a = compiledGlynnAlgorithmAlt[δGrayCodeList[n], rc]; // AbsoluteTiming
b = compiledGlynnAlgorithm[δGrayCodeList[n], rc]; // AbsoluteTiming
a == b
(* {0.582192, Null} *)
(* {0.690600, Null} *)
(* True *)
Some more performance can be squeezed out by caching the resulting sign of each row in δGrayCodeList[n]
; the result is no longer exactly the same, but the relative difference is small:
δGrayCodeListSigns[n_] := δGrayCodeListSigns[n] = Times @@@ δGrayCodeList[n]
compiledGlynnAlgorithmKnownSign =
Compile[{{d, _Integer, 2}, {a, _Complex, 2}, {s, _Integer, 1}},
s.Map[ Apply[Times, (#.a)] &, d]
, CompilationTarget -> "C"
, RuntimeAttributes -> {Listable}];
n = 20;
rc = RandomComplex[{-I - 1, I + 1}, {n, n}];
a = compiledGlynnAlgorithmAlt[δGrayCodeList[n], rc]; // AbsoluteTiming
b = compiledGlynnAlgorithm[δGrayCodeList[n], rc]; // AbsoluteTiming
c = compiledGlynnAlgorithmKnownSign[
δGrayCodeList[n], rc, δGrayCodeListSigns[n]
]; // AbsoluteTiming
Abs[c - b]/Abs[b]
(* {0.565806, Null} *)
(* {0.614640, Null} *)
(* {0.430388, Null} *)
(* 2.49266*10^-13 *)
You might get a speed up by restricting compiledGlynnAlgorithm
to work on just one row of the Gray Code list, allowing the Listable
and Parallelization
to come into play. I say "might" because the speed up will depend on the details of your hardware.
Redefine compiledGlynnAlgorithm
like so (note that it now takes a one dimensional list for d
):
compiledGlynnAlgorithm = Compile[{{d, _Integer, 1}, {a, _Complex, 2}},
Apply[Times, (d.a) d],
CompilationTarget -> "C", RuntimeAttributes -> {Listable}, Parallelization -> True]
And put the Total
into Permanent
Permanent[mArg_List /; (MatrixQ[mArg, NumericQ])] :=
Total@compiledGlynnAlgorithm[δGrayCodeList[mArg // Length], mArg] //
#/2^((mArg // Length) - 1) &;
a bit more speed
As ssch suggested, a little more performance can be squeezed out by exploiting the fact that the product of a given row of the Gray Code list is either 1 or -1. Furthermore, these occur alternately. So we can redefine compiledGlynnAlgorithm
to remove the multiplication by d
:
compiledGlynnAlgorithm = Compile[{{d, _Integer, 1}, {a, _Complex, 2}},
Apply[Times, (d.a)],
CompilationTarget -> "C", RuntimeAttributes -> {Listable}, Parallelization -> True]
and modify Permanent
to Total
the odd and even rows of the result separately:
Permanent[mArg_List /; (MatrixQ[mArg, NumericQ])] :=
Module[{x},
x = compiledGlynnAlgorithm[δGrayCodeList[mArg // Length], mArg];
(Total[x[[;; ;; 2]]] - Total[x[[2 ;; ;; 2]]]) // #/2^((mArg // Length) - 1) &]
On my machine this gives about a factor of 3.5 speed increase over the original code for a 20x20 matrix.
Here's my relatively compact implementation of Glynn's formula, which incorporates the Gray code optimization:
SetAttributes[GrayCode, Listable];
GrayCode[n_Integer] := BitXor[n, BitShiftRight[n]]
permanent[mat_?MatrixQ] /; Equal @@ Dimensions[mat] :=
Module[{b = 2^(Length[mat] - 1)},
PadRight[{}, b, {1, -1}].(Times @@@
((2 IntegerDigits[2 b - GrayCode[Range[0, b - 1]] - 1,
2] - 1).mat))/b]
Here is a compiled version of the Gray-permuted Glynn formula. The code is adapted from Knuth's method for generating tuples in Gray order:
cPermanent = Compile[{{m, _Real, 2}},
Module[{n = Length[m], d, f, j, p, s},
d = Table[1, {n}]; f = Range[0, n - 1];
j = 0; s = 1; p = Times @@ Total[m];
While[j < n - 1,
f[[1]] = 0; d[[n - j]] *= -1;
s *= -1; p += s (Times @@ (d.m));
f[[j + 1]] = f[[j + 2]]; f[[j + 2]] = j + 1;
j = f[[1]]];
p/2^j], "RuntimeOptions" -> "Quality"];
(Modifying the compiled routine to handle complex matrices is completely straightforward.)
At least in the limited tests I did, these two perform quite well for symbolic and numerical matrices, compared to the MathWorld routine. I do not have version 10, so I can't say whether these routines are any better than the now built-in Permanent[]
function.
For reference, here is an implementation of Ryser's formula, adapted from old code by Ilan Vardi:
permanent[mat_?MatrixQ] /; Equal @@ Dimensions[mat] :=
Module[{n = Length[mat], l},
l = Range[2^n - 1];
(1 - 2 Mod[n, 2]) PadRight[{}, 2^n - 1, {-1, 1}].(Times @@@
Accumulate[JacobiSymbol[-1, l]
mat[[IntegerExponent[l, 2] + 1]]])]
It is a bit slower than the routine based on Glynn's formula.