Modular arithmetic - efficiently calculating the remainders of factorials
This is bit faster:
toPrime = 500;
sums = Accumulate@FoldList[Times, 1, Range[2, Prime@toPrime - 1]];
primes = Prime[Range[toPrime]];
Mod[sums[[primes - 1]], primes]
Precompute factorial sums and primes. Mod
is fast on lists.
Let $x \equiv r_1 \bmod p$ and $y \equiv r_2 \bmod p$. Then, $x y \equiv r_1 r_2 \bmod p$. So, we can compute the sum of the factorials mod p using:
f[p_] := Mod[ Total @ FoldList[ Mod[Times[##], p]&, Range[p-1]], p]
Let's compare this to the naive implementation:
t[p_] := Mod[Sum[k!, {k, p-1}], p]
For example:
f[Prime[500]] //AbsoluteTiming
t[Prime[500]] //AbsoluteTiming
{0.00058, 1813}
{0.085628, 1813}
The nice thing about using Mod[Times[##], p]
as the FoldList
function is that the output should be a machine number unless you are working with very large primes. That means that f
can be compiled:
fc = Compile[{{p, _Integer}},
Mod[ Total @ FoldList[ Mod[Times[##], p]&, Range[p-1]], p],
RuntimeAttributes->{Listable}
];
Let's compare for a large prime:
f[Prime[10^6]] //AbsoluteTiming
fc[Prime[10^6]] //AbsoluteTiming
{1.83041, 9308538}
{1.05449, 9308538}
Faster, but the real difference is that fc
is Listable. Hence, comparing timings on a list shows a much larger difference. For example:
r1 = f /@ Prime[Range[5000, 5500]]; //AbsoluteTiming
r2 = fc[Prime[Range[5000, 5500]]]; //AbsoluteTiming
r1 === r2
{2.06691, Null}
{0.395402, Null}
True
Finally, since the list of all factorials is not stored anywhere, the memory used is much more manageable. Here is the memory and timing for the first 5000 primes:
r1 = fc[Prime[Range[5000]]]; //MaxMemoryUsed //AbsoluteTiming
{3.03463, 462848}
This is quite a bit smaller than @MichaelE2's answer:
MaxMemoryUsed[
toPrime = 5000;
sums = Accumulate@FoldList[Times, 1, Range[2, Prime@toPrime - 1]];
primes = Prime[Range[toPrime]];
r2 = Mod[sums[[primes - 1]], primes]
] //AbsoluteTiming
r1 === r2
{2.40441, 4064607008}
True
The compiled answer uses .462KB while the approach where all the factorials are precomputed takes 4GB, and the timing is not too different.