How to compile a procedural program effectively
In the Fortran version, the large loop sits inside the executable, and you call the executable only once. Just as important: you allocate memory for result only once. To make the benchmark fair, you should compare against something like:
f5 = Compile[{{ls, _Real, 1}},
   Module[{n = Length@ls, tp},
    tp = Table[0., {Length[ls]}];
    Do[
     tp[[1]] = ls[[2]];
     tp[[n]] = ls[[n - 1]];
     Do[tp[[i]] = ls[[i - 1]] + ls[[i + 1]], {i, 2, n - 1}];
     , {20000}];
    tp],
   CompilationTarget -> "C",
   RuntimeOptions -> "Speed"
   ];
On my machine, calling f5[ls] takes about 0.067 s, while Do[f4[ls];, {20000}] needs about 0.58 s. The Fortran variant (see below), compiled with gfortran -o bla test2.f90 -O3, needs about 0.012 s. Moreover, the call f6[ls] with the function defined below needs only 0.034 s, which is not too far from the Fortran timing.
f6 = Compile[{{ls, _Real, 1}},
   Module[{n = Length@ls, tp},
    tp = Table[0., {n}];
    Do[
     tp[[1]] = Compile`GetElement[ls, 2];
     tp[[n]] = Compile`GetElement[ls, n - 1];
     Do[
      tp[[i]] = Compile`GetElement[ls, i - 1] + Compile`GetElement[ls, i + 1],
      {i, 2, n - 1}],
     {20000}];
    tp],
   CompilationTarget -> "C",
   RuntimeOptions -> "Speed"
   ];
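Since f6 differs from f5 only in reading elements through Compile`GetElement instead of Part, a quick sanity check is to confirm that both access styles give the same result for a single pass. The following is only a sketch and not part of the benchmark: the input mirrors the initialisation in the Fortran program, the names data, stencilPart, stencilGetElement and reference are made up for illustration, and CompilationTarget is left out so that it runs without a C compiler. Note that Compile`GetElement is undocumented and, as far as I know, skips the bounds check, so it is only safe when the indices are known to be in range, as they are here.

```mathematica
(* Assumed test input, mirroring list(i) = sin(i/N0) from the Fortran program *)
data = Table[Sin[i/2000.], {i, 1, 2000}];

(* Hypothetical single-pass helper using ordinary Part access *)
stencilPart = Compile[{{v, _Real, 1}},
   Module[{n = Length[v], tp},
    tp = Table[0., {n}];
    tp[[1]] = v[[2]];
    tp[[n]] = v[[n - 1]];
    Do[tp[[i]] = v[[i - 1]] + v[[i + 1]], {i, 2, n - 1}];
    tp]];

(* Same loop, reading elements via the undocumented Compile`GetElement *)
stencilGetElement = Compile[{{v, _Real, 1}},
   Module[{n = Length[v], tp},
    tp = Table[0., {n}];
    tp[[1]] = Compile`GetElement[v, 2];
    tp[[n]] = Compile`GetElement[v, n - 1];
    Do[
     tp[[i]] = Compile`GetElement[v, i - 1] + Compile`GetElement[v, i + 1],
     {i, 2, n - 1}];
    tp]];

(* Uncompiled reference: left neighbour plus right neighbour, zero-padded at the ends *)
reference = Append[Rest[data], 0.] + Prepend[Most[data], 0.];

(* Both comparisons should agree if the two access styles are equivalent *)
stencilPart[data] == stencilGetElement[data]
Max[Abs[stencilPart[data] - reference]]
```

If the two versions ever disagree, the first thing I would suspect is an index that runs out of range in the Compile`GetElement variant.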
test2.f90
program main
  implicit none
  integer,parameter :: N0 = 2000
  integer i,j
  real (kind=8) :: list(N0), result(N0),start,finish
  do i = 1, N0
    list(i) = sin(i/real(N0))
  end do
  call CPU_TIME(start)
  do j = 1, 20000
    result(1) = list(2)
    result(N0) = list(N0-1)
    do i=2,N0-1
      result(i) = list(i+1) + list(i-1)
    end do
  end do
  call CPU_TIME(finish)
  write(*,*) result
  write(*,*) finish-start
end program
P.S.: I have not used parallelization here yet. Even with the crudest way of doing that within Mathematica, I get (on a quad-core CPU):
AbsoluteTiming[ParallelDo[f6[ls], {i, 1, $KernelCount}]][[1]]/($KernelCount)
(* 0.0125135 *)
and by increasing the number of jobs further:
AbsoluteTiming[ParallelDo[f6[ls], {i, 1, $KernelCount 10}]][[1]]/($KernelCount 10)
(* 0.00855335 *)
Admittedly, it is not exactly fair to compare this to the unparallelized Fortran code. I added this only to show another way to speed up the Mathematica code.
This is an answer to Q1 and partly Q4, really. I can't test your Fortran version at the moment, but it would be an interesting comparison.
You can improve the performance of f4 compared to f1 and f2 by setting RuntimeOptions -> "Speed". Clearly the change in runtime settings (mainly "CatchMachineIntegerOverflow", it seems) from the defaults has a different effect on the two functions.
For instance:
f2 = Compile[{{ls1, _Real, 1}},
   Append[Rest@ls1, 0.] + Prepend[Most@ls1, 0.],
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];
f4 = Compile[{{ls, _Real, 1}},
   Module[{n = Length@ls, tp = ls},
    tp[[1]] = ls[[2]];
    tp[[n]] = ls[[n - 1]];
    Do[tp[[i]] = ls[[i - 1]] + ls[[i + 1]], {i, 2, n - 1}];
    tp],
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];
AbsoluteTiming[TimeConstrained[Do[f2[ls];, {20000}], 5]]
(* 0.152 seconds *)
AbsoluteTiming[TimeConstrained[Do[f4[ls];, {20000}], 5]]
(* 0.127 seconds *)
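To check the guess that "CatchMachineIntegerOverflow" is the setting that matters most, one could switch off only that option and compare against both the defaults and the full "Speed" preset. This is a sketch rather than a measurement I am reporting: the input data mirrors the Fortran initialisation, the names data, f4Default, f4NoOverflowCheck, f4Speed and timings are made up for illustration, and it assumes a C compiler is configured (without one, Mathematica should fall back to the virtual machine with a warning, and the timings will not be comparable to those above).

```mathematica
(* Assumed input, mirroring list(i) = sin(i/N0) from the Fortran program *)
data = Table[Sin[i/2000.], {i, 1, 2000}];

(* Hypothetical variant of f4 with default runtime options *)
f4Default = Compile[{{v, _Real, 1}},
   Module[{n = Length[v], tp = v},
    tp[[1]] = v[[2]];
    tp[[n]] = v[[n - 1]];
    Do[tp[[i]] = v[[i - 1]] + v[[i + 1]], {i, 2, n - 1}];
    tp],
   CompilationTarget -> "C"];

(* Same body, with only the integer overflow check switched off *)
f4NoOverflowCheck = Compile[{{v, _Real, 1}},
   Module[{n = Length[v], tp = v},
    tp[[1]] = v[[2]];
    tp[[n]] = v[[n - 1]];
    Do[tp[[i]] = v[[i - 1]] + v[[i + 1]], {i, 2, n - 1}];
    tp],
   CompilationTarget -> "C",
   RuntimeOptions -> {"CatchMachineIntegerOverflow" -> False}];

(* Same body, with the full "Speed" preset *)
f4Speed = Compile[{{v, _Real, 1}},
   Module[{n = Length[v], tp = v},
    tp[[1]] = v[[2]];
    tp[[n]] = v[[n - 1]];
    Do[tp[[i]] = v[[i - 1]] + v[[i + 1]], {i, 2, n - 1}];
    tp],
   CompilationTarget -> "C",
   RuntimeOptions -> "Speed"];

(* Time 20000 calls of each variant, in the order default, no overflow check, "Speed" *)
timings = Map[
  First[AbsoluteTiming[Do[#[data];, {20000}]]] &,
  {f4Default, f4NoOverflowCheck, f4Speed}]
```

If the second and third timings come out close to each other and well below the first, that would support the guess; if not, one of the other settings changed by "Speed" is probably responsible.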