Dramatic speed difference of code on MATLAB and Mathematica
Modify the calculation order a little to avoid ragged array and then make use of Listable
and Compile
:
computeDistance[pos_] := DistanceMatrix[pos, DistanceFunction -> EuclideanDistance]
liuQuartic = {r, h} \[Function]
15/(7 Pi*h^2) (2/3 - (9 r^2)/(8 h^2) + (19 r^3)/(24 h^3) - (5 r^4)/(32 h^4));
initializeDensity =
With[{l = liuQuartic, m = uniMass},
Compile[{{d, _Real, 2}, {h, _Real}}, m Total@Transpose[l[d, h] UnitStep[2 h - d]]]];
new = initializeDensity[computeDistance[N@totalPos], h]; // AbsoluteTiming
Tested with your new added sample data, my code ran for 0.390000 s while the original code ran for 4.851600 s and ybeltukov's code ran for 0.813200 s on my machine.
If you have a C compiler installed, the following code
computeDistance[pos_] := DistanceMatrix[pos, DistanceFunction -> EuclideanDistance]
liuQuartic = {r, h} \[Function]
15/(7 Pi*h^2) (2/3 - (9 r^2)/(8 h^2) + (19 r^3)/(24 h^3) - (5 r^4)/(32 h^4));
initializeDensity =
With[{l = liuQuartic, m = uniMass, g = Compile`GetElement},
Compile[{{d, _Real, 2}, {h, _Real}},
Module[{b1, b2}, {b1, b2} = Dimensions@d;
m Table[Sum[If[2 h > g[d, i, j], l[g[d, i, j], h], 0.], {j, b2}], {i, b1}]],
CompilationTarget -> "C", RuntimeOptions -> "Speed"]];
will give you a 2X speedup once again. Notice the C compiler is necessary, see this post for some more details.
You miss that many Mathematica functions are Listable. It allows you to write a fast and clear code
init2[distance_] := uniMass Total[liuQuartic[distance, h] UnitStep[2 h - distance], {2}]
h = 0.1;
uniMass = 1.0;
liuQuartic[d_, h_] := d^2 - h^2;
totalPos = RandomReal[1, {1119, 2}];
res1 = initializeDensity@computeDistance[totalPos]; // AbsoluteTiming
res2 = init2@computeDistance[totalPos]; // AbsoluteTiming
res1 == res2
(* {3.088372, Null} *)
(* {0.130059, Null} *)
(* True *)
It seems that this code is faster than MATLAB.
Here a slightly improved version of xzczd's code (I call the function cinitializeDensity
in the following) that does not require computing the DistanceMatrix
beforehand. Moreover, I tried to suppress some type casts within the CompiledFunction
and to exploit parallelization.
Block[{r, h},
cinitializeDensity2 =
With[{code = N[liuQuartic[Sqrt[r], h]], m = N[uniMass], g = Compile`GetElement},
Compile[{{x, _Real, 1}, {y, _Real, 2}, {h, _Real}},
Block[{r, sum = 0., x1, x2},
x1 = g[x, 1];
x2 = g[x, 2];
Do[
r = (x1 - g[y, j, 1])^2 + (x2 - g[y, j, 2])^2;
sum += If[r < 4. h^2, code, 0.],
{j, 1, Length[y]}];
m sum
],
CompilationTarget -> "C",
Parallelization -> True,
RuntimeOptions -> "Speed",
RuntimeAttributes -> {Listable}
]]
];
Along with packing, this leads to further speedup:
ptotalPos = Developer`ToPackedArray[N[totalPos]];
a = initializeDensity[computeDistance[ptotalPos]]; // AbsoluteTiming // First
b = cinitializeDensity[computeDistance[ptotalPos], h]; // AbsoluteTiming // First
c = cinitializeDensity2[ptotalPos, ptotalPos, h]; // AbsoluteTiming // First
a == b == c
1.28708
0.006039
0.000844
True
For even longer list, it might be worthwhile to delegate the distance checks to Nearest
:
Block[{r, h},
cinitializeDensity3 =
With[{code = N[liuQuartic[Sqrt[r], h]], m = N[uniMass],
g = Compile`GetElement},
Compile[{{x, _Real, 1}, {y, _Real, 2}, {h, _Real}},
Block[{r, sum = 0., x1, x2},
x1 = g[x, 1];
x2 = g[x, 2];
Do[
r = (x1 - g[y, j, 1])^2 + (x2 - g[y, j, 2])^2;
sum += code,
{j, 1, Length[y]}];
m sum
],
CompilationTarget -> "C",
Parallelization -> True,
RuntimeOptions -> "Speed",
RuntimeAttributes -> {Listable}
]]
];
d = cinitializeDensity3[ptotalPos, Nearest[ptotalPos, ptotalPos, {∞, 2 h}], h]; // AbsoluteTiming // First
a == d
0.00126
True