MaxDetect speed

Is there any faster way available?

There is, if you're willing to to use a different definition of "local maximum". If a local maximum is any value at least as high as any value in a 11x11 neighborhood, then you can use UnitStep[dat - Dilation[dat, 5]], which is about 16 times faster for your data:

RepeatedTiming[MaxDetect[dat];][[1]]

0.016

RepeatedTiming[UnitStep[dat - Dilation[dat, 5]];][[1]]

0.00101

But the result is slightly different:

enter image description here

Note the marked pixels on the left border in the dat-Dilation version. By the definition above, these pixels mark one extended local maximum.


Why is MaxDetect so much slower when Chop is applied?

Because Chop returns exact zeros what prevents packing of the matrix attempted by MaxDetect. You can detect an attempt to pack the array with Trace:

Trace[MaxDetect[{{1.}}], Developer`ToPackedArray] // Flatten

screenshot

Applying N after Chop gives almost the same speed as without Chop because now the matrix can be packed again:

RepeatedTiming[MaxDetect[dat];][[1]]
RepeatedTiming[MaxDetect[N@Chop@dat];][[1]]
0.06
0.07

Why does converting to an Image speed it up further?

It seems that MaxDetect is simply not so well-optimized for an array input as compared to an Image input. A superficial analysis follows.

  1. Compare the complexity of Trace outputs in the both cases:

    Trace[MaxDetect[dat]] // LeafCount
    Trace[MaxDetect[Image@dat]] // LeafCount
    
    79670996
    260610
    

    It indicates that in the case of array input it is evaluated many times as an (unpacked?) array, while in the case of Image input it is evaluated as an atomic Image object what may be faster.

  2. Additional evidences in support of the previous guess:

    Count[Trace[MaxDetect[dat]], _Image, Infinity]
    Count[Trace[MaxDetect[Image@dat]], _Image, Infinity]
    
    8
    1913
    
    Count[Trace[MaxDetect[dat]], _List?MatrixQ, Infinity]
    Count[Trace[MaxDetect[Image@dat]], _List?MatrixQ, Infinity] // Quiet
    
    7432
    367
    
  3. And finally let us compare how many packed and non-packed array evaluations appear in the both cases:

    Count[Trace[MaxDetect[dat]], a_List /; Developer`PackedArrayQ[a], Infinity]
    Count[Trace[MaxDetect[Image@dat]], a_List /; Developer`PackedArrayQ[a], Infinity] // Quiet
    
    102
    204
    
    Count[Trace[MaxDetect[dat]], a_List /; ! Developer`PackedArrayQ[a], Infinity]
    Count[Trace[MaxDetect[Image@dat]], a_List /; ! Developer`PackedArrayQ[a], Infinity] // Quiet
    
    748562
    8706
    

    From the last output we see that there seems to be about 100 times more evaluations of non-packed arrays in the case of array input as compared to Image input. I think that now the situation is clear enough.


A severe problem is that Table generates an unpacked array, a thing that is often very annoying. When converting to an image, it is packed automatically (one can check that, e.g., with Developer`PackedArrayQ[ImageData[Image[dat]]]). And because many functions work faster on packed arrays than on unpacked ones, this increases the performance.

A vectorized implementation that exploits the very nature of the function (and is thus not very generalizable) is the following:

nx = ny = 1000;
dat = KroneckerProduct[
     (0.1 + Cos[Subdivide[0., 2. Pi, ny]]),
     Sin[Subdivide[0., 2. Pi, nx]]
     ];

0.003691

The maxima can be found by comparing each value to the maximum of its neighbors. The latter can be done efficiently with MaxFilter:

getindex = {Quotient[#, Dimensions[dat][[1]]] + 1, Mod[#, Dimensions[dat][[2]], 1]} &;
idx = getindex /@ 
     Random`Private`PositionsOf[
      Flatten@UnitStep[dat - MaxFilter[dat, {1, 1}]], 
      1
      ]; // AbsoluteTiming // First
idx
MatrixPlot@SparseArray[idx -> 1, Dimensions[dat]]

0.00108

{{1, 26}, {29, 1}, {30, 1}, {31, 1}, {32, 1}, {33, 1}, {34, 1}, {35, 1}, {36, 1}, {37, 1}, {38, 1}, {39, 1}, {40, 1}, {41, 1}, {42, 1}, {43, 1}, {44, 1}, {45, 1}, {46, 1}, {47, 1}, {48, 1}, {49, 1}, {50, 1}, {51, 1}, {51, 76}, {52, 1}, {53, 1}, {54, 1}, {55, 1}, {56, 1}, {57, 1}, {58, 1}, {59, 1}, {60, 1}, {61, 1}, {62, 1}, {63, 1}, {64, 1}, {65, 1}, {66, 1}, {67, 1}, {68, 1}, {69, 1}, {70, 1}, {71, 1}, {72, 1}, {73, 1}, {101, 26}}

enter image description here

This is about 10 times faster than MaxDetect. But it does not distinguish between local maxima and strict local maxima. However, a second sweep over the local maxima (which is much less expensive) could filter out the strict local maxima.