Which flood-fill algorithm is better for performance?

Make a mask: a parallel two-dimensional array of bytes. Unchecked cells hold 0, cells on the fresh border of the flooded area hold 1, and cells inside the flooded area hold 2. Keep a list of the current border points, too.

At the end of each pass of the outer loop you have a mask that marks the current border, the inside, and the outside area, plus the array of border points. So you need to check for new points only on the border. While checking the first list of border points, you build the second border list and the second mask. At the next step you rebuild the first border list and mask. Going this way, we can use a simple while loop instead of recursion, because the data structure you check at each step is very simple.
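A minimal sketch of that border-by-border loop, in Python. The function name `flood_fill_frontier` and the grid layout (a list of rows) are assumptions made for illustration, and it simplifies the description above by updating a single mask in place and swapping only the border lists. It also includes the bounds check against the edge of the rectangle.

```python
def flood_fill_frontier(grid, start_row, start_col, new_value):
    """Fill the 4-connected region around the start cell, one border at a time."""
    rows, cols = len(grid), len(grid[0])
    target = grid[start_row][start_col]

    # Mask values: 0 = unchecked, 1 = current border, 2 = inside the filled area.
    mask = [[0] * cols for _ in range(rows)]
    mask[start_row][start_col] = 1
    border = [(start_row, start_col)]

    while border:                      # plain loop, no recursion
        next_border = []
        for r, c in border:
            grid[r][c] = new_value
            mask[r][c] = 2             # this border cell is now inside
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                # Stay inside the rectangle, skip cells already seen,
                # and stop at cells that do not match the target value.
                if (0 <= nr < rows and 0 <= nc < cols
                        and mask[nr][nc] == 0
                        and grid[nr][nc] == target):
                    mask[nr][nc] = 1
                    next_border.append((nr, nc))
        border = next_border           # the new border becomes the current one
    return mask
```

Because only the current border is examined on each pass, every cell is appended to a border list at most once.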

BTW, you forgot to check whether the coordinates of the new points lie on the drawn border or on the border of the whole rectangle.

As for cycling through all neighbouring points, look at my algorithm here

Computers process XY loops and 2D arrays very efficiently.

Check this video to see what happens if you put the standard recursive routine in a loop: https://www.youtube.com/watch?v=LvacRISl99Y

Instead of using complex logic to track previously verified neighbor spaces, you can keep a 2D array that records all the verified spaces. Reading from the verified array takes only 2-3 instructions: IF pixel[23,23] is verified, THEN fill it and check its neighbors.
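As a rough illustration of the verified-array idea, here is a Python sketch that uses an explicit stack instead of recursion. The names `flood_fill_visited` and `verified` are hypothetical, and the sketch assumes that "verified" means "already examined", which is one reading of the description above.

```python
def flood_fill_visited(pixels, x, y, fill):
    """Iterative 4-way flood fill; returns the number of pixels filled."""
    height, width = len(pixels), len(pixels[0])
    target = pixels[y][x]

    # One flag per pixel: True once the pixel has been examined.
    verified = [[False] * width for _ in range(height)]
    stack = [(x, y)]
    filled = 0

    while stack:
        cx, cy = stack.pop()
        if not (0 <= cx < width and 0 <= cy < height):
            continue                   # outside the image
        if verified[cy][cx]:
            continue                   # cheap lookup, no extra bookkeeping
        verified[cy][cx] = True
        if pixels[cy][cx] != target:
            continue                   # boundary pixel, leave it alone
        pixels[cy][cx] = fill
        filled += 1
        # Queue the four neighbours; the checks above filter them later.
        stack.extend(((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)))
    return filled
```

The lookup in `verified` replaces any logic that would otherwise try to remember which direction the fill came from.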