Fold and foldLeft method difference
I am not familiar with Scala, but Scala's collection library has a similar design to Haskell's. This answer is based on Haskell and is probably accurate for Scala as well.
Because foldLeft processes its inputs from left to right, it can have different input and output types. On the other hand, fold can process its inputs in various orders, so the inputs and output must all have the same type. This is easiest to see by expanding out the fold expressions. foldLeft operates in a specific order:
Array("1","2","3").foldLeft(0)(_ + _.toInt)
= ((0 + "1".toInt) + "2".toInt) + "3".toInt
Note that array elements are never used as the first parameter to the combining function. They always appear on the right of the +.
fold does not guarantee a specific order. It could do various things, such as:
Array("1","2","3").fold(0)(_ + _.toInt)
= ((0 + "1".toInt) + "2".toInt) + "3".toInt
or (0 + "1".toInt) + ("2" + "3".toInt).toInt
or "1" + ("2" + ("3" + 0.toInt).toInt).toInt
Array elements can appear in either parameter of the combining function. But your combining function expects its first argument to be an int. If you don't respect that constraint, you end up adding strings to ints! This error is caught by the type system.
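To make the type constraint concrete, here is a minimal sketch. The object and value names are made up for illustration. It shows foldLeft accumulating an Int over String elements, and the usual workaround for fold: convert the elements first so that everything has one type.

```scala
object FoldLeftTypes {
  val xs: Array[String] = Array("1", "2", "3")

  // foldLeft: the accumulator is an Int while the elements are Strings.
  // The element is always the second argument, so _.toInt is always valid.
  val viaFoldLeft: Int = xs.foldLeft(0)(_ + _.toInt)

  // fold needs a single type throughout, so convert the elements up front.
  val viaFold: Int = xs.map(_.toInt).fold(0)(_ + _)

  def main(args: Array[String]): Unit = {
    println(viaFoldLeft) // 6
    println(viaFold)     // 6
  }
}
```

Mapping first keeps the combining function as plain Int addition, which is safe whatever order fold happens to use.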
The neutral element may be introduced multiple times because, generally, a parallel fold is implemented by splitting up the input and executing multiple sequential folds. A sequential fold introduces the neutral element once. Imagine one particular execution of Array(1,2,3,4).fold(0)(_ + _) where the array is split into two separate arrays, and these are folded sequentially in two threads. (Of course, the real fold function does not split the array into multiple arrays.) One thread executes Array(1,2).fold(0)(_ + _), computing 0 + 1 + 2. The other thread executes Array(3,4).fold(0)(_ + _), computing 0 + 3 + 4. Finally, the partial sums from the two threads are added together. Note that the neutral element, 0, appears twice.
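The two-thread scenario can be imitated sequentially. The sketch below uses a hypothetical helper, chunkedFold, that is not part of the standard library and only mimics how a parallel implementation might split the work. It shows why the start value has to be neutral: each chunk adds it once more.

```scala
object ChunkedFold {
  // Hypothetical helper: fold each chunk separately starting from z,
  // then combine the partial results with the same operator.
  def chunkedFold(xs: Vector[Int], z: Int, chunkSize: Int)(op: (Int, Int) => Int): Int =
    xs.grouped(chunkSize).map(_.fold(z)(op)).reduceOption(op).getOrElse(z)

  val xs: Vector[Int] = Vector(1, 2, 3, 4)

  // Neutral start value: (0 + 1 + 2) + (0 + 3 + 4)
  val neutral: Int = chunkedFold(xs, 0, 2)(_ + _)    // 10

  // Non-neutral start value: (1 + 1 + 2) + (1 + 3 + 4)
  val nonNeutral: Int = chunkedFold(xs, 1, 2)(_ + _) // 12

  // A purely sequential fold introduces the start value only once.
  val sequential: Int = xs.fold(1)(_ + _)            // 11
}
```

With 0 the chunking is invisible. With 1 the answer depends on how many chunks there were, which is exactly what a neutral element rules out.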
NOTE: I could be completely wrong here. My Scala is less than perfect.
I think that the difference is in the signature of the methods:
def fold[A1 >: A](z: A1)(op: (A1, A1) ⇒ A1): A1
vs
def foldLeft[B](z: B)(op: (B, A) ⇒ B): B
In short, fold is defined as operating on some type A1 which is a supertype of the array's element type. For your string array the compiler infers A1 as Any, probably because it needs a type that can hold both your String elements and the Int start value (notice that the combiner function passed to fold takes two parameters of the same type?). That's also what the documentation means when it talks about z: the implementation of fold could be such that it combines your inputs in parallel, for instance:
"1" + "2" --\
             --> 3 + 3 -> 6
"3" + *z* --/
On the other hand, foldLeft operates on type B (unconstrained) and only asks that you provide a combiner method that takes a parameter of type B and another of your array's type (String, in your case), and produces a B.
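As a rough illustration of the widening, the sketch below writes the Any type out by hand instead of relying on inference. The object name and the add helper are invented for this example. Because either argument may be a String element or an Int partial result, the combiner has to cope with both.

```scala
object WidenedFold {
  val xs: Array[String] = Array("1", "2", "3")

  // Both arguments are typed Any: each may be an element ("1") or a
  // partial result (an Int), so both go through toString before adding.
  def add(a: Any, b: Any): Any = a.toString.toInt + b.toString.toInt

  // A1 is fixed to Any here, a supertype of both String and Int.
  val total: Any = xs.fold(0: Any)(add)

  def main(args: Array[String]): Unit =
    println(total) // 6
}
```

This compiles, but the result is only known to be Any, which is one reason foldLeft (or converting the elements first) is usually the better fit for this problem.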
As defined by Scala, foldLeft is a linear operation while fold is allowed to be a tree operation. For example:
List(1,2,3,4,5).foldLeft(0)(_ + _)
// This is the only valid order of operations
0+1 = 1
1+2 = 3
3+3 = 6
6+4 = 10
10 + 5 = 15
15 // done
List(1,2,3,4,5).fold(0)(_ + _)
// This is valid
0+1 = 1        0+3 = 3        0+5 = 5
1+2 = 3        3+4 = 7              5
      3    +         7 = 10         5
                         10    +    5 = 15
15 // done
In order to allow arbitrary tree decompositions of a sequential list, you must have a zero that doesn't do anything (so you can add it wherever you need it in the tree) and you must create the same sort of thing that you take as your binary arguments so the types don't change on you depending on how you decompose the tree.
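One further requirement is implied by the tree picture: the operator should give the same answer however the tree is grouped, that is, it should be associative. The sketch below builds one possible two-branch tree by hand (it is not how the library actually splits work) and compares subtraction, which is not associative, with addition, which is.

```scala
object TreeOrder {
  val sub: (Int, Int) => Int = _ - _
  val add: (Int, Int) => Int = _ + _

  val xs: List[Int] = List(1, 2, 3, 4)

  // Linear order, as foldLeft guarantees: (((0 - 1) - 2) - 3) - 4
  val linearSub: Int = xs.foldLeft(0)(sub) // -10

  // One possible tree: fold each half from 0, then combine the halves.
  def tree(op: (Int, Int) => Int): Int =
    op(xs.take(2).foldLeft(0)(op), xs.drop(2).foldLeft(0)(op))

  val treeSub: Int = tree(sub) // (0 - 1 - 2) - (0 - 3 - 4) = 4
  val treeAdd: Int = tree(add) // (0 + 1 + 2) + (0 + 3 + 4) = 10
  val linearAdd: Int = xs.foldLeft(0)(add) // 10
}
```

Addition agrees across both shapes while subtraction does not, so subtraction is only safe with foldLeft.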
(Being able to evaluate as a tree is nice for parallelization. If you want to be able to transform your output type as you go, you need both a combination operator and a standard start-value-transforms-sequence-element-to-desired-type function just like foldLeft has. Scala has this and calls it aggregate, but in some ways this is more like foldLeft than fold is.)
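The library method has the shape aggregate(z)(seqop, combop). To show what the two functions are for without depending on a particular library version, here is a hand-written sketch. The name aggregateSketch and its chunkSize parameter are hypothetical and only simulate the splitting a parallel collection might do.

```scala
object AggregateSketch {
  // seqop folds elements of type A into a B within one chunk (like foldLeft);
  // combop merges the B results of different chunks (like fold).
  def aggregateSketch[A, B](xs: Seq[A], z: B, chunkSize: Int)(
      seqop: (B, A) => B,
      combop: (B, B) => B
  ): B =
    xs.grouped(chunkSize).map(_.foldLeft(z)(seqop)).reduceOption(combop).getOrElse(z)

  // Sum the numeric strings: chunks ("1","2") and ("3") give 3 and 3.
  val total: Int =
    aggregateSketch(Seq("1", "2", "3"), 0, 2)(_ + _.toInt, _ + _) // 6
}
```

The String-to-Int conversion lives entirely in seqop, so combop only ever sees Ints. That is how the output type can differ from the element type while still allowing a tree-shaped evaluation.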