Is Haskell's laziness an elegant alternative to Python's generators?
Indeed, lazy lists can be used this way. There are some subtle differences though:
- Lists are data structures. So you can keep them after evaluating them, which can be both good and bad (you can avoid recomputation of values and to recursive tricks as @ChrisDrost described, at the cost of keeping memory unreleased).
- Lists are pure. In generators you can have computations with side effects, you can't do that with lists (which is often desirable).
- Since Haskell is a lazy language, laziness is everywhere and if you just convert a program from an imperative language to Haskell, the memory requirements can change considerably (as @RomanL describes in his answer).
But Haskell offers more advanced tools to accomplish the generator/consumer pattern. Currently there are three libraries that focus on this problem: pipes, conduit and iteratees. My favorite is conduit, it's easy to use and the complexity of its types is kept low.
They have several advantages, in particular that you can create complex pipelines and you can base them on a chosen monad, which allows you to say what side effects are allowed in a pipeline.
Using conduit, your example could be expressed as follows:
import Data.Functor.Identity
import Data.Conduit
import qualified Data.Conduit.List as C
ifactC :: (Num a, Monad m) => Producer m a
ifactC = loop 1 1
where
loop r n = let r' = r * n
in yield r' >> loop r' (n + 1)
sumC :: (Num a, Monad m) => Consumer a m a
sumC = C.fold (+) 0
main :: IO ()
main = (print . runIdentity) (ifactC $= C.isolate 5 $$ sumC)
-- alternatively running the pipeline in IO monad directly:
-- main = (ifactC $= C.isolate 5 $$ sumC) >>= print
Here we create a Producer
(a conduit that consumes no input) that yields factorials indefinitely. Then we compose it with isolate
, which ensures that no more than a given number of values are propagated through it, and then we compose it with a Consumer
that just sums values and returns the result.
Your examples are not equivalent in memory usage. It is easy to see if you replace *
with a +
(so that the numbers don't get big too quickly) and then run both examples on a big n
such as 10^7. Your Haskell version will consume a lot of memory and python will keep it low.
Python generator will not generate a list of values then sum it up. Instead, the sum
function will get values one-by-one from the generator and accumulate them. Thus, the memory usage will remain constant.
Haskell will evaluate functions lazily, but in order to calculate say foldl1 (+) (take n fact)
it will have to evaluate the complete expression. For large n
this will unfold into a huge expression the same way as (foldl (+) 0 [0..n])
does. For more details on evaluation and reduction have a look here: https://www.haskell.org/haskellwiki/Foldr_Foldl_Foldl%27.
You can fix your sum n
by using foldl1'
instead of foldl1
as described on the link above. As @user2407038 explained in his comment, you'd also need to keep fact
local. The following works in GHC with a constant memory use:
let notfact = scanl1 (+) [1..]
let n = 20000000
let res = foldl' (+) 0 (take n notfact)
Note that in case of the actual factorial in place of notfact
memory considerations are less of a concern. The numbers will get big quickly, arbitrary-precision arithmetic will slow things down so you won't be able get to big values of n
in order to actually see the difference.
Basically, yes: Haskell's lazy-lists are a lot like Python's generators, if those generators were effortlessly cloneable, cacheable, and composeable. Instead of raising StopIteration
you return []
from your recursive function, which can thread state into the generator.
They do some cooler stuff due to self-recursion. For example, your factorial generator is more idiomatically generated like:
facts = 1 : zipWith (*) facts [1..]
or the Fibonaccis as:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
In general any iterative loop can be converted to a recursive algorithm by promoting the loop-state to arguments of a function and then calling it recursively to get the next loop cycle. Generators are just like that, but we prepend some elements each iteration of the recursive function, `go ____ = (stuff) : go ____.
The perfect equivalent is therefore:
ifact :: [Integer]
ifact = go 1 1
where go f i = f : go (f * i) (i + 1)
sum_fact n = sum (take n ifact)
In terms of what's fastest, the absolute fastest in Haskell will probably be the "for loop":
sum_fact n = go 1 1 1
where go acc fact i
| i <= n = go (acc + fact) (fact * i) (i + 1)
| otherwise = acc
The fact that this is "tail-recursive" (a call of go
does not pipe any sub-calls to go
to another function like (+)
or (*)
) means that the compiler can package it into a really tight loop, and that's why I'm comparing it with "for loops" even though that's not really a native idea to Haskell.
The above sum_fact n = sum (take n ifact)
is a little slower than this but faster than sum (take n facts)
where facts
is defined with zipWith
. The speed differences are not very large and I think mostly just come down to memory allocations that don't get used again.