Is there a fast, functional prime generator?

Here's a Haskell implementation of Melissa O'Neill's algorithm (from the linked article). Unlike the implementation that Gassa linked to, I've made minimal use of laziness, so that the performance analysis is clear -- O(n log n log log n), i.e., linearithmic in n log log n, the number of writes made by the imperative Sieve of Eratosthenes.

The heap implementation is just a tournament tree. The balancing logic is in push; by swapping the children every time, we ensure that, for every branch, the left subtree is the same size or one bigger compared to the right subtree, which ensures depth O(log n).

module Sieve where

type Nat = Int

data Heap = Leaf !Nat !Nat
          | Branch !Nat !Heap !Heap
          deriving Show

top :: Heap -> Nat
top (Leaf n _) = n
top (Branch n _ _) = n

leaf :: Nat -> Heap
leaf p = Leaf (3 * p) p

branch :: Heap -> Heap -> Heap
branch h1 h2 = Branch (min (top h1) (top h2)) h1 h2

pop :: Heap -> Heap
pop (Leaf n p) = Leaf (n + 2 * p) p
pop (Branch _ h1 h2)
  = case compare (top h1) (top h2) of
        LT -> branch (pop h1) h2
        EQ -> branch (pop h1) (pop h2)
        GT -> branch h1 (pop h2)

push :: Nat -> Heap -> Heap
push p h@(Leaf _ _) = branch (leaf p) h
push p (Branch _ h1 h2) = branch (push p h2) h1

primes :: [Nat]
primes
  = let helper n h
          = case compare n (top h) of
                LT -> n : helper (n + 2) (push n h)
                EQ -> helper (n + 2) (pop h)
                GT -> helper n (pop h)
      in 2 : 3 : helper 5 (leaf 3)

Here it is, if (Haskell's) pure arrays count as pure (they should, IMO). The complexity is obviously O(n log (log n)), provided accumArray indeed spends O(1) time for each entry it is given, as it ought to:

import Data.Array.Unboxed 
import Data.List (tails, inits)

ps = 2 : [ n | (r:q:_, px) <- (zip . tails . (2:) . map (^2)) ps (inits ps),
               (n,True)    <- assocs (
                                accumArray (\_ _ -> False) True (r+1,q-1)
                                  [(m,()) | p <- px, let s = (r+p)`div`p*p, 
                                            m <- [s,s+p..q-1]] :: UArray Int Bool) ]

Calculates primes by segments between successive squares of primes (the map (^2) bit), generating the composites by enumerating the multiples of growing prefixes of primes (the inits bit) just as any proper sieve of Eratosthenes would, by repeated additions.

So, the primes {2,3} are used to sieve a segment from 10 to 24; {2,3,5} from 26 to 48; and so on. See also.

Also, a Python generator-based sieve might be considered functional as well. Python's dicts are extremely well-performing, empirically, though I'm not sure about the exact cost of the multiples over-producing scheme used there to avoid duplicate composites.


update: testing it indeed produces favorable results, as expected:

{-     original heap       tweaked           nested-feed         array-based
          (3*p,p)         (p*p,2*p)            JBwoVL              abPSOx
          6Uv0cL          2x speed-up     another 3x+ speed-up
                n^                n^                  n^                  n^
100K:  0.78s             0.38s               0.13s              0.065s    
200K:  2.02s   1.37      0.97s   1.35        0.29s   1.16       0.13s    1.00
400K:  5.05s   1.32      2.40s   1.31        0.70s   1.27       0.29s    1.16
800K: 12.37s   1.29                     1M:  2.10s   1.20       0.82s    1.13
 2M:                                                            1.71s    1.06
 4M:                                                            3.72s    1.12
10M:                                                            9.84s    1.06 
    overall in the tested range:
               1.33                                  1.21                1.09
-}

with empirical orders of growth calculated for producing n primes, where O(n log log n) is commonly seen as n1.05...1.10 and O(n log n log log n) as n1.20...1.25.

The "nested-feed" variant implements the postponement technique (as also seen in the above linked Python answer) that achieves quadratic reduction of the heap size which evidently has a noticeable impact on the empirical complexity, even if not quite reaching the still better results for the array-based code of this answer, which is able to produce 10 million primes in under 10 seconds on ideone.com (with overall growth rate of just n1.09 in the tested range).

("original heap" is of course the code from the other answer here).