Triangularizing a list in Haskell

This appears to be directly related to the set theory argument proving that the set of integer pairs are in one-to-one correspondence with the set of integers (denumerable). The argument involves a so-called Cantor pairing function.

So, out of curiosity, let's see if we can get a diagonalize function that way. Define the infinite list of Cantor pairs recursively in Haskell:

auxCantorPairList :: (Integer, Integer) -> [(Integer, Integer)]
auxCantorPairList (x,y) =
    let nextPair = if (x > 0) then (x-1,y+1) else (x+y+1, 0)
    in (x,y) : auxCantorPairList nextPair

cantorPairList :: [(Integer, Integer)]
cantorPairList = auxCantorPairList (0,0)

And try that inside ghci:

 λ> take 15 cantorPairList
[(0,0),(1,0),(0,1),(2,0),(1,1),(0,2),(3,0),(2,1),(1,2),(0,3),(4,0),(3,1),(2,2),(1,3),(0,4)]
 λ> 

We can number the pairs, and for example extract the numbers for those pairs which have a zero x coordinate:

 λ> 
 λ> xs = [1..]
 λ> take 5 $ map fst $ filter (\(n,(x,y)) -> (x==0)) $ zip xs cantorPairList
[1,3,6,10,15]
 λ> 

We recognize this is the top row from the OP's result in the text of the question. Similarly for the next two rows:

 λ> 
 λ> makeRow xs row = map fst $ filter (\(n,(x,y)) -> (x==row)) $ zip xs cantorPairList
 λ> take 5 $ makeRow xs 1
[2,5,9,14,20]
 λ> 
 λ> take 5 $ makeRow xs 2
[4,8,13,19,26]
 λ> 

From there, we can write our first draft of a diagonalize function:

 λ> 
 λ> printAsLines xs = mapM_ (putStrLn . show) xs
 λ> diagonalize xs = takeWhile (not . null) $ map (makeRow xs) [0..]
 λ> 
 λ> printAsLines $ diagonalize [1..19]
[1,3,6,10,15]
[2,5,9,14]
[4,8,13,19]
[7,12,18]
[11,17]
[16]
 λ> 

EDIT: performance update

For a list of 1 million items, the runtime is 18 sec, and 145 seconds for 4 millions items. As mentioned by Redu, this seems like O(n√n) complexity.

Distributing the pairs among the various target sublists is inefficient, as most filter operations fail.

To improve performance, we can use a Data.Map structure for the target sublists.


{-#  LANGUAGE  ExplicitForAll       #-}
{-#  LANGUAGE  ScopedTypeVariables  #-}

import qualified  Data.List  as  L
import qualified  Data.Map   as  M

type MIL a = M.Map Integer [a]

buildCantorMap :: forall a.  [a] -> MIL a
buildCantorMap xs = 
    let   ts     =  zip xs cantorPairList -- triplets (a,(x,y))
          m0     = (M.fromList [])::MIL a
          redOp m (n,(x,y)) = let  afn as = case as of
                                              Nothing  -> Just [n]
                                              Just jas -> Just (n:jas)
                              in   M.alter afn x m
          m1r = L.foldl' redOp m0 ts
    in
          fmap reverse m1r

diagonalize :: [a] -> [[a]]
diagonalize xs = let  cm = buildCantorMap xs
                 in   map snd $ M.toAscList cm


With that second version, performance appears to be much better: 568 msec for the 1 million items list, 2669 msec for the 4 millions item list. So it is close to the O(n*Log(n)) complexity we could have hoped for.


Make increasing size chunks:

chunks :: [a] -> [[a]]
chunks = go 0 where
    go n [] = []
    go n as = b : go (n+1) e where (b,e) = splitAt n as

Then just transpose twice:

diagonalize :: [a] -> [[a]]
diagonalize = transpose . transpose . chunks

Try it in ghci:

> diagonalize [1..19]
[[1,3,6,10,15],[2,5,9,14],[4,8,13,19],[7,12,18],[11,17],[16]]

It might be a good idea to craete a comb filter.

So what does comb filter do..? It's like splitAt but instead of splitting at a single index it sort of zips the given infinite list with the given comb to separate the items coressponding to True and False in the comb. Such that;

comb :: [Bool]  -- yields [True,False,True,False,False,True,False,False,False,True...]
comb = iterate (False:) [True] >>= id

combWith :: [Bool] -> [a] -> ([a],[a])
combWith _ []          = ([],[])
combWith (c:cs) (x:xs) = let (f,s) = combWith cs xs
                         in if c then (x:f,s) else (f,x:s)

λ> combWith comb [1..19]
([1,3,6,10,15],[2,4,5,7,8,9,11,12,13,14,16,17,18,19])

Now all we need to do is to comb our infinite list and take the fst as the first row and carry on combing the snd with the same comb.

Lets do it;

diags :: [a] -> [[a]]
diags [] = []
diags xs = let (h,t) = combWith comb xs
           in h : diags t

λ> diags [1..19]
[ [1,3,6,10,15]
, [2,5,9,14]
, [4,8,13,19]
, [7,12,18]
, [11,17]
, [16]
]

also seems to be lazy too :)

λ> take 5 . map (take 5) $ diags [1..]
[ [1,3,6,10,15]
, [2,5,9,14,20]
, [4,8,13,19,26]
, [7,12,18,25,33]
, [11,17,24,32,41]
]

I think the complexity could be like O(n√n) but i can not make sure. Any ideas..?