hGetContents being too lazy

hGetContents uses lazy IO; it only reads from the file as you force more of the string, and it only closes the file handle when you evaluate the entire string it returns. The problem is that you're enclosing it in withFile; instead, just use openFile and hGetContents directly (or, more simply, readFile). The file will still get closed once you fully evaluate the string. Something like this should do the trick, to ensure that the file is fully read and closed immediately by forcing the entire string beforehand:

import Control.Exception (evaluate)

readCode :: FilePath -> IO Code
readCode fileName = do
    text <- readFile fileName
    evaluate (length text)
    return (parseCode text)

Unintuitive situations like this are one of the reasons people tend to avoid lazy IO these days, but unfortunately you can't change the definition of hGetContents. A strict IO version of hGetContents is available in the strict package, but it's probably not worth depending on the package just for that one function.

If you want to avoid the overhead that comes from traversing the string twice here, then you should probably look into using a more efficient type than String, anyway; the Text type has strict IO equivalents for much of the String-based IO functionality, as does ByteString (if you're dealing with binary data, rather than Unicode text).


hGetContents isn't too lazy, it just needs to be composed with other things appropriately to get the desired effect. Maybe the situation would be clearer if it were were renamed exposeContentsToEvaluationAsNeededForTheRestOfTheAction or just listen.

withFile opens the file, does something (or nothing, as you please -- exactly what you require of it in any case), and closes the file.

It will hardly suffice to bring out all the mysteries of 'lazy IO', but consider now this difference in bracketing

 good file operation = withFile file ReadMode (hGetContents >=> operation >=> print)
 bad file operation = (withFile file ReadMode hGetContents) >>= operation >>= print

-- *Main> good "lazyio.hs" (return . length)
-- 503
-- *Main> bad "lazyio.hs" (return . length)
-- 0

Crudely put, bad opens and closes the file before it does anything; good does everything in between opening and closing the file. Your first action was akin to bad. withFile should govern all of the action you want done that that depends on the handle.

You don't need a strictness enforcer if you are working with String, small files, etc., just an idea how the composition works. Again, in bad all I 'do' before closing the file is exposeContentsToEvaluationAsNeededForTheRestOfTheAction. In good I compose exposeContentsToEvaluationAsNeededForTheRestOfTheAction with the rest of the action I have in mind, then close the file.

The familiar length + seq trick mentioned by Patrick, or length + evaluate is worth knowing; your second action with putStrLn txt was a variant. But reorganization is better, unless lazy IO is wrong for your case.

$ time ./bad
bad: Prelude.last: empty list  
                        -- no, lots of Chars there
real    0m0.087s

$ time ./good
'\n'                -- right
()
real    0m15.977s

$ time ./seqing 
Killed               -- hopeless, attempting to represent the file contents
    real    1m54.065s    -- in memory as a linked list, before finding out the last char

It goes without saying that ByteString and Text are worth knowing about, but reorganization with evaluation in mind is better, since even with them the Lazy variants are often what you need, and they then involve grasping the same distinctions between forms of composition. If you are dealing with one of the (immense) class of cases where this sort of IO is inappropriate, take a look at enumerator, conduit and co., all wonderful.