hGetContents being too lazy
hGetContents uses lazy IO: it only reads from the file as you force more of the string, and it only closes the file handle once you have evaluated the entire string it returns. The problem is that you're enclosing it in withFile, which closes the handle as soon as its action returns, before anything has actually been read. Instead, just use openFile and hGetContents directly (or, more simply, readFile). The file will still get closed once you fully evaluate the string. Something like this should do the trick; it forces the entire string up front, so the file is fully read and closed immediately:
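import Control.Exception (evaluate)

-- Code and parseCode are the type and parser from the question.
readCode :: FilePath -> IO Code
readCode fileName = do
  text <- readFile fileName
  _ <- evaluate (length text)  -- walk the whole string so the file is read and closed now
  return (parseCode text)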
Unintuitive situations like this are one of the reasons people tend to avoid lazy IO these days, but unfortunately you can't change the definition of hGetContents. A strict IO version of hGetContents is available in the strict package, but it's probably not worth depending on the package just for that one function.
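If you'd rather not pull in the strict package, a strict read is only a few lines on top of base. This is a minimal sketch, not the package's implementation; hGetContentsStrict and readFileStrict are names made up for illustration.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper (not from the strict package): read everything
-- that is left on the handle before returning.
hGetContentsStrict :: Handle -> IO String
hGetContentsStrict h = do
  s <- hGetContents h
  _ <- evaluate (length s)  -- walking the whole list forces the full read
  return s

-- The string is fully read inside the callback, so it is safe to let
-- withFile close the handle afterwards.
readFileStrict :: FilePath -> IO String
readFileStrict path = withFile path ReadMode hGetContentsStrict
```

Recent versions of base (4.15 and later, if I remember correctly) also export hGetContents' and readFile' from System.IO, which should do the same job without any hand-rolled helper.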
If you want to avoid the overhead that comes from traversing the string twice here, then you should probably look into using a more efficient type than String anyway; the Text type has strict IO equivalents for much of the String-based IO functionality, as does ByteString (if you're dealing with binary data, rather than Unicode text).
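For comparison, here is roughly what the Text route looks like. It assumes the text package is available, and parseLines is only a hypothetical stand-in for the question's parseCode, which isn't shown here.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical stand-in for the question's parseCode: split into lines.
parseLines :: T.Text -> [T.Text]
parseLines = T.lines

readLines :: FilePath -> IO [T.Text]
readLines fileName = do
  -- Data.Text.IO.readFile is strict: the whole file is read and the
  -- handle is closed before this line returns.
  text <- TIO.readFile fileName
  return (parseLines text)
```

No length or evaluate trick is needed here, because nothing about the read is deferred.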
hGetContents isn't too lazy; it just needs to be composed with other things appropriately to get the desired effect. Maybe the situation would be clearer if it were renamed exposeContentsToEvaluationAsNeededForTheRestOfTheAction or just listen.
withFile opens the file, does something (or nothing, as you please; in any case exactly what you require of it), and closes the file.
It will hardly suffice to bring out all the mysteries of 'lazy IO', but consider now this difference in bracketing:
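import Control.Monad ((>=>))
import System.IO

-- everything that touches the contents happens inside withFile
good file operation = withFile file ReadMode (hGetContents >=> operation >=> print)
-- withFile returns (and closes the handle) before operation ever runs
bad file operation = (withFile file ReadMode hGetContents) >>= operation >>= print
-- *Main> good "lazyio.hs" (return . length)
-- 503
-- *Main> bad "lazyio.hs" (return . length)
-- 0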
Crudely put, bad opens and closes the file before it does anything; good does everything in between opening and closing the file. Your first action was akin to bad. withFile should govern all of the action you want done that depends on the handle.
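Applied to a named action rather than a one-liner, the same idea might look like the sketch below. countLines is a hypothetical example, not the action from the question; the point is only that the result is forced while the handle is still open.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical example action: count the lines of a file.
countLines :: FilePath -> IO Int
countLines path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    -- evaluate forces the Int, and with it the whole read, before
    -- withFile gets a chance to close the handle.
    evaluate (length (lines contents))
```

Returning the unevaluated count with a plain return instead would put you back in the position of bad: the handle would be closed before anything was read.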
You don't need a strictness enforcer if you are working with String, small files, etc., just an idea of how the composition works. Again, in bad all I 'do' before closing the file is exposeContentsToEvaluationAsNeededForTheRestOfTheAction. In good I compose exposeContentsToEvaluationAsNeededForTheRestOfTheAction with the rest of the action I have in mind, then close the file.
The familiar length + seq trick mentioned by Patrick, or length + evaluate, is worth knowing; your second action with putStrLn txt was a variant. But reorganization is better, unless lazy IO is wrong for your case.
$ time ./bad
bad: Prelude.last: empty list
-- no, lots of Chars there
real 0m0.087s
$ time ./good
'\n' -- right
()
real 0m15.977s
$ time ./seqing
Killed -- hopeless, attempting to represent the file contents
real 1m54.065s -- in memory as a linked list, before finding out the last char
It goes without saying that ByteString and Text are worth knowing about, but reorganization with evaluation in mind is better, since even with them the Lazy variants are often what you need, and they then involve grasping the same distinctions between forms of composition. If you are dealing with one of the (immense) class of cases where this sort of IO is inappropriate, take a look at enumerator, conduit and co., all wonderful.
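To illustrate the point about the Lazy variants, here is a sketch of the same bracketing with a lazy ByteString. lastByte is a hypothetical helper, and it assumes the bytestring package; the composition is the same as in good, only the type has changed.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)
import System.IO

-- Hypothetical helper: the last byte of a file, or Nothing if it is empty.
lastByte :: FilePath -> IO (Maybe Word8)
lastByte path =
  withFile path ReadMode $ \h -> do
    bytes <- BL.hGetContents h  -- lazy: chunks are read on demand
    if BL.null bytes
      then return Nothing
      -- force the byte itself, not just the Just constructor,
      -- while the handle is still open
      else Just <$> evaluate (BL.last bytes)
```

Note that evaluate is applied to the byte and not to the Maybe: forcing only as far as the Just constructor would leave the actual read pending until after the handle is closed.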