Fastest way to read large binary file in Haskell?
To elaborate on @Cubic's comment, while there's a general consensus that lazy I/O should be avoided in production code and replaced with a streaming approach, this is not directly related to performance. If you're writing a program to do some one-off processing of a large file, as long as you have a lazy I/O version running fine now, there's probably no good performance reason to convert it over to a streaming package.
In fact, streaming is more likely to add some overhead, so I suspect that in most cases a well-optimized lazy I/O solution would outperform a well-optimized streaming solution.
The main reasons for avoiding lazy I/O have been discussed previously on SO. In a nutshell, lazy I/O makes it difficult to manage resources consistently (e.g., file handles and network sockets), makes it hard to reason about space usage (e.g., a small program change can cause your memory usage to explode), and is occasionally "unsafe" if the timing and ordering of the I/O in question matters (usually not a problem if you're just reading in one set of files and/or writing out another set of files).
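To make the resource-management point concrete, here is a minimal sketch using only bytestring and base. The function names escapingContents and lengthInside are hypothetical, chosen for illustration. The first lets a lazy ByteString escape from withFile before it has been read, so any later read is attempted on a handle that is already closed; the second forces the result while the handle is still open.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import System.IO (IOMode (ReadMode), withFile)

-- Problematic: nothing is read inside the callback, so the lazy
-- ByteString escapes unread and withFile closes the handle first.
escapingContents :: FilePath -> IO BL.ByteString
escapingContents path = withFile path ReadMode BL.hGetContents

-- Safer: fully consume the contents (here, by forcing the length)
-- before withFile gets a chance to close the handle.
lengthInside :: FilePath -> IO Int64
lengthInside path = withFile path ReadMode $ \h -> do
    contents <- BL.hGetContents h
    evaluate (BL.length contents)
```

Exactly how the first version misbehaves when its result is finally demanded depends on the library version, which is part of why this style is hard to reason about.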
Short-running utility programs for reading and/or writing large files are probably good candidates to be written in a lazy I/O style. As long as they don't have any obvious space leaks when they're run, they're probably fine.
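As a sketch of what such a utility might look like, assuming only the bytestring package, the hypothetical copyLazily below copies a file through a lazy ByteString. The chunks are read on demand as they are written, so memory use should stay roughly constant as long as nothing else holds on to the contents. The source and destination are assumed to be different files.

```haskell
import qualified Data.ByteString.Lazy as BL

-- Copy src to dst via lazy I/O. BL.readFile returns immediately and
-- reads chunks only as BL.writeFile demands them.
copyLazily :: FilePath -> FilePath -> IO ()
copyLazily src dst = BL.readFile src >>= BL.writeFile dst
```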
Using only the streaming and bytestring packages, one can write something like:

    import Data.ByteString
    import Streaming
    import qualified Streaming.Prelude as S
    import System.IO

    -- Yield chunks of up to chunkSize bytes until hGet returns an
    -- empty chunk, which signals end of file.
    fromHandle :: Int -> Handle -> Stream (Of ByteString) IO ()
    fromHandle chunkSize h =
        S.untilRight $ do
            bytes <- Data.ByteString.hGet h chunkSize
            pure $ if Data.ByteString.null bytes
                       then Right ()
                       else Left bytes

This uses hGet and null from bytestring, and untilRight from streaming. You will need to use withFile to get the Handle, and consume the Stream within the callback:

    dump :: FilePath -> IO ()
    dump file = withFile file ReadMode go
      where
        go :: Handle -> IO ()
        go = S.mapM_ (Data.ByteString.hPut stdout) . fromHandle 4096
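
The same stream can be consumed in other ways. As a further sketch, the hypothetical countBytes below totals the size of a file while holding only one chunk at a time. It repeats the fromHandle definition so the block is self-contained, and imports bytestring qualified as B purely as a stylistic choice.

```haskell
import qualified Data.ByteString as B
import Streaming (Of, Stream)
import qualified Streaming.Prelude as S
import System.IO (Handle, IOMode (ReadMode), withFile)

-- Same chunked reader as above: stop at the first empty chunk.
fromHandle :: Int -> Handle -> Stream (Of B.ByteString) IO ()
fromHandle chunkSize h =
    S.untilRight $ do
        bytes <- B.hGet h chunkSize
        pure $ if B.null bytes then Right () else Left bytes

-- Sum the chunk lengths; the stream is consumed inside the withFile
-- callback, so the handle is still open for every read.
countBytes :: FilePath -> IO Int
countBytes file = withFile file ReadMode go
  where
    go :: Handle -> IO Int
    go = S.sum_ . S.map B.length . fromHandle 4096
```

The chunk size of 4096 is carried over from the dump example; a larger value may reduce per-chunk overhead on big files, but that is worth measuring rather than assuming.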