Python binary EOF

Here is what I did. The call to read returns a falsy value when it encounters the end of the file, and this terminates the loop. Using while ch != "": copied the image but it gave me a hung loop.

from sys import argv
donor = argv[1]
recipient = argv[2]
# read from donor and write into recipient
# with statement ends, file gets closed
with open(donor, "rb") as fp_in:
    with open(recipient, "wb") as fp_out:
        ch = fp_in.read(1)
        while ch:
            fp_out.write(ch)
            ch = fp_in.read(1)

To quote the documentation:

file.read([size])

Read at most size bytes from the file (less if the read hits EOF before obtaining size bytes). If the size argument is negative or omitted, read all data until EOF is reached. The bytes are returned as a string object. An empty string is returned when EOF is encountered immediately. (For certain files, like ttys, it makes sense to continue reading after an EOF is hit.) Note that this method may call the underlying C function fread() more than once in an effort to acquire as close to size bytes as possible. Also note that when in non-blocking mode, less data than was requested may be returned, even if no size parameter was given.

That means (for a regular file):

  • f.read(1) will return a byte object containing either 1 byte or 0 byte is EOF was reached
  • f.read(2) will return a byte object containing either 2 bytes, or 1 byte if EOF is reached after the first byte, or 0 byte if EOF in encountered immediately.
  • ...

If you want to read your file one byte at a time, you will have to read(1) in a loop and test for "emptiness" of the result:

# From answer by @Daniel
with open(filename, 'rb') as f:
    while True:
        b = f.read(1)
        if not b:
            # eof
            break
        do_something(b)

If you want to read your file by "chunk" of say 50 bytes at a time, you will have to read(50) in a loop:

with open(filename, 'rb') as f:
    while True:
        b = f.read(50)
        if not b:
            # eof
            break
        do_something(b) # <- be prepared to handle a last chunk of length < 50
                        #    if the file length *is not* a multiple of 50

In fact, you may even break one iteration sooner:

with open(filename, 'rb') as f:
    while True:
        b = f.read(50)
        do_something(b) # <- be prepared to handle a last chunk of size 0
                        #    if the file length *is* a multiple of 50
                        #    (incl. 0 byte-length file!)
                        #    and be prepared to handle a last chunk of length < 50
                        #    if the file length *is not* a multiple of 50
        if len(b) < 50:
            break

Concerning the other part of your question:

Why does the container [..] contain [..] a whole bunch of them [bytes]?

Referring to that code:

for x in file:  
   i=i+1  
   print(x)  

To quote again the doc:

A file object is its own iterator, [..]. When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing).

The the code above read a binary file line-by-line. That is stopping at each occurrence of the EOL char (\n). Usually, that leads to chunks of various length as most binary files contains occurrences of that char randomly distributed.

I wouldn't encourage you to read a binary file that way. Please prefer one a solution based on read(size).


Reading byte-by-byte:

with open(filename, 'rb') as f:
    while True:
        b = f.read(1)
        if not b:
            # eof
            break
        do_something(b)

"" will signify the end of the file

with open(filename, 'rb') as f:
    for ch in iter(lambda: f.read(1),""): # keep calling f.read(1) until end of the data
        print ch

Tags:

Python

Binary

Eof