Efficiently remove the last two lines of an extremely large text file

I haven't tried this on a large file to see how fast it is, but it should be fairly quick.

To use the script to remove lines from the end of a file:

./shorten.py 2 large_file.txt

It seeks to the end of the file, checks to make sure the last character is a newline, then reads each character one at a time going backwards until it's found three newlines and truncates the file just after that point. The change is made in place.

Edit: I've added a Python 2.4 version at the bottom.

Here is a version for Python 2.5/2.6:

#!/usr/bin/env python2.5
from __future__ import with_statement
# also tested with Python 2.6

import os, sys

if len(sys.argv) != 3:
    print sys.argv[0] + ": Invalid number of arguments."
    print "Usage: " + sys.argv[0] + " linecount filename"
    print "to remove linecount lines from the end of the file"
    exit(2)

number = int(sys.argv[1])
file = sys.argv[2]
count = 0

with open(file,'r+b') as f:
    f.seek(0, os.SEEK_END)
    end = f.tell()
    while f.tell() > 0:
        f.seek(-1, os.SEEK_CUR)
        char = f.read(1)
        if char != '\n' and f.tell() == end:
            print "No change: file does not end with a newline"
            exit(1)
        if char == '\n':
            count += 1
        if count == number + 1:
            f.truncate()
            print "Removed " + str(number) + " lines from end of file"
            exit(0)
        f.seek(-1, os.SEEK_CUR)

if count < number + 1:
    print "No change: requested removal would leave empty file"
    exit(3)

Here's a Python 3 version:

#!/usr/bin/env python3.0

import os, sys

if len(sys.argv) != 3:
    print(sys.argv[0] + ": Invalid number of arguments.")
    print ("Usage: " + sys.argv[0] + " linecount filename")
    print ("to remove linecount lines from the end of the file")
    exit(2)

number = int(sys.argv[1])
file = sys.argv[2]
count = 0

with open(file,'r+b', buffering=0) as f:
    f.seek(0, os.SEEK_END)
    end = f.tell()
    while f.tell() > 0:
        f.seek(-1, os.SEEK_CUR)
        print(f.tell())
        char = f.read(1)
        if char != b'\n' and f.tell() == end:
            print ("No change: file does not end with a newline")
            exit(1)
        if char == b'\n':
            count += 1
        if count == number + 1:
            f.truncate()
            print ("Removed " + str(number) + " lines from end of file")
            exit(0)
        f.seek(-1, os.SEEK_CUR)

if count < number + 1:
    print("No change: requested removal would leave empty file")
    exit(3)

Here is a Python 2.4 version:

#!/usr/bin/env python2.4

import sys

if len(sys.argv) != 3:
    print sys.argv[0] + ": Invalid number of arguments."
    print "Usage: " + sys.argv[0] + " linecount filename"
    print "to remove linecount lines from the end of the file"
    sys.exit(2)

number = int(sys.argv[1])
file = sys.argv[2]
count = 0
SEEK_CUR = 1
SEEK_END = 2

f = open(file,'r+b')
f.seek(0, SEEK_END)
end = f.tell()

while f.tell() > 0:
    f.seek(-1, SEEK_CUR)
    char = f.read(1)
    if char != '\n' and f.tell() == end:
        print "No change: file does not end with a newline"
        f.close()
        sys.exit(1)
    if char == '\n':
        count += 1
    if count == number + 1:
        f.truncate()
        print "Removed " + str(number) + " lines from end of file"
        f.close()
        sys.exit(0)
    f.seek(-1, SEEK_CUR)

if count < number + 1:
    print "No change: requested removal would leave empty file"
    f.close()
    sys.exit(3)

you can try GNU head

head -n -2 file

I see my Debian Squeeze/testing systems (but not Lenny/stable) include a "truncate" command as part of the "coreutils" package.

With it you could simply do something like

truncate --size=-160 myfile

to remove 160 bytes from the end of the file (obviously you need to figure out exactly how many characters you need to remove).