Read a file line by line from S3 using boto?
I know this is a very old question, but nowadays we can just call iter_lines() on the streaming body of the response (here s3_conn is a boto3 S3 client):
s3_conn.get_object(Bucket=bucket, Key=key)['Body'].iter_lines()
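One thing to watch for: iter_lines() yields bytes with the line ending stripped, so text processing needs a decode step. Here is a minimal sketch, assuming the object is UTF-8 encoded. The helper names decode_lines and read_s3_lines are hypothetical, and boto3 is imported lazily so the decoding part can be tried without AWS credentials:

```python
def decode_lines(body, encoding="utf-8"):
    """Yield each line of a StreamingBody-like object as str."""
    # iter_lines() yields bytes with the line ending already stripped.
    for raw_line in body.iter_lines():
        yield raw_line.decode(encoding)


def read_s3_lines(bucket, key, encoding="utf-8"):
    """Stream an S3 object line by line (hypothetical helper)."""
    import boto3  # imported here so decode_lines works without boto3 installed

    s3_conn = boto3.client("s3")
    body = s3_conn.get_object(Bucket=bucket, Key=key)["Body"]
    yield from decode_lines(body, encoding)
```

Decoding per line is safe for UTF-8 because the newline bytes never occur inside a multi-byte sequence; other encodings (UTF-16, for example) would need a different approach.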
You may find https://pypi.python.org/pypi/smart_open useful for your task.
From the documentation:
for line in smart_open.smart_open('s3://mybucket/mykey.txt'):
    print(line)
The codecs module in the stdlib provides a simple way to decode a stream of bytes into a stream of text, and its stream reader can be iterated to retrieve that text line by line. It can be used with S3 without much hassle:
import codecs
import boto3
s3 = boto3.resource("s3")
s3_object = s3.Object('my-bucket', 'a/b/c.txt')
line_stream = codecs.getreader("utf-8")
for line in line_stream(s3_object.get()['Body']):
    print(line)
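The same reader works on any binary file-like object, which makes this easy to try locally. A minimal sketch that swaps in an in-memory io.BytesIO for the S3 body (that substitution is an assumption made purely for illustration):

```python
import codecs
import io

# Stand-in for s3_object.get()['Body']: any object whose read() returns bytes.
fake_body = io.BytesIO("h\u00e9llo\nworld\n".encode("utf-8"))

line_stream = codecs.getreader("utf-8")
# Iterating the reader yields decoded str lines, trailing newline included.
lines = [line for line in line_stream(fake_body)]
```

Note that each line keeps its trailing newline, so you may want line.rstrip("\n") before printing.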
Here's a solution which actually streams the data line by line:
from io import TextIOWrapper
from gzip import GzipFile
...
# get StreamingBody from botocore.response
response = s3.get_object(Bucket=bucket, Key=key)
# if gzipped
gzipped = GzipFile(None, 'rb', fileobj=response['Body'])
data = TextIOWrapper(gzipped)
for line in data:
    # process line
    pass
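To check the decompression pipeline without a bucket, here is a minimal sketch where an in-memory gzip payload stands in for response['Body'] (an assumption for illustration only). It also passes the encoding explicitly, since TextIOWrapper otherwise falls back to the locale default:

```python
import gzip
import io
from gzip import GzipFile
from io import TextIOWrapper

# Build a small gzip payload in memory; it stands in for response['Body'].
payload = gzip.compress("alpha\nbeta\n".encode("utf-8"))
fake_body = io.BytesIO(payload)

# Decompress lazily, then decode bytes to text on top of that.
gzipped = GzipFile(None, "rb", fileobj=fake_body)
data = TextIOWrapper(gzipped, encoding="utf-8")

# Lines are produced one at a time as the wrapper reads from the stream.
lines = [line.rstrip("\n") for line in data]
```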