How best to convert from azure blob csv format to pandas dataframe while running notebook in azure ml

I think you want to use get_blob_to_bytes, or get_blob_to_text; these should output a string which you can use to create a dataframe as

from io import StringIO
blobstring = blob_service.get_blob_to_text(CONTAINERNAME,BLOBNAME)
df = pd.read_csv(StringIO(blobstring))

Thanks for the answer, I think some correction is needed. You need to get content from the blob object and in the get_blob_to_text there's no need for the local file name.

from io import StringIO
blobstring = blob_service.get_blob_to_text(CONTAINERNAME,BLOBNAME).content
df = pd.read_csv(StringIO(blobstring))

The accepted answer will not work in the latest Azure Storage SDK. MS has rewritten the SDK completely. It's kind of annoying if you are using the old version and update it. The below code should work in the new version.

from azure.storage.blob import ContainerClient
from io import StringIO
import pandas as pd

conn_str = ""
container = ""
blob_name = ""

container_client = ContainerClient.from_connection_string(
    conn_str=conn_str, 
    container_name=container
    )
# Download blob as StorageStreamDownloader object (stored in memory)
downloaded_blob = container_client.download_blob(blob_name)

df = pd.read_csv(StringIO(downloaded_blob.content_as_text()))

How best to convert from azure blob csv format to pandas dataframe while running notebook in azure ml

Tags:

Python

Pandas

Azure

Azure Storage Blobs

Azure Machine Learning Studio

Related

Recent Posts