Pandas read_csv without knowing whether header is present

using new feature - selection by callable:

cols = ['Name','Sex']

df = (pd.read_csv(filename, header=None, names=cols)
      [lambda x: np.ones(len(x)).astype(bool)
                 if (x.iloc[0] != cols).all()
                 else np.concatenate([[False], np.ones(len(x)-1).astype(bool)])]
)

using .query() method:

df = (pd.read_csv(filename, header=None, names=cols)
        .query('Name != "Name" and Sex != "Sex"'))

i'm not sure that this is the most elegant way, but this should work as well:

df = pd.read_csv(filename, header=None, names=cols)

if (df.iloc[0] == cols).all():
    df = df[1:].reset_index(drop=True)

I've come up with a way of detecting the header without prior knowledge of its names:

if any(df.iloc[0].apply(lambda x: isinstance(x, str))):
    df = df[1:].reset_index(drop=True)

And by changing it slightly, it can update the current header with the detected one:

if any(df.iloc[0].apply(lambda x: isinstance(x, str))):
    df = df[1:].reset_index(drop=True).rename(columns=df.iloc[0])

This would allow easily selecting the desired behavior:

update_header = True

if any(df.iloc[0].apply(lambda x: isinstance(x, str))):
    new_header = df.iloc[0]

    df = df[1:].reset_index(drop=True)

    if update_header:
        df.rename(columns=new_header, inplace=True)

Pros:

  • Doesn't require prior knowledge of the header's names.
  • Can be used to update the header automatically if an existing one is detected.

Cons:

  • Won't work well if data contains strings. Replacing if any() to require all elements to be strings might help, unless data also contains entire rows of strings.

Tags:

Python

Pandas

Csv