KeyError when indexing Pandas dataframe
As mentioned by alko, it is probably an extra character at the beginning of your file.
When using read_csv, you can specify the encoding to handle both the file encoding and that leading character, known as the BOM (byte order mark):
df = pd.read_csv('values.csv', delimiter=',', encoding="utf-8-sig")
A similar question has come up on Stack Overflow: Pandas seems to ignore first column name when reading tab-delimited data, gives KeyError
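To see what the BOM does, here is a minimal sketch. The file content is made up for illustration (an in-memory byte string stands in for values.csv), and the first part uses only Python's own codecs to show the difference between the two encodings:

```python
import io
import pandas as pd

# Hypothetical file content: the first three bytes are the UTF-8 BOM.
raw = b"\xef\xbb\xbfDate,Open,Close\n2013-11-01,10.0,10.5\n"

# Plain "utf-8" keeps the BOM as a character; "utf-8-sig" drops it.
as_utf8 = raw.decode("utf-8")
as_utf8_sig = raw.decode("utf-8-sig")
print(repr(as_utf8[:5]))      # '\ufeffDate'
print(repr(as_utf8_sig[:5]))  # 'Date,'

# Reading with utf-8-sig gives a clean first column name.
df = pd.read_csv(io.BytesIO(raw), delimiter=",", encoding="utf-8-sig")
print(df.columns.tolist())    # ['Date', 'Open', 'Close']
```

Whether plain utf-8 leaves the BOM in the column name can depend on your pandas version, so passing utf-8-sig explicitly is the safer choice.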
It is almost always one of these reasons:
- You spelled the column name wrong
- There is leading/trailing whitespace
  - in this case, use df.columns = df.columns.str.strip() to remove it, or revisit your pd.read_csv (or other IO function) call to see if you can remove it while parsing the input
- Your column is not actually a column, but an index level
  - you can check the index level names using df.index.names to see if it is there. Calling .reset_index() before selecting the column should fix it.
- Your DataFrame does not have the column, at all
  - it was all just a figment of your imagination. Please turn off your system and take a nap.
Regardless of the reason, the first step is to stop what you're doing, run print(df.columns.tolist()), and eyeball the result to see which of these four possible reasons it could be.
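If you'd rather not eyeball it, the checks above can be sketched as a small helper. explain_missing_column is a hypothetical name, not a pandas function, and the sample frame is made up; treat it as a rough illustration rather than a complete diagnosis:

```python
import pandas as pd

def explain_missing_column(df, name):
    """Return a short hint about why df[name] might raise KeyError."""
    if name in df.columns:
        return "present"
    # Compare against the column names with surrounding whitespace removed.
    if name in [str(c).strip() for c in df.columns]:
        return "whitespace"
    # The name may belong to an index level rather than a column.
    if name in df.index.names:
        return "index level"
    return "absent"

# Hypothetical frame: ' Date ' has stray spaces and 'Open' is the index.
df = pd.DataFrame({" Date ": ["2020-01-01"], "Open": [1.0]}).set_index("Open")

print(explain_missing_column(df, "Date"))    # whitespace
print(explain_missing_column(df, "Open"))    # index level
print(explain_missing_column(df, "Volume"))  # absent

# Apply the two fixes from the list above.
fixed = df.reset_index()
fixed.columns = fixed.columns.str.strip()
print(explain_missing_column(fixed, "Date"))  # present
```

Note that a misspelled name and a column that does not exist both come back as "absent" here; only you can tell those two apart.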
You most likely have an extra character at the beginning of your file that is prepended to your first column name, 'Date'. Simply copying and pasting your output to a non-Unicode console produces:
Index([u'?Date', u'Open', u'High', u'Low', u'Close', u'Volume'], dtype='object')
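The ? above is just how the console renders a character it cannot display. Assuming that character is the BOM (U+FEFF), which is a guess based on the symptoms, here is a rough sketch of how to confirm it and strip it from a frame that is already loaded. The frame below is constructed by hand to mimic the output above:

```python
import pandas as pd

# Hypothetical frame whose first column name carries a BOM.
df = pd.DataFrame(columns=["\ufeffDate", "Open", "High", "Low", "Close", "Volume"])

# repr() makes the invisible character visible as an escape sequence.
print(repr(df.columns[0]))
print("Date" in df.columns)  # False

# Strip a leading BOM from every column name.
df = df.rename(columns=lambda c: c.lstrip("\ufeff"))
print("Date" in df.columns)  # True
```

This only patches the symptom after loading; fixing the encoding when reading the file is usually cleaner.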