Pandas df.itertuples renaming dataframe columns when printing

This happens because the column names contain spaces. If you rename the columns so that they contain no spaces, itertuples() keeps the names:

df.columns = ['us_qqq_equity', 'us_spy_equity']
# Or replace whitespace generically (courtesy @MaxU); regex=True is required in pandas >= 2.0:
# df.columns = df.columns.str.replace(r'\s+', '_', regex=True)
for r in df.head().itertuples():
    print(r)

# Pandas(Index='2017-06-19', us_qqq_equity=0.0, us_spy_equity=1.0)
# Pandas(Index='2017-06-20', us_qqq_equity=0.0, us_spy_equity=-1.0)
# ...

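Column names that contain spaces are not valid Python identifiers, so they cannot be used as named tuple fields. itertuples() therefore replaces them with positional names (_1, _2, ...) when it builds each row. The renaming happens at that point, not when printing, and the DataFrame's own column labels are left unchanged.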
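The renaming can be reproduced with collections.namedtuple directly. The snippet below is a minimal sketch: the DataFrame is hypothetical data shaped like the one in the question, and the Row type name is made up for illustration. It relies on namedtuple's rename=True option, which swaps invalid field names for an underscore followed by the field's position.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical data mirroring the frame discussed in this answer.
df = pd.DataFrame(
    {"us qqq equity": [0.0, 0.0], "us spy equity": [1.0, -1.0]},
    index=pd.to_datetime(["2017-06-19", "2017-06-20"]),
)

# With rename=True, field names that are not valid identifiers
# are replaced by "_<position>" instead of raising ValueError.
Row = namedtuple("Row", ["Index", "us qqq equity", "us spy equity"], rename=True)
fields = Row._fields

# The rows produced by itertuples() carry the same positional names.
first = next(df.itertuples())
row_fields = first._fields

print(fields)      # ('Index', '_1', '_2')
print(row_fields)  # ('Index', '_1', '_2')
print(first._2)    # value of the 'us spy equity' column in the first row
```

The values are still reachable through the positional attributes (first._1, first._2) or by ordinary tuple indexing.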

Interesting observation: of DataFrame.iterrows(), DataFrame.iteritems() and DataFrame.itertuples(), only the last one renames columns whose names contain spaces (note that iteritems() was removed in pandas 2.0 in favor of DataFrame.items()):

In [140]: df = df.head(3)

In [141]: list(df.iterrows())
Out[141]:
[(Timestamp('2017-06-19 00:00:00'), us qqq equity    0.0
  us spy equity    1.0
  Name: 2017-06-19 00:00:00, dtype: float64),
 (Timestamp('2017-06-20 00:00:00'), us qqq equity    0.0
  us spy equity   -1.0
  Name: 2017-06-20 00:00:00, dtype: float64),
 (Timestamp('2017-06-21 00:00:00'), us qqq equity    0.0
  us spy equity    0.0
  Name: 2017-06-21 00:00:00, dtype: float64)]

In [142]: list(df.iteritems())
Out[142]:
[('us qqq equity', date
  2017-06-19    0.0
  2017-06-20    0.0
  2017-06-21    0.0
  Name: us qqq equity, dtype: float64), ('us spy equity', date
  2017-06-19    1.0
  2017-06-20   -1.0
  2017-06-21    0.0
  Name: us spy equity, dtype: float64)]

In [143]: list(df.itertuples())
Out[143]:
[Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0),
 Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0),
 Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)]
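If renaming the columns is not an option, one possible workaround is to ask itertuples() for plain tuples and pair the values with the original labels yourself. This is only a sketch under assumptions: the DataFrame is hypothetical sample data, and records and label_for_field are illustrative names, not part of pandas.

```python
import pandas as pd

# Hypothetical data mirroring the three rows shown above.
df = pd.DataFrame(
    {"us qqq equity": [0.0, 0.0, 0.0], "us spy equity": [1.0, -1.0, 0.0]},
    index=pd.to_datetime(["2017-06-19", "2017-06-20", "2017-06-21"]),
)

# name=None makes itertuples() yield plain tuples, so no named tuple
# is built and no field renaming takes place.
records = [
    dict(zip(df.columns, values))
    for values in df.itertuples(index=False, name=None)
]

# Alternatively, map the positional field names back to the original labels.
# Assumes the default index=True, where position 0 is taken by 'Index'.
label_for_field = {f"_{i}": col for i, col in enumerate(df.columns, start=1)}

print(records[1]["us spy equity"])
print(label_for_field)
```

The dictionaries cost more per row than named tuples, so this trades some speed for keeping the original column names.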
