Filter pandas dataframe with specific column names in python
You can just put mylist inside [] and pandas will select those columns for you.
mydata_new = mydata[mylist]
Not sure whether the yyy in your list is a typo.
The reason your loop does not work is that it rebinds mydata_new to a new Series on every iteration.
for item in mylist:
    mydata_new = mydata[item]  # <- overwritten on each pass
Thus, you end up with a single Series (the last column in the list) rather than the whole df you want.
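To make the difference concrete, here is a minimal sketch. The data and the extra column named other are made up for illustration; only the names mydata, mylist and mydata_new come from the question.

```python
import pandas as pd

# Hypothetical data; column names mirror those used in this thread.
mydata = pd.DataFrame(
    {"yyy": [10, 9, 8], "nnn": [5, 3, 7], "mmm": [5, 4, 0], "other": [1, 2, 3]}
)
mylist = ["nnn", "mmm", "yyy"]

# Loop version: each pass rebinds mydata_new to one column (a Series),
# so only the last column in mylist survives.
for item in mylist:
    mydata_new = mydata[item]
loop_result_type = type(mydata_new).__name__  # "Series"
loop_result_name = mydata_new.name            # "yyy"

# List version: one indexing call returns a DataFrame with all listed columns.
mydata_new = mydata[mylist]
print(mydata_new.columns.tolist())
```

Indexing with a list returns the columns in the order given in the list, not in the order they appear in the original frame.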
If some names in the list are not in your data frame, you can always check for that with,
len(set(mylist) - set(mydata.columns)) > 0
and print it out
print(set(mylist) - set(mydata.columns))
Then check whether the missing names are typos or point to some other unintended mismatch.
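If you would rather keep the order of mylist when reporting and selecting (a set difference does not preserve order), one possible variant is a pair of list comprehensions. The data and the bogus name zzzzzz below are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data; "zzzzzz" is deliberately not a column.
mydata = pd.DataFrame({"yyy": [10, 9, 8], "nnn": [5, 3, 7], "mmm": [5, 4, 0]})
mylist = ["nnn", "mmm", "yyy", "zzzzzz"]

# Split the requested names into those that exist and those that do not,
# keeping the order in which they appear in mylist.
missing = [name for name in mylist if name not in mydata.columns]
present = [name for name in mylist if name in mydata.columns]

if missing:
    print("Not in the data frame:", missing)

# Select only the columns that actually exist.
mydata_new = mydata[present]
```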
Just pass a list of column names to index df:
df[['nnn', 'mmm', 'yyy']]
   nnn  mmm  yyy
0    5    5   10
1    3    4    9
2    7    0    8
If you need to handle non-existent column names in your list, try filtering with df.columns.isin:
df.loc[:, df.columns.isin(['nnn', 'mmm', 'yyy', 'zzzzzz'])]
   yyy  nnn  mmm
0   10    5    5
1    9    3    4
2    8    7    0
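The two approaches behave differently when a name is missing, which the following sketch tries to show. The frame is rebuilt here from the values printed above, with an assumed column order of yyy, nnn, mmm; the wanted list is a hypothetical name. On recent pandas versions, plain list indexing raises KeyError for an unknown label, while the isin mask skips it silently.

```python
import pandas as pd

# Frame rebuilt from the output shown in this answer (column order assumed).
df = pd.DataFrame({"yyy": [10, 9, 8], "nnn": [5, 3, 7], "mmm": [5, 4, 0]})
wanted = ["nnn", "mmm", "yyy", "zzzzzz"]

# Plain list indexing: a label that is not a column raises KeyError.
try:
    df[wanted]
    raised = False
except KeyError:
    raised = True

# Boolean mask: one True/False per existing column, so unknown names
# are simply ignored and the frame's own column order is kept.
mask = df.columns.isin(wanted)
subset = df.loc[:, mask]
print(subset.columns.tolist())
```

Because the mask is built from df.columns, the result follows the column order of df rather than the order of the list, which is why yyy comes first in the output above.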