How to get pandas.DataFrame columns containing specific dtype
There's a new feature in 0.14.1, select_dtypes, which selects columns by dtype given a list of dtypes to include or exclude.
For example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(1000),
                   'b': range(1000),
                   'c': ['a'] * 1000,
                   'd': pd.date_range('2000-1-1', periods=1000)})
df.select_dtypes(include=['float64', 'int64'])
Out[129]:
a b
0 0.153070 0
1 0.887256 1
2 -1.456037 2
3 -1.147014 3
...
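select_dtypes also takes an exclude list, which is handy when it's easier to name the dtypes you don't want. A small sketch against the df built above:

# Keep everything except the object-dtype column 'c'
df.select_dtypes(exclude=['object'])

# Or grab just the names of the numeric columns as a plain list
numeric_cols = df.select_dtypes(include=['float64', 'int64']).columns.tolist()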
df.dtypes is a pandas Series, which means it has index and values attributes. If you only need the column names:
headers = df.dtypes.index
This returns an Index holding the column names of the "df" dataframe (call .tolist() on it if you want a plain list).
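Since dtypes is a Series, you can also boolean-index it to answer the narrower question of which columns hold a given dtype; a quick sketch using the df from the first answer:

# Keep only the entries whose dtype is float64, then take their names
float_cols = df.dtypes[df.dtypes == 'float64'].index.tolist()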
Someone will possibly give you a better answer than this, but one thing I tend to do is: if all my numeric data are int64 or float64 objects, you can create a dict of the column data types and then use the values to create your list of columns.
So, for example, in a dataframe where I have columns of type float64, int64 and object, you can first look at the data types like so:
DF.dtypes
and if they conform to the standard whereby the non-numeric columns of data are all object types (as they are in my dataframes), then you can do the following to get a list of the numeric columns:
[key for key, value in DF.dtypes.items() if value in ['float64', 'int64']]
It's just a simple list comprehension, nothing fancy. Again, though, whether this works for you will depend upon how you set up your dataframe...
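Once you have that list, selecting the columns themselves is just an indexing step; a minimal sketch, with a hypothetical DF containing the three dtypes described above:

import numpy as np
import pandas as pd

# Toy frame with one float64, one int64 and one object column
DF = pd.DataFrame({'x': np.random.randn(5),
                   'y': range(5),
                   'z': list('abcde')})

numeric_cols = [key for key, value in DF.dtypes.items()
                if value in ['float64', 'int64']]
DF[numeric_cols]  # just the float64 and int64 columns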