How to select column and rows in pandas without column or row names?
If your DataFrame has no column or row labels and you want to select specific columns, use the iloc indexer, which selects by integer position.
For example, to select the first column and all rows:
df = dataset.iloc[:,0]
Here df holds the values stored in the first column of your DataFrame.
Remember that selecting a single column this way returns a Series, not a DataFrame:
type(df) -> pandas.core.series.Series
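The Series-versus-DataFrame distinction can be checked with a small sketch. The data here is made up for illustration, and `first_as_series` and `first_as_frame` are hypothetical names; the only point is that a scalar position gives a Series while a list of positions keeps a DataFrame.

```python
import pandas as pd

# Hypothetical unlabeled data, similar to what read_csv(..., header=None)
# would produce: pandas assigns default integer labels 0, 1, 2.
dataset = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

first_as_series = dataset.iloc[:, 0]    # scalar position -> Series
first_as_frame = dataset.iloc[:, [0]]   # list of positions -> DataFrame

print(type(first_as_series).__name__)   # Series
print(type(first_as_frame).__name__)    # DataFrame
print(first_as_frame.shape)             # (2, 1)
```

If later code expects a two-dimensional object, the list form is usually the safer choice.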
Hope it helps
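To select a column by its integer position (the .ix and .icol calls in this session only work on old pandas versions; .ix was removed in pandas 1.0, so use .iloc on current releases):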
In [19]: df
Out[19]:
                  Date     Close   Close.1   Close.2  Close.3   Close.4
0  2000-01-03 00:00:00       NaN       NaN       NaN      NaN -0.033944
1  2000-01-04 00:00:00       NaN       NaN       NaN      NaN  0.035137
2  2000-01-05 00:00:00 -0.033944       NaN       NaN      NaN -0.017241
3  2000-01-06 00:00:00  0.035137 -0.033944       NaN      NaN -0.004386
4  2000-01-07 00:00:00 -0.017241  0.035137 -0.033944      NaN  0.039648
In [20]: df.ix[:, 5]
Out[20]:
0 -0.033944
1 0.035137
2 -0.017241
3 -0.004386
4 0.039648
Name: Close.4, dtype: float64
In [21]: df.icol(5)
/usr/bin/ipython:1: FutureWarning: icol(i) is deprecated. Please use .iloc[:,i]
#!/usr/bin/python2
Out[21]:
0 -0.033944
1 0.035137
2 -0.017241
3 -0.004386
4 0.039648
Name: Close.4, dtype: float64
In [22]: df.iloc[:, 5]
Out[22]:
0 -0.033944
1 0.035137
2 -0.017241
3 -0.004386
4 0.039648
Name: Close.4, dtype: float64
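Since the question also asks about rows, here is a minimal sketch of the same positional indexing applied to rows and to a block of rows and columns. The `frame` data is invented for illustration.

```python
import pandas as pd

# Hypothetical unlabeled 3x3 data.
frame = pd.DataFrame([[10, 11, 12], [20, 21, 22], [30, 31, 32]])

first_row = frame.iloc[0]        # first row, returned as a Series
last_row = frame.iloc[-1]        # negative positions count from the end
block = frame.iloc[0:2, 1:3]     # rows 0-1 and columns 1-2 (end is exclusive)
cell = frame.iloc[1, 2]          # single value at row 1, column 2

print(block.values.tolist())     # [[11, 12], [21, 22]]
print(cell)                      # 22
```

Slices in iloc follow normal Python slicing, so the end position is not included.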
To select all columns except the one at a given position (here, position 5):
In [29]: df[[df.columns[i] for i in range(len(df.columns)) if i != 5]]
Out[29]:
                  Date     Close   Close.1   Close.2  Close.3
0  2000-01-03 00:00:00       NaN       NaN       NaN      NaN
1  2000-01-04 00:00:00       NaN       NaN       NaN      NaN
2  2000-01-05 00:00:00 -0.033944       NaN       NaN      NaN
3  2000-01-06 00:00:00  0.035137 -0.033944       NaN      NaN
4  2000-01-07 00:00:00 -0.017241  0.035137 -0.033944      NaN
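A shorter way to get the same result, sketched here as an alternative rather than as what the answer above used, is to look up the label at the unwanted position and pass it to drop. The `frame` data and the name `without_third` are made up for illustration.

```python
import pandas as pd

# Hypothetical data with four uniquely named columns.
frame = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["a", "b", "c", "d"])

# Look up the label at position 2, then drop that label.
# drop returns a new DataFrame; frame itself is left unchanged.
without_third = frame.drop(columns=frame.columns[2])

print(list(without_third.columns))  # ['a', 'b', 'd']
```

One caveat: drop works by label, so if several columns share the same name it removes all of them. With duplicated column names, a purely positional selection through iloc is the safer route.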
If you want the fifth column:
df.iloc[:, 4]
The colon takes all the rows for that column.
To exclude the fifth column, you could try:
df.iloc[:, [x for x in range(len(df.columns)) if x != 4]]
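Use iloc. It is explicitly a position-based indexer. ix can be both label-based and position-based, and it gets confused if the index is integer-based (it was removed in pandas 1.0).
For the fifth column, returned as a one-column DataFrame: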
df.iloc[:, [4]]
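The ambiguity with integer-based indexes can be seen by comparing the two unambiguous indexers, loc (labels) and iloc (positions). This is a small sketch on invented data whose row labels are integers in a shuffled order.

```python
import pandas as pd

# Hypothetical frame whose row labels are the integers 2, 0, 1.
frame = pd.DataFrame({"value": ["a", "b", "c"]}, index=[2, 0, 1])

by_label = frame.loc[0, "value"]     # row whose label is 0 -> 'b'
by_position = frame.iloc[0, 0]       # first row, first column -> 'a'

print(by_label, by_position)         # b a
```

The same integer 0 picks different rows depending on whether it is read as a label or as a position, which is why a dedicated positional indexer is preferable here.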
For all but the fifth column:
slc = list(range(df.shape[1]))
slc.remove(4)
df.iloc[:, slc]
or equivalently
df.iloc[:, [i for i in range(df.shape[1]) if i != 4]]
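Another variant, offered as a sketch and not as part of the answer above, builds a boolean mask over the column positions with NumPy and passes it to iloc. The `frame` data and the names `keep` and `all_but_fifth` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical unlabeled data with six columns (default labels 0..5).
frame = pd.DataFrame(np.arange(12).reshape(2, 6))

# True for every column position except position 4 (the fifth column).
keep = np.arange(frame.shape[1]) != 4
all_but_fifth = frame.iloc[:, keep]

print(all_but_fifth.columns.tolist())  # [0, 1, 2, 3, 5]
```

The mask form extends easily to several excluded positions, for example by combining it with np.isin.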