Pandas: Replacement for .ix

You can stay in the world of a single loc by getting at the index values you need by slicing that particular index with positions.

df.loc[
    df['cap'].astype(float) > 35,
    df.columns[:-1]
]

How about breaking this into a two-step indexing:

df[df['cap'].astype(float) > 35].iloc[:,:-1]

or even:

df[df['cap'].astype(float) > 35].drop('cap',1)

Generally, you would prefer to avoid chained indexing in pandas (though, strictly speaking, you're actually using two different indexing methods). You can't modify your dataframe this way (details in the docs), and the docs cite performance as another reason (indexing once vs. twice).

For the latter, it's usually insignificant (or rather, unlikely to be a bottleneck in your code), and actually seems to not be the case (at least in the following example):

df = pd.DataFrame(np.random.uniform(size=(100000,10)),columns = list('abcdefghij'))
# Get columns number 2:5 where value in 'a' is greater than 0.5 
# (i.e. Boolean mask along axis 0, position slice of axis 1)

# Deprecated .ix method
%timeit df.ix[df['a'] > 0.5,2:5]
100 loops, best of 3: 2.14 ms per loop

# Boolean, then position
%timeit df.loc[df['a'] > 0.5,].iloc[:,2:5]
100 loops, best of 3: 2.14 ms per loop

# Position, then Boolean
%timeit df.iloc[:,2:5].loc[df['a'] > 0.5,]
1000 loops, best of 3: 1.75 ms per loop

# .loc
%timeit df.loc[df['a'] > 0.5, df.columns[2:5]]
100 loops, best of 3: 2.64 ms per loop

# .iloc
%timeit df.iloc[np.where(df['a'] > 0.5)[0],2:5]
100 loops, best of 3: 9.91 ms per loop

Bottom line: If you really want to avoid .ix, and you're not intending to modify values in your dataframe, just go with chained indexing. On the other hand (the 'proper' but arguably messier way), if you do need to modify values, either do .iloc with np.where() or .loc with integer slices of df.index or df.columns.

Pandas: Replacement for .ix

Tags:

Python

Pandas

Indexing

Related

Recent Posts