pandas .at versus .loc
As you asked about the limitations of .at, here is one thing I recently ran into (using pandas 0.22). Let's use the example from the documentation:
import pandas as pd
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]], index=[4, 5, 6], columns=['A', 'B', 'C'])
df2 = df.copy()  # untouched copy, used for the .loc comparison further down
    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30
If I now do
df.at[4, 'B'] = 100
the result looks as expected
    A    B   C
4   0  100   3
5   0    4   1
6  10   20  30
However, when I try to do
df.at[4, 'C'] = 10.05
it seems that .at tries to preserve the datatype (here: int):
    A    B   C
4   0  100  10
5   0    4   1
6  10   20  30
This appears to differ from .loc:
df2.loc[4, 'C'] = 10.05
yields the desired result:
    A   B      C
4   0   2  10.05
5   0   4   1.00
6  10  20  30.00
The risky part of the example above is that the conversion from float to int happens silently. Trying the same with a string raises an error:
df.at[5, 'A'] = 'a_string'
ValueError: invalid literal for int() with base 10: 'a_string'
It will work, however, if one uses a string that int() can actually parse, as noted by @n1k31t4 in the comments, e.g.
df.at[5, 'A'] = '123'
     A   B   C
4    0   2   3
5  123   4   1
6   10  20  30
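One way to sidestep the silent truncation described above is to make the column's dtype explicit before writing the scalar. This is only a minimal sketch, not the only option. It assumes that converting the whole column C to float is acceptable, and the behaviour of the uncast assignment may differ between pandas versions:

```python
import pandas as pd

df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[4, 5, 6], columns=['A', 'B', 'C'])

# Assumption: it is fine for the whole column to become float64.
df['C'] = df['C'].astype(float)

# With a float column, .at has no integer dtype left to preserve.
df.at[4, 'C'] = 10.05

print(df['C'].dtype)
print(df.at[4, 'C'])
```

Checking df.dtypes before a scalar write is a cheap way to notice when the value and the column type do not match.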
Update: df.get_value is deprecated as of version 0.21.0. Using df.at or df.iat is the recommended method going forward.
df.at can only access a single value at a time.
df.loc can select multiple rows and/or columns.
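To make the distinction concrete, here is a small sketch that reuses the example frame from the pandas documentation (the variable names are just for illustration):

```python
import pandas as pd

df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[4, 5, 6], columns=['A', 'B', 'C'])

# .at takes exactly one row label and one column label and returns a scalar.
single = df.at[5, 'B']

# .loc also accepts lists (or slices) of labels and returns a DataFrame ...
block = df.loc[[4, 5], ['A', 'B']]

# ... or a Series when only a single row label is given.
row = df.loc[6]

print(single)
print(block)
print(row)
```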
Note that there is also df.get_value, which may be even quicker at accessing single values:
In [25]: %timeit df.loc[('a', 'A'), ('c', 'C')]
10000 loops, best of 3: 187 µs per loop
In [26]: %timeit df.at[('a', 'A'), ('c', 'C')]
100000 loops, best of 3: 8.33 µs per loop
In [35]: %timeit df.get_value(('a', 'A'), ('c', 'C'))
100000 loops, best of 3: 3.62 µs per loop
Under the hood, df.at[...] calls df.get_value, but it also does some type checking on the keys.
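If you want to reproduce the comparison outside IPython, a rough sketch with the standard timeit module could look like the following. It uses a small single-level frame rather than the MultiIndex frame from the timings above (which is not shown here), and it leaves out get_value because of the deprecation. Absolute numbers will depend on your machine and pandas version:

```python
import timeit

import pandas as pd

df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[4, 5, 6], columns=['A', 'B', 'C'])

n = 10_000  # number of repetitions per measurement

# Time the same scalar lookup through both accessors.
t_loc = timeit.timeit(lambda: df.loc[4, 'B'], number=n)
t_at = timeit.timeit(lambda: df.at[4, 'B'], number=n)

print(f".loc: {t_loc / n * 1e6:.2f} µs per call")
print(f".at:  {t_at / n * 1e6:.2f} µs per call")
```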