Difference between map, applymap and apply methods in Pandas

Adding to the other answers, in a Series there are also map and apply.

Apply can make a DataFrame out of a series; however, map will just put a series in every cell of another series, which is probably not what you want.

In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[31]:
0    1
1    2
2    3
dtype: int64

In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]: 
   0  1
0  1  1
1  2  2
2  3  3

In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]: 
0    0    1
1    1
dtype: int64
1    0    2
1    2
dtype: int64
2    0    3
1    3
dtype: int64
dtype: object

Also if I had a function with side effects, such as "connect to a web server", I'd probably use apply just for the sake of clarity.

series.apply(download_file_for_every_element) 

Map can use not only a function, but also a dictionary or another series. Let's say you want to manipulate permutations.

Take

1 2 3 4 5
2 1 4 5 3

The square of this permutation is

1 2 3 4 5
1 2 5 3 4

You can compute it using map. Not sure if self-application is documented, but it works in 0.15.1.

In [39]: p=pd.Series([1,0,3,4,2])

In [40]: p.map(p)
Out[40]: 
0    0
1    1
2    4
3    2
4    3
dtype: int64

Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommended this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:

In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [117]: frame
Out[117]: 
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548

In [118]: f = lambda x: x.max() - x.min()

In [119]: frame.apply(f)
Out[119]: 
b    1.133201
d    1.965980
e    2.829781
dtype: float64

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:

In [120]: format = lambda x: '%.2f' % x

In [121]: frame.applymap(format)
Out[121]: 
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31

The reason for the name applymap is that Series has a map method for applying an element-wise function:

In [122]: frame['e'].map(format)
Out[122]: 
Utah       1.28
Ohio      -1.55
Texas      0.20
Oregon    -0.31
Name: e, dtype: object

Summing up, apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.


Quick Summary

  • DataFrame.apply operates on entire rows or columns at a time.

  • DataFrame.applymap, Series.apply, and Series.map operate on one element at time.

Series.apply and Series.map are similar and often interchangeable. Some of their slight differences are discussed in osa's answer below.


Comparing map, applymap and apply: Context Matters

First major difference: DEFINITION

  • map is defined on Series ONLY
  • applymap is defined on DataFrames ONLY
  • apply is defined on BOTH

Second major difference: INPUT ARGUMENT

  • map accepts dicts, Series, or callable
  • applymap and apply accept callables only

Third major difference: BEHAVIOR

  • map is elementwise for Series
  • applymap is elementwise for DataFrames
  • apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depends on the function.

Fourth major difference (the most important one): USE CASE

  • map is meant for mapping values from one domain to another, so is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
  • applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
  • apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize)).

Also see When should I (not) want to use pandas apply() in my code? for a writeup I made a while back on the most appropriate scenarios for using apply (note that there aren't many, but there are a few— apply is generally slow).


Summarising

enter image description here

Footnotes

  1. map when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.

  2. applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.

  3. map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.

  4. Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.