Calculating the mode in a multimodal list in Python

In Python 2.7 and later, use collections.Counter to build the frequency table.

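from collections import Counter
from itertools import takewhile

data = [1, 1, 2, 3, 4, 4]
freq = Counter(data)
# most_common() returns (value, count) pairs sorted by count, highest first
mostfreq = freq.most_common()
# keep the leading pairs whose count equals the highest count
modes = list(takewhile(lambda x_f: x_f[1] == mostfreq[0][1], mostfreq))
# modes is a list of (value, count) pairs, here (1, 2) and (4, 2)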

Note the use of an anonymous function (lambda) that checks whether a pair (_, f) has the same frequency as the most frequent element.
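
Since modes above holds (value, count) pairs rather than bare values, a small wrapper can be handy. The following is a minimal sketch, not part of the original answer: the function name modes_via_counter is made up for illustration, and the tie order shown assumes Python 3.7 or later, where most_common() keeps equal counts in first-seen order.

```python
from collections import Counter
from itertools import takewhile


def modes_via_counter(data):
    """Return every value that shares the highest count (hypothetical helper)."""
    ranked = Counter(data).most_common()  # (value, count) pairs, highest count first
    if not ranked:
        return []  # empty input has no mode
    top_count = ranked[0][1]
    # Keep the leading pairs that tie with the top count, then drop the counts.
    return [value for value, count in takewhile(lambda pair: pair[1] == top_count, ranked)]


print(modes_via_counter([1, 1, 2, 3, 4, 4]))  # [1, 4]
```

The early return for empty input is a design choice of this sketch; without it, ranked[0] would raise an IndexError.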

Well, the first problem is that, yes, you're returning the value in frequencies rather than the key. That means you get the count of the mode, not the mode itself. Normally, to get the mode, you'd use the key keyword argument to max, like so:

>>> max(frequencies, key=frequencies.get)

But in Python 2.4 the key argument doesn't exist! Here's an approach that I believe will work in 2.4:

>>> import random
>>> l = [random.randrange(0, 5) for _ in range(50)]
>>> frequencies = {}
>>> for i in l:
...     frequencies[i] = frequencies.get(i, 0) + 1
... 
>>> frequencies
{0: 11, 1: 13, 2: 8, 3: 8, 4: 10}
>>> mode = max((v, k) for k, v in frequencies.iteritems())[1]
>>> mode
1
>>> max_freq = max(frequencies.itervalues())
>>> modes = [k for k, v in frequencies.iteritems() if v == max_freq]
>>> modes
[1]

I prefer the decorate-sort-undecorate idiom to the cmp keyword. I think it's more readable. Could be that's just me.
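
On Python 3, iteritems() and itervalues() are gone, so the dictionary approach above needs items() and values() instead. Here is a rough Python 3 sketch of the same idea, with modes_from_dict as a hypothetical name chosen for this example:

```python
def modes_from_dict(values):
    """Return all values tied for the highest frequency (hypothetical helper)."""
    frequencies = {}
    for item in values:
        frequencies[item] = frequencies.get(item, 0) + 1
    if not frequencies:
        return []  # max() would raise ValueError on an empty sequence
    max_freq = max(frequencies.values())
    # Collect every key whose count matches the maximum, not just one of them.
    return [k for k, v in frequencies.items() if v == max_freq]


print(modes_from_dict([0, 1, 1, 2, 3, 3]))  # [1, 3]
```

The result order follows dictionary insertion order, which is guaranteed from Python 3.7 onward.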

Note that starting in Python 3.8, the standard library includes the statistics.multimode function to return a list of the most frequently occurring values in the order they were first encountered:

from statistics import multimode

multimode([1, 1, 2, 3, 4, 4])
# [1, 4]
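
Two further behaviours are worth knowing, both described in the statistics documentation: multimode accepts any iterable of hashable values, not just numbers, and it returns an empty list for empty input instead of raising. A short illustration (the variable names are arbitrary):

```python
from statistics import multimode

# Strings are iterated character by character; three letters tie at four occurrences.
letters = multimode("aabbbbccddddeeffffgg")
print(letters)  # ['b', 'd', 'f']

# Empty input yields an empty list rather than an exception.
nothing = multimode([])
print(nothing)  # []
```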
