Counting repeated characters in a string in Python
Python 2.7+ includes the collections.Counter class:
import collections
results = collections.Counter(the_string)
print(results)
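Since the question is about repeated characters specifically, the Counter can be filtered down to the entries seen more than once. This is only a minimal sketch: the sample string and the name `repeated` are illustrative assumptions, not part of the original answer.

```python
import collections

# Sample input chosen for illustration only.
the_string = "mississippi"

results = collections.Counter(the_string)

# Keep only the characters that occur more than once.
repeated = {char: n for char, n in results.items() if n > 1}

# most_common() yields (char, count) pairs, highest count first.
for char, n in results.most_common():
    if n > 1:
        print('%s %d' % (char, n))
```

Here `repeated` ends up holding 'i', 's' and 'p' but not 'm', which appears only once.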
My first idea was to do this:
chars = "abcdefghijklmnopqrstuvwxyz"
check_string = "i am checking this string to see how many times each character appears"
for char in chars:
    # each call to count() scans the whole string
    count = check_string.count(char)
    if count > 1:
        print('%s %d' % (char, count))
This is not a good idea, however! Each call to count() scans the whole string, so this loop scans it 26 times and potentially does 26 times more work than some of the other answers. You really should do this:
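count = {}
# single pass: tally each character as it is seen
for s in check_string:
    if s in count:
        count[s] += 1
    else:
        count[s] = 1
# report only the characters that repeat
for key in count:
    if count[key] > 1:
        print('%s %d' % (key, count[key]))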
This ensures that you only go through the string once, instead of 26 times.
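One difference between the two snippets above is worth noting: the first only looks at the 26 lowercase letters, while the single-pass version counts every character, spaces included. The sketch below wraps both approaches in hypothetical helpers (`count_by_scanning` and `count_single_pass` are names made up for this illustration) and checks that they agree on the letters.

```python
import string


def count_by_scanning(text, alphabet=string.ascii_lowercase):
    """Call text.count() once per alphabet character (26 scans by default)."""
    counts = {}
    for ch in alphabet:
        n = text.count(ch)  # each call scans the whole string
        if n:
            counts[ch] = n
    return counts


def count_single_pass(text):
    """Walk the string once, tallying every character seen."""
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    return counts


check_string = "i am checking this string to see how many times each character appears"
scanned = count_by_scanning(check_string)
single = count_single_pass(check_string)

# The single-pass result also contains non-letters such as the space,
# so restrict it to lowercase letters before comparing.
letters_only = {ch: n for ch, n in single.items() if ch in string.ascii_lowercase}
print(scanned == letters_only)
print(' ' in single, ' ' in scanned)
```

If the space should not be reported, it can be filtered out when printing, or skipped inside the loop.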
Also, Alex's answer is a great one - I was not familiar with the collections module. I'll be using that in the future. His answer is more concise than mine is and technically superior. I recommend using his code over mine.
import collections
d = collections.defaultdict(int)
for c in thestring:
    d[c] += 1
A collections.defaultdict is like a dict (subclasses it, actually), but when an entry is sought and not found, instead of reporting it doesn't have it, it makes it and inserts it by calling the supplied 0-argument callable. Most popular are defaultdict(int), for counting (or, equivalently, to make a multiset AKA bag data structure), and defaultdict(list), which does away forever with the need to use .setdefault(akey, []).append(avalue) and similar awkward idioms.
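To make the defaultdict(list) point concrete, here is a small sketch that groups words by their first letter, once with the setdefault idiom and once with defaultdict(list). The word list and the variable names are assumptions made up for the illustration.

```python
import collections

# Sample data for illustration only.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Plain dict: setdefault inserts an empty list the first time a key is seen.
plain = {}
for word in words:
    plain.setdefault(word[0], []).append(word)

# defaultdict(list): a missing key is created by calling list().
grouped = collections.defaultdict(list)
for word in words:
    grouped[word[0]].append(word)

# Both approaches build the same mapping.
print(plain == grouped)
```

One thing to keep in mind: merely looking up a missing key in a defaultdict inserts it, so use `in` or .get() when you only want to test for membership.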
So once you've done this, d is a dict-like container mapping every character to the number of times it appears, and you can emit it any way you like, of course. For example, most-popular character first:
for c in sorted(d, key=d.get, reverse=True):
    print('%s %6d' % (c, d[c]))