Count occurrences of a couple of specific words
Make a dict-typed frequency table for your words, then iterate over the words in your string.
import re

vocab = ["foo", "bar", "baz"]
s = "foo bar baz bar quux foo bla bla"
# Start every vocabulary word at zero so absent words still show up in the table.
wordcount = dict((x, 0) for x in vocab)
for w in re.findall(r"\w+", s):
    if w in wordcount:
        wordcount[w] += 1
Edit: if the "words" in your list contain whitespace, you can instead build an RE out of them:
import re
from collections import Counter

vocab = ["foo bar", "baz"]
# s is the same string as in the first snippet.
r = re.compile("|".join(r"\b%s\b" % w for w in vocab))
wordcount = Counter(r.findall(s))
Explanation: this builds the RE r'\bfoo bar\b|\bbaz\b' from the vocabulary. findall then finds the list ['foo bar', 'baz'] (matches come back in the order they occur in s), and the Counter (Python 2.7+) counts the occurrences of each distinct element in it. Note that your list of words should not contain characters that are special to REs, such as ()[]\.
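The caveat about special characters can be sidestepped by passing each phrase through re.escape before joining. The following is a minimal sketch, not part of the original answer: the helper name count_phrases and the sample phrase "a.b" are made up for illustration, and sorting the phrases longest-first is an added precaution so that a short phrase does not shadow a longer one that starts with it. It still assumes every phrase begins and ends with a word character, since \b only matches at a word boundary.

```python
import re
from collections import Counter

def count_phrases(vocab, text):
    # Hypothetical helper: escape each phrase so characters such as
    # ( ) [ ] \ . are matched literally instead of being read as RE syntax.
    # Longest phrases come first so "foo bar" is tried before a bare "foo".
    ordered = sorted(vocab, key=len, reverse=True)
    pattern = re.compile("|".join(r"\b%s\b" % re.escape(w) for w in ordered))
    return Counter(pattern.findall(text))

counts = count_phrases(["foo bar", "baz", "a.b"], "foo bar baz bar quux a.b axb")
print(counts)
```

Here "axb" is not counted, because the escaped dot in "a.b" only matches a literal dot.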
Presuming the words need to be found separately (that is, you want to count words as made by str.split()):
Edit: as suggested in the comments, a Counter is a good option here:
from collections import Counter

def count_many(needles, haystack):
    # Count every whitespace-separated word, then keep only the requested ones.
    count = Counter(haystack.split())
    return {key: count[key] for key in count if key in needles}
Which runs like so:
count_many(["foo", "bar", "baz"], "testing somefoothing foo bar baz bax foo foo foo bar bar test bar test")
{'baz': 1, 'foo': 4, 'bar': 4}
Note that on Python versions before 2.7, which lack dict comprehensions, you will need to use return dict((key, count[key]) for key in count if key in needles) instead.
Of course, another option is to simply return the whole Counter object and only get the values you need when you need them, as it may not be a problem to have the extra values, depending on the situation.
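As a rough sketch of that last option (the function name count_words is hypothetical, not from the answer), returning the full Counter and looking words up on demand could look like this. A Counter returns 0 for a word it never saw, so no membership check is needed:

```python
from collections import Counter

def count_words(haystack):
    # Hypothetical helper: count every whitespace-separated word once
    # and let the caller look up whichever words it cares about.
    return Counter(haystack.split())

counts = count_words("testing somefoothing foo bar baz bax foo foo foo bar bar test bar test")
print(counts["foo"])   # 4
print(counts["quux"])  # 0, missing words count as zero
```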
Old answer:
from collections import defaultdict

def count_many(needles, haystack):
    # Missing keys start at int() == 0, so no initialisation is needed.
    count = defaultdict(int)
    for word in haystack.split():
        if word in needles:
            count[word] += 1
    return count
Which results in:
count_many(["foo", "bar", "baz"], "testing somefoothing foo bar baz bax foo foo foo bar bar test bar test")
defaultdict(<class 'int'>, {'baz': 1, 'foo': 4, 'bar': 4})
If you greatly object to getting a defaultdict back (which you shouldn't, as it functions exactly the same as a dict when accessing), then you can do return dict(count) instead to get a normal dictionary.
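A small illustration of the conversion, using made-up sample data. One difference that may matter in practice: indexing a missing key on a defaultdict inserts that key with the default value, whereas the plain dict copy is left untouched.

```python
from collections import defaultdict

count = defaultdict(int)
for word in "foo bar foo".split():
    count[word] += 1

plain = dict(count)              # ordinary dict with the same contents
print(plain)                     # {'foo': 2, 'bar': 1}

print(count["missing"])          # 0, and "missing" is now a key in count
print("missing" in count)        # True
print(plain.get("missing", 0))   # 0, plain itself is unchanged
```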