How to efficiently calculate prefix sum of frequencies of characters in a string?
You can do it in one line using itertools.accumulate
and collections.Counter
:
from collections import Counter
from itertools import accumulate
s = 'AAABBBCAB'
psum = list(accumulate(map(Counter, s)))
This gives you a list of Counter
objects. Now, to get frequencies for any substring of s
in O(1) time, you can simply subtract counters, e.g.:
>>> psum[6] - psum[1] # get frequencies for s[2:7]
Counter({'B': 3, 'A': 1, 'C': 1})
Simplest would be to use the Counter object from collections.
from collections import Counter
s = 'AAABBBCAB'
[ dict(Counter(s[:i]) for i in range(1,len(s))]
Yields:
[{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2},
{'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1}]
this is an option:
from collections import Counter
c = Counter()
s = 'AAABBBCAB'
psum = []
for char in s:
c.update(char)
psum.append(dict(c))
# [{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2},
# {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1},
# {'A': 4, 'B': 4, 'C': 1}]
i use collections.Counter
in order to keep a 'running sum' and add (a copy of the result) to the list psum
. this way i iterate once only over the string s
.
if you prefer to have collections.Counter
objects in your result, you could change the last line to
psum.append(c.copy())
in order to get
[Counter({'A': 1}), Counter({'A': 2}), ...
Counter({'A': 4, 'B': 4, 'C': 1})]
the same result could also be achieved with this (using accumulate
was first proposed in Eugene Yarmash's answer; i just avoid map
in favour of a generator expression):
from itertools import accumulate
from collections import Counter
s = "AAABBBCAB"
psum = list(accumulate(Counter(char) for char in s))
just for completeness (as there is no 'pure dict
' answer here yet). if you do not want to use Counter
or defaultdict
you could use this as well:
c = {}
s = 'AAABBBCAB'
psum = []
for char in s:
c[char] = c.get(char, 0) + 1
psum.append(c.copy())
although defaultdict
is usually more performant than dict.get(key, default)
.
You actually don't even need a counter for this, just a defaultdict would suffice!
from collections import defaultdict
c = defaultdict(int)
s = 'AAABBBCAB'
psum = []
#iterate through the character
for char in s:
#Update count for each character
c[char] +=1
#Add the updated dictionary to the output list
psum.append(dict(c))
print(psum)
The output looks like
[{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1},
{'A': 3, 'B': 2}, {'A': 3, 'B': 3},
{'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1},
{'A': 4, 'B': 4, 'C': 1}]