How to efficiently calculate prefix sum of frequencies of characters in a string?

You can do it in one line using itertools.accumulate and collections.Counter:

from collections import Counter
from itertools import accumulate

s = 'AAABBBCAB'
psum = list(accumulate(map(Counter, s)))

This gives you a list of Counter objects. Now, to get frequencies for any substring of s in O(1) time, you can simply subtract counters, e.g.:

>>> psum[6] - psum[1]  # get frequencies for s[2:7]
Counter({'B': 3, 'A': 1, 'C': 1})

Simplest would be to use the Counter object from collections.

from collections import Counter

s = 'AAABBBCAB'

[ dict(Counter(s[:i]) for i in range(1,len(s))]

Yields:

[{'A': 1},  {'A': 2},  {'A': 3},  {'A': 3, 'B': 1},  {'A': 3, 'B': 2},
{'A': 3, 'B': 3},  {'A': 3, 'B': 3, 'C': 1},  {'A': 4, 'B': 3, 'C': 1}]

this is an option:

from collections import Counter

c = Counter()
s = 'AAABBBCAB'

psum = []
for char in s:
    c.update(char)
    psum.append(dict(c))

# [{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2}, 
#  {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1},
#  {'A': 4, 'B': 4, 'C': 1}]

i use collections.Counter in order to keep a 'running sum' and add (a copy of the result) to the list psum. this way i iterate once only over the string s.

if you prefer to have collections.Counter objects in your result, you could change the last line to

psum.append(c.copy())

in order to get

[Counter({'A': 1}), Counter({'A': 2}), ...
 Counter({'A': 4, 'B': 4, 'C': 1})]

the same result could also be achieved with this (using accumulate was first proposed in Eugene Yarmash's answer; i just avoid map in favour of a generator expression):

from itertools import accumulate
from collections import Counter

s = "AAABBBCAB"
psum = list(accumulate(Counter(char) for char in s))

just for completeness (as there is no 'pure dict' answer here yet). if you do not want to use Counter or defaultdict you could use this as well:

c = {}
s = 'AAABBBCAB'

psum = []
for char in s:
    c[char] = c.get(char, 0) + 1
    psum.append(c.copy())

although defaultdict is usually more performant than dict.get(key, default).

You actually don't even need a counter for this, just a defaultdict would suffice!

from collections import defaultdict

c = defaultdict(int)
s = 'AAABBBCAB'

psum = []

#iterate through the character
for char in s:
    #Update count for each character
    c[char] +=1
    #Add the updated dictionary to the output list
    psum.append(dict(c))

print(psum)

The output looks like

[{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, 
{'A': 3, 'B': 2}, {'A': 3, 'B': 3}, 
{'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1}, 
{'A': 4, 'B': 4, 'C': 1}]

How to efficiently calculate prefix sum of frequencies of characters in a string?

Tags:

Python

String

Python 3.X

Related

Recent Posts