How to find the minimum number of operation(s) to make the string balanced?
I don't think you really need dynamic programming here.
One approach in O(length(S)) time:
- Iterate over S, build a frequency-map (a mapping from distinct letters A–Z to counts). For your
ABCB
example, that would beA->1 B->2 C->1 D->0 E->0 ... Z->0
, which we can represent as the array[1, 2, 1, 0, 0, ..., 0]
.- We can do this because we don't actually care about the positions of letters at all; there's no real difference between
ABCB
andABBC
, in that each can be balanced by replacing itsC
with anA
.
- We can do this because we don't actually care about the positions of letters at all; there's no real difference between
- Sort the array. For your example, that gives
[0, 0, ..., 0, 1, 1, 2]
.- We can do this because we don't actually care about which letter was which; there's no real difference between
ABCB
andABDB
, in that each can be balanced by replacing one of its singleton letters with its other one.
- We can do this because we don't actually care about which letter was which; there's no real difference between
- Assuming the string is nonempty (since if it's empty then the answer is just 0), the final balanced string will contain at least 1 and at most 26 distinct letters. For each integer i between 1 and 26, determine how many operations you'd need to perform in order to produce a balanced string with i distinct letters:
- First, check that length(S) is a multiple of i; if not, this isn't possible, so skip to the next integer.
- Next, compute length(S)/i, the count of each distinct letter in the final balanced string.
- To count how many operations need to be performed, we're going to examine all the counts that need to be increased. (We could, equivalently, examine the counts that need to be decreased: they'll have to match.)
- We're only interested in the last i elements of the sorted array. Any elements before that point represent letters that won't occur in the balanced string: either the counts are already zero and will remain so, or they are nonzero but will be decreased to zero. Either way, since we're only tracking increases, we can ignore them.
- For each of the last i counts that are less than length(S)/i, add the difference. This sum is the number of operations. (Note that, since the counts are sorted, you can exit this inner loop as soon as you get to a count that's greater than or equal to the target count.)
- You can exit this loop after the first i that's greater than or equal to the number of distinct letters in the original S (aside from values of i that we had to skip because they don't evenly divide length(S)). For example, if length(S) = 100, and the original S has 19 distinct letters, then we only need to consider i as high as 20. (Hat-tip to Eric Wang for suggesting something along these lines.)
- Return the least of these up-to-26 sums. (Note that you don't actually need to store all the sums; you can just keep track of the minimum.)