Memoization algorithm time complexity
First, a given string of length N can be broken into N*(N-1)/2 segments, and we check whether each segment is contained in the dictionary. That is O(N^2) checks.
Coming back to your code: starting with the string of length N, we break it into two smaller strings of length a and N - a. Each prefix, starting at index 0 and ending at index a, is checked only once!
For each suffix of length N - a, we likewise check each of its prefixes only once and store the result in the memoization table. So the next time we make the same move and split the string at exactly this place, we only need to return the stored result, with no further work (assuming the map retrieves the result for a particular string in O(1)). This store-and-retrieve step also guarantees that the divide-and-conquer work for each suffix is done only once.
So, combining the first and second points, we conclude that there are only N*(N - 1)/2 segments, each examined only once, which means the cost is O(N^2).
Note: the complexity is O(N^2) only under the assumption that both dict.contains(input) and memoized.containsKey(input) cost O(1).
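The code being analyzed is not reproduced here, so the following is only a minimal sketch of a memoized word-break routine consistent with the calls mentioned above (dict.contains(input), memoized.containsKey(input)); the class name WordBreak, the field names, and the LinkedHashMap choice are assumptions:

import java.util.*;

// Minimal sketch, assuming the routine under discussion looks roughly like this.
class WordBreak {
    final Set<String> dict;
    // LinkedHashMap only so the insertion (caching) order can be inspected later.
    final Map<String, Boolean> memoized = new LinkedHashMap<>();

    WordBreak(Set<String> dict) {
        this.dict = dict;
    }

    boolean wordBreak(String input) {
        if (input.isEmpty()) {
            return true;                      // an empty suffix is trivially segmentable
        }
        if (memoized.containsKey(input)) {
            return memoized.get(input);       // assumed O(1) lookup
        }
        boolean result = false;
        // Try every split point a: the prefix [0, a) must be a dictionary word,
        // and the remaining suffix must itself be breakable.
        for (int a = 1; a <= input.length(); a++) {
            if (dict.contains(input.substring(0, a)) && wordBreak(input.substring(a))) {
                result = true;
                break;
            }
        }
        memoized.put(input, result);          // cache the answer for this suffix
        return result;
    }
}

Each suffix is solved at most once, and solving a suffix of length L tries L prefixes, which is where the N*(N-1)/2-ish count of segments comes from.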
Intuition
(The worst case is clearly the one where there is no solution, so I focus on that.)
Since the recursive call is made before putting values in the memoization cache, the last (shortest) suffixes get cached first. This is because the function is first called with a string of length N, which then calls itself with a string of length N-1, and so on, down to a string of length 0, which is cached and returns; then the string of length 1 is cached and returns, ..., until the string of length N is cached and returns.
As the hint suggests, only suffixes get cached, and there are only N of them. This means that by the time the top-level function gets the result of its first recursive call (i.e. on a suffix of length N-1), the cache is already populated with the N-1 suffixes.
Now, assuming the N-1 last suffixes are already cached, the for-loop needs to make N-1 recursive calls, each taking O(1) (since the answer is already cached), totalling O(N). However, (pre-)building the cache of the last N-1 suffixes takes O(N^2) (explained below), for a total of O(N) + O(N^2) = O(N^2).
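To see this caching order concretely, one can run the sketch shown earlier (the WordBreak class, which is itself an assumption about the code's shape) on a hypothetical worst-case input with no solution and print the cache keys in insertion order; the input "aaab" and dictionary {"a", "aa", "aaa"} are made up for illustration:

import java.util.*;

public class CacheOrderDemo {
    public static void main(String[] args) {
        // Hypothetical worst case: "aaab" has no segmentation over {"a", "aa", "aaa"}.
        WordBreak wb = new WordBreak(new HashSet<>(Arrays.asList("a", "aa", "aaa")));
        System.out.println(wb.wordBreak("aaab"));   // false
        // Insertion order of the LinkedHashMap: shortest suffixes were cached first.
        System.out.println(wb.memoized.keySet());   // [b, ab, aab, aaab]
    }
}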
Proof by mathematical induction
This explanation can easily be translated to a formal proof using induction. Here is the gist of it:
(f(N) being the number of operations required for the function to complete on an input of length N)
Induction hypothesis -- there exists a constant c s.t. f(N) < c*N^2.
The base case is trivial -- for a string of length 1, you can find a constant c such that f(1) < c*1^2 = c.
Induction step
Recap of the order in which things happen:
Step 1: the function first calls itself on the suffix of length N-1, building a cache containing the answer for the last N-1 suffixes
Step 2: the function then calls itself O(N) more times, each taking O(1) (thanks to the test if (memoized.containsKey(input)), and the fact that the cache has already been populated in Step 1).
So we get:
f(N+1) = f(N) + a*N
       < c*N^2 + a*N         (by the hypothesis)
       < c*N^2 + 2*c*N + c   (for c large enough)
       = c*(N+1)^2
Thus we get f(N+1) < c*(N+1)^2, which completes the proof.
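For completeness, the "for c large enough" step can be pinned down: any c >= a/2 works, where a is the constant hidden in the O(N) work of Step 2 (this particular bound is my own spelling-out of the step, not in the original answer). In LaTeX:

\[
  f(N+1) \;=\; f(N) + aN \;<\; cN^2 + aN \;\le\; cN^2 + 2cN \;<\; cN^2 + 2cN + c \;=\; c(N+1)^2,
  \qquad \text{whenever } c \ge \tfrac{a}{2}.
\]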
In general, the dimensions of the memoization table decide the complexity.
For a memoization table of 1 dimension (memoized[n]) the complexity is O(n); for a memoization table of 2 dimensions (memoized[n][n]) it is O(n^2); and so on (assuming each entry takes O(1) work to fill once its sub-problems are cached).
Reason: with memoization, the worst-case complexity is the running time on inputs for which none of the overlapping sub-problems have been cached (pre-calculated) yet.
Now, let's say the memoization table has 2 dimensions (memoization[n][n]). In the worst case every cell of the table is filled at most once, so the running time is bounded by the table's size, i.e. it can only reach up to O(n^2).
Thus the dimensions of the memoization table essentially decide the worst-case time complexity.
@shx2's answer explains this mathematically.
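As a rough code illustration of this counting argument (a sketch, not the code under discussion; the index-based formulation, names, and inputs are assumptions): memoizing on the start index gives n table entries, and filling each entry scans up to n end indices, so the total work corresponds to the n-by-n grid of (start, end) segments, i.e. O(n^2):

import java.util.*;

class WordBreakIndexMemo {
    // memo[start]: can input.substring(start) be segmented? null means not computed yet.
    static boolean canBreak(String input, Set<String> dict, Boolean[] memo, int start) {
        if (start == input.length()) {
            return true;                      // nothing left to segment
        }
        if (memo[start] != null) {
            return memo[start];               // assumed O(1) table lookup
        }
        boolean result = false;
        // Over the whole run, each (start, end) pair is examined at most once:
        // n possible starts, up to n ends per start -> O(n^2) dictionary checks.
        for (int end = start + 1; end <= input.length(); end++) {
            if (dict.contains(input.substring(start, end)) && canBreak(input, dict, memo, end)) {
                result = true;
                break;
            }
        }
        memo[start] = result;
        return result;
    }

    public static void main(String[] args) {
        String s = "aaab";
        Set<String> dict = new HashSet<>(Arrays.asList("a", "aa", "aaa"));
        System.out.println(canBreak(s, dict, new Boolean[s.length()], 0));   // false
    }
}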