Review an answer - Decode Ways

This is a really interesting problem. First, I will show how I would solve this problem. We will see that it is not that complicated when using recursion, and that the problem can be solved using dynamic programming. We will produce a general solution that does not hardcode an upper limit of 26 for each code point.

A note on terminology: I will use the term code point (CP) not in the Unicode sense, but to refer to one of the code numbers 1 though 26. Each code point is represented as a variable number of characters. I will also use the terms encoded text (ET) and clear text (CT) in their obvious meanings. When talking about a sequence or array, the first element is called the head. The remaining elements are the tail.

Theoretical Prelude

The EC "" has one decoding: the CT "".
The EC "3" can be destructured into '3' + "", and has one decoding.
The EC "23" can be destructured as '2' + "3" or '23' + "". Each of the tails has one decoding, so the whole EC has two decodings.
The EC "123" can be destructured as '1' + "23" or '12' + "3". The tails have two and one decodings respectively. The whole EC has three decodings. The destructuring '123' + "" is not valid, because 123 > 26, our upper limit.
… and so on for ECs of any length.

So given a string like "123", we can obtain the number of decodings by finding all valid CPs at the beginning, and summing up the number of decodings of each tail.

The most difficult part of this is to find valid heads. We can get the maximal length of the head by looking at a string representation of the upper limit. In our case, the head can be up to two characters long. But not all heads of appropriate lengths are valid, because they have to be ≤ 26 as well.

Naive Recursive Implementation

Now we have done all the necessary work for a simple (but working) recursive implementation:

static final int upperLimit  = 26;
static final int maxHeadSize = ("" + upperLimit).length();

static int numDecodings(String encodedText) {
    // check base case for the recursion
    if (encodedText.length() == 0) {
        return 1;
    }

    // sum all tails
    int sum = 0;
    for (int headSize = 1; headSize <= maxHeadSize && headSize <= encodedText.length(); headSize++) {
        String head = encodedText.substring(0, headSize);
        String tail = encodedText.substring(headSize);
        if (Integer.parseInt(head) > upperLimit) {
            break;
        }
        sum += numDecodings(tail);
    }

    return sum;
}

Cached Recursive Implementation

Obviously this isn't very efficient, because (for longer ETs), the same tail will be analyzed multiple times. Also, we create a lot of temporary strings, but we'll let that be for now. One thing we can easily do is to memoize the number of decodings of a specific tail. For that, we use an array of the same length as the input string:

static final int upperLimit  = 26;
static final int maxHeadSize = ("" + upperLimit).length();

static int numDecodings(String encodedText) {
    return numDecodings(encodedText, new Integer[1 + encodedText.length()]);
}

static int numDecodings(String encodedText, Integer[] cache) {
    // check base case for the recursion
    if (encodedText.length() == 0) {
        return 1;
    }

    // check if this tail is already known in the cache
    if (cache[encodedText.length()] != null) {
        return cache[encodedText.length()];
    }

    // cache miss -- sum all tails
    int sum = 0;
    for (int headSize = 1; headSize <= maxHeadSize && headSize <= encodedText.length(); headSize++) {
        String head = encodedText.substring(0, headSize);
        String tail = encodedText.substring(headSize);
        if (Integer.parseInt(head) > upperLimit) {
            break;
        }
        sum += numDecodings(tail, cache);  // pass the cache through
    }

    // update the cache
    cache[encodedText.length()] = sum;
    return sum;
}

Note that we use an Integer[], not an int[]. This way, we can check for non-existent entries using a test for null. This solution is not only correct, it is also comfortably fast – naive recursion runs in O(number of decodings) time, while the memoized version runs in O(string length) time.

Towards a DP Solution

When you run above code in your head, you will notice that the first invocation with the whole string will have a cache miss, then calculate the number of decodings for the first tail, which also misses the cache every time. We can avoid this by evaluating the tails first, starting from the end of the input. Because all tails will have been evaluated before the whole string is, we can remove the checks for cache misses. Now we also don't have any reason for recursion, because all previous results are already in the cache.

static final int upperLimit  = 26;
static final int maxHeadSize = ("" + upperLimit).length();

static int numDecodings(String encodedText) {
    int[] cache = new int[encodedText.length() + 1];

    // base case: the empty string at encodedText.length() is 1:
    cache[encodedText.length()] = 1;

    for (int position = encodedText.length() - 1; position >= 0; position--) {
        // sum directly into the cache
        for (int headSize = 1; headSize <= maxHeadSize && headSize + position <= encodedText.length(); headSize++) {
            String head = encodedText.substring(position, position + headSize);
            if (Integer.parseInt(head) > upperLimit) {
                break;
            }
            cache[position] += cache[position + headSize];
        }
    }

    return cache[0];
}

This algorithm could be optimized further by noticing that we only ever query the last maxHeadSize elements in the cache. So instead of an array, we could use a fixed-sized queue. At that point, we would have a dynamic programming solution that runs in *O(input length) time and O(maxHeadSize) space.

Specialization for `upperLimit = 26`

The above algorithms were kept as general as possible, but we can go and manually specialize it for a specific upperLimit. This can be useful because it allows us to do various optimizations. However, this introduces “magic numbers” that make the code harder to maintain. Such manual specializations should therefore be avoided in non-critical software (and the above algorithm is already as fast as it gets).

static int numDecodings(String encodedText) {
    // initialize the cache
    int[] cache = {1, 0, 0};

    for (int position = encodedText.length() - 1; position >= 0; position--) {
        // rotate the cache
        cache[2] = cache[1];
        cache[1] = cache[0];
        cache[0] = 0;

        // headSize == 1
        if (position + 0 < encodedText.length()) {
            char c = encodedText.charAt(position + 0);

            // 1 .. 9
            if ('1' <= c && c <= '9') {
                cache[0] += cache[1];
            }
        }

        // headSize == 2
        if (position + 1 < encodedText.length()) {
            char c1 = encodedText.charAt(position + 0);
            char c2 = encodedText.charAt(position + 1);

            // 10 .. 19
            if ('1' == c1) {
                cache[0] += cache[2];
            }
            // 20 .. 26
            else if ('2' == c1 && '0' <= c2 && c2 <= '6') {
                cache[0] += cache[2];
            }
        }
    }

    return cache[0];
}

Comparision with your code

The code is superficially similar. However, your parsing around characters is more convoluted. You have introduced a used variable that, if set, will decrement the decode count in order to account for double-character CPs. This is wrong, but I am not sure why. The main problem is that you are doubling the count at almost every step. As we have seen, the previous counts are added, and may very well be different.

This indicates that you wrote the code without proper preparation. You can write many kinds of software without having to think too much, but you can't do without careful analysis when designing an algorithm. For me, it is often helpful to design an algorithm on paper, and draw diagrams of each step (along the lines of the “Theoretical Prelude” of this answer). This is especially useful when you are thinking too much about the language you are going to implement in, and too little about possibly wrong assumptions.

I suggest that you read up on “proofs by induction” to understand how to write a correct recursive algorithm. Once you have a recursive solution, you can always translate it into an iterative version.

So here is some what simpler way out for your problem. This is pretty close to calculating Fibonacci, with the difference that there are condition checks on each smaller size subproblem. The space complexity is O(1) and time is O(n)

The code is in C++.

   int numDecodings(string s)
   {
    if( s.length() == 0 ) return 0;


    int j  = 0;
    int p1 = (s[j] != '0' ? 1 : 0);         // one step prev form j=1
    int p2 = 1;                             // two step prev from j=1, empty
    int p = p1;

    for( int j = 1; j < s.length(); j++ )
    {
        p = 0;

        if( s[j] != '0' ) 
            p += p1;    


        if( isValidTwo(s, j-1, j) )
            p += p2;

        if( p==0 )                  // no further decoding necessary, 
            break;                  // as the prefix 0--j is has no possible decoding.

        p2 = p1;                    // update prev for next j+1;
        p1 = p;

    }

    return p;
    }

    bool isValidTwo(string &s, int i, int j)
    {
        int val= 10*(s[i]-'0')+s[j]-'0';

        if ( val <= 9 ) 
        return false;

        if ( val > 26 ) 
        return false;

        return true;

    }

Review an answer - Decode Ways

Theoretical Prelude

Naive Recursive Implementation

Cached Recursive Implementation

Towards a DP Solution

Specialization for `upperLimit = 26`

Comparision with your code

Tags:

Java

Related

Recent Posts

Review an answer - Decode Ways

Theoretical Prelude

Naive Recursive Implementation

Cached Recursive Implementation

Towards a DP Solution

Specialization for upperLimit = 26

Comparision with your code

Tags:

Java

Related

Specialization for `upperLimit = 26`