KMP prefix table

This code may not be the shortest, but easy to understand flow of code. Simple Java Code for calculating prefix-Array-

    String pattern = "ababaca";
    int i = 1, j = 0;
    int[] prefixArray = new int[pattern.length];
    while (i < pattern.length) {

        while (pattern.charAt(i) != pattern.charAt(j) && j > 0) {
            j = prefixArray[j - 1];

        }
        if (pattern.charAt(i) == pattern.charAt(j)) {
            prefixArray[i] = j + 1;
            i++;
            j++;

        } else {
            prefixArray[i] = j;
            i++;
        }
    }

    for (int k = 0; k < prefixArray.length; ++k) {
        System.out.println(prefixArray[k]);
    }

It produces the required output-

0 0 1 2 3 0 1

Python Implementation

p='ababaca'

l1 = len(p)

j = 0
i = 1
prefix = [0]

while len(prefix) < l1:
    if p[j] == p[i]:
        prefix.append(j+1)
        i += 1
        j += 1
    else:
        if j == 0:
            prefix.append(0)
            i += 1
        if j != 0:
            j = prefix[j-1]

print prefix

Every number belongs to corresponding prefix ("a", "ab", "aba", ...) and for each prefix it represents length of longest suffix of this string that matches prefix. We do not count whole string as suffix or prefix here, it is called self-suffix and self-prefix (at least in Russian, not sure about English terms).

So we have string "ababaca". Let's look at it. KMP computes Prefix Function for every non-empty prefix. Let's define s[i] as the string, p[i] as the Prefix function. prefix and suffix may overlap.

+---+----------+-------+------------------------+
| i |  s[0:i]  | p[i]  | Matching Prefix/Suffix |
+---+----------+-------+------------------------+
| 0 | a        |     0 |                        |
| 1 | ab       |     0 |                        |
| 2 | aba      |     1 | a                      |
| 3 | abab     |     2 | ab                     |
| 4 | ababa    |     3 | aba                    |
| 5 | ababac   |     0 |                        |
| 6 | ababaca  |     1 | a                      |
|   |          |       |                        |
+---+----------+-------+------------------------+

Simple C++ code that computes Prefix function of string S:

vector<int> prefixFunction(string s) {
    vector<int> p(s.size());
    int j = 0;
    for (int i = 1; i < (int)s.size(); i++) {
        while (j > 0 && s[j] != s[i])
            j = p[j-1];

        if (s[j] == s[i])
            j++;
        p[i] = j;
    }   
    return p;
}

KMP prefix table

Python Implementation

Tags:

Algorithm

String

Data Structures

Pattern Matching

Related

Recent Posts