Minimal cyclic shift algorithm explanation

First, I believe that your code has a bug in it. The last line should be return p;. I beleve that i holds the index of the lexicographically smallest cyclic shift, and p holds the smallest shift that matches. I also think that your stopping condition is too weak, i.e. you are doing too much checking after you have found a match, but I am not sure exactly what it should be.

Note that i and j only advance and that i is always less than j. We are looking for a string that matches the string starting at i, and we are trying to match it with a string that starts at j. We do this by comparing the k'th character of each string while increasing k (as long as they match). Note that we only change i if we determine that the string starting at j is lexicographically less than the string starting at j, and then we set i to j and reset k and p to their initial values.

I do not have time for a detailed analysis, but it looks like

i = the start of the lexicographic smallest cyclic shift
j = the start of the cyclic shift we are matching against the shift starting at i
k = the character in strings i and j currently under consideration (the strings match in positions 1 to k-1
p = the cyclic shift under consideration (i believe p stands for prefix)

Edit Going further

this section of code

    if ((a=x[(i+k-1)%l])>(b=x[(j+k-1)%l])) {
        i=j++;
        k=p=1;

Moves the start of the comparison to a lexicographically earlier string when we find one and reinitializes everything else.

this section

   } else if (a<b) {
        j+=k; 
        k=1; 
        p=j-i;

is the tricky part. We have found a mismatch that is lexicographically later than our reference string, so we skip to the end of the text matched so far, and start matching from there. We also increase p (our stride). Why can we skip over all the starting points between j and j + k? This is because the string starting with i is the lexicographically smallest seen, and if the tail of the current j string is greater then the string at i then any suffix of the string at j will be greater than the string at i.

Finally

    } else if (a==b && k!=p) {
        k++;
    } else {
        j+=p; 
        k=1;

this just checks that the string of length p starting at i repeats.

**further edit* We do this by incrementing k until k == p, checking that the k'th character of the string starting at i equals the k'th character of the string starting at j. Once k reaches p we start scanning again at the next supposed occurrence of the string.

Even further edit to attempt to answer jethro's questions.

First: the k != p in else if (a==b && k!=p) Here we have a match in that the k'th and all previous characters in the strings starting at i and j are equal. The variable p represents the length that we think that the repeating string is. When k != p, actually k < p, so we are ensuring that the p characters at the string beginning at i are the same as the p characters of the string beginning at j. When k == p (the final else) we should be at a point where the string starting at j + k looks the same as the string starting at j, so we increase j by p and set k back to 1 and go back to comparing the two strings.

Second: Yes, I believe you are correct, it should return i. I was misunderstanding the meaning of "Minimum Cyclic Shift"

Minimal cyclic shift algorithm explanation

Tags:

Algorithm

String

Related

Recent Posts