Finding the shortest repetitive pattern in a string

To find the shortest pattern that upon repetition generates the whole string, you can use regular expressions as follows:

result = regexp(str, '^(.+?)(?=\1*$)', 'match');

Some examples:

>> str = '12341234123412341234';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    '1234'

>> str = '1234123412341234123';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    '1234123412341234123'

>> str = 'lullabylullaby';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    'lullaby'

>> str = 'lullaby1lullaby2lullaby1lullaby2';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    'lullaby1lullaby2'

I'm not sure if this can be accomplished with regular expressions. Here is a script that will do what you need in the case of a repeated word called pattern.

It loops through the characters of a string called str, trying to match against another string called pattern. If matching fails, the pattern string is extended as needed.

EDIT: I made the code more compact.

str = 'lullabylullabylullaby';

pattern = str(1);
matchingState = false;
sPtr = 1;
pPtr = 1;

while sPtr <= length(str)
     if str(sPtr) == pattern(pPtr) %// if match succeeds, keep looping through pattern string
            matchingState = true;
            pPtr = pPtr + 1;
            pPtr = mod(pPtr-1,length(pattern)) + 1;
     else                          %// if match fails, extend pattern string and start again
            if matchingState
                sPtr = sPtr - 1;   %// don't change str index when transitioning out of matching state
            end  
            matchingState = false;
            pattern = str(1:sPtr);
            pPtr = 1;
     end

     sPtr = sPtr + 1;

end

display(pattern);

The output is:

pattern =

lullaby

Note:

This doesn't allow arbitrary delimiters between occurrences of the pattern string. For example, if str = 'lullaby1lullaby2lullaby1lullaby2';, then

pattern =

lullaby1lullaby2

This also allows the pattern to end mid-way through a cycle without changing the result. For example, str = 'lullaby1lullaby2lullaby1'; would still result in

pattern =

lullaby1lullaby2

To fix this you could add the lines

if pPtr ~= length(pattern)
    pattern = str;
end

Another approach is as follows:

  1. determine length of string, and find all possible factors of the string length value
  2. for each possible factor length, reshape the string and check for a repeated substring

To find all possible factors, see this solution on SO. The next step can be performed in many ways, but I implement it in a simple loop, starting with the smallest factor length.

function repeat = repeats_in_string(str);
ns = numel(str);
nf = find(rem(ns, 1:ns) == 0);
for ii=1:numel(nf)
    repeat = str(1:nf(ii));
    if all(ismember(reshape(str,nf(ii),[])',repeat)); 
        break;
    end
end