How to find repeating sequence of characters in a given array?
In Python, you can leverage regexes thus:
def recurrence(text):
import re
for i in range(1, len(text)/2 + 1):
m = re.match(r'^(.{%d})\1+$'%i, text)
if m: return m.group(1)
recurrence('abcabc') # Returns 'abc'
I'm not sure how this would translate to Java or C. (That's one of the reasons I like Python, I guess. :-)
For your examples, my first approach would be to
- get the first character of the array (for your last example, that would be
C
) - get the index of the next appearance of that character in the array (e.g. 9)
- if it is found, search for the next appearance of the substring between the two appearances of the character (in this case
CARPENTER
) - if it is found, you're done (and the result is this substring).
Of course, this works only for a very limited subset of possible arrays, where the same word is repeated over and over again, starting from the beginning, without stray characters in between, and its first character is not repeated within the word. But all your examples fall into this category - and I prefer the simplest solution which could possibly work :-)
If the repeated word contains the first character multiple times (e.g. CACTUS
), the algorithm can be extended to look for subsequent occurrences of that character too, not only the first one (so that it finds the whole repeated word, not only a substring of it).
Note that this extended algorithm would give a different result for your second example, namely RONRON
instead of RON
.
Tongue-in-cheek O(NlogN) solution
Perform an FFT on your string (treating characters as numeric values). Every peak in the resulting graph corresponds to a substring periodicity.
First write a method that find repeating substring sub
in the container string as below.
boolean findSubRepeating(String sub, String container);
Now keep calling this method with increasing substring in the container, first try 1 character substring, then 2 characters, etc going upto container.length/2
.