Shortest simple regex matching a binary word
JavaScript (ES6), 488 341 bytes
s=>[s.replace(/(.)\1+/g,'$1+'),...[...Array(60)].map((_,i)=>`(${(i+4).toString(2).slice(1)})+`),...[...Array(1536)].map((_,i)=>`${i>>10?(i>>8&1)+(i&2?'+':''):''}(${i&1}${i&4?i>>4&1:i&16?'+':''}${i&8?''+(i>>7&1)+(i&64?i>>5&1:i&32?'+':''):''})+${i&512?(i>>8&1)+(i&2?'+':''):''}`)].filter(r=>s.match(`^${r}$`)).sort((a,b)=>a.length-b.length)[0]
Explanation: Since six regexes can express all possible binary words, and the longest two are nine characters long, it suffices to check those and all shorter regexes. One candidate is obviously the string with "run length encoding" (i.e. all digit runs replaced with appropriate +
s), but also strings with one set of ()
s need to be checked. I generate 1596 such regexes (this includes duplicates and useless regexes but they'll just be eliminated) and test all 1597 to see which is the shortest match. The generated regexes fall into two types: \(\d{2,5}\)\+
(60 regexes) and (\d\+?)?\(\d[\d+]?(\d[\d+]?)?\)(\d\+?)?
(1536 regexes as I avoid generating regexes with both leading and trailing digit).
Pyth - 31 30 29 bytes
Brute force! Can probably golf the iterator a little.
f=+Yf.x:zjY"^$")Z^"10+()"T1Y
Test Suite.
Pyth, 20 bytes
hf.x}z:zT1Zy*4"()01+
This takes approximately 30 seconds to run, so it needs to be run offline.
Explanation:
hf.x}z:zT1Zy*4"()01+
Implicit: z is the input string.
"()01+ "()01+"
*4 Repeated 4 times
y All subsequences in length order
hf Output the first one such that
:zT1 Form all regex matches of z with the candidate string
}z Check if the input is one of the strings
.x Z Discard errors
I'm not completely sure that every shortest string is a subsequence of "()01+" * 4, but 4 can be increased to 9 at no byte cost if needed.