Find all unambiguous prefixes of a set of strings

Python 2.7 - 146 141 bytes

l=raw_input().split(',')
for w in l:
 for a in range(len(w)):
    e=w[:a+1]
    if e==w or len(filter(lambda b:b.startswith(e),l))==1:print e+':'+w

Note that the indentation on lines 4 and 5 is not 4 spaces, that's a side effect of SE's markdown interpreter. That's a literal tab character, so only one byte.

This is not technically up to spec, but I'll change it if Doorknob clarifies. It uses newlines instead of commas to separate the output. For example:

$ python2 abbreviations.py <<< code,golf,golfing
c:code
co:code
cod:code
code:code
golf:golf
golfi:golfing
golfin:golfing
golfing:golfing

New: I was able to get rid of 5 characters by assigning the string that I'm checking to a variable e. This means that I only have to type e instead of w[:a] three times. It also means I save characters by doing e=w[:a+1] and changing ...range(1,len(w)+1) to range(len(w)).

Explanation:

l=raw_input().split(',') # Gets a line of input from stdin and splits it at every ',' to make a list
for w in l: # For each word in that list...

 for a in range(1,len(w)+1): # For each number a from 1 to the length of that word...

    if (w[:a]==w # w[:a] gets the string w up to the ath index. For example, 'aeiou'[:3] == 'aei'.
                 # We're testing every possible w[:a] to see if it's a unique abbreviation.
                 # However, a word is always its own abbreviation, so we hardcode that in by testing
                 # if w[:a] is the same as w.

or len(filter( # filter takes a function and an iterable as an argument, and returns a list of every
               # element of that iterable where that_function(that_element) returns a True-y value

lambda b:b.startswith(w[:a]),l) # We define an anonymous function that returns True for any string
                                # that begins with our current w[:a]. We filter for words that return
                                # True.

)==1): # If exactly one word returns True for this, it's a unique abbreviation!

     print w[:a]+':'+w # Print the abbreviation, a colon, and the full word.

J - 47 char

(,.~~.@,~[:(#~1-1({.\e."_1]\.){."1)@;(<\,.<)&.>)

J sees strings as just vectors of characters, which means that when it tries to make a list of strings it actually ends up making a table of characters, so the ends get padded with spaces. J's solution to this is called the box, so this function takes as argument a boxed list of strings, so as to preserve length.

   'code';'golf';'going'
+----+----+-----+
|code|golf|going|
+----+----+-----+

Also, J lacks a hash type, so the closest it has to that is a two-column table of items, say boxed strings, for instance. If that is unacceptable and I have to default to the key-value form, I can reformat the output to this form in 67 characters total:

;@|.@,@((<&>',:'),."1,.~~.@,[:(#~1-1({.\e."_1]\.){:"1)@;(<,.<\)&.>)

Explanation by explosion:

(,.~~.@,[:(#~1-1({.\e."_1]\.){."1)@;(<\,.<)&.>) NB. unambiguous prefixes
                                    (     )&.>  NB. for each string:
                                     <\         NB.   take all prefixes
                                       ,.<      NB.   pair each with string
        [:                         ;            NB. gather up "partial" hashes
          (#~1-                  )@             NB. remove those rows where:
               1({.\        ){."1               NB.   each key
                    e."_1                       NB.   is an element of
               1(        ]\.){."1               NB.   the rest of the keys
 ,.~                                            NB. hash each word to itself
       ,                                        NB. add these rows to hash
    ~.@                                         NB. remove duplicate rows

Examples:

   (,.~~.@,[:(#~1-1({.\e."_1]\.){."1)@;(<\,.<)&.>) 'pie';'pier';'pierre'
+------+------+
|pie   |pie   |
+------+------+
|pier  |pier  |
+------+------+
|pierre|pierre|
+------+------+
|pierr |pierre|
+------+------+
   NB. 1-char words have to be made into lists with ,
   (,.~~.@,[:(#~1-1({.\e."_1]\.){."1)@;(<\,.<)&.>) (,'a');'dog'
+---+---+
|a  |a  |
+---+---+
|dog|dog|
+---+---+
|d  |dog|
+---+---+
|do |dog|
+---+---+
   NB. "key:value," format, reversed order to save chars
   ;@|.@,@((<&>',:'),."1,.~~.@,[:(#~1-1({.\e."_1]\.){:"1)@;(<,.<\)&.>) 'code';'golf';'going'
goin:going,goi:going,gol:golf,cod:code,co:code,c:code,going:going,golf:golf,code:code,

Haskell 96 87

import Data.List
i=inits
f a=a>>= \x->[(y,x)|y<-i x,y/="",y`notElem`(a>>=i)\\i x||y==x]

Ungolfed version:

 import Data.List
 f a = concatMap (\x ->
     [(y, x) |
      y <- inits x,
      y /= "",
      y `notElem` concatMap inits a \\ inits x || y == x]
     ) a

Example:

> f ["pi","pier","pierre"]
[("pi","pi"),("pier","pier"),("pierr","pierre"),("pierre","pierre")]

I used the inits function, which finds all prefixes of a list/string. Does it count as cheating?

Find all unambiguous prefixes of a set of strings

Python 2.7 - 146 141 bytes

J - 47 char

Haskell 96 87

Tags:

Code Golf

Related

Recent Posts