Splitting words into specific fragments

Here is a hybrid recursive/StringReplaceList method. It builds a tree representing all possible splits.

Now with a massive speed improvement thanks to Rojo's brilliance.

Updated element list per bobthechemist.

elements =
  Array[ElementData[#, "Symbol"] &, 118] /.
    {"Uup" -> "Mc", "Uus" -> "Ts", "Uuo" -> "Og"} //
    ToLowerCase;

f1[""] = Sequence[];

f1[s_String] := 
  Block[{f1}, 
    StringReplaceList[s, 
      StartOfString ~~ a : elements ~~ b___ ~~ EndOfString :> a ~~ f1@b
  ]]

Testing:

f1 @ "titanic"
{"ti" ~~ {"ta" ~~ {"n" ~~ {"i" ~~ {"c"}}, "ni" ~~ {"c"}}}}
f1 @ "archbishop"
{"ar" ~~ {"c" ~~ {"h" ~~ {"b" ~~ {"i" ~~ {"s" ~~ {"h" ~~ {"o" ~~ {"p"}}, 
     "ho" ~~ {"p"}}}}, "bi" ~~ {"s" ~~ {"h" ~~ {"o" ~~ {"p"}}, "ho" ~~ {"p"}}}}}}}

Responding to comments below and whuber's post, an extension that generates string lists:

f2[s_String] := { f1[s] } //. x_ ~~ y_ :> Thread[x ~~ "." ~~ y] // Flatten

f2 @ "titanic"

f2 @ "archbishop"
{"ti.ta.n.i.c", "ti.ta.ni.c"}

{"ar.c.h.b.i.s.h.o.p", "ar.c.h.b.i.s.ho.p", "ar.c.h.bi.s.h.o.p", "ar.c.h.bi.s.ho.p"}

Incidentally:

f2 @ "inconspicuousness"
in.c.o.n.s.p.i.c.u.o.u.s.n.es.s
in.c.o.n.s.p.i.c.u.o.u.s.ne.s.s
in.c.o.n.s.p.i.c.u.o.u.sn.es.s
in.c.o.n.s.p.i.cu.o.u.s.n.es.s
in.c.o.n.s.p.i.cu.o.u.s.ne.s.s
in.c.o.n.s.p.i.cu.o.u.sn.es.s
in.co.n.s.p.i.c.u.o.u.s.n.es.s
in.co.n.s.p.i.c.u.o.u.s.ne.s.s
in.co.n.s.p.i.c.u.o.u.sn.es.s
in.co.n.s.p.i.cu.o.u.s.n.es.s
in.co.n.s.p.i.cu.o.u.s.ne.s.s
in.co.n.s.p.i.cu.o.u.sn.es.s
i.n.c.o.n.s.p.i.c.u.o.u.s.n.es.s
i.n.c.o.n.s.p.i.c.u.o.u.s.ne.s.s
i.n.c.o.n.s.p.i.c.u.o.u.sn.es.s
i.n.c.o.n.s.p.i.cu.o.u.s.n.es.s
i.n.c.o.n.s.p.i.cu.o.u.s.ne.s.s
i.n.c.o.n.s.p.i.cu.o.u.sn.es.s
i.n.co.n.s.p.i.c.u.o.u.s.n.es.s
i.n.co.n.s.p.i.c.u.o.u.s.ne.s.s
i.n.co.n.s.p.i.c.u.o.u.sn.es.s
i.n.co.n.s.p.i.cu.o.u.s.n.es.s
i.n.co.n.s.p.i.cu.o.u.s.ne.s.s
i.n.co.n.s.p.i.cu.o.u.sn.es.s

Here is a fairly simple approach using only higher level functions. First, note that StringCases does almost all the work for you. István mentioned it in passing, but it is more powerful than that. It has an Overlap option that you can set to True to get all possible decompositions in one go:

elements = Table[ElementData[i, "Symbol"], {i, 112}];
StringCases["titanic", elements, Overlaps -> True, IgnoreCase -> True]
Out[1]= {"ti", "i", "ta", "n", "ni", "i", "c"}

StringCases["archbishop", elements, Overlaps -> True, IgnoreCase -> True]
Out[2]= {"ar", "c", "h", "b", "bi", "i", "s", "h", "ho", "o", "p"}

That's a pretty clean way of getting them all! I used IgnoreCase instead of ToLowerCase in elements, but either way is fine.

Next, you just need to find the subsets of the decomposition that give you back the original string. Since we're dealing with symbols of max length 2, your subsets only need to be restricted to $\displaystyle\lceil\frac{\text{string length}}{2}\rceil$ to $\text{string length}$:

Select[Subsets[{"ti", "i", "ta", "n", "ni", "i", "c"}, {4, 7}], StringJoin[#] == "titanic" &]
Out[3]= {{"ti", "ta", "ni", "c"}, {"ti", "ta", "n", "i", "c"}}

Select[Subsets[{"ar", "c", "h", "b", "bi", "i", "s", "h", "ho", "o", "p"}, {5, 10}], 
    StringJoin[#] == "archbishop" &]
Out[4]= {{"ar", "c", "h", "bi", "s", "ho", "p"}, 
         {"ar", "c", "h", "b", "i", "s", "ho", "p"}, 
         {"ar", "c", "h", "bi", "s", "h", "o", "p"}, 
         {"ar", "c", "h", "b", "i", "s", "h", "o", "p"}}

You can now bundle this up neatly as follows:

Begin["FXWords`"];
    elements = Table[ElementData[i, "Symbol"], {i, 112}];
    ElementDecompose[word_String] := Module[{decomps},
        decomps = StringCases[word, elements, Overlaps -> True, IgnoreCase -> True];
        Select[Subsets[decomps, {Ceiling[#/2], #}], StringJoin[#] == word &] &@StringLength[word]
    ];
End[];

and call it as FXWords`ElementDecompose["titanic"]


Some really simple partial answers using the string patternmatcher:

elements = ToLowerCase /@ 
  Select[Table[ElementData[i, "Symbol"], {i, Length@ElementData[]}], StringLength[#] < 3 &];

StringReplace["archbishop", # -> {#} & /@ elements] /. StringExpression -> Join
StringReplace["titanic", # -> {#} & /@ elements] /. StringExpression -> Join

{"ar", "c", "h", "b", "i", "s", "h", "o", "p"}

{"ti", "ta", "n", "i", "c"}

Even more simple is StringCases:

StringCases["archbishop", Alternatives @@ elements]

{"ar", "c", "h", "b", "i", "s", "h", "o", "p"}

And a more general solution for finding all decompositions:

updated to return correct decompositions

split[word_String] := Module[{list, findPath, temp},

   (* Generate an exhaustive list of positions of all possible elements in the input *)
   list = Sort@Flatten@DeleteCases[
       Table[i -> #, {i, StringPosition[word, #]}] & /@ elements, {}];

   (* recursive function to find all possible neighbouring elements in the string starting from position pos *)
   findPath[pos_, rest_] := If[pos == StringLength@word, 
     Last /@ Cases[rest, _?(First@First@# == pos &)], 
     Module[{newPos, newRest},
      newRest = Cases[rest, _?(First@First@# == pos &)];
      If[newRest === {}, {},
       {Last@#, findPath[newPos = Last@First@# + 1, 
           Cases[rest, _?(First@First@# >= newPos &)]]} & /@ newRest
       ]]];

   (* call the auxiliary function and tidy up results *)
   temp = findPath[1, list];
   If[temp === {}, {}, 
    temp //. {{x_} :> x, {} -> Sequence[], {x_String, {y__String}} :> {x, y}} //. 
        {x__String, y : {__List}} :> (Join[{x}, #] & /@ y)]
     ];

words = {"titanic", "silicon", "archbishop", "wombat"};
split /@ words // Column
{{"ti", "ta", "n", "i", "c"}, {"ti", "ta", "ni", "c"}}
{{{"s", "i", "li", "c", "o", "n"}, {"s", "i", "li", "co", "n"}}, {{"si", "li", "c", "o", "n"}, {"si", "li", "co", "n"}}}
{{{"ar", "c", "h", "b", "i", "s", "h", "o", "p"}, {"ar", "c", "h", "b", "i", "s", "ho", "p"}}, {{"ar", "c", "h", "bi", "s", "h", "o", "p"}, {"ar", "c", "h", "bi", "s", "ho", "p"}}}
{"w", "o"}

It now correctly gets all the valid decompositions, and returns partial decompositions for words that cannot be decomposed to elements.