Programming paradigm change
In this response, I will focus upon the programming paradigm change when moving from Java to Mathematica. I will emphasize two differences between the languages. The first concerns the "feel" of writing Mathematica code. The second is about how iteration is expressed.
The "Feel" of Mathematica
Java is a reasonably conventional programming language, designed to conform to the principle of "least surprise" for programmers who are coming to it from other mainstream languages. The boundary between the language proper and the runtime libraries is well-defined. Learning Java is often just a matter of trying to find where a preconceived piece of functionality resides in the standard library.
By contrast, the symbolic nature of Mathematica blurs the line between language and library. From a strict computer science perspective, one could argue that the core language is very small indeed. It consists of a few data types, some term-rewriting machinery, and little else. However, it does not feel that way in use. Instead, I would suggest that learning Mathematica is more akin to learning a natural language. There is endless vocabulary, obscure idioms, weird exceptions, and countless ways to express any given concept. For me, the act of writing Mathematica code sometimes feels more like writing prose (or perhaps even poetry). This is simultaneously Mathematica's greatest strength and its greatest weakness. On the one hand, it is flexible enough to express an algorithm in many different styles, e.g. procedural, functional, query-based, language-based, etc. On the other hand, the computer sometimes responds as if you had just read it a piece of poetry or prose :)
My advice is to learn Mathematica as if you were learning a new natural language. Slowly, patiently. Ability grows with use. Tackle lots of small problems (like this question -- you've come to the right place in Mathematica StackExchange!). Practice expressing the same concept many different ways. Read lots of code. Read random pages from the voluminous documentation -- there are many, many gems hiding in there. Don't try to bite off the whole thing at once.
Iteration
Okay, enough of the warm and fuzzy stuff. Let's talk about code. If I had to pick just one difference between Java and Mathematica to talk about, it would be iteration. Whenever we want to operate on a collection of items in Java, our first thought is to operate upon each item individually within an explicit loop (ignoring Java 8 for the moment). Mathematica takes a different approach. The first choice in Mathematica is to operate upon the collection as a single unit using implicit iteration. For example, the Java loops:
for (int i = 0; i < array.length; ++i) { result[i] = array[i] + 1; }
for (int i = 0; i < array.length; ++i) { result2[i] = someFunction(array[i]); }
would have the following Mathematica equivalents:
result = array + 1;
result2 = Map[someFunction, array];
The iteration is implicit. A big part of the study of Mathematica is about learning these higher-level functions that perform implicit iteration. Explicit iteration still happens in Mathematica, but it is not the first tool to reach for. It is almost always fruitful to seek an operator that will transform a collection in one fell swoop.
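For Java readers, the closest counterpart to this implicit style is the Java 8 Streams API that was set aside above. A minimal sketch, in which the class and method names are illustrative and squaring merely stands in for the hypothetical someFunction:

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

class ImplicitIteration {
    // Analogue of `result = array + 1`: one expression transforms the whole array.
    static int[] addOne(int[] array) {
        return Arrays.stream(array).map(x -> x + 1).toArray();
    }

    // Analogue of `Map[someFunction, array]` for any int -> int function.
    static int[] mapAll(IntUnaryOperator someFunction, int[] array) {
        return Arrays.stream(array).map(someFunction).toArray();
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 3};
        System.out.println(Arrays.toString(addOne(array)));             // [2, 3, 4]
        System.out.println(Arrays.toString(mapAll(x -> x * x, array))); // [1, 4, 9]
    }
}
```

Even here no index variable appears; the loop lives inside the library, which is the habit of mind Mathematica expects from the start.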
You can get a feel for iteration in Mathematica from the Functional Operations tutorial in the documentation, but you'd be hard-pressed to find a better investment of your time than to read Mathematica Programming - an advanced introduction.
One final point before we move on to the specific problem from the question. Mathematica has very few facilities to destructively alter large data structures. Any modification of, say, a list results in a copy of that list being generated. This is the Mathematica way, and one just has to embrace it. It is possible to write (often convoluted) code to minimize memory usage, but the first tool in the toolbox almost always involves lots of data structure copying. A common theme that you will find in the answers here on the Mathematica StackExchange is that the fastest algorithms frequently use the most memory. Don't fight it, unless there is no alternative. Memory is cheap. (ohh, the fading memory of C/C++ in my mind stirs restlessly :)
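The copying discipline can be imitated in Java by never mutating an input. In this sketch (the names are hypothetical), every "modification" returns a fresh array and leaves the caller's array untouched, which is roughly the behaviour described above for Mathematica lists:

```java
import java.util.Arrays;

class CopyOnModify {
    // Returns a modified copy; the caller's array is never altered in place.
    static int[] withElement(int[] list, int index, int value) {
        int[] copy = Arrays.copyOf(list, list.length);
        copy[index] = value;
        return copy;
    }

    public static void main(String[] args) {
        int[] original = {1, 2, 3};
        int[] changed = withElement(original, 0, 99);
        System.out.println(Arrays.toString(original)); // [1, 2, 3]
        System.out.println(Arrays.toString(changed));  // [99, 2, 3]
    }
}
```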
The Problem, At Last
As you can see from the many responses to this question, there are many ways to approach this problem. I dare say that they hardly scratch the surface of possible solutions. I will add another, but make no claim as to how it stacks up against other answers. This response is more concerned with the process of creating the solution.
Mathematica is first and foremost an exploratory programming environment. Let's explore.
The first thing we know for sure is that we will need to divide the string into pieces. Let's play:
$string = "[can {and it(it (mix) up)} look silly]";
Characters[$string]
(* {[,c,a,n, ,{,a,n,d, ,i,t,(,i,t, ,(,m,i,x,), ,u,p,),}, ,l,o,o,k, ,s,i,l,l,y,]} *)
StringSplit[$string, Characters["{[()]}"]]
(* {can ,and it,it ,mix, up,, look silly} *)
$split = StringSplit[$string, RegularExpression["(?<=[(){}[\\]])|(?=[(){}[\\]])"]]
(* {[,can ,{,and it,(,it ,(,mix,), up,),}, look silly,]} *)
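Java's regular expressions support the same lookbehind/lookahead boundaries, so this split can be reproduced there. A sketch assuming Java 8 or later, where String.split never produces an empty leading string from a zero-width match at position 0 (trailing empty strings are dropped as well). Note that [ and ] must be escaped inside a Java character class; the class name is hypothetical:

```java
import java.util.Arrays;
import java.util.List;

class SplitAtBrackets {
    // Zero-width boundary just after or just before any of ( ) { } [ ],
    // so each bracket survives as a token of its own.
    private static final String BOUNDARY = "(?<=[(){}\\[\\]])|(?=[(){}\\[\\]])";

    static List<String> split(String s) {
        return Arrays.asList(s.split(BOUNDARY));
    }

    public static void main(String[] args) {
        System.out.println(split("[can {and it(it (mix) up)} look silly]"));
    }
}
```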
That last one looks promising. Take no notice of the fact that I pulled that regex out of a hat -- this post is long enough! What we need to do is to scan for brackets and drop down a level when we see an opening one and pop back up when we see a closing one:
$changes = Replace[$split, {"["|"{"|"("->-1, "]"|"}"|")"->1, _->0}, {1}]
(* {-1,0,-1,0,-1,0,-1,0,1,0,1,1,0,1} *)
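In Java the same mapping from tokens to level changes is a plain switch. This sketch (hypothetical names) hard-codes the tokens produced by the split above:

```java
import java.util.Arrays;
import java.util.List;

class LevelChanges {
    // Opening bracket: one level deeper (-1); closing bracket: back up (+1);
    // any other token leaves the level unchanged (0).
    static int change(String token) {
        switch (token) {
            case "[": case "{": case "(": return -1;
            case "]": case "}": case ")": return 1;
            default: return 0;
        }
    }

    static int[] changes(List<String> tokens) {
        return tokens.stream().mapToInt(LevelChanges::change).toArray();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("[", "can ", "{", "and it", "(", "it ", "(",
                "mix", ")", " up", ")", "}", " look silly", "]");
        // [-1, 0, -1, 0, -1, 0, -1, 0, 1, 0, 1, 1, 0, 1]
        System.out.println(Arrays.toString(changes(tokens)));
    }
}
```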
Those represent the level changes, but we are really more interested in the levels themselves. We need to keep a running total of the changes. It so happens that there is an operation for that:
$levels = Accumulate @ $changes
(* {-1,-1,-2,-2,-3,-3,-4,-4,-3,-3,-2,-1,-1,0} *)
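Java 8 also happens to have a named running-total operation, Arrays.parallelPrefix. It works in place, so this sketch (hypothetical class name) applies it to a copy, in keeping with the copy-rather-than-mutate style:

```java
import java.util.Arrays;

class RunningTotal {
    // Analogue of Accumulate: element i of the result is changes[0] + ... + changes[i].
    static int[] accumulate(int[] changes) {
        int[] levels = changes.clone();           // leave the input untouched
        Arrays.parallelPrefix(levels, Integer::sum);
        return levels;
    }

    public static void main(String[] args) {
        int[] changes = {-1, 0, -1, 0, -1, 0, -1, 0, 1, 0, 1, 1, 0, 1};
        // [-1, -1, -2, -2, -3, -3, -4, -4, -3, -3, -2, -1, -1, 0]
        System.out.println(Arrays.toString(accumulate(changes)));
    }
}
```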
Now we have the individual strings and their levels. We need to sort them by level. Here are a couple of ways to do that. First, we could pair up each string with its level and then use SortBy:
$pairs = Transpose[{$split, $levels}]
(* {{[,-1},{can ,-1},{{,-2},{and it,-2},{(,-3},{it ,-3},{(,-4},{mix,-4},{),-3},
{ up,-3},{),-2},{},-1},{ look silly,-1},{],0}} *)
$sorted = SortBy[$pairs, {Last}]
(* {{(,-4},{mix,-4},{(,-3},{it ,-3},{),-3},{ up,-3},{{,-2},{and it,-2},{),-2},
{[,-1},{can ,-1},{},-1},{ look silly,-1},{],0}} *)
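Here, the "natural language" behaviour of Mathematica crops up. We need the sort to be stable, that is, elements at the same level must stay in their original order. The way to achieve that is to write SortBy[$pairs, {Last}] instead of SortBy[$pairs, Last]. There is a logical explanation for this (see the documentation: with a single function, SortBy breaks ties by falling back on the canonical order of the elements themselves, whereas with a list of functions, elements that tie on every function keep their original order), but it is far from obvious. This is an example of something one just picks up through experience and hard knocks.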
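Java programmers get this stability without any special syntax: Collections.sort is documented as stable, so sorting (token, level) pairs by level alone keeps same-level tokens in their original order. A sketch with a hypothetical Pair class:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class StableSortByLevel {
    // A token with its nesting level, like {"mix", -4} in $pairs.
    static final class Pair {
        final String token;
        final int level;

        Pair(String token, int level) {
            this.token = token;
            this.level = level;
        }
    }

    static List<String> sortByLevel(List<String> tokens, int[] levels) {
        List<Pair> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            pairs.add(new Pair(tokens.get(i), levels[i]));
        }
        // Stable sort: ties keep their left-to-right order, as SortBy[$pairs, {Last}] does.
        Collections.sort(pairs, Comparator.comparingInt(p -> p.level));
        List<String> sorted = new ArrayList<>();
        for (Pair p : pairs) {
            sorted.add(p.token);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("[", "can ", "{", "and it", "(", "it ", "(",
                "mix", ")", " up", ")", "}", " look silly", "]");
        int[] levels = {-1, -1, -2, -2, -3, -3, -4, -4, -3, -3, -2, -1, -1, 0};
        System.out.println(sortByLevel(tokens, levels));
    }
}
```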
There is another way to perform this sort that is even less obvious, but is actually a fairly common idiom:
$sorted = $split[[Ordering@$levels]]
(* {(,mix,(,it ,), up,{,and it,),[,can ,}, look silly,]} *)
Ordering does not sort the list; rather, it returns the positions of the elements in sorted order, so that list[[Ordering[list]]] is the same as Sort[list]. This makes it useful for sorting one list based upon the contents of another. It eliminates the need to assemble an intermediate list of pairs first. (Although we are still assembling an intermediate list of indices. Memory. Cheap. Remember?)
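The Ordering idiom corresponds to an "argsort" in Java: sort an array of indices by the values they point to, then use those indices to reorder another array. A sketch with hypothetical names; unlike Ordering, Java's indices are 0-based, and the object overload of Arrays.sort used here is stable, so tied levels keep their original order, as in the output above:

```java
import java.util.Arrays;
import java.util.Comparator;

class OrderingSketch {
    // Like Ordering (but 0-based): entry k is the index of the element that
    // belongs at position k once `values` is sorted.
    static Integer[] ordering(int[] values) {
        Integer[] idx = new Integer[values.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingInt(k -> values[k])); // stable for objects
        return idx;
    }

    // Like list[[order]]: reorder one array by indices computed from another.
    static String[] permute(String[] items, Integer[] order) {
        String[] out = new String[order.length];
        for (int k = 0; k < order.length; k++) {
            out[k] = items[order[k]];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] split = {"[", "can ", "{", "and it", "(", "it ", "(", "mix", ")", " up",
                ")", "}", " look silly", "]"};
        int[] levels = {-1, -1, -2, -2, -3, -3, -4, -4, -3, -3, -2, -1, -1, 0};
        System.out.println(Arrays.toString(permute(split, ordering(levels))));
    }
}
```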
The list still contains all of the brackets. We need to remove those:
$result = DeleteCases[$sorted, "{"|"["|"("|")"|"]"|"}"]
(* {mix,it , up,and it,can , look silly} *)
We need to join the strings together:
StringJoin @@ $result
(* mixit  upand itcan  look silly *)
Hmm, we need to do something about those spaces. Let's delete the excess:
StringJoin @@ StringTrim @ $result
(* mixitupand itcanlook silly *)
Oops, we need spaces between the strings.
StringJoin @@ Riffle[StringTrim @ $result, " "]
(* mix it up and it can look silly *)
Riffle? Where did that come from? Alas, it is simply vocabulary that must be memorized.
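The three clean-up steps (DeleteCases, StringTrim, and StringJoin with Riffle) map onto a single Java stream pipeline. A sketch with hypothetical names, taking the sorted tokens shown earlier as its input:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class CleanUpTokens {
    private static final Set<String> BRACKETS =
            new HashSet<>(Arrays.asList("{", "[", "(", ")", "]", "}"));

    static String cleanUp(List<String> sorted) {
        return sorted.stream()
                .filter(t -> !BRACKETS.contains(t))  // DeleteCases[..., "{"|"["|...]
                .map(String::trim)                  // StringTrim
                .collect(Collectors.joining(" "));  // StringJoin @@ Riffle[..., " "]
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("(", "mix", "(", "it ", ")", " up", "{",
                "and it", ")", "[", "can ", "}", " look silly", "]");
        System.out.println(cleanUp(sorted)); // mix it up and it can look silly
    }
}
```

Collectors.joining(" ") plays the combined role of Riffle and StringJoin: it inserts the separator only between elements.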
At last, we have the result. Let's pull it all together, keeping our favourites from the variations:
decode[string_] :=
Module[{split, changes, levels, sorted, result}
, split = StringSplit[string, RegularExpression["(?<=[(){}[\\]])|(?=[(){}[\\]])"]]
; changes = Replace[split, {"["|"{"|"("->-1, "]"|"}"|")"->1, _->0}, {1}]
; levels = Accumulate @ changes
; sorted = split[[Ordering@levels]]
; result = DeleteCases[sorted, "{"|"["|"("|")"|"]"|"}"]
; StringJoin @@ Riffle[StringTrim @ result, " "]
]
decode["[can {and it(it (mix) up)} look silly]"]
(* mix it up and it can look silly *)
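For comparison, here is the whole of decode rendered in Java as one sketch (hypothetical class name, Java 8 or later). It follows the same steps: split around brackets, turn brackets into level changes and take a running total, stable-sort the indices by level, drop the brackets, then trim and join:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class Decode {
    private static final String BOUNDARY = "(?<=[(){}\\[\\]])|(?=[(){}\\[\\]])";

    // -1 for an opening bracket, +1 for a closing one, 0 for anything else.
    static int change(String t) {
        switch (t) {
            case "[": case "{": case "(": return -1;
            case "]": case "}": case ")": return 1;
            default: return 0;
        }
    }

    static String decode(String s) {
        String[] split = s.split(BOUNDARY);
        // changes + Accumulate in one pass.
        int[] levels = new int[split.length];
        int level = 0;
        for (int i = 0; i < split.length; i++) {
            level += change(split[i]);
            levels[i] = level;
        }
        // Ordering: stable sort of indices by level.
        Integer[] order = new Integer[split.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt(k -> levels[k]));
        // DeleteCases + StringTrim, then join with single spaces.
        List<String> words = new ArrayList<>();
        for (int k : order) {
            if (change(split[k]) == 0) {
                words.add(split[k].trim());
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(decode("[can {and it(it (mix) up)} look silly]"));
    }
}
```

The Java version needs explicit loops for steps that Mathematica expresses as single operators, which is exactly the difference described in the Iteration section.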
If we are performing a one-off task, there is little need to search for alternative approaches as we have done above. The first approach we find is just fine, provided it gives us our result. But sometimes it takes a while to find an approach that performs well and/or is comprehensible enough to convince ourselves that it is correct. That is where the need to write, rewrite and rewrite again surfaces, just as when writing natural-language prose.
Parting Tip
The widespread mathematical ethic of terseness has taken hold in the Mathematica community. So we are just as likely to see the preceding function expressed without making the intermediate expressions explicit:
decode2[string_] :=
StringSplit[string, RegularExpression["(?<=[(){}[\\]])|(?=[(){}[\\]])"]] //
StringJoin @@ Riffle[
StringTrim @ DeleteCases[
#[[Ordering @ Accumulate @ Replace[#, {"["|"{"|"("->-1, "]"|"}"|")"->1, _->0}, {1}]]]
, "{"|"["|"("|")"|"]"|"}"
]
, " "
] &
decode2["[can {and it(it (mix) up)} look silly]"]
(* mix it up and it can look silly *)
There it is, fully-grown, armed and armoured as if sprung directly from Zeus' head. Time to post to StackExchange! We might consider such an approach to show little mercy to the reader, who must reverse-engineer what is happening. This is a common problem in expressive high-level languages. Terse expression can sometimes give few cues for understanding.
With experience, it becomes easier to read such expressions. In the meantime (and beyond), a little helper function can help wade through code like this:
P[x___, l_] := (Print[Row[{x, l}, " "]]; l)
This function prints out its arguments and returns the last one unchanged. It can be quickly inserted at various places in a pipeline to see what each stage returns. For example:
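The same trick carries over to Java as a generic pass-through method. This sketch borrows the name p from above; the rest is hypothetical, and it takes a single label rather than an arbitrary number of arguments:

```java
class Tap {
    // Print a label and a value, then return the value unchanged so the call
    // can be wrapped around any sub-expression without altering the result.
    static <T> T p(String label, T value) {
        System.out.println(label + " " + value);
        return value;
    }

    public static void main(String[] args) {
        int total = p("sum:", 1 + 2) * 10; // prints "sum: 3"
        System.out.println(total);         // 30
    }
}
```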
Module[{level = 0}
, P@StringCases["[can {and it(it (mix) up)} look silly]"
, { s:Characters["[{("] :> (--level; ##&[])
, s:Characters["]})"] :> (++level; ##&[])
, s:Except@Characters["[{()}] "].. :> {s, level}
}
]
] //
P@SortBy[#, {Last}]& //
#[[All, 1]]&
(*
{{can,-1},{and,-2},{it,-2},{it,-3},{mix,-4},{up,-3},{look,-1},{silly,-1}}
{{mix,-4},{it,-3},{up,-3},{and,-2},{it,-2},{can,-1},{look,-1},{silly,-1}}
{mix,it,up,and,it,can,look,silly}
*)
Note the use of P before the StringCases and the SortBy, causing their results to be shown. This technique is far more "low tech" than the debuggers in the Front-End and Workbench, but in the notebook interface it is quick and easy and often all that one needs. (And the attentive reader will have noticed that the code block was yet another approach to the problem, using a semi-imperative style. Assuming said attentive reader was not put to sleep half-way through this TL;DR :)
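The semi-imperative version translates fairly directly to Java: a regex matcher walks the string, brackets adjust a level counter, and each word is recorded with the current level before a stable sort. A sketch with hypothetical names, whose three regex alternatives play the roles of the three StringCases rules:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScanWords {
    // Group 1: opening bracket; group 2: closing bracket;
    // group 3: a run of characters that are neither brackets nor spaces.
    private static final Pattern TOKEN =
            Pattern.compile("([\\[{(])|([\\]})])|([^\\[\\]{}() ]+)");

    static List<String> decode(String s) {
        List<String> words = new ArrayList<>();
        List<Integer> levels = new ArrayList<>();
        int level = 0;
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            if (m.group(1) != null) {
                level--;                    // one level deeper
            } else if (m.group(2) != null) {
                level++;                    // back up a level
            } else {
                words.add(m.group(3));      // remember the word with its level
                levels.add(level);
            }
        }
        // Stable sort of word positions by level (the SortBy[#, {Last}] step).
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            order.add(i);
        }
        Collections.sort(order, Comparator.comparingInt(k -> levels.get(k)));
        List<String> result = new ArrayList<>();
        for (int k : order) {
            result.add(words.get(k));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode("[can {and it(it (mix) up)} look silly]"));
    }
}
```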
StringReplace method
After reading other answers I was inspired to write a new method. I place it first because it is almost as concise as the method below yet it is more robust (and safe) because it preserves strings as strings.
str = "[can {and it(it (mix) up)} look silly]";
StringReplace[str, {"["|"{"|"(" -> -1, "]"|"}"|")" -> 1, " " -> 0}] //
#[[ Ordering @ Accumulate[# /. _String :> 0] ]] ~Cases~ _String ~Row~ " " &
mix it up and it can look silly
This method is certainly not basic, but instead illustrates some interesting functionality in Mathematica.
StringReplace is capable of substituting any expression into a string; the resulting fragments are wrapped in StringExpression.
Accumulate and Ordering operate on expressions with any Head; it need not be List.
Additionally, this code is several times faster than WReach's decode and decode2, upon which it was based. It appears to be the fastest method posted so far.
Do[str = StringReplace[str, "mix" -> str], {15}]; (* nest the original string *)
StringLength[str]
1146883
StringReplace[str, {"["|"{"|"(" -> -1, "]"|"}"|")" -> 1, " " -> 0}] //
#[[ Ordering @ Accumulate[# /. _String :> 0] ]] ~Cases~ _String ~Row~ " " & //
Timing // First
decode[str] // Timing // First
decode2[str] // Timing // First
0.483 2.356 2.387
Note: using List @@ # to change the head before Accumulate is a bit faster still, presumably because it eliminates a useless StringExpression evaluation. However, it is less instructive about Mathematica's capabilities.
Native-parser methods
Here is a method using Mathematica's parsing to assist in the process:
str = "[can {and it(it (mix) up)} look silly]";
he = ToHeldExpression @ StringReplace[str, {"["|"(" -> "{", "]"|")" -> "}"}]
Row[Join @@ Array[Cases[he, _Symbol, {99 - #}] &, 99], " "]
Hold[{can {and it {it {mix} up}} look silly}]
mix it up and it can look silly
This has one clear flaw: words such as "can" are converted to Symbols. Since all of the words in the example are lower case, they do not collide with built-in symbols (whose names start with capital letters) and, in a fresh session, have no values, so this works without error. To make it robust, however, additional holding would be required, e.g. s_Symbol :> ToString @ Unevaluated[s] or the use of heldCases. I also hard-coded a maximum depth of 98 just to make the code a bit cleaner.
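Here Mathematica's parser builds the nested structure for free; in Java that step has to be written by hand. This sketch (hypothetical names, assuming well-balanced brackets) builds nested lists, then collects words depth by depth, deepest first, much like Cases at a single level {n}. It computes the maximum depth rather than hard-coding one, and since the words stay as Java strings, the symbol-conversion issue does not arise:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParseByDepth {
    // A bracket, or a run of characters that are neither brackets nor spaces.
    private static final Pattern TOKEN = Pattern.compile("[\\[\\](){}]|[^\\[\\](){} ]+");

    // Each bracket group becomes a List<Object> holding words and sub-lists.
    static List<Object> parse(String s) {
        List<Object> root = new ArrayList<>();
        Deque<List<Object>> stack = new ArrayDeque<>();
        stack.push(root);
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            String t = m.group();
            if (t.equals("[") || t.equals("(") || t.equals("{")) {
                List<Object> child = new ArrayList<>();
                stack.peek().add(child);
                stack.push(child);
            } else if (t.equals("]") || t.equals(")") || t.equals("}")) {
                stack.pop();
            } else {
                stack.peek().add(t);
            }
        }
        return root;
    }

    // How many list levels below `node` the most deeply nested word lies.
    @SuppressWarnings("unchecked")
    static int maxDepth(List<Object> node) {
        int max = 0;
        for (Object e : node) {
            if (e instanceof List) {
                max = Math.max(max, 1 + maxDepth((List<Object>) e));
            }
        }
        return max;
    }

    // Append, left to right, the words lying exactly `depth` levels below `node`.
    @SuppressWarnings("unchecked")
    static void wordsAt(List<Object> node, int depth, List<String> out) {
        for (Object e : node) {
            if (e instanceof String) {
                if (depth == 0) out.add((String) e);
            } else if (depth > 0) {
                wordsAt((List<Object>) e, depth - 1, out);
            }
        }
    }

    static String decode(String s) {
        List<Object> tree = parse(s);
        List<String> words = new ArrayList<>();
        for (int d = maxDepth(tree); d >= 0; d--) {
            wordsAt(tree, d, words);
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(decode("[can {and it(it (mix) up)} look silly]"));
    }
}
```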
With the same start, but using ReplaceRepeated to order and condense the expression:
First[he //. Except[Hold][a___, {x__}, b___] :> {x, a, b}] ~Row~ " "
mix it up and it can look silly
An example of how holding may be added:
silly = "Stop that! It's silly!"; (* if this appears in the output I failed *)
he //. Except[Hold][a___, {x__}, b___] :> {x, a, b} /. _[x_] :> HoldForm @ Row[x, " "]
mix it up and it can look silly
Sow and Reap method
str = "[can {and it(it (mix) up)} look silly]";
i = 10;
StringJoin @@ Last[Replace[Characters@str,
{"[" | "(" | "{" :> Sow[" ", --i], "]" | ")" | "}" :> Sow["", ++i], c_ :> Sow[c, i]}
, 1] ~Reap~ Range@10]
(* " mix it  up and it can  look silly" *)
This just scans through the characters one at a time and Sows each of them with an integer tag. The tag starts at a high value (i = 10) and is decremented when an opening bracket is encountered and incremented when a closing bracket is encountered; each opening bracket is sown as a space at the new, deeper level, which keeps words from different levels apart. The Reap then collects the results in order, starting with the smallest integer tag (i.e. the most deeply nested characters).
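The Sow/Reap bucketing has a direct Java analogue: a TreeMap from tag to StringBuilder, whose ascending key order does the job of Reap[..., Range@10]. A sketch with hypothetical names that keeps the same arbitrary starting tag of 10:

```java
import java.util.Map;
import java.util.TreeMap;

class BucketByLevel {
    static String decode(String s) {
        // Tag -> characters "sown" with that tag; TreeMap keeps tags in ascending order.
        Map<Integer, StringBuilder> buckets = new TreeMap<>();
        int i = 10; // arbitrary starting tag, deep enough for this input
        for (char c : s.toCharArray()) {
            String piece;
            int tag;
            if (c == '[' || c == '(' || c == '{') {
                tag = --i;      // go deeper first, then sow a space there
                piece = " ";
            } else if (c == ']' || c == ')' || c == '}') {
                tag = ++i;      // come back up; sow nothing visible
                piece = "";
            } else {
                tag = i;        // ordinary character: sow at the current level
                piece = String.valueOf(c);
            }
            buckets.computeIfAbsent(tag, k -> new StringBuilder()).append(piece);
        }
        // "Reap": concatenate buckets from the smallest (most deeply nested) tag upward.
        // Unlike Reap[..., Range@10], every tag is collected, not just 1 through 10.
        StringBuilder out = new StringBuilder();
        for (StringBuilder b : buckets.values()) {
            out.append(b);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + decode("[can {and it(it (mix) up)} look silly]") + "]");
    }
}
```

As written, the sketch returns the buckets exactly as concatenated, " mix it  up and it can  look silly", including the leading space and two doubled spaces where the text on either side of a removed inner group meets; trim() followed by replaceAll(" +", " ") would normalise that if desired.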