Is there a way to remove not all, but only nested brackets?

bracket.awk:

BEGIN{quote=1}
{
    for(i=1;i<=length;i++){
        ch=substr($0,i,1)
        pr=1
        if(ch=="\""){quote=!quote}
        else if(ch=="[" && quote){brk++;pr=brk<2}
        else if(ch=="]" && quote){brk--;pr=brk<1}
        if(pr){printf "%s",ch}
    }
    print ""
}

$ awk -f bracket.awk file
["q", "0", "R", "L"], ["q", "1", "[", "]"], ["q", "2", "L", "R"], ["q", "3", "R", "L"]

The idea behind it:

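Initialize quote=1, meaning "currently outside a quoted string". Read the file character by character. Whenever a double quote is found, invert the quote variable (1 becomes 0, and vice versa).

Brackets are counted only while quote is 1, and the brk counter decides which ones are printed: a [ only when it brings the depth to 1, a ] only when it brings the depth back to 0. The excess (nested) brackets are skipped.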

The print "" statement is just to add a newline, as the printf above does not do it.
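As an illustration, here is the same state machine inlined as a shell function and run on a made-up line (the question's actual input is not shown here, so the sample is an assumption). The names strip_nested, outside, depth and show are hypothetical renamings; the logic is meant to be the same as in bracket.awk.

```shell
# Hypothetical wrapper around the logic of bracket.awk, with renamed variables.
strip_nested() {
    awk '
    BEGIN { outside = 1 }                   # 1 while we are not inside a "..." string
    {
        for (i = 1; i <= length($0); i++) {
            ch = substr($0, i, 1)
            show = 1                        # print every character by default
            if (ch == "\"") outside = !outside
            else if (ch == "[" && outside) { depth++; show = (depth < 2) }
            else if (ch == "]" && outside) { depth--; show = (depth < 1) }
            if (show) printf "%s", ch
        }
        print ""
    }'
}

# Made-up sample: brackets nested two and three deep, plus quoted "[" and "]".
printf '%s\n' '[["q", "0", "R", "L"]], [[["q", "1", "[", "]"]]]' | strip_nested
# prints: ["q", "0", "R", "L"], ["q", "1", "[", "]"]
```

Note that neither counter is reset between lines, so a line with unbalanced brackets or quotes would affect the lines after it.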

With `perl`:

perl -pe '
   s{([^]["]+|"[^"]*")|\[(?0)*\]}
    {$1 // "[". ($& =~ s/("[^"]*"|[^]["]+)|./$1/gr) . "]"}ge'

That makes use of perl's recursive regexp.

The outer s{regex}{replacement-code}ge tokenises the input into either:

any sequence of characters other than [, ] or "
a quoted string
a [...] group (using recursion in the regexp to find the matching ])

Then, a token in the first two categories (captured in $1) is replaced with itself. A [...] group is instead replaced with [, followed by the group's text with every non-quoted [ and ] removed (done by the inner substitution, which uses the same tokenising technique), followed by ].

To handle escaped quotes and backslashes within quotes (like "foo\"bar\\"), replace [^"] with (?:[^\\"]|\\.).

With `sed`

If your sed supports the -E or -r options to work with extended regexps instead of basic ones, you could do it with a loop, replacing the innermost [...]s first:

LC_ALL=C sed -E '
  :1
  s/^(("[^"]*"|[^"])*\[("[^"]*"|[^]"])*)\[(("[^"]*"|[^]["])*)\]/\1\4/
  t1'

(using LC_ALL=C to speed it up and make it equivalent to the perl one which also ignores the user's locale when it comes to interpreting bytes as characters).
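To watch the loop peel off the innermost pair on each pass, one option is to add -n and the p flag, so that the input line and the result of every successful substitution are printed. This is only a tracing sketch: trace_unnest is a hypothetical name and the sample line is made up.

```shell
# Hypothetical helper: same substitution, but with -n and the p flag so that
# the input line and the result of every pass of the loop are printed.
trace_unnest() {
    LC_ALL=C sed -nE '
      p
      :1
      s/^(("[^"]*"|[^"])*\[("[^"]*"|[^]"])*)\[(("[^"]*"|[^]["])*)\]/\1\4/p
      t1'
}

printf '%s\n' '[[["[", "a"]]]' | trace_unnest
# prints:
# [[["[", "a"]]]
# [["[", "a"]]
# ["[", "a"]
```

The quoted "[" is never taken for a bracket, because each quote can only be consumed as part of a whole "..." string.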

POSIXly, it should still be doable with something like:

LC_ALL=C sed '
  :1
  s/^\(\(\("[^"]*"\)*[^"]*\)*\[\(\("[^"]*"\)*[^]"]*\)*\)\[\(\(\("[^"]*"\)*[^]["]*\)*\)\]/\1\6/
  t1'

Here, \(a*b*\)* is used in place of (a|b)*, as basic regexps don't have an alternation operator (the BREs of some sed implementations have \| for that, but that's not POSIX/portable).
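A quick way to convince yourself that the two forms accept the same strings is to compare them with grep on a few arbitrary samples. This is just a sanity check on made-up strings, not a proof:

```shell
# Compare an ERE alternation with its alternation-free BRE rewrite.
# The sample strings are arbitrary.
same=yes
for s in '' a b abba baab abc; do
    if printf '%s\n' "$s" | grep -Eq '^(a|b)*$'; then ere=match; else ere=nomatch; fi
    if printf '%s\n' "$s" | grep -q '^\(a*b*\)*$'; then bre=match; else bre=nomatch; fi
    printf '%-6s ERE: %-8s BRE: %s\n' "'$s'" "$ere" "$bre"
    [ "$ere" = "$bre" ] || same=no
done
echo "same verdicts: $same"
```

In the sed command above, a stands for a quoted string ("[^"]*") and b for a single character that is not a quote.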

This gawk is inelegant to say the least and will break if you even look at it too long, so you don't need to tell me. Just have a quiet, self-satisfied chuckle that you can do better.

But as it more or less works (on Wednesdays and Fridays during months with a J in them) and consumed 20 minutes of my life, I am posting it anyway.

Schroedinger's awk (Thx @edmorton)

awk -F"\\\], \\\[" '
    {printf "["; 
       for (i=1; i<=NF; i++) {
         cs=split($i,c,","); 
           for (j=1; j<=cs; j++){
             sub("^ *\\[+","",c[j]); sub("\\]+$","",c[j]);
             t=(j==cs)?"]"((i<(NF-1))?", [":""):",";
             printf "%s%s", c[j], t
       }}print ""}' file

["q", "0", "R", "L"], ["q","1", "[", "]"], ["q","2", "L", "R"], ["q","3","R", "L"]

Walkthrough

Split each line into fields with -F on ], [ (which needs to be escaped to hell and back), so that each field holds one of your final element groups.

Then split each field on , to get the elements, strip any leading [ (^ *\[+) and trailing ] (\]+$) from each element, rejoin the elements with , as the separator, and finally rejoin the fields using a conditional combination of ] and , [.
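The field-splitting step on its own can be sketched as follows. The input is made up, and split_groups is a hypothetical name. Here -F is single-quoted, so the shell leaves the backslashes alone and awk receives the same separator as with the double-quoted form above.

```shell
# Hypothetical helper: split a line on "], [" and show each resulting field.
# awk's own escape processing turns \\], \\[ into the regexp \], \[
split_groups() {
    awk -F'\\], \\[' '{ for (i = 1; i <= NF; i++) printf "field %d: %s\n", i, $i }'
}

printf '%s\n' '[["a", "b"]], [["c"]]' | split_groups
# prints:
# field 1: [["a", "b"]
# field 2: ["c"]]
```

Each field still carries its leftover brackets, which is what the two sub() calls then strip.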

Heisenberg's sed

If you pipe to sed, it's slightly tidier:

awk 'BEGIN{FS="\\], \\["}{for (i=1; i<=NF; i++) print $i}' file | 
   sed -E "s/(^| |,)\[+(\")/\1\2/g ;s/\]+(,|$)/\1/g" | 
   awk 'BEGIN{RS=""; FS="\n";OFS="], ["}{$1=$1; print "["$0"]"}'

["q", "0", "R", "L"], ["q", "1", "[", "]"], ["q", "2", "L", "R"], ["q", "3", "R", "L"]

It does the same job as the first version: the first awk splits out the fields as before, sed drops the excess [ and ], and the final awk recomposes the elements by redefining RS, FS and OFS.
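The recomposition step can be tried in isolation. In this sketch, rejoin_groups is a hypothetical name and the input lines are made up. RS="" makes awk read the whole block of lines as a single record, FS="\n" makes each line a field, and assigning $1=$1 rebuilds $0 with OFS between the fields.

```shell
# Hypothetical helper isolating the last stage of the pipeline.
rejoin_groups() {
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "], [" } { $1 = $1; print "[" $0 "]" }'
}

# Made-up input: one already-cleaned group per line.
printf '%s\n' '"q", "0"' '"q", "1"' '"q", "2"' | rejoin_groups
# prints: ["q", "0"], ["q", "1"], ["q", "2"]
```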

Tags: Awk, Sed, Text Processing
