Can IFS (Internal Field Separator) function as a single separator for multiple consecutive delimiter chars?
From the bash manpage:
Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter.
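A minimal sketch of the rules in that passage, assuming bash and made-up sample strings (the variable names are illustrative and do not come from the manpage). A run of spaces yields no empty fields, each extra colon does, and a colon with spaces around it counts as one delimiter:

```bash
#!/bin/bash
# Illustrative sample strings; not taken from the manpage.
spaces='a   b'
colons='a:::b'
mixed='a : b'

set -f                           # no pathname expansion on the unquoted expansions
IFS=' ';  by_space=($spaces)     # a run of IFS whitespace is a single delimiter
IFS=':';  by_colon=($colons)     # each ':' delimits a field, so empty fields appear
IFS=': '; by_mixed=($mixed)      # ':' plus adjacent IFS whitespace is one delimiter
IFS=$' \t\n'                     # back to the default IFS
set +f

echo "spaces: ${#by_space[@]} fields"   # 2
echo "colons: ${#by_colon[@]} fields"   # 4
echo "mixed:  ${#by_mixed[@]} fields"   # 2
```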
The manpage text means that IFS whitespace (space, tab and newline) is not treated like the other separators. If you want exactly the same behaviour with an alternative separator, you can do some separator swapping with the help of tr or sed:
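var=":abc::def:::ghi::::"
# Swap spaces for a placeholder, then ':' for spaces, so default IFS splitting collapses the runs
arr=($(printf '%s' "$var" | sed 's/ /%#%#%#%#%/g;s/:/ /g'))
for x in "${!arr[@]}" ; do
    # Restore the spaces in the current element
    el=$(printf '%s' "${arr[x]}" | sed 's/%#%#%#%#%/ /g')
    echo "# arr[$x] \"$el\""
done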
The %#%#%#%#% string is a magic value that stands in for any spaces inside the fields; it is expected to be "unique" (or at least very unlikely to occur in the data). If you are sure that the fields will never contain a space, just drop this part.
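tr is mentioned above but not shown. One possible use, sketched here as an assumption rather than the method of this answer, is to squeeze the repeated colons with tr -s instead of swapping separators, which keeps spaces inside fields without a placeholder. The sample string and the names squeezed and old_ifs are illustrative:

```bash
#!/bin/bash
var=":abc::def one:::ghi::::"                  # sample data with a space inside a field
squeezed=$(printf '%s' "$var" | tr -s ':')     # ":abc:def one:ghi:"
squeezed=${squeezed#:}                         # drop the single leading ':' (avoids an empty first field)

set -f                                         # no globbing during the unquoted split
old_ifs=$IFS
IFS=':'
arr=($squeezed)                                # a single trailing ':' adds no field
IFS=$old_ifs
set +f

for x in "${!arr[@]}"; do
  echo "# arr[$x] \"${arr[x]}\""
done
```

This still forks one external process per string, so it carries the same per-call cost as the sed version.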
To remove multiple (non-space) consecutive delimiter chars, two (string/array) parameter expansions can be used. The trick is to set the IFS variable to the empty string for the array parameter expansion.
This is documented in man bash under Word Splitting:
Unquoted implicit null arguments, resulting from the expansion of parameters that have no values, are removed.
(
set -f
str=':abc::def:::ghi::::'
IFS=':'
arr=(${str})
IFS=""
arr=(${arr[@]})
echo ${!arr[*]}
for ((i=0; i < ${#arr[@]}; i++)); do
echo "${i}: '${arr[${i}]}'"
done
)
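To make the effect of each pass visible, here is a small sketch of the same two expansions that keeps the intermediate array. The names first, second and old_ifs are illustrative; the behaviour assumed is the one quoted from the manpage above:

```bash
#!/bin/bash
str=':abc::def:::ghi::::'
set -f                           # no globbing on the unquoted expansions
old_ifs=$IFS
IFS=':'; first=($str)            # split on ':'; empty fields between colons are kept
IFS='';  second=(${first[@]})    # empty IFS: nothing is re-split, null words are removed
IFS=$old_ifs
set +f

echo "after first pass:  ${#first[@]} elements"    # 10
echo "after second pass: ${#second[@]} elements"   # 3
printf '[%s]\n' "${second[@]}"
```

Because IFS is empty during the second expansion, an element that contains spaces survives as a single element.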
As bash IFS does not provide a built-in way to treat consecutive delimiter chars as a single delimiter (for non-whitespace delimiters), I have put together an all-bash version (vs. using an external call, e.g. tr, awk, sed).
It can handle a multi-char IFS.
Here are its execution-time results, along with similar tests for the tr and awk options shown on this Q/A page. The tests are based on 10000 iterations of just building the array (with no I/O):
pure bash    0m3.174s   (28 char IFS)
call (awk)   0m32.210s  (1 char IFS)
call (tr)    0m32.178s  (1 char IFS)
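The harness used for these figures is not shown. A rough sketch of how such a comparison might be set up follows, with several assumptions: the iteration count n, the test string and the array names are illustrative, and the pure-bash loop uses the two-pass IFS approach from earlier on this page rather than the script below. It only illustrates where the per-iteration fork cost comes from:

```bash
#!/bin/bash
# Hypothetical harness; n and str are illustrative (the figures above used 10000 iterations).
n=200
str=':abc::def:::ghi::::'
old_ifs=$IFS
set -f

echo "external tr:" >&2
time for ((i = 0; i < n; i++)); do
  arr_tr=($(printf '%s' "$str" | tr ':' ' '))   # forks a subshell and tr on every pass
done

echo "pure bash:" >&2
time for ((i = 0; i < n; i++)); do
  IFS=':'; tmp=($str)                           # split, keeping empty fields
  IFS='';  arr_sh=(${tmp[@]})                   # drop empty fields, no fork
  IFS=$old_ifs
done
set +f
```

The timings are printed on stderr by the time keyword and will vary from machine to machine.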
Here is the output
# dlm_str = :.~!@#$%^&()_+-=`}{][ ";></,
# original = :abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'single*quote?'..123:
# unified = :abc::::def::::::::::::::::::::::::::::'single*quote?'::123:
# max-w 2^ = ::::::::::::::::
# shrunk.. = :abc:def:'single*quote?':123:
# arr[0] "abc"
# arr[1] "def"
# arr[2] "'single*quote?'"
# arr[3] "123"
Here is the script
#!/bin/bash
# Note: This script modifies the source string.
# so work with a copy, if you need the original.
# also: Use the name varG (Global) it's required by 'shrink_repeat_chars'
#
# NOTE: * asterisk in IFS causes a regex(?) issue, but * is ok in data.
# NOTE: ? Question-mark in IFS causes a regex(?) issue, but ? is ok in data.
# NOTE: 0..9 digits in IFS causes empty/wacky elements, but they're ok in data.
# NOTE: ' single quote in IFS; don't know yet, but ' is ok in data.
#
function shrink_repeat_chars () # A 'tr -s' analog
{
# Shrink repeating occurrences of char
#
# $1: A string of delimiters which, when consecutively repeated, are
# considered a shrinkable group. An example is: " " (whitespace delimiter).
#
# $varG A global var which contains the string to be "shrunk".
#
# echo "# dlm_str = $1"
# echo "# original = $varG"
dlms="$1" # arg delimiter string
dlm1=${dlms:0:1} # 1st delimiter char
dlmw=$dlm1 # work delimiter
# More than one delimiter char
# ============================
# When the delimiter string contains more than one char (i.e. different byte values),
# make all delimiter-chars in string $varG the same as the 1st delimiter char.
ix=1;xx=${#dlms};
while ((ix<xx)) ; do # Where more than one delim char, make all the same in varG
varG="${varG//${dlms:$ix:1}/$dlm1}"
ix=$((ix+1))
done
# echo "# unified = $varG"
#
# Binary shrink
# =============
# Find the longest "power of 2" group needed for a binary shrink
while [[ "$varG" =~ .*$dlmw$dlmw.* ]] ; do dlmw=$dlmw$dlmw; done # double its length
# echo "# max-w 2^ = $dlmw"
#
# Shrink groups of delims to a single char
while [[ ! "$dlmw" == "$dlm1" ]] ; do
varG=${varG//${dlmw}$dlm1/$dlm1}
dlmw=${dlmw:$((${#dlmw}/2))}
done
varG=${varG//${dlmw}$dlm1/$dlm1}
# echo "# shrunk.. = $varG"
}
# Main
varG=':abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'\''single*quote?'\''..123:'
sfi="$IFS"; IFS=':.~!@#$%^&()_+-=`}{][ ";></,' # save original IFS and set new multi-char IFS
set -f # disable globbing
shrink_repeat_chars "$IFS" # The source string name must be $varG
arr=(${varG:1}) # Strip the leading delim; a single trailing delim is ok (strangely)
for ix in ${!arr[*]} ; do # Dump the array
echo "# arr[$ix] \"${arr[ix]}\""
done
set +f # re-enable globbing
IFS="$sfi" # re-instate the original IFS
#
exit
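The doubling and halving steps in shrink_repeat_chars can be hard to follow inside the full script. The following is a reduced sketch for a single delimiter character, under some assumptions: squeeze_char is a hypothetical helper that is not part of the script above, it takes the string and the delimiter as arguments instead of using varG, and it uses quoted glob patterns instead of the =~ test so that the delimiter is matched literally.

```bash
#!/bin/bash
# Hypothetical helper (not from the script above): squeeze runs of one
# delimiter character with the same doubling/halving idea.
squeeze_char() {
  local s=$1 d=$2 w=$2
  # Double the work string while a run of twice its length still occurs in s.
  while [[ $s == *"$w$w"* ]]; do w=$w$w; done
  # Replace "w followed by d" with d, then halve w, until w is a single char.
  while [[ $w != "$d" ]]; do
    s=${s//"$w$d"/$d}
    w=${w:$(( ${#w} / 2 ))}
  done
  s=${s//"$w$d"/$d}              # final pass: pairs become a single delimiter
  printf '%s\n' "$s"
}

squeeze_char ':abc::::def::::::::::ghi:' ':'   # prints :abc:def:ghi:
```

For the sample call the work string grows to 8 colons, so the run of 10 shrinks to 2 in the first pass, the run of 4 shrinks to 2 when the work string is 2 colons long, and the final pass reduces both to 1. The number of substitution passes grows with the logarithm of the longest run rather than with its length.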