Separate numbers, strings from one line using bash
GNU grep or compatible solution:
s="string123anotherstr456thenanotherstr789"
grep -Eo '[[:alpha:]]+|[0-9]+' <<<"$s"
[[:alpha:]]+|[0-9]+ - regex alternation: matches either a run of letters or a run of digits; -E enables extended regex syntax and -o makes grep print each match on its own line of output
The output:
string
123
anotherstr
456
thenanotherstr
789
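Because the question is about bash, here is a sketch that needs no external command at all. It is an alternative to the grep answer, not part of it: it assumes a bash recent enough to support [[ =~ ]] and the BASH_REMATCH array, and the function name split_alnum is made up for this example. Characters that are neither letters nor digits are skipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each run of letters or digits on its own line,
# using only bash built-ins (no grep, sed or awk).
split_alnum() {
  local s=$1
  # Skip leading non-alphanumerics, then capture one run of letters or digits.
  local re='^[^[:alnum:]]*([[:alpha:]]+|[0-9]+)'
  while [[ $s =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[1]}"
    # Drop everything the whole match consumed and continue with the rest.
    s=${s:${#BASH_REMATCH[0]}}
  done
}

split_alnum "string123anotherstr456thenanotherstr789"
```

Keeping the pattern in the variable re, instead of writing it inline, sidesteps bash's quoting rules for the right-hand side of =~, where quoted parts are matched literally.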
POSIXly:
string=string123anotherstr456thenanotherstr789
sed '
s/[^[:alnum:]]//g; # remove anything other than letters and numbers
s/[[:alpha:]]\{1,\}/&\
/g; # insert a newline after each sequence of letters
s/[0-9]\{1,\}/&\
/g; # same for digits
s/\n$//; # remove a trailing newline if any' << EOF
$string
EOF
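To use the pieces rather than just print them, the output of the POSIX sed script can be read line by line and each piece classified with a case pattern, still without bash-specific features. This is a sketch, not part of the original answer; the function names split_alnum_sed and classify are hypothetical, and anything containing a non-digit is assumed to count as a string.

```bash
#!/bin/sh
# Hypothetical helper: the POSIX sed script from this answer, wrapped in a function.
# The "/g" lines must start in column 0: they are part of the replacement text.
split_alnum_sed() {
  printf '%s\n' "$1" | sed '
s/[^[:alnum:]]//g
s/[[:alpha:]]\{1,\}/&\
/g
s/[0-9]\{1,\}/&\
/g
s/\n$//'
}

# Hypothetical helper: label every piece as a number or a string.
classify() {
  split_alnum_sed "$1" | while IFS= read -r part; do
    case $part in
      '')       ;;                                # empty input: nothing to report
      *[!0-9]*) printf 'string: %s\n' "$part" ;;  # contains a non-digit
      *)        printf 'number: %s\n' "$part" ;;  # digits only
    esac
  done
}

classify string123anotherstr456thenanotherstr789
```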
awk
Input contains only letters and numerals
Add a newline character after every [[:alpha:]]+ (sequence of letters) and after every [[:digit:]]+ (sequence of numerals):
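awk '{ gsub(/([[:alpha:]]+|[[:digit:]]+)/,"&\n",$0) ; printf "%s", $0 }' filename
(In the replacement string, & is awk shorthand for the matched sequence. printf "%s", $0 is used rather than printf $0 so that a % in the input cannot be mistaken for a format directive.)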
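Inserting newlines is fine for display. If each token must be processed inside awk instead, one option is a loop around the POSIX match() function, which sets RSTART and RLENGTH to the position and length of the leftmost match. In this sketch the function name label_tokens and the number:/string: labels are hypothetical.

```bash
#!/bin/sh
# Hypothetical helper: read lines on stdin and print each run of letters or
# digits with a label, consuming the line one match at a time.
label_tokens() {
  awk '{
    s = $0
    while (match(s, /[[:alpha:]]+|[[:digit:]]+/)) {
      tok = substr(s, RSTART, RLENGTH)
      if (tok ~ /^[[:digit:]]+$/) print "number: " tok; else print "string: " tok
      s = substr(s, RSTART + RLENGTH)   # continue after this token
    }
  }'
}

printf '%s\n' string123anotherstr456thenanotherstr789 | label_tokens
```

Because every alternative ends in +, each match is at least one character long, so the loop always makes progress and stops once s is empty.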
Input contains other characters (e.g., punctuation)
As before, but now also handling runs of [^[:alnum:]]+ (characters that are neither letters nor numerals):
awk '{ gsub(/([[:alpha:]]+|[[:digit:]]+|[^[:alnum:]]+)/,"&\n",$0) ; printf "%s", $0 }' filename
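To keep the pieces for later use in a bash script, one approach is to read the awk output into an array with mapfile, which assumes bash 4 or later. The variable names line and parts are illustrative only.

```bash
#!/usr/bin/env bash
# Split one line into runs of letters, digits and other characters, then
# store each run as a separate element of the array "parts" (bash >= 4).
line='string123another!!str456'
mapfile -t parts < <(
  printf '%s\n' "$line" |
    awk '{ gsub(/([[:alpha:]]+|[[:digit:]]+|[^[:alnum:]]+)/,"&\n",$0) ; printf "%s", $0 }'
)

printf 'number of pieces: %d\n' "${#parts[@]}"
for i in "${!parts[@]}"; do
  printf '%d: %s\n' "$i" "${parts[i]}"
done
```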
Negative numbers and decimal fractions
Treat - (hyphen) and . (period) as part of numbers:
awk '{ gsub(/([[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+)/,"&\n",$0) ; printf "%s", $0 }' filename
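Those two characters must be added to the [[:digit:].-]+ expression and excluded from the [^[:alnum:].-]+ expression, so that they stay with the digits instead of being split off as punctuation. Also, to be interpreted as a literal hyphen, the - must be the last character before the closing right square bracket of each expression (or the first one after [ or [^); otherwise, it indicates a range of characters.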
Example:
[test]$ cat file.txt
string123another!!str456.001thenanotherstr-789
[test]$ awk '{ gsub(/([[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+)/,"&\n",$0) ; printf "%s", $0 }' file.txt
string
123
another
!!
str
456.001
thenanotherstr
-789
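A hyphen is also literal when it is the first character of a bracket expression (right after the ^ in a negated one), so the following variant, with the hypothetical function name split_signed, places - first in both expressions. It is a sketch that is expected to print the same pieces as the example above.

```bash
#!/bin/sh
# Hypothetical helper: same gsub as in the example, but with "-" placed first
# inside each bracket expression ([-...] and [^-...]), where it is also literal.
split_signed() {
  awk '{ gsub(/([[:alpha:]]+|[-[:digit:].]+|[^-[:alnum:].]+)/, "&\n", $0); printf "%s", $0 }'
}

printf '%s\n' 'string123another!!str456.001thenanotherstr-789' | split_signed
```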
An exercise for the reader
If the input file requires it, you could modify the awk command to:
- Ensure that - only counts as part of a number if it occurs at the start of a numeral sequence.
- Allow numbers that are expressed in scientific notation.