How to make the 'cut' command treat same sequental delimiters as one?
As you comment in your question, awk
is really the way to go. To use cut
is possible together with tr -s
to squeeze spaces, as kev's answer shows.
Let me however go through all the possible combinations for future readers. Explanations are at the Test section.
tr | cut
tr -s ' ' < file | cut -d' ' -f4
awk
awk '{print $4}' file
bash
while read -r _ _ _ myfield _
do
echo "forth field: $myfield"
done < file
sed
sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file
Tests
Given this file, let's test the commands:
$ cat a
this is line 1 more text
this is line 2 more text
this is line 3 more text
this is line 4 more text
tr | cut
$ cut -d' ' -f4 a
is
# it does not show what we want!
$ tr -s ' ' < a | cut -d' ' -f4
1
2 # this makes it!
3
4
$
awk
$ awk '{print $4}' a
1
2
3
4
bash
This reads the fields sequentially. By using _
we indicate that this is a throwaway variable as a "junk variable" to ignore these fields. This way, we store $myfield
as the 4th field in the file, no matter the spaces in between them.
$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4
sed
This catches three groups of spaces and no spaces with ([^ ]*[ ]*){3}
. Then, it catches whatever coming until a space as the 4th field, that it is finally printed with \1
.
$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
Try:
tr -s ' ' <text.txt | cut -d ' ' -f4
From the tr
man page:
-s, --squeeze-repeats replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character