grep: match all characters up to (not including) first blank space

I realize this was answered long ago with the grep solution, but for future readers I'd like to note that there are at least two other solutions for this particular situation, both of which ran faster than grep in the benchmarks below.

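Since you are not doing any complex pattern matching, just taking the first column delimited by a space, you can use one of the column-based utilities, such as awk or cut. Note that awk splits on runs of spaces and tabs and ignores leading whitespace, while cut -d' ' treats every single space as a delimiter and nothing else.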

Using awk

$ awk '{print $1}' text1.txt > text2.txt

Using cut

$ cut -f1 -d' ' text1.txt > text2.txt
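
The two commands are not quite interchangeable on messy input. Here is a small sketch on made-up sample lines (the sample text and the variable names are my own, not from the question) that shows where they part ways:

```shell
# Hypothetical sample: a plain line, a line with leading spaces,
# and a line whose columns are separated by a tab instead of a space.
sample=$(printf 'alpha beta\n  gamma delta\nepsilon\tzeta\n')

# awk: default field splitting on runs of blanks, leading blanks ignored.
awk_out=$(printf '%s\n' "$sample" | awk '{print $1}')

# cut: every single space is a delimiter; tabs are ordinary characters here.
cut_out=$(printf '%s\n' "$sample" | cut -f1 -d' ')

printf 'awk:\n%s\n' "$awk_out"
printf 'cut:\n%s\n' "$cut_out"
```

awk prints alpha, gamma and epsilon. cut prints alpha, then an empty line (the first field of the indented line is empty), then the whole tab-separated line unchanged, because a line with no delimiter is passed through as-is. If your file only ever has single spaces and no indentation, both give the same result.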

Benchmarks on a ~1.1MB file

$ time grep -o '^[^ ]*' text1.txt > text2.txt

real    0m0.064s
user    0m0.062s
sys     0m0.001s
$ time awk '{print $1}' text1.txt > text2.txt

real    0m0.021s
user    0m0.017s
sys     0m0.004s
$ time cut -f1 -d' ' text1.txt > text2.txt

real    0m0.007s
user    0m0.004s
sys     0m0.003s

In this run, awk is about 3x faster than grep, and cut is about 3x faster than awk. The absolute difference is negligible for a file this small and a single run, but if you're writing a script for re-use, or doing this often on large files, you might appreciate the extra efficiency.
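
A single run on a small file is noisy, so it may be worth repeating each command and checking that all three really produce the same output before comparing times. The following is only a sketch under my own assumptions: the generated file, the scratch directory, the repeat helper and the count of 20 runs are all made up for illustration.

```shell
# Hypothetical setup: a scratch directory and a generated input file.
workdir=$(mktemp -d)
input="$workdir/text1.txt"

# 20000 lines of the form "wordN restN moreN": single spaces, no indentation, no tabs.
awk 'BEGIN { for (i = 1; i <= 20000; i++) print "word" i, "rest" i, "more" i }' > "$input"

# Run a command 20 times against the input; the file named by the
# first argument keeps the output of the last run.
repeat() {
    out=$1
    shift
    n=0
    while [ "$n" -lt 20 ]; do
        "$@" "$input" > "$out"
        n=$((n + 1))
    done
}

repeat "$workdir/grep.out" grep -o '^[^ ]*'
repeat "$workdir/awk.out"  awk '{print $1}'
repeat "$workdir/cut.out"  cut -f1 -d' '

# For this well-formed input the three outputs should be byte-identical.
same=no
cmp "$workdir/grep.out" "$workdir/awk.out" &&
    cmp "$workdir/awk.out" "$workdir/cut.out" &&
    same=yes
echo "identical: $same"
```

In bash you can put the time keyword in front of each repeat call to get a total for the 20 runs. The numbers will depend on your machine and on which grep, awk and cut implementations are installed, so treat the ratios above as indicative rather than fixed.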

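You are putting the quantifier * in the wrong place.

Try this instead:

grep -o '^[^[:space:]]*' text1.txt > text2.txt

or, more concisely with GNU grep:

grep -o '^\S*' text1.txt > text2.txt

\S matches any non-whitespace character (a GNU extension), and the ^ anchor matches at the beginning of the line. Inside a bracket expression, \s is not a whitespace escape (it stands for a literal backslash and the letter s), so use the POSIX class [[:space:]] there. The -o option makes grep print only the matched part instead of the whole line.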
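
To see how a bracket expression treats \s, here is a minimal sketch on made-up sample lines. The sample text and variable names are my own assumptions, and the behaviour described is that of POSIX bracket expressions, where a backslash is an ordinary character.

```shell
# Hypothetical sample: two space-separated lines and one tab-separated line.
sample=$(printf 'apples and pears\nbus stop\nkiwi\tfruit\n')

# POSIX character class: stop at the first space or tab.
posix_out=$(printf '%s\n' "$sample" | grep -o '^[^[:space:]]*')

# Backslash-s inside brackets: this excludes a literal backslash and
# the letter "s", so the match stops at the first "s", not at whitespace.
literal_out=$(printf '%s\n' "$sample" | grep -o '^[^\s]*')

printf 'with [[:space:]]:\n%s\n' "$posix_out"
printf 'with backslash-s in brackets:\n%s\n' "$literal_out"
```

The first command prints apples, bus and kiwi. The second prints apple, bu and then the entire third line, tab included, since that line contains neither an s nor a backslash. Without -o, either command would print every line in full, because every line matches a pattern that is allowed to match the empty string.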
