Position, by word count, of all repetitions of a word in a text file

Here's one way, using GNU tools:

$ tr ' ' '\n' < file | tr -d '[:punct:]' | grep . | grep -nFx iPhone
25:iPhone
54:iPhone
58:iPhone

The first tr replaces every space with a newline, and the second deletes all punctuation (so that "iPhone," is still found as a word). The grep . skips any blank lines (we don't want to count those), and grep -n prefixes each match with its line number. The -F tells grep to treat the pattern as a fixed string rather than a regular expression, and the -x makes it match only whole lines (so that searching for job will not match jobs). Note that the numbers you gave in your question were off by one.
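To see what the pipeline does on something small, here is a minimal sketch. The sample text and the variable names are made up for illustration; this is not the file from the question.

```shell
# Hypothetical sample text (12 words over two lines); not the file from the question.
sample=$(mktemp)
printf '%s\n' 'My iPhone, like every iPhone' 'before it, is not an iPhones clone.' > "$sample"

# One word per line, punctuation deleted, blank lines dropped,
# then report the line (= word) numbers that are exactly "iPhone".
positions=$(tr ' ' '\n' < "$sample" | tr -d '[:punct:]' | grep . | grep -nFx iPhone)
printf '%s\n' "$positions"

rm -f "$sample"
```

This prints 2:iPhone and 5:iPhone. The word iPhones (word 11) is not reported because of -x.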

If you only want the numbers, you could add another step:

$ tr ' ' '\n' < file | tr -d '[:punct:]' | grep . | grep -nFx iPhone | cut -d: -f1
25
54
58

As has been pointed out in the comments, this will still have problems with "words" such as aren't or double-barreled. You can improve on that using:

tr '[:space:][:punct:]' '\n' < file | grep . | grep -nFx iPhone
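The difference between the two approaches shows up on hyphenated words. A small sketch, assuming GNU tr (which pads the shorter second set with its last character) and a made-up input string:

```shell
# Made-up input; the hyphenated words are what matter here.
text='a double-barreled iPhone-like iPhone'

# Deleting punctuation glues the halves together: "iPhonelike" is one word.
deleted=$(printf '%s\n' "$text" | tr ' ' '\n' | tr -d '[:punct:]' | grep . | grep -nFx iPhone)

# Turning punctuation into newlines splits them: "iPhone" and "like" are two words.
split=$(printf '%s\n' "$text" | tr '[:space:][:punct:]' '\n' | grep . | grep -nFx iPhone)

echo "deleting punctuation:"
printf '%s\n' "$deleted"
echo "splitting on punctuation:"
printf '%s\n' "$split"
```

The first variant reports only 4:iPhone, while the second reports 4:iPhone and 6:iPhone, because each half of a hyphenated word now counts as a word of its own. Which of the two is "right" depends on what you want to call a word.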

Use the tr command to replace every run of whitespace with a single newline (using the squeeze option, -s).

Pipe that to nl -ba, which numbers each line (and thus word) sequentially.

Pipe that to grep -F for the word you want. This will show the number and text for just those words.
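Put together, those three steps might look like the sketch below. The sample text and variable names are made up for illustration.

```shell
# Hypothetical sample with a double space and an empty line in it.
sample=$(mktemp)
printf '%s\n' 'My  iPhone, like every' '' 'iPhone before it' > "$sample"

# 1. tr -s: every run of whitespace becomes a single newline.
# 2. nl -ba: number every line, i.e. every word.
# 3. grep -F: keep the numbered lines that contain the fixed string.
numbered=$(tr -s '[:space:]' '\n' < "$sample" | nl -ba | grep -F iPhone)
printf '%s\n' "$numbered"

rm -f "$sample"
```

This shows words 2 and 5. Note that punctuation stays attached (word 2 is shown as "iPhone,") and that grep -F without -x or -w also matches longer words that merely contain the string.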

awk could also do this in a single process, but the command would probably look more complex.
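For comparison, a single-process awk version could look roughly like this. It is only a sketch: the sample text is made up, and stripping everything that is not an ASCII letter or digit is my own simplification of "remove punctuation".

```shell
# Hypothetical sample text; not the file from the question.
sample=$(mktemp)
printf '%s\n' 'My iPhone, like every iPhone' 'before it, is not an iPhones clone.' > "$sample"

awk_positions=$(awk -v word=iPhone '
    {
        for (i = 1; i <= NF; i++) {
            w = $i
            gsub(/[^A-Za-z0-9]/, "", w)   # assumption: drop anything that is not an ASCII letter or digit
            if (w == "") continue         # skip fields that were punctuation only
            n++                           # running word count across all lines
            if (w == word) print n
        }
    }' "$sample")
printf '%s\n' "$awk_positions"

rm -f "$sample"
```

This prints 2 and 5 for the sample. awk already splits each line on runs of blanks, so no separate tr step is needed.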


An alternative with sed:

sed -e 's/^[[:blank:]]*//' -e 's/[[:blank:]]*$//' -e '/^$/d' -e 's/[[:blank:]]\{1,\}/\n/g' < file | grep -ion "iphone"

Output:

25:iPhone
54:iPhone
58:iPhone
