Count lines containing word
Another Perl variant, using uniq from List::Util:
$ perl -MList::Util=uniq -alne '
map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
' file
0: 1
1: 1
2: 1
a: 1
different: 1
hello: 1
is: 3
man: 2
one: 1
possible: 1
the: 3
this: 1
world: 2
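Note that uniq only appeared in List::Util 1.45, so the import can fail on older installations. A minimal sketch of the same idea using a grep-based dedup in place of uniq (no module needed):
$ perl -alne '
my %seen; $h{$_}++ for grep { !$seen{$_}++ } @F }{ print "$_: $h{$_}" for sort keys %h
' file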
Straightforward-ish in bash:
declare -A wordcount
while read -ra words; do
    # unique words on this line
    declare -A uniq
    for word in "${words[@]}"; do
        uniq[$word]=1
    done
    # accumulate the words
    for word in "${!uniq[@]}"; do
        ((wordcount[$word]++))
    done
    unset uniq
done < file
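One caveat with ((wordcount[$word]++)): it post-increments, so when the old count is 0 (or unset) the expression evaluates to 0 and the arithmetic command returns a failure status. That is harmless here, but it would abort a script running under set -e. A sketch of a form that always succeeds, since the expression evaluates to the new, non-zero count:
((wordcount[$word] += 1))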
Looking at the data:
$ declare -p wordcount
declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'
and formatting as you want:
$ printf "%s\n" "${!wordcount[@]}" | sort | while read -r key; do echo "$key:${wordcount[$key]}"; done
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
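Because of the pipe, that while loop runs in a subshell; reading wordcount there is fine, but any assignments made inside the loop would vanish when it exits. If you ever need to modify the array while formatting, process substitution keeps the loop in the current shell:
$ while read -r key; do echo "$key:${wordcount[$key]}"; done < <(printf "%s\n" "${!wordcount[@]}" | sort)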
It's a pretty straightforward Perl script:
#!/usr/bin/perl -w
use strict;

my %words = ();
while (<>) {
    chomp;
    # record the words on this line as hash keys to deduplicate them;
    # split ' ' handles runs of whitespace without producing empty "words"
    my %linewords = ();
    $linewords{$_} = 1 for split ' ';
    foreach my $word (keys %linewords) {
        $words{$word}++;
    }
}
foreach my $word (sort keys %words) {
    print "$word:$words{$word}\n";
}
The basic idea is to loop over the input; for each line, split it into words and store them as keys of a hash (associative array), which removes any duplicates within that line; then loop over the keys of that hash and add one to an overall counter for each word. At the end, report the words and their counts.
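To try it, save the script under any name (countwords.pl below is only a placeholder) and give it the file as an argument, or feed it standard input; the <> operator reads from either:
$ perl countwords.pl file
$ perl countwords.pl < file
Both produce the same word:count listing shown for the other approaches above.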