Count number of similar lines in a file
Your file's "structure" is a bit lacking in the structure department, so you'll have to deal with some errors in the process.
Assuming you have all that in a file called input, try:
tr '[:upper:]' '[:lower:]' < input | \
grep -Ev "^ *(join date|age|posts|location|re):" | \
sort | \
uniq -c
The first line lowercases everything, the second strips out the lines that look like email headers in your sample, and the last two sort the result and count each unique line.
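To make the steps concrete, here is a minimal sketch of the same pipeline wrapped in a function and run on a tiny made-up sample. The function name count_similar and the sample lines are assumptions for illustration only, and the extra sort -rn at the end is an addition that puts the most frequent lines first.

```shell
# count_similar: hypothetical helper wrapping the pipeline above.
# Takes a file name as $1 and prints "count line" pairs, most frequent first.
count_similar() {
    tr '[:upper:]' '[:lower:]' < "$1" |
        grep -Ev '^ *(join date|age|posts|location|re):' |
        sort |
        uniq -c |
        sort -rn
}

# Made-up sample data, not taken from the question.
sample=$(mktemp)
printf '%s\n' 'Dun Ringill' 'Location: somewhere' 'dun ringill' 'Aqualung' > "$sample"
count_similar "$sample"
rm -f "$sample"
```

The two spellings of "Dun Ringill" collapse into one lowercase entry with a count of 2, and the "Location:" line is dropped before counting. The width of the padding that uniq -c puts before each count differs between implementations, so don't rely on it when parsing the output.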
This command lists each distinct line along with the number of times it is repeated:
sort nameFile | uniq -c
How about using awk for this:
awk '
/:/||/^$/{next}{a[toupper($0)]++}
END{for(i in a) print i,a[i]}' INPUT_FILE
Explanation:
First we identify lines that have a : in them or are blank and skip them. All other lines are converted to upper case and used as keys in an array, where the value counts how many times that line was seen. In our END block we print out every key in the array along with the number of times it was found.
Test:
awk '
/:/||/^$/{next}{a[toupper($0)]++}
END{for(i in a) print i,a[i]}' file1
SOX 1
CHRISTMAS SONG 1
CUP OF WONDER 1
SOSSITY YER A WOMAN 1
FAT MAN 1
PUSSY WILLOW 1
VELVET GREEN 1
WITH YOU THERE TO HELP ME 1
ELEGY 1
WE USED TO KNOW 1
TEACHER 1
MY SUNDAY FEELING 1
SWEET DREAM 1
JACK-A-LYNN 1
SOMETHING'S ON THE MOVE 1
ROVER 1
DUN RINGILL 2
AVOIDING THE SWAN SONG 1
JACK FROST AND THE HOODED CROW 1
WITCHES PROMISE 1
LIFE'S A LONG SONG 2
LIVING IN THE PAST 1
WITCH'S PROMISE 1
WOW !!!! WHERE DO I START ? 1
SKATING AWAY ON THE THIN ICE OF A NEW DAY 1
MINSTRAL IN THE GALLERY 1
RAINBOW BLUES 1
MOTHER GOOSE 1
HEAVY HORSES 1
AQUALUNG 1
LOCOMOTIVE BREATH 1
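The awk output above comes out in no particular order, because awk does not guarantee any ordering for for (i in a). If you want the repeated titles at the top, one option is to print the count first and pipe the result through sort. This is only a sketch: the function name count_titles and the tab-separated output format are assumptions, not part of the original answer.

```shell
# count_titles: hypothetical wrapper around the awk approach above.
# Prints "count<TAB>TITLE", highest count first, ties in byte order.
count_titles() {
    awk '
    /:/ || /^$/ { next }        # skip header-like lines and blank lines
    { seen[toupper($0)]++ }     # tally each remaining line, ignoring case
    END { for (t in seen) printf "%d\t%s\n", seen[t], t }' "$1" |
        LC_ALL=C sort -k1,1nr -k2
}
```

Called as count_titles file1, this would list the titles that occur twice before all the titles that occur once.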