How to remove duplicate words from a plain text file using Linux commands
Assuming that the words are one per line, and the file is already sorted:
uniq filename
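For example, a quick sketch with a throwaway words.txt (the filename and contents are just for illustration) — uniq only collapses adjacent duplicate lines, which is why the file must be sorted first:

```shell
# A sorted file with duplicate words, one per line
printf 'apple\napple\nbanana\ncherry\ncherry\n' > words.txt

# uniq drops adjacent duplicate lines
uniq words.txt
# apple
# banana
# cherry
```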
If the file's not sorted:
sort filename | uniq
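As a side note, both GNU and BSD sort accept -u, which folds the uniq step into the sort itself. A small sketch with a hypothetical words.txt showing the two are equivalent here:

```shell
printf 'banana\napple\nbanana\napple\n' > words.txt

sort words.txt | uniq   # sort first, then drop adjacent duplicates
# apple
# banana

sort -u words.txt       # same result in one command
# apple
# banana
```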
If they're not one per line, and you don't mind them being one per line:
tr -s '[:space:]' '\n' < filename | sort | uniq
That doesn't remove punctuation, though, so maybe you want:
tr -s '[:space:][:punct:]' '\n' < filename | sort | uniq
But that removes the hyphen from hyphenated words. (The classes are quoted so the shell doesn't try to glob them.) See "man tr" for more options.
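To illustrate the difference between the two pipelines, a quick sketch with a throwaway prose.txt (filename and text are just for the example):

```shell
printf 'the cat, the cat - the hat\n' > prose.txt

# Whitespace only: "cat," and "-" survive as separate "words"
tr -s '[:space:]' '\n' < prose.txt | sort | uniq

# Whitespace and punctuation both become separators
tr -s '[:space:][:punct:]' '\n' < prose.txt | sort | uniq
# cat
# hat
# the
```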
If the words are comma-separated and you want to dedupe within each line, in place (note that with -p you must reassign $_, or the line passes through unchanged):

ruby -pi.bak -e '$_ = $_.chomp.split(",").uniq.join(",") + "\n"' filename
I'll admit the two kinds of quotation marks are ugly.