I want to compare values of two files, but not based on position or sequence
Compare the sorted files.
In bash (or ksh or zsh), with a process substitution:
diff <(sort File1.txt) <(sort File2.txt)
In plain sh:
sort File1.txt >File1.txt.sorted
sort File2.txt >File2.txt.sorted
diff File1.txt.sorted File2.txt.sorted
To quickly see the differences between sorted files, comm can be useful: it directly shows the lines that are in one file but not the other.
comm -12 <(sort File1.txt) <(sort File2.txt) >common-lines.txt
comm -23 <(sort File1.txt) <(sort File2.txt) >only-in-file-1.txt
comm -13 <(sort File1.txt) <(sort File2.txt) >only-in-file-2.txt
If a line is repeated in the same file, the commands above insist on the two files having the same number of repetitions. If you want to treat
foo
bar
foo
as identical to
bar
foo
then remove duplicates when sorting: use sort -u instead of sort.
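As a rough illustration of the difference, the sketch below runs both comparisons on the two example files above. The temporary directory and the variable names plain and unique are made up for the demonstration, and mktemp -d is assumed to be available (it is on most systems, though not required by POSIX).

```shell
# Recreate the two example files in a scratch directory (hypothetical paths).
tmpdir=$(mktemp -d)
printf 'foo\nbar\nfoo\n' >"$tmpdir/File1.txt"
printf 'bar\nfoo\n' >"$tmpdir/File2.txt"

# Plain sort keeps the repeated "foo", so the sorted files differ.
sort "$tmpdir/File1.txt" >"$tmpdir/File1.txt.sorted"
sort "$tmpdir/File2.txt" >"$tmpdir/File2.txt.sorted"
if diff "$tmpdir/File1.txt.sorted" "$tmpdir/File2.txt.sorted" >/dev/null
then plain=same
else plain=differ
fi

# sort -u drops the repetition, so the sorted files become identical.
sort -u "$tmpdir/File1.txt" >"$tmpdir/File1.txt.sorted"
sort -u "$tmpdir/File2.txt" >"$tmpdir/File2.txt.sorted"
if diff "$tmpdir/File1.txt.sorted" "$tmpdir/File2.txt.sorted" >/dev/null
then unique=same
else unique=differ
fi

echo "sort: $plain, sort -u: $unique"
rm -rf "$tmpdir"
```

With these inputs the plain comparison should report a difference and the sort -u comparison should not.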
If you save the output of sort on one file and use it later when the other file is available, note that the two files must be sorted in the same locale. If you do this, you should probably sort in byte order:
LC_ALL=C sort File1.txt >File1.txt.sorted
Sort the files first (in bash):
diff <(sort file1) <(sort file2)
Using awk, you can build a hash (an associative array) indexed by the text of every distinct input line, using a command like:
awk 'The magic' Q=A fileA Q=B fileB Q=C fileC ...
'The magic' per input line is:
{ X[$0] = X[$0] Q; }
When you get to the END block, you iterate over the indices of X. Any line that occurred exactly once in each file will have an entry like:
X["Apple"] = "ABC";
A line that appeared once in fileA and three times in fileC would present as "ACCC". You can report any anomalies any way you like, for any number of files. (I once had to implement a 14-way comparison on a safety-critical system that ran on a Main and Standby server, each with a real-time plus Oracle database.)
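Put together, a two-file version might look something like the sketch below. The sample files, the report variable, and the choice to treat anything other than the tag "AB" as an anomaly are all assumptions made for the illustration; adjust the expected tag to the number of files you compare.

```shell
# Small hypothetical sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'Apple\nPear\n' >"$tmpdir/fileA"
printf 'Pear\nApple\nPlum\n' >"$tmpdir/fileB"

# Q is reassigned on the command line before each file is read,
# so every line's tag records which files it was seen in.
report=$(awk '
    { X[$0] = X[$0] Q }
    END {
        # Print only lines that did not occur exactly once in each file.
        for (line in X)
            if (X[line] != "AB")
                print X[line] "\t" line
    }
' Q=A "$tmpdir/fileA" Q=B "$tmpdir/fileB" | LC_ALL=C sort)

printf '%s\n' "$report"
rm -rf "$tmpdir"
```

The output is piped through sort because awk does not guarantee any particular order when iterating over an array. Here only Plum should be reported, tagged B, since it appears in fileB alone.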
If you choose to include the line number on each tag (FNR gives the line number within the current file, whereas NR keeps counting across all the files), and write some interesting patterns, you can make the tags look like:
X["Walrus"] = "A347B38C90"
and report which matching texts were on which lines in the various files.
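A minimal sketch of the line-number variant follows. It uses FNR rather than NR so that numbering restarts in each file, which is what the tag format above implies; the sample files and the located variable are invented for the example.

```shell
# Small hypothetical sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'Walrus\nSeal\n' >"$tmpdir/fileA"
printf 'Seal\nOtter\nWalrus\n' >"$tmpdir/fileB"

# Append the file letter and the per-file line number to each line's tag.
located=$(awk '
    { X[$0] = X[$0] Q FNR }
    END { for (line in X) print line "\t" X[line] }
' Q=A "$tmpdir/fileA" Q=B "$tmpdir/fileB" | LC_ALL=C sort)

printf '%s\n' "$located"
rm -rf "$tmpdir"
```

With these inputs, Walrus should come out tagged A1B3 (line 1 of fileA, line 3 of fileB), Seal as A2B1, and Otter as B2. Splitting such a tag back into file and line-number pairs is left to whatever patterns suit your report.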