Difference between two lists using Bash
Use the comm(1) command to compare the two files. Both need to be sorted, which you can do beforehand if they are large, or inline with bash process substitution.
comm can take a combination of the flags -1, -2 and -3, indicating which lines to suppress (unique to file 1, unique to file 2, or common to both).
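To see how the flags combine, here is a minimal sketch on two small, already-sorted sample files. The directory, file names and contents are made up for the demonstration; only the comm flags come from the answer itself.

```bash
# Hypothetical sample data in a throwaway directory (not the /tmp lists from the question).
demo_dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$demo_dir/old"   # already sorted
printf '%s\n' banana cherry date  > "$demo_dir/new"   # already sorted

only_old=$(comm -23 "$demo_dir/old" "$demo_dir/new")  # suppress columns 2 and 3
only_new=$(comm -13 "$demo_dir/old" "$demo_dir/new")  # suppress columns 1 and 3
in_both=$(comm -12 "$demo_dir/old" "$demo_dir/new")   # suppress columns 1 and 2

echo "only in old: $only_old"
echo "only in new: $only_new"
echo "in both: $in_both"
rm -r "$demo_dir"
```

Each flag removes one of comm's three output columns, so passing two flags leaves exactly the column you want.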
To get the lines only in the old file:
comm -23 <(sort /tmp/oldList) <(sort /tmp/newList)
To get the lines only in the new file:
comm -13 <(sort /tmp/oldList) <(sort /tmp/newList)
You can feed that into a while read loop to process each line:
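# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r old ; do
    printf '%s\n' "$old"   # placeholder: ...do stuff with "$old"
done < <(comm -23 <(sort /tmp/oldList) <(sort /tmp/newList))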
and similarly for the new lines.
Consider using Ruby if your scripts need readability.
To get the lines only in the old file:
ruby -e "puts File.readlines('/tmp/oldList') - File.readlines('/tmp/newList')"
To get the lines only in the new file:
ruby -e "puts File.readlines('/tmp/newList') - File.readlines('/tmp/oldList')"
You can feed that into a while read loop to process each line:
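# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r old ; do
    printf '%s\n' "$old"   # placeholder: ...do stuff with "$old"
done < <(ruby -e "puts File.readlines('/tmp/oldList') - File.readlines('/tmp/newList')")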
This is old, but for completeness we should say that if you have a really large set, the fastest solution would be to use diff to generate a script and then source it, like this:
#!/bin/bash
line_added() {
    # code to be run for all lines added
    # $* is the line
    :   # no-op so the empty function body is valid bash
}
line_removed() {
    # code to be run for all lines removed
    # $* is the line
    :
}
line_same() {
    # code to be run for all lines that are the same
    # $* is the line
    :
}
sort /tmp/oldList >/tmp/oldList.sorted
sort /tmp/newList >/tmp/newList.sorted
# Each line is pasted into the generated script unquoted, so the shell
# word-splits and expands it: only source this for input you trust.
diff >/tmp/diff_script.sh \
    --new-line-format="line_added %L" \
    --old-line-format="line_removed %L" \
    --unchanged-line-format="line_same %L" \
    /tmp/oldList.sorted /tmp/newList.sorted
source /tmp/diff_script.sh
A changed line will appear as both a removed line and an added line. If you don't like this, you can use --changed-group-format. Check the diff manual page.
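As a rough illustration of --changed-group-format, the sketch below reports a changed hunk as a single unit instead of a separate removal and addition. It assumes GNU diff; the sample files and the "changed from/to" wording are invented for the example.

```bash
# Hypothetical sample data; only the middle line differs between the two files.
demo_dir=$(mktemp -d)
printf '%s\n' alpha bravo charlie  > "$demo_dir/old"
printf '%s\n' alpha bravo2 charlie > "$demo_dir/new"

# GNU diff: within a group, %< expands to the lines from the old file
# and %> to the lines from the new file (each with its trailing newline).
report=$(diff \
    --unchanged-group-format='' \
    --changed-group-format='changed from: %<changed to:   %>' \
    "$demo_dir/old" "$demo_dir/new")

echo "$report"
rm -r "$demo_dir"
```

Hunks that only add or only remove lines can be given their own formats with --new-group-format and --old-group-format; if those are left out, GNU diff falls back to the changed-group format for them.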
The diff command will do the comparing for you.
e.g.,
$ diff /tmp/oldList /tmp/newList
See the diff(1) man page for more information. This should take care of the first part of your problem.
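If you go this route, one way to pull the two lists out of diff's default output is to filter on the "<" and ">" markers. This is only a sketch on made-up sample files, and it assumes no line in the lists needs special handling beyond stripping the two-character marker.

```bash
# Hypothetical sample data (not the /tmp lists from the question).
demo_dir=$(mktemp -d)
printf '%s\n' a b c > "$demo_dir/old"
printf '%s\n' a c d > "$demo_dir/new"

# In diff's default output, "< " prefixes lines from the first file
# and "> " prefixes lines from the second file.
removed=$(diff "$demo_dir/old" "$demo_dir/new" | sed -n 's/^< //p')
added=$(diff "$demo_dir/old" "$demo_dir/new" | sed -n 's/^> //p')

echo "removed: $removed"
echo "added: $added"
rm -r "$demo_dir"
```

Note that diff is order-sensitive: on unsorted lists, a line that merely moved can show up as both removed and added, so sort the files first if you want a set-style difference.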