An easy way to diff log files, ignoring the time stamps?
Depending on the shell you are using, you can turn the approach @Blair suggested into a one-liner:
diff <(cut -b13- file1) <(cut -b13- file2)
(+1 to @Blair for the original suggestion :-)
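The right byte offset depends on how wide the timestamp is, so it may be worth checking it on a single line before diffing whole files. The sketch below assumes a hypothetical log line with a 12-character timestamp followed by one space; your format may differ.

```shell
# Hypothetical sample line: 12-character timestamp, one space, then the message.
line='09:01:00.000 data 1'

# -b13- keeps bytes 13 onward, so the separating space survives.
printf '%s\n' "$line" | cut -b13-    # prints " data 1"

# -b14- also drops the separator.
printf '%s\n' "$line" | cut -b14-    # prints "data 1"
```

Either offset works for diffing, as long as the same one is applied to both files.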
Answers using cut are fine, but sometimes it is useful to keep the timestamps in the diff output. As the OP's question is about ignoring the time stamps (not removing them), here is my tricky command line:
diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
sed isolates the timestamps (# before and \n after) within a process substitution.
diff -I '^#' ignores changes whose lines all begin with # (that is, changes confined to the timestamp lines).
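To see what diff actually receives, it can help to run the sed step on one line. This is a minimal sketch assuming GNU sed (both -r and \n in the replacement are GNU extensions); mark_ts is a hypothetical helper name, not part of the original command.

```shell
# Assumes GNU sed: -r and "\n" in the replacement text are GNU extensions.
# mark_ts is a hypothetical wrapper around the sed command used above.
mark_ts() {
    sed -r 's/^((.){12})/#\1\n/' "$@"
}

# The first 12 characters move to their own line, prefixed with '#'.
# The rest of the line (including its leading space) follows on the next line.
printf '09:01:00.000 data 1\n' | mark_ts
```

Each log line therefore becomes two lines: a # line that diff -I '^#' may ignore, and a payload line that is compared normally.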
example
Two log files having same content but different timestamps:
$> for ((i=1;i<11;i++)) do echo "09:0${i::1}:00.000 data $i"; done > 1.log
$> for ((i=1;i<11;i++)) do echo "11:00:0${i::1}.000 data $i"; done > 2.log
A basic diff command line says all lines are different:
$> diff 1.log 2.log
1,10c1,10
< 09:01:00.000 data 1
< 09:02:00.000 data 2
< 09:03:00.000 data 3
< 09:04:00.000 data 4
< 09:05:00.000 data 5
< 09:06:00.000 data 6
< 09:07:00.000 data 7
< 09:08:00.000 data 8
< 09:09:00.000 data 9
< 09:01:00.000 data 10
---
> 11:00:01.000 data 1
> 11:00:02.000 data 2
> 11:00:03.000 data 3
> 11:00:04.000 data 4
> 11:00:05.000 data 5
> 11:00:06.000 data 6
> 11:00:07.000 data 7
> 11:00:08.000 data 8
> 11:00:09.000 data 9
> 11:00:01.000 data 10
Our tricky diff -I '^#' does not display any difference (timestamps ignored):
$> diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
$>
Change 2.log (replace data with foo on the 6th line) and check again:
$> sed '6s/data/foo/' -i 2.log
$> diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
11,13c11,13
< #09:06:00.000
< data 6
< #09:07:00.000
---
> #11:00:06.000
> foo 6
> #11:00:07.000
=> timestamps are kept in the diff output!
You can also use the side-by-side feature with the -y or --side-by-side option:
$> diff -y -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
#09:01:00.000 #11:00:01.000
data 1 data 1
#09:02:00.000 #11:00:02.000
data 2 data 2
#09:03:00.000 #11:00:03.000
data 3 data 3
#09:04:00.000 #11:00:04.000
data 4 data 4
#09:05:00.000 #11:00:05.000
data 5 data 5
#09:06:00.000 | #11:00:06.000
data 6 | foo 6
#09:07:00.000 | #11:00:07.000
data 7 data 7
#09:08:00.000 #11:00:08.000
data 8 data 8
#09:09:00.000 #11:00:09.000
data 9 data 9
#09:01:00.000 #11:00:01.000
data 10 data 10
old sed
If your sed implementation does not support the -r option, you may have to count the twelve dots, as in <(sed 's/^\(............\)/#\1\n/' 1.log), or use another pattern of your choice ;)
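As one possible "other pattern", the sketch below matches the timestamp by shape rather than by counting characters. It assumes timestamps of the form HH:MM:SS.mmm followed by a single space, as in the example logs, and uses a backslash followed by a literal newline in the replacement, which should also work where \n is not understood. The name mark_ts_portable is hypothetical.

```shell
# Hypothetical helper: put an HH:MM:SS.mmm timestamp on its own '#' line.
# -E selects extended regular expressions; the backslash followed by a real
# newline inserts a line break in the replacement without relying on "\n".
mark_ts_portable() {
    sed -E 's/^([0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}) /#\1\
/' "$@"
}

# Lines without a leading timestamp pass through unchanged.
printf '09:01:00.000 data 1\nno stamp here\n' | mark_ts_portable

# Usage in a shell with process substitution (bash, zsh, ksh):
#   diff -I '^#' <(mark_ts_portable 1.log) <(mark_ts_portable 2.log)
```

Unlike the fixed-width version, this one also consumes the separating space, so payload lines start at the first character of the message.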
@EbGreen said
I would just take the log files and strip the timestamps off the start of each line then save the file out to different files. Then diff those files.
That's probably the best bet, unless your diffing tool has special powers. For example, you could
cut -b13- file1 > trimmed_file1
cut -b13- file2 > trimmed_file2
diff trimmed_file1 trimmed_file2
See @toolkit's response for an optimization that makes this a one-liner and obviates the need for extra files, if your shell supports it. Bash 3.2.39 at least seems to...
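If your shell lacks process substitution, the temporary-file approach can still be wrapped up so the trimmed files do not linger. This is only a sketch: difflog is a hypothetical function name, mktemp is assumed to be available (it is common but not required by POSIX), and the 13-byte offset is carried over from the commands above.

```shell
# Hypothetical helper: diff two logs after cutting the timestamp prefix,
# using temporary files instead of process substitution.
difflog() {
    _dl_t1=$(mktemp) || return 2
    _dl_t2=$(mktemp) || { rm -f "$_dl_t1"; return 2; }

    # Same trimming as above: keep byte 13 onward of every line.
    cut -b13- "$1" > "$_dl_t1"
    cut -b13- "$2" > "$_dl_t2"

    # Remember diff's exit status (0 = same, 1 = different) across cleanup.
    _dl_status=0
    diff "$_dl_t1" "$_dl_t2" || _dl_status=$?

    rm -f "$_dl_t1" "$_dl_t2"
    return "$_dl_status"
}
```

Calling difflog file1 file2 then prints the same output as the three commands above and returns diff's exit status.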
For a graphical option, Meld can do this using its text filters feature.
It lets you ignore lines based on one or more Python regular expressions. The differences still appear, but lines that have no other differences are not highlighted.