Remove partially duplicated consecutive lines but keep the first and last
uniq is (sort of) the perfect tool for this. By default, uniq keeps/shows the first line of each set of duplicates, but not the last.
uniq has a -f flag which allows you to skip the first few fields.
From man uniq:
-f, --skip-fields=N
avoid comparing the first N fields
-s, --skip-chars=N
avoid comparing the first N characters
A field is a run of blanks (usually spaces and/or TABs), then non-blank characters. Fields are skipped before chars.
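To make the effect of -f concrete, here is a small sketch on made-up data (the three lines below are invented for illustration and are not from the question's file):

```shell
# Three made-up lines: the first field (think "timestamp") differs on every
# line, while the remaining fields are identical on the first two lines.
sample='100 a b
101 a b
102 a c'

# Plain uniq compares whole lines, so nothing is collapsed here.
plain=$(printf '%s\n' "$sample" | uniq)

# -f 1 ignores the first field, so lines 100 and 101 count as duplicates
# and only the first of them (100) is kept.
skipped=$(printf '%s\n' "$sample" | uniq -f 1)

printf '%s\n' "$skipped"
```

This should print the 100 and 102 lines only, which is the "first line of each set" behaviour described above.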
Example with uniq -c, which shows the count so you can see what uniq is doing:
-bash-4.2$ uniq -c -f 1 original_file
1 1447790360 99999 99999 20.25 20.25 20.25 20.50
9 1447790362 20.25 20.25 20.25 20.25 20.25 20.50
1 1447790388 20.25 20.25 99999 99999 99999 99999
1 1447790389 99999 99999 20.25 20.25 20.25 20.50
1 1447790391 20.00 20.25 20.25 20.25 20.25 20.50
3 1447790394 20.25 20.25 20.25 20.25 20.25 20.50
Not bad. Pretty close to what is wanted. And easy to do. But it is missing the last matching line of each group . . . .
The grouping options in uniq are also interesting for this question . . .
--group[=METHOD]
show all items, separating groups with an empty line; METHOD={separate(default),prepend,append,both}
-D, --all-repeated[=METHOD]
print all duplicate lines; groups can be delimited with an empty line; METHOD={none(default),prepend,separate}
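The difference between the two options matters here, so a quick sketch on made-up input (again invented data, assuming a GNU uniq recent enough to support --group):

```shell
# Made-up input: one unique line followed by a run of two partial duplicates.
sample='1 x
2 y
3 y'

# -D prints only lines that belong to a duplicate run, so the unique
# line "1 x" is dropped entirely.
dups=$(printf '%s\n' "$sample" | uniq -f 1 -D)

# --group keeps every line and puts an empty line between groups
# (METHOD defaults to "separate").
grouped=$(printf '%s\n' "$sample" | uniq -f 1 --group)

printf 'with -D:\n%s\n\nwith --group:\n%s\n' "$dups" "$grouped"
```

Since unique lines have to survive, --group looks like the better fit for this question than -D.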
Example, uniq by group . . .
-bash-4.2$ uniq --group=both -f 1 original_file

1447790360 99999 99999 20.25 20.25 20.25 20.50

1447790362 20.25 20.25 20.25 20.25 20.25 20.50
1447790365 20.25 20.25 20.25 20.25 20.25 20.50
1447790368 20.25 20.25 20.25 20.25 20.25 20.50
1447790371 20.25 20.25 20.25 20.25 20.25 20.50
1447790374 20.25 20.25 20.25 20.25 20.25 20.50
1447790377 20.25 20.25 20.25 20.25 20.25 20.50
1447790380 20.25 20.25 20.25 20.25 20.25 20.50
1447790383 20.25 20.25 20.25 20.25 20.25 20.50
1447790386 20.25 20.25 20.25 20.25 20.25 20.50

1447790388 20.25 20.25 99999 99999 99999 99999

1447790389 99999 99999 20.25 20.25 20.25 20.50

1447790391 20.00 20.25 20.25 20.25 20.25 20.50

1447790394 20.25 20.25 20.25 20.25 20.25 20.50
1447790397 20.25 20.25 20.25 20.25 20.25 20.50
1447790400 20.25 20.25 20.25 20.25 20.25 20.50

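With --group=both, uniq prints an empty line before, between, and after the groups. So grep for the line before and after every empty line, then strip the empty lines and the -- separators that grep puts between chunks of context: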
-bash-4.2$ uniq --group=both -f 1 original_file |grep -B1 -A1 ^$ |grep -Ev "^$|^--$"
1447790360 99999 99999 20.25 20.25 20.25 20.50
1447790362 20.25 20.25 20.25 20.25 20.25 20.50
1447790386 20.25 20.25 20.25 20.25 20.25 20.50
1447790388 20.25 20.25 99999 99999 99999 99999
1447790389 99999 99999 20.25 20.25 20.25 20.50
1447790391 20.00 20.25 20.25 20.25 20.25 20.50
1447790394 20.25 20.25 20.25 20.25 20.25 20.50
1447790400 20.25 20.25 20.25 20.25 20.25 20.50
Tah dahhh! Pretty good.
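For reuse, the pipeline can be wrapped in a small function. This is only a sketch: the name first_last is made up, the test data is invented, and it assumes GNU uniq (for --group) and GNU grep (for the -- context separator).

```shell
# Hypothetical helper: keep the first and last line of every run of lines
# that are identical apart from their first field.
# Assumes GNU uniq (--group) and GNU grep ("--" between context chunks).
first_last() {
    uniq --group=both -f 1 "$1" | grep -B1 -A1 '^$' | grep -Ev '^$|^--$'
}

# Made-up input: a single line, a run of three, and a run of two.
tmp=$(mktemp)
printf '%s\n' '1 x' '2 y' '3 y' '4 y' '5 z' '6 z' > "$tmp"

result=$(first_last "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

On this input the middle line of the run of three (3 y) should be the only one dropped. One caveat: any input line that is itself empty or exactly -- is also removed by the last grep.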
With awk
one liner:
awk '{n=$2 FS $3 FS $4 FS $5 FS $6 FS $7} NR==1||l1!=n{if(p)print l0; print; p=0} NR>1&&l1==n{p=1} {l0=$0; l1=n} END{if(p)print l0}' file
The whole point is to manipulate a few variables: n stores all fields except the first of the current line, l1 holds the same for the previous line, and l0 holds the whole previous line. p is just a flag that is set while the previous line belongs to a run of duplicates and has not been printed yet.
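The same logic spelled out as a commented script, as a sketch rather than a drop-in replacement. The function name first_last_awk and the longer variable names are made up. The key is built by stripping the first field instead of listing $2..$7, which is my assumption so that it works for any number of columns, and the END block prints the held line only if it is still pending.

```shell
# Hypothetical helper: reads files given as arguments, or stdin if none.
first_last_awk() {
    awk '
    {
        # key = current line without its first field
        key = $0
        sub(/^[ \t]*[^ \t]+[ \t]*/, "", key)
    }
    NR == 1 || key != prev_key {
        # A new group starts: flush the held last line of the previous
        # group, then print the first line of the new group.
        if (pending) print prev_line
        print
        pending = 0
    }
    NR > 1 && key == prev_key {
        # Same group as the previous line: hold this line back,
        # because it may turn out to be the last one of the group.
        pending = 1
    }
    {
        prev_line = $0
        prev_key = key
    }
    END {
        # Print the last line of the final group if it is still held back.
        if (pending) print prev_line
    }' "$@"
}

# Made-up input ending in a single-line group.
result=$(printf '%s\n' '1 x' '2 y' '3 y' '4 y' '5 z' | first_last_awk)
printf '%s\n' "$result"
```

The test input ends with a group of one line on purpose: that line is printed when its group starts, so the END block has nothing pending and does not print it a second time.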