Delete all lines which don't have n characters before delimiter
$ awk '$1 ~ /^[[:xdigit:]]{6}$/' file
00107B Cisco Systems, Inc
00906D Cisco Systems, Inc
0090BF Cisco Systems, Inc
000C6E ASUSTek COMPUTER INC.
001BFC ASUSTek COMPUTER INC.
001E8C ASUSTek COMPUTER INC.
0015F2 ASUSTek COMPUTER INC.
001FC6 ASUSTek COMPUTER INC.
60182E ShenZhen Protruly Electronic Ltd co.
F4CFE2 Cisco Systems, Inc
501CBF Cisco Systems, Inc
This uses awk to extract the lines that contain exactly six hexadecimal digits in the first field. The [[:xdigit:]] pattern matches a hexadecimal digit, and {6} requires six of them. Anchoring the expression to the start and end of the field with ^ and $ respectively ensures that only the wanted lines match.
Redirect the output to a file to save the result under a new name.
Note that this seems to work with GNU awk (commonly found on Linux), but not with awk on e.g. OpenBSD, or with mawk.
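If the awk at hand rejects the command above, one possible workaround is to avoid both the {6} interval expression and the [[:xdigit:]] class, on the assumption that these are the features the other implementations lack. The following is only a sketch under that assumption: it checks the length of the first field with length() and rejects any field containing a character outside 0-9, A-F and a-f. The sample data and the temporary file are made up for illustration.

```shell
# Hypothetical sample data; the file comes from mktemp, not from the original post.
sample=$(mktemp)
printf '%s\n' \
    '00107B Cisco Systems, Inc' \
    '0010 too short' \
    '00107B12 too long' \
    'ZZ107B not hexadecimal' > "$sample"

# Keep a line only if its first field is six characters long and
# contains no character outside 0-9, A-F, a-f.
awk 'length($1) == 6 && $1 !~ /[^0-9A-Fa-f]/' "$sample"
```

With this sample, only the Cisco line should be printed. The explicit bracket expression assumes an ASCII-like locale, which is usually what is wanted for this kind of data.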
A similar approach with sed:
$ sed -n '/^[[:xdigit:]]\{6\}\>/p' file
00107B Cisco Systems, Inc
00906D Cisco Systems, Inc
0090BF Cisco Systems, Inc
000C6E ASUSTek COMPUTER INC.
001BFC ASUSTek COMPUTER INC.
001E8C ASUSTek COMPUTER INC.
0015F2 ASUSTek COMPUTER INC.
001FC6 ASUSTek COMPUTER INC.
60182E ShenZhen Protruly Electronic Ltd co.
F4CFE2 Cisco Systems, Inc
501CBF Cisco Systems, Inc
In this expression, \> is used to match the end of the hexadecimal number. This ensures that longer numbers are not matched. The \> pattern matches the end of a word, i.e. the zero-width position between a word character and a following non-word character (or the end of the line).
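As far as I know, \> is an extension rather than part of standard sed, so it may not be available everywhere. A rough equivalent that sticks to standard basic regular expressions is sketched below. It assumes the six digits are followed either by whitespace or by the end of the line, which is slightly stricter than \> (punctuation directly after the digits would no longer match). The sample data is made up for illustration.

```shell
# Hypothetical sample data; the file comes from mktemp, not from the original post.
sample=$(mktemp)
printf '%s\n' \
    '00107B Cisco Systems, Inc' \
    '00107B12 too long' \
    '0090BF' > "$sample"

# First expression: six hex digits followed by a whitespace character.
# Second expression: six hex digits and nothing else on the line.
# A line can match at most one of the two, so nothing is printed twice.
sed -n -e '/^[[:xdigit:]]\{6\}[[:space:]]/p' \
       -e '/^[[:xdigit:]]\{6\}$/p' "$sample"
```

With this sample, the first and the last line should be printed, while the line starting with eight hexadecimal digits is dropped.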
For sorting the resulting data, just pipe the result through sort, or sort -f if your hexadecimal numbers use both upper and lower case letters.
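A small sketch of the filter-then-sort pipeline, using made-up sample data with mixed-case prefixes. The filter here is a plain whitespace-terminated sed pattern chosen only to keep the example self-contained, and LC_ALL=C is set for sort so that the ordering does not depend on the current locale. Both choices are assumptions for the example, not requirements.

```shell
# Hypothetical sample data; the file comes from mktemp, not from the original post.
sample=$(mktemp)
printf '%s\n' \
    '001E8C ASUSTek COMPUTER INC.' \
    'not a matching line' \
    '001bfc ASUSTek COMPUTER INC.' \
    '00107B Cisco Systems, Inc' > "$sample"

# Filter the wanted lines, then sort them ignoring case.
sed -n '/^[[:xdigit:]]\{6\}[[:space:]]/p' "$sample" | LC_ALL=C sort -f
```

With -f, the line starting with 001bfc sorts before the one starting with 001E8C. A plain byte-wise sort in the C locale would place it after, since upper case letters come before lower case ones there.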
And for completeness, you can do this with grep too:
$ grep -E '^[[:xdigit:]]{6}\b' oui.txt
00107B Cisco Systems, Inc
00906D Cisco Systems, Inc
0090BF Cisco Systems, Inc
000C6E ASUSTek COMPUTER INC.
001BFC ASUSTek COMPUTER INC.
001E8C ASUSTek COMPUTER INC.
0015F2 ASUSTek COMPUTER INC.
001FC6 ASUSTek COMPUTER INC.
60182E ShenZhen Protruly Electronic Ltd co.
F4CFE2 Cisco Systems, Inc
501CBF Cisco Systems, Inc
$
This extended grep expression searches for exactly 6 hex digits at the beginning of each line, followed immediately by a word boundary (\b), i.e. the position between a word character and a non-word character or the end of the line.