Find any line in VI that has something other than ATCG
First of all, you definitely do not want to open the file in an editor (it's much too large to edit that way).
Instead, if you just want to identify whether the file contains anything other than A, T, C and G, you may do that with
grep '[^ATCG]' filename
This would return all lines that contain anything other than those four characters.
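As a small illustration of the command above, here is a sketch run against a throwaway sample file. The file name sample.txt and its contents are made up for this example; -n adds line numbers to the output and -c counts the matching lines instead of printing them.

```shell
# Create a tiny sample file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'ATCGATCG\nATCNATCG\nGGCCTTAA\nATXGCA\n' >"$tmpdir/sample.txt"

# Print the offending lines, each prefixed with its line number.
bad_lines=$(grep -n '[^ATCG]' "$tmpdir/sample.txt")
printf '%s\n' "$bad_lines"

# Count how many lines contain at least one unexpected character.
bad_count=$(grep -c '[^ATCG]' "$tmpdir/sample.txt")
printf '%s\n' "$bad_count"
```

On a very large file, the count alone is probably the more useful first check, since printing every offending line may produce a lot of output.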
If you want to delete these characters from the file, you may do so with
tr -c -d 'ATCG\n' <filename >newfilename
(whether this is the right way to "correct" the file, I don't know)
This would remove all characters in the file that are not one of the four, and it would also retain newlines (\n). The edited file would be written to newfilename.
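To see what the tr command does, here is a minimal sketch on a made-up two-line input (the names in.txt and out.txt are placeholders for this example only). It then re-runs the grep check on the result to confirm that nothing unexpected is left.

```shell
# Create a small input file containing two stray characters (hypothetical data).
tmpdir=$(mktemp -d)
printf 'ATCGNATCG\nAT-CG\n' >"$tmpdir/in.txt"

# Delete every character that is not A, T, C, G or a newline.
tr -c -d 'ATCG\n' <"$tmpdir/in.txt" >"$tmpdir/out.txt"

cleaned=$(cat "$tmpdir/out.txt")
printf '%s\n' "$cleaned"

# grep -q exits with a non-zero status when nothing matches,
# so the else branch means the output is clean.
if grep -q '[^ATCG]' "$tmpdir/out.txt"; then
    status='unexpected characters remain'
else
    status='clean'
fi
printf '%s\n' "$status"
```

Note that deleting characters shortens the affected lines, so the output lines are no longer the same length as the input lines.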
If it's a systematic error that has added something to the file, then this could possibly be corrected by sed or awk, but we don't yet know what your data looks like.
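One way to judge whether the error is systematic might be to tally which unexpected characters occur and how often. The following is only a sketch: it assumes a grep that supports -o (GNU and BSD grep do, but the option is not required by POSIX), and the sample data is invented for the example.

```shell
# Create a small sample file (hypothetical data).
tmpdir=$(mktemp -d)
printf 'ATCGNATCG\nATNCG\nAT-CG\n' >"$tmpdir/sample.txt"

# -o prints each match (here, each single unexpected character) on its own line;
# sort groups identical characters together and uniq -c counts each group.
tally=$(grep -o '[^ATCG]' "$tmpdir/sample.txt" | LC_ALL=C sort | uniq -c)
printf '%s\n' "$tally"
```

If the tally shows a single character dominating (for example N, or a carriage return from Windows line endings), that would hint at a systematic cause rather than random corruption.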
If you have the file open in vi or vim, then the command
/[^ATCG]
will find the next character in the editing buffer that is not an A, T, C or G.
And :%s/[^ATCG]//g
will remove them all.