AWK to replace character for lines not starting with ">"

It seems more natural to do this with sed:

sed '/^>/!y/./X/' Sfr.pep >Sfr2.pep

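This matches ^> against the current line ("does this line start with a > character?"). The ! negates that address, so on every line that does not start with >, the y command transliterates each dot in the line to X. Unlike in s///, the characters given to y are not regular expressions, so the dot here is a literal dot.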
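To see why y fits this job, compare it with s///g on a small record (the input below is made up for illustration). Inside y the dot is a literal character, whereas in s/./X/g it is a regular expression that matches any character:

```shell
#!/bin/sh
# Made-up two-line FASTA-style record, used only for illustration.
input='>seq.1
GT.CA'

# y/./X/ transliterates literal dots; the negated address /^>/! skips the header.
printf '%s\n' "$input" | sed '/^>/!y/./X/'
# >seq.1
# GTXCA

# s/./X/g treats "." as a regex (any character), so every base is replaced.
printf '%s\n' "$input" | sed '/^>/!s/./X/g'
# >seq.1
# XXXXX
```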

Testing:

$ cat Sfr.pep
>sequence.1
GTCAGTCAGTCA.GTCAGTCA

$ sed '/^>/!y/./X/' Sfr.pep >Sfr2.pep

$ cat Sfr2.pep
>sequence.1
GTCAGTCAGTCAXGTCAGTCA

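The main issue with your awk code is that next is executed whenever you come across a FASTA header line. next stops processing the current record and moves on to the next one, so any later rule that would have printed the header never runs for it. This means that your code only produces sequence data, without headers. That sequence data should look OK, but on its own it would not be much help.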
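Your exact awk script isn't reproduced here, so the following is only a hypothetical reconstruction of the pattern described above, run on made-up input. Because next leaves the header record before the print rule is reached, only the sequence line comes out:

```shell
#!/bin/sh
# Hypothetical reconstruction: the original awk script is not shown.
input='>seq.1
GT.CA'

# On header lines, "next" jumps to the next record, skipping the print rule,
# so headers never reach the output.
printf '%s\n' "$input" | awk '/^>/ { next } { gsub(/\./, "X"); print }'
# GTXCA
```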

Simply negating the test and dropping the next block (or putting a print before the next) would fix it in awk. That said, and this is my personal opinion, using the y command in sed is more elegant than using gsub() (or s///g in sed) for transliterating single characters.
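For the "print before the next" variant, here is a minimal sketch on made-up input (same hypothetical pattern as above). Printing the header explicitly before calling next keeps it in the output:

```shell
#!/bin/sh
# Made-up two-line record, used only for illustration.
input='>seq.1
GT.CA'

# Print header lines unchanged, then skip the remaining rules for that record.
# Every other line has its dots replaced and is then printed.
printf '%s\n' "$input" | awk '/^>/ { print; next } { gsub(/\./, "X"); print }'
# >seq.1
# GTXCA
```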

In awk, you can try:

awk '!/^>/ { gsub(/\./, "X") }1' Sfr.pep > Sfr2.pep

Output:

>sequence.1
GTCAGTCAGTCAXGTCAGTCA
