Mutate my DNA sequence
JavaScript (ES6), 124 ... 93 92 bytes
Output format: XiY, with X = original nucleotide, i = position, Y = new nucleotide.
s=>s[i=([,,x,y]=s.match(/(.A[AG]|.GA|T(.[AG]|(A|G).))(...)*$/)).index+!!x+!!y]+-~i+'AT'[+!x]
Try it online!
Regular expression
r = /(.A[AG]|.GA|T(.[AG]|(A|G).))(...)*$/
     (                          )         // 1st capturing group
      .A[AG]|.GA                          // match ★AA, ★AG or ★GA
                 T(.[AG]|(A|G).)          // 'T' + 2nd & 3rd capturing groups:
                                          // match T★A, T★G, TA★ or TG★
                                 (...)*$  // make sure that the number of trailing
                                          // nucleotides is a multiple of 3
To summarize, this matches 21 codons: the 3 STOP codons themselves (shown in parentheses below) and 18 codons that can be turned into a STOP by a single substitution. By testing the 2nd and 3rd capturing groups, we can classify them into 3 categories according to the position of the nucleotide that needs to be altered.
 2nd group | 3rd group | position | matching codons
-----------+-----------+----------+-------------------------------------------------------
 not set   | not set   | 1st      | (TAA) (TAG) (TGA) CAA CAG CGA AAA AAG AGA GAA GAG GGA
 set       | not set   | 2nd      | TTA TTG TCA TCG TGG
 set       | set       | 3rd      | TAT TAC TGT TGC
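The classification can be checked by running the regular expression over all 64 codons. This is only a sketch: the helper name `classifyCodons` is made up for illustration and is not part of the answer.

```javascript
// Same regular expression as in the answer.
const r = /(.A[AG]|.GA|T(.[AG]|(A|G).))(...)*$/;

// Hypothetical helper: groups every matching codon by the position
// (1, 2 or 3) of the nucleotide that the answer would alter.
function classifyCodons() {
  const groups = { 1: [], 2: [], 3: [] };
  for (const a of "TCAG") {
    for (const b of "TCAG") {
      for (const c of "TCAG") {
        const codon = a + b + c;
        const m = codon.match(r);
        if (m === null) continue;            // codon is not matched at all
        const x = m[2];                      // 2nd capturing group
        const y = m[3];                      // 3rd capturing group
        const position = 1 + Number(x !== undefined) + Number(y !== undefined);
        groups[position].push(codon);
      }
    }
  }
  return groups;
}

const groups = classifyCodons();
console.log(groups[1].length, groups[2].length, groups[3].length); // 12 5 4
```

The first group includes the three existing STOP codons, which is why the total is 21 rather than 18.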
Commented
s =>                 // s = input DNA sequence
  s[                 // append the original nucleotide ...
    i =              // ... which is at position i (0-based)
      ( [,, x, y] =  // x = 2nd capturing group, y = 3rd capturing group
        s.match(r)   // apply the regular expression to s
      ).index        // get the position of the matching codon
      + !!x + !!y    // add 2 if x and y are set, 1 if only x is set, 0 if x is not set
  ]                  //
  + -~i              // append the 1-indexed position
  + 'AT'[+!x]        // append the new nucleotide: 'T' if x is not set, 'A' otherwise
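For readers who prefer it spelled out, here is an ungolfed sketch of the same logic. The function name `mutate` and the `null` return for a non-matching input are assumptions added for illustration; the golfed version is an anonymous arrow function and assumes that a match always exists.

```javascript
// Same regular expression as in the golfed answer.
const r = /(.A[AG]|.GA|T(.[AG]|(A|G).))(...)*$/;

// Hypothetical ungolfed equivalent of the golfed arrow function.
function mutate(s) {
  const m = s.match(r);
  if (m === null) return null;     // assumption: not handled by the golfed code

  const x = m[2];                  // 2nd capturing group (set for 2nd/3rd position)
  const y = m[3];                  // 3rd capturing group (set for 3rd position only)

  // 0-based index of the nucleotide to alter inside s
  const i = m.index + Number(x !== undefined) + Number(y !== undefined);

  // A 1st-position change always introduces 'T'; the others introduce 'A'.
  const replacement = x === undefined ? "T" : "A";

  return s[i] + (i + 1) + replacement;   // XiY with a 1-indexed position
}

console.log(mutate("ATGAAATAA")); // A4T  (AAA -> TAA)
console.log(mutate("ATGTTATAA")); // T5A  (TTA -> TAA)
console.log(mutate("ATGTATTAA")); // T6A  (TAT -> TAA)
```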
Perl 5, 73 72 bytes
-1 byte using anchor at the end instead of the start
/(?|(.)(A[AG]|GA)|T(.)[AG]|T[AG](.))(...)*$/;$_="$+[1]$1>".($-[1]%3?A:T)
TIO
/^(...)*?(?|(.)(A[AG]|GA)|T(.)[AG]|T[AG](.))/;$_="$+[2]$2>".($-[2]%3?A:T)
Try it online!
Jelly, 30 bytes
os3ḅ4f“EGM‘Ḋ
JṬ€×Ɱ4ZẎçƇ⁸ḢĖżW€Ṁ
A monadic Link accepting a list of integers (in [1,2,3,4], mapping to ACGT respectively) which yields a list of lists of integers, [[position, new nucleotide], original nucleotide].
(A valid sequence with a start and a stop, but no possible substitution that would cause early termination, will output [[4]].)
Try it online!
...Or see this version, which performs both a translation from the ACGT input format and a translation to the {position}{original nucleotide}>{new nucleotide} output format.
How?
os3ḅ4f“EGM‘Ḋ - helper Link: [0,...,0,new nucleotide], N; DNA integer list, A
o            - (N) logical OR (A) (vectorises)
 s3          - split into chunks of three
   ḅ4        - convert from base four to integer
      “EGM‘  - code-page index list = [69,71,77]
     f       - filter keep (i.e. only keep stops)
           Ḋ - dequeue (so if only a single stop was found and hence no early
             -          stops we'll have an empty list which is falsey)

JṬ€×Ɱ4ZẎçƇ⁸ḢĖżW€Ṁ - Main Link: DNA integer list, A
J                 - range of length (of A) -> [1,2,...]
 Ṭ€               - untruth each -> [[1],[0,1],[0,0,1],...]
   ×Ɱ4            - map across [1,2,3,4] performing multiplication
      Z           - transpose
       Ẏ          - tighten -> [[1],[2],[3],[4],[0,1],[0,2],[0,3],[0,4],[0,0,1],...]
                  -            (call that X)
          ⁸       - use chain's left argument, A, as the right argument
         Ƈ        - filter keep those (x in X) for which this is truthy:
        ç         -   call the helper link as a dyad f(x, A)
           Ḣ      - head (which will be the shortest)
            Ė     - enumerate (that)
              W€  - wrap each (integer a in A) in a list
             ż    - zip left and right together
                Ṁ - get the maximum
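The brute-force idea behind the Jelly answer can be sketched in JavaScript as follows. This is a loose illustration, not a translation: the names `codonValue`, `countStops` and `firstEarlyStop` are made up, the input is a plain ACGT string, and the result is returned as an object rather than in the nested-list format described above.

```javascript
// Nucleotides as base-4 "digits" 1..4, as in the Jelly answer.
const DIGIT = { A: 1, C: 2, G: 3, T: 4 };

// With these digits the stops evaluate to TAA = 69, TAG = 71, TGA = 77.
const STOP_VALUES = [69, 71, 77];

// Hypothetical helper: base-4 value of a three-letter codon.
function codonValue(codon) {
  return [...codon].reduce((acc, ch) => acc * 4 + DIGIT[ch], 0);
}

// Hypothetical helper: number of stop codons in the reading frame.
function countStops(dna) {
  let count = 0;
  for (let k = 0; k + 3 <= dna.length; k += 3) {
    if (STOP_VALUES.includes(codonValue(dna.slice(k, k + 3)))) count++;
  }
  return count;
}

// Try every position (left to right) and every nucleotide (A, C, G, T);
// return the first substitution that leaves more than one stop codon.
function firstEarlyStop(dna) {
  for (let p = 0; p < dna.length; p++) {
    for (const n of "ACGT") {
      const mutated = dna.slice(0, p) + n + dna.slice(p + 1);
      if (countStops(mutated) > 1) {
        return { position: p + 1, original: dna[p], replacement: n };
      }
    }
  }
  return null; // no single substitution creates an early stop
}

console.log(firstEarlyStop("ATGTTATAA"));
// { position: 5, original: 'T', replacement: 'A' }
```

Counting "more than one stop" plays the role of the dequeue step: a sequence that contains only its final stop yields nothing after the first stop is dropped.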