Mutate my DNA sequence

JavaScript (ES6), 124 ... 93 92 bytes

Output format: XiY, with X = original nucleotide, i = 1-indexed position, Y = new nucleotide.

s=>s[i=([,,x,y]=s.match(/(.A[AG]|.GA|T(.[AG]|(A|G).))(...)*$/)).index+!!x+!!y]+-~i+'AT'[+!x]

Try it online!

Regular expression

r = /(.A[AG]|.GA|T(.[AG]|(A|G).))(...)*$/
     (          |               )           // 1st capturing group
      .A[AG]|.GA                            // match ★AA, ★AG or ★GA
                 T(.[AG]|(A|G).)            // 'T' + 2nd & 3rd capturing groups:
                                            // match T★A, T★G, TA★ or TG★
                                 (...)*$    // make sure that the number of trailing
                                            // nucleotides is a multiple of 3
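
The effect of the trailing `(...)*$` can be seen by comparing the pattern with and without it. This is only an illustrative sketch: the sample sequence and the variable names are made up for the demonstration and do not come from the challenge.

```javascript
// Hypothetical sample: ATG CCC AAA TAG (start codon, two codons, stop codon).
const sample = 'ATGCCCAAATAG';

// Without the frame check, the first hit is 'TGC' at index 1, which straddles two codons.
const unanchored = sample.match(/(.A[AG]|.GA|T(.[AG]|(A|G).))/);

// With the frame check, only matches followed by a multiple of 3 nucleotides are accepted.
const anchored = sample.match(/(.A[AG]|.GA|T(.[AG]|(A|G).))(...)*$/);

console.log(unanchored.index, unanchored[1]); // 1 'TGC'
console.log(anchored.index, anchored[1]);     // 6 'AAA'
```

Because the input length is a multiple of 3, requiring a multiple of 3 after the match is equivalent to requiring the match to start on a codon boundary.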

To summarize, this will match 21 codons that can be mutated to generate a STOP. By testing the 2nd and 3rd capturing groups, we can classify them into 3 categories describing the position of the nucleotide that needs to be altered.

 2nd group | 3rd group | position | matching codons
-----------+-----------+----------+-------------------------------------------------------
  not set  |  not set  |    1st   | (TAA) (TAG) (TGA) CAA CAG CGA AAA AAG AGA GAA GAG GGA
    set    |  not set  |    2nd   | TTA TTG TCA TCG TGG
    set    |    set    |    3rd   | TAT TAC TGT TGC
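
The table can be checked mechanically. The sketch below is not part of the golfed answer: it anchors the same alternation to a single codon (the `^...$` wrapper and all variable names are assumptions made for this check), enumerates all 64 codons, and groups the matching ones by the position that would be altered.

```javascript
// Same alternation as in r, but restricted to exactly one codon.
const codonPattern = /^(.A[AG]|.GA|T(.[AG]|(A|G).))$/;
const stops = ['TAA', 'TAG', 'TGA'];

const classes = { 1: [], 2: [], 3: [] };
let allMutationsAreStops = true;

for (const a of 'TCAG') for (const b of 'TCAG') for (const c of 'TCAG') {
  const codon = a + b + c;
  const m = codon.match(codonPattern);
  if (!m) continue;
  const [, , x, y] = m;                          // 2nd and 3rd capturing groups
  const position = 1 + (x ? 1 : 0) + (y ? 1 : 0); // 1, 2 or 3
  classes[position].push(codon);
  // Apply the substitution the answer would output and check the result.
  const mutated = codon.slice(0, position - 1) + (x ? 'A' : 'T') + codon.slice(position);
  if (!stops.includes(mutated)) allMutationsAreStops = false;
}

console.log(classes);
console.log(allMutationsAreStops);
```

The three lists have 12, 5 and 4 entries, which adds up to the 21 codons mentioned above.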

Commented

s =>               // s = input DNA sequence
  s[               // append the original nucleotide ...
    i =            //   ... which is at position i (0-based)
    ( [,, x, y] =  //   x = 2nd capturing group, y = 3rd capturing group
        s.match(r) //   apply the regular expression to s
    ).index        //   get the position of the matching codon
    + !!x + !!y    //   add 2 if x and y are set, 1 if only x is set, 0 if x is not set
  ]                //
  + -~i            // append the 1-indexed position
  + 'AT'[+!x]      // append the new nucleotide: 'T' if x is not set, 'A' otherwise
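
For readers who prefer an ungolfed form, here is a minimal sketch of the same logic. The function name `mutate`, the `null` return for a non-matching input, and the sample sequences are assumptions added for illustration; the golfed version has no such guard.

```javascript
const STOP_CANDIDATE = /(.A[AG]|.GA|T(.[AG]|(A|G).))(...)*$/;

// Returns "XiY" (original nucleotide, 1-indexed position, new nucleotide),
// or null if the pattern does not match at all (hypothetical guard).
function mutate(s) {
  const m = s.match(STOP_CANDIDATE);
  if (m === null) return null;
  const [, , x, y] = m;                             // 2nd and 3rd capturing groups
  const i = m.index + (x ? 1 : 0) + (y ? 1 : 0);    // 0-based position to alter
  return s[i] + (i + 1) + (x ? 'A' : 'T');
}

console.log(mutate('ATGCCCAAATAG')); // A7T  (AAA -> TAA)
console.log(mutate('ATGTGGCCCTAA')); // G5A  (TGG -> TAG)
console.log(mutate('ATGTATCCCTAA')); // T6A  (TAT -> TAA)
```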

Perl 5, 73 72 bytes

-1 byte by anchoring at the end instead of the start

/(?|(.)(A[AG]|GA)|T(.)[AG]|T[AG](.))(...)*$/;$_="$+[1]$1>".($-[1]%3?A:T)

TIO

Previous 73-byte version, anchored at the start:

/^(...)*?(?|(.)(A[AG]|GA)|T(.)[AG]|T[AG](.))/;$_="$+[2]$2>".($-[2]%3?A:T)

Try it online!
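
The Perl answers capture only the nucleotide to replace and derive everything else from its offset. The following is a rough JavaScript analogue of that idea, not a translation of the Perl code: JavaScript has no branch reset `(?|...)`, so this sketch uses three separate groups and infers the offset from which group matched. The function name `mutatePerlStyle` and the sample inputs are hypothetical.

```javascript
// One capturing group per alternative; group k captures the k-th nucleotide of the codon.
const PATTERN = /(?:(.)(?:A[AG]|GA)|T(.)[AG]|T[AG](.))(?:...)*$/;

// Returns "{position}{original}>{new}", or null if nothing matches (hypothetical guard).
function mutatePerlStyle(s) {
  const m = s.match(PATTERN);
  if (m === null) return null;
  const k = [1, 2, 3].find((g) => m[g] !== undefined); // which alternative matched
  const start = m.index + k - 1;                        // 0-based offset of the captured nucleotide
  // The input length is a multiple of 3, so start % 3 is the offset within the codon:
  // 0 means the first nucleotide (replace with T), otherwise replace with A.
  return `${start + 1}${m[k]}>${start % 3 ? 'A' : 'T'}`;
}

console.log(mutatePerlStyle('ATGCCCAAATAG')); // 7A>T
console.log(mutatePerlStyle('ATGTGGCCCTAA')); // 5G>A
console.log(mutatePerlStyle('ATGTATCCCTAA')); // 6T>A
```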

Jelly, 30 bytes

os3ḅ4f“EGM‘Ḋ
JṬ€×Ɱ4ZẎçƇ⁸ḢĖżW€Ṁ

A monadic Link accepting a list of integers (in [1,2,3,4] mapping to ACGT respectively) which yields a list of lists of integers, [[position, new nucleotide], original nucleotide].

(A valid sequence with a start and a stop, but no possible substitution that would cause early termination, will output [[4]].)

Try it online!
Or see this version, which translates both from the ACGT input format and to the {position}{original nucleotide}>{new nucleotide} output format.

How?

os3ḅ4f“EGM‘Ḋ      - helper Link: [0,...,0,new nucleotide], N; DNA integer list, A
o                 - (N) logical OR (A) (vectorises)
 s3               - split into chunks of three
   ḅ4             - convert from base four to integer
      “EGM‘       - code-page index list = [69,71,77]
     f            - filter keep (i.e. only keep stops)
           Ḋ      - dequeue (so if only a single stop was found and hence no early
                             stops we'll have an empty list which is falsey)

JṬ€×Ɱ4ZẎçƇ⁸ḢĖżW€Ṁ - Main Link: DNA integer list, A
J                 - range of length (of A) -> [1,2,...]
 Ṭ€               - untruth each -> [[1],[0,1],[0,0,1],...]
   ×Ɱ4            - map across [1,2,3,4] performing multiplication
      Z           - transpose
       Ẏ          - tighten -> [[1],[2],[3],[4],[0,1],[0,2],[0,3],[0,4],[0,0,1],...]
                  - (call that X)
          ⁸       - use chain's left argument, A, as the right argument
         Ƈ        - filter keep those (x in X) for which this is truthy:
        ç         -   call the helper link as a dyad f(x, A)
           Ḣ      - head (which will be the shortest)
            Ė     - enumerate (that)
              W€  - wrap each (integer a in A) in a list
             ż    - zip left and right together
                Ṁ - get the maximum
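
The brute-force idea behind the two links can be sketched in JavaScript. This mirrors the approach described above rather than the exact Jelly semantics: all function names are made up, the result is returned as an object instead of a nested list, and `null` stands in for the `[[4]]` no-solution output.

```javascript
const STOPS = new Set([69, 71, 77]); // TAA, TAG, TGA read as base-4 digits with A,C,G,T = 1..4

// Encode 'ACGT' text as the integer list the Link expects.
const encode = (s) => [...s].map((c) => 'ACGT'.indexOf(c) + 1);

// Helper: overlay candidate n (zeros mean "keep the original") on a,
// collect the stop codons, then drop the first one found.
function extraStops(n, a) {
  const mutated = a.map((v, i) => n[i] || v);
  const found = [];
  for (let i = 0; i < mutated.length; i += 3) {
    const value = mutated.slice(i, i + 3).reduce((acc, d) => acc * 4 + d, 0);
    if (STOPS.has(value)) found.push(value);
  }
  return found.slice(1); // non-empty only if more than one stop is present
}

// Main: try candidates [1],[2],[3],[4],[0,1],[0,2],... and keep the first that works.
function firstMutation(a) {
  for (let pos = 1; pos <= a.length; pos++) {
    for (let nuc = 1; nuc <= 4; nuc++) {
      const n = Array(pos - 1).fill(0).concat(nuc);
      if (extraStops(n, a).length > 0) {
        return { position: pos, original: a[pos - 1], replacement: nuc };
      }
    }
  }
  return null;
}

console.log(firstMutation(encode('ATGCCCAAATAG'))); // { position: 7, original: 1, replacement: 4 }
console.log(firstMutation(encode('ATGCCCTAA')));    // null
```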
