How to break long words after n chars (long genomic sequences)

The seqsplit package breaks up such strings by adding suitable break points. It is designed exactly for DNA sequences of this kind and copes in a sophisticated way with various forms of input. However, you want to break your material after a fixed number of characters instead. This can be done with the xstring package, by applying its splitting command \StrSplit recursively:


\documentclass{article}

\usepackage{xstring,etoolbox}

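% \fixsplit{n}{string}: typeset string, starting a new line after every n characters
\newcommand{\fixsplit}[2]{%
  \StrLen{#2}[\mynum]% store the length of the string in \mynum
  \ifnumcomp{\mynum}{>}{#1}%
    {\StrSplit{#2}{#1}{\myfirststr}{\mysecondstr}% first n characters / the rest
     \myfirststr\newline % print them; \newline (unlike \linebreak) avoids underfull boxes
     \fixsplit{#1}{\mysecondstr}}% recurse on the remainder
    {#2}}% n characters or fewer left: print them as they are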

\begin{document}


\begin{quote}
  \ttfamily
  \fixsplit{30}{CTCCTTGGGCTGTTATTCCGTAAAAGTATTTGTGGAAGATACGGCTGTCATACATGATATGTTTTTTGTTTATAACAATAGTTCTTTCTTTGATTTCACCATAGGTTGCCTCAAATTGCTCTTTTGTTGCTTGTCCAGCTGTTAAGACTAAATGTTTTGACCCCTCATTTATAAGACCGATTGCGTTGAATGGTAAGACATTCTGTTGTGCTGATTGTAATTCTGAATAGCTACGGATTTTTATGAAGATATAGTTTTTTAATATTGGTATTTCATTCCAGACATACTTCTGTATAAAGGATTTATTAAACGGTGTTGTTTTGATTGCTCTATAATACTTATCTTGTTGTCCTCTTAATTTTACCCAAGGTCTTTCAAACTCTTGGGAGTTAATGATTATAAGCATATTGTAAAGCTGTCCAGCTAATCCGAAGAATACTGGAAGCCAGTGGGTAAAGCTTGTCTGTTTTGGTAAAGCTGTTTGAACGTCTGACAAGAACAAGTCCAGACCTTCATATTTGTGGATTTTTTGAAACTTCATATTTTGATATGAACCGTCTACAATATCACTATATTTTACTGGTTGCCCAGTTTTTTGATTAATGTATCCAGGTCTTTAATATCTACTACTAAAACCACCGTAACCATAGTCCACGTTAGAGATATAGAGAGGTTTCGCATAAATGTGAACCCAGATTGCTTGTTGTTGTCTTTCATAACTCATTTGAAGACCAGTTTTAATGCGTTCTTTAATTGCTTGATACGTT}
\end{quote}

\end{document}

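Note that I have chosen to print the result in a fixed-width font; otherwise lines containing the same number of characters end up with different widths, which gives a rather strange effect. Also note that, because of the way xstring works, the results of its operations usually have to be stored in a macro (here \mynum, \myfirststr and \mysecondstr) rather than being used directly.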
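To make the second point concrete, here is a minimal standalone sketch (the macro names \myfirst, \myrest and \mylen are arbitrary choices for illustration): \StrSplit hands back its two pieces in the macros you name, and \StrLen only stores its result when it is given the optional [\macro] argument; without it, the length is simply typeset.

```latex
\documentclass{article}
\usepackage{xstring}
\begin{document}
\ttfamily
% \StrSplit stores the first 4 characters in \myfirst and the rest in \myrest
\StrSplit{CTCCTTGGGC}{4}{\myfirst}{\myrest}%
first = \myfirst, rest = \myrest % typesets: first = CTCC, rest = TTGGGC

% with the optional argument, \StrLen stores the length instead of printing it
\StrLen{CTCCTTGGGC}[\mylen]%
length = \mylen % typesets: length = 10
\end{document}
```

Once stored, \myfirst and \myrest can be fed to further xstring commands, which is exactly what the recursive \fixsplit does with \mysecondstr.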


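If your editor can search and replace within selected text, replace C by C\brk{}, G by G\brk{}, A by A\brk{} and T by T\brk{} in your long strings. If you don't want the text to disappear off the edge of the editor window, split the source into shorter lines and end each one with a %, so that no spurious space is inserted at the line break (both variants are shown below):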

\documentclass{article}

% \brk: an invisible break point (empty \discretionary) followed by stretchable \hfil glue
\newcommand*{\brk}{\discretionary{}{}{}\hfil}
\begin{document}

\noindent C\brk{}T\brk{}C\brk{}C\brk{}T\brk{}T\brk{}G\brk{}G\brk{}G\brk{}C\brk{}T\brk{}G\brk{}T\brk{}T\brk{}A\brk{}T\brk{}T\brk{}C\brk{}C\brk{}G\brk{}T\brk{}A\brk{}A\brk{}A\brk{}A\brk{}G\brk{}T\brk{}A\brk{}T\brk{}T\brk{}T\brk{}G\brk{}T\brk{}G\brk{}G\brk{}A\brk{}A\brk{}G\brk{}A\brk{}T\brk{}A\brk{}C\brk{}G\brk{}G\brk{}C\brk{}T\brk{}G\brk{}T\brk{}C\brk{}A\brk{}T\brk{}A\brk{}C\brk{}A\brk{}T\brk{}G\brk{}A\brk{}T\brk{}A\brk{}T\brk{}G\brk{}T\brk{}T\brk{}T\brk{}T\brk{}T\brk{}T\brk{}G\brk{}T\brk{}T\brk{}T\brk{}A\brk{}T\brk{}A\brk{}A\brk{}C\brk{}A\brk{}A\brk{}T\brk{}A\brk{}G\brk{}T\brk{}T\brk{}C\brk{}T\brk{}T\brk{}T\brk{}C\brk{}T\brk{}T\brk{}T\brk{}G\brk{}A\brk{}T\brk{}
\hfill\mbox{}% \hfill outweighs the \hfil glue, so the last line is not spread out

\medskip

\noindent 
C\brk{}T\brk{}C\brk{}C\brk{}T\brk{}T\brk{}G\brk{}G\brk{}G\brk{}C\brk{}%
T\brk{}G\brk{}T\brk{}T\brk{}A\brk{}T\brk{}T\brk{}C\brk{}C\brk{}G\brk{}%
T\brk{}A\brk{}A\brk{}A\brk{}A\brk{}G\brk{}T\brk{}A\brk{}T\brk{}T\brk{}%
T\brk{}G\brk{}T\brk{}G\brk{}G\brk{}A\brk{}A\brk{}G\brk{}A\brk{}T\brk{}%
A\brk{}C\brk{}G\brk{}G\brk{}C\brk{}T\brk{}G\brk{}T\brk{}C\brk{}A\brk{}%
T\brk{}A\brk{}C\brk{}A\brk{}T\brk{}G\brk{}A\brk{}T\brk{}A\brk{}T\brk{}%
G\brk{}T\brk{}T\brk{}T\brk{}T\brk{}T\brk{}T\brk{}G\brk{}T\brk{}T\brk{}%
T\brk{}A\brk{}T\brk{}A\brk{}A\brk{}C\brk{}A\brk{}A\brk{}T\brk{}A\brk{}%
G\brk{}T\brk{}T\brk{}C\brk{}T\brk{}T\brk{}T\brk{}C\brk{}T\brk{}T\brk{}%
T\brk{}G\brk{}A\brk{}T\brk{}
\hfill\mbox{}% \mbox{} keeps the \hfill from being discarded at the end of the paragraph

\end{document}
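If you would rather not edit the sequence by hand, the seqsplit package mentioned at the top inserts break points between the characters for you. A minimal sketch, assuming seqsplit is installed (the sequence is just the first part of the one above); note that this breaks wherever the line happens to be full, not after a fixed number of characters:

```latex
\documentclass{article}
\usepackage{seqsplit}
\begin{document}
% \seqsplit allows a line break between any two characters of its argument
\noindent\texttt{\seqsplit{CTCCTTGGGCTGTTATTCCGTAAAAGTATTTGTGGAAGATACGGCTGTCATACATGATATGTTTTTTGTTTATAACAATAGTTCTTTCTTTGAT}}
\end{document}
```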

Can you please try this one? It loops over the characters with \@tfor and starts a new line after every n-th character:

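\documentclass{article}

\makeatletter
% \xfoo{n}{string}: typeset string, starting a new line after every n characters
\newcommand{\xfoo}[2]{%
  \@tempcnta=\z@             % reset the character counter
  \@tfor\xx:=#2\do{%         % \xx holds one character per iteration
    \xx                      % print the current character
    \advance\@tempcnta\@ne   % and count it
    \ifnum\@tempcnta=#1\relax % \relax ends the number cleanly
      \newline               % n characters printed: start a new line
      \@tempcnta=\z@         % and restart the count
    \fi
  }%
}
\makeatother

\begin{document}
\setlength{\parindent}{0pt}
\ttfamily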

\xfoo{10}{CTCCTTGGGCTGTTATTCCGTAAAAGTATTTGTGGAAGATACGGCTGTCATACATGATATGTTTTTTGTTTATAACAATAGTTCTTTCTTTGATTTCACCATAGGTTGCCTCAAATTGCTCTTTTGTTGCTTGTCCAGCTGTTAAGACTAAATGTTTTGACCCCTCATTTATAAGACCGATTGCGTTGAATGGTAAGACATTCTGTTGTGCTGATTGTAATTCTGAATAGCTACGGATTTTTATGAAGATATAGTTTTTTAATATTGGTATTTCATTCCAGACATACTTCTGTATAAAGGATTTATTAAACGGTGTTGTTTTGATTGCTCTATAATACTTATCTTGTTGTCCTCTTAATTTTACCCAAGGTCTTTCAAACTCTTGGGAGTTAATGATTATAAGCATATTGTAAAGCTGTCCAGCTAATCCGAAGAATACTGGAAGCCAGTGGGTAAAGCTTGTCTGTTTTGGTAAAGCTGTTTGAACGTCTGACAAGAACAAGTCCAGACCTTCATATTTGTGGATTTTTTGAAACTTCATATTTTGATATGAACCGTCTACAATATCACTATATTTTACTGGTTGCCCAGTTTTTTGATTAATGTATCCAGGTCTTTAATATCTACTACTAAAACCACCGTAACCATAGTCCACGTTAGAGATATAGAGAGGTTTCGCATAAATGTGAACCCAGATTGCTTGTTGTTGTCTTTCATAACTCATTTGAAGACCAGTTTTAATGCGTTCTTTAATTGCTTGATACGTT}
\end{document}
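The same counting loop can also be written with expl3, which avoids the \makeatletter dance and the scratch register. This is a sketch assuming a LaTeX kernel from 2020-10 or later (older installations need \usepackage{xparse} for \NewDocumentCommand); the command name \seqbreak and the counter \l__seqbreak_count_int are hypothetical names chosen for illustration:

```latex
\documentclass{article}
\ExplSyntaxOn
% hypothetical counter: characters printed on the current line
\int_new:N \l__seqbreak_count_int
% hypothetical command \seqbreak{n}{string}: new line after every n characters
\NewDocumentCommand { \seqbreak } { m m }
  {
    \int_zero:N \l__seqbreak_count_int
    \str_map_inline:nn { #2 }
      {
        ##1 % print the current character
        \int_incr:N \l__seqbreak_count_int
        \int_compare:nNnT { \l__seqbreak_count_int } = { #1 }
          {
            \newline % n characters printed: start a new line
            \int_zero:N \l__seqbreak_count_int
          }
      }
  }
\ExplSyntaxOff
\begin{document}
\setlength{\parindent}{0pt}
\ttfamily
\seqbreak{10}{CTCCTTGGGCTGTTATTCCGTAAAAGTATTTG}
\end{document}
```

With a 32-character input and n = 10, this should give three full lines of 10 characters followed by a line of 2.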