Listing with background color not line breaking correctly
See the bottom of this answer for a method not using any package.
Update: see the very bottom for an even more flexible environment.
(The update has itself been updated.)
Using alltt rather than listings, and making some modifications to your input:
- replace \colorbox by \ccolorbox (a macro defined below);
- replace the ! at the start of uncolored segments by \!;
- remove all other ! characters.
Here is the code snippet:
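\documentclass{article}
\usepackage{alltt}
\usepackage{color}
% \cccolorbox{<color>}<tokens>\relax puts every token in its own \colorbox,
% followed by \allowbreak so that TeX may end the line after any letter
\def\cccolorbox#1#2{\ifx#2\relax\let\next\allowbreak\else
\def\next{\colorbox{#1}{#2}\allowbreak\cccolorbox{#1}}\fi\next}
% \ccolorbox{<color>}{<letters>}: no padding, then hand the letters to \cccolorbox
\def\ccolorbox#1#2{\fboxsep0pt\cccolorbox{#1}#2\relax}
% \!<letters>: typeset uncolored letters one at a time, each followed by \allowbreak,
% until the next \ccolorbox or the \end of the environment
% (note: this overrides LaTeX's math-mode \!, the negative thin space)
\def\!#1{\ifx#1\ccolorbox\allowbreak\expandafter\ccolorbox\else
\ifx#1\end\expandafter\expandafter\expandafter\end\else
#1\allowbreak\expandafter\expandafter\expandafter\!\fi\fi}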
\begin{document}\pagestyle{empty}
\begin{alltt}
>unknown protein sequence
\ccolorbox{red}{MELFMKNSSLWGLKFYLFCLFIILSNINRAFASHNIFLDLQSS}\!SAISVKNVHRTRFHFQPPKHWINDPNAP\ccolorbox{red}{MYYNG}\ccolorbox{red}{VY}\!HLFYQYNPKGSVWGNIIWAHSVSKDLINWIHLEPAIY\ccolorbox{red}{PSKKFDKYGTWSGSSTILPNNKPVIIYTGVVDSYNNQVQNYAIPANLSDPFLRKWIKPNNNPL}\!IVPDNSINRTEFRDPTTA\ccolorbox{red}{WMGQDGLWRILIASMRKHRGMALLYRSRDFMKWIKAQ}\ccolorbox{red}{HPLHSSTN}\ccolorbox{red}{TGNWECPDFFPVLFNSTNGLDVSYR}\!GKNVKYVLKNSLDVARFD\ccolorbox{red}{YYTIGMYHTKIDRYIPNNNSIDGWKGL}\!RIDYGNFYASKTFYDPSRNRRVIWGWSNESDVLPDDEIKKGWAGIQGIPRQVWLNLSGKQLLQWPIEELE\ccolorbox{red}{TLRKQKVQLNNKKLSKGEM}\!FEVKGISASQADVEVLFSFSSLNEAEQFDPRWADLYAQDVCAIKG\ccolorbox{red}{STIQGGLGPFGLVTLASKNLEEYTPVFFRVFKAQKSYKILM}\ccolorbox{red}{CSDARR}\!SSMR\ccolorbox{red}{QNEAMYKPSFAGYVDVDLEDMKKLSLRSLIDNSVVESFGAGGKTCITSRVYPTLAIYDNAHLFVFNNGSETITIETLNAWSMDACKMN}
\end{alltt}
\end{document}
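To see why the lines can now break, it may help to expand a short input by hand. The document below is my own trace of the macros above (not output produced by TeX), showing what \ccolorbox{red}{MY}\!SA\ccolorbox{red}{V} amounts to: every letter becomes a separate item followed by \allowbreak, i.e. a zero penalty at which TeX is allowed to end the line. The doubled \allowbreak entries come from the end-of-segment tests and are harmless.

```latex
\documentclass{article}
\usepackage{color}
\begin{document}
\ttfamily
% Hand-expanded equivalent of  \ccolorbox{red}{MY}\!SA\ccolorbox{red}{V}
% Colored letters get one \colorbox each; every letter is followed by a break point.
\fboxsep0pt
\colorbox{red}{M}\allowbreak
\colorbox{red}{Y}\allowbreak
\allowbreak % from \cccolorbox reaching \relax
S\allowbreak
A\allowbreak
\allowbreak % from \! meeting \ccolorbox
\fboxsep0pt
\colorbox{red}{V}\allowbreak
\allowbreak
\end{document}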
To avoid boxes of irregular height and depth, add a \strut inside each box:
\def\cccolorbox#1#2{\ifx#2\relax\let\next\allowbreak\else
\def\next{\colorbox{#1}{\strut #2}\allowbreak\cccolorbox{#1}}\fi\next}
(The ABC at the end is there because I added \!ABC to the end of the protein sequence for testing purposes.)
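A small comparison, my own sketch rather than part of the answer, shows what the \strut changes. In LaTeX, \strut is an invisible zero-width rule with the height and depth of a normal line (0.7 and 0.3 of \baselineskip), so every box gets the same vertical extent regardless of the letter inside it.

```latex
\documentclass{article}
\usepackage{color}
\begin{document}
\ttfamily\fboxsep0pt
% Without \strut: each box is only as tall and deep as its own glyph,
% so the colored band follows the letter shapes and looks uneven.
\colorbox{red}{A}\colorbox{red}{Q}\colorbox{red}{y}\colorbox{red}{M}\par
% With \strut: every box has the same height and depth,
% so adjacent boxes join into a regular band.
\colorbox{red}{\strut A}\colorbox{red}{\strut Q}\colorbox{red}{\strut y}\colorbox{red}{\strut M}\par
\end{document}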
One may also question the need for an alltt environment. Just using \ttfamily (with a \\ after "unknown protein sequence") should be enough, with some modification to the \! code, which in its current version checks for an \end.
Here is a solution along those lines. It does not use any package apart from color. Put this in the preamble:
\catcode`\?=\active\catcode`\!=\active
\newenvironment{proteinlisting}
{\fboxsep0pt\catcode`\?=\active\catcode`\!=\active
\def!##1{\ifx##1!\let\next!\else
\ifx##1?\let\next?\else
\ifx##1\end\let\next\end\else
##1\allowbreak\let\next!\fi\fi\fi\next}%
\def?##1{\ifx##1!\let\next!\else
\ifx##1?\let\next?\else
\ifx##1\end\let\next\end\else
\colorbox{red}{\strut ##1}\allowbreak\let\next?\fi\fi\fi\next}%
\ttfamily}{\par}
\catcode`\?=12 \catcode`\!=12
Then, inside a proteinlisting environment, prefix the colored segments of your sequence with ? and the uncolored ones with !:
\begin{proteinlisting}
\noindent>unknown protein sequence\\
\noindent?MELFMKNSSLWGLKFYLFCLFIILSNINRAFASHNIFLDLQSS!SAISVKNVHRTRFHFQPPKHWINDPNAP?MYYNG?VY!HLFYQYNPKGSVWGNIIWAHSVSKDLINWIHLEPAIY?PSKKFDKYGTWSGSSTILPNNKPVIIYTGVVDSYNNQVQNYAIPANLSDPFLRKWIKPNNNPL!IVPDNSINRTEFRDPTTA?WMGQDGLWRILIASMRKHRGMALLYRSRDFMKWIKAQ?HPLHSSTN?TGNWECPDFFPVLFNSTNGLDVSYR!GKNVKYVLKNSLDVARFD?YYTIGMYHTKIDRYIPNNNSIDGWKGL!RIDYGNFYASKTFYDPSRNRRVIWGWSNESDVLPDDEIKKGWAGIQGIPRQVWLNLSGKQLLQWPIEELE?TLRKQKVQLNNKKLSKGEM!FEVKGISASQADVEVLFSFSSLNEAEQFDPRWADLYAQDVCAIKG?STIQGGLGPFGLVTLASKNLEEYTPVFFRVFKAQKSYKILM?CSDARR!SSMR?QNEAMYKPSFAGYVDVDLEDMKKLSLRSLIDNSVVESFGAGGKTCITSRVYPTLAIYDNAHLFVFNNGSETITIETLNAWSMDACKMN
\end{proteinlisting}
The output is the same as with the previous proposal. The environment can be customized to use other colors; to add more markers for different colors, just imitate the code.
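As an illustration of imitating the code, here is a sketch with a third marker. The environment name proteinlistingbis and the use of + as a blue marker are my own assumptions, not part of the answer. The main point is that every marker macro must also test for the new marker, so each definition gains one \ifx and one \fi.

```latex
\documentclass{article}
\usepackage{color}
% Hypothetical extension of proteinlisting (not from the answer):
% "+" is a third marker whose letters go into blue boxes.
\catcode`\?=\active \catcode`\!=\active \catcode`\+=\active
\newenvironment{proteinlistingbis}
 {\fboxsep0pt \catcode`\?=\active \catcode`\!=\active \catcode`\+=\active
  % "!": typeset the next letter uncolored, or switch to another marker or \end
  \def!##1{\ifx##1!\let\next!\else
    \ifx##1?\let\next?\else
    \ifx##1+\let\next+\else
    \ifx##1\end\let\next\end\else
    ##1\allowbreak\let\next!\fi\fi\fi\fi\next}%
  % "?": same dispatch, but each letter goes into a red box
  \def?##1{\ifx##1!\let\next!\else
    \ifx##1?\let\next?\else
    \ifx##1+\let\next+\else
    \ifx##1\end\let\next\end\else
    \colorbox{red}{\strut ##1}\allowbreak\let\next?\fi\fi\fi\fi\next}%
  % "+": the new marker, each letter goes into a blue box
  \def+##1{\ifx##1!\let\next!\else
    \ifx##1?\let\next?\else
    \ifx##1+\let\next+\else
    \ifx##1\end\let\next\end\else
    \colorbox{blue}{\strut ##1}\allowbreak\let\next+\fi\fi\fi\fi\next}%
  \ttfamily}{\par}
\catcode`\?=12 \catcode`\!=12 \catcode`\+=12
\begin{document}
\begin{proteinlistingbis}
\noindent>unknown protein sequence\\
\noindent?MELFMK!SAISV+MYYNG!HLFYQ?VY
\end{proteinlistingbis}
\end{document}
```

Since + only becomes active inside the environment, it keeps its usual meaning elsewhere in the document.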
Ok, here is one such final variant. In the preamble of the document:
\catcode`\?=\active \catcode`\!=\active
\newenvironment{proteinseqlst}[1][60]
{\fboxsep0pt \catcode`\?=\active \catcode`\!=\active
\ttfamily
\setbox0=\hbox{A}\hsize=\wd0 \multiply\hsize by #1\relax
\def!##1{\ifx##1!\let\next!\else
\ifx##1?\let\next?\else
\ifx##1\end\let\next\end\else
##1\allowbreak\let\next!\fi\fi\fi\next}%
\def?##1##2{\ifx##2!\let\next!\else
\ifx##2?\let\next?\else
\ifx##2\end\let\next\end\else
\colorbox{##1}{\strut ##2}\allowbreak\def\next{?{##1}}\fi\fi\fi\next}%
}{\par}
\catcode`\?=12 \catcode`\!=12
This environment typesets the amino acid sequence with the number of letters per line given as an optional parameter (hence within square brackets); the default is 60. Line breaks in the source are allowed and have no influence on the output, but empty lines are not: an empty line raises an error during the TeX run.
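The letters-per-line setting works because \fboxsep0pt and the zero-width \strut make each colored box exactly as wide as one typewriter glyph, and \allowbreak adds no glue, so a line of width #1 times the width of "A" holds exactly #1 letters. (This is presumably also why the sample input starts with \noindent: a paragraph indent would take room from the first line.) The sketch below repeats the environment's width computation for 60 letters and writes the result to the terminal and log, so the value can be checked for a given font; the register name \seqlinewidth is hypothetical.

```latex
\documentclass{article}
\newdimen\seqlinewidth % hypothetical register, used only for this check
\begin{document}
\ttfamily
% Same computation as in proteinseqlst: width of one "A" times the letter count
\setbox0=\hbox{A}%
\seqlinewidth=\wd0 \multiply\seqlinewidth by 60\relax
\typeout{One letter: \the\wd0; 60 letters: \the\seqlinewidth}
\end{document}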
This environment allows arbitrary colors: put the desired color within braces after a ?, and prefix uncolored segments with a !. Note that xcolor syntax such as ?{yellow!20} is allowed (with xcolor loaded); the ! inside the braces is handled by xcolor, not by the environment definition.
The TeX run also produces no overfull box warnings in the log file.
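For color expressions such as yellow!20, xcolor has to be loaded instead of color. The complete document below is a sketch under that assumption: it repeats the proteinseqlst definition from above unchanged and uses a short, arbitrary excerpt of the sequence with a few xcolor expressions. It relies on the answer's statement that xcolor copes with the active ! inside the braces.

```latex
\documentclass{article}
\usepackage{xcolor} % needed for color expressions such as yellow!20
\catcode`\?=\active \catcode`\!=\active
\newenvironment{proteinseqlst}[1][60]
{\fboxsep0pt \catcode`\?=\active \catcode`\!=\active
\ttfamily
\setbox0=\hbox{A}\hsize=\wd0 \multiply\hsize by #1\relax
\def!##1{\ifx##1!\let\next!\else
\ifx##1?\let\next?\else
\ifx##1\end\let\next\end\else
##1\allowbreak\let\next!\fi\fi\fi\next}%
\def?##1##2{\ifx##2!\let\next!\else
\ifx##2?\let\next?\else
\ifx##2\end\let\next\end\else
\colorbox{##1}{\strut ##2}\allowbreak\def\next{?{##1}}\fi\fi\fi\next}%
}{\par}
\catcode`\?=12 \catcode`\!=12
\begin{document}
\begin{proteinseqlst}[30]
\noindent>unknown protein sequence\\
\noindent
?{yellow!20}MELFMKNSSLWG!LKFYLF?{blue!30}CLFIIL
!SNINRAF?{green!50!black}ASHNIF
\end{proteinseqlst}
\end{document}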
With the following input (here arbitrarily wrapped at 53 characters per source line, which has no influence on the output):
\begin{proteinseqlst}[40]
\noindent>unknown protein sequence\\
\noindent
?{yellow}MELFMKNSSLWGLKFYLFCLFIILSNINRAFASHNIFLDLQSS!
SAISVKNVHRTRFHFQPPKHWINDPNAP?{blue}MYYNG?{green}VY!HL
FYQYNPKGSVWGNIIWAHSVSKDLINWIHLEPAIY?{red}PSKKFDKYGTWS
GSSTILPNNKPVIIYTGVVDSYNNQVQNYAIPANLSDPFLRKWIKPNNNPL!I
VPDNSINRTEFRDPTTA?{green}WMGQDGLWRILIASMRKHRGMALLYRSR
DFMKWIKAQ?{yellow}HPLHSSTN?{blue}TGNWECPDFFPVLFNSTNGL
DVSYR!GKNVKYVLKNSLDVARFD?{yellow}YYTIGMYHTKIDRYIPNNNS
IDGWKGL!RIDYGNFYASKTFYDPSRNRRVIWGWSNESDVLPDDEIKKGWAGI
QGIPRQVWLNLSGKQLLQWPIEELE?{blue}TLRKQKVQLNNKKLSKGEM!F
EVKGISASQADVEVLFSFSSLNEAEQFDPRWADLYAQDVCAIKG?{red}STI
QGGLGPFGLVTLASKNLEEYTPVFFRVFKAQKSYKILM?{blue}CSDARR!S
SMR?{red}QNEAMYKPSFAGYVDVDLEDMKKLSLRSLIDNSVVESFGAGGKT
CITSRVYPTLAIYDNAHLFVFNNGSETITIETLNAWSMDACKMN
\end{proteinseqlst}
the output is: