Fully-justify & hyphenate a block of text
GNU sed -r
, 621 bytes
Takes input as two lines: The width as a unary number first and the string second.
I'm certain this could be golfed much more but I've already dumped way too much time into it.
x;N
G
s/\n/!@/
:
/@\n/bZ
s/-!(.*)@ /\1 !@/
s/!(.*[- ])(@.*1)$/\1!\2/
s/@(.)(.*)1$/\1@\2/
s/-!(.*-)(@.*)\n$/\1!\2\n1/
s/(\n!@) /\1/
s/-!(.* )(@.*)\n$/\1!\2\n1/
s/-!(.*-)(@.*1)$/\1!\21/
s/!(.*)-@([^ ]) /\1\2!@ /
t
s/ !@(.*)\n$/\n!@\1#/
s/!(.*-)@(.*)\n$/\1\n!@\2#/
s/!(.*)(@ | @)(.*)\n$/\1\n!@\3#/
s/-!(.*[^-])@([^ ]) (.*)\n$/\1\2\n!@\3#/
s/!(.+)@([^ ].*)\n$/\n!@\1\2#/
/#|!@.*\n$/{s/#|\n$//;G;b}
:Z
s/-?!|@.*//g
s/ \n/\n/g
s/^/%/
:B
G
/%.*\n.+\n/!bQ
:C
s/%([^\n])(.*)1$/\1%\2/
tC
s/([^\n]+)%\n/%\1\n/
:D
s/%([^ \n]* )(.*)1$/\1 %\2/
tD
s/(^|\n)([^\n]+)%(.*1)$/\1%\2\3/
tD
s/%([^\n]*)\n(.*)\n$/\1\n%\2/
tB
:Q
s/%(.*)\n1*$/\1/
Try it online!
Explanation
The program works in two phases: 1. Split, and 2. Justify. For the below, suppose our input is:
111111111111 I re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.
Setup
First we read the input, moving the first line (the width as a unary number) to the hold space (x
), then appending the next line (N
) and then a copy of the width from the hold space (G
) to the pattern space. Since N
left us with a leading \n
we replace it with !@
, which we'll use as cursors in Phase 1.
x;N
G
s/\n/!@/
Now the content of the hold space is 1111111111111
(and won't change henceforth) and the pattern space is (in the format of sed's "print unambiguously" l
command):
!@I re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n111111111111$
Phase 1
In Phase 1, the main @
cursor advances one character at a time, and for each character a 1
is removed from the "counter" at the end of the pattern space. In other words, @foo\n111$
, f@oo\n11$
, fo@o\n1$
, etc.
The !
cursor trails behind the @
cursor, marking places we could break if the counter reaches 0 in the middle of the line. A couple rounds would look like this:
!@I re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n111111111111$ !I@ re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n11111111111$ !I @re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n1111111111$
Here there's a pattern we recognize: a space immediately followed by the @
cursor. Since the counter is greater than 0, we advance the break marker, then keep advancing the main cursor:
I !@re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n1111111111$ I !r@e-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n111111111$ I !re@-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n11111111$ I !re-@mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n1111111$
Here's another pattern: -@
, and we still have 7 in the counter, so we advance the break cursor again and keep advancing:
I re-!mem-@ber a time of cha-os, ru-ined dreams, this was-ted land.\n111$
Here's a different pattern: A hyphen immediately preceding the break cursor and another preceding the main cursor. We remove the first hyphen, advance the break cursor, and, since we removed a character, add 1 to the counter.
I remem-!@ber a time of cha-os, ru-ined dreams, this was-ted land.\n1111$
We keep advancing the main cursor:
I remem-!ber@ a time of cha-os, ru-ined dreams, this was-ted land.\n1$
Similar to before, but this time the main cursor precedes a space rather than following a hyphen. We remove the hyphen, but since we're also advancing the main cursor we neither increment no decrement the counter.
I remember !@a time of cha-os, ru-ined dreams, this was-ted land.\n1$ I remember !a@ time of cha-os, ru-ined dreams, this was-ted land.\n$
Finally, our counter has reached zero. Since the character after the main cursor is a space, we insert a newline and put both cursors immediately after it. Then we replenish the counter (G
) and start again.
I remember a\n!@ time of cha-os, ru-ined dreams, this was-ted land.\n111111111111$
Phase 1 continues, advancing the cursors and matching various patterns, until the @
cursor reaches the end of the string.
# Phase 1
:
# End of string; branch to :Z (end of phase 1)
/@\n/bZ
# Match -!.*@_
s/-!(.*)@ /\1 !@/
# Match [-_]@ and >0
s/!(.*[- ])(@.*1)$/\1!\2/
# Advance cursor
s/@(.)(.*)1$/\1@\2/
# Match -!.*-@ and 0; add 1
s/-!(.*-)(@.*)\n$/\1!\2\n1/
# Match \n!@_
s/(\n!@) /\1/
# Match -!.*_@ and 0; add 1
s/-!(.* )(@.*)\n$/\1!\2\n1/
# Match -!.*-@ and >0; add 1
s/-!(.*-)(@.*1)$/\1!\21/
# Match -@[^_]_
s/!(.*)-@([^ ]) /\1\2!@ /
# If there were any matches, branch to `:`
t
# Match _!@ and 0
s/ !@(.*)\n$/\n!@\1#/
# Match -@ and 0
s/!(.*-)@(.*)\n$/\1\n!@\2#/
# Match @_|_@ and 0
s/!(.*)(@ | @)(.*)\n$/\1\n!@\3#/
# Match -!.*[^-]@[^_]_ and 0
s/-!(.*[^-])@([^ ]) (.*)\n$/\1\2\n!@\3#/
# Match !.+@[^_] and 0
s/!(.+)@([^ ].*)\n$/\n!@\1\2#/
# Match marked line (#) or !@ and 0
/#|!@.*\n$/{
# Remove mark; append width and branch to `:`
s/#|\n$//
G
b
}
:Z
# Cleanup
s/-?!|@.*//g
s/ \n/\n/g
At the end of Phase 1, our pattern space looks like this:
I remember a\ntime of cha-\nos, ruined\ndreams, this\nwasted land.
Or:
I remember a time of cha- os, ruined dreams, this wasted land.
Phase 2
In Phase 2 we use %
as a cursor and use the counter in a similar way, beginning like this:
%I remember a\ntime of cha-\nos, ruined\ndreams, this\nwasted land.\n111111111111$
First, we count the characters on the first line by advancing the cursor and removing 1s from the counter, after which we have;
I remember a%\ntime of cha-\nos, ruined\ndreams, this\nwasted land.\n$
Since the counter is 0, we don't do anything else on this line. The second line also has the same number of characters as the counter, so let's skip to the third line:
I remember a\ntime of cha-\nos, ruined%\ndreams, this\nwasted land.\n11$
The counter is greater than 0, so we move the cursor back to the beginning of the line. Then we find the first run of spaces and add a space, decrementing the counter.
I remember a\ntime of cha-\nos, % ruined\ndreams, this\nwasted land.\n1$
The counter is greater than 0; since the cursor is already in the last (only) run of spaces on the line, we move it back to the beginning of the line and do it again:
I remember a\ntime of cha-\nos, % ruined\ndreams, this\nwasted land.\n$
Now the counter is 0, so we move the cursor to the beginning of the next line. We repeat this for every line except the last. That's the end of phase 2 and the end of the program! The final result is:
I remember a time of cha- os, ruined dreams, this wasted land.
# Phase 2
# Insert cursor
s/^/%/
:B
# Append counter from hold space
G
# This is the last line; branch to :Q (end of phase 1)
/%.*\n.+\n/!bQ
:C
# Count characters
s/%([^\n])(.*)1$/\1%\2/
tC
# Move cursor to beginning of line
s/([^\n]+)%\n/%\1\n/
:D
# Add one to each space on the line as long as counter is >0
s/%([^ \n]* )(.*)1$/\1 %\2/
tD
# Counter is still >0; go back to beginning of line
s/(^|\n)([^\n]+)%(.*1)$/\1%\2\3/
tD
# Counter is 0; move cursor to next line and branch to :B
s/%([^\n]*)\n(.*)\n$/\1\n%\2/
tB
:Q
# Remove cursor, any remaining 1s
s/%(.*)\n1*$/\1/
JavaScript (ES6), 218 bytes
w=>s=>s.map((c,i)=>c.map((p,j)=>(k+p)[l="length"]-w-(b=!i|j>0)+(j<c[l]-1)<0?k+=b?p:" "+p:(Array(w-k[l]-b).fill(h=k.split` `).map((_,i)=>h[i%(h[l]-1)]+=" "),o.push(h.join` `+(b?"-":"")),k=p)),o=[],k="")&&o.join`
`+`
`+k
Takes arguments in currying syntax (f(width)(text)
) and the text input is in the double array format described in the challenge. Strings are converted to that format via .split` `.map(a=>a.split`-`))
. Also, the newlines are literal newlines inside template strings.
Un-golfed and rearranged
width=>string=> {
out=[];
line="";
string.map((word,i)=> {
word.map((part,j)=> {
noSpaceBefore = i==0 || j>0;
if ((line+part).length - width - noSpaceBefore + (j<word.length-1) < 0) {
line += noSpaceBefore ? part : " "+part;
}
else {
words=line.split` `;
Array(width - line.length - noSpaceBefore).fill()
.map((_,i) => words[i % (words.length-1)] += " ");
out.push(words.join(" ") + (noSpaceBefore? "-" : ""));
line=part;
}
});
});
return out.join("\n") + "\n"+line
}
The idea here was to step through each part of the entire string and build up each line one part at a time. Once a line was complete, it increases the word spacing from left to right until all extra spaces are placed.
Test Snippet
f=
w=>s=>s.map((c,i)=>c.map((p,j)=>(k+p)[l="length"]-w-(b=!i|j>0)+(j<c[l]-1)<0?k+=b?p:" "+p:(Array(w-k[l]-b).fill(h=k.split` `).map((_,i)=>h[i%(h[l]-1)]+=" "),o.push(h.join` `+(b?"-":"")),k=p)),o=[],k="")&&o.join`
`+`
`+k
<style>*{font-family:Consolas,monospace;}</style>
<div oninput="O.innerHTML=f(+W.value)(S.value.split` `.map(a=>a.split`-`))">
Width: <input type="number" size="3" min="5" max="100" id="W">
Tests: <select id="T" style="width:20em" oninput="let x=T.value.indexOf(','),s=T.value;W.value=s.slice(0,x);S.value=s.slice(x+2)"><option></option><option>20, The q-uick brown fox ju-mp-s ove-r t-h-e lazy dog.</option><option>32, Given a width and a block of text cont-ain-ing pos-sible hyphen-ation points, for-mat it ful-ly-just-ified (in mono-space).</option><option>80, Pro-gram-ming Puz-zles & Code Golf is a ques-tion and ans-wer site for pro-gram-ming puz-zle enth-usi-asts and code golf-ers. It's built and run by you as part of the St-ack Exch-ange net-work of Q&A sites. With your help, we're work-ing to-g-et-her to build a lib-rary of pro-gram-ming puz-zles and their sol-ut-ions.</option><option>20, Pro-gram-ming Puz-zles & Code Golf is a ques-tion and ans-wer site for pro-gram-ming puz-zle enth-usi-asts and code golf-ers. It's built and run by you as part of the St-ack Exch-ange net-work of Q&A sites. With your help, we're work-ing to-g-et-her to build a lib-rary of pro-gram-ming puz-zles and their sol-ut-ions.</option><option>5, a b c d e f g h i j k l mm nn oo p-p qq rr ss t u vv ww x yy z</option><option>10, It's the bl-ack be-ast of Araghhhhh-hhh-h-hhh-h-h-h-hh</option></select><br>
Text: <textarea id="S" cols="55" rows="4"></textarea>
</div>
<pre id="O" style="border: 1px solid black;display:inline-block;"></pre>
JavaScript (ES6), 147 bytes
Takes input as (width)(text)
.
w=>F=(s,p=S=' ')=>(g=([c,...b],o='',h=c=='-')=>c?o[w-1]?c==S&&o+`
`+F(b):o[w+~h]?o+c+`
`+F(b):c>S?g(b,h?o:o+c):g(b,o+p)||g(b,o+p+c):o)(s)||F(s,p+S)
Try it online!
Commented
w => // w = requested width
F = ( // F is a recursive function taking:
s, // s = either the input string (first iteration) or an
// array of remaining characters (next iterations)
p = // p = current space padding
S = ' ' // S = space character
) => ( //
g = ( // g is a recursive function taking:
[c, // c = next character
...b], // b[] = array of remaining characters
o = '', // o = output for the current line
h = c == '-' // h = flag set if c is a hyphen
) => //
c ? // if c is defined:
o[w - 1] ? // if the line is full:
c == S && // fail if c is not a space
o + `\n` + F(b) // otherwise, append o + a linefeed and process the
// next line
: // else:
o[w + ~h] ? // if this is the last character and c is a hyphen:
o + c + `\n` + F(b) // append o + c + a linefeed and process the next
// line
: // else, we process the next character:
c > S ? // if c is not a space:
g(b, h ? o : o + c) // append c if it's not a hyphen
: // else:
g(b, o + p) || // append either the current space padding
g(b, o + p + c) // or the current padding and one extra space
: // else:
o // success: return o
)(s) // initial call to g() with s
|| F(s, p + S) // in case of failure, try again with a larger padding