Implementing Text Justification with Dynamic Programming
i just saw the lecture and thought would put here whatever I could understand. I have put in the code in the similar format as that of the questioner. I have used recursion here, as the lecture had explained.
Point #3, defines recurrence. This is basically a bottom to approach, where in you calculate a value of the function pertaining to a higher input earlier, and then use it to calculate the for the lower valued input.
The lecture explains it as :
DP(i) = min(DP(j) + badness(i, j))
for j which varies from i+1 to n.
Here, i varies from n to 0 (bottom to top!).
As DP(n) = 0 ,
DP(n-1) = DP(n) + badness(n-1, n)
and then you calculate D(n-2) from D(n-1) and D(n) and take a minimum out of them.
This way you can go down till i=0 and that's the final answer of badness!
In point #4, as you can see, there are two loops going on here. One for i and the other inside i for j.
Hence, when i=0, j(max) = n, i = 1, j(max) = n-1, ... i = n , j(max) = 0.
hence total time = addition of these = n(n+1)/2.
Hence O(n^2).
Point #5 just identifies the solution which DP[0]!
Hope this helps!
import math
justification_map = {}
min_map = {}
def total_length(str_arr):
total = 0
for string in str_arr:
total = total + len(string)
total = total + len(str_arr) - 1 # spaces
return total
def badness(str_arr, page_width):
line_len = total_length(str_arr)
if line_len > page_width:
return float('nan')
else:
return math.pow(page_width - line_len, 3)
def justify(i, n, words, page_width):
if i == n:
return 0
ans = []
for j in range(i+1, n+1):
#ans.append(justify(j, n, words, page_width)+ badness(words[i:j], page_width))
ans.append(justification_map[j]+ badness(words[i:j], page_width))
min_map[i] = ans.index(min(ans)) + 1
return min(ans)
def main():
print "Enter page width"
page_width = input()
print "Enter text"
paragraph = input()
words = paragraph.split(' ')
n = len(words)
#justification_map[n] = 0
for i in reversed(range(n+1)):
justification_map[i] = justify(i, n, words, page_width)
print "Minimum badness achieved: ", justification_map[0]
key = 0
while(key <n):
key = key + min_map[key]
print key
if __name__ == '__main__':
main()
For anyone else still interested in this: The key is to move backwards from the end of the text (as mentioned here). If you do so, you just compare already memorized elements.
Say, words
is a list of strings to be wrapped according to textwidth
. Then, in the notation of the lecture, the task reduces to three lines of code:
import numpy as np
textwidth = 80
DP = [0]*(len(words)+1)
for i in range(len(words)-1,-1,-1):
DP[i] = np.min([DP[j] + badness(words[i:j],textwidth) for j in range(i+1,len(words)+1)])
With:
def badness(line,textwidth):
# Number of gaps
length_line = len(line) - 1
for word in line:
length_line += len(word)
if length_line > textwidth: return float('inf')
return ( textwidth - length_line )**3
He mentions that one can add a second list to keep track of the breaking positions. You can do so by altering to code to:
DP = [0]*(len(words)+1)
breaks = [0]*(len(words)+1)
for i in range(len(words)-1,-1,-1):
temp = [DP[j] + badness(words[i:j],args.textwidth) for j in range(i+1,len(words)+1)]
index = np.argmin(temp)
# Index plus position in upper list
breaks[i] = index + i + 1
DP[i] = temp[index]
To recover the text, just use the list of breaking positions:
def reconstruct_text(words,breaks):
lines = []
linebreaks = []
i = 0
while True:
linebreaks.append(breaks[i])
i = breaks[i]
if i == len(words):
linebreaks.append(0)
break
for i in range( len(linebreaks) ):
lines.append( ' '.join( words[ linebreaks[i-1] : linebreaks[i] ] ).strip() )
return lines
Result: (text = reconstruct_text(words,breaks)
)
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit
amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam
nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed
diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
One might be tempted to add some whitespaces. This is pretty tricky (since one might come up with various aesthetic rules) but a naive try might be:
import re
def spacing(text,textwidth,maxspace=4):
for i in range(len(text)):
length_line = len(text[i])
if length_line < textwidth:
status_length = length_line
whitespaces_remain = textwidth - status_length
Nwhitespaces = text[i].count(' ')
# If whitespaces (to add) per whitespace exeeds
# maxspace, don't do anything.
if whitespaces_remain/Nwhitespaces > maxspace-1:pass
else:
text[i] = text[i].replace(' ',' '*( 1 + int(whitespaces_remain/Nwhitespaces)) )
status_length = len(text[i])
# Periods have highest priority for whitespace insertion
periods = text[i].split('.')
# Can we add a whitespace behind each period?
if len(periods) - 1 + status_length <= textwidth:
text[i] = '. '.join(periods).strip()
status_length = len(text[i])
whitespaces_remain = textwidth - status_length
Nwords = len(text[i].split())
Ngaps = Nwords - 1
if whitespaces_remain != 0:factor = Ngaps / whitespaces_remain
# List of whitespaces in line i
gaps = re.findall('\s+', text[i])
temp = text[i].split()
for k in range(Ngaps):
temp[k] = ''.join([temp[k],gaps[k]])
for j in range(whitespaces_remain):
if status_length >= textwidth:pass
else:
replace = temp[int(factor*j)]
replace = ''.join([replace, " "])
temp[int(factor*j)] = replace
text[i] = ''.join(temp)
return text
What gives you: (text = spacing(text,textwidth)
)
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit
amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam
nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed
diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
In case you have trouble understanding the core idea of dynamic programming itself here is my take on it:
Dynamic programming is essentially sacrificing space complexity for time complexity (but the extra space you use is usually very little compared to the time you save, making dynamic programming totally worth it if implemented correctly). You store the values from each recursive call as you go (e.g. in an array or a dictionary) so you can avoid computing for the second time when you run into the same recursive call in another branch of the recursion tree.
And no you do not have to use recursion. Here is my implementation of the question you were working on using just loops. I followed the TextAlignment.pdf linked by AlexSilva very closely. Hopefully you find this helpful.
def length(wordLengths, i, j):
return sum(wordLengths[i- 1:j]) + j - i + 1
def breakLine(text, L):
# wl = lengths of words
wl = [len(word) for word in text.split()]
# n = number of words in the text
n = len(wl)
# total badness of a text l1 ... li
m = dict()
# initialization
m[0] = 0
# auxiliary array
s = dict()
# the actual algorithm
for i in range(1, n + 1):
sums = dict()
k = i
while (length(wl, k, i) <= L and k > 0):
sums[(L - length(wl, k, i))**3 + m[k - 1]] = k
k -= 1
m[i] = min(sums)
s[i] = sums[min(sums)]
# actually do the splitting by working backwords
line = 1
while n > 1:
print("line " + str(line) + ": " + str(s[n]) + "->" + str(n))
n = s[n] - 1
line += 1