Keep the unique characters down
CJam, 266 281 456 bytes * 14 12 7 unique = 3724 3372 3192
Try it online.
14301201124204202034420112034224204431020210101232301021240204310431312122132100240400222324402030223103420431324402222132223233141443401210314023001122320404112224314302132421403301243334313000011124244441400003310332301330220022110121411122100040310110020040121444302100143202204330334033211334242120304123121024200421121232100303121022431044444423243331440434010014400~~10~~100~~1~43c~c~4~~41c~100~~1~43c~c~123~~100~~1~43c~c~44~~14~~43c~c100~~40c~c43c~~
Explanation
The strategy I've used is to treat each character in the string as a base-123 digit and encode that as a decimal number in the program. The program then converts that number back to base 123 and maps each base-123 digit back to a character. Because it's hard to explain why the program is in its current state, I'll explain each version of it.
Here's what the end of the program looked like in the first version:
...2068438725 123b:c
This implements the strategy in the most straightforward way possible. The number, encoded normally in base 10, is converted back to base 123 and each base-123 digit is mapped back to a character. But this uses 4 unique non-digit characters, and being able to get rid of any one of them would likely be worth the size hit due to having to use less straightforward code.
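As a rough illustration of the round trip, here is a minimal Ruby sketch of the same idea. The helper names encode_base123 and decode_base123 are made up for this example and do not appear in the CJam program; the sketch assumes every character code is below 123 and that the first character is not NUL.

```ruby
SENTENCE = 'Elizabeth obnoxiously quoted (just too rowdy for my peace): ' \
           '"THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG," giving me a look.'

# Treat each byte as one base-123 digit, most significant digit first.
def encode_base123(text)
  text.bytes.reduce(0) { |acc, byte| acc * 123 + byte }
end

# Inverse of the above. Integer#digits yields the least significant
# digit first, so the list is reversed before mapping to characters.
def decode_base123(number)
  number.digits(123).reverse.map(&:chr).join
end

number = encode_base123(SENTENCE)
puts number.to_s.length                 # size of the decimal literal
puts decode_base123(number) == SENTENCE
```

The decoding half corresponds to what 123b:c does in the first version: convert to base 123, then map each digit to a character.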
First, I realized that I could get rid of the b and the : operators by creating them at runtime as their ASCII character values converted back to a character (with the already present c operator) and evaluating them with the ~ operator. It turned out to be a little tricky to do this with the : operator, since it has to be parsed together with the following c operator. I solved this by producing the characters : and c and then producing and evaluating the character +, which concatenates the former two characters into the string :c, which can then be evaluated properly.
Second, I realized that the ~ operator I just introduced had a handy new overloaded variant: when given a number, it produces the bitwise complement. By using this twice in a row after a number, I could introduce a token break in the source with no resultant computational effect, allowing me to replace the spaces used to separate numbers with ~~.
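The two facts this step relies on can be checked outside CJam. The following Ruby sketch only models the arithmetic and the character codes; the names RUNTIME_OPERATORS, MAP_TOKEN and double_complement are invented for the example.

```ruby
# ASCII codes that the second version turns back into operators with c.
RUNTIME_OPERATORS = { 98 => "b", 58 => ":", 99 => "c", 43 => "+" }.freeze

RUNTIME_OPERATORS.each do |code, expected|
  puts "#{code}c -> #{code.chr.inspect} (matches: #{code.chr == expected})"
end

# 58c and 99c produce ":" and "c"; the evaluated "+" joins them into ":c".
MAP_TOKEN = 58.chr + 99.chr

# Bitwise complement sends n to -n - 1, so applying it twice changes nothing.
def double_complement(n)
  ~(~n)
end

puts MAP_TOKEN
puts double_complement(123)
```

Because the double complement is an identity on integers, writing ~~ after a number only serves to end the numeric token.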
The final result is 15 more bytes of code at the end, but this cost is greatly outweighed by the benefit of eliminating 2 unique characters out of 14. Here's a comparison of the end of the first version with the end of the second version:
...2068438725 123[ b ][ :c ]
...2068438725~~123~~98c~58c99c43c~~
Using any fewer than the 2 operators I was using would be impossible, but I still wanted fewer unique characters. So the next step was to eliminate digits. By changing the number's encoding so that each decimal digit was really a base-5 digit, I could potentially eliminate the digits 5-9. Before eliminating anything from the end of the program, it looked like this:
...4010014400 10b5b123b:c
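The 10b5b part can be read as "split the literal into its decimal digits, then interpret those digits in base 5". A minimal Ruby sketch of that re-encoding follows; to_base5_literal and decode_literal are hypothetical names, and the sample value is just the tail of the first version's number, not the full payload.

```ruby
# Write a number so that its source text only uses the digits 0-4.
def to_base5_literal(number)
  number.to_s(5)
end

# Mirror of 10b5b: take the decimal digits of the literal (most
# significant first) and fold them back together in base 5.
def decode_literal(literal_value)
  literal_value.digits(10).reverse.reduce(0) { |acc, digit| acc * 5 + digit }
end

sample  = 2068438725
literal = to_base5_literal(sample)

puts literal                                # contains only the digits 0-4
puts decode_literal(literal.to_i) == sample
```

The literal gets longer because each digit now carries less information, which is the size hit the answer accepts in exchange for dropping five digit characters.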
As mentioned before, eliminating the space is easy. But the b, :, and c would not be so easy, as their character codes are 98, 58, and 99, respectively. These all contained digits marked for elimination, so I had to find ways to derive them all. And the only useful numeric operators with character values not containing 5-9 were decrement, increment, multiply, and add.
For 98, I initially used 100~~40c~40c~, which decrements 100 twice. But then I realized I could make yet another use of the ~ operator, as bitwise complement lets me get negative numbers which, when added, let me emulate subtraction. So I then used 100~~1~43c~, which adds 100 and -2 and is 2 bytes smaller. For 58, I used 44~~14~~43c~, which adds 44 and 14. And for 99, I used 100~~40c~, which decrements 100.
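The arithmetic behind these derivations is easy to check. This Ruby sketch reproduces only the number juggling, not the CJam evaluation; the constant names are invented, and the LITERALS list was read off the final program tail quoted in this answer.

```ruby
B_CODE     = 100 + ~1   # ~1 is -2, so this is 98
COLON_CODE = 44 + 14    # 58
C_CODE     = 100.pred   # decrement, 99
FIVE       = 4.succ     # the base 5 itself is written as 4 incremented

# Every numeric literal that appears in the final tail of the program.
LITERALS = %w[10 100 1 43 4 41 123 44 14 40].freeze

puts [B_CODE, COLON_CODE, C_CODE].map(&:chr).join(" ")
puts LITERALS.none? { |literal| literal =~ /[5-9]/ }
```

The codes 40, 41 and 43 are the characters (, ) and +, which is why decrement, increment and add stay available after the digits 5-9 are gone.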
The final result is pretty big and obfuscated, but the cost of the significantly larger number and processing code was slightly outweighed by the big benefit of eliminating 5 unique characters out of 12. Here's a comparison of the end of the program before eliminations and after eliminations:
...4010014400 10[ b ][ 5 ][ b ]123[ b ][ :c ]
...4010014400~~10~~100~~1~43c~c~4~~41c~100~~1~43c~c~123~~100~~1~43c~c~44~~14~~43c~c100~~40c~c43c~~
Whitespace, 1157 937 bytes * 3 unique = 3471 2811
By popular(?) request, I'm posting my whitespace solution.
In order to reduce the code needed, I hardcoded the entire string as one binary number (7 bits for each byte). A simple loop extracts the characters and prints them.
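To make the packing concrete, here is a small Ruby model of the data layout and of the extraction loop. It is only a sketch: pack7 and unpack7 are names made up for this example, and it assumes 7-bit characters with no NUL byte in the text.

```ruby
SENTENCE = 'Elizabeth obnoxiously quoted (just too rowdy for my peace): ' \
           '"THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG," giving me a look.'

# Pack the text into one integer, 7 bits per character. The string is
# reversed first so that the first character ends up in the lowest 7 bits.
def pack7(text)
  text.reverse.bytes.reduce(0) { |acc, byte| (acc << 7) | byte }
end

# Same shape as the whitespace loop: take the value mod 128, emit it,
# divide by 128, and stop once nothing is left.
def unpack7(number)
  chars = []
  loop do
    chars << (number % 128).chr
    number /= 128
    break if number.zero?
  end
  chars.join
end

packed = pack7(SENTENCE)
puts packed.bit_length
puts unpack7(packed) == SENTENCE
```

Storing the text back to front is what lets the loop print characters in reading order while only ever looking at the low bits.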
Source code on filebin.ca.
NOTE: The specifications allow for arbitrary large integers, but the Haskell interpreter on the official page is limited to 20 bits. Use, for example, this ruby interpreter on github/hostilefork/whitespaces.
The ruby script to create the whitespace program (l=WHITESPACE, t=TAB, u=NEWLINE, everything after // ignored, writes to a file prog.h):
str = 'Elizabeth obnoxiously quoted (just too rowdy for my peace): "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG," giving me a look.'
def b2ws(bin)
  bin.chars.map{|x|x=='0' ? "l" : "t"}.join
end
def d2ws(dec,pad=0)
  b2ws(dec.to_s(2).rjust(pad,'0'))
end
bin = str.reverse.chars.map do |x|
  " " + d2ws(x.getbyte(0),7) + " // char <#{x=="\n" ? "LF" : x}>"
end
data = "lll // pushes string as one large number\n#{bin.join("\n")}\nu"
File.open('prog.h','w') do |fout|
  fout << data << "\n"
  fout << <<-eos
// output string loop
ulllu // label 0
// extract, print, and remove the last 7 bits of the data
lul // dup
llltlllllllu // push 128
tltt // mod
tull // print as char
llltlllllllu // push 128
tltl // div
// break loop if EOS
lul // dup
utltu // jump to 1 if top of stack is zero
ululu // jump to 0
ulltu // label 1
uuu
eos
end
For illustration, the whitespace program in human-readable form. See below for a simple script to convert it to an actual whitespace program.
lll // pushes string as one large number
ltltttl // char <.>
ttltltt // char <k>
ttltttt // char <o>
ttltttt // char <o>
ttlttll // char <l>
ltlllll // char < >
ttllllt // char <a>
ltlllll // char < >
ttlltlt // char <e>
ttlttlt // char <m>
ltlllll // char < >
ttllttt // char <g>
ttltttl // char <n>
ttltllt // char <i>
tttlttl // char <v>
ttltllt // char <i>
ttllttt // char <g>
ltlllll // char < >
ltllltl // char <">
ltlttll // char <,>
tlllttt // char <G>
tlltttt // char <O>
tllltll // char <D>
ltlllll // char < >
tlttllt // char <Y>
tlttltl // char <Z>
tlllllt // char <A>
tllttll // char <L>
ltlllll // char < >
tllltlt // char <E>
tlltlll // char <H>
tltltll // char <T>
ltlllll // char < >
tltlltl // char <R>
tllltlt // char <E>
tltlttl // char <V>
tlltttt // char <O>
ltlllll // char < >
tltlltt // char <S>
tltllll // char <P>
tllttlt // char <M>
tltltlt // char <U>
tlltltl // char <J>
ltlllll // char < >
tlttlll // char <X>
tlltttt // char <O>
tlllttl // char <F>
ltlllll // char < >
tlltttl // char <N>
tltlttt // char <W>
tlltttt // char <O>
tltlltl // char <R>
tlllltl // char <B>
ltlllll // char < >
tlltltt // char <K>
tlllltt // char <C>
tlltllt // char <I>
tltltlt // char <U>
tltlllt // char <Q>
ltlllll // char < >
tllltlt // char <E>
tlltlll // char <H>
tltltll // char <T>
ltllltl // char <">
ltlllll // char < >
ltttltl // char <:>
ltltllt // char <)>
ttlltlt // char <e>
ttllltt // char <c>
ttllllt // char <a>
ttlltlt // char <e>
tttllll // char <p>
ltlllll // char < >
ttttllt // char <y>
ttlttlt // char <m>
ltlllll // char < >
tttlltl // char <r>
ttltttt // char <o>
ttllttl // char <f>
ltlllll // char < >
ttttllt // char <y>
ttlltll // char <d>
tttlttt // char <w>
ttltttt // char <o>
tttlltl // char <r>
ltlllll // char < >
ttltttt // char <o>
ttltttt // char <o>
tttltll // char <t>
ltlllll // char < >
tttltll // char <t>
tttlltt // char <s>
tttltlt // char <u>
ttltltl // char <j>
ltltlll // char <(>
ltlllll // char < >
ttlltll // char <d>
ttlltlt // char <e>
tttltll // char <t>
ttltttt // char <o>
tttltlt // char <u>
tttlllt // char <q>
ltlllll // char < >
ttttllt // char <y>
ttlttll // char <l>
tttlltt // char <s>
tttltlt // char <u>
ttltttt // char <o>
ttltllt // char <i>
ttttlll // char <x>
ttltttt // char <o>
ttltttl // char <n>
ttllltl // char <b>
ttltttt // char <o>
ltlllll // char < >
ttltlll // char <h>
tttltll // char <t>
ttlltlt // char <e>
ttllltl // char <b>
ttllllt // char <a>
ttttltl // char <z>
ttltllt // char <i>
ttlttll // char <l>
tllltlt // char <E>
u
// output string loop
ulllu // label 0
// extract, print, and remove the last 7 bits of the data
lul // dup
llltlllllllu // push 128
tltt // mod
tull // print as char
llltlllllllu // push 128
tltl // div
// break loop if EOS
lul // dup
utltu // jump to 1 if top of stack is zero
ululu // jump to 0
ulltu // label 1
uuu
Basically, the string to output is a long integer, and you need to reduce its score.
Take a number x, and convert it to base b. Its length will be floor(log_b(x)+1), and it will contain b different symbols. So the score is b*floor(log_b(x)+1). x is a given large number, and if you plot this for b, you'll find the minimum is pretty much at b=3 (and b=2 is almost as good). I.e., the length reduces slightly as you use higher bases (log), but the size of the charset increases linearly, so it isn't worth it.
Thus I looked for a language with only 0/1's, but I didn't find any, and then I remembered there was whitespace and tried it. In whitespace, you can enter binary numbers with 0's and 1's directly.
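The score formula can be tabulated directly. The Ruby sketch below uses a stand-in value rather than the real payload: SAMPLE is simply some 868-bit number (124 characters times 7 bits), and the score function ignores the fixed overhead of the printing loop.

```ruby
# Score of writing x in a given base: alphabet size times digit count.
def score(x, base)
  base * x.digits(base).length
end

# Stand-in for the payload: an arbitrary 868-bit number.
SAMPLE = 2**867

(2..10).each do |base|
  puts format("base %2d: %d", base, score(SAMPLE, base))
end

puts (2..16).min_by { |base| score(SAMPLE, base) }
```

For this stand-in the minimum falls at base 3, with bases 2 and 4 tied just behind it, which matches the reasoning above.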
Old code, worse score but more interesting
Old code on filebin.
The ruby script I used for creating the program (l=WHITESPACE, t=TAB, u=NEWLINE, everything after // ignored, writes to a file prog.h):
str = 'Elizabeth obnoxiously quoted (just too rowdy for my peace): "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG," giving me a look.' + "\n"
def shift(x)
  (x-97)%128
end
EOS = "lttltl" #26
STACK = []
bin = str.reverse.chars.map do |x|
  byte = shift(x.getbyte(0))
  rf = STACK.index(byte)
  STACK.unshift(byte)
  y = byte.to_s(2).chars.map{|y|y=='0' ? 'l' : 't'}.join
  ph = "lll#{y}u" # pushing directly
  if rf
    bn = rf.to_s(2).chars.map{|y|y=='0' ? 'l' : 't'}.join
    cp = "ltll#{bn}u" # copying from stack
  end
  if STACK.size>0 && STACK[0]==STACK[1]
    "lul // dup #{x.inspect}"
  elsif cp && cp.size < ph.size
    "#{cp} // copy <#{x.inspect}> (from #{rf})"
  else
    "#{ph} // push <#{x.inspect}> (#{shift(x.getbyte(0))})"
  end
end
File.open('prog.h','w') do |fout|
  fout << "ll#{EOS}u // push EOS" << "\n"
  fout << bin.join("\n") << "\n"
  fout << <<-eos
// output string
ullu // label 0
// add 97 (128) before printing
lllttlllltu // push 97
tlll // add
llltlllllllu // push 128
tltt // mod
tull // print top of stack
// break loop if EOS
lul // dup
ll#{EOS}u // push EOS
tllt // subtract
utltu // jump to 1 if top of stack is zero
uluu // jump to 0
ulltu // label 1
uuu
eos
end
For illustration, the whitespace program in human-readable form. See below for a simple script to convert it to an actual whitespace program.
lllttltlu // push EOS
llltltlltu // push <"\n"> (41)
llltllttltu // push <"."> (77)
llltltlu // push <"k"> (10)
llltttlu // push <"o"> (14)
lul // dup "o"
llltlttu // push <"l"> (11)
lllttttttu // push <" "> (63)
llllu // push <"a"> (0)
ltlltu // copy <" "> (from 1)
llltllu // push <"e"> (4)
lllttllu // push <"m"> (12)
ltlltlu // copy <" "> (from 2)
lllttlu // push <"g"> (6)
lllttltu // push <"n"> (13)
llltlllu // push <"i"> (8)
llltltltu // push <"v"> (21)
ltlltu // copy <"i"> (from 1)
lllttlu // push <"g"> (6)
ltllttlu // copy <" "> (from 6)
llltllllltu // push <"\""> (65)
llltlltlttu // push <","> (75)
lllttllttlu // push <"G"> (102)
lllttltttlu // push <"O"> (110)
lllttlllttu // push <"D"> (99)
ltlltltu // copy <" "> (from 5)
lllttttlllu // push <"Y"> (120)
lllttttlltu // push <"Z"> (121)
lllttlllllu // push <"A"> (96)
lllttltlttu // push <"L"> (107)
ltlltllu // copy <" "> (from 4)
lllttlltllu // push <"E"> (100)
lllttlltttu // push <"H"> (103)
llltttllttu // push <"T"> (115)
ltllttu // copy <" "> (from 3)
llltttllltu // push <"R"> (113)
ltlltllu // copy <"E"> (from 4)
llltttltltu // push <"V"> (117)
ltlltttlu // copy <"O"> (from 14)
ltlltllu // copy <" "> (from 4)
llltttlltlu // push <"S"> (114)
lllttlttttu // push <"P"> (111)
lllttlttllu // push <"M"> (108)
llltttltllu // push <"U"> (116)
lllttltlltu // push <"J"> (105)
ltlltltu // copy <" "> (from 5)
llltttltttu // push <"X"> (119)
ltlltlllu // copy <"O"> (from 8)
lllttlltltu // push <"F"> (101)
ltllttu // copy <" "> (from 3)
lllttlttltu // push <"N"> (109)
llltttlttlu // push <"W"> (118)
ltlltllu // copy <"O"> (from 4)
ltlltllltu // copy <"R"> (from 17)
lllttlllltu // push <"B"> (97)
ltlltltu // copy <" "> (from 5)
lllttltltlu // push <"K"> (106)
lllttllltlu // push <"C"> (98)
lllttltlllu // push <"I"> (104)
ltllttttu // copy <"U"> (from 15)
llltttllllu // push <"Q"> (112)
ltlltltu // copy <" "> (from 5)
ltllttlltu // copy <"E"> (from 25)
ltllttttlu // copy <"H"> (from 30)
ltllttttlu // copy <"T"> (from 30)
llltllllltu // push <"\""> (65)
ltlltllu // copy <" "> (from 4)
llltlttlltu // push <":"> (89)
llltlltlllu // push <")"> (72)
llltllu // push <"e"> (4)
llltlu // push <"c"> (2)
llllu // push <"a"> (0)
llltllu // push <"e"> (4)
lllttttu // push <"p"> (15)
ltlltttu // copy <" "> (from 7)
lllttlllu // push <"y"> (24)
lllttllu // push <"m"> (12)
ltlltlu // copy <" "> (from 2)
llltllltu // push <"r"> (17)
llltttlu // push <"o"> (14)
llltltu // push <"f"> (5)
ltllttu // copy <" "> (from 3)
ltllttlu // copy <"y"> (from 6)
lllttu // push <"d"> (3)
llltlttlu // push <"w"> (22)
llltttlu // push <"o"> (14)
ltlltttu // copy <"r"> (from 7)
ltlltltu // copy <" "> (from 5)
ltlltlu // copy <"o"> (from 2)
lul // dup "o"
llltllttu // push <"t"> (19)
ltllttu // copy <" "> (from 3)
ltlltu // copy <"t"> (from 1)
llltlltlu // push <"s"> (18)
llltltllu // push <"u"> (20)
llltlltu // push <"j"> (9)
llltllltttu // push <"("> (71)
ltlltltu // copy <" "> (from 5)
lllttu // push <"d"> (3)
llltllu // push <"e"> (4)
ltlltttu // copy <"t"> (from 7)
llltttlu // push <"o"> (14)
ltlltttu // copy <"u"> (from 7)
llltllllu // push <"q"> (16)
ltllttlu // copy <" "> (from 6)
lllttlllu // push <"y"> (24)
llltlttu // push <"l"> (11)
llltlltlu // push <"s"> (18)
ltlltltu // copy <"u"> (from 5)
llltttlu // push <"o"> (14)
llltlllu // push <"i"> (8)
llltltttu // push <"x"> (23)
ltlltlu // copy <"o"> (from 2)
lllttltu // push <"n"> (13)
llltu // push <"b"> (1)
ltlltlu // copy <"o"> (from 2)
ltlltlttu // copy <" "> (from 11)
llltttu // push <"h"> (7)
llltllttu // push <"t"> (19)
llltllu // push <"e"> (4)
llltu // push <"b"> (1)
llllu // push <"a"> (0)
lllttlltu // push <"z"> (25)
llltlllu // push <"i"> (8)
llltlttu // push <"l"> (11)
lllttlltllu // push <"E"> (100)
// output string
ullu // label 0
// add 97 (128) before printing
lllttlllltu // push 97
tlll // add
llltlllllllu // push 128
tltt // mod
tull // print top of stack
// break loop if EOS
lul // dup
lllttltlu // push EOS
tllt // subtract
utltu // jump to 1 if top of stack is zero
uluu // jump to 0
ulltu // label 1
uuu
This whitespace program itself is rather simple, but there are three golfing optimizations:
- use lul to duplicate the top of the stack when there's a duplicate character
- use ltl to copy the n-th entry of the stack if it's shorter than pushing the char directly
- shift down all bytes by 97 (mod 128), which makes the binary numbers smaller
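The effect of the third optimization can be seen with a few lines of Ruby. shift is the function from the generator script above; restore is a hypothetical name for the inverse that the whitespace loop applies (add 97, then mod 128) before printing.

```ruby
# Same mapping as in the generator: lowercase letters land on 0..25.
def shift(byte)
  (byte - 97) % 128
end

# What the output loop does before printing a character.
def restore(value)
  (value + 97) % 128
end

["a", "o", "z", " ", "E", "\n"].each do |char|
  code    = char.ord
  shifted = shift(code)
  puts format("%-4s %3d (%d bits) -> %3d (%d bits)",
              char.inspect, code, code.to_s(2).length,
              shifted, shifted.to_s(2).length)
end
```

Lowercase letters dominate the sentence, so making them short matters more than the few extra bits that uppercase letters and punctuation pick up.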
A simple ruby script to convert my human-readable whitespace code to an actual whitespace program (reads a file prog.h and writes to a file prog.ws):
WHITESPACE = "l"
NEWLINE = "u"
TAB = "t"
File.open('prog.ws','w') do |fout|
  code = ""
  fin = File.read('prog.h')
  fin.each_line do |line|
    line.gsub!(/\/\/.*/,'')
    line.scan(/#{NEWLINE}|#{WHITESPACE}|#{TAB}/i) do |x|
      code << case x.downcase
      when NEWLINE.downcase
        "\n"
      when WHITESPACE.downcase
        " "
      when TAB.downcase
        "\t"
      else
        ""
      end
    end
  end
  fout << code
end
Ruby 144 Bytes * 39 Unique = 5616
puts"Elizabeth obnoxiously quoted (just too rowdy for my peace): \"#{'the quick brown fox jumps over the lazy dog,'.upcase}\" giving me a look."
Sometimes the simplest is the best.