Generating PBM bitmap files from ASCII text
Perl, 164 bytes, no zlib/gzip compression
After sleeping on the problem, I managed to figure out a much shorter solution than my first one. The trick is to take advantage of a minor loophole in the rules: the characters need to fit in 8 by 8 pixels each, but nothing says they have to fill all that space. So I drew my own 4 by 5 pixel font, allowing me to pack two characters into 5 bytes.
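To illustrate the packing, here is a small Python sketch (not part of the golfed answer) that packs the five pixel rows of two 4-by-5 glyphs into five bytes. The A and B shapes are read back from the packed font in the hex dump further down. row_bits and pack_pair are made-up helper names. The nibble order (even character code in the low nibble) is inferred from how Perl's vec numbers 4-bit elements.

```python
# Sketch: pack two 4x5 glyphs into 5 bytes, one byte per pixel row.
# Glyph shapes are decoded from the packed font in the hex dump;
# row_bits and pack_pair are hypothetical helper names.

GLYPH_A = [".##.", "#..#", "####", "#..#", "#..#"]
GLYPH_B = ["###.", "#..#", "###.", "#..#", "###."]


def row_bits(row: str) -> int:
    """Turn a 4-pixel row like '.##.' into a 4-bit number (leftmost pixel = MSB)."""
    return int(row.replace("#", "1").replace(".", "0"), 2)


def pack_pair(even_glyph: list[str], odd_glyph: list[str]) -> bytes:
    """Even character code goes in the low nibble, odd code in the high nibble
    (the order Perl's vec uses for 4-bit elements)."""
    return bytes(
        row_bits(lo) | (row_bits(hi) << 4)
        for lo, hi in zip(even_glyph, odd_glyph)
    )


packed = pack_pair(GLYPH_A, GLYPH_B)  # A has code 0, B has code 1
print(packed.hex())  # e699ef99e9
```

Each of the five resulting bytes (e6, 99, ef, 99, e9) is the first byte of one of the five non-blank font rows in the Perl hex dump.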
The output looks like this:
[Image: sample output, scaled ×4]
[Image: sample output, original size]
Before giving the actual code with the embedded font data, let me show a de-golfed version:
y/A-Z!./\0-\033/ for @a = <> =~ /./g;
say "P4 " . 8*@a . " 8";
for $p (qw'PACKED FONT DATA') {
print chr vec $p, ord, 4 for @a;
}
In the actual code, the PACKED FONT DATA is replaced by a binary string consisting of eight whitespace-delimited rows (four 14-byte rows and one 13-byte row, plus three single null bytes for the blank rows). I deliberately designed my font so that the packed data contains no whitespace, single quotes or backslashes, so that it could be encoded in qw'...'.
Since the packed font string contains unprintable characters, I've provided the actual script as a hex dump. Use xxd -r to turn it back into executable Perl code:
0000000: 792f 412d 5a21 2e2f 002d 1b2f 666f 7240 y/A-Z!./.-./for@
0000010: 613d 3c3e 3d7e 2f2e 2f67 3b73 6179 2250 a=<>=~/./g;say"P
0000020: 3420 222e 382a 4061 2e22 2038 223b 666f 4 ".8*@a." 8";fo
0000030: 7224 7028 7177 2700 20e6 e6ff 9612 8999 r$p(qw'. .......
0000040: e6e6 7759 99f5 0420 9999 8898 128a df99 ..wY... ........
0000050: 9928 5999 1504 20ef 98ee fb12 8cb9 e9e9 .(Y... .........
0000060: 2659 6965 0420 9999 8899 928a 9989 ab21 &Yie. .........!
0000070: 599f 8220 e9e6 8f96 62f9 9986 972e 2699 Y.. ....b.....&.
0000080: f284 2000 2000 2729 7b70 7269 6e74 2063 .. . .'){print c
0000090: 6872 2076 6563 2470 2c6f 7264 2c34 666f hr vec$p,ord,4fo
00000a0: 7240 617d r@a}
Here's how it works:
- The first line (in the de-golfed version) reads a single line of input, splits it into an array of characters (conveniently omitting any trailing newlines) and maps the letters A to Z and the characters ! and . to the character codes 0 to 27, which normally correspond to unprintable control characters in ASCII / Unicode. (A minor side effect of this is that any tabs in the input get printed as Js.) The space character is left unmapped, since the output loop turns any codes above 27 into blanks anyway.
- The second line just prints the PBM header. It uses the Perl 5.10 say feature, so you need to run this script with perl -M5.010 for it to work.
- The output loop takes a whitespace-delimited list of packed image rows and assigns each of them to $p in turn. (I designed the font so that the packed data wouldn't contain any whitespace or ' characters.) It then loops over the input characters in @a, using Perl's vec function to extract the 4-bit nibble corresponding to the mapped character code from the image row, pads it to an 8-bit byte and prints it.
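The same logic can be sketched in Python to make the nibble addressing explicit. This is an illustrative emulation, not the submitted code: FONT_ROWS copies the eight qw'...' words from the hex dump, while vec4 and render are hypothetical names. vec4 follows Perl's documented behavior for 4-bit vec elements, where element 0 is the low nibble of the first byte and reads past the end of the string return 0.

```python
# Sketch: a Python emulation of the de-golfed Perl solution.
# FONT_ROWS holds the eight qw'...' words copied from the hex dump;
# vec4 and render are hypothetical names for this illustration.

FONT_ROWS = [
    b"\x00",
    bytes.fromhex("e6e6ff96128999e6e6775999f504"),
    bytes.fromhex("99998898128adf99992859991504"),
    bytes.fromhex("ef98eefb128cb9e9e92659696504"),
    bytes.fromhex("99998899928a9989ab21599f82"),
    bytes.fromhex("e9e68f9662f99986972e2699f284"),
    b"\x00",
    b"\x00",
]

# y/A-Z!./\0-\033/ maps these 28 characters to the codes 0..27
MAPPED = "ABCDEFGHIJKLMNOPQRSTUVWXYZ!."


def vec4(data: bytes, offset: int) -> int:
    """Like Perl's vec($data, $offset, 4): element 0 is the low nibble of byte 0."""
    byte_index, use_high = divmod(offset, 2)
    if byte_index >= len(data):
        return 0  # reading past the end of the string yields 0, i.e. a blank
    byte = data[byte_index]
    return byte >> 4 if use_high else byte & 0x0F


def render(line: str) -> bytes:
    # /./g skips newlines; unmapped characters keep their own code
    codes = [MAPPED.index(c) if c in MAPPED else ord(c) for c in line if c != "\n"]
    out = bytearray(f"P4 {8 * len(codes)} 8\n".encode())
    for row in FONT_ROWS:
        # chr(nibble): the glyph row lands in the low four bits of each byte
        out.extend(vec4(row, code) for code in codes)
    return bytes(out)
```

Writing the bytes returned by render("HELLO WORLD") to a .pbm file should give an 8-pixel-tall PBM image; a tab renders exactly like J, since both end up as code 9.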
Old answer, 268 bytes:
This is a quick and dirty first attempt. I stole PleaseStand's font and compressed it along with my source code. Since the resulting script is mostly unprintable, here's a hex dump; use xxd -r to turn it into executable Perl code:
0000000: 7573 6520 436f 6d70 7265 7373 275a 6c69 use Compress'Zli
0000010: 623b 6576 616c 2075 6e63 6f6d 7072 6573 b;eval uncompres
0000020: 7320 2778 da85 d03d 4b03 4118 85d1 452c s 'x...=K.A...E,
0000030: b69c 72cb 7519 4894 552c 2c02 3319 ee5c ..r.u.H.U,,.3..\
0000040: d7b8 5a89 6093 4634 7e82 c490 6c91 8597 ..Z.`.F4~...l...
0000050: 80fe 7267 d660 23ae e52d 0e0f dcd6 f8c3 ..rg.`#..-......
0000060: e9d1 5e6e ccec a15c ddb5 c5d5 495e 94a3 ..^n...\....I^..
0000070: 83b7 c7f9 73f3 5216 f9a8 787a 5fea 666c ....s.R...xz_.fl
0000080: 9dd1 b763 dd98 76f8 2df6 0799 5811 7144 ...c..v.-...X.qD
0000090: 4acc ee9d b8b0 c90f 7e4a 8264 6016 cbd7 J.......~J.d`...
00000a0: 79f3 1b91 047c 4055 409e 9e54 1dda ed41 y....|@U@..T...A
00000b0: 9a20 8080 6adc 5c47 8488 7495 f621 01d7 . ..j.\G..t..!..
00000c0: 6b6c 902e b6c8 2a6a 6643 f56f e99c 115d kl....*jfC.o...]
00000d0: 5c7a f1b2 13d0 3453 790f da74 c813 751d \z....4Sy..t..u.
00000e0: 11ce d821 ad90 247f 2292 5b54 c14f 3c4e ...!..$.".[T.O<N
00000f0: 49c5 4c53 a1a7 c478 391c 714c f113 0747 I.LS...x9.qL...G
0000100: ab6c 4482 9fd2 177a 5677 6327 .lD....zVwc'
The decompressed Perl code consists of the following preamble:
y;A-Z.! ;;cd,say"P4 ",8*length," 8"for$t=<>
followed by eight repetitions of the following code:
;$_=$t;y(A-Z.! )'BITMAP DATA HERE';print
with BITMAP DATA HERE replaced with 29 bytes encoding one row of the font.
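The idea behind this version is that a single transliteration per row maps every character straight to its eight-pixel byte. A rough Python equivalent uses bytes.translate with a table built by bytes.maketrans. The real 29-byte rows only exist inside the compressed data, so toy_rows below is placeholder data, and render is a hypothetical name.

```python
# Sketch: the transliteration trick from the old answer, in Python.
# The real 29-byte rows live in the compressed data; toy_rows is made-up
# placeholder data, and render is a hypothetical name.

CHARSET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ.! "  # same order as y(A-Z.! )...


def render(text: bytes, rows: list[bytes]) -> bytes:
    # y;A-Z.! ;;cd : delete every byte that is not in the character set
    kept = bytes(b for b in text if b in CHARSET)
    out = b"P4 %d 8\n" % (8 * len(kept))
    for row in rows:  # eight rows, each 29 bytes: one pixel byte per character
        # $_=$t; y(A-Z.! )'...' : each character becomes its pixel byte for this row
        out += kept.translate(bytes.maketrans(CHARSET, row))
    return out


# Placeholder font: in row r, character number i maps to byte i + r.
toy_rows = [bytes(i + r for i in range(len(CHARSET))) for r in range(8)]
print(render(b"HI!\n", toy_rows))
```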
GolfScript, 133 bytes
This is based on my 164-byte Perl solution and uses the same nibble-packed 4 by 5 pixel font. Again, I'll give the readable version first:
{91,65>"!. "+?}%:s"P4"\,8*8'FONT DATA HERE'{16base}%[3]/{:p;{[p=0]0=}s%}%]n*
Here, FONT DATA HERE stands for 71 bytes of binary packed font data. The encoding is slightly different than in the Perl version: instead of splitting the packed string on whitespace, I expand it first and then split it on the nibble 3 (chosen because it just happens not to occur anywhere in the font).
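A Python sketch of that decoding step, using the 71 bytes of font data copied from the hex dump below (offsets 0x1c to 0x62). unpack_rows is a hypothetical helper. It assumes 16base yields two hex digits per byte, which holds here because every byte in the data is at least 0x10, and that the split keeps empty pieces, which the blank rows require.

```python
# Sketch: decode the GolfScript font data the way {16base}%[3]/ does.
# FONT_DATA is the 71-byte string copied from the hex dump;
# unpack_rows is a hypothetical helper name.

FONT_DATA = bytes.fromhex(
    "36e6eff6"
    "92198996e6e7795995f439999888921a"
    "8fd99998295995143fe89eebf21c89b9"
    "e9e62959656439999889929a89998ba1"
    "295f92839e6ef869269f996879e26299"
    "2f4833"
)


def unpack_rows(data: bytes) -> list[list[int]]:
    # 16base: every byte becomes its two hex digits, high digit first
    # (fine here because no byte in the data is below 0x10)
    nibbles = [d for byte in data for d in (byte >> 4, byte & 0x0F)]
    # [3]/ : split on the nibble 3, keeping empty pieces (the blank rows)
    rows, current = [], []
    for n in nibbles:
        if n == 3:
            rows.append(current)
            current = []
        else:
            current.append(n)
    rows.append(current)
    return rows


rows = unpack_rows(FONT_DATA)
print(len(rows), [len(r) for r in rows])
```

This prints 8 [0, 27, 27, 27, 26, 28, 0, 0]: eight rows, with the top row and the bottom two empty. No row reaches index 28 (space), so every lookup past the end of a row has to come out as 0.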
Since the font data in the actual script contains unprintable characters, I give it as a hex dump below. Use xxd -r to turn the hex dump back into executable GolfScript code:
0000000: 7b39 312c 3635 3e22 212e 2022 2b3f 7d25 {91,65>"!. "+?}%
0000010: 3a73 2250 3422 5c2c 382a 3827 36e6 eff6 :s"P4"\,8*8'6...
0000020: 9219 8996 e6e7 7959 95f4 3999 9888 921a ......yY..9.....
0000030: 8fd9 9998 2959 9514 3fe8 9eeb f21c 89b9 ....)Y..?.......
0000040: e9e6 2959 6564 3999 9889 929a 8999 8ba1 ..)Yed9.........
0000050: 295f 9283 9e6e f869 269f 9968 79e2 6299 )_...n.i&..hy.b.
0000060: 2f48 3327 7b31 3662 6173 657d 255b 335d /H3'{16base}%[3]
0000070: 2f7b 3a70 3b7b 5b70 3d30 5d30 3d7d 7325 /{:p;{[p=0]0=}s%
0000080: 7d25 5d6e 2a }%]n*
Unlike the Perl script, this code prints any characters outside the set A–Z, !, . and space as funny-looking little squiggles. Replacing the squiggles with blanks would cost 2 extra chars; removing them entirely would cost 4.
This is my first GolfScript program ever, so I wouldn't be surprised if there's some room left for optimization. Here's how it works:
- {91,65>"!. "+?}%:s maps the valid input characters (A–Z, !, . and space) to the numbers 0–28 and assigns the result to s. Any characters outside the valid set get mapped to -1, which is what produces the squiggles when printed.
- "P4"\,8*8 pushes the values "P4", 8 times the length of the input, and 8 onto the stack. When printed at the end, these will form the PBM header.
- {16base}%[3]/ takes the preceding string of font data, splits each byte of it into two nibbles, and splits the result into blocks delimited by the value 3.
- {:p;{[p=0]0=}s%}% then loops over these blocks, first assigning each block to the variable p and then looping over the remapped input string s, replacing each character with the value at the corresponding offset in p. The funny-looking construct [p=0]0= does the same as p=, except that it returns 0 for any offsets past the end of p; I don't really like it, but I haven't been able to figure out any shorter way to handle that.
- Finally, ]n* takes everything on the stack (the three header values and the image data array) and joins them together with newlines for printing.
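The lookup step can be sketched in Python as follows. VALID mirrors the list built by 91,65>"!. "+, and lookup mimics [p=0]0=. Treating -1 as "last element" assumes GolfScript's = indexes arrays Ruby-style, which fits the squiggles described above. The toy rows are placeholder data rather than the real font, and all names are hypothetical.

```python
# Sketch: the per-character lookup of the GolfScript program, in Python.
# VALID mirrors 91,65>"!. "+ ; lookup mimics [p=0]0= ; toy_rows is made-up
# placeholder data, not the real font.

VALID = "ABCDEFGHIJKLMNOPQRSTUVWXYZ!. "


def lookup(p: list[int], i: int) -> int:
    """p[i] like GolfScript's p=, but 0 when i falls outside p.

    Negative indices count from the end (assumed Ruby-style indexing), so an
    invalid character (-1) picks up the last nibble of the row: a squiggle.
    """
    return p[i] if -len(p) <= i < len(p) else 0


def render(text: str, rows: list[list[int]]) -> bytes:
    s = [VALID.find(c) for c in text]  # ? gives -1 for characters outside VALID
    header = f"P4\n{8 * len(s)}\n8\n".encode()  # ]n* puts newlines between values
    body = b"".join(bytes(lookup(p, i) for i in s) for p in rows)
    return header + body


toy_rows = [[1, 2, 3]] + [[] for _ in range(7)]
print(render("BZ?", toy_rows))
```

In the toy run, B (index 1) picks up 2, Z (index 25) falls past the end of the row and becomes 0, and the invalid ? picks up the row's last value, 3.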
Shell script (code+data = 295 chars)
I hope tail, gzip, and dd do not count as "external libraries." Run as echo -n 'YOUR TEXT HERE' | ./text.sh > out.pbm. The font I used is Small Fonts size 7.5, although I did have to clip the descender off the Q.
Example output
Code (137 chars)
i=`od -tu1|cut -c9-`
echo P4
for a in {0..7}
do for b in $i
do tail -2 $0|zcat|dd bs=1 count=1 skip=$((8*b+a))
done
done>8
wc -c 8
cat 8
Complete script (use xxd -r to recreate the original file)
0000000: 693d 606f 6420 2d74 7531 7c63 7574 202d i=`od -tu1|cut -
0000010: 6339 2d60 0a65 6368 6f20 5034 0a66 6f72 c9-`.echo P4.for
0000020: 2061 2069 6e20 7b30 2e2e 377d 0a64 6f20 a in {0..7}.do
0000030: 666f 7220 6220 696e 2024 690a 646f 2074 for b in $i.do t
0000040: 6169 6c20 2d32 2024 307c 7a63 6174 7c64 ail -2 $0|zcat|d
0000050: 6420 6273 3d31 2063 6f75 6e74 3d31 2073 d bs=1 count=1 s
0000060: 6b69 703d 2428 2838 2a62 2b61 2929 0a64 kip=$((8*b+a)).d
0000070: 6f6e 650a 646f 6e65 3e38 0a77 6320 2d63 one.done>8.wc -c
0000080: 2038 0a63 6174 2038 0a1f 8b08 0000 0000 8.cat 8........
0000090: 0000 ffed cdb1 0a83 3014 8561 910e 8e8e ........0..a....
00000a0: 193b dca1 631f 2084 9353 6ba3 a3e0 e2a8 .;..c. ..Sk.....
00000b0: 2fe0 d8e1 22d8 276f 9a50 e813 940e fdb8 /...".'o.P......
00000c0: 70f9 a753 247f 7829 f0b5 b9e2 c718 2322 p..S$.x)......#"
00000d0: 1ba9 e9a8 9688 6895 892a 7007 f0fe 701e ......h..*p...p.
00000e0: b879 ef48 6e8c aa4f 219c d984 750d 0d91 .y.Hn..O!...u...
00000f0: e9b2 8c63 d779 3fcf c3d0 f76d eb7c e2d2 ...c.y?....m.|..
0000100: 1880 d4d7 4b6e 9296 b065 49ab 75c6 cc92 ....Kn...eI.u...
0000110: 1411 63f6 7de7 3489 9031 847c 3c9a 531d ..c.}.4..1.|<.S.
0000120: e9a1 aa8f 803e 01 .....>.
Explanation
- od is the standard "octal dump" utility program. The -tu1 option tells it to produce a decimal dump of individual bytes instead (a sufficient workaround for bash's lack of asc(), ord(), .charCodeAt(), etc.).
- P4 is the magic number for a binary format PBM file, which packs eight pixels into each byte (versus P1 for the ASCII format PBM file). You will see how this proves useful.
- For each row of the final output and each input character, the program uses dd to pull an eight-pixel byte (selected by the character's ASCII code and the row number) from the gzip-compressed data section at the end. (tail -2 $0 extracts the script's last two lines; the compressed data includes one 0x0a linefeed byte.) It so happens that eight pixels is the width of a single character. The null bytes that fill the gaps between supported characters are easily compressible because they are all the same.
- All this is written to a file named "8". Because there are exactly eight rows (and also eight pixels per byte), the number of bytes is the width of the output in pixels. The output's height is also included, because wc -c prints the input filename "8" after its byte count.
- Now that the header is complete, the image data is printed. Bash only notices that the last two lines are not valid commands (the last one is actually invalid UTF-8) after it has executed everything before them.
- I used KZIP only to compress the data section, as Ilmari Karonen did for an entire submission to the 12 Days of Christmas challenge. As described there, it is essentially necessary to use a hex editor to replace the ZIP header format with a gzip header. Including the CRC-32 and file size from the original ZIP header seems to be unnecessary.
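To show the data layout and the wc -c header trick in one place, here is a Python sketch of what the script computes. It assumes the decompressed data holds eight bytes per ASCII code, as the dd offset $((8*b+a)) implies. The real font is only available as gzip data, so toy_font is placeholder data with a single made-up glyph, and render is a hypothetical name.

```python
# Sketch: what the shell script computes, written out in Python.
# The real font is the gzip-compressed data section; toy_font is a made-up
# placeholder laid out the same way (8 bytes per ASCII code).


def render(text: bytes, font: bytes) -> bytes:
    # od -tu1 turns the text into byte values b; dd skip=$((8*b+a)) then
    # fetches row a of the glyph for b. Rows are the outer loop, as in the script.
    body = bytes(font[8 * b + a] for a in range(8) for b in text)
    # wc -c 8 prints "<byte count> 8": with eight rows of one byte per
    # character, the byte count equals the width in pixels, and the file
    # name "8" doubles as the height.
    header = b"P4\n" + b"%d 8\n" % len(body)
    return header + body


toy_font = bytearray(256 * 8)
toy_font[8 * ord("A"):8 * ord("A") + 8] = bytes([0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00])
print(render(b"AA", bytes(toy_font)))
```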