Output "Lorem ipsum" with minimal number of characters

JavaScript (ES7), 326 283 273 249 243 242 chars

_=>""[r='replace'](/./gu,c=>(c.codePointAt()-4**8).toString(32))[r](/\d/g,d=>"  , exum. ".substr(d,2))[r](/^.|\. ./g,x=>x.toUpperCase())

How it works

The first step in my compression technique is to convert the whole string to lowercase (not mandatory, but it looks better) and replace each pair of adjacent chars in ", exum. " (as well as the trailing space by itself) with its index in that string plus 2. Since x only ever appears as part of "ex" and every other letter is in a-v, this makes the text a valid base-32 number:

lorem9ips69dolor9sit9amet2consectetur9adipiscing3lit2sed9do3iusmod9tempor9incididunt9ut9labore3t9dolore9magna9aliqua8ut3nim9ad9minim9veniam2quis9nostrud94ercitation9ullamco9laboris9nisi9ut9aliquip943a9commodo9consequat8duis9aute9irure9dolor9in9reprehenderit9in9voluptate9velit3sse9cill69dolore3u9fugiat9nulla9pariatur84cepteur9sint9occaecat9cupidatat9non9proident2sunt9in9culpa9qui9officia9deserunt9mollit9anim9id3st9laboru7
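
A minimal sketch of this first step, assuming the substitutions are applied in a fixed priority order. The order below is an assumption (the answer does not state one), chosen so that it reproduces the string above; the names PAIRS, ORDER and toBase32Text are illustrative only.

```javascript
// Lookup string from the answer: digit d stands for PAIRS.slice(d, d + 2).
const PAIRS = "  , exum. ";

// Assumed priority: punctuation pairs first, "m." before "um",
// "ex" before " e", and the lone space (digit 9) last.
const ORDER = [2, 8, 7, 4, 6, 3, 9];

function toBase32Text(text) {
  let s = text.toLowerCase();
  for (const d of ORDER) {
    // Replace every occurrence of the pair with its digit.
    s = s.split(PAIRS.slice(d, d + 2)).join(String(d));
  }
  return s;
}

console.log(toBase32Text("Lorem ipsum dolor sit amet, consectetur adipiscing elit"));
// lorem9ips69dolor9sit9amet2consectetur9adipiscing3lit
```

The ordering matters: "laborum." must become "laboru7" rather than "labor6.", because a bare period has no digit of its own.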

The next step is to parse each 4-char run as a base-32 number n, then take the character at code point n + 65536 (written out as a surrogate pair). This can be done with the following function:

f=s=>s.replace(/..../g,x=>(n=parseInt(x,32),String.fromCharCode(0xD800+(n>>10),0xDC00+(n&0x03FF))))

(Note: Since all digits are 2 or greater, the minimum possible value of four digits is 2222₃₂. This is equal to 67650₁₀, or 10842₁₆; with the 65536 offset added on top, every code point lies above U+FFFF and therefore will never be in the restricted range.)
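
As a quick check of the arithmetic, here is a sketch that packs a single 4-char run the same way as f above and confirms that the resulting code point is the parsed value plus 65536. The helper name packRun is hypothetical.

```javascript
// Pack one base-32 run into a single astral character, as f does.
function packRun(run) {
  const n = parseInt(run, 32);
  return String.fromCharCode(0xD800 + (n >> 10), 0xDC00 + (n & 0x3FF));
}

const smallest = parseInt("2222", 32);
const ch = packRun("2222");

console.log(smallest, smallest.toString(16));               // 67650 10842
console.log(ch.codePointAt(0) - 65536 === smallest);        // true
console.log(ch === String.fromCodePoint(smallest + 65536)); // true
```

The largest possible run, vvvv, maps to 1048575 + 65536 = 1114111, which is exactly U+10FFFF, so four base-32 digits fill the supplementary planes without overflowing them.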

And now we have our compressed string:


That's 445 chars compressed into 106 chars. The decompression simply reverses this process:

  1. Take each char's code point, subtract 65536, and convert the result to base-32.
  2. Replace each digit n with "  , exum. ".substr(n,2).
  3. Convert each letter after a period or at the beginning of the string to uppercase.
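
The three steps can be sketched in ungolfed form as follows. The function name decompress and the sample string are illustrative; the sample is built by packing only the first 16 chars of the base-32 text, not the full compressed string.

```javascript
const PAIRS = "  , exum. ";

function decompress(packed) {
  return packed
    // 1. code point minus 65536, written in base 32 (the u flag keeps surrogate pairs together)
    .replace(/./gu, c => (c.codePointAt(0) - 65536).toString(32))
    // 2. each digit n becomes the two chars of PAIRS starting at index n
    .replace(/\d/g, d => PAIRS.slice(Number(d), Number(d) + 2))
    // 3. uppercase the first letter and any letter following ". "
    .replace(/^.|\. ./g, x => x.toUpperCase());
}

// Small sample: pack "lorem9ips69dolor" four chars at a time.
const sample = "lorem9ips69dolor".replace(/..../g,
  x => String.fromCodePoint(parseInt(x, 32) + 65536));

console.log(sample.length);      // 8 (four astral chars, two UTF-16 units each)
console.log(decompress(sample)); // Lorem ipsum dolor
```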

The only ES7 feature used is **. Replace 4**8 with 65536 to run in a browser that doesn't yet support ES7.


JavaScript (ES6), 261 255 254 characters

Saved 1 byte, thanks to ETHproductions

_=>'L'+"⫒㠰拳␰䨒堵̎⨦W䙨ⅶ嵷˘㥆姳䗨⠬巯堡Ŋɩ懪䨶尩个˒≎㥎䜩怷㰷䤆ŵ̊㹩⫒ᨠᩌ㳠抮f̅㩊ᠰ䀩㩎搰㩊ئ抠ˮ婱拗⠩啺巨㬆ɒ㸘∦㰲䤆姵㩀Ƕ̘㨆㬴⠳⠺…䈲䥒䤠⫱᬴w㬣ᠶ⬘嗠⫘䥀噯䗠⫀⫓䕭啩̎Ɏ㹹庘⬆⭀巯奠Ŷ㷨䌯䥀噯⠪ⰸ㦸̆㼱ï哳峮૘梠䵨慷堵幎≠⣨峨愠◳ᬆ䐷ɒ䫓⥎ܑ拠̑Ɏ㼨ó㬴⹠⇫î奩拊̑㹰巯䓠ȮŎ廪ᨀ噧ਸ".replace(/./g,c=>(s=" ,.DEUabcdefghilmnopqrstuvx")[(c=c.charCodeAt()-32)&31]+s[c>>5&31]+s[c>>10])

Breakdown

Payload: 148 Unicode characters
Code: 106 characters

How it works

We first remove the leading 'L' from the original message so that we're left with 444 = 148 * 3 characters.

Without the leading 'L', the character set is made up of the following 27 characters:

" ,.DEUabcdefghilmnopqrstuvx"

Each group of 3 characters is encoded as:

n = 32 + a + b * 32 + c * 32^2

where a, b and c are the indices of the characters in the above character set.

This leads to a Unicode code point in the range U+0020 to U+801F at most; with only 27 symbols in use, the largest value actually reachable is U+6B7A, which lies in the "CJK Unified Ideographs" block.
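
An ungolfed sketch of this packing and its inverse, assuming the formula above is applied to each group of three characters as-is. The names CHARSET, packTriple and unpackTriple are illustrative and not part of the answer.

```javascript
const CHARSET = " ,.DEUabcdefghilmnopqrstuvx";

// Encode three characters as one code unit: n = 32 + a + b*32 + c*32^2.
function packTriple(triple) {
  const [a, b, c] = [...triple].map(ch => CHARSET.indexOf(ch));
  return String.fromCharCode(32 + a + b * 32 + c * 1024);
}

// Inverse: pull the three 5-bit indices back out.
function unpackTriple(ch) {
  const n = ch.charCodeAt(0) - 32;
  return CHARSET[n & 31] + CHARSET[(n >> 5) & 31] + CHARSET[n >> 10];
}

console.log(packTriple("ore").charCodeAt(0).toString(16)); // 2ad2
console.log(unpackTriple(packTriple("ore")));              // ore
// Largest reachable value: all three indices equal to 26.
console.log((32 + 26 + 26 * 32 + 26 * 1024).toString(16)); // 6b7a
```

The offset of 32 keeps every packed value at or above U+0020, so no group ever lands on a control character.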

let f =
_=>'L'+"⫒㠰拳␰䨒堵̎⨦W䙨ⅶ嵷˘㥆姳䗨⠬巯堡Ŋɩ懪䨶尩个˒≎㥎䜩怷㰷䤆ŵ̊㹩⫒ᨠᩌ㳠抮f̅㩊ᠰ䀩㩎搰㩊ئ抠ˮ婱拗⠩啺巨㬆ɒ㸘∦㰲䤆姵㩀Ƕ̘㨆㬴⠳⠺…䈲䥒䤠⫱᬴w㬣ᠶ⬘嗠⫘䥀噯䗠⫀⫓䕭啩̎Ɏ㹹庘⬆⭀巯奠Ŷ㷨䌯䥀噯⠪ⰸ㦸̆㼱ï哳峮૘梠䵨慷堵幎≠⣨峨愠◳ᬆ䐷ɒ䫓⥎ܑ拠̑Ɏ㼨ó㬴⹠⇫î奩拊̑㹰巯䓠ȮŎ廪ᨀ噧ਸ".replace(/./g,c=>(s=" ,.DEUabcdefghilmnopqrstuvx")[(c=c.charCodeAt()-32)&31]+s[c>>5&31]+s[c>>10])


console.log(f())


bash + coreutils + gzip + recode, 191 characters

echo -ne "ᾋࠀ㰟퍗\03㖐셱䌱ࡄ戋⪒宮⦀⃬〣ख़ʏ쬏湂삲מּ浊莎ᔍ얪䴬畐Ꮏ肭⽡តप㩴뇶ᮤ樶鞔岀梬昅⹭盖ꈥ먣Ვ빓ỢꞴꃑ괓꣪㷨삗䎺뛔䛓ﵸ摉篨䊷૤⦓헉픺ꉖ橬ꟲỒꗻ퉋則ใ⢍럴摧耼⒅୴䘺㦳櫇鐱窑駁愵䚞鎴鍉Ⅻक़毽➔脂ힸ⤹喝葁㎋頇㺞ⳃ┶왤惌⒜猜䌋吏젔掚ᛩ鯢⚕䜹鴛皽⨫ꇈ銹믍䄛逦軵융杻龇븁\0"|recode u8..utf16be|tr -d ٣ܣ|gunzip

The string is the gzip of the text interpreted as UTF-16BE, plus a few extra bytes to pair with the unpaired surrogate halves. The tr strips off the extra surrogate halves.

This script file (or the shell into which this command is typed) should interpret text as UTF-8, which is why the recode is needed.
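
To illustrate why arbitrary gzip bytes do not always form valid UTF-16BE, here is a rough sketch in JavaScript (not part of the bash answer) that reads a byte array as big-endian 16-bit units and reports which units are surrogate halves without a partner. The function name loneSurrogates and the sample bytes are made up for illustration; only the leading 1F 8B is the real gzip magic number.

```javascript
// Return the indices of 16-bit units that are unpaired surrogate halves.
function loneSurrogates(bytes) {
  const units = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    units.push((bytes[i] << 8) | bytes[i + 1]); // big-endian
  }
  const isHigh = u => u >= 0xD800 && u <= 0xDBFF;
  const isLow = u => u >= 0xDC00 && u <= 0xDFFF;
  const lone = [];
  for (let i = 0; i < units.length; i++) {
    if (isHigh(units[i]) && i + 1 < units.length && isLow(units[i + 1])) {
      i++; // a proper high+low pair, skip both
    } else if (isHigh(units[i]) || isLow(units[i])) {
      lone.push(i);
    }
  }
  return lone;
}

// Hypothetical bytes: gzip magic, then a high surrogate followed by a non-surrogate.
console.log(loneSurrogates([0x1F, 0x8B, 0xD8, 0x00, 0x00, 0x41])); // [ 1 ]
```

Any unit reported here cannot be written as UTF-8 on its own, which is the situation the extra padding bytes work around.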