Concatenative counting

Perl 5, 151 snippets (previously 50,091)

First snippet:

use utf8; print length A

2nd through 26th snippets: B through Z

27th through 46th snippets: a through z, excluding the characters in "length"

47th through 56th snippets: 0 through 9

57th snippet: _

The remaining snippets are the 50,105 individual Unicode characters which Perl regards as "word" characters, excluding the 14 distinct word characters in the initial snippet, in any order.

Well, it was a nice thought, but it turns out that after a certain length Perl gives you an "identifier too long" error. This is the longest combined program I was able to get Perl to digest:

use utf8; print length A012345679BCDEFGHIJKLMNOPQRSTUVWXYZ_abcdjkmoqsvwxyzĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţ

The perldiag manual page says "Future versions of Perl are likely to eliminate these arbitrary limitations" but my Perl 5.18 has not done so.

Explanation:

In non-strict mode, Perl 5 interprets unquoted strings of word characters as "barewords," essentially quoting them for you automatically. They're usually best avoided, but they sure help here!


JavaScript (ES6, V8 6.x), 128781 snippets, 612789 bytes (previous scores: 52, 50298, 119526, 119638, 119683 snippets; 88, 149147, 575179, 575631, 576121 bytes)

Further below is a Stack Snippet that generates the full program, evaluates it, and creates a download link for the file. That snippet will keep producing better answers as newer JavaScript engines support later versions of Unicode, which add new valid identifier characters to the language.

Using ASCII only

console.log(new Proxy({},{get:(n,{length:e})=>e>>(e/e)}).nn$$00112233445566778899AABBCCDDEEFFGGHHIIJJKKLLMMNNOOQQRRSSTTUUVVWWXXYYZZ__aabbccddffiijjkkmmppqqssuuvvzz)

Explanation

This uses the metaprogramming technique of Proxy to enable a get handler trap on the object and access the property name as a string, returning the identifier's length / 2 as its value.
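
To see the trap in isolation, here is a minimal sketch. The name `counter` is made up for illustration, and `>> 1` stands in for the answer's `>> (e / e)`, which performs the same shift for any nonzero length (presumably written that way to keep the digit 1 out of the first snippet).

```javascript
// Hypothetical name `counter`; the answer inlines the Proxy instead.
const counter = new Proxy({}, {
  // `get` runs on every property read; `name` is the property key as a string.
  get: (target, name) => name.length >> 1
})

console.log(counter.nn)      // 1
console.log(counter.nn$$)    // 2
console.log(counter.nn$$00)  // 3
```

No property is ever defined on the target object; the value is computed purely from the name that was asked for.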

With the first snippet being new Proxy({},{get:(n,{length:e})=>e>>(e/e)}).nn, each additional snippet lengthens the identifier by exactly 2 UTF-16 code units: the code point is .repeat()ed twice if it occupies one code unit (2 bytes in UTF-16), and once if it occupies two code units (4 bytes in UTF-16).
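
The padding rule can be sketched on its own as follows. The helper name `pad` is hypothetical; the expression inside it is the same one the generator script uses.

```javascript
// Hypothetical helper: pad one code point so it adds exactly 2 UTF-16 code units.
function pad(codePoint) {
  const character = String.fromCodePoint(codePoint)
  // BMP characters have length 1 (repeated twice);
  // astral characters are a surrogate pair of length 2 (repeated once).
  return character.repeat(2 / character.length)
}

console.log(pad(0x41).length)     // 2 ("AA")
console.log(pad(0x10400).length)  // 2 (one astral character)
```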

Identifiers in JavaScript

In the ECMAScript Specification, an IdentifierName is defined with the following grammar:

IdentifierName::
  IdentifierStart
  IdentifierName IdentifierPart

IdentifierStart::
  UnicodeIDStart
  $
  _
  \UnicodeEscapeSequence

IdentifierPart::
  UnicodeIDContinue
  $
  _
  \UnicodeEscapeSequence
  <ZWNJ>
  <ZWJ>

UnicodeIDStart::
  any Unicode code point with the Unicode property “ID_Start”

UnicodeIDContinue::
  any Unicode code point with the Unicode property “ID_Continue”
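
As a rough illustration of this grammar, the sketch below approximates IdentifierName with a regular expression using Unicode property escapes (ES2018 and later). It is an assumption-laden simplification: it ignores \UnicodeEscapeSequence and reserved words, and the Unicode version behind the engine's regular expressions may not match the one its parser uses. The constant name `identifierName` is made up for this example.

```javascript
// Approximation of the IdentifierName grammar above.
// First character: ID_Start, $ or _.
// Later characters: ID_Continue, $, _, ZWNJ (U+200C) or ZWJ (U+200D).
const identifierName = /^[\p{ID_Start}$_][\p{ID_Continue}$_\u200C\u200D]*$/u

console.log(identifierName.test('nn$$00'))  // true
console.log(identifierName.test('0nn'))     // false: a digit is not ID_Start
console.log(identifierName.test('ĀāĂ'))     // true
```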

Generating the answer

Initially I wrote a Node.js script that generated the full answer from the "ID_Continue" Unicode property. Now it's just a client-side script that iterates through all the Unicode code points and uses a naive eval() to test each one for validity:

// first snippet
let answer = 'new Proxy({},{get:(n,{length:e})=>e>>(e/e)}).nn'

const used = Array.from(
  answer,
  c => c.codePointAt(0)
).sort(
  (a, b) => a - b
)

// create an O(1) lookup table for used characters in first snippet
const usedSet = Array.from(
  { length: Math.max(...used) + 1 }
)

for (const codePoint of used) {
  usedSet[codePoint] = true
}

// equal to 1 for first snippet
let snippets = eval(answer)
let identifier = ''

for (let codePoint = 0, length = 0x110000; codePoint < length; codePoint++) {
  const character = String.fromCodePoint(codePoint)

  // if unused
  if (usedSet[codePoint] === undefined) {
    // if valid `IdentifierPart`
    try {
      eval(`{let _${character}$}`)
    } catch (error) {
      // console.log(character)
      continue
    }

    // repeat so that `snippet.length === 2`
    identifier += character.repeat(2 / character.length)
    snippets++
  }
}

// number of snippets generated
console.log(`snippets: ${snippets}`)

const program = `console.log(${answer + identifier})`

// output of program to validate with
eval(program)

// download link to check number of bytes used
dl.href = URL.createObjectURL(new Blob([program], { type: 'text/javascript' }))

<a id=dl download=answer.js>Click to Download</a>

Running stat -f%z answer.js yields a byte count of 612802, but we subtract 13 bytes for the console.log( and ) wrapping the actual submission.
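
The subtraction can be checked with a short sketch. It assumes `TextEncoder` is available as a global (browsers and recent Node.js versions); the variable names are illustrative only.

```javascript
// The wrapper around the submission: `console.log(` plus the closing `)`.
const wrapper = 'console.log(' + ')'
// TextEncoder always encodes to UTF-8.
const wrapperBytes = new TextEncoder().encode(wrapper).length
const fileBytes = 612802  // figure reported by stat above

console.log(wrapperBytes)              // 13
console.log(fileBytes - wrapperBytes)  // 612789
```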

Encoding

The source is stored as UTF-8, which accounts for the enormous byte count of the answer, since every non-ASCII character takes 2 to 4 bytes. UTF-8 is used because Node.js can only run source files encoded in UTF-8.

JavaScript strings are sequences of UTF-16 code units, so the "length" of a string in JavaScript is its number of code units, which is half the number of bytes the string occupies when encoded as UTF-16.
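
A small sketch of the difference between the two measures, again assuming a global `TextEncoder`; the helper name `utf8Bytes` is hypothetical.

```javascript
// Hypothetical helper: number of bytes a string occupies in UTF-8.
const utf8Bytes = s => new TextEncoder().encode(s).length

// Columns: character, length in UTF-16 code units, size in UTF-8 bytes.
for (const s of ['A', 'Ā', '\u{10400}']) {
  console.log(s, s.length, utf8Bytes(s))
}
// A 1 1
// Ā 1 2
// 𐐀 2 4
```

This is why the snippet count (driven by code units) and the byte count (driven by UTF-8 size) grow at different rates.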


Python 2, score 32

for r in range(32):locals()[''.join(map(chr,range(65,66+r)[:26]+range(117,92+r)))]=r+1
print A

With subsequent snippets B, C, D, … Y, Z, u, v, w, x, y, z.

In a twist of dramatic irony, Python 3 supports Unicode identifiers, which would let us get very silly with this trick — but it can’t print without parentheses. I could cram digits into the identifier, too, but I don’t think this approach is very fun to squeeze more out of.

Try it online!

Python 2, score 18, less cheat-y

print 0x10-1&0x1
print 0x10-1&0x12
print 0x10-1&0x123
print 0x10-1&0x1234
print 0x10-1&0x12345
print 0x10-1&0x123456
print 0x10-1&0x1234567
print 0x10-1&0x12345678
print 0x10-1&0x123456789
print 0x10-1&0x123456789A
print 0x10-1&0x123456789Ab
print 0x10-1&0x123456789Abc
print 0x10-1&0x123456789Abcd
print 0x10-1&0x123456789AbcdE
print 0x10-1&0x123456789AbcdEf
print 0x10-1&0x123456789AbcdEf^((()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==()))
print 0x10-1&0x123456789AbcdEf^((()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==()))|[[[]]>[]][[]>[]]
print 0x10-1&0x123456789AbcdEf^((()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==())+(()==()))|[[[]]>[]][[]>[]]<<False**False

Try it online!