Quickly find length of n-th term of the look-and-say sequence

GolfScript, 300 chars

~8-:m;'!5?HZi " A 9 "$??HLTZ^`eik 5Z`e <^ $k H ,Sds ?LT /5@MUav I " i ? "5i G  !#+.4;>@K]_cdhjy{| *HCRJD@:64122/42.,,*860-40-0.+9?B<862//?D<7=@:420-/5:60.-20.+./384/162.-HCRJD@:64122/42.,,*)! !""$&&('{32-}/]1,/~:&;:L;:T;]:S;0m>{&m=}{L,,{.49=\71=+}%:V{,(,{).V=\T?S={(V=+}/}%0+:V}m*,,0\{.V=\L=8-*+}/}if

Here's a slightly shorter entry. Reads a number m from stdin, writes the length of the m-th term in the look-and-say sequence to stdout.

For example, input 50 produces output 894810, while input 1000 produces (after a few seconds):

21465050086246039983937316427688400486337379416867195943208935654503780844828013651860410256502890125611856727316762

I'll post a more detailed explanation of the code later, but basically this also works by repeatedly multiplying a vector with a sparse matrix, encoded as an ASCII string. Some clever compression tricks, such as leaving out the all-ones superdiagonal of the matrix, are employed.

Ps. Here's a link to an online demo. The server has been pretty slow lately, though, so if you see an error message about a timeout, try again (or just download the interpreter and run it yourself).


Addendum: Here's the promised detailed explanation. First, let me repost the code with whitespace and comments added:

# Parse input, subtract 8, save in variable m:
~ 8- :m;

# This long string encodes all the constants used by the program (I'll describe it below):
'!5?HZi " A 9 "$??HLTZ^`eik 5Z`e <^ $k H ,Sds ?LT /5@MUav I " i ? "5i G  !#+.4;>@K]_cdhjy{| *HCRJD@:64122/42.,,*860-40-0.+9?B<862//?D<7=@:420-/5:60.-20.+./384/162.-HCRJD@:64122/42.,,*)! !""$&&('

# Parse the data string:
# The data consists of several arrays of positive integers, encoded as printable ASCII
# characters and separated by spaces.  (Note that this means that the range of possible
# data values starts from 1, which requires some adjustment below.  Also, the values in
# L are actually shifted up by 8 to avoid having single quotes in the data string.)

{32-}/   # Subtract 32 (ASCII code of space) from each char and dump result on stack.
]        # Collect numbers from stack into an array.
1,/ ~    # Split array on zeros (formerly spaces) and dump pieces on stack.
:&;      # Save topmost piece (canned responses for inputs 1-7) in var &.
:L;      # Save second piece (element lengths) in var L.
:T;      # Save third piece (nonstandard decay targets) in var T.
]:S;     # Collect remaining pieces (nonstd decay sources) in array and save it in S.

# If m < 0, return a precomputed response from &, else run the indented code below:
0 m > { & m = } {
  # Set V to be the initial element abundance vector (1 Hf + 1 Sn):
  # (Note: both L and V have an extra sentinel 0 at the end to avoid overruns below.) 
  L,, { . 49= \ 71= + } % :V
  # Update the element abundances m times:
  {
    # Run the following block for all numbers i from 0 to length(V)-2,
    # collecting the results in an array:
    ,(, {
      # Start with the abundance of element i+1 in V (all elements except H decay
      # downwards; the extra 0 at the end of V is needed to avoid a crash here):
      ).V=\
      # If this element is one of the decay targets listed in T, find the corresponding
      # list of source elements in S and add their abundances from V to the total:
      # (If this element is not in T, we just loop over an empty array at the end of S.)
      T? S= { ( V= + } /
    } %
    # Append a new sentinel 0 to the end of the array and save it as the new V:
    0+:V
  } m *
  # Multiply the element abundances from V with their lengths from L and sum them:
  ,,0\{.V=\L=8-*+}/
} if

The key fact that allows this code to work is that, as the look-and-say transformation is repeatedly applied to any string, it eventually splits into substrings which evolve independently, in the sense that we can obtain all the following elements of the look-and-say sequence starting from that string by taking each of the substrings, applying the look-and-say transformation to each of them independently n times and concatenating the results.

Furthermore, in his 1986 article "The Weird and Wonderful Chemistry of Audioactive Decay" (which basically introduced the look-and-say sequence to the world), John Conway proved a remarkable (and, for this challenge, crucial) result which he called the "cosmological theorem":

Cosmological theorem (Conway): There is a set of 92 finite strings of numbers 1–3 (which Conway named after the chemical elements; see table here), and two infinite families of strings each consisting of a fixed prefix followed by some number higher than 3 (these can only appear if the initial string contained number higher than 3 or sequences of more than 3 identical numbers), such that, after at most 24 applications of the look-and-say transformation, any string splits into a chain of such elements, each of which thereafter evolves independently (and splits into further elements).

Thus, to determine the length of a string sufficiently far in the look-and-say sequence, it's sufficient to know how many times each element appears in it. And, since each element evolves independently, knowing these "elemental abundances" is also sufficient for calculating the corresponding abundances of the next string in the sequence, and so on.

As it happens, the look-and-say sequence starting from the string 1 splits into two Conway elements (Hf72 = 11132 and Sn50 = 13211) after just seven iterations. Thus, thereafter, we only need to keep track of the abundances of the 92 "common" elements (as the two families of "transuranic" elements cannot arise from common elements).

Mathematically, we can treat the elemental abundances as a vector of length 92. Calculating the abundances in the next step of the look-and-say sequence is then just equivalent to multiplying this vector with a 92 × 92 matrix, which is what r.e.s.'s solution does literally.

In fact, r.e.s.'s solution actually tracks the abundance of each element multiplied with its length (i.e. the "total mass" of that element in the string), adjusting the matrix accordingly, so that the total length of the string is given simply by adding the numbers in the abundance vector together. However, I did not find that a particularly convenient representation for code golf. (It was convenient for Nathaniel Johnston's derivation of Conway's polynomial, since it encodes the lengths into the matrix and thus gives the right eigenvalue for the asymptotic growth rate, but we don't care about that here.)

Instead, I just track the raw abundances of the elements, and only multiply them with the lengths at the end. This has, combined with the convenient fact that none of the decay products of any element contain more than one copy of any element, means that the entries of the transition matrix are just zeros and ones. Also, by far most of them are, in fact, zeros (in math jargon, the matrix is sparse), so I can save a lot of space by only storing the indices of those entries which are ones.

Also, in his article, Conway ordered the elements such that (except for H1 = 22, which is a fixed point of the look-and-say transformation), each element decays to the next lower element in the sequence (and possibly some others). In fact, most elements decay only to the next lower elements. Thus, by using Conway's ordering of the elements, I can hardcode this "standard decay" as part of the iteration, so that I only need to store the matrix entries corresponding to the other, "nonstandard" decays, saving even more space.

(In fact, what I actually do, for reasons mostly to do with how GolfScript's array operations work, is store one array listing the matrix rows containing non-zero entries (i.e. the elements which are produced in nonstandard decays) and another array of arrays listing, for each of these rows, the columns where the non-zero entries occur (i.e. the elements which produce these elements in such decays). See the code and comments above for how these arrays are actually used to calculate the updated element abundances.)

The final space-saving trick I used was to encode all these arrays as strings of ASCII characters. Conveniently, there are 95 printable ASCII characters from space (32) to ~ (126), and the largest value that appears in any of the arrays is 92, so they fit very well. One extra tweak I made was to shift the values in the element length array (which only go up to 42 anyway) up by 8 so that I could avoid the character ' (39) and thus store my data in a single quoted string without having to add any backslash escape characters.

There are some other minor optimizations too, but those are just standard GolfScript coding tricks. I should note that the actual code is not all that carefully optimized — it's quite possible that it could be golfed down further.

In any case, despite the inherent slowness of a double-interpreted language like GolfScript, the code runs quite fast and scales well: computing the length of the 10,000th string in the look-and-say sequence takes about 1.5 minutes on my computer.

Conveniently, GolfScript stores all numbers as arbitrary-length integers (bignums), so I didn't have to do anything special to make it produce exact results even for large inputs. The expected time complexity of this program is O(n log n) (as the element abundances grow linearly with n, and bignum addition takes O(log n) time), and the actual runtime seems to fit that prediction pretty well, as the graph below shows:

Graph of execution time divided by input value, showing logarithmic growth.

(Time is in seconds; for what it's worth, the green curve is given by y = (a x log(b + x) + c) / x, where a = 0.00106462026133545, b = 2720.25843721031 and c = -0.0401630789530016. The spike on the left mainly comes from the roughly constant runtimes for n < 8, which become relatively large when divided by small n. I did not include it in the curve fit.)


Sage (18664 2021 1977)

This is a version that uses Nathaniel Johhnston's matrix formulation. (The matrix could probably be golfed further using eval.)

def f(n):
 z=[0];a=z*10;b=z*15;c=z*18;d=z*25;e=z*31;g=z*68;T=Matrix([z*28+[2]+z*63,e+[7/23]+z*60,z*29+[4/3]+z*62,z*30+[4/3]+z*61,z*32+[2]+z*59,z*53+[1/2]+z*38,z*36+[3/2]+z*55,z*37+[2]+z*54,z*38+[8/5]+z*53,z*39+[5/3]+z*52,z*43+[5/3]+z*48,z*45+[7/4]+z*46,z*46+[12/7]+z*45,z*47+[7/4]+z*44,z*48+[3/2]+z*43,z*50+[21/17]+z*41,z*51+[21/17]+z*40,z*49+[13/10]+z*42,z*44+[7/5]+z*47,z*52+[7/5]+z*39,z*40+[7/5]+z*51,z*41+[4/3]+z*50,z*42+[4/3]+z*49,z*34+[5/32,5/32]+z*56,z*56+[7/11,7/13,1/3,7/17]+z*32,z*54+[10/7]+z*37,z*55+[10/7]+z*36,z*33+[4/3]+z*58,b+[1/21,1/21,0,1/7]+z*12+[2/23,0,0,1/16,1/16]+z*17+[1/5,0,0,2/11,2/13,2/21,4/17]+z*21+[1/8]+z*3+[1/3]+z*6,z*17+[9/26]+g+[9/10]+z*5,z*87+[9/10]+z*4,z*19+[23/28]+z*72,z*34+[1/16,1/16]+d+[2]+z*19+[1/8]+a,z*88+[2]+z*3,z*90+[32/27,0],z*89+[32/27,0,0],z*91+[8/5],z*66+[3/7]+z*7+[1/3]+a+[1/2,3/10,3/10]+z*4,z*66+[5/7]+d,z*62+[3/2]+z*29,z*63+[10/7]+z*28,z*64+[9/7]+z*27,z*65+[9/7]+z*26,z*67+[3/2]+z*24,z*82+[5/3]+z*9,z*83+[8/5]+z*8,z*74+[7/9,7/12,7/12,7/16,7/18,7/24,7/23,7/16]+a,g+[4/3]+z*23,z*70+[6/5]+z*21,z*71+[5/4]+z*20,z*72+[17/14]+z*19,z*73+[17/14]+c,z*84+[4/3]+z*7,z*69+[5/4]+z*22,z*6+[7/12]+g+[7/12]+z*16,z*76+[7/12]+b,z*77+[11/16]+z*14,z*78+[13/18]+z*13,z*79+[7/8]+z*12,z*80+[17/23]+z*11,e+[2/23,0,0,1/16,1/16]+z*17+[1/5]+z*5+[2/17,1]+z*20+[1/8]+a,[0,1/7]+z*90,[1]+z*91,[0,1]+z*90,[0,0,7/6]+z*89,z*3+[7/6]+z*88,z*57+[7/13]+z*34,z*4+[1]+z*54+[4/17]+z*32,z*5+[6/5]+z*86,z*7+[4/3]+z*84,z*8+[5/4]+z*83,z*20+[8/7]+z*71,z*21+[7/6]+z*70,z*22+[7/6]+z*69,c+[9/14,9/28]+z*72,z*9+[6/5]+z*82,a+[6/5]+z*81,z*12+[4/3]+z*79,z*13+[9/7]+z*78,z*14+[4/3]+z*77,b+[23/42,23/42,23/26]+z*74,z*11+[8/7]+z*80,z*23+[6/5]+g,z*6+[5/12]+z*85,e+[15/23]+z*26+[5/7]+z*33,z*24+[6/7]+z*67,d+[1]+z*66,z*26+[1]+z*65,z*27+[3/8]+e+[3/17]+d+[1/2]+z*6,z*16+[9/14]+c+[27/32]+z*56,b+[9/14]+c+[27/32]+z*57,c+[5/14]+z*8+[5/8]+d+[1/2,0,0,5/11]+z*24+[5/16]+a]);v=vector(z*23+[5]+z*14+[5]+z*53);L=[1,2,2,4,6,6,8,10]
 if n<9:return L[n-1]
 Q=T
 for i in[1..n-9]:Q=Q*T
 return sum(Q*v)

E.g., [f(n) for n in [1..50]] produces

[1, 2, 2, 4, 6, 6, 8, 10, 14, 20, 26, 34, 46, 62, 78, 102, 134, 176, 226, 302, 408, 528, 678, 904, 1182, 1540, 2012, 2606, 3410, 4462, 5808, 7586, 9898, 12884, 16774, 21890, 28528, 37158, 48410, 63138, 82350, 107312, 139984, 182376, 237746, 310036, 403966, 526646, 686646, 894810]