Understanding The Math Behind Elchanan Mossel’s Dice Paradox

When you roll a die until $6$ appears, you can represent the sample space as all possible finite sequences from the set $\{1, 2, 3, 4, 5, 6\}$ ending in $6$, with probability of any sequence of length $k$ being $(1/6)^k$. The original question is asking for

$(1)$ the expected length of a sequence conditional on all throws being even.

You've correctly enumerated all sequences from $\{2,4,6\}$ that end in $6$, and calculated the sum $$\sum_{n=1}^\infty n\left( {\frac{1}{6}} \right)^n 2^{n-1} = 0.375$$ properly, but you forgot to divide this by the probability of the event you are conditioning on, which is $$ \sum_{n=1}^\infty\left(\frac16\right)^n2^{n-1}=1/4. $$ So your approach does yield the correct answer, namely $4\times 0.375=1.5$.
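For reference, both sums reduce to standard geometric-series identities once you write $\left(\frac16\right)^n 2^{n-1} = \frac16\left(\frac13\right)^{n-1}$ and set $x = \frac13$ (the symbol $x$ is just shorthand introduced here):

$$ \sum_{n=1}^\infty n\left(\frac16\right)^n 2^{n-1} = \frac16\sum_{n=1}^\infty n x^{n-1} = \frac16\cdot\frac{1}{(1-x)^2} = \frac16\cdot\frac94 = \frac38, $$

$$ \sum_{n=1}^\infty \left(\frac16\right)^n 2^{n-1} = \frac16\sum_{n=1}^\infty x^{n-1} = \frac16\cdot\frac{1}{1-x} = \frac16\cdot\frac32 = \frac14. $$

Their ratio is $\frac38 \big/ \frac14 = \frac32$.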

The act of conditioning on all throws being even is tantamount to restricting the sample space to all possible finite sequences from the set $\{2, 4, 6\}$ that end in $6$, and rescaling the probability function (by a factor $4$) so that this new sample space has total mass $1$.
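As a numerical sanity check of this rescaling, here is a minimal sketch that truncates both series and divides one by the other. The function name `conditional_expected_length` and the cutoff `max_len` are illustrative choices, not part of the original question; the truncation error is on the order of $(1/3)^{200}$, far below floating-point precision.

```python
from fractions import Fraction


def conditional_expected_length(max_len=200):
    """Approximate E[length | all throws even] by truncating both series."""
    p = Fraction(1, 6)
    weighted = Fraction(0)  # partial sum of n * P(length n, all even, ends in 6)
    mass = Fraction(0)      # partial sum of P(length n, all even, ends in 6)
    for n in range(1, max_len + 1):
        # 2^(n-1) ways to choose the leading 2s and 4s, then a final 6
        prob_n = p**n * 2 ** (n - 1)
        weighted += n * prob_n
        mass += prob_n
    # Dividing by the mass is the rescaling step described above
    return weighted, mass, weighted / mass


weighted, mass, expected = conditional_expected_length()
print(float(weighted), float(mass), float(expected))
```

The three printed values should be very close to $0.375$, $0.25$ and $1.5$ respectively.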

As for Jimmy Jin's paper, he claims that the original question $(1)$ is equivalent to

$(2)$ the expected number of times you can roll only $2$'s or $4$'s until you roll a $6$.

I disagree with $(2)$ as stated: it is incorrect to compute an unconditional expectation, as he himself explained in the preceding paragraph. He still needs an expectation conditional on some event, and I would argue the original question $(1)$ is equivalent to computing

$(2')$ the expected number of times you can roll only $2$'s or $4$'s until you roll any other number, given that the other number is $6$.

The reason is that conditioning on the event "the other number is $6$" results in the same restricted sample space as before. In fact his subsequent argument that it suffices to compute the unconditional expectation

$(3)$ the expected number of times you can roll only $2$'s or $4$'s until you roll any other number.

(i.e., that $(2') = (3)$, which is what he actually proves) is relevant only if we intend $(2')$ instead of $(2)$.
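To see why $(2') = (3)$, here is a short sketch in notation introduced only for this purpose: let $N$ be the total number of rolls up to and including the first roll that is not a $2$ or a $4$, and let $X$ be the value of that roll. Then

$$ P(N = n,\ X = 6) = \left(\frac13\right)^{n-1}\frac16, \qquad P(X = 6) = \sum_{n=1}^\infty \left(\frac13\right)^{n-1}\frac16 = \frac14, $$

so that

$$ P(N = n \mid X = 6) = \left(\frac13\right)^{n-1}\frac23 = P(N = n). $$

In other words $N$ is independent of $X$, and therefore $E[N \mid X = 6] = E[N] = \dfrac{1}{2/3} = \dfrac32$.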


EDIT: Here's a Python simulation of the experiment, based on code provided by @thecoder:

import random

times = 0       # number of all-even sequences (runs ending in a 6 with no odd roll)
rolls = 0       # total length of those all-even sequences
curr = 0        # even non-6 rolls seen since the last 6
alleven = True  # True while the current run has contained no odd roll

for _ in range(100000):
    num = random.randint(1, 6)
    if num % 2 != 0:
        alleven = False
    elif num == 6:
        if alleven:
            times += 1
            rolls += curr + 1  # include the terminating 6 in the length
        # a 6 ends the run either way, so reset for the next one
        curr = 0
        alleven = True
    else:
        curr += 1

print(rolls * 1.0 / times)
# output of one run: 1.51506456241

Part of Jimmy Jin's explanation is right, but his account of "the subtraction fallacy" is unfortunate. Conditioning does consist of discarding part of the sample space.

Suppose one throws a die repeatedly and gets this sequence: $$ 2, 5, 3, 2, 1, 4, 3, 6 $$ Next suppose one excludes the odd numbers: $$ \require{cancel} 2, \xcancel5, \xcancel3, 2, \xcancel1, 4, \xcancel3, 6 $$ One then has this sequence: $$ 2,2,4,6 $$ Averaged over many repetitions, the length of such a pared-down sequence will be $3.$

But now suppose you get this sequence: $$ 2, 5, 3, 2, 1, 4, 3, 6 $$ Then try again and get this: $$ 3,2,5,2,1,6 $$ Then try again and get this: $$ 2,4,4,2,6 $$ Then discard the first two, since odd numbers appeared in them before the $6:$ $$ \xcancel{2, 5, 3, 2, 1, 4, 3, 6} $$ $$ \xcancel{3,2,5,2,1,6} $$ But keep that last one: $$ 2,4,4,2,6 $$ Over many repetitions, the average length of the sequences you keep is $3/2,$ not $3.$

You are in fact excluding that part of the sample space in which at least one odd number appears before the first $6,$ and in what's left of the sample space the average position of the first $6$ is $3/2.$ Thus Jimmy Jin is wrong about the "subtraction fallacy."

However, he is right that the average position of the first $6,$ given that only even numbers appear, is $3/2.$

In sequences pared down by excluding all the odd numbers that appear, the average position of the first $6$ is $3.$
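The two procedures can be compared side by side in a small simulation. This is a minimal sketch: the helper names `roll_until_six` and `compare_procedures`, the trial count, and the seed are all illustrative assumptions. The value $3$ is what one would expect for the pared-down case, since after deleting the odd rolls what remains behaves like independent uniform draws from $\{2,4,6\}$, and the position of the first $6$ is then geometric with success probability $1/3$.

```python
import random


def roll_until_six(rng):
    """Roll a fair die until the first 6 and return the full sequence."""
    rolls = []
    while True:
        r = rng.randint(1, 6)
        rolls.append(r)
        if r == 6:
            return rolls


def compare_procedures(trials=200_000, seed=0):
    rng = random.Random(seed)
    pared_total = 0   # total length after striking out odd rolls
    kept_total = 0    # total length of sequences with no odd roll at all
    kept_count = 0    # number of sequences with no odd roll at all
    for _ in range(trials):
        rolls = roll_until_six(rng)
        evens = [r for r in rolls if r % 2 == 0]
        # Procedure 1: pare every sequence down to its even entries
        pared_total += len(evens)
        # Procedure 2: keep a sequence only if it never contained an odd roll
        if len(evens) == len(rolls):
            kept_total += len(rolls)
            kept_count += 1
    return pared_total / trials, kept_total / kept_count


pared_mean, kept_mean = compare_procedures()
print(pared_mean, kept_mean)
```

The first printed value should land near $3$ and the second near $3/2$, up to sampling noise.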