How to handle sums in Tait's reducibility proof of strong normalisation?
The typical strategy when you're building a model to perform normalisation by evaluation (i.e. the computational part of Tait's method, see [1]) with sum types is to take as the semantics of $A + B$ (see the sketch after this list):
- either the semantics of $A$,
- or the semantics of $B$,
- or a neutral term of type $A + B$ (i.e. a head variable with a spine of eliminators).
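For concreteness, here is a minimal Haskell sketch of what such a semantic domain might look like; the type and constructor names are hypothetical, not taken from [1]:

```haskell
-- Neutral terms: a head variable applied to a spine of eliminators.
data Ne
  = NVar String                          -- head variable
  | NApp Ne Sem                          -- application eliminator
  | NCase Ne (Sem -> Sem) (Sem -> Sem)   -- case eliminator for sums

-- The semantics of A + B: a value in the semantics of A, a value in the
-- semantics of B, or a stuck neutral term of type A + B.
data Sem
  = SInl Sem            -- the semantics of A
  | SInr Sem            -- the semantics of B
  | SNe  Ne             -- a neutral term of type A + B
  | SFun (Sem -> Sem)   -- semantics of function types, for completeness
```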
This three-way disjunction is what makes the proof go through. I'd wager that the same strategy works for the proof of SN. In other words, your reducibility candidate $R_{A+B}(t)$ would be the following (a small code sketch of the classification follows the list):
- either $t \leadsto^* \texttt{inl}(a)$ and $R_A(a)$
- or $t \leadsto^* \texttt{inr}(b)$ and $R_B(b)$
- or $t \leadsto^* \textit{ne} = x(t_1)\dots(t_n)$ and $SN(\textit{ne})$
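To make the three branches concrete, here is a small Haskell sketch (hypothetical untyped syntax, my own illustration, not part of the proof) that weak-head reduces a term and classifies the result; like weak-head reduction itself, `classify` need not terminate on non-normalising terms:

```haskell
-- Hypothetical term syntax for a lambda calculus with sums.
data Tm
  = Var String
  | Lam String Tm
  | App Tm Tm
  | Inl Tm
  | Inr Tm
  | Case Tm (String, Tm) (String, Tm)

-- Naive substitution (no capture avoidance; adequate for closed terms).
subst :: String -> Tm -> Tm -> Tm
subst x u t = case t of
  Var y | y == x    -> u
        | otherwise -> Var y
  Lam y b | y == x    -> Lam y b
          | otherwise -> Lam y (subst x u b)
  App f a              -> App (subst x u f) (subst x u a)
  Inl a                -> Inl (subst x u a)
  Inr b                -> Inr (subst x u b)
  Case s (y, l) (z, r) -> Case (subst x u s)
                               (y, if y == x then l else subst x u l)
                               (z, if z == x then r else subst x u r)

-- One step of weak-head reduction; Nothing when weak-head normal.
step :: Tm -> Maybe Tm
step (App (Lam x b) u)       = Just (subst x u b)
step (App t u)               = (\t' -> App t' u) <$> step t
step (Case (Inl a) (x, l) _) = Just (subst x a l)
step (Case (Inr b) _ (y, r)) = Just (subst y b r)
step (Case t l r)            = (\t' -> Case t' l r) <$> step t
step _                       = Nothing

-- The three branches of the candidate: at sum type, a weak-head normal
-- closed term is an inl, an inr, or a neutral term x(t1)...(tn).
data Shape = IsInl Tm | IsInr Tm | IsNeutral Tm

classify :: Tm -> Shape
classify t = case step t of
  Just t' -> classify t'
  Nothing -> case t of
    Inl a -> IsInl a
    Inr b -> IsInr b
    _     -> IsNeutral t
```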
[1] Coquand, T. and Dybjer, P. (1997). Intuitionistic model constructions and normalization proofs. *Mathematical Structures in Computer Science*, 7(1), 75–94.
Let me add to the above (perfectly correct) answers that there is a more general perspective, which works for all positive connectives, of which sums are a special case.
The fundamental insight is that reducibility actually arises from a duality between terms and contexts, which is itself parametric in a set of terms, called a pole, closed under certain operations (such as $\beta$-expansion; I think something trickier must be used for strong normalization). Given a pole $P$, one defines
$$\mathsf C\perp t\quad\text{just if}\quad\mathsf C[t]\in P$$
where $\mathsf C$ is a context (a term with a hole denoted by $\bullet$) and $t$ a term. The idea is that $P$ is a set of terms which "behave well" (e.g. they have a normal form), so $\mathsf C\perp t$ means that the context $\mathsf C$ and the term $t$ "interact well" with respect to the property expressed by $P$.
Given a set $S$ of contexts, one may define a set of terms $$S^\bot:=\{t\mid\forall\mathsf C\in S,\ \mathsf C\perp t\}.$$ A similar definition induces a set of contexts from a set of terms $T$, which we still denote by $T^\bot$.
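A tiny Haskell rendering may help fix the types of the objects involved (all names are hypothetical; sets are represented as predicates, and $S$ is given as a finite list purely for illustration):

```haskell
-- Terms are kept abstract.  A context is a term with a hole, rendered as
-- a plugging function; a pole is a set of terms, rendered as a predicate.
type Ctx  tm = tm -> tm     -- C, where applying it computes C[t]
type Pole tm = tm -> Bool   -- P

-- C ⊥ t  just if  C[t] ∈ P
orth :: Pole tm -> Ctx tm -> tm -> Bool
orth p c t = p (c t)

-- S^⊥ = { t | ∀ C ∈ S, C ⊥ t }, with S a finite list for illustration.
orthSet :: Pole tm -> [Ctx tm] -> (tm -> Bool)
orthSet p s t = all (\c -> orth p c t) s
```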
We may then define the interpretation of a type $A$, which is a set of contexts denoted by $|A|$, by induction on $A$: $$ \begin{array}{rcl} |X| &:=& \{\bullet\} \\ |A\to B| &:=& \{\mathsf C[\bullet\,u]\mid u\in|A|^\bot,\ \mathsf C\in|B|\} \\ |A\times B| &:=& \{\mathsf C[\pi_1\bullet]\mid\mathsf C\in|A|\}\cup\{\mathsf C[\pi_2\bullet]\mid\mathsf C\in|B|\}\\ |A+B| &:=& \{\mathsf C\mid\mathsf C[\mathsf{inl}\,\bullet]\in |A|^{\bot\bot}\text{ and }\mathsf C[\mathsf{inr}\,\bullet]\in|B|^{\bot\bot}\} \end{array} $$ Finally, we let $$\|A\|:=|A|^\bot,$$ which is a set of terms. This is the "reducibility predicate" on $A$.
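Unfolding the definitions for arrow types recovers the familiar reducibility clause, which may serve as a sanity check: $$t\in\|A\to B\|\quad\Longleftrightarrow\quad\forall u\in\|A\|,\ \forall\mathsf C\in|B|,\ \mathsf C[t\,u]\in P.$$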
One then goes on to prove the so-called adequacy theorem: if $$x_1:C_1,\ldots,x_n:C_n\vdash t:A$$ is derivable, then for all $u_1,\ldots,u_n$, $u_i\in\|C_i\|$ implies $t[u_1/x_1,\ldots,u_n/x_n]\in\|A\|$. This is proved by induction on the last rule of the derivation, using the properties of the pole (closure under $\beta$-expansion or something like that).
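To illustrate, here is a sketch of the case where the last rule is the introduction of $\mathsf{inl}$ (writing $t\sigma$ for the term after substitution): by the induction hypothesis, $t\sigma\in\|A\|=|A|^\bot$. Given any $\mathsf C\in|A+B|$, the definition gives $\mathsf C[\mathsf{inl}\,\bullet]\in|A|^{\bot\bot}=(|A|^\bot)^\bot$, hence $\mathsf C[\mathsf{inl}\,\bullet]\perp t\sigma$, i.e. $\mathsf C[\mathsf{inl}\,(t\sigma)]\in P$. As $\mathsf C$ was arbitrary, $\mathsf{inl}\,(t\sigma)\in|A+B|^\bot=\|A+B\|$. Note that this case uses only the definitions; the closure properties of the pole are needed elsewhere, e.g. in the $\lambda$-abstraction case.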
Now, if things are set up correctly, one also has $x\in\|A\|$ as well as $\|A\|\subseteq P$ for every variable $x$ and every type $A$, which by adequacy implies that every well-typed term belongs to the pole, i.e., every well-typed term "behaves well".
I am sure that this works for weak normalization, i.e., one may take $P$ to be the set of weakly normalizing terms (which is closed under $\beta$-expansion) and immediately obtain weak normalization of the simply-typed $\lambda$-calculus with sums. For strong normalization, some kind of hack is needed, because $\beta$-expanding a strongly normalizable term does not necessarily yield a strongly normalizable term, but I'm sure that a suitable definition will make things work.
This "reducibility by duality" may be found already in Girard's Linear logic paper (1987), although I'm not sure he is the one who introduced it. It has been used by many authors since then and of course it easily extends to higher order quantification. In particular, the above presentation is adapted from Krivine, who uses it as the basis of his classical realizablity theory (see his many papers on the subject, all available on his web page).
Guillaume's answer is correct, that is, you may define the candidate for $A+B$ as the smallest set containing $$\mathrm{inl}(a)\quad\text{for each }a\text{ such that }R_A(a)$$ $$\mathrm{inr}(b)\quad\text{for each }b\text{ such that }R_B(b)$$
and which is a candidate, i.e. closed under anti-reduction and containing the neutral terms. But it is also possible to define an "elim" candidate, which states that $t$ is "well behaved in elimination contexts". This would look something like:
$$R_{A+B}(t)\Longleftrightarrow \forall C\ l\ r, R_{A\rightarrow C}(l)\wedge R_{B\rightarrow C}(r)\Longrightarrow R_C(\mathrm{case}\ t\ \mathrm{of}\ \mathrm{inl} \Rightarrow l,\mathrm{inr}\Rightarrow r)$$
This is a little impredicative, since nothing forces $C$ to be simpler than $A+B$, but that's ok: impredicative definitions can be done by taking the intersection of all "candidate" relations $R_T$! This is exactly the trick for defining interpretations for system $F$.
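Concretely, the intersection amounts to replacing the quantification over types $C$ by a quantification over all candidate relations; a sketch, writing $\mathsf{CR}$ for the set of candidates: $$R_{A+B}(t)\;\Longleftrightarrow\;\forall\,\mathcal C\in\mathsf{CR},\ \forall l\ r,\ \bigl(\forall a,\,R_A(a)\Rightarrow\mathcal C(l\,a)\bigr)\wedge\bigl(\forall b,\,R_B(b)\Rightarrow\mathcal C(r\,b)\bigr)\Longrightarrow\mathcal C(\mathrm{case}\ t\ \mathrm{of}\ \mathrm{inl}\Rightarrow l,\mathrm{inr}\Rightarrow r)$$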