Confused on DataKinds extension
In unextended Haskell, the declaration
data A = B
defines two new entities, one each at the computation and type level:
- At the type level, a new base type named
A
(of kind*
), and - At the computation level, a new base computation named
B
(of typeA
).
When you turn on DataKinds
, the declaration
data A = B
now defines four new entities, one at the computation level, two at the type level, and one at the kind level:
- At the kind level, a new base kind
A
, - At the type level, a new base type
'B
(of kindA
), - At the type level, a new base type
A
(of kind*
), and - At the computation level, a new base computation
B
(of typeA
).
This is a strict superset of what we had before: the old (1) is now (3), and the old (2) is now (4).
These new entities explain the following interactions you described:
:type Zero
Zero :: Nat
:kind 'Zero
'Zero :: Nat
:type 'Zero
Syntax error on 'Zero
I think it's clear how it explains the first two. The last one it explains because 'Zero
is a type-level thing -- you can't ask for the type of a type, only the type of a computation!
Now, in Haskell, every place where a name occurs, it is clear from the surrounding syntax whether that name is intended to be a computation-level name, a type-level name, or a kind-level name. For this reason, it is somewhat annoying to have to include the tick mark in 'B
at the type level -- after all, the compiler knows we're at the type level and therefore can't be referring to the unlifted computation-level B
. So for convenience, you are permitted to leave off the tick mark when that is unambiguous.
From now on, I will distinguish between the "back end" -- where there are only the four entities described above and which are always unambiguous -- and the "surface syntax", which is the stuff you type into a file and pass off to GHC that includes ambiguity but is more convenient. Using this terminology, in the surface syntax, one may write the following things, with the following meanings:
Surface syntax Level Back end
Name computation Name
Name type Name if that exists; 'Name otherwise
'Name type 'Name
Name kind Name
---- all other combinations ---- error
This explains the first interaction you had (and the only one left unexplained above):
:kind Zero
Zero :: Nat
Because :kind
must be applied to a type-level thing, the compiler knows the surface syntax name Zero
must be a type-level thing. Since there is no type-level back end name Zero
, it tries 'Zero
instead, and gets a hit.
How can it be ambiguous? Well, notice above that we defined two new entities at the type level with one declaration. For simplicity of introduction, I named the new entities on the left- and right-hand side of the equation different things. But let's see what happens if we just tweak the declaration slightly:
data A = A
We still introduce four new back end entities:
- Kind
A
, - Type
'A
(of kindA
), - Type
A
(of kind*
), and - Computation
A
(of typeA
).
Whoops! There is now both an 'A
and an A
at the type level. If you leave off the tick mark in the surface syntax, it will use (3), and not (2) -- and you can explicitly choose (2) with the surface syntax 'A
.
What's more, this doesn't have to happen all from a single declaration. One declaration may produce the ticked version and another the non-ticked version; for example
data A = B
data C = A
will introduce a type-level back end name A
from the first declaration and a type-level back end name 'A
from the second declaration.
{-# LANGUAGE DataKinds #-}
does not change what the data
keyword usually does: it still creates a type Nat
and two value constructors Zero
and Succ
. What this extension does do is, it allows you to use such types as if they were kinds, and values as if they were types.
Thus, if you use Nat
in a type-level expressions, it'll just use it as a boring Haskell98 type. Only if you use it in an unabiguously kind-level expression, do you get the kind version.
This automatic lifting could sometimes cause name clashes, which is, I think, the reason for the '
syntax:
{-# LANGUAGE DataKinds #-}
data Nat = Zero | Succ Nat
data Zero
Now, Zero
by itself is the plain (empty) data type Zero :: *
, so
*Main> :k Zero
Zero :: *
In principle, thanks to DataKinds
, Zero
is now also a type lifted from the value constructor Zero :: Nat
, but this is shadowed by the data Zero
. Thus the quoting-syntax, which always refers to lifted types, never to directly defined ones:
*Main> :k 'Zero
'Zero :: Nat