Haskell: Why does pattern matching work for types without being instances of equality?
I just want to point out that data types and pattern matching (to a first approximation) are merely useful but redundant language features, that can be implemented purely using lambda calculus. If you understand how to implement them in lambda calculus, you can understand why they don't need Eq
to implement pattern matching.
Implementing data types in lambda calculus is known as "Church-encoding" them (after Alonzo Church, who demonstrated how to do this). Church-encoded functions are also known as "Continuation-passing style".
It's called "continuation-passing style" because instead of providing the value, you provide a function that will process the value. So for example, instead of giving a user an Int
, I could instead give them a value of the following type:
type IndirectInt = forall x . (Int -> x) -> x
The above type is "isomorphic" to an Int
. "Isomorphic" is just a fancy way of saying that we can convert any IndirectInt
into an Int
:
fw :: IndirectInt -> Int
fw indirect = indirect id
... and we can convert any Int
into an IndirectInt
:
bw :: Int -> IndirectInt
bw int = \f -> f int
... such that:
fw . bw = id -- Exercise: Prove this
bw . fw = id -- Exercise: Prove this
Using continuation-passing style, we can convert any data type into a lambda-calculus term. Let's start with a simple type like:
data Either a b = Left a | Right b
In continuation-passing style, this would become:
type IndirectEither a b = forall x . (Either a b -> x) -> x
But Alonzo Church was smart and noticed that for any type with multiple constructors, we can just provide a separate function for each constructor. So in the case of the above type, instead of providing a function of type (Either a b -> x)
, we can instead provide two separate functions, one for the a
, and one for the b
, and that would be just as good:
type IndirectEither a b = forall x . (a -> x) -> (b -> x) -> x
-- Exercise: Prove that this definition is isomorphic to the previous one
What about a type like Bool
where the constructors have no arguments? Well, Bool
is isomorphic to Either () ()
(Exercise: Prove this), so we can just encode it as:
type IndirectBool = forall x . (() -> x) -> (() -> x) -> x
And () -> x
is just isomorphic to x
(Exercise: Prove this), so we can further rewrite it as:
type IndirectBool = forall x . x -> x -> x
There are only two functions that can have the above type:
true :: a -> a -> a
true a _ = a
false :: a -> a -> a
false _ a = a
Because of the isomorphism, we can guarantee that all Church encodings will have as many implementations as there were possible values of the original data type. So it's no coincidence that there are exactly two functions that inhabit IndirectBool
, just like there are exactly two constructors for Bool
.
But how do we pattern-match on our IndirectBool
? For example, with an ordinary Bool
, we could just write:
expression1 :: a
expression2 :: a
case someBool of
True -> expression1
False -> expression2
Well, with our IndirectBool
it already comes with the tools to deconstruct itself. We would just write:
myIndirectBool expression1 expression2
Notice that if myIndirectBool
is true
, it will pick the first expression, and if it is false
it will pick the second expression, just as if we had somehow pattern-matched on its value.
Let's try to do the same thing with an IndirectEither
. Using an ordinary Either
, we'd write:
f :: a -> c
g :: b -> c
case someEither of
Left a -> f a
Right b -> g b
With the IndirectEither
, we'd just write:
someIndirectEither f g
In short, when we write types in continuation-passing style, the continuations are like the case statements of a case construct, so all we are doing is passing each different case statement as arguments to the function.
This is the reason you don't need any sense of Eq
to pattern-match on a type. In lambda calculus, the type decides what it is "equal" to, simply by defining which argument it selects out of the ones provided to it.
So if I'm a true
, I prove I am "equal" to true
by always selecting my first argument. If I'm a false
, I prove I am "equal" to false
by always selecting my second argument. In short, constructor "equality" boils down to "positional equality", which is always defined in a lambda calculus, and if we can distinguish one parameter as the "first" and another as the "second", that's all we need to have the ability to "compare" constructors.
First of all, match
and match3
are actually exactly the same (ignoring the different strings): pattern matching in functions is desugared to a case statement.
Next, pattern matching works on constructors rather than arbitrary values. Pattern matching is built into the language and does not depend on boolean expressions--it just acts on the constructors directly. This is most evident with more complex matches that include some matchable terms:
f :: MyType -> Int
f (A a) = a + 1
f (B a b) = a + b
How would you rewrite these patterns into boolean expressions? You can't (without knowing anything else about MyType
).
Instead, the pattern matching just goes by constructor. Each pattern has to be headed by a constructor--you can't write a pattern like f (a b c)
in Haskell. Then, when the function gets a value, it can look at the value's constructor and jump to the appropriate cases immediately. This is the way it has to work for more complicated patterns (like A a
), and is also the way it works for the simple patterns you used.
Since pattern matching just works by going to the appropriate constructor, it does not depend on Eq
at all. Not only do you not have to have an Eq
instance to pattern match, but having one also does not change how pattern matching behaves. For example, try this:
data MyType = TypeA | TypeB | TypeC
instance Eq MyType where
TypeA == TypeA = True
TypeB == TypeC = True
TypeC == TypeB = True
_ == _ = False
match :: MyType → String
match TypeA = "this is type A"
match TypeB = "this is type B"
match TypeC = "this is type C"
match2 :: MyType → String
match2 a | a == TypeA = "this is type A matched by equality"
| a == TypeC = "this is type B matched by equality"
| a == TypeB = "this is type C matched by equality"
| otherwise = "this is neither type A nor type B"
Now you've defined equality such that TypeB
is equal to TypeC
but not to itself. (In real life, you should ensure that equality behaves reasonably and follows the reflexive property, but this is a toy example.) Now, if you use pattern matching, TypeB
still matches TypeB
and TypeC
matches TypeC
. But if you use your guard expressions, TypeB
actually matches TypeC
and TypeC
matches TypeB
. TypeA
is unchanged between the two.
Moreover, note how the Eq
instance was defined using pattern matching. When you use a deriving
clause, it gets defined in a similar way with code generated at compile time. So pattern matching is more fundamental than Eq
--it is part of the language where Eq
is just part of the standard library.
In summary: pattern matching is built into the language and works by comparing the constructor and then recursively matching on the rest of the value. Equality is usually implemented in terms of pattern matching and compares the whole value rather than just the constructor.
The thing you are missing is that constructors in Haskell can have arguments. The type tags "themselves" are comparable by equality (at least internally, behind the scenes), but the full values are only comparable if the constituent arguments are also comparable.
So if you have a type like
data Maybe a = Nothing | Just a
then even though you can test if a type tag is "Nothing" or "Just" (ie.; pattern match on the maybe value) in general you can't compare the full maybe unless the value of type "a" that is being held by the Just also happens to be comparable.
--note that your first and third examples are
--just syntactic sugar for each other...
matchMaybe mb = case mb of
Nothing -> "Got a Nothing"
Just _ -> "Got a Just but ignored its value"
It should now also be clear why its not possible to write a variation of match2 for Maybes. What would you use to test for equality in the Just case?
matchMaybe_2 mb | mb == Nothing = "Got a Nothing"
| mb == Just {- ??? -} = "This case is impossible to write like this"