Clarification on Existential Types in Haskell

GADT's provide the clear & better syntax to code using Existential Types by providing implicit forall's

I think there's general agreement that the GADT syntax is better. I wouldn't say that it's because GADTs provide implicit foralls, but rather because the original syntax, enabled with the ExistentialQuantification extension, is potentially confusing/misleading. That syntax, of course, looks like:

data SomeType = forall a. SomeType a

or with a constraint:

data SomeShowableType = forall a. Show a => SomeShowableType a

and I think the consensus is that the use of the keyword forall here allows the type to be easily confused with the completely different type:

data AnyType = AnyType (forall a. a)    -- need RankNTypes extension

A better syntax might have used a separate exists keyword, so you'd write:

data SomeType = SomeType (exists a. a)   -- not valid GHC syntax

The GADT syntax, whether used with implicit or explicit forall, is more uniform across these types, and seems to be easier to understand. Even with an explicit forall, the following definition gets across the idea that you can take a value of any type a and put it inside a monomorphic SomeType':

data SomeType' where
    SomeType' :: forall a. (a -> SomeType')   -- parentheses optional

and it's easy to see and understand the difference between that type and:

data AnyType' where
    AnyType' :: (forall a. a) -> AnyType'

Existential Types not seem to be interested in the type they contain but pattern matching them say that there exists some type we don't know what type it is until & unless we use Typeable or Data.

We use them when we want to Hide types (ex: for Heterogeneous Lists) or we don't really know what the types at Compile Time.

I guess these aren't too far off, though you don't have to use Typeable or Data to use existential types. I think it would be more accurate to say an existential type provides a well-typed "box" around an unspecified type. The box does "hide" the type in a sense, which allows you to make a heterogeneous list of such boxes, ignoring the types they contain. It turns out that an unconstrained existential, like SomeType' above is pretty useless, but a constrained type:

data SomeShowableType' where
    SomeShowableType' :: forall a. (Show a) => a -> SomeShowableType'

allows you to pattern match to peek inside the "box" and make the type class facilities available:

showIt :: SomeShowableType' -> String
showIt (SomeShowableType' x) = show x

Note that this works for any type class, not just Typeable or Data.

With regard to your confusion about page 20 of the slide deck, the author is saying that it's impossible for a function that takes an existential Worker to demand a Worker having a particular Buffer instance. You can write a function to create a Worker using a particular type of Buffer, like MemoryBuffer:

class Buffer b where
  output :: String -> b -> IO ()
data Worker x = forall b. Buffer b => Worker {buffer :: b, input :: x}
data MemoryBuffer = MemoryBuffer
instance Buffer MemoryBuffer

memoryWorker = Worker MemoryBuffer (1 :: Int)
memoryWorker :: Worker Int

but if you write a function that takes a Worker as argument, it can only use the general Buffer type class facilities (e.g., the function output):

doWork :: Worker Int -> IO ()
doWork (Worker b x) = output (show x) b

It can't try to demand that b be a particular type of buffer, even via pattern matching:

doWorkBroken :: Worker Int -> IO ()
doWorkBroken (Worker b x) = case b of
  MemoryBuffer -> error "try this"       -- type error
  _            -> error "try that"

Finally, runtime information about existential types is made available through implicit "dictionary" arguments for the typeclasses that are involved. The Worker type above, in addtion to having fields for the buffer and input, also has an invisible implicit field that points to the Buffer dictionary (somewhat like v-table, though it's hardly huge, as it just contains a pointer to the appropriate output function).

Internally, the type class Buffer is represented as a data type with function fields, and instances are "dictionaries" of this type:

data Buffer' b = Buffer' { output' :: String -> b -> IO () }

dBuffer_MemoryBuffer :: Buffer' MemoryBuffer
dBuffer_MemoryBuffer = Buffer' { output' = undefined }

The existential type has a hidden field for this dictionary:

data Worker' x = forall b. Worker' { dBuffer :: Buffer' b, buffer' :: b, input' :: x }

and a function like doWork that operates on existential Worker' values is implemented as:

doWork' :: Worker' Int -> IO ()
doWork' (Worker' dBuf b x) = output' dBuf (show x) b

For a type class with only one function, the dictionary is actually optimized to a newtype, so in this example, the existential Worker type includes a hidden field that consists of a function pointer to the output function for the buffer, and that's the only runtime information needed by doWork.


In Page 20 of above PDF it is mentioned for below code that it is impossible for a Function to demand specific Buffer. Why is it so?

Because Worker, as defined, takes only one argument, the type of the "input" field (type variable x). E.g. Worker Int is a type. The type variable b, instead, is not a parameter of Worker, but is a sort of "local variable", so to speak. It can not be passed as in Worker Int String -- that would trigger a type error.

If we instead defined:

data Worker x b = Worker {buffer :: b, input :: x}

then Worker Int String would work, but the type is no longer existential -- we now always have to pass the buffer type as well.

As Haskell is a Full Type Erasure language like C then How does it know at Runtime which function to call. Is it something like we gonna maintain few information and pass in a Huge V-Table of Functions and at runtime it gonna figure out from V-Table? If it is so then what sort of Information it gonna store?

This is roughly correct. Briefly put, each time you apply constructor Worker, GHC infers the b type from the arguments of Worker, and then searches for an instance Buffer b. If that is found, GHC includes an additional pointer to the instance in the object. In its simplest form, this is not too different from the "pointer to vtable" which is added to each object in OOP when virtual functions are present.

In the general case, it can be much more complex, though. The compiler might use a different representation and add more pointers instead of a single one (say, directly adding the pointers to all the instance methods), if that speeds up code. Also, sometimes the compiler needs to use multiple instances to satisfy a constraint. E.g., if we need to store the instance for Eq [Int] ... then there is not one but two: one for Int and one for lists, and the two needs to be combined (at run time, barring optimizations).

It is hard to guess exactly what GHC does in each case: that depends on a ton of optimizations which might or might not trigger.

You could try googling for the "dictionary based" implementation of type classes to see more about what's going on. You can also ask GHC to print the internal optimized Core with -ddump-simpl and observe the dictionaries being constructed, stored, and passed around. I have to warn you: Core is rather low level, and can be hard to read at first.