Why is there "data" and "newtype" in Haskell?

According to Learn You a Haskell:

Instead of the data keyword, the newtype keyword is used. Now why is that? Well for one, newtype is faster. If you use the data keyword to wrap a type, there's some overhead to all that wrapping and unwrapping when your program is running. But if you use newtype, Haskell knows that you're just using it to wrap an existing type into a new type (hence the name), because you want it to be the same internally but have a different type. With that in mind, Haskell can get rid of the wrapping and unwrapping once it resolves which value is of what type.

So why not just use newtype all the time instead of data then? Well, when you make a new type from an existing type by using the newtype keyword, you can only have one value constructor and that value constructor can only have one field. But with data, you can make data types that have several value constructors and each constructor can have zero or more fields:

data Profession = Fighter | Archer | Accountant  

data Race = Human | Elf | Orc | Goblin  

data PlayerCharacter = PlayerCharacter Race Profession 

When using newtype, you're restricted to just one constructor with one field.

Now consider the following type:

data CoolBool = CoolBool { getCoolBool :: Bool } 

It's your run-of-the-mill algebraic data type that was defined with the data keyword. It has one value constructor, which has one field whose type is Bool. Let's make a function that pattern matches on a CoolBool and returns the value "hello" regardless of whether the Bool inside the CoolBool was True or False:

helloMe :: CoolBool -> String  
helloMe (CoolBool _) = "hello"  

Instead of applying this function to a normal CoolBool, let's throw it a curveball and apply it to undefined!

ghci> helloMe undefined  
"*** Exception: Prelude.undefined  

Yikes! An exception! Now why did this exception happen? Types defined with the data keyword can have multiple value constructors (even though CoolBool only has one). So in order to see if the value given to our function conforms to the (CoolBool _) pattern, Haskell has to evaluate the value just enough to see which value constructor was used when we made the value. And when we try to evaluate an undefined value, even a little, an exception is thrown.

Instead of using the data keyword for CoolBool, let's try using newtype:

newtype CoolBool = CoolBool { getCoolBool :: Bool }   

We don't have to change our helloMe function, because the pattern matching syntax is the same if you use newtype or data to define your type. Let's do the same thing here and apply helloMe to an undefined value:

ghci> helloMe undefined  
"hello"

It worked! Hmmm, why is that? Well, like we've said, when we use newtype, Haskell can internally represent the values of the new type in the same way as the original values. It doesn't have to add another box around them, it just has to be aware of the values being of different types. And because Haskell knows that types made with the newtype keyword can only have one constructor, it doesn't have to evaluate the value passed to the function to make sure that it conforms to the (CoolBool _) pattern because newtype types can only have one possible value constructor and one field!

This difference in behavior may seem trivial, but it's actually pretty important because it helps us realize that even though types defined with data and newtype behave similarly from the programmer's point of view because they both have value constructors and fields, they are actually two different mechanisms. Whereas data can be used to make your own types from scratch, newtype is for making a completely new type out of an existing type. Pattern matching on newtype values isn't like taking something out of a box (like it is with data), it's more about making a direct conversion from one type to another.

Here's another source. According to this Newtype article:

A newtype declaration creates a new type in much the same way as data. The syntax and usage of newtypes is virtually identical to that of data declarations - in fact, you can replace the newtype keyword with data and it'll still compile, indeed there's even a good chance your program will still work. The converse is not true, however - data can only be replaced with newtype if the type has exactly one constructor with exactly one field inside it.

Some Examples:

newtype Fd = Fd CInt
-- data Fd = Fd CInt would also be valid

-- newtypes can have deriving clauses just like normal types
newtype Identity a = Identity a
  deriving (Eq, Ord, Read, Show)

-- record syntax is still allowed, but only for one field
newtype State s a = State { runState :: s -> (s, a) }

-- this is *not* allowed:
-- newtype Pair a b = Pair { pairFst :: a, pairSnd :: b }
-- but this is:
data Pair a b = Pair { pairFst :: a, pairSnd :: b }
-- and so is this:
newtype Pair' a b = Pair' (a, b)

Sounds pretty limited! So why does anyone use newtype?

The short version The restriction to one constructor with one field means that the new type and the type of the field are in direct correspondence:

State :: (s -> (a, s)) -> State s a
runState :: State s a -> (s -> (a, s))

or in mathematical terms they are isomorphic. This means that after the type is checked at compile time, at run time the two types can be treated essentially the same, without the overhead or indirection normally associated with a data constructor. So if you want to declare different type class instances for a particular type, or want to make a type abstract, you can wrap it in a newtype and it'll be considered distinct to the type-checker, but identical at runtime. You can then use all sorts of deep trickery like phantom or recursive types without worrying about GHC shuffling buckets of bytes for no reason.

See the article for the messy bits...


Both newtype and the single-constructor data introduce a single value constructor, but the value constructor introduced by newtype is strict and the value constructor introduced by data is lazy. So if you have

data D = D Int
newtype N = N Int

Then N undefined is equivalent to undefined and causes an error when evaluated. But D undefined is not equivalent to undefined, and it can be evaluated as long as you don't try to peek inside.

Couldn't the compiler handle this for itself.

No, not really—this is a case where as the programmer you get to decide whether the constructor is strict or lazy. To understand when and how to make constructors strict or lazy, you have to have a much better understanding of lazy evaluation than I do. I stick to the idea in the Report, namely that newtype is there for you to rename an existing type, like having several different incompatible kinds of measurements:

newtype Feet = Feet Double
newtype Cm   = Cm   Double

both behave exactly like Double at run time, but the compiler promises not to let you confuse them.