What are "sums-and-products" data structures?

Very detailed answers have already been given, but somehow they don't mention this simple fact.

Sum types are called so because the number of possible values of a sum type is the sum of the number of values of the two underlying types. Similarly for product types, the number of possible values is the product.

This stems from type theory defining a type as a set of values.

data MySumType = Foo Bool | Bar Char
data MyProductType = Baz (Bool, Char)
  • Bool is a set of 2 values.
  • Char is a set of 256 values.
  • MySumType is a set of 2 + 256 values.
  • MyProductType is a set of 2 * 256 values.

Now you will never forget which is which.


He is referring to the standard algebraic data types of functional programming languages.

Examples:

  • If a is of type A and b is of type B then (a, b) is of type A x B, which is a product type.

  • A list type with values of the form Nil or Cons x xs is a sum type.

Ensō apparently has a greater emphasis on graphs than these tree-like algebraic types.


What are the traditional sums-and-products data structures he is referring to?

In type theory, regular data structures can be described in terms of sums, products and recursive types. This leads to an algebra for describing data structures (and so-called algebraic data types). Such data types are common in statically typed functional languages, such as ML or Haskell.

Products

Products can be thought of as the type-theoretic view of "structs" or "tuples".

Formally, PFPL, Ch 14:

The binary product of two types consists of ordered pairs of values, one from each type in the order specified. The associated eliminatory forms are projections, which select the first and second component of a pair. The nullary product, or unit, type consists solely of the unique “null tuple” of no values, and has no associated eliminatory form.

Sums

Sum types express choice between variants of a data structure. Sometimes they are called "union types" (as in C). Many languages have no notion of sum types.

PFPL, ch 15:

Most data structures involve alternatives such as the distinction between a leaf and an interior node in a tree, or a choice in the outermost form of a piece of abstract syntax. Importantly, the choice determines the structure of the value. For example, nodes have children, but leaves do not, and so forth. These concepts are expressed by sum types, specifically the binary sum, which offers a choice of two things, and the nullary sum, which offers a choice of no things.

Recursive types

Along with products and sums, we can introduce recursion, so a type may be defined (partially) in terms of itself. Nice examples include trees and lists.

 data List a = Empty | a : List a

 data Tree a = Nil | Node a (Tree a) (Tree a)

Algebra of sums, products and recursion

Give a type, say Int, we can start building up a notation for algebraic expressions that describe data structures:

A lone variable:

Int

A product of two types (denoting a pair):

Int * Bool

A sum of two types (denoting a choice between two types):

Int + Bool

And some constants:

1 + Int

where 1 is the unit type, ().

Once you can describe types this way, you get some cool power for free. Firstly, a very concise notation for describing data types, secondly, some results transfer from other algebras (e.g. differentiation works on data structures).

Examples

The unit type, data () = ()

enter image description here

A tuple, the simplest product type: data (a,b) = (a,b)

enter image description here

A simple sum type, data Maybe a = Nothing | Just a

enter image description here

and its alternative,

enter image description here

and a recursive type, the type of linked lists: data [a] = [] | a : [a]

enter image description here

Given these, you can build quite complicated structures by combining sums, products and recursive types. E.g. the simple notation for a list of products of sums of products: [(Maybe ([Char], Double), Integer)] gives rise to some quite complicated trees:

enter image description here


References

  • Practical Foundations for Programming Language, Robert Harper, 2011, http://www.cs.cmu.edu/~rwh/plbook/book.pdf
  • Quick intro to the algebra of data types, http://blog.lab49.com/archives/3011 (Link appears to be dead. Internet Archive link: http://web-old.archive.org/web/20120321033340/http://blog.lab49.com/archives/3011)
  • Species and Functors and Types, Oh My!, Brent Yorgey, http://www.cis.upenn.edu/~byorgey/papers/species-pearl.pdf -- has a very good overview of the algebra, Haskell data types, and the connection with combinatorial species from mathematics.

From the lecture notes for the Coursera course, "Programming Languages", offered by Univ. of Washington:

"Why product and sum? It is related to the fact that in Boolean algebra where 0 is false and 1 is true, and works like multiply and or works like addition."