What is boxing and unboxing and what are the trade offs?

from C# 3.0 In a Nutshell:

Boxing is the act of casting a value type into a reference type:

int x = 9; 
object o = x; // boxing the int

unboxing is... the reverse:

// unboxing o
object o = 9; 
int x = (int)o;

Boxed values are data structures that are minimal wrappers around primitive types*. Boxed values are typically stored as pointers to objects on the heap.

Thus, boxed values use more memory and take at minimum two memory lookups to access: once to get the pointer, and another to follow that pointer to the primitive. Obviously this isn't the kind of thing you want in your inner loops. On the other hand, boxed values typically play better with other types in the system. Since they are first-class data structures in the language, they have the expected metadata and structure that other data structures have.

In Java and Haskell generic collections can't contain unboxed values. Generic collections in .NET can hold unboxed values with no penalties. Where Java's generics are only used for compile-time type checking, .NET will generate specific classes for each generic type instantiated at run time.

Java and Haskell have unboxed arrays, but they're distinctly less convenient than the other collections. However, when peak performance is needed it's worth a little inconvenience to avoid the overhead of boxing and unboxing.

* For this discussion, a primitive value is any that can be stored on the call stack, rather than stored as a pointer to a value on the heap. Frequently that's just the machine types (ints, floats, etc), structs, and sometimes static sized arrays. .NET-land calls them value types (as opposed to reference types). Java folks call them primitive types. Haskellions just call them unboxed.

** I'm also focusing on Java, Haskell, and C# in this answer, because that's what I know. For what it's worth, Python, Ruby, and Javascript all have exclusively boxed values. This is also known as the "Everything is an object" approach***.

*** Caveat: A sufficiently advanced compiler / JIT can in some cases actually detect that a value which is semantically boxed when looking at the source, can safely be an unboxed value at runtime. In essence, thanks to brilliant language implementors your boxes are sometimes free.

Boxing & unboxing is the process of converting a primitive value into an object oriented wrapper class (boxing), or converting a value from an object oriented wrapper class back to the primitive value (unboxing).

For example, in java, you may need to convert an int value into an Integer (boxing) if you want to store it in a Collection because primitives can't be stored in a Collection, only objects. But when you want to get it back out of the Collection you may want to get the value as an int and not an Integer so you would unbox it.

Boxing and unboxing is not inherently bad, but it is a tradeoff. Depending on the language implementation, it can be slower and more memory intensive than just using primitives. However, it may also allow you to use higher level data structures and achieve greater flexibility in your code.

These days, it is most commonly discussed in the context of Java's (and other language's) "autoboxing/autounboxing" feature. Here is a java centric explanation of autoboxing.

What is boxing and unboxing and what are the trade offs?

Tags:

Language Agnostic

Boxing

Glossary

Unboxing

Related

Recent Posts