Big-O for Eight Year Olds?

One way of thinking about it is this:

O(N^2) means for every element, you're doing something with every other element, such as comparing them. Bubble sort is an example of this.

O(N log N) means for every element, you're doing something that only needs to look at log N of the elements. This is usually because you know something about the elements that let you make an efficient choice. Most efficient sorts are an example of this, such as merge sort.

O(N!) means to do something for all possible permutations of the N elements. Traveling salesman is an example of this, where there are N! ways to visit the nodes, and the brute force solution is to look at the total cost of every possible permutation to find the optimal one.


This might be too mathematical, but here's my try. (I am a mathematician.)

If something is O(f(n)), then it's running time on n elements will be equal to A f(n) + B (measured in, say, clock cycles or CPU operations). It's key to understanding that you also have these constants A and B, which arise from the specific implementation. B represents essentially the "constant overhead" of your operation, for example some preprocessing that you do that doesn't depend on the size of the collection. A represents the speed of your actual item-processing algorithm.

The key, though, is that you use big O notation to figure out how well something will scale. So those constants won't really matter: if you're trying to figure out how to scale from 10 to 10000 items, who cares about the constant overhead B? Similarly, other concerns (see below) will certainly outweigh the weight of the multiplicative constant A.

So the real deal is f(n). If f grows not at all with n, e.g. f(n) = 1, then you'll scale fantastically---your running time will always just be A + B. If f grows linearly with n, i.e. f(n) = n, your running time will scale pretty much as best as can be expected---if your users are waiting 10 ns for 10 elements, they'll wait 10000 ns for 10000 elements (ignoring the additive constant). But if it grows faster, like n2, then you're in trouble; things will start slowing down way too much when you get larger collections. f(n) = n log(n) is a good compromise, usually: your operation can't be so simple as to give linear scaling, but you've managed to cut things down such that it'll scale much better than f(n) = n2.

Practically, here are some good examples:

  • O(1): retrieving an element from an array. We know exactly where it is in memory, so we just go get it. It doesn't matter if the collection has 10 items or 10000; it's still at index (say) 3, so we just jump to location 3 in memory.
  • O(n): retrieving an element from a linked list. Here, A = 0.5, because on average you''ll have to go through 1/2 of the linked list before you find the element you're looking for.
  • O(n2): various "dumb" sorting algorithms. Because generally their strategy involves, for each element (n), you look at all the other elements (so times another n, giving n2), then position yourself in the right place.
  • O(n log(n)): various "smart" sorting algorithms. It turns out that you only need to look at, say, 10 elements in a 1010-element collection to intelligently sort yourself relative to everyone else in the collection. Because everyone else is also going to look at 10 elements, and the emergent behavior is orchestrated just right so that this is enough to produce a sorted list.
  • O(n!): an algorithm that "tries everything," since there are (proportional to) n! possible combinations of n elements that might solve a given problem. So it just loops through all such combinations, tries them, then stops whenever it succeeds.

You might find it useful to visualize it:

Big O Analysis

Also, on LogY/LogX scale the functions n1/2, n, n2 all look like straight lines, while on LogY/X scale 2n, en, 10n are straight lines and n! is linearithmic (looks like n log n).


The big thing that Big-O notation means to your code is how it will scale when you double the amount of "things" it operates on. Here's a concrete example:

Big-O       |  computations for 10 things |  computations for 100 things
----------------------------------------------------------------------
O(1)        |   1                         |     1
O(log(n))   |   3                         |     7
O(n)        |  10                         |   100
O(n log(n)) |  30                         |   700
O(n^2)      | 100                         | 10000

So take quicksort which is O(n log(n)) vs bubble sort which is O(n^2). When sorting 10 things, quicksort is 3 times faster than bubble sort. But when sorting 100 things, it's 14 times faster! Clearly picking the fastest algorithm is important then. When you get to databases with million rows, it can mean the difference between your query executing in 0.2 seconds, versus taking hours.

Another thing to consider is that a bad algorithm is one thing that Moore's law cannot help. For example, if you've got some scientific calculation that's O(n^3) and it can compute 100 things a day, doubling the processor speed only gets you 125 things in a day. However, knock that calculation to O(n^2) and you're doing 1000 things a day.

clarification: Actually, Big-O says nothing about comparative performance of different algorithms at the same specific size point, but rather about comparative performance of the same algorithm at different size points:

                 computations     computations       computations
Big-O       |   for 10 things |  for 100 things |  for 1000 things
----------------------------------------------------------------------
O(1)        |        1        |        1        |         1
O(log(n))   |        1        |        3        |         7
O(n)        |        1        |       10        |       100
O(n log(n)) |        1        |       33        |       664
O(n^2)      |        1        |      100        |     10000