Intuitive Way To Understand Principal Component Analysis

Principal component analysis is a useful technique when dealing with large datasets. In some fields (bioinformatics, internet marketing, etc.) we end up collecting data with thousands or tens of thousands of dimensions. Manipulating the data in this form is undesirable because of practical considerations like memory and CPU time. However, we can't just arbitrarily ignore dimensions either: we might lose some of the information we are trying to capture!

Principal component analysis is a common method used to manage this tradeoff. The idea is that we can somehow select the 'most important' directions, and keep those, while throwing away the ones that contribute mostly noise.

For example, this picture shows a 2D dataset being mapped to one dimension:
[image: 2D dataset mapped to one dimension]
Note that the dimension chosen was not one of the original two: in general, it won't be, because that would mean your variables were uncorrelated to begin with.
We can also see that the direction of the principal component is the one that maximizes the variance of the projected data. This is what we mean by 'keeping as much information as possible.'
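
To make the "maximizes the variance" claim concrete, here is a rough Python sketch under some assumptions of mine: the ten points are made-up toy data, and the names `principal_direction_2d` and `projected_variance` are just illustrative helpers. It uses the closed-form angle of the leading eigenvector of a 2x2 covariance matrix, then compares the variance along that direction with the variance along the original x and y axes.

```python
import math

def principal_direction_2d(points):
    """Unit vector along which the 2D points have maximal variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

def projected_variance(points, direction):
    """Sample variance of the points after projecting onto a unit direction."""
    dx, dy = direction
    scores = [x * dx + y * dy for x, y in points]
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)

# Made-up, positively correlated toy data (an assumption for illustration).
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

pc1 = principal_direction_2d(points)
var_pc1 = projected_variance(points, pc1)
var_x = projected_variance(points, (1.0, 0.0))
var_y = projected_variance(points, (0.0, 1.0))

print("principal direction:", pc1)
print("variance along PC1, x axis, y axis:", var_pc1, var_x, var_y)
```

For correlated data like this, the variance along `pc1` should come out larger than along either original axis, which is the sense in which the 1D projection keeps as much information as possible.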


I spent the day learning PCA, and I hope my cartoon gets the intuition across to you!

I have also tried to briefly explain the utility of PCA and related it to an analogy (no maths) to help give that feeling of "learning closure".

Visual Intuition (zoom in)

[image: the PCA cartoon]

Intuition via Utility

I think the main use of PCA is to categorise distinct "things", e.g. shiny cells vs. dark cells, in a way that leads to the least error (in terms of predicting the right cell colour). For example, imagine Sam was hiding behind me and I pinched a cell off the left side of his body, then asked you to guess the colour of the cell. By looking at the winning photo, or even the winning line, you could make a very good guess that it will be a "dark cell".

Intuition via Analogy

So my understanding is that PCA is like taking a "picture" in a lower dimension, and the various methods out there try to make that picture as informative as possible by deciding which "angle" to take it from (notice that for 1D the angle of the "squishing line" also varies).

Good video

http://www.youtube.com/watch?v=UUxIXU_Ob6E


PCA is basically a projection of a higher-dimensional space onto a lower-dimensional space that preserves as much information as possible.

I wrote a blog post where I explain PCA via the projection of a 3D-teapot...

[image: the 3D teapot]

...onto a 2D-plane while preserving as much information as possible:

[image: the teapot projected onto a 2D plane]

Details and full R-code can be found in the post:
http://blog.ephorie.de/intuition-for-principal-component-analysis-pca
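
For anyone who wants the gist without R, here is a rough Python sketch of the same projection idea. It is not a port of the code in the post: the data is a made-up, nearly flat 3D point cloud standing in for the teapot, and `covariance_matrix`, `top_eigenpair` and `pca` are hypothetical helper names. It also swaps in power iteration with deflation purely to stay dependency-free; in practice you would normally call a library eigendecomposition or SVD instead.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix of equal-length rows, plus the column means."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, means

def top_eigenpair(matrix, iterations=500):
    """Power iteration: leading eigenvalue and unit eigenvector (symmetric PSD input)."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v

def pca(rows, k):
    """Return (components, eigenvalues, scores) for the top-k principal components."""
    cov, means = covariance_matrix(rows)
    d = len(cov)
    components, eigenvalues = [], []
    for _ in range(k):
        lam, v = top_eigenpair(cov)
        components.append(v)
        eigenvalues.append(lam)
        # Deflation: subtract the direction just found so the next pass finds the runner-up.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Project each centered point onto the k components.
    scores = [[sum((r[j] - means[j]) * c[j] for j in range(d)) for c in components]
              for r in rows]
    return components, eigenvalues, scores

# Made-up 3D cloud that lies close to a plane (an assumption for illustration).
random.seed(0)
cloud = []
for _ in range(200):
    x = random.gauss(0.0, 3.0)
    y = random.gauss(0.0, 1.0)
    z = 0.5 * x + 0.2 * y + random.gauss(0.0, 0.1)
    cloud.append((x, y, z))

components, eigenvalues, flat = pca(cloud, k=2)
cov, _ = covariance_matrix(cloud)
total_variance = sum(cov[i][i] for i in range(3))
retained = sum(eigenvalues) / total_variance
print(f"fraction of variance kept in 2D: {retained:.4f}")
```

Here `flat` holds the 2D coordinates of each point, and `retained` is one way to put a number on "preserving as much information as possible": the share of the total variance that survives the projection.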