Video Stabilization with OpenCV

I can suggest one of the following solutions:

  1. Using local high level features: OpenCV includes SURF, so: for each frame, extract SURF features. Then build feature Kd-Tree (also in OpenCV), then match each two consecutive frames to find pairs of corresponding features. Feed those pairs into cvFindHomography to compute the homography between those frames. Warp frames according to (combined..) homographies to stabilize. This is, to my knowledge, a very robust and sophisticated approach, however SURF extraction and matching can be quite slow
  2. You can try to do the above with "less robust" features, if you expect only minor movement between two frames, e.g. use Harris corner detection and build pairs of corners closest to each other in both frames, feed to cvFindHomography then as above. Probably faster but less robust.
  3. If you restrict movement to translation, you might be able to replace cvFindHomography with something more...simple, to just get the translation between feature-pairs (e.g. average)
  4. Use phase-correlation (ref. http://en.wikipedia.org/wiki/Phase_correlation), if you expect only translation between two frames. OpenCV includes DFT/FFT and IFFT, see the linked wikipedia article on formulas and explanation.

EDIT Three remarks I should better mention explicitly, just in case:

  1. The homography based approach is likely very exact, so stationary object will remain stationary. However, homographies include perspective distortion and zoom as well so the result might look a bit..uncommon (or even distorted for some fast movements). Although exact, this might be less visually pleasing; so use this rather for further processing or, like, forensics. But you should try it out, could be super-pleasing for some scenes/movements as well.
  2. To my knowledge, at least several free video-stabilization tools use the phase-correlation. If you just want to "un-shake" the camera, this might be preferable.
  3. There is quite some research going on in this field. You'll find some a lot more sophisticated approaches in some papers (although they likely require more than just OpenCV).

OpenCV has the functions estimateRigidTransform() and warpAffine() which handle this sort of problem really well.

Its pretty much as simple as this:

Mat M = estimateRigidTransform(frame1,frame2,0)
warpAffine(frame2,output,M,Size(640,480),INTER_NEAREST|WARP_INVERSE_MAP) 

Now output contains the contents of frame2 that is best aligned to fit to frame1. For large shifts, M will be a zero Matrix or it might not be a Matrix at all, depending on the version of OpenCV, so you'd have to filter those and not apply them. I'm not sure how large that is; maybe half the frame width, maybe more.

The third parameter to estimateRigidTransform is a boolean that tells it whether to also apply an arbitrary affine matrix or restrict it to translation/rotation/scaling. For the purposes of stabilizing an image from a camera you probably just want the latter. In fact, for camera image stabilization you might also want to remove any scaling from the returned matrix by normalizing it for only rotation and translation.

Also, for a moving camera, you'd probably want to sample M through time and calculate a mean.

Here are links to more info on estimateRigidTransform(), and warpAffine()