Chord detection algorithms?

The author of Capo, a transcription program for the Mac, has a pretty in-depth blog. The entry "A Note on Auto Tabbing" has some good jumping off points:

I started researching different methods of automatic transcription in mid-2009, because I was curious about how far along this technology was, and if it could be integrated into a future version of Capo.

Each of these automatic transcription algorithms start out with some kind of intermediate represenation of the audio data, and then they transfer that into a symbolic form (i.e. note onsets, and durations).

This is where I encountered some computationally expensive spectral representations (The Continuous Wavelet Transform (CWT), Constant Q Transform (CQT), and others.) I implemented all of these spectral transforms so that I could also implement the algorithms presented by the papers I was reading. This would give me an idea of whether they would work in practice.

Capo has some impressive technology. The standout feature is that its main view is not a frequency spectrogram like most other audio programs. It presents the audio like a piano roll, with the notes visible to the naked eye.


(source: supermegaultragroovy.com)

(Note: The hard note bars were drawn by a user. The fuzzy spots underneath are what Capo displays.)


This is quite a good Open Source Project: https://patterns.enm.bris.ac.uk/hpa-software-package

It detects chords based on a chromagram - a good solution, breaks down a window of the whole spectrum onto an array of pitch classes (size: 12) with float values. Then, chords can be detected by a Hidden Markov Model.

.. should provide you with everything you need. :)


See my answer to this question: How can I do real-time pitch detection in .Net?

The reference to this IEEE paper is mainly what you're looking for: http://ieeexplore.ieee.org/Xplore/login.jsp?reload=true&url=/iel5/89/18967/00876309.pdf?arnumber=876309

The harmonics are throwing you off. Plus, humans can find fundamentals in sound even when the fundamental isn't present! Think of reading, but by covering half of the letters. The brain fills in the gaps.

The context of other sounds in the mix, and what came before, is very important to how we perceive notes.


There's significant overlap between chord detection and key detection, and so you may find some of my previous answer to that question useful, as it has a few links to papers and theses. Getting a good polyphonic recogniser is incredibly difficult.

My own viewpoint on this is that applying polyphonic recognition to extract the notes and then trying to detect chords from the notes is the wrong way to go about it. The reason is that it's an ambiguous problem. If you have two complex tones exactly an octave apart then it's impossible to detect whether there are one or two notes playing (unless you have extra context such as knowing the harmonic profile). Every harmonic of C5 is also a harmonic of C4 (and of C3, C2, etc). So if you try a major chord in a polyphonic recogniser then you are likely to get out a whole sequence of notes that are harmonically related to your chord, but not necessarily the notes you played. If you use an autocorrelation-based pitch detection method then you'll see this effect quite clearly.

Instead, I think it's better to look for the patterns that are made by certain chord shapes (Major, Minor, 7th, etc).