Is there a "standard" dataset for music in symbolic form?

You can find a list of sites with sheet music in MusicXML and MusicXML-compatible formats at:

http://www.recordare.com/musicxml/music

Many of those sites include MIDI files and other formats as well.
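
If you do go the MusicXML route, uncompressed files are plain XML, so a first look at the data doesn't need a music library. Below is a minimal sketch using only the Python standard library. The embedded fragment and the `extract_pitches` helper are made up for illustration and don't come from any of the sites above. It assumes an uncompressed file; the compressed `.mxl` variant is a zip archive and would have to be unpacked first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written MusicXML fragment used only for illustration.
SAMPLE = """<score-partwise version="3.1">
  <part-list><score-part id="P1"><part-name>Melody</part-name></score-part></part-list>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>4</duration></note>
      <note><pitch><step>F</step><alter>1</alter><octave>4</octave></pitch><duration>4</duration></note>
      <note><rest/><duration>4</duration></note>
    </measure>
  </part>
</score-partwise>"""


def extract_pitches(musicxml_text):
    """Return names such as 'C4' or 'F#4' for every pitched note, in document order."""
    root = ET.fromstring(musicxml_text)
    accidentals = {-2: "bb", -1: "b", 0: "", 1: "#", 2: "##"}
    pitches = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        if pitch is None:
            # Rests and unpitched notes carry no <pitch> element; skip them.
            continue
        step = pitch.findtext("step")
        octave = pitch.findtext("octave")
        # <alter> is optional and counts semitones (1 = sharp, -1 = flat).
        alter = int(float(pitch.findtext("alter", default="0")))
        pitches.append(f"{step}{accidentals.get(alter, '')}{octave}")
    return pitches


print(extract_pitches(SAMPLE))  # ['C4', 'F#4']
```

For anything beyond a quick inspection (ties, voices, transposing instruments), a dedicated toolkit is probably a better fit than hand-rolled parsing.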


The Sheet Music Project at www.gutenberg.org looks like what you want. It uses MusicXML.


KernScores is a really good collection. Among other categories, it has a section for people looking for datasets of more than 10,000 monophonic pieces.

Edit: It also allows you to download whole sections at a time, zipped, which is a huge benefit when you don't want to click every individual download link.


I'm not aware of a "standard" dataset. However, the places I know of for music scores in symbolic form are:

  • The Mutopia Project, a repository for free/libre music scores in LilyPond format. They standardise on LilyPond because it is a free/libre tool, it produces high-quality scores, and it’s possible to convert from many formats into LilyPond. They currently host over 1700 scores.
  • The aforementioned Gutenberg Sheet Music Project, an interesting one to watch. It hosts fewer than 100 scores now. However, it’s an offshoot of the tremendously successful Project Gutenberg for free ebooks (literature in plain text form), so they know how to run this sort of project. They have an excellent, organised approach to content production.
  • MuseScore, a repository for music arrangements. They prefer MuseScore's own .mscz format, but support many others. [Added December 2019]
  • Wikifonia, a repository for lead sheets of songs. [As of December 2019, this site announces that it has closed.] A lead sheet is a simplified music score, perhaps enough to sing at a piano with friends, but not enough to publish a vocal score. They use MusicXML as their standard format. I estimate they have over 4000 scores. Interestingly, they have an arrangement to pay royalties for music they host. This is probably the best home for re-typeset scores of non-free/libre music. [This site was in operation in January 2012, when the answer was first written, but had ceased operation by December 2019, when this edit was made. Since the question is also old and closed, it's worth leaving this legacy entry in the answer.]