The dataset presented on this website served as the basis for studying the analysis and classification of Western classical music recordings in several publications [1-5]. It is compiled from commercial audio recordings, totalling 2000 tracks, where a track corresponds to one movement of a piece. The dataset is balanced with respect to timbral characteristics and contains 1000 tracks each for piano and orchestra (without singing voice or solo instruments). Furthermore, the set is balanced with respect to historical periods: it contains 400 pieces each for the four historical periods Baroque, Classical, Romantic, and Modern (20th century), as well as for an "Add-On" class comprising transitional composers between those periods. We provide annotations including composer- and piece-specific information as well as global key labels for the 1200 pieces of the Baroque, Classical, and Romantic periods. Furthermore, chroma-based audio features and automatically computed chord labels are available.
See also Cross-Composer Dataset
For the experiments in [1-5], we were interested in the typical repertoire of Western classical music. Therefore, we focused on composers whose works frequently appear in concerts and on classical radio programs. At the same time, we tried to ensure a certain variety of countries, composers, musical forms, keys, and tempi. For every period, the dataset incorporates 200 pieces each of orchestra and piano music. To avoid a bias due to timbral characteristics, we only selected piano recordings performed on the modern grand piano (no harpsichord recordings in the piano_baroque class). Moreover, the orchestral data includes neither works featuring vocal parts nor solo concertos. Every category contains music from a minimum of five different composers from three different countries.
For the classification experiments in [2-4], we tried to avoid ambiguous tasks and only considered the classes Baroque, Classical, Romantic, and Modern (1600 pieces). To this end, we did not include composers whose stylistic attribution is ambiguous. For example, we did not select works by Beethoven or Schubert since these composers show influences from both Classical and Romantic styles. As a consequence, the data is not equally distributed with respect to the composers' lifetimes but exhibits some historical gaps. To overcome this problem, we created an additional set of recordings comprising works by such transitional composers. This "Add-On" set includes 200 pieces each for piano and orchestra and serves to fill the gaps between the historical periods. We end up with a more or less balanced distribution, which enables the analysis of style characteristics over history as published in [1]. The table gives an overview of the classes and the respective composers. A graphical overview can be found at the end of this page.
Class | Tracks | Composers |
---|---|---|
orchestra_baroque | 200 | Albinoni, T.; Bach, J. S.; Corelli, A.; Handel, G. F.; Lully, J.-B.; Purcell, H.; Rameau, J.-P.; Vivaldi, A. |
orchestra_classical | 200 | Bach, J. C.; Boccherini, L. R.; Haydn, J. M.; Haydn, J.; Mozart, W. A.; Pleyel, I. J.; Salieri, A. |
orchestra_romantic | 200 | Berlioz, H.; Borodin, A.; Brahms, J.; Bruckner, A.; Dvorak, A.; Grieg, E.; Liszt, F.; Mendelssohn Bartholdy, F.; Mussorgsky, M.; Rimsky-Korsakov, N.; Saint-Saëns, C.; Schumann, R. |
orchestra_modern | 200 | Antheil, G.; Bartók, B.; Berg, A.; Britten, B.; Hindemith, P.; Ives, C. E.; Messiaen, O.; Prokofiev, S.; Schönberg, A.; Shostakovich, D.; Stravinsky, I.; Varèse, E.; Webern, A.; Weill, K. |
orchestra_addon | 200 | Bach, C. P. E.; Beethoven, L. van; Debussy, C.; Mahler, G.; Mozart, Leopold; Ravel, M.; Rossini, G.; Scarlatti, D.; Schubert, F.; Sibelius, J.; Stamitz, Johann; Strauss, R.; Telemann, G. P.; Weber, C. M. von |
piano_baroque | 200 | Bach, J. S.; Couperin, F.; Giustini, L.; Platti, G. B.; Rameau, J.-P. |
piano_classical | 200 | Cimarosa, D.; Clementi, M.; Dussek, J. L.; Haydn, J.; Mozart, W. A. |
piano_romantic | 200 | Brahms, J.; Chopin, F.; Faure, G.; Grieg, E.; Liszt, F.; Mendelssohn Bartholdy, F.; Schumann, C.; Schumann, R.; Tchaikovsky, P. I. |
piano_modern | 200 | Bartók, B.; Berg, A.; Boulez, P.; Hindemith, P.; Messiaen, O.; Milhaud, D.; Prokofiev, S.; Schönberg, A.; Shostakovich, D.; Stravinsky, I.; Webern, A. |
piano_addon | 200 | Bach, C. P. E.; Beethoven, L. van; Debussy, C.; Ravel, M.; Scarlatti, D.; Schubert, F.; Sibelius, J.; Weber, C. M. von |
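The class names in the table follow the pattern instrumentation, underscore, period. As a minimal sketch (the constant and function names below are illustrative and not part of the dataset), the class list and the track counts quoted in the text can be reproduced as follows:

```python
from itertools import product

# Values taken from the class table; the identifiers are illustrative only.
INSTRUMENTATIONS = ("orchestra", "piano")
PERIODS = ("baroque", "classical", "romantic", "modern", "addon")
TRACKS_PER_CLASS = 200


def class_names():
    """Return all class names in the form '<instrumentation>_<period>'."""
    return [f"{inst}_{period}" for inst, period in product(INSTRUMENTATIONS, PERIODS)]


def count_tracks(periods):
    """Number of tracks covered by the given periods across both instrumentations."""
    return len(INSTRUMENTATIONS) * len(periods) * TRACKS_PER_CLASS


print(len(class_names()))        # 10 classes
print(count_tracks(PERIODS))     # 2000 tracks in total
print(count_tracks(PERIODS[:4])) # 1600 tracks without the Add-On class
print(count_tracks(PERIODS[:3])) # 1200 tracks with global key annotations
```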
If you publish results obtained using these annotations, please cite [1].
We provide detailed annotations to the dataset comprising composer information (Country, Lifetime) as well as piece-related information. For the 1200 pieces from the periods Baroque, Classical, and Romantic, we also provide expert annotations of the global key. These annotations were used for evaluating global key detection in [5]. The annotations are given as a .csv file with delimiter "," (comma) comprising the following fields:
Column | Content | Example |
---|---|---|
A | Class | orchestra_baroque |
B | Filename | CrossEra-0025_Bach_Brandenburg_concerto_in_a_minor_bwv_1044_alla_breve.mp3 |
C | CrossEra-ID | CrossEra-0025 |
D | Instrumentation | orchestra |
E | Key | A |
F | Mode | minor |
G | Composer | Bach; Johann Sebastian |
H | CompLifetime | 1685-1750 |
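The following is a minimal sketch of how such a comma-delimited annotation file might be read with Python's standard library. The field names of the `Annotation` tuple and the `has_header` switch are assumptions made for illustration; whether the actual file starts with a header row should be checked before use.

```python
import csv
import io
from typing import NamedTuple


class Annotation(NamedTuple):
    # Field names are illustrative; they mirror columns A-H of the table above.
    class_name: str
    filename: str
    crossera_id: str
    instrumentation: str
    key: str
    mode: str
    composer: str
    lifetime: str


def read_annotations(handle, has_header=False):
    """Parse comma-delimited annotation rows from an open text file."""
    reader = csv.reader(handle, delimiter=",")
    if has_header:  # assumption: skip one header row if the file has one
        next(reader, None)
    records = []
    for row in reader:
        if len(row) < 8:
            continue  # ignore empty or incomplete lines
        records.append(Annotation(*(cell.strip() for cell in row[:8])))
    return records


# Example row assembled from the "Example" column of the table above.
sample = (
    "orchestra_baroque,"
    "CrossEra-0025_Bach_Brandenburg_concerto_in_a_minor_bwv_1044_alla_breve.mp3,"
    "CrossEra-0025,orchestra,A,minor,Bach; Johann Sebastian,1685-1750\n"
)
records = read_annotations(io.StringIO(sample))
print(records[0].key, records[0].mode)  # A minor
```

Note that the composer field uses a semicolon between family name and given names, so it does not collide with the comma delimiter.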
If you publish results obtained using these features, please cite [1].
Since the dataset consists of commercial recordings, we cannot make the audio files publicly available. In order to allow reproducibility of some of our experiments, we provide chroma features of the pieces. We use the NNLS chroma algorithm as published in [6], which is freely available as a VAMP plugin. Concerning the parameters, we used a window size of 8192 samples and a step size of 4410 samples leading to a chromagram resolution of 10 Hz. We use the NNLS approximate transcription and no normalization. The features are provided as a .zip file containing one .csv file for each of the 10 classes. The columns are used in the following order where the first column is only filled when a new file begins:
Column | Content | Example |
---|---|---|
A | Class/Filename | "orchestra_baroque/CrossEra-0025_Bach_Brandenburg_concerto_in_a_minor_bwv_1044_alla_breve.mp3" |
B | Time (seconds) | 0.10000 |
C-N | Chroma A-G# | 0.4876, 1.2604, 0.0633, 1.4429, 1.6179, 0.0915, 0.9377, 0.0023, 0.9897, 0.4835, 0.7726, 0.4993 |
Download NNLS Chroma Features (zip, 244 MB)
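As a rough sketch of how these files could be read, the snippet below groups chroma frames by track, carrying the filename forward over rows whose first column is empty. It assumes that the row naming a file also carries the first frame (as the example row suggests) and that the audio was sampled at 44.1 kHz, which is what a step size of 4410 samples at 10 Hz implies. The function name and the second sample row are made up for illustration.

```python
import csv
import io

# Assumption: 44.1 kHz audio, so a hop of 4410 samples gives 10 frames per second.
FRAME_RATE_HZ = 44100 / 4410

# Order of the twelve chroma columns (C-N), starting at A as stated in the table.
PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def read_chroma(handle):
    """Return {class/filename: [(time_in_seconds, [12 chroma values]), ...]}."""
    tracks = {}
    current = None
    for row in csv.reader(handle):
        if not row:
            continue
        if row[0].strip():  # a filled first column marks the start of a new file
            current = row[0].strip()
            tracks[current] = []
        if current is None:
            raise ValueError("the first data row must name a file")
        time = float(row[1])
        chroma = [float(value) for value in row[2:14]]
        tracks[current].append((time, chroma))
    return tracks


name = "orchestra_baroque/CrossEra-0025_Bach_Brandenburg_concerto_in_a_minor_bwv_1044_alla_breve.mp3"
sample = (
    # First row: the example values from the table above.
    f'"{name}",0.10000,0.4876,1.2604,0.0633,1.4429,1.6179,0.0915,'
    "0.9377,0.0023,0.9897,0.4835,0.7726,0.4993\n"
    # Second row: placeholder zeros, not real data; the first column is empty.
    ",0.20000,0,0,0,0,0,0,0,0,0,0,0,0\n"
)
tracks = read_chroma(io.StringIO(sample))
print(FRAME_RATE_HZ)      # 10.0
print(len(tracks[name]))  # 2
```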
If you publish results obtained using these features, please cite [1].
We also provide chord sequences extracted from the audio files using the Chordino plugin based on NNLS chroma features [6]. The tool is part of the Chordino VAMP plugin. Concerning the parameters, we used a window size of 16384 samples and a step size of 4410 samples leading to a resolution of 10 Hz. We use the NNLS approximate transcription but do not make use of the bass chroma. The dictionary file for our chord analysis can be found here. The features are provided as a .zip file containing one .csv file for each of the 10 classes. The columns are used in the following order:
Column | Content | Example |
---|---|---|
A | Class/Filename | "orchestra_baroque/CrossEra-0025_Bach_Brandenburg_concerto_in_a_minor_bwv_1044_alla_breve.mp3" |
B | Time (seconds) | 0.30000 |
C | Chord Label | "E_maj_min7" |
The first column is only filled when a new file begins. New lines are only written at chord changes. In our nomenclature, the triad type (maj, min, dim, aug) is specified after the first underscore. If present, the seventh type (maj7, min7, dim7) is specified after the second underscore. "N" indicates No-Chord regions.
Download Chordino Chord Features (zip, 1 MB)
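The chord nomenclature described above can be split into its parts with a few lines of code. The sketch below is illustrative only: the `Chord` tuple and both function names are assumptions, and the second event in the example is hypothetical. Because rows are written only at chord changes, each chord is taken to last until the time stamp of the next row.

```python
from typing import NamedTuple, Optional


class Chord(NamedTuple):
    root: Optional[str]     # e.g. "E"; None for no-chord regions
    triad: Optional[str]    # maj, min, dim, or aug
    seventh: Optional[str]  # maj7, min7, or dim7 if present


def parse_chord(label):
    """Split a label such as 'E_maj_min7' into root, triad type, and seventh type."""
    label = label.strip().strip('"')
    if label == "N":  # "N" marks a No-Chord region
        return Chord(None, None, None)
    parts = label.split("_")
    triad = parts[1] if len(parts) > 1 else None
    seventh = parts[2] if len(parts) > 2 else None
    return Chord(parts[0], triad, seventh)


def to_segments(events):
    """Turn (time, label) chord changes into (start, end, Chord) segments.

    The end of the last segment is unknown from the chord file alone and is
    therefore returned as None.
    """
    segments = []
    for index, (start, label) in enumerate(events):
        end = events[index + 1][0] if index + 1 < len(events) else None
        segments.append((start, end, parse_chord(label)))
    return segments


print(parse_chord("E_maj_min7"))
# Chord(root='E', triad='maj', seventh='min7')

# First event from the table above; the second event is hypothetical.
segments = to_segments([(0.3, "E_maj_min7"), (1.2, "N")])
print(segments[0][:2])  # (0.3, 1.2)
```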
This is an accompanying website to the paper "Investigating Style Evolution of Western Classical Music: A Computational Approach" [1], where further details on the dataset, the annotation process, and the applications are discussed.
@article{WeissMDM19_StyleEvolution_MusicaeScientiae, author = {Christof Wei{\ss} and Matthias Mauch and Simon Dixon and Meinard M{\"u}ller}, title = {Investigating Style Evolution of {W}estern Classical Music: A Computational Approach}, journal = {Musicae Scientiae}, volume = {23}, number = {4}, pages = {486--507}, year = {2019}, doi = {10.1177/1029864918757595}, url-pdf = {https://doi.org/10.1177/1029864918757595}, url-details = {https://www.audiolabs-erlangen.de/resources/MIR/cross-era/} }
@PhdThesis{Weiss17_StyleAnalysis_PhD, author = {Christof Wei{\ss}}, title = {Computational Methods for Tonality-Based Style Analysis of Classical Music Audio Recordings}, school = {Ilmenau University of Technology}, address = {Ilmenau, Germany}, year = {2017}, url = {http://www.db-thueringen.de/receive/dbt_mods_00032890}, url-pdf = {http://www.db-thueringen.de/servlets/MCRFileNodeServlet/dbt_derivate_00039054/ilm1-2017000293.pdf}, url-presentation = {http://www.audiolabs-erlangen.de/fau/assistant/weiss/publications/2017_Weiss_PhD-Defense_TUIlmenau.pdf} }
@inproceedings{WeissS15_KeyDetectionStyle_ISMIR, author = {Christof Wei{\ss} and Maximilian Schaab}, title = {On the Impact of Key Detection Performance for Identifying Classical Music Styles}, booktitle = {Proceedings of the 16th International Society for Music Information Retrieval Conference ({ISMIR})}, pages = {45--51}, address = {M{\'a}laga, Spain}, year = {2015}, url-pdf = {http://ismir2015.uma.es/articles/44_Paper.pdf} }
@inproceedings{WeissM15_TonalComplexity_ICASSP, author = {Christof Wei{\ss} and Meinard M{\"u}ller}, title = {Tonal Complexity Features for Style Classification of Classical Music}, booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})}, pages = {688--692}, address = {Brisbane, Australia}, year = {2015}, url-pdf = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7178057} }
@inproceedings{WeissMD14_StyleClassification_ICMC, author = {Christof Wei{\ss} and Matthias Mauch and Simon Dixon}, title = {Timbre-Invariant Audio Features for Style Analysis of Classical Music}, booktitle = {Proceedings of the Joint Conference 40th {ICMC} and 11th {SMC}}, pages = {1461--1468}, address = {Athens, Greece}, year = {2014}, url-pdf = {http://speech.di.uoa.gr/ICMC-SMC-2014/images/VOL_2/1461.pdf} }
@inproceedings{MauchD10_DifficultChords_ISMIR, author = {Matthias Mauch and Simon Dixon}, title = {Approximate Note Transcription for the Improved Identification of Difficult Chords}, booktitle = {Proceedings of the 11th International Society for Music Information Retrieval Conference ({ISMIR})}, year = {2010}, address = {Utrecht, The Netherlands}, pages = {135--140}, url-pdf = {http://ismir2010.ismir.net/proceedings/ismir2010-25.pdf} }
This overview is taken from [2]. A bar corresponds to the composer's lifetime. The color marks the class a composer belongs to. Yellow bars refer to the "Add-On" data. With the intensity of the color, we indicate the number of the composer's works considered in the dataset. More intense colors correspond to a higher number. Popular composers such as Johann Sebastian Bach, Wolfgang Amadeus Mozart, or Dmitri Shostakovich contribute more works than others. Following this principle, our dataset may represent the typical repertoire of Western classical music.
This dataset and some of the associated publications [2-5] were created at the Fraunhofer Institute for Digital Media Technology in Ilmenau, Germany. The work was part of the PhD dissertation [2] by Christof Weiß, which was supported by the Foundation of German Business (Stiftung der Deutschen Wirtschaft). The paper [1] was written at AudioLabs Erlangen with Prof. Meinard Müller. Most of the scientific work was carried out during a research stay at the Centre for Digital Music, Queen Mary University of London, UK, with Dr. Matthias Mauch and Dr. Simon Dixon. We thank Judith Wolff and Maximilian Schaab for contributing to the dataset and to the annotations.