This Jupyter notebook accompanies the Musical Theme Dataset (MTD) and demonstrates how to use it. The dataset is described in the following paper.
Frank Zalkow, Stefan Balke, Vlora Arifi-Müller, and Meinard Müller
MTD: A Multimodal Dataset of Musical Themes for MIR Research
Transactions of the International Society for Music Information Retrieval (TISMIR), 2020, under review.
The following website accompanies the paper and presents all links for accessing the MTD.
https://www.audiolabs-erlangen.de/resources/MIR/MTD
This notebook assumes that you are familiar with Python for music processing. In particular, we use Python and Jupyter with standard packages like pandas, pretty_midi, librosa, and matplotlib. If you want to familiarize yourself with Python for music processing, we recommend visiting the Python Notebooks for Fundamentals of Music Processing.
In the first code cell, we import some Python packages.
import os
import glob
import json
import IPython.display as ipd
import numpy as np
import pandas as pd
import pretty_midi
import librosa
import librosa.display
from matplotlib import pyplot as plt
from matplotlib import patches
%matplotlib inline
We now start by specifying an identifier for a musical theme. As an example, we define the identifier for the first theme of Beethoven's Fifth Symphony. Alternatively, you may remove the comments below to get a random identifier from the dataset. If you would like to search for a specific theme identifier, you can use the MTD overview website.
# specify identifier
mtd_id = '1066'
# or get random identifier
# files = glob.glob(os.path.join('MTD', 'data_EDM-corr_MID', '*.mid'))
# mtd_id = os.path.basename(np.random.choice(files)).split('_')[0][3:]
print(mtd_id)
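If you want to iterate over the entire dataset, you can derive the full list of identifiers from the file names in the same way (a small sketch, assuming the MTD<id>_* naming scheme used in the commented lines above):
files = glob.glob(os.path.join('MTD', 'data_EDM-corr_MID', '*.mid'))
mtd_ids = sorted(os.path.basename(fn).split('_')[0][3:] for fn in files)
print(len(mtd_ids), mtd_ids[:5])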
The following code cell defines all file paths needed for the notebook. All paths are based on the MTD identifier and are printed at the end. This notebook will describe the files and show how to read their content.
def get_file(fn):
    files = glob.glob(fn)
    assert len(files) == 1, 'Expected exactly one file matching {}.'.format(fn)
    return files[0]
fn_corr_mid = get_file(os.path.join('MTD', 'data_EDM-corr_MID', f'MTD{mtd_id}_*.mid'))
fn_corr_csv = get_file(os.path.join('MTD', 'data_EDM-corr_CSV', f'MTD{mtd_id}_*.csv'))
fn_alig_mid = get_file(os.path.join('MTD', 'data_EDM-alig_MID', f'MTD{mtd_id}_*.mid'))
fn_alig_csv = get_file(os.path.join('MTD', 'data_EDM-alig_CSV', f'MTD{mtd_id}_*.csv'))
fn_score_pdf = get_file(os.path.join('MTD', 'data_SCORE_IMG', f'MTD{mtd_id}_*.pdf'))
fn_json = get_file(os.path.join('MTD', 'data_META', f'MTD{mtd_id}_*.json'))
fn_wp = get_file(os.path.join('MTD', 'data_ALIGNMENT', f'MTD{mtd_id}_*.csv'))
fn_wav = get_file(os.path.join('MTD', 'data_AUDIO', f'MTD{mtd_id}_*.wav'))
df = pd.DataFrame([
    ['EDM-corr (MIDI)', fn_corr_mid],
    ['EDM-corr (CSV)', fn_corr_csv],
    ['EDM-alig (MIDI)', fn_alig_mid],
    ['EDM-alig (CSV)', fn_alig_csv],
    ['SCORE (PDF)', fn_score_pdf],
    ['Metadata (JSON)', fn_json],
    ['Alignment (CSV)', fn_wp],
    ['Audio recording (WAV)', fn_wav],
], columns=['File Type', 'Path'])
ipd.display(ipd.HTML(df.to_html(index=False)))
We now display a score representation (PDF format) of the theme.
ipd.IFrame(fn_score_pdf, width=800, height=200)
We provide various metadata for the themes in JSON format. The following code cell loads the JSON into a pandas object and displays an HTML table with our theme's metadata.
df_metadata = pd.read_json(fn_json, typ='series').to_frame()
ipd.display(ipd.HTML(df_metadata.to_html(header=False)))
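If you prefer working with plain dictionaries, you can also load the metadata with Python's json module (a minimal sketch; the MidiTransposition field accessed here is used again later in this notebook):
with open(fn_json, 'r') as stream:
    metadata = json.load(stream)
print(metadata['MidiTransposition'])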
We now create an audio player for the annotated occurrence of the theme in an audio recording. The audio file is given in CD-quality, as a stereo file with a sample rate of 44,100 Hz. To embed the file in a space-efficient way, we resample to 8,000 Hz and convert to mono before creating the audio player.
Fs = 8000
x, _ = librosa.load(fn_wav, sr=Fs, mono=True)
ipd.Audio(data=x, rate=Fs)
As one representation of the symbolic theme, we provide MIDI files. The MIDI files have a static tempo of 60 BPM (with the beat given in quarter notes), so a quarter note has a duration of one second. We now load the MIDI file for the theme using the Python package pretty_midi. Then, we synthesize the theme using sinusoids and present an audio player for the sonification.
Fs = 8000
cur_mid = pretty_midi.PrettyMIDI(fn_corr_mid)
x = cur_mid.synthesize(fs=Fs, wave=np.sin)
ipd.Audio(data=x, rate=Fs)
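Due to the 60 BPM convention, the note times reported by pretty_midi (in seconds) directly correspond to quarter-note positions. The following sketch prints the start, end, and pitch of each note in the theme:
for instrument in cur_mid.instruments:
    for note in instrument.notes:
        print(f'Start: {note.start:5.2f}, End: {note.end:5.2f}, Pitch: {note.pitch}')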
As another representation of the symbolic theme, we have CSV files that encode the start, duration, and pitch of each note. Because of the static tempo of 60 BPM (with the beat given in quarter notes), start and duration are given in musical units of quarter notes. The following code cell reads the CSV file and displays its content.
with open(fn_corr_csv, 'r') as stream:
    csv_str = stream.read()
print(csv_str)
We can also use the Python library pandas to create a nice HTML representation of the content from the CSV file.
df = pd.read_csv(fn_corr_csv, sep=';')
ipd.display(ipd.HTML(df.to_html(index=False, float_format='%.5f')))
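Since the static tempo is 60 BPM, the quarter-note values in the CSV coincide with seconds. For any other tempo, the conversion would be seconds = quarters * 60 / BPM, as in the following sketch (the tempo of 120 BPM is a hypothetical example, for illustration only):
bpm = 120  # hypothetical tempo, for illustration only
df_sec = df.copy()
df_sec[['Start', 'Duration']] = df_sec[['Start', 'Duration']] * 60 / bpm
ipd.display(ipd.HTML(df_sec.to_html(index=False, float_format='%.5f')))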
The following code cell visualizes a piano roll representation for the theme using the corresponding CSV file.
def plot_pianoroll(df, set_lims=True, centric=True, labels=True,
                   rect_args={'facecolor': 'gray', 'edgecolor': 'k'}):
    pitch_min = df['Pitch'].min()
    pitch_max = df['Pitch'].max()
    time_min = df['Start'].min()
    time_max = (df['Start'] + df['Duration']).max()
    ax = plt.gca()
    # draw one rectangle per note (columns: Start, Duration, Pitch)
    for i, (start, duration, pitch) in df.iterrows():
        ypos = pitch - 0.5 if centric else pitch
        rect = patches.Rectangle((start, ypos), duration, 1, **rect_args)
        ax.add_patch(rect)
    if set_lims:
        plt.ylim([pitch_min - 1.5, pitch_max + 1.5])
        plt.xlim([min(time_min, 0), time_max + 0.5])
    if labels:
        plt.xlabel('Time (quarter notes)')
        plt.ylabel('Pitch')
    plt.grid()
    ax.set_axisbelow(True)
df = pd.read_csv(fn_corr_csv, sep=';')
fig, ax = plt.subplots(1, 1, figsize=(10, 3))
plot_pianoroll(df)
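The helper function also accepts custom drawing arguments. For instance, the following small usage sketch draws the same piano roll with a different fill color:
fig, ax = plt.subplots(1, 1, figsize=(10, 3))
plot_pianoroll(df, rect_args={'facecolor': 'lightblue', 'edgecolor': 'k'})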
Furthermore, we provide CSV files containing alignments between the symbolic music representations and the audio recordings. The alignments are given as pairs of musical time points in the symbolic files (MIDI or CSV) and physical time points in the audio recording (WAV). The following code cell first shows the content of the CSV file. Then, we visualize the symbolic version as a piano roll (upper subplot), the waveform of the recording (left subplot), and the alignment path connecting the time points (central subplot).
Fs = 8000
df_wp = pd.read_csv(fn_wp, sep=';')
ipd.display(ipd.HTML(df_wp.to_html(index=False, float_format='%.5f')))
x_wav, _ = librosa.load(fn_wav, sr=Fs, mono=True)
fig, ax = plt.subplots(2, 2, figsize=(10, 5), sharex='col', sharey='row',
                       gridspec_kw={'width_ratios': [0.25, 1.0], 'height_ratios': [0.25, 1.0]})
ax[0, 0].axis('off')
plt.sca(ax[0, 1])
plot_pianoroll(df, labels=False)
ax[0, 1].set_yticks([])
t_wav = np.arange(0, len(x_wav)) / Fs
ax[1, 0].plot(x_wav, t_wav, 'k')
ax[1, 0].set_xticks([])
ax[1, 0].set_ylabel('Time (seconds)')
ax[1, 0].grid()
ax[1, 0].set_axisbelow(True)
ax[1, 1].plot(df_wp.values[:, 0], df_wp.values[:, 1], 'ro:')
ax[1, 1].set_xlabel('Time (quarter notes)')
ax[1, 1].grid()
ax[1, 1].set_axisbelow(True)
plt.tight_layout()
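Beyond visualization, the alignment can be used to map musical time positions to physical time by piecewise linear interpolation between the annotated pairs. The following is a minimal sketch using np.interp (the column order follows the CSV shown above: quarter notes first, seconds second; the function name is our own):
def musical_to_physical_time(t_quarters, df_wp):
    # interpolate linearly between the aligned time point pairs
    return np.interp(t_quarters, df_wp.values[:, 0], df_wp.values[:, 1])

# map the notes' onsets from quarter notes to seconds in the recording
onsets_sec = musical_to_physical_time(df['Start'].values, df_wp)
print(onsets_sec)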
Using the alignment files, we modified the symbolic files to be synchronous with the corresponding audio recordings. We denote these modified files as aligned and provide the aligned MIDI files in our dataset. The following code cell creates two audio players. The first one presents a sonification of the aligned MIDI file, and the second one presents a stereo audio file. In this stereo file, the sonification is on one channel, and the audio recording is on the other channel.
Note that the aligned symbolic version only temporally matches the audio recording. There could still be a difference in pitch due to a possible transposition. To compensate for this, we use the transposition value from the JSON metadata.
Fs = 8000
cur_mid = pretty_midi.PrettyMIDI(fn_alig_mid)
for instrument in cur_mid.instruments:
    for note in instrument.notes:
        # shift each note by the annotated transposition (in semitones)
        note.pitch = note.pitch + int(df_metadata.loc['MidiTransposition'])
x_mid = cur_mid.synthesize(fs=Fs, wave=np.sin)
ipd.display(ipd.Audio(data=x_mid, rate=Fs))
x_wav, _ = librosa.load(fn_wav, sr=Fs, mono=True)
x_wav = x_wav / np.abs(x_wav).max()
n_samples = min(x_mid.shape[0], x_wav.shape[0])
x_wav = x_wav[:n_samples]
x_mid = x_mid[:n_samples]
x_stereo = np.stack((x_mid, x_wav), axis=1).T
ipd.display(ipd.Audio(data=x_stereo, rate=Fs))
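If you want to keep the stereo sonification, you can write it to disk, for instance with the soundfile package (a small sketch; soundfile is an additional dependency and expects samples in the shape (frames, channels), hence the transpose):
import soundfile as sf
sf.write(f'MTD{mtd_id}_sonification.wav', x_stereo.T, Fs)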
Furthermore, we also provide aligned CSV files. The following code cell visualizes a piano roll representation of the aligned symbolic version on top of a log-frequency spectrogram. Because the frequency bins of the spectrogram correspond to semitones, the piano roll representation matches the spectrogram in both time and frequency.
Fs = 22050
H = 512
x_wav, _ = librosa.load(fn_wav, sr=Fs, mono=True)
X = librosa.cqt(x_wav, sr=Fs, fmin=librosa.midi_to_hz(0), bins_per_octave=12, n_bins=9*12, hop_length=H)
D = librosa.amplitude_to_db(np.abs(X), ref=np.max)
df = pd.read_csv(fn_alig_csv, sep=';')
df['Pitch'] = df['Pitch'] + int(df_metadata.loc['MidiTransposition'])
fig = plt.figure(figsize=(10, 6))
librosa.display.specshow(D, cmap='gray_r', x_axis='s', sr=Fs, hop_length=H)
plt.colorbar()
plt.yticks(np.arange(0, D.shape[0] + 1, 12))
plt.xlabel('Time (seconds)')
plt.ylabel('Pitch')
plt.ylim(bottom=24)
plot_pianoroll(df, set_lims=False, centric=False, labels=False,
               rect_args={'facecolor': 'red', 'edgecolor': 'k', 'alpha': 0.5})