There are several ways to read and write audio files in Python, using different packages. This notebooks lists some options and discusses advantages as well as disadvantages. For detailed explanations on how to integrate audio files into the notebooks, we refer to the FMP notebook on Multimedia.
One option to read audio is to use LibROSA's function librosa.load
.
librosa.load
resamples the audio to $22050~\mathrm{Hz}$. Setting sr=None
keeps the native sampling rate.librosa.load
is essentially a wrapper that uses either PySoundFile
or audioread
.librosa.load
first tries to use PySoundFile
. This works for many formats, such as WAV, FLAC, and OGG. However, MP3 is not supported. When PySoundFile
fails to read the audio file (e.g., for MP3), a warning is issued, and librosa.load
falls back to another library called audioread
. When ffmpeg
is available, this library can read MP3 files.import os
import numpy as np
from matplotlib import pyplot as plt
import IPython.display as ipd
import librosa
import pandas as pd
%matplotlib inline
def print_plot_play(x, Fs, text=''):
"""1. Prints information about an audio singal, 2. plots the waveform, and 3. Creates player
Notebook: C1/B_PythonAudio.ipynb
Args:
x: Input signal
Fs: Sampling rate of x
text: Text to print
"""
print('%s Fs = %d, x.shape = %s, x.dtype = %s' % (text, Fs, x.shape, x.dtype))
plt.figure(figsize=(8, 2))
plt.plot(x, color='gray')
plt.xlim([0, x.shape[0]])
plt.xlabel('Time (samples)')
plt.ylabel('Amplitude')
plt.tight_layout()
plt.show()
ipd.display(ipd.Audio(data=x, rate=Fs))
# Read wav
fn_wav = os.path.join('..', 'data', 'B', 'FMP_B_Note-C4_Piano.wav')
x, Fs = librosa.load(fn_wav, sr=None)
print_plot_play(x=x, Fs=Fs, text='WAV file: ')
# Read mp3
fn_mp3 = os.path.join('..', 'data', 'B', 'FMP_B_Note-C4_Piano.mp3')
x, Fs = librosa.load(fn_mp3, sr=None)
print_plot_play(x=x, Fs=Fs, text='MP3 file: ')
The audio library PySoundFile
yields functions for reading and writing sound files. In particular, it contains the functions soundfile.read
and soundfile.write
.
dtype
keyword.subtype='PCM_16'
) as default.MP3
-files.import soundfile as sf
# Read wav with default
fn_wav = os.path.join('..', 'data', 'B', 'FMP_B_Note-C4_Piano.wav')
x, Fs = sf.read(fn_wav)
print_plot_play(x=x,Fs=Fs,text='WAV file (default): ')
# Read wav with dtype= 'int16'
fn_wav = os.path.join('..', 'data', 'B', 'FMP_B_Note-C4_Piano.wav')
x, Fs = sf.read(fn_wav, dtype= 'int16')
print_plot_play(x=x,Fs=Fs,text='WAV file (dtype=int16): ')
# Write 'int16'-signal and read with default
fn_out = os.path.join('..', 'output', 'B', 'FMP_B_Note-C4_Piano_int16.wav')
sf.write(fn_out, x, Fs)
x, Fs = sf.read(fn_out)
print_plot_play(x=x,Fs=Fs,text='Signal (int16) after writing and reading (default): ')
# Generate signal
Fs = 8000
x = 0.5 * np.cos(2 * np.pi * 440 * np.arange(0, Fs) / Fs)
x[2000:2200] = 2
print_plot_play(x=x,Fs=Fs,text='Generated signal: ')
# Write signal
# Default: 'PCM_16'
# Equivalent to pre-processing (dithering + quantization)
# x = np.int16(np.round(x*(2**15)))
#
print('Default for writing files:', sf.default_subtype('WAV'))
fn_out = os.path.join('..', 'output', 'B', 'FMP_B_PythonAudio_sine.wav')
sf.write(fn_out, x, Fs, subtype='PCM_16')
# Read generated signal
x, Fs = sf.read(fn_out)
print_plot_play(x=x,Fs=Fs,text='Signal after writing and reading: ')
Scipy offers the scipy.io.wavfile
module, which also has functionalities for reading and writing wav files. However, not all variants of the wav format are support. For example, $24$-bit integer WAV
-files are not allowed. Furthermore, certain metadata fields in a wav file may also lead to errors. Therefore, we do not recommend this option.
from scipy.io import wavfile
Fs, x = wavfile.read(fn_wav)
print_plot_play(x=x,Fs=Fs,text='Signal after writing and reading: ')
libfmp
¶Wrapper functions for reading and writing audio have also been included in libfmp
.
These wrappers reflect our default recommendations to use librosa
for loading audio, and PySoundFile
for writing audio.
In the following code cell, we call these functions:
import sys
sys.path.append('..')
import libfmp.b
fn_wav = os.path.join('..', 'data', 'B', 'FMP_B_Note-C4_Piano.wav')
fn_out = os.path.join('..', 'output', 'B', 'FMP_B_Note-C4_Piano.wav')
x, Fs = libfmp.b.read_audio(fn_wav)
libfmp.b.write_audio(fn_out, x, Fs)
In the FMP notebook on multimedia, we introduced the class
IPython.display.Audio(data=None, filename=None, url=None, embed=None, rate=None, autoplay=False, normalize=True, *, element_id=None)
for audio playback (IPython
version 6.0 or higher), which is frequently used in the FMP notebooks. As default, this class normalizes the audio (dividing by the maximum over all sample values) before playback. This may be unwanted for certain applications, where the volume of the audio should be kept to its original level. To avoid normalization, one has to set the parameter normalize=False
. However, this requires that all samples of the audio lie within the range between $-1$ and $-1$. In the following code cell, we give an illustrative examples for the two options.
Fs = 8000
x = 0.1 * np.cos(2 * np.pi * 440 * np.arange(0, 2 * Fs) / Fs)
plt.figure(figsize=(6, 1.5))
plt.plot(x, color='gray')
plt.xlim([0, x.shape[0]])
plt.ylim([-1, 1])
plt.xlabel('Time (samples)')
plt.ylabel('Amplitude')
plt.tight_layout()
plt.show()
print('Audio playback with default settings (normalized audio)')
ipd.display(ipd.Audio(data=x, rate=Fs))
print('Audio playback without normalization (original audio) ')
ipd.display(ipd.Audio(data=x, rate=Fs, normalize=False))
In the following code cell, we provide a function for placing several audio players next to each other. Furthermore, the function allows for adapting the width and the height of the individual players. Note that, when the width of the audio player becomes too small, some playback information may be hidden or the playback bottom may be placed in a drop-down menu.
def audio_player_list(signals, rates, width=270, height=40, columns=None, column_align='center'):
"""Generate a list of HTML audio players tags for a given list of audio signals.
Notebook: B/B_PythonAudio.ipynb
Args:
signals (list): List of audio signals
rates (list): List of sample rates
width (int): Width of player (either number or list) (Default value = 270)
height (int): Height of player (either number or list) (Default value = 40)
columns (list): Column headings (Default value = None)
column_align (str): Left, center, right (Default value = 'center')
"""
pd.set_option('display.max_colwidth', None)
if isinstance(width, int):
width = [width] * len(signals)
if isinstance(height, int):
height = [height] * len(signals)
audio_list = []
for cur_x, cur_Fs, cur_width, cur_height in zip(signals, rates, width, height):
audio_html = ipd.Audio(data=cur_x, rate=cur_Fs)._repr_html_()
audio_html = audio_html.replace('\n', '').strip()
audio_html = audio_html.replace('<audio ', f'<audio style="width: {cur_width}px; height: {cur_height}px" ')
audio_list.append([audio_html])
df = pd.DataFrame(audio_list, index=columns).T
table_html = df.to_html(escape=False, index=False, header=bool(columns))
table_html = table_html.replace('<th>', f'<th style="text-align: {column_align}">')
ipd.display(ipd.HTML(table_html))
fn_wav = os.path.join('..', 'data', 'B', 'FMP_B_Note-C4_Piano.wav')
x, Fs = librosa.load(fn_wav, sr=None)
audio_player_list([x, x, x, x], [Fs, Fs, Fs, Fs], width=120, height=20,
columns=['a', 'b', 'c', 'd'])
audio_player_list([x, x, x], [Fs, Fs, Fs], width=200, height=80,
columns=['a', 'b', 'c'], column_align='left')
audio_player_list([x, x, x, x], [Fs, Fs, Fs, Fs],
width=[40, 80, 150, 300], height=[20, 40, 60, 80],
columns=['a', 'b', 'c', 'd'], column_align='right')