Towards Differentiable Piano Synthesis based on Physical Modeling

Hans-Ulrich Berendes, Simon Schwär, Maximilian Schäfer, Meinard Müller

Abstract

We explore the concept of combining physical modeling of the piano with deep learning using methods from differentiable digital signal processing. The core of our proposed approach is a modal synthesis model for the piano string, which is combined with a linear filter to approximate the acoustic properties of a grand piano. In a preliminary experiment, we train a neural network to estimate an excitation signal for a string in an autoencoder setting and show that the system can match the spectral content of a given target note. Our differentiable piano model could be utilized in a multitude of music processing tasks, including sound matching, signal enhancement, or source separation.

Audio Examples

Deterministic Signal-based Excitation Model

For a perceptual evaluation of the piano model itself (without any deep learning involved), we use a simple signal-based excitation model that low-pass filters a single impulse. The cutoff frequency of the low-pass filter determines the attack of the resulting tone.
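
To illustrate the idea of a low-pass filtered impulse, the following is a minimal sketch and not the implementation used for the audio examples. It assumes a Hann-windowed sinc FIR filter, a sample rate of 44.1 kHz, and 255 filter taps; the function name `lowpass_impulse_excitation` and all parameter values are hypothetical choices made for this example. Since filtering a unit impulse with an FIR filter yields the filter's own impulse response, the excitation is simply the filter kernel.

```python
import numpy as np


def lowpass_impulse_excitation(cutoff_hz, sample_rate=44100, num_taps=255):
    """Return a unit impulse filtered by a windowed-sinc low-pass FIR filter.

    Filter type, sample rate, and tap count are assumptions for illustration.
    """
    if not 0 < cutoff_hz < sample_rate / 2:
        raise ValueError("cutoff_hz must lie between 0 and the Nyquist frequency")
    # Time axis centered on the middle tap (linear-phase FIR design).
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / sample_rate  # normalized cutoff in cycles per sample
    kernel = 2 * fc * np.sinc(2 * fc * n)  # ideal low-pass impulse response
    kernel *= np.hanning(num_taps)  # taper to reduce truncation ripple
    kernel /= kernel.sum()  # unity gain at 0 Hz
    return kernel


# A lower cutoff gives a smoother, broader pulse (softer attack),
# a higher cutoff gives a sharper, narrower pulse (harder attack).
soft_excitation = lowpass_impulse_excitation(cutoff_hz=500.0)
hard_excitation = lowpass_impulse_excitation(cutoff_hz=5000.0)
```

In this sketch, the cutoff frequency is the only control parameter: raising it concentrates the pulse in time and adds high-frequency energy to the excitation.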

Excitation Learning Experiment

The goal of the excitation learning experiment is not to synthesize high-quality piano sounds, but rather to understand the training process when using a neural network as the excitation model. In the following, we show three audio examples from the test set, comparing the reference signal and the learned version, i.e., the signal synthesized with the piano model using the excitation signal learned from the respective reference signal.

Measurement Details of the Soundboard Impulse Responses

As part of this work, we measured impulse responses of the soundboard of a Yamaha C3 grand piano. The goal of our measurements was not to do a complete analysis of the soundboard vibration, but to provide a perceptually relevant impulse response for the soundboard filter block in our model. We used a contact speaker to excite the soundboard, which we attached to the lower side of the wooden board using double-sided adhesive tape (see pictures below). The excitation signal was a 10-second-long chirp signal with exponentially increasing frequency, ranging from 20 Hz to 20 kHz. We used a stereo pair of condenser microphones, positioned above the piano with the lid opened, to record the resulting sound in the air. The piano was in its normal state during the measurements (i.e., strings attached and under tension). Since the room in which the piano was measured was not anechoic, the measured impulse responses also contain a small contribution from the room. We measured the impulse responses at four different excitation points of the soundboard, although our piano model in its current state only uses a single impulse response.
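
As a rough illustration of how an impulse response can be obtained from such a chirp measurement, the following sketch generates an exponential sine sweep and recovers an impulse response by regularized spectral division. The deconvolution method, the sample rate of 48 kHz, the regularization constant, and the function names `exponential_sweep` and `estimate_impulse_response` are assumptions made for this example; the processing actually applied to our recordings is not described here.

```python
import numpy as np


def exponential_sweep(f_start=20.0, f_end=20000.0, duration=10.0, sample_rate=48000):
    """Sine sweep whose instantaneous frequency rises exponentially from f_start to f_end."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    log_ratio = np.log(f_end / f_start)
    phase = 2 * np.pi * f_start * duration / log_ratio * (np.exp(t / duration * log_ratio) - 1.0)
    return np.sin(phase)


def estimate_impulse_response(recording, sweep, ir_length, eps=1e-6):
    """Estimate an impulse response by regularized division in the frequency domain.

    Assumed method for illustration, not necessarily the one used for the measurements.
    """
    # Zero-pad to a power of two that covers the full recording.
    n_fft = 1
    while n_fft < max(len(recording), len(sweep)):
        n_fft *= 2
    rec_spec = np.fft.rfft(recording, n_fft)
    sweep_spec = np.fft.rfft(sweep, n_fft)
    power = np.abs(sweep_spec) ** 2
    # The regularization keeps the division stable outside the sweep's frequency band.
    ir_spec = rec_spec * np.conj(sweep_spec) / (power + eps * power.max())
    return np.fft.irfft(ir_spec, n_fft)[:ir_length]


# Synthetic check with a known two-tap response (short sweep to keep it fast).
test_sweep = exponential_sweep(20.0, 7000.0, duration=1.0, sample_rate=16000)
true_ir = np.zeros(512)
true_ir[100] = 1.0
true_ir[300] = -0.3
simulated_recording = np.convolve(test_sweep, true_ir)
estimated_ir = estimate_impulse_response(simulated_recording, test_sweep, ir_length=512)
```

Because the sweep only covers a limited frequency band, the estimate in this sketch is a band-limited version of the true response. For a stereo recording, the same procedure would be applied to each microphone channel separately.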

Acknowledgments

The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS. This project is supported by the German Research Foundation (MU 2686/10-2 and MU 2686/13-2).