This is the accompanying website for the article:
@article{SchwaerM26_IntonationGradientDescent_TASLPRO,
author = {Simon Schw{\"a}r and Meinard M{\"u}ller},
title = {Multi-Voice Intonation Adaptation via Gradient Descent},
journal = {IEEE Transactions on Audio, Speech and Language Processing ({TASLPRO})},
year = {2026},
volume = {34},
pages = {2491--2503},
doi = {10.1109/TASLPRO.2026.3675812},
url-pdf = {https://ieeexplore.ieee.org/document/11447422},
}
Intonation in ensemble performances on instruments with flexible tuning involves a complex interaction between musicians shaped by musical context, acoustic conditions, and each performer's perception and preferences. In post-production, it is often desirable to compensate for unintended deviations while preserving expressive fluctuations and musically meaningful interaction between individual tracks and voices. In this paper, we formulate multi-voice intonation adaptation as a cost minimization problem, making three main contributions. First, we introduce a differentiable cost function that explicitly balances the adherence of each voice to an equal-tempered pitch grid and the harmonic fit between voices using sensory dissonance. Second, based on this differentiable cost function, we derive a gradient descent adaptation algorithm that produces smooth, time-varying pitch-shift curves without requiring score or note-level information. We show how a small set of interpretable hyperparameters, including initialization, stopping criterion, step size, and momentum, allows for a controlled trade-off between the compensation of unintended intonation deviations and the preservation of expressive fluctuations. Third, we evaluate our method on string, wind, and vocal quartet multi-track recordings through objective and subjective experiments, demonstrating quality comparable to a commercial pitch-correction baseline while offering particular advantages in handling intonation drift and modeling interactions between voices. Beyond these results, the focus of this work is conceptual, making a typically heuristic post-production task transparent and controllable through an explicit cost-based optimization framework.
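To make the notion of sensory dissonance mentioned above more concrete, here is a minimal sketch of a pairwise dissonance curve for two partials. It uses the commonly quoted Sethares parameterization of the Plomp-Levelt curves. The constants and the function name `pair_dissonance` are assumptions for illustration; the article's exact dissonance model may differ.

```python
import math

# Assumed constants (commonly quoted Sethares parameterization of the
# Plomp-Levelt curves); not taken from the article itself.
B1, B2 = 3.5, 5.75          # decay rates of the two exponentials
X_STAR = 0.24               # position of maximum dissonance
S1, S2 = 0.0207, 18.96      # frequency dependence of the critical bandwidth


def pair_dissonance(f1, f2, a1=1.0, a2=1.0):
    """Dissonance of two partials with frequencies f1, f2 (Hz) and amplitudes a1, a2."""
    f_min = min(f1, f2)
    s = X_STAR / (S1 * f_min + S2)   # scales the curve with register
    x = s * abs(f2 - f1)
    return min(a1, a2) * (math.exp(-B1 * x) - math.exp(-B2 * x))


def cents_to_ratio(cents):
    """Convert an interval in cents to a frequency ratio."""
    return 2.0 ** (cents / 1200.0)


base = 440.0
unison = pair_dissonance(base, base)
semitone = pair_dissonance(base, base * cents_to_ratio(100))
fifth = pair_dissonance(base, base * cents_to_ratio(700))
print(f"unison: {unison:.4f}, semitone: {semitone:.4f}, fifth: {fifth:.4f}")
```

For two pure tones, the curve is zero at unison, peaks at a narrow interval, and decays for wider intervals. For harmonic sounds, one would sum this quantity over all pairs of partials, which is what produces local minima near simple frequency ratios.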
The synthetic example cadence illustrates various aspects of ensemble intonation and includes a global intonation drift (all voices collectively end the cadence around 50 cents lower), note-level intonation deviations (e.g., the second note in the soprano is slightly higher than the first), and local expressive intonation (e.g., vibrato).
The figure to the right shows (a) sheet music and chord annotations, (b) the original fundamental frequencies (F0) for each voice (S: soprano, A: alto, T: tenor, B: bass), and (c) the pitch-shift curves for each voice obtained using our intonation adaptation method (hyperparameters $w = 0.33$, $L = 1$, $\mu = 50$, $\beta = 0.9$) with the goal of counteracting the global drift while retaining local expressive intonation.
You can listen to this example cadence here:
$L \in \{ 1, 10, 100 \}$ with fixed $w = 1$, $\mu = 50$, $\beta = 0$, $p^{(0)}(n) = 0$
$p^{(0)}(n) = 0$ (zero init) vs. $p^{(0)}(n) = p^{(L)}(n-1)$ (prev init) with fixed $w = 1$, $L = 1$, $\mu = 50$, $\beta = 0$
$\mu \in \{ 5, 50, 500 \}$ with fixed $w = 1$, $L = 1$, $\beta = 0$, $p^{(0)}(n) = p^{(L)}(n-1)$
$\beta \in \{ 0, 0.9, 0.99 \}$ with fixed $w = 1$, $L = 1$, $\mu = 50$, $p^{(0)}(n) = p^{(L)}(n-1)$
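The hyperparameter comparisons above can be explored with a toy sketch of frame-wise gradient descent with momentum. Everything in it is an assumption made for illustration: it reads $L$ as the number of iterations per frame, $\mu$ as the step size, and $\beta$ as the momentum factor, handles a single voice only, and replaces the article's cost with a simple periodic grid term $1 - \cos(2\pi x / 100)$ whose minima lie on multiples of 100 cents. The weight $w$ and the dissonance term are omitted, and the function names are hypothetical.

```python
import numpy as np


def grid_gradient(pitch_cents):
    """Derivative of the toy grid cost 1 - cos(2*pi*pitch/100) w.r.t. pitch in cents."""
    return (2 * np.pi / 100) * np.sin(2 * np.pi * pitch_cents / 100)


def adapt_voice(f0_cents, L=1, mu=50.0, beta=0.0, prev_init=True):
    """Return a pitch-shift curve (in cents) for one voice.

    f0_cents  : per-frame pitch in cents (relative to any 12-TET grid point)
    L         : gradient steps per frame (assumed meaning)
    mu        : step size (assumed meaning)
    beta      : momentum factor; the velocity is carried across frames (assumption)
    prev_init : start each frame from the previous frame's result instead of zero
    """
    shifts = np.zeros(len(f0_cents))
    p, v = 0.0, 0.0
    for n, f0 in enumerate(f0_cents):
        if not prev_init:
            p = 0.0                      # zero init: forget the previous frame
        for _ in range(L):
            v = beta * v + grid_gradient(f0 + p)
            p -= mu * v                  # descend along the accumulated gradient
        shifts[n] = p
    return shifts


# A constant voice that is 20 cents sharp for 200 frames.
f0 = np.full(200, 20.0)
prev = adapt_voice(f0, L=1, mu=50.0, beta=0.0, prev_init=True)
zero = adapt_voice(f0, L=1, mu=50.0, beta=0.0, prev_init=False)
print(f"prev init, last frame: {prev[-1]:.2f} cents")
print(f"zero init, last frame: {zero[-1]:.2f} cents")
```

In this toy setting, initializing from the previous frame lets the shift accumulate over time until the deviation is fully compensated, whereas zero initialization with a single step per frame only applies a small correction in every frame. Increasing `L` or `mu` makes each frame's correction stronger, which mirrors the kind of trade-off the comparisons above are meant to demonstrate.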
The following three case studies of multi-voice intonation adaptation demonstrate the flexibility of the cost minimization approach in different scenarios.
12-TET vs. JI-like Intonation in ChoraleBricks
In the ChoraleBricks recording process, instruments were recorded separately, without any real-time interaction between musicians to coordinate intonation. By using multi-voice intonation adaptation, we can simulate such an interaction in post-production and modify the intonation of each individual track so that they can be combined to form a plausible ensemble sound.
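As background for the comparison between 12-TET and JI-like (just intonation) tuning, the following sketch computes how far a few just intervals deviate from their equal-tempered counterparts. The interval ratios are standard 5-limit examples chosen here for illustration; the adaptation method itself relies on sensory dissonance rather than on a fixed list of target ratios.

```python
import math


def ratio_to_cents(ratio):
    """Convert a frequency ratio to an interval size in cents."""
    return 1200.0 * math.log2(ratio)


def deviation_from_12tet(ratio, semitones):
    """Difference (in cents) between a just interval and its 12-TET counterpart."""
    return ratio_to_cents(ratio) - 100.0 * semitones


# (just ratio, size of the corresponding 12-TET interval in semitones)
JUST_INTERVALS = {
    "minor third": (6 / 5, 3),
    "major third": (5 / 4, 4),
    "perfect fifth": (3 / 2, 7),
}

for name, (ratio, semitones) in JUST_INTERVALS.items():
    print(f"{name}: {deviation_from_12tet(ratio, semitones):+.2f} cents")
```

The just major third is roughly 14 cents narrower than the equal-tempered one, while the just fifth differs by only about 2 cents. Pitch shifts of this order are therefore what one would expect when moving individual voices from a 12-TET grid towards JI-like chords.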
Adapting Global Intonation Drift in Dagstuhl ChoirSet
The recording of a vocal quartet singing the motet Locus Iste by Anton Bruckner contains a global intonation drift, where the musicians end the performance around 100 cents lower than they started. Using hyperparameters $w = 1$, $L = 1$, $\mu = 5$, $\beta = 0.9$, we can counteract this drift while preserving the local intonation of the singers. This could be beneficial, for example, in post-production, where a sound engineer might want to combine multiple takes into one coherent performance while still retaining as much of the original performance as possible.
Modifying Global and Local Intonation in Virtuoso Strings
The excerpt from String Quartet Op. 74 No. 1 by Joseph Haydn contains two chords (D major and D major with added 7th). Because the string ensemble performance contains expressive intonation (e.g., notes with and without vibrato), we can compare the effects of a hyperparameter setting that targets only global adaptation towards JI-like intonation ($\mu = 50$, $\beta = 0.9$, $L = 1$) and a setting that enables a stronger local adaptation ($\mu = 50$, $\beta = 0.0$, $L = 100$).
Test Item 1: Synthetic Example Cadence
Test Item 2: Dagstuhl ChoirSet
Test Item 3: Virtuoso Strings
Test Item 4: ChoraleBricks
A Python implementation of this intonation adaptation method is available on GitHub.
@book{Sethares98_sound_BOOK,
author = {William A. Sethares},
title = {Tuning, Timbre, Spectrum, Scale},
year = {1998},
isbn = {1-85233-797-4},
address = {London},
publisher = {Springer},
}
@article{RosenzweigCWSGM20_DCS_TISMIR,
author = {Sebastian Rosenzweig and Helena Cuesta and Christof Wei{\ss} and Frank Scherbaum and Emilia G{\'o}mez and Meinard M{\"u}ller},
title = {{D}agstuhl {ChoirSet}: {A} Multitrack Dataset for {MIR} Research on Choral Singing},
journal = {Transactions of the International Society for Music Information Retrieval ({TISMIR})},
volume = {3},
number = {1},
year = {2020},
pages = {98--110},
publisher = {Ubiquity Press},
doi = {10.5334/tismir.48},
url-pdf = {2020_RosenzweigCWSGM_DagstuhlChoirSet_TISMIR_ePrint.pdf},
url-demo = {https://www.audiolabs-erlangen.de/resources/MIR/2020-DagstuhlChoirSet}
}
@inproceedings{TomczakLL_VirtuosoStrings_ISMIR-LBD,
author = {Maciej Tomczak and Min Susan Li and Massimiliano Di Luca},
title = {{Virtuoso Strings}: A Dataset of String Ensemble Recordings and Onset Annotations for Timing Analysis},
booktitle = {Late-Breaking Demos of the International Society for Music Information Retrieval Conference ({ISMIR})},
address = {Milano, Italy},
year = {2023}
}
@article{BalkeBM25_ChoraleBricks_TISMIR,
author = {Stefan Balke and Axel Berndt and Meinard M{\"u}ller},
title = {{ChoraleBricks}: A Modular Multi-track Dataset for Wind Music Research},
journal = {Transactions of the International Society for Music Information Retrieval ({TISMIR})},
volume = {8},
number = {1},
pages = {39--54},
year = {2025},
doi = {10.5334/tismir.252},
}