{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "Following Section 8.2.1 of [Müller, FMP, Springer 2015], we introduce in this notebook the notation of instantaneous frequency and show how it can be estimated. \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "In many music processing tasks, the first step is to convert the audio signal into a [time–frequency representation using an STFT](../C2/C2_STFT-Basic.html), which introduces a linear frequency grid. In the following we introduce a technique referred to as **instantaneous frequency estimation**. The estimates, which are derived by looking at the STFT's phase information, allows for improving the frequency quantization introduced by the STFT. We have seen alternative approaches in the [FMP notebook on frequency grid density](../C2/C2_STFT-FreqGridDensity.html) and in the [FMP notebook on frequency interpolation](../C2/C2_STFT-FreqGridInterpol.html).\n", "\n", "We now summarize some properties of the [discrete STFT](../C2/C2_STFT-Basic.html), while fixing some notation. Let $x$ denote the given music signal sampled at a rate of $F_\\mathrm{s}$ Hertz. Furthermore, let $\\mathcal{X}$ be its STFT using a suitable window function of length $N\\in\\mathbb{N}$ and hop size $H\\in\\mathbb{N}$. Recall that, for the Fourier coefficient $\\mathcal{X}(n,k)$, the frame index $n\\in\\mathbb{Z}$ is associated to the physical time \n", "\n", "\\begin{equation}\n", " T_\\mathrm{coef}(n) := \\frac{n\\cdot H}{F_\\mathrm{s}}\n", "\\end{equation}\n", "\n", "(given in seconds) and the frequency index $k\\in[0:N/2]$ corresponds to the frequency\n", "\n", "\\begin{equation}\n", " F_\\mathrm{coef}(k) := \\frac{k\\cdot F_\\mathrm{s}}{N} \n", "\\end{equation}\n", "\n", "(given in Hertz). In particular, the discrete STFT introduces a linear sampling of the frequency axis with a resolution of $F_\\mathrm{s}/N$ Hz. This resolution may not suffice to accurately capture certain time–frequency patterns (e.\\,g., continuously changing patterns due to vibrato or glissando). 
Furthermore, because of the logarithmic perception of frequency, the linear sampling of the frequency axis becomes particularly problematic for the low-frequency part of the spectrum. Increasing the frequency resolution by simply increasing the window length $N$ is not a viable solution, since a longer window reduces the temporal resolution. In the following, we discuss a technique for obtaining an enhanced frequency estimate by exploiting the phase information encoded in the complex-valued STFT." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Instantaneous Frequency\n", "\n", "In order to explain this technique, let us start by recalling the main ideas of expressing and measuring frequency. As our prototypical oscillations, we consider complex-valued [**exponential functions**](../C2/C2_ExponentialFunction.html) of the form\n", "\n", "\\begin{equation}\n", "\\mathbf{exp}_{\\omega,\\varphi}:\\mathbb{R}\\to\\mathbb{C}, \\quad \\mathbf{exp}_{\\omega,\\varphi}(t):= \\mathrm{exp}\\big(2\\pi i(\\omega t - \\varphi)\\big)\n", "\\end{equation}\n", "\n", "for a frequency parameter $\\omega\\in\\mathbb{R}$ (measured in $\\mathrm{Hz}$) and a phase parameter $\\varphi$ (measured in normalized radians with $1$ corresponding to an angle of $360^\\circ$). In the case $\\varphi=0$, we set\n", "\n", "$$\n", "\\mathbf{exp}_{\\omega} := \\mathbf{exp}_{\\omega,0}.\n", "$$\n", "\n", "As the time parameter $t$ increases uniformly, the exponential function describes a **circular motion** around the unit circle. When projected onto the real and imaginary axes, this yields two **sinusoidal motions** (described by a cosine and a sine function). Thinking of the circular motion as a uniformly rotating wheel, the frequency parameter $\\omega$ corresponds to the number of revolutions per unit time (in our case, the duration of one second). In other words, the frequency can be interpreted as the rate of rotation. 
Based on this interpretation, one can associate a frequency value with a rotating wheel for arbitrary time intervals $[t_1,t_2]$ with $t_1<t_2$.
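
To illustrate the wheel interpretation numerically, the following minimal sketch samples the exponential function on an interval $[t_1,t_2]$, accumulates the unwrapped angle, and divides the resulting number of revolutions by the interval length. The helper names `exp_omega_phi` and `rotation_rate` as well as the parameter `num` are assumptions made for this illustration and are not part of the FMP code. The sketch further assumes that the interval is sampled densely enough for the phase unwrapping to be unambiguous (less than half a revolution between neighboring samples).

```python
import numpy as np

def exp_omega_phi(t, omega, phi=0.0):
    # Complex exponential exp(2*pi*i*(omega*t - phi)),
    # with phi given in normalized radians (1 corresponds to 360 degrees)
    return np.exp(2j * np.pi * (omega * t - phi))

def rotation_rate(omega, phi, t1, t2, num=10000):
    # Hypothetical helper: revolutions per second observed on [t1, t2]
    t = np.linspace(t1, t2, num)
    z = exp_omega_phi(t, omega, phi)
    # Continuous (unwrapped) angle of the rotating wheel in radians
    angle = np.unwrap(np.angle(z))
    # Total angle covered, expressed as a number of revolutions
    revolutions = (angle[-1] - angle[0]) / (2 * np.pi)
    return revolutions / (t2 - t1)

# Example: a wheel rotating at 440 Hz, observed for a quarter of a second
est = rotation_rate(omega=440.0, phi=0.25, t1=0.1, t2=0.35)
print(est)
```

For these parameters, the estimate should agree with the chosen frequency of 440 Hz up to floating-point error, independently of the phase parameter, since the phase offset cancels when the two angles are subtracted.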