Lecture: Content-Based Audio Retrieval

After working through the material of this lecture, you should be able to answer the following questions:

  • What is meant by "content-based retrieval"? What is meant by "query-by-example retrieval"?
  • What is the difference between audio identification, audio matching, and version identification? How are these tasks arranged in the specificity–granularity plane? (See Fig. 7.22.)
  • What are the general requirements for an audio identification system?
  • What is the main idea of the Shazam fingerprinting system? What are the fingerprints used in the system? To what extent do they meet the general requirements?
  • What does the term "constellation map" refer to?
  • How can the matching of constellation maps be accelerated?
  • What is the basic idea of the peak pairing strategy? (See Fig. 7.7.)
  • What speed-up does the peak pairing strategy achieve compared to the original procedure? (See Eq. 7.15.)
  • What is the main idea of audio matching? What is the role of the matching function?
  • What is the difference between dynamic time warping (DTW) and subsequence DTW? (See Fig. 7.23.)
  • What is the main idea of version identification?
  • What is the difference between the identification procedure (common subsequence matching) and subsequence DTW? (See Fig. 7.23.)
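
As a companion to the questions on constellation maps and their matching, the following is a minimal Python sketch, not the implementation from the textbook or from Shazam. It assumes a magnitude spectrogram given as a NumPy array, marks local maxima as peaks, and counts coinciding peaks for every time shift of the query against the document. The function names, neighborhood sizes, and the random toy "spectrogram" are all hypothetical choices made for illustration.

```python
import numpy as np

def constellation_map(spec, dist_freq=2, dist_time=2, thresh=0.0):
    """Mark local maxima of a magnitude spectrogram (shape: freq x time)."""
    K, N = spec.shape
    cmap = np.zeros((K, N), dtype=bool)
    for k in range(K):
        for n in range(N):
            # Rectangular neighborhood around (k, n), clipped at the borders
            k0, k1 = max(k - dist_freq, 0), min(k + dist_freq + 1, K)
            n0, n1 = max(n - dist_time, 0), min(n + dist_time + 1, N)
            neighborhood = spec[k0:k1, n0:n1]
            if spec[k, n] > thresh and spec[k, n] == neighborhood.max():
                cmap[k, n] = True
    return cmap

def matching_function(cmap_query, cmap_doc):
    """For each time shift m, count query peaks that coincide with document peaks."""
    n_query = cmap_query.shape[1]
    n_doc = cmap_doc.shape[1]
    num_shifts = n_doc - n_query + 1
    delta = np.zeros(num_shifts, dtype=int)
    for m in range(num_shifts):
        delta[m] = np.sum(cmap_query & cmap_doc[:, m:m + n_query])
    return delta

# Toy data (assumption): random values stand in for a real spectrogram.
rng = np.random.default_rng(0)
spec_doc = rng.random((32, 200))
cmap_doc = constellation_map(spec_doc)

# The query is an excerpt of the document's constellation map (frames 60 to 99).
cmap_query = cmap_doc[:, 60:100].copy()

delta = matching_function(cmap_query, cmap_doc)
best_shift = int(np.argmax(delta))
print("best shift:", best_shift, "matching peaks:", delta[best_shift])
```

This exhaustive shift-by-shift comparison is the slow baseline. Index-based retrieval with inverted lists and the peak pairing strategy, as asked about in the questions above, are ways to avoid scanning every shift of every document.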

Reading Assignments

  • Chapter 7, Müller, FMP, Springer 2021
    • Introduction to Chapter 7
  • Section 7.1: Audio Identification
    • Section 7.1.1: General Requirements
    • Section 7.1.2: Audio Fingerprints Based on Spectral Peaks
    • Section 7.1.3: Indexing, Retrieval, Inverted Lists
    • Section 7.1.4: Index-Based Audio Identification
  • Section 7.2: Audio Matching
    • Section 7.2.3: DTW-Based Matching (only main idea)
  • Section 7.3: Version Identification
    • Section 7.3.2: Identification Procedure (only main idea)
  • Section 7.4.4: Alignment Scenarios
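
To make the main idea of DTW-based matching (Section 7.2.3) more concrete, here is a small Python sketch contrasting classical DTW with subsequence DTW. It is an illustration under simplifying assumptions: one-dimensional toy sequences, an absolute-difference cost, and the three basic step sizes. The helper names `accumulated_cost` and `backtrack_start` are hypothetical. The only difference between the two variants in this sketch is the initialization of the first row and where the result is read off.

```python
import numpy as np

def accumulated_cost(C, subsequence):
    """Accumulated cost matrix for classical DTW or subsequence DTW."""
    N, M = C.shape
    D = np.zeros((N, M))
    D[:, 0] = np.cumsum(C[:, 0])
    # Subsequence DTW: the match may start anywhere in the document,
    # so costs are NOT accumulated along the first row.
    D[0, :] = C[0, :] if subsequence else np.cumsum(C[0, :])
    for n in range(1, N):
        for m in range(1, M):
            D[n, m] = C[n, m] + min(D[n - 1, m], D[n, m - 1], D[n - 1, m - 1])
    return D

def backtrack_start(D, end):
    """Follow the optimal path back from (N-1, end) to the first row; return its column."""
    n, m = D.shape[0] - 1, end
    while n > 0:
        if m == 0:
            n -= 1
        else:
            candidates = [
                (D[n - 1, m - 1], n - 1, m - 1),
                (D[n - 1, m], n - 1, m),
                (D[n, m - 1], n, m - 1),
            ]
            _, n, m = min(candidates)
    return m

X = np.array([1.0, 2.0, 3.0])                        # query
Y = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0])    # document
C = np.abs(X[:, None] - Y[None, :])                  # cost matrix (query x document)

D_full = accumulated_cost(C, subsequence=False)
D_sub = accumulated_cost(C, subsequence=True)

matching = D_sub[-1, :]             # matching function over document positions
end = int(np.argmin(matching))      # free end: best end position in the document
start = backtrack_start(D_sub, end)

print("classical DTW cost (global alignment):", D_full[-1, -1])
print("subsequence DTW match: Y[%d:%d], cost %.1f" % (start, end + 1, matching[end]))
```

In this toy case the global alignment is penalized for the zeros surrounding the query pattern, while the subsequence variant finds the embedded pattern at zero cost.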



Here is a selection of videos related to content-based audio retrieval.

  • How Shazam Works (10:24)
    Database match; timbre; fundamental frequency; overtone; spectrogram; fingerprint; stand-out frequencies; match; hash function; equal distribution; collision avoidance; calculation time; anchor point
  • Tech Talk: What's that Sound? An Overview of Shazam's Audio Search Algorithm (11:01)
    Guiding principles; fingerprinting; combinatorial hashing; searching
  • Happy Birthday in the Styles of 10 Classical Composers (18:26)
    Bach (0:00); Beethoven (1:17); Schumann (3:34); Chopin (4:51); Liszt (6:34); Debussy (9:18); Satie (12:10); Rachmaninoff (13:30); Cage (14:29); Reich (16:11)
  • 7 Happy Songs in Horror Versions (4:44)
    Twinkle, twinkle little star (0:13); Happy birthday to you (0:37); Jesus bleibt meine Freude (1:23); Jingle bells (1:47); Bach's Toccata and Fugue in D minor (3:09); Ode an die Freude (3:35); Frère Jacques (3:56); Bach's Toccata and Fugue in D minor (4:17)

Question & Answer Session
