Author: Thomas Sporer
Co-Author: Sascha Dick
Many of the audio excerpts used in this package are taken from EBU's legendary Sound Quality Assessment Material (SQAM) CD. For details on its content, see EBU TECH 3253.
Specifically the following excerpts were used (track / index numbers as per EBU TECH 3253):
We extend our sincere gratitude to EBU for providing this high-quality audio material.
This section provides insight into the subjective quality of some of the provided items when assessed by expert listeners according to the standardized MUSHRA listening test methodology. It shows inexperienced listeners the outcome of a professionally conducted listening test for these items.
For the characterization of most of the audio samples, listening tests were conducted in 2001. The test method was an early version of Recommendation ITU-R BS.1534, nicknamed MUSHRA (MUltiple Stimulus with Hidden Reference and Anchors). Since 2001, MUSHRA has been improved in several aspects. Today, MUSHRA is the most frequently used scheme for the characterization of audio quality.
MUSHRA aims both to rank the quality of different systems relative to one another and to obtain absolute scores. A scale ranging from 0 to 100 is used. Five intervals on that scale are labeled "bad", "poor", "fair", "good" and "excellent". The test subjects are given an open (known) reference, which defines the expected maximum quality. Pre-defined anchor conditions are used to stabilize the scale.
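As a rough illustration of the scale, the following Python sketch maps a score to its interval label. It assumes five equal intervals of 20 points each; the function name `mushra_label` and the treatment of boundary values are choices made for this example only and are not taken from the listening test itself.

```python
def mushra_label(score: float) -> str:
    """Return the quality label for a MUSHRA score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("MUSHRA scores range from 0 to 100")
    labels = ["bad", "poor", "fair", "good", "excellent"]
    # Assumption: five equal 20-point intervals. A boundary value such as 40
    # is assigned to the upper interval, and 100 stays in "excellent".
    index = min(int(score // 20), len(labels) - 1)
    return labels[index]
```

For example, `mushra_label(72)` falls into the "good" interval under these assumptions.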
Several instances produced from the same original audio item (the reference) are presented to a listener in random order. Among these instances are the predefined conditions "low pass at 7 kHz" and "low pass at 3.5 kHz" as hidden anchors, and the unprocessed original as hidden reference. The test subjects are allowed to switch between all instances and the known unprocessed original (open reference) in any order and for as long as they desire. The listeners are asked to compare each instance to the open reference and to the other instances, and to rate them using the five-interval quality scale (see Figure 1).
Note that the hidden reference is scored by the subjects, too. The subjects are told to score the instance which they believe to be identical to the open reference at the top of the scale (=100). However, if one or more other instances are very similar to the unprocessed original, the listeners might give the 100 to the wrong item and a lower score to the hidden reference. Therefore the average score obtained for the hidden reference can drop below 100. Today the ability to detect the hidden reference is used to check the reliability of a listener's subjective scores: if the listener grades the hidden reference below 90, the data of this listener is excluded from further statistical analysis. This process is called post-screening.
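The post-screening rule described above can be sketched in a few lines of Python. This is a minimal illustration for a single audio item, not the procedure used to produce the results in this package; the function name `post_screen`, the listener IDs and the scores are all hypothetical.

```python
def post_screen(hidden_ref_scores: dict[str, float],
                threshold: float = 90.0) -> list[str]:
    """Return the IDs of listeners whose data is kept for one audio item.

    hidden_ref_scores maps a listener ID to the score that listener
    gave to the hidden reference of that item.
    """
    # A listener who grades the hidden reference below the threshold
    # is considered unreliable and is dropped from the analysis.
    return [listener for listener, score in hidden_ref_scores.items()
            if score >= threshold]


# Hypothetical scores, for illustration only.
example_scores = {"listener_1": 100, "listener_2": 85, "listener_3": 92}
kept = post_screen(example_scores)
```

With the hypothetical scores above, `listener_2` is excluded because the hidden reference was graded below 90.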
The test panel consisted of 9 subjects, 5 female and 4 male, aged 21 to 39 years (average 26). The audio items were played back from a PC using a digital I/O card with external D/A converters and STAX Lambda Pro headphones.
Note: The original listening tests for the AES CD were conducted with only 9 listeners. In 2001 the post-screening process was not yet defined. On revisiting the results, we found that the data of several listeners had to be rejected under today's criteria; this yields the results listed below.
This section lists the available scores together with the associated audio material where available. The "detailed results" links show plots of the results of the statistical analysis with today's post-screening criteria. In the text, the results have been rounded to whole numbers.
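For readers who want to reproduce this kind of analysis on their own data, the following Python sketch computes a mean score with a 95 % confidence interval and checks whether two intervals overlap. It assumes a Student's t interval over the listeners remaining after post-screening; this text does not state how the published intervals were computed, and the names `T_95`, `mean_with_ci` and `intervals_overlap` are hypothetical.

```python
import math
import statistics

# Two-sided 95 % critical values of Student's t distribution for
# 1 to 8 degrees of freedom (enough for a panel of up to 9 listeners).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776,
        5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306}


def mean_with_ci(scores):
    """Return (mean, lower, upper) for the scores of one condition."""
    n = len(scores)
    if n - 1 not in T_95:
        raise ValueError("this sketch supports 2 to 9 listeners")
    mean = statistics.mean(scores)
    # Assumption: t-based interval using the sample standard deviation.
    half_width = T_95[n - 1] * statistics.stdev(scores) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """a and b are (mean, lower, upper) tuples from mean_with_ci."""
    return a[1] <= b[2] and b[1] <= a[2]
```

When the intervals of two conditions overlap, the difference between their mean scores should not be read as a reliable quality ranking.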
Note: additional conditions in listening test
Note: score for the reference is context dependent!
Note: overlapping confidence intervals!
Note: generation #9 not evaluated in the listening test
Note 1: additional items in listening test
Note 2: 16 kHz used as reference, 8 kHz used as hidden anchors, no additional hidden anchor used.
[1] ITU-R SG6: Recommendation ITU-R BS.1534-3: Method for the subjective assessment of intermediate quality level of audio systems. 10/2015