This is the accompanying website for the following paper:
@inproceedings{LeschanowskyKRBKFH25_SpeechCodecIntSITool_Interspeech,
author = {Anna Leschanowsky and Kishor Kayyar Lakshminarayana and Anjana Rajasekhar and Lyonel Behringer and Ibrahim Kilinc and Guillaume Fuchs and Emanu{\"e}l A.\ P. Habets},
title = {Benchmarking Neural Speech Codec Intelligibility with {SITool}},
booktitle = {Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH)},
address = {Rotterdam, The Netherlands},
year = {2025},
pages = {5488--5492},
doi = {10.21437/Interspeech.2025-984},
url-pdf = {https://www.isca-archive.org/interspeech_2025/leschanowsky25_interspeech.pdf},
}
Speech intelligibility assessment is essential for evaluating neural speech codecs, yet most evaluation efforts focus on overall quality rather than intelligibility. Only a few publicly available tools exist for conducting standardized intelligibility tests, like the Diagnostic Rhyme Test (DRT) and Modified Rhyme Test (MRT). We introduce the Speech Intelligibility Toolkit for Subjective Evaluation (SITool), a Flask-based web application for conducting DRT and MRT in laboratory and crowdsourcing settings. We use SITool to benchmark 13 neural and traditional speech codecs, analyzing phoneme-level degradations and comparing subjective DRT results with objective intelligibility metrics. Our findings show that, while neural speech codecs can outperform traditional ones in subjective intelligibility, only STOI and ESTOI – not WER – significantly correlate with subjective results, although they struggle to capture gender and wordlist-specific variations observed in subjective evaluations.
SITool is open source and available at https://github.com/audiolabs/SITool.
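As a rough illustration of how closed-set rhyme tests such as the DRT and MRT are commonly scored, the sketch below applies the usual correction for guessing to a listener's response counts. It is not taken from SITool: the function name `chance_corrected_score` and the example counts are hypothetical, and the scoring used in the paper may differ in detail.

```python
def chance_corrected_score(n_correct: int, n_incorrect: int,
                           n_alternatives: int = 2) -> float:
    """Return a guessing-corrected intelligibility score in percent.

    Assumption: the standard correction for guessing in closed-set tests,
    100 * (R - W / (n - 1)) / T, where R and W are the numbers of correct
    and incorrect responses, T = R + W, and n is the number of response
    alternatives (2 for the DRT, typically 6 for the MRT).
    """
    if n_alternatives < 2:
        raise ValueError("a closed-set test needs at least two alternatives")
    total = n_correct + n_incorrect
    if total == 0:
        raise ValueError("at least one response is required")
    # Each wrong answer implies 1 / (n - 1) lucky guesses among the correct ones.
    adjusted_correct = n_correct - n_incorrect / (n_alternatives - 1)
    return 100.0 * adjusted_correct / total


if __name__ == "__main__":
    # Hypothetical response counts, for illustration only.
    print(chance_corrected_score(90, 10))                     # DRT, two alternatives
    print(chance_corrected_score(250, 50, n_alternatives=6))  # MRT, six alternatives
```

With two alternatives the formula reduces to 100 * (R - W) / T, so a listener who guesses at random scores about 0 rather than 50.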
| Condition | Peak (M) | Teak (F) | Jaws (M) | Gauze (F) |
|---|---|---|---|---|
| Original | ||||
| Anchor | ||||
| Codec2 700 bps | ||||
| Codec2 2400 bps | ||||
| AMR 4750 bps | ||||
| AMRWB 6600 bps | ||||
| EVS 8000 bps | ||||
| LPCnet 1600 bps | ||||
| Lyra 3200 bps | ||||
| DAC-S 1500 bps | ||||
| SNAC 980 bps | ||||
| Mimi 550 bps | ||||
| Mimi 1100 bps | ||||
| SemantiCodec 340 bps | ||||
| SemantiCodec 680 bps | ||||