dc.contributor.author: Gelderblom, Femke B.
dc.contributor.author: Tronstad, Tron Vedul
dc.contributor.author: Viggen, Erlend Magnus
dc.date.accessioned: 2019-01-02T09:57:22Z
dc.date.available: 2019-01-02T09:57:22Z
dc.date.created: 2018-11-19T11:32:42Z
dc.date.issued: 2018
dc.identifier.citation: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, pp. 13 [nb_NO]
dc.identifier.issn: 2329-9290
dc.identifier.uri: http://hdl.handle.net/11250/2578746
dc.description.abstract: Speech enhancement systems aim to improve the quality and intelligibility of noisy speech. In this study, we compare two speech enhancement systems based on deep neural networks. The speech intelligibility and quality of both systems were evaluated subjectively, by a Speech Recognition Test based on Hagerman sentences and a translation of the ITU-T P.835 recommendation, respectively. Results were compared with the objective measures STOI and POLQA. Neither STOI nor POLQA reliably predicted subjective results. While STOI anticipated improvement, subjective results for both models showed degradation of speech intelligibility. POLQA results were overall hardly affected, while the subjective results showed significant changes in overall quality, both positive and negative, in many of the tests. One of the systems was trained to remove all noise, a strategy that is common in speech enhancement systems found in the literature. The other system was trained to only reduce the noise such that the signal-to-noise ratio increased by 10 dB. The latter system subjectively outperformed the system that attempted to remove noise completely. From this, we conclude that objective evaluation cannot replace subjective evaluation until a measure that reliably predicts intelligibility and quality for deep neural network-based systems has been identified. Results further indicate that it may be beneficial to move away from more aggressive noise removal strategies towards noise reduction strategies that cause less speech distortion. [nb_NO]
dc.description.abstract: Subjective evaluation of a noise-reduced training target for deep neural network-based speech enhancement [nb_NO]
dc.language.iso: eng [nb_NO]
dc.subject: Artificial Neural Networks [nb_NO]
dc.subject: Talepersepsjon [nb_NO]
dc.subject: Speech perception [nb_NO]
dc.title: Subjective evaluation of a noise-reduced training target for deep neural network-based speech enhancement [nb_NO]
dc.title.alternative: Subjective evaluation of a noise-reduced training target for deep neural network-based speech enhancement [nb_NO]
dc.type: Journal article [nb_NO]
dc.type: Peer reviewed [nb_NO]
dc.description.version: acceptedVersion [nb_NO]
dc.subject.nsi: VDP::Telekommunikasjon: 552 [nb_NO]
dc.subject.nsi: VDP::Telecommunication: 552 [nb_NO]
dc.source.pagenumber: 13 [nb_NO]
dc.source.journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing [nb_NO]
dc.identifier.cristin: 1632077
dc.relation.project: Norges forskningsråd: 237887 [nb_NO]
cristin.unitcode: 7401,90,21,0
cristin.unitname: Akustikk
cristin.ispublished: false
cristin.fulltext: postprint
cristin.qualitycode: 2