Show simple item record

dc.contributor.author: Uddin, Md Zia
dc.contributor.author: Nilsson, Erik Gøsta
dc.date.accessioned: 2022-02-04T12:54:21Z
dc.date.available: 2022-02-04T12:54:21Z
dc.date.created: 2020-09-01T17:30:16Z
dc.date.issued: 2020
dc.identifier.issn: 0952-1976
dc.identifier.uri: https://hdl.handle.net/11250/2977200
dc.description.abstract: Emotions play an important role in our daily communication, and recent years have seen substantial research on reliable emotion recognition systems based on various types of data sources, such as audio and video. Since audio carries no visual information about the human face, emotion analysis based on audio data alone is a very challenging task. In this work, a novel emotion recognition approach based on robust features and machine learning from audio speech is proposed. For a person-independent emotion recognition system, audio data is used as input, from which Mel-Frequency Cepstral Coefficients (MFCC) are calculated as features. The MFCC features then undergo discriminant analysis to minimize the inner-class scatter while maximizing the inter-class scatter. The robust discriminant features are then fed to Neural Structured Learning (NSL), an efficient and fast deep learning approach, for emotion training and recognition. In experiments on an emotion dataset of audio speech, the proposed combination of MFCC, discriminant analysis, and NSL achieved superior recognition rates compared to traditional approaches such as MFCC-DBN, MFCC-CNN, and MFCC-RNN. The system can be adopted in smart environments such as homes or clinics to provide affective healthcare. Since NSL is fast and easy to implement, it can be tried on edge devices with limited datasets collected from edge sensors; the decision-making step can thus be pushed towards where the data resides rather than conventionally processing data and making decisions far away from the data sources. The proposed approach can be applied in practical applications such as understanding people's emotions in their daily life and detecting stress from the voices of pilots or air traffic controllers in air traffic management systems.
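The discriminant-analysis step described in the abstract (minimizing inner-class scatter while maximizing inter-class scatter over MFCC features) can be sketched with classical Fisher LDA in NumPy. This is an illustrative sketch, not the paper's implementation: the synthetic data, the ridge term, and the function name are assumptions.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Fisher discriminant analysis: find a projection that minimizes
    within-class scatter while maximizing between-class scatter."""
    classes = np.unique(y)
    mean_total = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class (inner-class) scatter
    Sb = np.zeros((d, d))  # between-class (inter-class) scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_total).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalized eigenproblem Sb w = lambda Sw w;
    # a small ridge keeps Sw invertible if features are collinear.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:n_components]]

# Synthetic stand-in for per-utterance 13-dimensional MFCC feature vectors,
# three hypothetical emotion classes with 20 utterances each:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=i, size=(20, 13)) for i in range(3)])
y = np.repeat(np.arange(3), 20)

W = lda_projection(X, y, n_components=2)
Z = X @ W  # discriminant features, fed to the classifier (NSL in the paper)
print(Z.shape)  # (60, 2)
```

The projected features `Z` would then be used to train the NSL classifier; the NSL stage itself is omitted here since it depends on the TensorFlow Neural Structured Learning framework and graph-regularization settings not detailed in this record.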
dc.language.iso: eng
dc.publisher: Elsevier
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.no
dc.subject: Audio
dc.subject: Emotion
dc.subject: MFCC
dc.subject: LDA
dc.subject: NSL
dc.title: Emotion recognition using speech and neural structured learning to facilitate edge intelligence
dc.type: Peer reviewed
dc.type: Journal article
dc.description.version: publishedVersion
dc.rights.holder: © 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
dc.source.pagenumber: 11
dc.source.volume: 94
dc.source.journal: Engineering Applications of Artificial Intelligence
dc.source.issue: 103775
dc.identifier.doi: 10.1016/j.engappai.2020.103775
dc.identifier.cristin: 1826584
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1


Associated file(s)


This item appears in the following collection(s)


Attribution 4.0 International
Except where otherwise noted, this item's license is described as Attribution 4.0 International