Show simple item record

dc.contributor.author    Mamun, Md. Adyelullahil
dc.contributor.author    Abdullah, Hasnat Md.
dc.contributor.author    Alam, Md. Golam Rabiul
dc.contributor.author    Hassan, Muhammad Mehedi
dc.contributor.author    Uddin, Md Zia
dc.date.accessioned      2024-05-29T12:39:14Z
dc.date.available        2024-05-29T12:39:14Z
dc.date.created          2023-03-17T14:40:44Z
dc.date.issued           2023
dc.identifier.citation   Multimedia Tools and Applications. 2023, 82, 35059-35090.  en_US
dc.identifier.issn       1380-7501
dc.identifier.uri        https://hdl.handle.net/11250/3131886
dc.description.abstract  Human conversational style is characterized by sense of humor, personality, and tone of voice. These characteristics have become essential for conversational intelligent virtual assistants. However, most state-of-the-art intelligent virtual assistants (IVAs) fail to interpret the affective semantics of human voices. This research proposes an anthropomorphic intelligent system that can hold a proper human-like conversation with emotion and personality. A voice style transfer method is also proposed to map the attributes of a specific emotion. Initially, the temporal audio waveform is converted into frequency-domain data (a Mel-spectrogram), which comprises discrete patterns for audio features such as notes, pitch, rhythm, and melody. A collateral CNN-Transformer-Encoder is used to predict seven different affective states from the voice. The voice is also fed in parallel to DeepSpeech, an RNN model that generates the text transcription from the spectrogram. The transcribed text is then passed to a multi-domain conversational agent that uses blended skill talk, a transformer-based retrieve-and-generate strategy, and beam-search decoding to produce an appropriate textual response. For voice synthesis and style transfer, the system learns an invertible mapping of data to a latent space that can be manipulated, and generates each Mel-spectrogram frame conditioned on the previous frames. Finally, the waveform is generated from the spectrogram using WaveGlow. The outcomes of the studies conducted on the individual models were promising. Furthermore, users who interacted with the system provided positive feedback, demonstrating the system’s effectiveness.  en_US
dc.language.iso          eng  en_US
dc.publisher             Springer  en_US
dc.rights                Navngivelse 4.0 Internasjonal
dc.rights.uri            http://creativecommons.org/licenses/by/4.0/deed.no
dc.title                 Affective social anthropomorphic intelligent system  en_US
dc.type                  Peer reviewed  en_US
dc.type                  Journal article  en_US
dc.description.version   publishedVersion  en_US
dc.rights.holder         © The Author(s) 2023  en_US
dc.source.pagenumber     35059-35090  en_US
dc.source.volume         82  en_US
dc.source.journal        Multimedia Tools and Applications  en_US
dc.identifier.doi        10.1007/s11042-023-14597-6
dc.identifier.cristin    2134863
cristin.ispublished      true
cristin.fulltext         original
cristin.qualitycode      1



