Search Results for author: Joakim Gustafson

Found 19 papers, 1 paper with code

Evaluating Sampling-based Filler Insertion with Spontaneous TTS

no code implementations LREC 2022 Siyang Wang, Joakim Gustafson, Éva Székely

Perceptual results show little difference between the compared filler insertion models, including ground truth, which may be due to the ambiguity of what constitutes good filler insertion and to a strong neural spontaneous TTS that produces natural speech irrespective of the input.

On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis

no code implementations 11 Jul 2023 Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely

Prior work has shown that SSL is an effective intermediate representation in two-stage text-to-speech (TTS) for both read and spontaneous speech.

Self-Supervised Learning Speech Synthesis

Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis

no code implementations 29 May 2023 Erik Ekstedt, Siyang Wang, Éva Székely, Joakim Gustafson, Gabriel Skantze

Turn-taking is a fundamental aspect of human communication in which speakers convey their intention to either hold or yield their turn through prosodic cues.

Speech Synthesis

A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS

no code implementations 5 Mar 2023 Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely

Recent work has explored using self-supervised learning (SSL) speech representations such as wav2vec 2.0 as the representation medium in standard two-stage TTS, in place of the conventionally used mel-spectrograms.

Self-Supervised Learning

Prosody-controllable spontaneous TTS with neural HMMs

no code implementations 24 Nov 2022 Harm Lameris, Shivam Mehta, Gustav Eje Henter, Joakim Gustafson, Éva Székely

Spontaneous speech has many affective and pragmatic functions that are interesting and challenging to model in TTS.


Integrated Speech and Gesture Synthesis

1 code implementation 25 Aug 2021 Siyang Wang, Simon Alexanderson, Joakim Gustafson, Jonas Beskow, Gustav Eje Henter, Éva Székely

Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies in a simple system-level pipeline.

Speech Synthesis

Chinese Whispers: A Multimodal Dataset for Embodied Language Grounding

no code implementations LREC 2020 Dimosthenis Kontogiorgos, Elena Sibirtseva, Joakim Gustafson

In this paper, we introduce a multimodal dataset in which subjects instruct each other in how to assemble IKEA furniture.

Machine Learning and Social Robotics for Detecting Early Signs of Dementia

no code implementations 5 Sep 2017 Patrik Jonell, Joseph Mendelson, Thomas Storskog, Goran Hagman, Per Ostberg, Iolanda Leite, Taras Kucherenko, Olga Mikheeva, Ulrika Akenine, Vesna Jelic, Alina Solomon, Jonas Beskow, Joakim Gustafson, Miia Kivipelto, Hedvig Kjellstrom

This paper presents the EACare project, an ambitious multi-disciplinary collaboration with the aim of developing an embodied system capable of carrying out neuropsychological tests to detect early signs of dementia, e.g., due to Alzheimer's disease.

BIG-bench Machine Learning

Hidden Resources ― Strategies to Acquire and Exploit Potential Spoken Language Resources in National Archives

no code implementations LREC 2016 Jens Edlund, Joakim Gustafson

In 2014, the Swedish government tasked a Swedish agency, The Swedish Post and Telecom Authority (PTS), with investigating how to best create and populate an infrastructure for spoken language resources (Ref N2014/2840/ITP).

Position
