no code implementations • LREC 2022 • Siyang Wang, Joakim Gustafson, Éva Székely
Perceptual results show little difference between the compared filler insertion models, including ground-truth insertion, which may be due to the ambiguity of what constitutes good filler insertion and to a strong neural spontaneous TTS that produces natural speech irrespective of input.
no code implementations • 19 Jul 2024 • Dongyang Wu, Siyang Wang, Mehdi Kamal, Massoud Pedram
Additionally, to enhance pattern-matching effectiveness, we introduce a novel approach to augment the layout image using information extracted through Principal Component Analysis (PCA).
no code implementations • 16 May 2024 • Siyang Wang, Éva Székely
Our findings aim to serve as a benchmark for future advancements in generative SLMs for speech synthesis.
no code implementations • 11 Jul 2023 • Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely
Prior work has shown that SSL is an effective intermediate representation in two-stage text-to-speech (TTS) for both read and spontaneous speech.
no code implementations • 15 Jun 2023 • Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter
With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech.
no code implementations • 29 May 2023 • Erik Ekstedt, Siyang Wang, Éva Székely, Joakim Gustafson, Gabriel Skantze
Turn-taking is a fundamental aspect of human communication where speakers convey their intention to either hold, or yield, their turn through prosodic cues.
no code implementations • 5 Mar 2023 • Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely
Recent work has explored using self-supervised learning (SSL) speech representations such as wav2vec2.0 as the representation medium in standard two-stage TTS, in place of conventionally used mel-spectrograms.
1 code implementation • 25 Aug 2021 • Siyang Wang, Simon Alexanderson, Joakim Gustafson, Jonas Beskow, Gustav Eje Henter, Éva Székely
Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies using a simple system-level pipeline.
no code implementations • 30 Oct 2020 • Dimitar Popmintchev, Siyang Wang, Xiaoshi Zhang, Tenio Popmintchev
We derive general analytic expressions for the chromatic dispersion orders valid to infinity, due to the k-vector or phase φ dependence on the wavelength.
Optics • Applied Physics • Atomic and Molecular Clusters
no code implementations • ICLR 2020 • Siyang Wang, Justin Lazarow, Kwonjoon Lee, Zhuowen Tu
We tackle the problem of modeling sequential visual phenomena.
no code implementations • 6 Jun 2019 • Manjot Bilkhu, Siyang Wang, Tushar Dobhal
Video Captioning and Summarization have become very popular in recent years due to advancements in Sequence Modelling, with the resurgence of Long Short-Term Memory networks (LSTMs) and the introduction of Gated Recurrent Units (GRUs).
no code implementations • 6 Dec 2017 • Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu
We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control.