Search Results for author: Siyang Wang

Found 12 papers, 1 paper with code

Evaluating Sampling-based Filler Insertion with Spontaneous TTS

no code implementations LREC 2022 Siyang Wang, Joakim Gustafson, Éva Székely

Perceptual results show little difference between the compared filler insertion models, including the ground truth, which may be due to the ambiguity of what constitutes good filler insertion and to a strong neural spontaneous TTS that produces natural speech irrespective of input.

Enhancing Layout Hotspot Detection Efficiency with YOLOv8 and PCA-Guided Augmentation

no code implementations 19 Jul 2024 Dongyang Wu, Siyang Wang, Mehdi Kamal, Massoud Pedram

Additionally, to enhance pattern-matching effectiveness, we introduce a novel approach to augment the layout image using information extracted through Principal Component Analysis (PCA).

Object Detection

On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis

no code implementations 11 Jul 2023 Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely

Prior work has shown that SSL is an effective intermediate representation in two-stage text-to-speech (TTS) for both read and spontaneous speech.

Self-Supervised Learning Speech Synthesis

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

no code implementations 15 Jun 2023 Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech.

Denoising Speech Synthesis

Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis

no code implementations 29 May 2023 Erik Ekstedt, Siyang Wang, Éva Székely, Joakim Gustafson, Gabriel Skantze

Turn-taking is a fundamental aspect of human communication where speakers convey their intention to either hold, or yield, their turn through prosodic cues.

Speech Synthesis

A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS

no code implementations 5 Mar 2023 Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely

Recent work has explored using self-supervised learning (SSL) speech representations such as wav2vec2.0 as the representation medium in standard two-stage TTS, in place of conventionally used mel-spectrograms.

Self-Supervised Learning

Integrated Speech and Gesture Synthesis

1 code implementation 25 Aug 2021 Siyang Wang, Simon Alexanderson, Joakim Gustafson, Jonas Beskow, Gustav Eje Henter, Éva Székely

Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies using a simple system-level pipeline.

Speech Synthesis

Theory of the Chromatic Dispersion, Revisited

no code implementations 30 Oct 2020 Dimitar Popmintchev, Siyang Wang, Xiaoshi Zhang, Tenio Popmintchev

We derive general analytic expressions for the chromatic dispersion orders valid to infinity, due to the k vector or phase φ dependence on the wavelength.

Optics Applied Physics Atomic and Molecular Clusters

Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers

no code implementations 6 Jun 2019 Manjot Bilkhu, Siyang Wang, Tushar Dobhal

Video Captioning and Summarization have become very popular in recent years due to advancements in Sequence Modelling, with the resurgence of Long Short-Term Memory networks (LSTMs) and the introduction of Gated Recurrent Units (GRUs).

Dense Video Captioning Dimensionality Reduction

Controllable Top-down Feature Transformer

no code implementations 6 Dec 2017 Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu

We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control.

Data Augmentation Style Transfer
