no code implementations • 9 Feb 2024 • Nicolas M. Müller, Piotr Kawa, Shen Hu, Matthias Neu, Jennifer Williams, Philip Sperl, Konstantin Böttinger
We argue that this binary distinction is oversimplified.
no code implementations • 8 Jan 2024 • Jennifer Williams, Karla Pizzi, Paul-Gauthier Noe, Sneha Das
Most recent speech privacy efforts have focused on anonymizing acoustic speaker attributes, but comparatively little research has addressed protecting the information carried in speech content.
no code implementations • 30 Oct 2023 • Nicolas M. Müller, Maximilian Burgert, Pascal Debus, Jennifer Williams, Philip Sperl, Konstantin Böttinger
Machine-learning (ML) shortcuts or spurious correlations are artifacts in datasets that lead to very good training and test performance but severely limit the model's generalization capability.
no code implementations • 21 Jan 2023 • Jennifer Williams, Karla Pizzi, Shuvayanti Das, Paul-Gauthier Noe
Privacy in speech and audio has many facets.
no code implementations • 24 Nov 2022 • Nicolas M. Müller, Jochen Jacobs, Jennifer Williams, Konstantin Böttinger
This is often due to the existence of machine-learning shortcuts: features in the data that are predictive but unrelated to the problem at hand.
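To illustrate what such a shortcut looks like in practice (a toy sketch with invented data, not the paper's experiments): a "model" that thresholds an artifact column perfectly correlated with the label scores 100% accuracy, while the genuinely informative but noisy signal feature does far worse.

```python
import random

random.seed(0)

# Toy dataset: each sample is (real_signal, shortcut, label).
# The shortcut is a dataset artifact that leaks the label exactly
# (e.g., all positives were recorded with one microphone); the real
# signal is only weakly informative due to noise.
def make_sample():
    label = random.randint(0, 1)
    real_signal = label + random.gauss(0, 2.0)  # weakly informative
    shortcut = float(label)                     # artifact: leaks the label
    return real_signal, shortcut, label

data = [make_sample() for _ in range(1000)]

# Thresholding the shortcut feature is "perfect" on this dataset...
shortcut_acc = sum((s > 0.5) == bool(y) for _, s, y in data) / len(data)

# ...but the same threshold on the real signal performs much worse,
# which is what the model would face once the artifact disappears.
signal_acc = sum((x > 0.5) == bool(y) for x, _, y in data) / len(data)

print(shortcut_acc, signal_acc)
```

The gap between the two accuracies is the generalization cliff the abstract describes: test sets that share the artifact hide the problem entirely.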
no code implementations • 28 Mar 2022 • Nicolas M. Müller, Franziska Dieckmann, Jennifer Williams
This is despite the fact that attribution (who created which fake?) …
no code implementations • 28 Mar 2022 • Shuvayanti Das, Jennifer Williams, Catherine Lai
We found that speech synthesis quality degrades as the number of language switches within an utterance increases and the number of words decreases.
1 code implementation • 21 Feb 2022 • Mariya Toneva, Jennifer Williams, Anand Bollu, Christoph Dann, Leila Wehbe
It is then natural to ask: "Is the activity in these different brain zones caused by the stimulus properties in the same way?"
1 code implementation • 11 Dec 2021 • Jennifer Williams, Leila Wehbe
We hypothesize that individual differences in how information is encoded in the brain are task-specific and predict different behavior measures.
no code implementations • 13 Oct 2021 • Jennifer Williams, Junichi Yamagishi, Paul-Gauthier Noe, Cassia Valentini-Botinhao, Jean-Francois Bonastre
In this paper, we discuss an important aspect of speech privacy: protecting spoken content.
no code implementations • 20 Jul 2021 • Nicolas M. Müller, Karla Pizzi, Jennifer Williams
The recent emergence of deepfakes has brought manipulated and generated content to the forefront of machine learning research.
1 code implementation • 4 May 2021 • Jennifer Williams, Jason Fong, Erica Cooper, Junichi Yamagishi
This work examines the content and usefulness of disentangled phone and speaker representations from two separately trained VQ-VAE systems: one trained on multilingual data and another trained on monolingual data.
1 code implementation • 21 Oct 2020 • Jennifer Williams, Yi Zhao, Erica Cooper, Junichi Yamagishi
Additionally, phones can be recognized from sub-phone VQ codebook indices better in our semi-supervised VQ-VAE than in the self-supervised VQ-VAE with global conditions.
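The core mechanism behind these codebook indices can be sketched as follows (an illustrative minimal example, not the authors' system): VQ-VAE-style quantization replaces each encoder output frame with the index of its nearest codebook vector, and those discrete indices are what downstream phone recognition consumes. The codebook and frame values below are made up.

```python
# Minimal sketch of VQ-VAE-style vector quantization: map each frame
# to the index of the nearest codebook entry (squared Euclidean distance).
def quantize(frames, codebook):
    indices = []
    for f in frames:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices

# Toy 2-dim codebook with three entries, and three encoder-output frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]
codes = quantize(frames, codebook)
print(codes)  # [0, 1, 2]
```

In a trained system the codebook entries are learned, and a sub-phone codebook means several consecutive codes typically map onto one phone.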
2 code implementations • 28 Feb 2020 • Jennifer Williams, Joanna Rownicka, Pilar Oplustil, Simon King
Our neural network predicts MOS with high correlation to human judgments.
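The evaluation step behind a claim like this is typically a Pearson correlation between predicted and human MOS; a minimal sketch with invented scores (the numbers below are hypothetical, not the paper's results):

```python
import math

# Pearson correlation between two score lists.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-utterance MOS: human ratings vs. model predictions.
human_mos = [4.2, 3.1, 2.5, 4.8, 3.7]
predicted = [4.0, 3.3, 2.7, 4.6, 3.5]
r = pearson(human_mos, predicted)
print(round(r, 3))
```

A correlation near 1.0 on held-out utterances is what makes a MOS predictor usable as a stand-in for listening tests.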
no code implementations • 23 Sep 2019 • Jennifer Williams, Joanna Rownicka
Our system used convolutional neural networks (CNNs) and a representation of the speech audio that combined x-vector attack embeddings with signal processing features.
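The input representation described above, an x-vector attack embedding combined with signal-processing features, can be sketched as simple feature concatenation (shapes and values here are illustrative assumptions, not the paper's configuration):

```python
# Sketch: pool frame-level signal-processing features to utterance level,
# then concatenate with an utterance-level x-vector attack embedding.
def build_input(x_vector, signal_frames):
    n = len(signal_frames)
    dim = len(signal_frames[0])
    # Mean-pool signal features over time.
    pooled = [sum(f[d] for f in signal_frames) / n for d in range(dim)]
    return x_vector + pooled  # concatenated feature vector for the CNN

x_vec = [0.1] * 4                  # toy 4-dim "x-vector" embedding
frames = [[1.0, 2.0], [3.0, 4.0]]  # toy 2-dim signal features per frame
feat = build_input(x_vec, frames)
print(len(feat), feat[-2:])
```

The concatenated vector (or, in the actual system, a 2-D time-frequency representation) is then fed to the CNN classifier.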
no code implementations • WS 2018 • Jennifer Williams, Steven Kleinegesse, Ramona Comanescu, Oana Radu
We present our system description of input-level multimodal fusion of audio, video, and text for recognition of emotions and their intensities for the 2018 First Grand Challenge on Computational Modeling of Human Multimodal Language.
no code implementations • WS 2018 • Jennifer Williams, Ramona Comanescu, Oana Radu, Leimin Tian
Our work also improves feature selection for unimodal sentiment analysis, while proposing a novel and effective multimodal fusion architecture for this task.
no code implementations • WS 2017 • Jennifer Williams, Charlie Dagli
We present a new method to bootstrap-filter Twitter language ID labels in our dataset for automatic language identification (LID).
no code implementations • LREC 2012 • Jennifer Williams, Graham Katz
We describe in-progress work on the creation of a new lexical resource that contains a list of 486 verbs annotated with quantified temporal durations for the events that they describe.