no code implementations • 1 Nov 2023 • Adrian Bogdan Stânea, Vlad Striletchi, Cosmin Striletchi, Adriana Stan
Features derived from large speech models have recently outperformed signal-based features across multiple downstream tasks, even when the networks are not fine-tuned for the target task.
no code implementations • 11 Sep 2023 • Dan Oneata, Adriana Stan, Octavian Pascu, Elisabeta Oneata, Horia Cucu
Generalisation -- the ability of a model to perform well on unseen data -- is crucial for building reliable deep fake detectors.
no code implementations • 19 Jul 2023 • Adriana Stan, Johannah O'Mahony
In this paper we present a first attempt at understanding how a non-autoregressive factorised multi-speaker speech synthesis architecture exploits the information present in different sets of speaker embeddings.
no code implementations • 6 Feb 2023 • Adriana Stan
This means that the embeddings are far from ideal, highly dependent on the training corpus and still include a degree of residual information pertaining to factors such as linguistic content, recording conditions or speaking style of the utterance.
no code implementations • 15 Jun 2022 • Adriana Stan
This paper introduces the ZevoMOS entry to the main track of the VoiceMOS Challenge 2022.
Automatic Speech Recognition (ASR)
no code implementations • 7 Jun 2022 • Dan Oneata, Beata Lorincz, Adriana Stan, Horia Cucu
This modularity allows each of its components to be replaced easily, while also enabling fast adaptation to new speaker identities by disentangling or projecting the input features.
no code implementations • 3 Jun 2021 • Beata Lorincz, Adriana Stan, Mircea Giurgiu
Building multi-speaker, neural network-based text-to-speech synthesis systems commonly relies on the availability of large amounts of high-quality recordings from each speaker, and on conditioning the training process on the speaker's identity or on a learned representation of it.
no code implementations • 3 Jun 2021 • Beata Lorincz, Adriana Stan, Mircea Giurgiu
The visualisation of the t-SNE projections of the natural and synthesised speaker embeddings shows that the acoustic model shifts the neural representations of some speakers, but not all of them.
1 code implementation • 20 May 2021 • Dan Oneata, Adriana Stan, Horia Cucu
The task of video-to-speech aims to translate silent videos of lip movements into the corresponding audio signal.
no code implementations • 14 Jan 2021 • Dan Oneata, Alexandru Caranica, Adriana Stan, Horia Cucu
In this paper we investigate confidence estimation for end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 11 Sep 2020 • Adriana Stan
RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications.
no code implementations • LREC 2014 • Tiberiu Boroș, Adriana Stan, Oliver Watts, Stefan Daniel Dumitrescu
This paper introduces a recent extension of a Romanian speech corpus that adds prosodic annotations of the speech data in the form of ToBI labels.