Search Results for author: Dan Oneata

Found 13 papers, 5 papers with code

Weakly-supervised deepfake localization in diffusion-generated images

1 code implementation • 8 Nov 2023 • Dragos Tantaru, Elisabeta Oneata, Dan Oneata

The remarkable generative capabilities of denoising diffusion models have raised new concerns regarding the authenticity of the images we see every day on the Internet.

DeepFake Detection Denoising +1

Towards generalisable and calibrated synthetic speech detection with self-supervised representations

no code implementations • 11 Sep 2023 • Dan Oneata, Adriana Stan, Octavian Pascu, Elisabeta Oneata, Horia Cucu

Generalisation -- the ability of a model to perform well on unseen data -- is crucial for building reliable deep fake detectors.

Synthetic Speech Detection

Visually grounded few-shot word learning in low-resource settings

no code implementations • 20 Jun 2023 • Leanne Nortje, Dan Oneata, Herman Kamper

Our approach involves using the given word-image example pairs to mine new unsupervised word-image training pairs from large collections of unlabelled speech and images.

Few-Shot Learning

YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding

no code implementations • 10 Oct 2022 • Kayode Olaleye, Dan Oneata, Herman Kamper

We collect and release a new single-speaker dataset of audio captions for 6k Flickr images in Yorùbá -- a real low-resource language spoken in Nigeria.

Visual Grounding

FlexLip: A Controllable Text-to-Lip System

no code implementations • 7 Jun 2022 • Dan Oneata, Beata Lorincz, Adriana Stan, Horia Cucu

This modularity enables the easy replacement of each of its components, while also ensuring the fast adaptation to new speaker identities by disentangling or projecting the input features.

Audio Generation Text-to-Video Generation +1

Keyword localisation in untranscribed speech using visually grounded speech models

1 code implementation • 2 Feb 2022 • Kayode Olaleye, Dan Oneata, Herman Kamper

Masked-based localisation gives some of the best reported localisation scores from a VGS model, with an accuracy of 57% when the system knows that a keyword occurs in an utterance and needs to predict its location.

Keyword Spotting TAG

Speaker disentanglement in video-to-speech conversion

1 code implementation • 20 May 2021 • Dan Oneata, Adriana Stan, Horia Cucu

The task of video-to-speech aims to translate silent video of lip movement to its corresponding audio signal.

Disentanglement Speech Synthesis

The Quo Vadis submission at Traffic4cast 2019

1 code implementation • 27 Oct 2019 • Dan Oneata, Cosmin George Alexandru, Marius Stanescu, Octavian Pascu, Alexandru Magan, Adrian Postelnicu, Horia Cucu

We describe the submission of the Quo Vadis team to the Traffic4cast competition, which was organized as part of the NeurIPS 2019 series of challenges.

regression

Kite: Automatic speech recognition for unmanned aerial vehicles

no code implementations • 2 Jul 2019 • Dan Oneata, Horia Cucu

This paper addresses the problem of building a speech recognition system attuned to the control of unmanned aerial vehicles (UAVs).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Efficient Action Localization with Approximately Normalized Fisher Vectors

no code implementations • CVPR 2014 • Dan Oneata, Jakob Verbeek, Cordelia Schmid

Transformation of the FV by power and L2 normalizations has been shown to significantly improve its performance, and has led to state-of-the-art results for a range of image and video classification and retrieval tasks.

Action Recognition General Classification +4
