Search Results for author: Alexander Richard

Found 19 papers, 9 papers with code

Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain

no code implementations30 Jun 2022 Dejan Markovic, Alexandre Defossez, Alexander Richard

We present a single-stage casual waveform-to-waveform multichannel model that can separate moving sound sources based on their broad spatial locations in a dynamic acoustic scene.

Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis

1 code implementation CVPR 2022 Karren Yang, Dejan Markovic, Steven Krenn, Vasu Agrawal, Alexander Richard

Since facial actions such as lip movements contain significant information about speech content, it is not surprising that audio-visual speech enhancement methods are more accurate than their audio-only counterparts.

Speech Enhancement

LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space

no code implementations15 Mar 2022 Emre Aksan, Shugao Ma, Akin Caliskan, Stanislav Pidhorskyi, Alexander Richard, Shih-En Wei, Jason Saragih, Otmar Hilliges

To mitigate this asymmetry, we introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.

Face Model

Conditional Diffusion Probabilistic Model for Speech Enhancement

1 code implementation10 Feb 2022 Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs.

Speech Enhancement Speech Synthesis

Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks

no code implementations7 Feb 2022 Alexander Richard, Peter Dodds, Vamsi Krishna Ithapu

Impulse response estimation in high noise and in-the-wild settings, with minimal control of the underlying data distributions, is a challenging problem.

Representation Learning

MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement

1 code implementation ICCV 2021 Alexander Richard, Michael Zollhoefer, Yandong Wen, Fernando de la Torre, Yaser Sheikh

To improve upon existing models, we propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.

3D Face Animation Disentanglement +1

Neural Synthesis of Binaural Audio

no code implementations ICLR 2021 Alexander Richard, Dejan Markovic, Israel D. Gebru, Steven Krenn, Gladstone Alexander Butler, Fernando Torre, Yaser Sheikh

We present a neural rendering approach for binaural sound synthesis that can produce realistic and spatially accurate binaural sound in realtime.

Neural Rendering

Audio- and Gaze-driven Facial Animation of Codec Avatars

no code implementations11 Aug 2020 Alexander Richard, Colin Lea, Shugao Ma, Juergen Gall, Fernando de la Torre, Yaser Sheikh

Codec Avatars are a recent class of learned, photorealistic face models that accurately represent the geometry and texture of a person in 3D (i. e., for virtual reality), and are almost indistinguishable from video.

Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data

1 code implementation3 Jun 2019 Hilde Kuehne, Ahsan Iqbal, Alexander Richard, Juergen Gall

Action recognition is so far mainly focusing on the problem of classification of hand selected preclipped actions and reaching impressive results in this field.

Action Recognition General Classification

Recurrent Residual Learning for Action Recognition

no code implementations27 Jun 2017 Ahsan Iqbal, Alexander Richard, Hilde Kuehne, Juergen Gall

In this work, we propose a novel recurrent ConvNet architecture called recurrent residual networks to address the task of action recognition.

Action Recognition Computer Vision +1

A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition

1 code implementation23 Mar 2017 Alexander Richard, Juergen Gall

In this work, we propose a recurrent neural network that is equivalent to the traditional bag-of-words approach but enables for the application of discriminative training.

Action Recognition Computer Vision +2

Weakly supervised learning of actions from transcripts

no code implementations7 Oct 2016 Hilde Kuehne, Alexander Richard, Juergen Gall

Our system is based on the idea that, given a sequence of input data and a transcript, i. e. a list of the order the actions occur in the video, it is possible to infer the actions within the video stream, and thus, learn the related action models without the need for any frame-based annotation.

Temporal Action Detection Using a Statistical Language Model

1 code implementation CVPR 2016 Alexander Richard, Juergen Gall

While current approaches to action recognition on pre-segmented video clips already achieve high accuracies, temporal action detection is still far from comparably good results.

Action Detection Action Recognition +1

Cannot find the paper you are looking for? You can Submit a new open access paper.