Search Results for author: Alexander Richard

Found 27 papers, 13 papers with code

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

no code implementations27 Mar 2024 Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard

The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms.

Few-Shot Learning Pose Tracking +1

ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter

no code implementations22 Jan 2024 Yi-Chiao Wu, Dejan Marković, Steven Krenn, Israel D. Gebru, Alexander Richard

Although recent mainstream waveform-domain end-to-end (E2E) neural audio codecs achieve impressive coded audio quality with a very low bitrate, the quality gap between the coded and natural audio is still significant.

Generative Adversarial Network

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

1 code implementation3 Jan 2024 Evonne Ng, Javier Romero, Timur Bagautdinov, Shaojie Bai, Trevor Darrell, Angjoo Kanazawa, Alexander Richard

We present a framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction.

Quantization

Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio

1 code implementation NeurIPS 2023 Xudong Xu, Dejan Markovic, Jacob Sandakly, Todd Keebler, Steven Krenn, Alexander Richard

While 3D human body modeling has received much attention in computer vision, modeling the acoustic equivalent, i.e., modeling 3D spatial audio produced by body motion and speech, has fallen short in the community.

Position

AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec

2 code implementations26 May 2023 Yi-Chiao Wu, Israel D. Gebru, Dejan Marković, Alexander Richard

A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e., the bitrate required to transmit the signal should be as low as possible; (2) latency, i.e., encoding and decoding the signal must be fast enough to enable communication with no, or only minimal, noticeable delay; and (3) reconstruction quality of the signal.
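
The first two properties can be illustrated with back-of-the-envelope arithmetic for a generic frame-based codec; all parameter values below are hypothetical and not taken from AudioDec itself:

```python
def codec_stats(sample_rate_hz, frame_size, codes_per_frame, bits_per_code):
    """Rough bitrate and algorithmic latency for a frame-based codec.

    A codec that emits `codes_per_frame` discrete codes of `bits_per_code`
    bits for every `frame_size` input samples has a fixed bitrate, and its
    algorithmic latency is at least one frame of buffered audio.
    """
    frames_per_second = sample_rate_hz / frame_size
    bitrate_bps = frames_per_second * codes_per_frame * bits_per_code
    latency_ms = 1000.0 * frame_size / sample_rate_hz
    return bitrate_bps, latency_ms

# Hypothetical settings: 24 kHz audio, 300-sample frames,
# 8 codebooks of 10 bits each per frame.
bitrate, latency = codec_stats(24_000, 300, 8, 10)
print(f"{bitrate / 1000:.1f} kbps, {latency:.1f} ms")  # 6.4 kbps, 12.5 ms
```

Shrinking the frame lowers latency but raises the frame rate, so the bitrate stays fixed only if the per-frame code budget shrinks with it; this is the compression/latency tension the abstract refers to.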

Novel-View Acoustic Synthesis

no code implementations CVPR 2023 Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?

Neural Rendering Novel View Synthesis

End-to-End Binaural Speech Synthesis

no code implementations8 Jul 2022 Wen Chin Huang, Dejan Markovic, Alexander Richard, Israel Dejene Gebru, Anjali Menon

In this work, we present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a powerful binaural decoder that is capable of accurate speech binauralization while faithfully reconstructing environmental factors like ambient noise or reverb.

Speech Synthesis

Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain

no code implementations30 Jun 2022 Dejan Markovic, Alexandre Defossez, Alexander Richard

We present a single-stage causal waveform-to-waveform multichannel model that can separate moving sound sources based on their broad spatial locations in a dynamic acoustic scene.

Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis

1 code implementation CVPR 2022 Karren Yang, Dejan Markovic, Steven Krenn, Vasu Agrawal, Alexander Richard

Since facial actions such as lip movements contain significant information about speech content, it is not surprising that audio-visual speech enhancement methods are more accurate than their audio-only counterparts.

Speech Enhancement

LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space

no code implementations15 Mar 2022 Emre Aksan, Shugao Ma, Akin Caliskan, Stanislav Pidhorskyi, Alexander Richard, Shih-En Wei, Jason Saragih, Otmar Hilliges

To mitigate this asymmetry, we introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.

Face Model

Conditional Diffusion Probabilistic Model for Speech Enhancement

2 code implementations10 Feb 2022 Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs.

Speech Enhancement Speech Synthesis

Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks

no code implementations7 Feb 2022 Alexander Richard, Peter Dodds, Vamsi Krishna Ithapu

Impulse response estimation in high noise and in-the-wild settings, with minimal control of the underlying data distributions, is a challenging problem.

Representation Learning

MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement

2 code implementations ICCV 2021 Alexander Richard, Michael Zollhoefer, Yandong Wen, Fernando de la Torre, Yaser Sheikh

To improve upon existing models, we propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.

3D Face Animation Disentanglement +1

Neural Synthesis of Binaural Audio

no code implementations ICLR 2021 Alexander Richard, Dejan Markovic, Israel D. Gebru, Steven Krenn, Gladstone Alexander Butler, Fernando Torre, Yaser Sheikh

We present a neural rendering approach for binaural sound synthesis that can produce realistic and spatially accurate binaural sound in real time.

Neural Rendering Position

Audio- and Gaze-driven Facial Animation of Codec Avatars

no code implementations11 Aug 2020 Alexander Richard, Colin Lea, Shugao Ma, Juergen Gall, Fernando de la Torre, Yaser Sheikh

Codec Avatars are a recent class of learned, photorealistic face models that accurately represent the geometry and texture of a person in 3D (i.e., for virtual reality), and are almost indistinguishable from video.

Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data

1 code implementation3 Jun 2019 Hilde Kuehne, Ahsan Iqbal, Alexander Richard, Juergen Gall

Action recognition has so far mainly focused on classifying hand-selected, pre-clipped actions, reaching impressive results in this field.

Action Recognition General Classification +1

Recurrent Residual Learning for Action Recognition

no code implementations27 Jun 2017 Ahsan Iqbal, Alexander Richard, Hilde Kuehne, Juergen Gall

In this work, we propose a novel recurrent ConvNet architecture called recurrent residual networks to address the task of action recognition.

Action Recognition Image Classification +1

A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition

1 code implementation23 Mar 2017 Alexander Richard, Juergen Gall

In this work, we propose a recurrent neural network that is equivalent to the traditional bag-of-words approach but enables the application of discriminative training.
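
A minimal sketch of the underlying idea: if each frame is softly assigned to a visual vocabulary, the bag-of-words histogram can be computed as a recurrence whose hidden state is the running histogram, and that recurrence is differentiable end to end. The vocabulary, features, and soft-assignment rule below are illustrative stand-ins, not the paper's actual architecture:

```python
import numpy as np

def soft_assign(frame_feats, vocab, temperature=1.0):
    """Softmax over negative squared distances to each visual word,
    making the frame-to-word assignment differentiable."""
    d = ((frame_feats[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    logits = -d / temperature
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def bow_recurrent(frame_feats, vocab):
    """Accumulate assignments frame by frame: h_t = h_{t-1} + assign(x_t).
    The final (normalized) state equals the bag-of-words histogram."""
    h = np.zeros(vocab.shape[0])
    for a in soft_assign(frame_feats, vocab):
        h = h + a
    return h / len(frame_feats)

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 8))   # 50 frames, 8-dim features
vocab = rng.normal(size=(4, 8))    # 4 visual words
hist = bow_recurrent(feats, vocab)
print(hist, hist.sum())            # a probability histogram summing to 1
```

Because every step is differentiable, gradients can flow back through the histogram into the vocabulary, which is what makes discriminative training of the "codebook" possible.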

Action Recognition General Classification +2

Weakly supervised learning of actions from transcripts

no code implementations7 Oct 2016 Hilde Kuehne, Alexander Richard, Juergen Gall

Our system is based on the idea that, given a sequence of input data and a transcript, i.e., a list of the actions in the order they occur in the video, it is possible to infer the actions within the video stream, and thus learn the related action models without the need for any frame-based annotation.
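
The ordering constraint can be illustrated with a small dynamic program that segments per-frame scores according to a given transcript. This is a toy alignment over made-up scores, not the paper's actual model:

```python
import numpy as np

def align_transcript(scores, transcript):
    """Best monotonic segmentation of T frames into the actions listed in
    `transcript`, in order, maximizing summed per-frame log-scores.
    `scores[t, a]` is the score of frame t under action a."""
    T, n = scores.shape[0], len(transcript)
    NEG = -1e18
    dp = np.full((T + 1, n + 1), NEG)
    back = np.zeros((T + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for t in range(1, T + 1):
        for k in range(1, min(t, n) + 1):
            a = transcript[k - 1]
            stay = dp[t - 1, k] + scores[t - 1, a]      # extend segment k
            start = dp[t - 1, k - 1] + scores[t - 1, a]  # start segment k
            if start > stay:
                dp[t, k], back[t, k] = start, k - 1
            else:
                dp[t, k], back[t, k] = stay, k
    # backtrace to per-frame action labels
    labels, k = [], n
    for t in range(T, 0, -1):
        labels.append(transcript[k - 1])
        k = back[t, k]
    return labels[::-1]

# Toy example: 6 frames, transcript says action 0 happens before action 1.
scores = np.array([[2., 0.], [2., 0.], [2., 0.],
                   [0., 2.], [0., 2.], [0., 2.]])
print(align_transcript(scores, [0, 1]))  # [0, 0, 0, 1, 1, 1]
```

Alternating such an alignment step with re-estimating the per-frame action models yields pseudo frame labels from nothing but the ordered transcript, which is the core of this style of weak supervision.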

Weakly-supervised Learning

Temporal Action Detection Using a Statistical Language Model

1 code implementation CVPR 2016 Alexander Richard, Juergen Gall

While current approaches to action recognition on pre-segmented video clips already achieve high accuracies, temporal action detection is still far from comparably good results.

Action Detection Action Recognition +2
