Search Results for author: Alexander Richard

Found 27 papers, 13 papers with code

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

no code implementations27 Mar 2024 Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard

The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms.

Few-Shot Learning Pose Tracking +1

ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter

no code implementations22 Jan 2024 Yi-Chiao Wu, Dejan Marković, Steven Krenn, Israel D. Gebru, Alexander Richard

Although recent mainstream waveform-domain end-to-end (E2E) neural audio codecs achieve impressive coded audio quality with a very low bitrate, the quality gap between the coded and natural audio is still significant.

Generative Adversarial Network

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

1 code implementation3 Jan 2024 Evonne Ng, Javier Romero, Timur Bagautdinov, Shaojie Bai, Trevor Darrell, Angjoo Kanazawa, Alexander Richard

We present a framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction.

Quantization

Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio

1 code implementation NeurIPS 2023 Xudong Xu, Dejan Markovic, Jacob Sandakly, Todd Keebler, Steven Krenn, Alexander Richard

While 3D human body modeling has received much attention in computer vision, modeling the acoustic equivalent, i.e., modeling 3D spatial audio produced by body motion and speech, has fallen short in the community.

Position

AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec

2 code implementations26 May 2023 Yi-Chiao Wu, Israel D. Gebru, Dejan Marković, Alexander Richard

A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e., the bitrate required to transmit the signal should be as low as possible; (2) latency, i.e., encoding and decoding the signal must be fast enough to enable communication with no, or only minimal, noticeable delay; and (3) reconstruction quality of the signal.
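
The first two properties can be illustrated with back-of-the-envelope arithmetic for a generic frame-based codec; all parameter values below are hypothetical and not taken from AudioDec itself:

```python
def codec_stats(sample_rate_hz, frame_size, codes_per_frame, bits_per_code):
    """Rough bitrate and algorithmic latency for a frame-based codec.

    A codec that emits `codes_per_frame` discrete codes of `bits_per_code`
    bits for every `frame_size` input samples has a fixed bitrate, and its
    algorithmic latency is at least one frame of buffered audio.
    """
    frames_per_second = sample_rate_hz / frame_size
    bitrate_bps = frames_per_second * codes_per_frame * bits_per_code
    latency_ms = 1000.0 * frame_size / sample_rate_hz
    return bitrate_bps, latency_ms

# Hypothetical settings: 24 kHz audio, 300-sample frames,
# 8 codebooks of 10 bits each per frame.
bitrate, latency = codec_stats(24_000, 300, 8, 10)
print(f"{bitrate / 1000:.1f} kbps, {latency:.1f} ms")  # 6.4 kbps, 12.5 ms
```

Shrinking the frame lowers latency but raises the frame rate, so the bitrate stays fixed only if the per-frame code budget shrinks with it; this is the compression/latency tension the abstract refers to.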

Novel-View Acoustic Synthesis

no code implementations CVPR 2023 Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?

Neural Rendering Novel View Synthesis

End-to-End Binaural Speech Synthesis

no code implementations8 Jul 2022 Wen Chin Huang, Dejan Markovic, Alexander Richard, Israel Dejene Gebru, Anjali Menon

In this work, we present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a powerful binaural decoder that is capable of accurate speech binauralization while faithfully reconstructing environmental factors like ambient noise or reverb.

Speech Synthesis

Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain

no code implementations30 Jun 2022 Dejan Markovic, Alexandre Defossez, Alexander Richard

We present a single-stage causal waveform-to-waveform multichannel model that can separate moving sound sources based on their broad spatial locations in a dynamic acoustic scene.

Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis

1 code implementation CVPR 2022 Karren Yang, Dejan Markovic, Steven Krenn, Vasu Agrawal, Alexander Richard

Since facial actions such as lip movements contain significant information about speech content, it is not surprising that audio-visual speech enhancement methods are more accurate than their audio-only counterparts.

Speech Enhancement

LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space

no code implementations15 Mar 2022 Emre Aksan, Shugao Ma, Akin Caliskan, Stanislav Pidhorskyi, Alexander Richard, Shih-En Wei, Jason Saragih, Otmar Hilliges

To mitigate this asymmetry, we introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.

Face Model

Conditional Diffusion Probabilistic Model for Speech Enhancement

2 code implementations10 Feb 2022 Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs.

Speech Enhancement Speech Synthesis

Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks

no code implementations7 Feb 2022 Alexander Richard, Peter Dodds, Vamsi Krishna Ithapu

Impulse response estimation in high noise and in-the-wild settings, with minimal control of the underlying data distributions, is a challenging problem.

Representation Learning

MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement

2 code implementations ICCV 2021 Alexander Richard, Michael Zollhoefer, Yandong Wen, Fernando de la Torre, Yaser Sheikh

To improve upon existing models, we propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.

3D Face Animation Disentanglement +1

Neural Synthesis of Binaural Audio

no code implementations ICLR 2021 Alexander Richard, Dejan Markovic, Israel D. Gebru, Steven Krenn, Gladstone Alexander Butler, Fernando Torre, Yaser Sheikh

We present a neural rendering approach for binaural sound synthesis that can produce realistic and spatially accurate binaural sound in real time.

Neural Rendering Position

Audio- and Gaze-driven Facial Animation of Codec Avatars

no code implementations11 Aug 2020 Alexander Richard, Colin Lea, Shugao Ma, Juergen Gall, Fernando de la Torre, Yaser Sheikh

Codec Avatars are a recent class of learned, photorealistic face models that accurately represent the geometry and texture of a person in 3D (i.e., for virtual reality), and are almost indistinguishable from video.

Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data

1 code implementation3 Jun 2019 Hilde Kuehne, Ahsan Iqbal, Alexander Richard, Juergen Gall

Action recognition has so far mainly focused on classifying hand-selected, pre-clipped actions, reaching impressive results in this field.

Action Recognition General Classification +1

Recurrent Residual Learning for Action Recognition

no code implementations27 Jun 2017 Ahsan Iqbal, Alexander Richard, Hilde Kuehne, Juergen Gall

In this work, we propose a novel recurrent ConvNet architecture called recurrent residual networks to address the task of action recognition.

Action Recognition Image Classification +1

A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition

1 code implementation23 Mar 2017 Alexander Richard, Juergen Gall

In this work, we propose a recurrent neural network that is equivalent to the traditional bag-of-words approach but enables the application of discriminative training.
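
A minimal sketch of the underlying idea: if each frame is softly assigned to a visual vocabulary, the bag-of-words histogram can be computed as a recurrence whose hidden state is the running histogram, and that recurrence is differentiable end to end. The vocabulary, features, and soft-assignment rule below are illustrative stand-ins, not the paper's actual architecture:

```python
import numpy as np

def soft_assign(frame_feats, vocab, temperature=1.0):
    """Softmax over negative squared distances to each visual word,
    making the frame-to-word assignment differentiable."""
    d = ((frame_feats[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    logits = -d / temperature
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def bow_recurrent(frame_feats, vocab):
    """Accumulate assignments frame by frame: h_t = h_{t-1} + assign(x_t).
    The final (normalized) state equals the bag-of-words histogram."""
    h = np.zeros(vocab.shape[0])
    for a in soft_assign(frame_feats, vocab):
        h = h + a
    return h / len(frame_feats)

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 8))   # 50 frames, 8-dim features
vocab = rng.normal(size=(4, 8))    # 4 visual words
hist = bow_recurrent(feats, vocab)
print(hist, hist.sum())            # a probability histogram summing to 1
```

Because every step is differentiable, gradients can flow back through the histogram into the vocabulary, which is what makes discriminative training of the "codebook" possible.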

Action Recognition General Classification +2

Weakly supervised learning of actions from transcripts

no code implementations7 Oct 2016 Hilde Kuehne, Alexander Richard, Juergen Gall

Our system is based on the idea that, given a sequence of input data and a transcript, i.e., a list of the actions in the order they occur in the video, it is possible to infer the actions within the video stream, and thus learn the related action models without the need for any frame-based annotation.
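
The ordering constraint can be illustrated with a small dynamic program that segments per-frame scores according to a given transcript. This is a toy alignment over made-up scores, not the paper's actual model:

```python
import numpy as np

def align_transcript(scores, transcript):
    """Best monotonic segmentation of T frames into the actions listed in
    `transcript`, in order, maximizing summed per-frame log-scores.
    `scores[t, a]` is the score of frame t under action a."""
    T, n = scores.shape[0], len(transcript)
    NEG = -1e18
    dp = np.full((T + 1, n + 1), NEG)
    back = np.zeros((T + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for t in range(1, T + 1):
        for k in range(1, min(t, n) + 1):
            a = transcript[k - 1]
            stay = dp[t - 1, k] + scores[t - 1, a]      # extend segment k
            start = dp[t - 1, k - 1] + scores[t - 1, a]  # start segment k
            if start > stay:
                dp[t, k], back[t, k] = start, k - 1
            else:
                dp[t, k], back[t, k] = stay, k
    # backtrace to per-frame action labels
    labels, k = [], n
    for t in range(T, 0, -1):
        labels.append(transcript[k - 1])
        k = back[t, k]
    return labels[::-1]

# Toy example: 6 frames, transcript says action 0 happens before action 1.
scores = np.array([[2., 0.], [2., 0.], [2., 0.],
                   [0., 2.], [0., 2.], [0., 2.]])
print(align_transcript(scores, [0, 1]))  # [0, 0, 0, 1, 1, 1]
```

Alternating such an alignment step with re-estimating the per-frame action models yields pseudo frame labels from nothing but the ordered transcript, which is the core of this style of weak supervision.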

Weakly-supervised Learning

Temporal Action Detection Using a Statistical Language Model

1 code implementation CVPR 2016 Alexander Richard, Juergen Gall

While current approaches to action recognition on pre-segmented video clips already achieve high accuracies, temporal action detection is still far from comparably good results.

Action Detection Action Recognition +2
