1 code implementation • 13 Nov 2024 • Ross Cutler, Babak Naderi, Vishak Gopal, Dharmendar Palle
We provide an open source test framework to subjectively measure photorealistic avatar performance in ten dimensions: realism, trust, comfortableness using, comfortableness interacting with, appropriateness for work, creepiness, formality, affinity, resemblance to the person, and emotion accuracy.
1 code implementation • 29 Oct 2024 • Yaran Fan, Jamie Pool, Senja Filipi, Ross Cutler
Workplace meetings are vital to organizational collaboration, yet a large percentage of meetings are rated as ineffective.
1 code implementation • 25 Jan 2024 • Nicolae Catalin Ristea, Ando Saabas, Ross Cutler, Babak Naderi, Sebastian Braun, Solomiya Branets
The ICASSP 2024 Speech Signal Improvement Grand Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems.
no code implementations • 23 Sep 2023 • Aashish Gottipati, Sami Khairy, Gabriel Mittag, Vishak Gopal, Ross Cutler
The cloned policy can then be seamlessly tailored to end user network conditions through online finetuning.
1 code implementation • 22 Sep 2023 • Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Evgenii Indenbom, Nicolae-Catalin Ristea, Jegor Gužvin, Hannes Gamper, Sebastian Braun, Robert Aichner
This is the fourth AEC challenge and it is enhanced by adding a second track for personalized acoustic echo cancellation, reducing the algorithmic + buffering latency to 20 ms, as well as including a full-band version of AECMOS.
no code implementations • 15 Sep 2023 • Ilya Gurvich, Ido Leichter, Dharmendar Reddy Palle, Yossi Asher, Alon Vinnikov, Igor Abramovski, Vishak Gopal, Ross Cutler, Eyal Krupka
We introduce a distinctive real-time, causal, neural network-based active speaker detection system optimized for low-power edge computing.
2 code implementations • 14 Sep 2023 • Babak Naderi, Ross Cutler, Nicolae-Catalin Ristea
The commonly used standard ITU-T Rec.
1 code implementation • 14 Sep 2023 • Babak Naderi, Ross Cutler, Nabakumar Singh Khongbantabam, Yasaman Hosseinkashi, Henrik Turbell, Albert Sadovnikov, Quan Zhou
We present the Video Conferencing Dataset (VCD) for evaluating video codecs for real-time communication, the first such dataset focused on video conferencing.
no code implementations • 2 Sep 2023 • Abrar Majeedi, Babak Naderi, Yasaman Hosseinkashi, Juhee Cho, Ruben Alvarez Martinez, Ross Cutler
We also propose a new full reference video quality assessment (FRVQA) model that achieves a Pearson Correlation Coefficient (PCC) of 0.99 and a Spearman's Rank Correlation Coefficient (SRCC) of 0.99 at the model level.
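As a rough illustration of the two metrics named above, the following sketch computes PCC and SRCC between subjective and predicted scores using only the Python standard library. The function names and any data passed to them are hypothetical; this is not the paper's evaluation code, and "model level" is assumed here to mean that each input value is a score already averaged per system under test.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): PCC computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

PCC measures how linearly the predictions track the subjective scores, while SRCC only checks that the ordering is preserved, so a predictor that ranks every system correctly but on a nonlinear scale would score a perfect SRCC and a lower PCC.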
no code implementations • 5 Jun 2023 • Evgenii Indenbom, Nicolae-Catalin Ristea, Ando Saabas, Tanel Parnamaa, Jegor Guzvin, Ross Cutler
Acoustic echo cancellation (AEC), noise suppression (NS) and dereverberation (DR) are an integral part of modern full-duplex communication systems.
no code implementations • 2 Apr 2023 • Szu-Wei Fu, Yaran Fan, Yasaman Hosseinkashi, Jayant Gupchup, Ross Cutler
In order to drive adoption of its usage to improve inclusiveness (and participation), we present a machine learning-based system that predicts when a meeting participant attempts to obtain the floor, but fails to interrupt (termed a 'failed interruption').
1 code implementation • 22 Mar 2023 • Gabriel Mittag, Babak Naderi, Vishak Gopal, Ross Cutler
Using these features together with VMAF core features, our proposed model achieves a PCC of 0.99 on the validation set.
no code implementations • 12 Mar 2023 • Ross Cutler, Ando Saabas, Babak Naderi, Nicolae-Cătălin Ristea, Sebastian Braun, Solomiya Branets
The ICASSP 2023 Speech Signal Improvement Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems.
no code implementations • 24 Oct 2022 • Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu, Zhuo Chen, Jayant Gupchup, Ross Cutler
Meetings are an essential form of communication for all types of organizations, and remote collaboration systems have been much more widely used since the COVID-19 pandemic.
1 code implementation • 14 Apr 2022 • Babak Naderi, Ross Cutler
P.910 is slow, expensive, and requires a lab, which all create barriers to usage.
1 code implementation • 27 Feb 2022 • Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Hannes Gamper, Sebastian Braun, Karsten Sørensen, Robert Aichner
This is the third AEC challenge and it is enhanced by including mobile scenarios, adding speech recognition rate in the challenge goal metrics, and making the default sample rate 48 kHz.
1 code implementation • 27 Feb 2022 • Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner
We open-source datasets and test sets for researchers to train their deep noise suppression models, as well as a subjective evaluation framework based on ITU-T P.835 to rate and rank-order the challenge entries.
no code implementations • 8 Oct 2021 • Jerry Chee, Sebastian Braun, Vishak Gopal, Ross Cutler
We study the role of magnitude structured pruning as an architecture search to speed up the inference time of a deep noise suppression (DNS) model.
no code implementations • 8 Oct 2021 • Chandan K. A. Reddy, Vishak Gopal, Harishchandra Dubey, Sergiy Matusevych, Ross Cutler, Robert Aichner
With the recent growth of remote work, online meetings often encounter challenging audio contexts such as background noise, music, and echo.
1 code implementation • 8 Oct 2021 • Xavier Gitiaux, Aditya Khant, Ebrahim Beyrami, Chandan Reddy, Jayant Gupchup, Ross Cutler
Noise suppression models running in production environments are commonly trained on publicly available datasets.
1 code implementation • 6 Oct 2021 • Marju Purin, Sten Sootla, Mateja Sponza, Ando Saabas, Ross Cutler
Traditionally, the quality of acoustic echo cancellers is evaluated using intrusive speech quality assessment measures such as ERLE (ITU-T G.168) and PESQ (ITU-T P.862), or by carrying out subjective laboratory tests.
1 code implementation • 5 Oct 2021 • Chandan K A Reddy, Vishak Gopal, Ross Cutler
In this work, we train an objective metric based on P.835 human ratings that outputs 3 scores: i) speech quality (SIG), ii) background noise quality (BAK), and iii) the overall quality (OVRL) of the audio.
2 code implementations • 6 Jan 2021 • Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan
In this version of the challenge organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full band scenarios.
no code implementations • 23 Nov 2020 • Jayant Gupchup, Ashkan Aazami, Yaran Fan, Senja Filipi, Tom Finley, Scott Inglis, Marcus Asteborg, Luke Caroll, Rajan Chari, Markus Cozowicz, Vishak Gopal, Vinod Prakash, Sasikanth Bendapudi, Jack Gerrits, Eric Lau, Huazhou Liu, Marco Rossi, Dima Slobodianyk, Dmitri Birjukov, Matty Cooper, Nilesh Javar, Dmitriy Perednya, Sriram Srinivasan, John Langford, Ross Cutler, Johannes Gehrke
Large software systems tune hundreds of 'constants' to optimize their runtime performance.
no code implementations • 28 Oct 2020 • Chandan K A Reddy, Vishak Gopal, Ross Cutler
The no-reference approaches correlate poorly with human ratings and are not widely adopted in the research community.
1 code implementation • 25 Oct 2020 • Babak Naderi, Ross Cutler
The quality of speech communication systems, which include noise suppression algorithms, is typically evaluated in laboratory experiments according to the ITU-T Rec.
1 code implementation • 25 Oct 2020 • Ross Cutler, Babak Naderi, Markus Loide, Sten Sootla, Ando Saabas
The quality of acoustic echo cancellers (AECs) in real-time communication systems is typically evaluated using objective metrics like ERLE and PESQ, and less commonly with lab-based subjective tests like ITU-T Rec.
1 code implementation • 10 Sep 2020 • Kusha Sridhar, Ross Cutler, Ando Saabas, Tanel Parnamaa, Hannes Gamper, Sebastian Braun, Robert Aichner, Sriram Srinivasan
In this challenge, we open source two large datasets to train AEC models under both single talk and double talk scenarios.
1 code implementation • 23 Jun 2020 • Jamie Pool, Ebrahim Beyrami, Vishak Gopal, Ashkan Aazami, Jayant Gupchup, Jeff Rowland, Binlong Li, Pritesh Kanani, Ross Cutler, Johannes Gehrke
Web-scale applications can ship code on a daily to weekly cadence.
1 code implementation • 17 May 2020 • Babak Naderi, Ross Cutler
We provide an open-source implementation of the ITU-T Rec.
1 code implementation • 16 May 2020 • Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke
In this challenge, we open-sourced a large clean speech and noise corpus for training noise suppression models, along with a test set representative of real-world scenarios that consists of both synthetic and real recordings.
no code implementations • 10 Feb 2020 • Ross Cutler, Ramin Mehran, Sam Johnson, Cha Zhang, Adam Kirk, Oliver Whyte, Adarsh Kowdle
Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting, and zooming a video conferencing camera: users subjectively rate an expert video cinematographer's video significantly higher than unedited video.
1 code implementation • 23 Jan 2020 • Chandan K. A. Reddy, Ebrahim Beyrami, Harishchandra Dubey, Vishak Gopal, Roger Cheng, Ross Cutler, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke
In this challenge, we open-source a large clean speech and noise corpus for training noise suppression models, along with a test set representative of real-world scenarios that consists of both synthetic and real recordings.
no code implementations • 4 Dec 2019 • Joyce Fang, Martin Ellis, Bin Li, Siyao Liu, Yasaman Hosseinkashi, Michael Revow, Albert Sadovnikov, Ziyuan Liu, Peng Cheng, Sachin Ashok, David Zhao, Ross Cutler, Yan Lu, Johannes Gehrke
Bandwidth estimation and congestion control for real-time communications (i.e., audio and video conferencing) remains a difficult problem, despite many years of research.
no code implementations • 17 Sep 2019 • Chandan K. A. Reddy, Ebrahim Beyrami, Jamie Pool, Ross Cutler, Sriram Srinivasan, Johannes Gehrke
Our subjective MOS evaluation is the first large scale evaluation of Speech Enhancement algorithms that we are aware of.
no code implementations • 3 Jul 2019 • Chandan K. A. Reddy, Ross Cutler, Johannes Gehrke
The user feedback after the call can act as the ground truth labels for training a supervised classifier on a large audio dataset.
no code implementations • 19 Aug 2018 • Jayant Gupchup, Ebrahim Beyrami, Martin Ellis, Yasaman Hosseinkashi, Sam Johnson, Ross Cutler
Based on 900,000 calls gathered using a randomized controlled experiment from a live system, we find that the order bias can be significantly reduced by randomizing the display order of tokens.
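The randomization described above can be sketched as a per-call shuffle of the displayed options. This is a minimal illustration only, assuming the tokens are a fixed list of feedback options; the function name, the option labels, and the seeding scheme are hypothetical and are not taken from the paper.

```python
import random


def randomized_display_order(tokens, rng=None):
    """Return a shuffled copy of `tokens`; the input list is left unchanged."""
    rng = rng or random.Random()
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled


def first_position_counts(tokens, num_calls, seed=0):
    """Count how often each token lands in the first slot across many calls."""
    rng = random.Random(seed)  # fixed seed only so the simulation is repeatable
    counts = {token: 0 for token in tokens}
    for _ in range(num_calls):
        counts[randomized_display_order(tokens, rng)[0]] += 1
    return counts


# Hypothetical option labels, used purely for illustration.
options = ["echo", "noise", "distorted speech", "low volume"]
print(first_position_counts(options, num_calls=10_000))
```

With a fixed order, whichever option is listed first may be selected more often simply because of its position. Shuffling per call spreads each option roughly evenly across positions, so a position effect averages out over many calls instead of favoring one option.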