Search Results for author: Ross Cutler

Found 35 papers, 17 papers with code

ICASSP 2024 Speech Signal Improvement Challenge

no code implementations • 25 Jan 2024 • Nicolae Catalin Ristea, Ando Saabas, Ross Cutler, Babak Naderi, Sebastian Braun, Solomiya Branets

The ICASSP 2024 Speech Signal Improvement Grand Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems.

Real-time Bandwidth Estimation from Offline Expert Demonstrations

no code implementations • 23 Sep 2023 • Aashish Gottipati, Sami Khairy, Gabriel Mittag, Vishak Gopal, Ross Cutler

In this work, we tackle the problem of bandwidth estimation (BWE) for real-time communication systems; however, in contrast to previous works, we leverage the vast efforts of prior heuristic-based BWE methods and synergize these approaches with deep learning-based techniques.

ICASSP 2023 Acoustic Echo Cancellation Challenge

1 code implementation • 22 Sep 2023 • Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Evgenii Indenbom, Nicolae-Catalin Ristea, Jegor Gužvin, Hannes Gamper, Sebastian Braun, Robert Aichner

This is the fourth AEC challenge and it is enhanced by adding a second track for personalized acoustic echo cancellation, reducing the algorithmic + buffering latency to 20ms, as well as including a full-band version of AECMOS.

Acoustic echo cancellation Speech Enhancement

A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism

no code implementations • 15 Sep 2023 • Ilya Gurvich, Ido Leichter, Dharmendar Reddy Palle, Yossi Asher, Alon Vinnikov, Igor Abramovski, Vishak Gopal, Ross Cutler, Eyal Krupka

We introduce a distinctive real-time, causal, neural network-based active speaker detection system optimized for low-power edge computing.

Edge-computing

VCD: A Video Conferencing Dataset for Video Compression

1 code implementation • 14 Sep 2023 • Babak Naderi, Ross Cutler, Nabakumar Singh Khongbantabam, Yasaman Hosseinkashi, Henrik Turbell, Albert Sadovnikov, Quan Zhou

We present the Video Conferencing Dataset (VCD) for evaluating video codecs for real-time communication, the first such dataset focused on video conferencing.

Video Compression

Multi-dimensional Speech Quality Assessment in Crowdsourcing

2 code implementations • 14 Sep 2023 • Babak Naderi, Ross Cutler, Nicolae-Catalin Ristea

The commonly used standard ITU-T Rec.

Speech Enhancement

Full Reference Video Quality Assessment for Machine Learning-Based Video Codecs

no code implementations • 2 Sep 2023 • Abrar Majeedi, Babak Naderi, Yasaman Hosseinkashi, Juhee Cho, Ruben Alvarez Martinez, Ross Cutler

We also propose a new full reference video quality assessment (FRVQA) model that achieves a Pearson Correlation Coefficient (PCC) of 0.99 and a Spearman's Rank Correlation Coefficient (SRCC) of 0.99 at the model level.

Video Quality Assessment

DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo Cancellation, Noise Suppression and Dereverberation

no code implementations • 5 Jun 2023 • Evgenii Indenbom, Nicolae-Catalin Ristea, Ando Saabas, Tanel Parnamaa, Jegor Guzvin, Ross Cutler

Acoustic echo cancellation (AEC), noise suppression (NS) and dereverberation (DR) are an integral part of modern full-duplex communication systems.

Acoustic echo cancellation

Improving Meeting Inclusiveness using Speech Interruption Analysis

no code implementations • 2 Apr 2023 • Szu-Wei Fu, Yaran Fan, Yasaman Hosseinkashi, Jayant Gupchup, Ross Cutler

In order to drive adoption of its usage to improve inclusiveness (and participation), we present a machine learning-based system that predicts when a meeting participant attempts to obtain the floor, but fails to interrupt (termed a 'failed interruption').

LSTM-based Video Quality Prediction Accounting for Temporal Distortions in Videoconferencing Calls

1 code implementation • 22 Mar 2023 • Gabriel Mittag, Babak Naderi, Vishak Gopal, Ross Cutler

Using these features together with VMAF core features, our proposed model achieves a PCC of 0.99 on the validation set.

ICASSP 2023 Speech Signal Improvement Challenge

no code implementations • 12 Mar 2023 • Ross Cutler, Ando Saabas, Babak Naderi, Nicolae-Cătălin Ristea, Sebastian Braun, Solomiya Branets

The ICASSP 2023 Speech Signal Improvement Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems.

Real-time Speech Interruption Analysis: From Cloud to Client Deployment

no code implementations • 24 Oct 2022 • Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu, Zhuo Chen, Jayant Gupchup, Ross Cutler

Meetings are an essential form of communication for all types of organizations, and remote collaboration systems have been much more widely used since the COVID-19 pandemic.

A crowdsourcing approach to video quality assessment

1 code implementation • 14 Apr 2022 • Babak Naderi, Ross Cutler

P.910 is slow, expensive, and requires a lab, which all create barriers to usage.

Video Quality Assessment

ICASSP 2022 Acoustic Echo Cancellation Challenge

1 code implementation • 27 Feb 2022 • Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Hannes Gamper, Sebastian Braun, Karsten Sørensen, Robert Aichner

This is the third AEC challenge and it is enhanced by including mobile scenarios, adding speech recognition rate in the challenge goal metrics, and making the default sample rate 48 kHz.

Acoustic echo cancellation Speech Enhancement

ICASSP 2022 Deep Noise Suppression Challenge

1 code implementation • 27 Feb 2022 • Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner

We open-source datasets and test sets for researchers to train their deep noise suppression models, as well as a subjective evaluation framework based on ITU-T P.835 to rate and rank-order the challenge entries.

MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection

no code implementations • 8 Oct 2021 • Chandan K. A. Reddy, Vishak Gopa, Harishchandra Dubey, Sergiy Matusevych, Ross Cutler, Robert Aichner

With the recent growth of remote work, online meetings often encounter challenging audio contexts such as background noise, music, and echo.

Aura: Privacy-preserving Augmentation to Improve Test Set Diversity in Speech Enhancement

1 code implementation • 8 Oct 2021 • Xavier Gitiaux, Aditya Khant, Ebrahim Beyrami, Chandan Reddy, Jayant Gupchup, Ross Cutler

Noise suppression models running in production environments are commonly trained on publicly available datasets.

Privacy Preserving Speech Enhancement

Performance optimizations on deep noise suppression models

no code implementations • 8 Oct 2021 • Jerry Chee, Sebastian Braun, Vishak Gopal, Ross Cutler

We study the role of magnitude structured pruning as an architecture search to speed up the inference time of a deep noise suppression (DNS) model.

AECMOS: A speech quality assessment metric for echo impairment

1 code implementation • 6 Oct 2021 • Marju Purin, Sten Sootla, Mateja Sponza, Ando Saabas, Ross Cutler

Traditionally, the quality of acoustic echo cancellers is evaluated using intrusive speech quality assessment measures such as ERLE and PESQ, or by carrying out subjective laboratory tests.

DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

no code implementations • 5 Oct 2021 • Chandan K A Reddy, Vishak Gopal, Ross Cutler

In this work, we train an objective metric based on P.835 human ratings that outputs 3 scores: i) speech quality (SIG), ii) background noise quality (BAK), and iii) the overall quality (OVRL) of the audio.

Interspeech 2021 Deep Noise Suppression Challenge

2 code implementations • 6 Jan 2021 • Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

In this version of the challenge organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full band scenarios.

Denoising

Resonance: Replacing Software Constants with Context-Aware Models in Real-time Communication

no code implementations • 23 Nov 2020 • Jayant Gupchup, Ashkan Aazami, Yaran Fan, Senja Filipi, Tom Finley, Scott Inglis, Marcus Asteborg, Luke Caroll, Rajan Chari, Markus Cozowicz, Vishak Gopal, Vinod Prakash, Sasikanth Bendapudi, Jack Gerrits, Eric Lau, Huazhou Liu, Marco Rossi, Dima Slobodianyk, Dmitri Birjukov, Matty Cooper, Nilesh Javar, Dmitriy Perednya, Sriram Srinivasan, John Langford, Ross Cutler, Johannes Gehrke

Large software systems tune hundreds of 'constants' to optimize their runtime performance.

Friction Multi-Armed Bandits

DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors

no code implementations • 28 Oct 2020 • Chandan K A Reddy, Vishak Gopal, Ross Cutler

The no-reference approaches correlate poorly with human ratings and are not widely adopted in the research community.

Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing

1 code implementation • 25 Oct 2020 • Babak Naderi, Ross Cutler

The quality of speech communication systems that include noise suppression algorithms is typically evaluated in laboratory experiments according to the ITU-T Rec.

Crowdsourcing approach for subjective evaluation of echo impairment

1 code implementation • 25 Oct 2020 • Ross Cutler, Babak Naderi, Markus Loide, Sten Sootla, Ando Saabas

The quality of acoustic echo cancellers (AECs) in real-time communication systems is typically evaluated using objective metrics like ERLE and PESQ, and less commonly with lab-based subjective tests like ITU-T Rec.

ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets and Testing Framework

1 code implementation • 10 Sep 2020 • Kusha Sridhar, Ross Cutler, Ando Saabas, Tanel Parnamaa, Hannes Gamper, Sebastian Braun, Robert Aichner, Sriram Srinivasan

In this challenge, we open source two large datasets to train AEC models under both single talk and double talk scenarios.

Acoustic echo cancellation Audio and Speech Processing Sound

Lumos: A Library for Diagnosing Metric Regressions in Web-Scale Applications

1 code implementation • 23 Jun 2020 • Jamie Pool, Ebrahim Beyrami, Vishak Gopal, Ashkan Aazami, Jayant Gupchup, Jeff Rowland, Binlong Li, Pritesh Kanani, Ross Cutler, Johannes Gehrke

Web-scale applications can ship code on a daily to weekly cadence.

An Open source Implementation of ITU-T Recommendation P.808 with Validation

1 code implementation • 17 May 2020 • Babak Naderi, Ross Cutler

We provide an open-source implementation of the ITU-T Rec.

The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

1 code implementation • 16 May 2020 • Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke

In this challenge, we open-sourced a large clean speech and noise corpus for training the noise suppression models and a test set representative of real-world scenarios, consisting of both synthetic and real recordings.

Speech Enhancement

Multimodal active speaker detection and virtual cinematography for video conferencing

no code implementations • 10 Feb 2020 • Ross Cutler, Ramin Mehran, Sam Johnson, Cha Zhang, Adam Kirk, Oliver Whyte, Adarsh Kowdle

Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting and zooming of a video conferencing camera: users subjectively rate an expert video cinematographer's video significantly higher than unedited video.

4k BIG-bench Machine Learning

The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Speech Quality and Testing Framework

1 code implementation • 23 Jan 2020 • Chandan K. A. Reddy, Ebrahim Beyrami, Harishchandra Dubey, Vishak Gopal, Roger Cheng, Ross Cutler, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke

In this challenge, we open-source a large clean speech and noise corpus for training the noise suppression models and a test set representative of real-world scenarios, consisting of both synthetic and real recordings.

Speech Enhancement

Reinforcement learning for bandwidth estimation and congestion control in real-time communications

no code implementations • 4 Dec 2019 • Joyce Fang, Martin Ellis, Bin Li, Siyao Liu, Yasaman Hosseinkashi, Michael Revow, Albert Sadovnikov, Ziyuan Liu, Peng Cheng, Sachin Ashok, David Zhao, Ross Cutler, Yan Lu, Johannes Gehrke

Bandwidth estimation and congestion control for real-time communications (i.e., audio and video conferencing) remains a difficult problem, despite many years of research.

reinforcement-learning Reinforcement Learning (RL)

A scalable noisy speech dataset and online subjective test framework

no code implementations • 17 Sep 2019 • Chandan K. A. Reddy, Ebrahim Beyrami, Jamie Pool, Ross Cutler, Sriram Srinivasan, Johannes Gehrke

Our subjective MOS evaluation is the first large scale evaluation of Speech Enhancement algorithms that we are aware of.

Speech Enhancement

Supervised Classifiers for Audio Impairments with Noisy Labels

no code implementations • 3 Jul 2019 • Chandan K. A. Reddy, Ross Cutler, Johannes Gehrke

The user feedback after the call can act as the ground truth labels for training a supervised classifier on a large audio dataset.

On Design of Problem Token Questions in Quality of Experience Surveys

no code implementations • 19 Aug 2018 • Jayant Gupchup, Ebrahim Beyrami, Martin Ellis, Yasaman Hosseinkashi, Sam Johnson, Ross Cutler

Based on 900,000 calls gathered using a randomized controlled experiment from a live system, we find that the order bias can be significantly reduced by randomizing the display order of tokens.
