Search Results for author: Ross Cutler

Found 35 papers, 15 papers with code

ICASSP 2024 Speech Signal Improvement Challenge

no code implementations • 25 Jan 2024 • Nicolae Catalin Ristea, Ando Saabas, Ross Cutler, Babak Naderi, Sebastian Braun, Solomiya Branets

The ICASSP 2024 Speech Signal Improvement Grand Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems.

Real-time Bandwidth Estimation from Offline Expert Demonstrations

no code implementations • 23 Sep 2023 • Aashish Gottipati, Sami Khairy, Gabriel Mittag, Vishak Gopal, Ross Cutler

In this work, we tackle the problem of bandwidth estimation (BWE) for real-time communication systems; however, in contrast to previous works, we leverage the vast efforts of prior heuristic-based BWE methods and synergize these approaches with deep learning-based techniques.

ICASSP 2023 Acoustic Echo Cancellation Challenge

1 code implementation • 22 Sep 2023 • Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Evgenii Indenbom, Nicolae-Catalin Ristea, Jegor Gužvin, Hannes Gamper, Sebastian Braun, Robert Aichner

This is the fourth AEC challenge and it is enhanced by adding a second track for personalized acoustic echo cancellation, reducing the algorithmic + buffering latency to 20ms, as well as including a full-band version of AECMOS.

Acoustic echo cancellation, Speech Enhancement

VCD: A Video Conferencing Dataset for Video Compression

no code implementations • 14 Sep 2023 • Babak Naderi, Ross Cutler, Nabakumar Singh Khongbantabam, Yasaman Hosseinkashi, Henrik Turbell, Albert Sadovnikov, Quan Zhou

We present the Video Conferencing Dataset (VCD) for evaluating video codecs for real-time communication, the first such dataset focused on video conferencing.

Video Compression

Full Reference Video Quality Assessment for Machine Learning-Based Video Codecs

no code implementations • 2 Sep 2023 • Abrar Majeedi, Babak Naderi, Yasaman Hosseinkashi, Juhee Cho, Ruben Alvarez Martinez, Ross Cutler

We also propose a new full reference video quality assessment (FRVQA) model that achieves a Pearson Correlation Coefficient (PCC) of 0.99 and a Spearman's Rank Correlation Coefficient (SRCC) of 0.99 at the model level.

Video Quality Assessment

Improving Meeting Inclusiveness using Speech Interruption Analysis

no code implementations • 2 Apr 2023 • Szu-Wei Fu, Yaran Fan, Yasaman Hosseinkashi, Jayant Gupchup, Ross Cutler

In order to drive adoption of its usage to improve inclusiveness (and participation), we present a machine learning-based system that predicts when a meeting participant attempts to obtain the floor, but fails to interrupt (termed a 'failed interruption').

LSTM-based Video Quality Prediction Accounting for Temporal Distortions in Videoconferencing Calls

1 code implementation • 22 Mar 2023 • Gabriel Mittag, Babak Naderi, Vishak Gopal, Ross Cutler

Using these features together with VMAF core features, our proposed model achieves a PCC of 0.99 on the validation set.

ICASSP 2023 Speech Signal Improvement Challenge

no code implementations • 12 Mar 2023 • Ross Cutler, Ando Saabas, Babak Naderi, Nicolae-Cătălin Ristea, Sebastian Braun, Solomiya Branets

The ICASSP 2023 Speech Signal Improvement Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems.

Real-time Speech Interruption Analysis: From Cloud to Client Deployment

no code implementations • 24 Oct 2022 • Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu, Zhuo Chen, Jayant Gupchup, Ross Cutler

Meetings are an essential form of communication for all types of organizations, and remote collaboration systems have been much more widely used since the COVID-19 pandemic.

A crowdsourcing approach to video quality assessment

1 code implementation • 14 Apr 2022 • Babak Naderi, Ross Cutler

P.910 is slow, expensive, and requires a lab, which all create barriers to usage.

Video Quality Assessment

ICASSP 2022 Acoustic Echo Cancellation Challenge

1 code implementation • 27 Feb 2022 • Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Hannes Gamper, Sebastian Braun, Karsten Sørensen, Robert Aichner

This is the third AEC challenge and it is enhanced by including mobile scenarios, adding speech recognition rate in the challenge goal metrics, and making the default sample rate 48 kHz.

Acoustic echo cancellation, Speech Enhancement

ICASSP 2022 Deep Noise Suppression Challenge

1 code implementation • 27 Feb 2022 • Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner

We open-source datasets and test sets for researchers to train their deep noise suppression models, as well as a subjective evaluation framework based on ITU-T P.835 to rate and rank-order the challenge entries.

MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection

no code implementations • 8 Oct 2021 • Chandan K. A. Reddy, Vishak Gopal, Harishchandra Dubey, Sergiy Matusevych, Ross Cutler, Robert Aichner

With the recent growth of remote work, online meetings often encounter challenging audio contexts such as background noise, music, and echo.

Performance optimizations on deep noise suppression models

no code implementations • 8 Oct 2021 • Jerry Chee, Sebastian Braun, Vishak Gopal, Ross Cutler

We study the role of magnitude structured pruning as an architecture search to speed up the inference time of a deep noise suppression (DNS) model.

AECMOS: A speech quality assessment metric for echo impairment

no code implementations • 6 Oct 2021 • Marju Purin, Sten Sootla, Mateja Sponza, Ando Saabas, Ross Cutler

Traditionally, the quality of acoustic echo cancellers is evaluated using intrusive speech quality assessment measures such as ERLE [G.168] and PESQ [P.862], or by carrying out subjective laboratory tests.

DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

no code implementations • 5 Oct 2021 • Chandan K A Reddy, Vishak Gopal, Ross Cutler

In this work, we train an objective metric based on P.835 human ratings that outputs 3 scores: i) speech quality (SIG), ii) background noise quality (BAK), and iii) the overall quality (OVRL) of the audio.

Interspeech 2021 Deep Noise Suppression Challenge

2 code implementations • 6 Jan 2021 • Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

In this version of the challenge organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full band scenarios.

Denoising

DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors

no code implementations • 28 Oct 2020 • Chandan K A Reddy, Vishak Gopal, Ross Cutler

The no-reference approaches correlate poorly with human ratings and are not widely adopted in the research community.

Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing

1 code implementation • 25 Oct 2020 • Babak Naderi, Ross Cutler

The quality of speech communication systems, which include noise suppression algorithms, is typically evaluated in laboratory experiments according to the ITU-T Rec.

Crowdsourcing approach for subjective evaluation of echo impairment

1 code implementation • 25 Oct 2020 • Ross Cutler, Babak Naderi, Markus Loide, Sten Sootla, Ando Saabas

The quality of acoustic echo cancellers (AECs) in real-time communication systems is typically evaluated using objective metrics like ERLE and PESQ, and less commonly with lab-based subjective tests like ITU-T Rec.

ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets and Testing Framework

1 code implementation • 10 Sep 2020 • Kusha Sridhar, Ross Cutler, Ando Saabas, Tanel Parnamaa, Hannes Gamper, Sebastian Braun, Robert Aichner, Sriram Srinivasan

In this challenge, we open source two large datasets to train AEC models under both single talk and double talk scenarios.

Acoustic echo cancellation, Audio and Speech Processing, Sound

The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

1 code implementation • 16 May 2020 • Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke

In this challenge, we open-sourced a large clean speech and noise corpus for training the noise suppression models and a test set representative of real-world scenarios, consisting of both synthetic and real recordings.

Speech Enhancement

Multimodal active speaker detection and virtual cinematography for video conferencing

no code implementations • 10 Feb 2020 • Ross Cutler, Ramin Mehran, Sam Johnson, Cha Zhang, Adam Kirk, Oliver Whyte, Adarsh Kowdle

Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting, and zooming a video conferencing camera: users subjectively rate an expert video cinematographer's video significantly higher than unedited video.

BIG-bench Machine Learning

The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Speech Quality and Testing Framework

1 code implementation • 23 Jan 2020 • Chandan K. A. Reddy, Ebrahim Beyrami, Harishchandra Dubey, Vishak Gopal, Roger Cheng, Ross Cutler, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke

In this challenge, we open-source a large clean speech and noise corpus for training the noise suppression models and a test set representative of real-world scenarios, consisting of both synthetic and real recordings.

Speech Enhancement

A scalable noisy speech dataset and online subjective test framework

no code implementations • 17 Sep 2019 • Chandan K. A. Reddy, Ebrahim Beyrami, Jamie Pool, Ross Cutler, Sriram Srinivasan, Johannes Gehrke

Our subjective MOS evaluation is the first large scale evaluation of Speech Enhancement algorithms that we are aware of.

Speech Enhancement

Supervised Classifiers for Audio Impairments with Noisy Labels

no code implementations • 3 Jul 2019 • Chandan K. A. Reddy, Ross Cutler, Johannes Gehrke

The user feedback after the call can act as the ground truth labels for training a supervised classifier on a large audio dataset.

On Design of Problem Token Questions in Quality of Experience Surveys

no code implementations • 19 Aug 2018 • Jayant Gupchup, Ebrahim Beyrami, Martin Ellis, Yasaman Hosseinkashi, Sam Johnson, Ross Cutler

Based on 900,000 calls gathered using a randomized controlled experiment from a live system, we find that the order bias can be significantly reduced by randomizing the display order of tokens.
