no code implementations • 16 Jan 2024 • Yang Yang, George Sung, Shao-Fu Shih, Hakan Erdogan, Chehung Lee, Matthias Grundmann
We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones.
no code implementations • 21 Aug 2023 • Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey
The model operates on transcripts and audio token sequences and achieves multiple tasks through masking of inputs.
Automatic Speech Recognition (ASR) +3
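A minimal sketch of the input-masking idea described above, assuming a joint sequence of transcript and audio tokens and a hypothetical `MASK` sentinel; the actual vocabulary and task set are not specified in the abstract:

```python
import numpy as np

MASK = -1  # hypothetical sentinel id for masked positions

def build_example(transcript_ids, audio_ids, task):
    """Select a task by masking one modality of the joint input.

    Masking the transcript yields an ASR-like task (predict text from
    audio); masking the audio tokens yields a synthesis-like task.
    """
    transcript = np.asarray(transcript_ids)
    audio = np.asarray(audio_ids)
    if task == "asr":            # hide the text, keep the audio
        inputs = np.concatenate([np.full_like(transcript, MASK), audio])
    elif task == "synthesis":    # hide the audio, keep the text
        inputs = np.concatenate([transcript, np.full_like(audio, MASK)])
    else:
        raise ValueError(task)
    targets = np.concatenate([transcript, audio])  # unmasked originals
    return inputs, targets
```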
no code implementations • 13 Mar 2023 • Yang Yang, Shao-Fu Shih, Hakan Erdogan, Jamie Menjay Lin, Chehung Lee, Yunpeng Li, George Sung, Matthias Grundmann
The multi-microphone speech enhancement problem is often decomposed into two decoupled steps: a beamformer that provides spatial filtering and a single-channel speech enhancement model that cleans up the beamformer output.
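As a rough illustration of that two-step decomposition (a sketch, not the paper's model), a toy delay-and-sum beamformer followed by an arbitrary single-channel enhancement callable:

```python
import numpy as np

def delay_and_sum(mics, delays):
    """Toy spatial filter: align each channel by an integer sample
    delay and average. `mics` is (num_mics, num_samples)."""
    aligned = [np.roll(ch, -d) for ch, d in zip(mics, delays)]
    return np.mean(aligned, axis=0)

def two_step_enhance(mics, delays, enhancement_model):
    """Decoupled pipeline: spatial filtering first, then a
    single-channel model (any callable) cleans up the output."""
    return enhancement_model(delay_and_sum(mics, delays))
```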
no code implementations • 29 Mar 2022 • Hannah Muckenhirn, Aleksandr Safin, Hakan Erdogan, Felix de Chaumont Quitry, Marco Tagliasacchi, Scott Wisdom, John R. Hershey
Typically, neural network-based speech dereverberation models are trained on paired data, composed of a dry utterance and its corresponding reverberant utterance.
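That paired-data recipe can be sketched as convolving a dry utterance with a room impulse response (RIR); details such as truncation to the original length are our assumptions:

```python
import numpy as np

def make_training_pair(dry, rir):
    """Build a (dry, reverberant) pair: the reverberant utterance is
    the dry signal convolved with a room impulse response."""
    wet = np.convolve(dry, rir)[: len(dry)]  # truncate to input length
    return dry, wet
```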
no code implementations • 30 Jun 2021 • Yuma Koizumi, Shigeki Karita, Scott Wisdom, Hakan Erdogan, John R. Hershey, Llion Jones, Michiel Bacchiani
To make the model computationally feasible, we extend the Conformer using linear complexity attention and stacked 1-D dilated depthwise convolution layers.
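A sketch of the stacked 1-D dilated depthwise convolutions mentioned above (in PyTorch; channel count, kernel size, and depth are illustrative guesses, and the linear-attention component is omitted):

```python
import torch
import torch.nn as nn

class DilatedDepthwiseStack(nn.Module):
    """Stack of 1-D depthwise convolutions with exponentially growing
    dilation, so the receptive field grows without extra parameters."""

    def __init__(self, channels=256, kernel_size=3, num_layers=4):
        super().__init__()
        layers = []
        for i in range(num_layers):
            dilation = 2 ** i
            layers += [
                nn.Conv1d(
                    channels, channels, kernel_size,
                    dilation=dilation,
                    padding=dilation * (kernel_size - 1) // 2,
                    groups=channels,  # depthwise: one filter per channel
                ),
                nn.ReLU(),
            ]
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, channels, time)
        return self.net(x)
```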
no code implementations • 1 Jun 2021 • Scott Wisdom, Aren Jansen, Ron J. Weiss, Hakan Erdogan, John R. Hershey
The best performance is achieved using larger numbers of output sources, enabled by our efficient MixIT loss, combined with sparsity losses to prevent over-separation.
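For reference, mixture invariant training (MixIT) remixes the estimated sources and scores every assignment against the two input mixtures. The brute-force version below is exponential in the number of sources, unlike the efficient loss the abstract describes, and the SNR loss is a common choice rather than a detail from the paper:

```python
import itertools
import numpy as np

def neg_snr(est, ref, eps=1e-8):
    """Negative SNR in dB (lower is better)."""
    err = ref - est
    return -10.0 * np.log10(np.sum(ref ** 2) / (np.sum(err ** 2) + eps) + eps)

def mixit_loss(est_sources, mix1, mix2):
    """Brute-force MixIT: assign each of the M estimated sources to one
    of the two reference mixtures, and keep the best total loss."""
    M = est_sources.shape[0]
    best = np.inf
    for assign in itertools.product([0, 1], repeat=M):
        s1 = sum((est_sources[i] for i in range(M) if assign[i] == 0),
                 np.zeros_like(mix1))
        s2 = sum((est_sources[i] for i in range(M) if assign[i] == 1),
                 np.zeros_like(mix2))
        best = min(best, neg_snr(s1, mix1) + neg_snr(s2, mix2))
    return best
```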
no code implementations • 5 May 2021 • Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey
We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings.
no code implementations • 17 Dec 2020 • Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen
Leveraging additional speaker information to facilitate speech separation has received increasing attention in recent years.
no code implementations • 3 Nov 2020 • Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey
Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
no code implementations • 2 Nov 2020 • Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey
We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types.
no code implementations • NeurIPS 2020 • Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron J. Weiss, Kevin Wilson, John R. Hershey
In such supervised approaches, a model is trained to predict the component sources from synthetic mixtures created by adding up isolated ground-truth sources.
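That supervised recipe is easy to sketch: sum isolated sources to form the input, keep the sources as targets, and (commonly, though the abstract does not spell this out) train with a permutation invariant loss:

```python
import itertools
import numpy as np

def synthetic_example(sources):
    """Input is the sum of isolated ground-truth sources; the targets
    are the sources themselves. `sources` is (num_sources, samples)."""
    return np.sum(sources, axis=0), sources

def pit_loss(estimates, references, loss_fn):
    """Permutation invariant training: evaluate every pairing of
    estimates to references and keep the best-scoring one."""
    n = len(references)
    return min(
        sum(loss_fn(estimates[p[i]], references[i]) for i in range(n))
        for p in itertools.permutations(range(n))
    )
```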
no code implementations • 18 Nov 2019 • Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey
This work introduces sequential neural beamforming, which alternates between neural-network-based spectral separation and beamforming-based spatial separation.
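One common form of the spatial step (the specifics here are our assumption; mask-based MVDR is standard in this line of work) estimates spatial covariances from the spectral network's masks and solves for beamformer weights per frequency:

```python
import numpy as np

def mask_based_mvdr(stft, speech_mask, noise_mask, eps=1e-8):
    """MVDR beamforming from time-frequency masks.

    stft: (mics, frames, freqs) complex spectrogram.
    speech_mask, noise_mask: (frames, freqs) values in [0, 1].
    Steering vector = principal eigenvector of the speech covariance.
    """
    mics, frames, freqs = stft.shape
    out = np.zeros((frames, freqs), dtype=complex)
    for f in range(freqs):
        X = stft[:, :, f]                                   # (mics, frames)
        phi_s = (speech_mask[:, f] * X) @ X.conj().T / frames
        phi_n = (noise_mask[:, f] * X) @ X.conj().T / frames
        phi_n += eps * np.eye(mics)                         # regularize
        d = np.linalg.eigh(phi_s)[1][:, -1]                 # steering vector
        w = np.linalg.solve(phi_n, d)
        w /= (d.conj() @ w) + eps                           # MVDR weights
        out[:, f] = w.conj() @ X
    return out
```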
no code implementations • 8 May 2019 • Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin Wilson, Jonathan Le Roux, John R. Hershey
For learnable bases, shorter windows (2.5 ms) work best on all tasks.
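In the Conv-TasNet style that "learnable bases" usually refers to (our reading of the abstract), the basis is a strided 1-D convolution whose kernel length sets the window; at 16 kHz, a 2.5 ms window is 40 samples:

```python
import torch
import torch.nn as nn

sample_rate = 16000
win = int(0.0025 * sample_rate)   # 2.5 ms window = 40 samples
encoder = nn.Conv1d(1, 256, kernel_size=win, stride=win // 2, bias=False)

x = torch.randn(1, 1, sample_rate)   # one second of audio
features = encoder(x)                # (1, 256, frames)
```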
no code implementations • 13 Apr 2019 • Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis
Speaker-independent continuous speech separation (SI-CSS) is the task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals, each of which contains no overlapping speech.
1 code implementation • 6 Nov 2018 • Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, John R. Hershey
In speech enhancement and source separation, signal-to-noise ratio is a ubiquitous objective measure of denoising/separation quality.
Sound • Audio and Speech Processing
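The scale-invariant variant (SI-SDR) that this paper analyzes projects the estimate onto the reference before comparing signal to residual; a minimal NumPy version, with zero-mean normalization as commonly done:

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB between an estimate and a reference."""
    ref = ref - ref.mean()
    est = est - est.mean()
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)  # optimal scale
    target = alpha * ref
    residual = est - target
    return 10.0 * np.log10(
        np.dot(target, target) / (np.dot(residual, residual) + eps) + eps
    )
```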
no code implementations • 8 Oct 2018 • Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao, Fil Alleva
The goal of this work is to develop a meeting transcription system that can recognize speech even when utterances of different speakers are overlapped.
no code implementations • 21 Nov 2017 • Zhong Meng, Shinji Watanabe, John R. Hershey, Hakan Erdogan
Further, we use hidden units in the deep LSTM acoustic model to assist in predicting the beamforming filter coefficients.
no code implementations • 12 Nov 2013 • Emad M. Grais, Mehmet Umut Sen, Hakan Erdogan
In the training stage, the training data for the source signals are used to train a DNN.
no code implementations • LREC 2012 • Ibrahim Saygin Topkaya, Hakan Erdogan
The main aim of collecting the SUTAV database was to obtain a large audio-visual collection of spoken words, numbers and sentences in the Turkish language.
no code implementations • 29 Jul 2011 • Ibrahim Saygin Topkaya, Hakan Erdogan
We use two different classifiers, one of which comes from a background modeling method, to generate the weight image, and we dynamically compute each classifier's contribution from its confidence to produce the final weight image used in tracking.
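A minimal sketch of that dynamic combination, assuming scalar confidences and per-pixel weight images; the paper's exact fusion rule may differ:

```python
import numpy as np

def fuse_weight_images(w1, w2, conf1, conf2, eps=1e-8):
    """Blend two classifiers' weight images, weighting each by its
    confidence, to produce the final weight image used in tracking."""
    a = conf1 / (conf1 + conf2 + eps)
    return a * w1 + (1.0 - a) * w2
```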