Search Results for author: Zhuo Chen

Found 73 papers, 21 papers with code

Disentangled Ontology Embedding for Zero-shot Learning

1 code implementation • 8 Jun 2022 • Yuxia Geng, Jiaoyan Chen, Wen Zhang, Yajing Xu, Zhuo Chen, Jeff Z. Pan, Yufeng Huang, Feiyu Xiong, Huajun Chen

In this paper, we focus on ontologies for augmenting ZSL, and propose to learn disentangled ontology embeddings guided by ontology properties to capture and utilize more fine-grained class relationships in different aspects.

Image Classification Zero-Shot Image Classification +1

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.

Self-Supervised Learning Speaker Recognition +2

Ultra Fast Speech Separation Model with Teacher Student Learning

no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

In this paper, an ultra fast speech separation Transformer model is proposed to achieve both better performance and efficiency with teacher student learning (T-S learning).

Speech Separation

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

no code implementations • 30 Mar 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription with low latency.

Automatic Speech Recognition Speaker Diarization +1

Knowledge-informed Molecular Learning: A Survey on Paradigm Transfer

no code implementations • 17 Feb 2022 • Yin Fang, Qiang Zhang, Zhuo Chen, Xiaohui Fan, Huajun Chen

Machine learning, especially deep learning, has greatly advanced molecular studies in the biochemical domain.

Molecular Property Prediction

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

no code implementations • 2 Feb 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

This paper proposes token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition

A New Image Codec Paradigm for Human and Machine Uses

no code implementations • 19 Dec 2021 • Sien Chen, Jian Jin, Lili Meng, Weisi Lin, Zhuo Chen, Tsui-Shan Chang, Zhengguang Li, Huaxiang Zhang

Meanwhile, an image predictor is designed and trained to achieve the general-quality image reconstruction with the 16-bit gray-scale profile and signal features.

Decision Making Image Compression +7

Molecular Contrastive Learning with Chemical Element Knowledge Graph

1 code implementation • 1 Dec 2021 • Yin Fang, Qiang Zhang, Haihong Yang, Xiang Zhuang, Shumin Deng, Wen Zhang, Ming Qin, Zhuo Chen, Xiaohui Fan, Huajun Chen

To address these issues, we construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements and propose a novel Knowledge-enhanced Contrastive Learning (KCL) framework for molecular representation learning.

Contrastive Learning Molecular Property Prediction +1

Continuous Speech Separation with Recurrent Selective Attention Network

no code implementations • 28 Oct 2021 • Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li

In this paper, we propose to apply recurrent selective attention network (RSAN) to CSS, which generates a variable number of output channels based on active speaker counting.

Speech Recognition Speech Separation

One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

no code implementations • 20 Oct 2021 • Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang

Experimental results show that the proposed geometry agnostic model outperforms the model trained on a specific microphone array geometry in both speech quality and automatic speech recognition accuracy.

Automatic Speech Recognition Speech Enhancement

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

no code implementations • 18 Oct 2021 • Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang

Our results show that the proposed models can yield better speech recognition accuracy, speech intelligibility, and perceptual quality than the baseline models, and the multi-task training can alleviate the TSOS issue in addition to improving the speech recognition accuracy.

Speech Enhancement Speech Recognition

All-neural beamformer for continuous speech separation

no code implementations • 13 Oct 2021 • Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

Recently, the all deep learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input.

Automatic Speech Recognition Speech Extraction

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

no code implementations • 12 Oct 2021 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR

no code implementations • 7 Oct 2021 • Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Similar to the target-speaker voice activity detection (TS-VAD)-based diarization method, the E2E SA-ASR model is applied to estimate speech activity of each speaker while it has the advantages of (i) handling unlimited number of speakers, (ii) leveraging linguistic information for speaker diarization, and (iii) simultaneously generating speaker-attributed transcriptions.

Action Detection Activity Detection +3

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

no code implementations • 17 Sep 2021 • Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li

Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions.

Speech Separation

Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker

no code implementations • 7 Aug 2021 • Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe

Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech.

Action Detection Activity Detection +2

Spacetime Neural Network for High Dimensional Quantum Dynamics

no code implementations • 4 Aug 2021 • Jiangran Wang, Zhuo Chen, Di Luo, Zhizhen Zhao, Vera Mikyoung Hur, Bryan K. Clark

We develop a spacetime neural network method with second order optimization for solving quantum dynamics from the high dimensional Schrödinger equation.

Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs

no code implementations • 8 Jul 2021 • Yikang Zhang, Zhuo Chen, Zhao Zhong

Our method achieves the state-of-the-art performance on ImageNet, 80.7% top-1 accuracy with 194M FLOPs.

Image Classification

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio

no code implementations • 6 Jul 2021 • Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Our evaluation on the AMI meeting corpus reveals that after fine-tuning with a small real data, the joint system performs 8.9–29.9% better in accuracy compared to the best modular system while the modular system performs better before such fine-tuning.

Automatic Speech Recognition Representation Learning +2

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

no code implementations • 5 Jul 2021 • Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR).

Automatic Speech Recognition Model Compression +1

Modeling and Reasoning in Event Calculus using Goal-Directed Constraint Answer Set Programming

no code implementations • 28 Jun 2021 • Joaquín Arias, Manuel Carro, Zhuo Chen, Gopal Gupta

Automated commonsense reasoning is essential for building human-like AI systems featuring, for example, explainable AI.

Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement

no code implementations • 5 Jun 2021 • Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka

Performance analysis is also carried out by changing the ASR model, the data used for the ASR-step, and the schedule of the two update steps.

Automatic Speech Recognition Speech Enhancement

End-to-End Speaker-Attributed ASR with Transformer

no code implementations • 5 Apr 2021 • Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

This paper presents our recent effort on end-to-end speaker-attributed automatic speech recognition, which jointly performs speaker counting, speech recognition and speaker identification for monaural multi-talker audio.

Automatic Speech Recognition Speaker Identification

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

no code implementations • 31 Mar 2021 • Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR).

Automatic Speech Recognition

Continuous Speech Separation with Ad Hoc Microphone Arrays

no code implementations • 3 Mar 2021 • Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng

Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of the ad hoc array.

Speech Recognition Speech Separation

Knowledge-aware Zero-Shot Learning: Survey and Perspective

1 code implementation • 26 Feb 2021 • Jiaoyan Chen, Yuxia Geng, Zhuo Chen, Ian Horrocks, Jeff Z. Pan, Huajun Chen

Zero-shot learning (ZSL) which aims at predicting classes that have never appeared during the training using external knowledge (a.k.a.

Zero-Shot Learning

OntoZSL: Ontology-enhanced Zero-shot Learning

1 code implementation • 15 Feb 2021 • Yuxia Geng, Jiaoyan Chen, Zhuo Chen, Jeff Z. Pan, Zhiquan Ye, Zonggang Yuan, Yantao Jia, Huajun Chen

The key to implementing ZSL is to leverage the prior knowledge of classes, which builds the semantic relationship between classes and enables the transfer of the learned models (e.g., features) from training classes (i.e., seen classes) to unseen classes.

Image Classification Knowledge Graph Completion +2

Gauge Invariant Autoregressive Neural Networks for Quantum Lattice Models

no code implementations • 18 Jan 2021 • Di Luo, Zhuo Chen, Kaiwen Hu, Zhizhen Zhao, Vera Mikyoung Hur, Bryan K. Clark

Gauge invariance plays a crucial role in quantum mechanics from condensed matter physics to high energy physics.

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

1 code implementation • 3 Nov 2020 • Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka

Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech.

Automatic Speech Recognition Speaker Identification

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

1 code implementation • 23 Oct 2020 • Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li

With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently.

Speech Separation

Speaker Separation Using Speaker Inventories and Estimated Speech

no code implementations • 20 Oct 2020 • Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong

We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework leveraging speaker profiles and estimated speech for speaker separation.

Speaker Separation Speech Extraction +1

An End-to-end Architecture of Online Multi-channel Speech Separation

no code implementations • 7 Sep 2020 • Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie

Previously, we introduced a system, called unmixing, fixed-beamformer and extraction (UFE), that was shown to be effective in addressing the speech overlap problem in conversation transcription.

Speech Recognition Speech Separation

Brain Stroke Lesion Segmentation Using Consistent Perception Generative Adversarial Network

no code implementations • 30 Aug 2020 • Shuqiang Wang, Zhuo Chen, Wen Yu, Baiying Lei

The assistant network and the discriminator are employed to jointly decide whether the segmentation results are real or fake.

Lesion Segmentation

Continuous Speech Separation with Conformer

1 code implementation • 13 Aug 2020 • Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Continuous speech separation plays a vital role in complicated speech related tasks such as conversation transcription.

Speech Separation

Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings

1 code implementation • 11 Aug 2020 • Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

However, the model required prior knowledge of speaker profiles to perform speaker identification, which significantly limited the application of the model.

Automatic Speech Recognition Speaker Identification

Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles

no code implementations • 27 Jul 2020 • Yuxin Lu, Peng Cheng, Zhuo Chen, Wai Ho Mow, Yonghui Li, Branka Vucetic

We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.

Multi-Task Learning

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers

no code implementations • 19 Jun 2020 • Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka

We propose an end-to-end speaker-attributed automatic speech recognition model that unifies speaker counting, speech recognition, and speaker identification on monaural overlapped speech.

Automatic Speech Recognition Speaker Identification

PuppeteerGAN: Arbitrary Portrait Animation With Semantic-Aware Appearance Transformation

no code implementations • CVPR 2020 • Zhuo Chen, Chaoyue Wang, Bo Yuan, Dacheng Tao

Portrait animation, which aims to animate a still portrait to life using poses extracted from target frames, is an important technique for many real-world entertainment applications.

Semantic Segmentation

Neural Speech Separation Using Spatially Distributed Microphones

no code implementations • 28 Apr 2020 • Dongmei Wang, Zhuo Chen, Takuya Yoshioka

The inter-channel processing layers apply a self-attention mechanism along the channel dimension to exploit the information obtained with a varying number of microphones.

Speech Recognition Speech Separation

Generative Adversarial Zero-shot Learning via Knowledge Graphs

no code implementations • 7 Apr 2020 • Yuxia Geng, Jiaoyan Chen, Zhuo Chen, Zhiquan Ye, Zonggang Yuan, Yantao Jia, Huajun Chen

However, the side information of classes used now is limited to text descriptions and attribute annotations, which fall short of capturing the semantics of the classes.

Image Classification Knowledge Graphs +1

Continuous speech separation: dataset and analysis

1 code implementation • 30 Jan 2020 • Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li

In this paper, we define continuous speech separation (CSS) as a task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances that are partially overlapped by a varying degree.

Automatic Speech Recognition Speech Separation

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement

no code implementations • 18 Nov 2019 • Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey

This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation.

Speaker Separation Speech Enhancement +2

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation

2 code implementations • 30 Oct 2019 • Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka

An important problem in ad-hoc microphone speech separation is how to guarantee the robustness of a system with respect to the locations and numbers of microphones.

Speech Separation

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation

6 code implementations • 14 Oct 2019 • Yi Luo, Zhuo Chen, Takuya Yoshioka

Recent studies in deep learning-based speech separation have proven the superiority of time-domain approaches to conventional time-frequency-based methods.

Speech Separation

A Learning-Based Two-Stage Spectrum Sharing Strategy with Multiple Primary Transmit Power Levels

no code implementations • 21 Jul 2019 • Rui Zhang, Peng Cheng, Zhuo Chen, Yonghui Li, Branka Vucetic

Then, based on a novel normalized power level alignment metric, we propose two prediction-transmission structures, namely periodic and non-periodic, for spectrum access (the second part in Stage II), which enable the secondary transmitter (ST) to closely follow the PT power level variation.

PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch

1 code implementation • 12 Jul 2019 • Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong

While similar toolkits are available built on top of the two, a key feature of PyKaldi2 is sequence training with criteria such as MMI, sMBR and MPE.

Speech Recognition

ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection

1 code implementation • 19 Jun 2019 • Zhuo Chen, Jiyuan Zhang, Ruizhou Ding, Diana Marculescu

In this paper, we propose Virtual Pooling (ViP), a model-level approach to improve speed and energy consumption of CNN-based image classification and object detection tasks, with a provable error bound.

General Classification Image Classification +2

Low-Latency Speaker-Independent Continuous Speech Separation

no code implementations • 13 Apr 2019 • Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis

Speaker independent continuous speech separation (SI-CSS) is a task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals each of which contains no overlapping speech segment.

Speech Recognition Speech Separation

Understanding the Impact of Label Granularity on CNN-based Image Classification

1 code implementation • 21 Jan 2019 • Zhuo Chen, Ruizhou Ding, Ting-Wu Chin, Diana Marculescu

In this paper, we conduct extensive experiments using various datasets to demonstrate and analyze how and why training based on fine-grain labeling, such as "Persian cat" can improve CNN accuracy on classifying coarse-grain classes, in this case "cat."

General Classification Image Classification

Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks

no code implementations • 8 Oct 2018 • Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao, Fil Alleva

The goal of this work is to develop a meeting transcription system that can recognize speech even when utterances of different speakers are overlapped.

Speech Recognition Speech Separation

Intermediate Deep Feature Compression: the Next Battlefield of Intelligent Sensing

no code implementations • 17 Sep 2018 • Zhuo Chen, Weisi Lin, Shiqi Wang, Ling-Yu Duan, Alex C. Kot

The recent advances of hardware technology have made the intelligent analysis equipped at the front-end with deep learning more prevailing and practical.

Data Compression

Developing Far-Field Speaker System Via Teacher-Student Learning

no code implementations • 14 Apr 2018 • Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong

In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.

Keyword Spotting Model Compression

Speaker-Invariant Training via Adversarial Learning

no code implementations • 2 Apr 2018 • Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang Juang

We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system.

General Classification Multi-Task Learning

Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition

no code implementations • 21 Nov 2017 • Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong

Unsupervised domain adaptation of speech signal aims at adapting a well-trained source-domain acoustic model to the unlabeled data from target domain.

Automatic Speech Recognition General Classification +2

Image Quality Assessment Guided Deep Neural Networks Training

3 code implementations • 13 Aug 2017 • Zhuo Chen, Weisi Lin, Shiqi Wang, Long Xu, Leida Li

For many computer vision problems, the deep neural networks are trained and validated based on the assumption that the input images are pristine (i.e., artifact-free).

Data Augmentation Image Classification +1

Improving Adherence to Heart Failure Management Guidelines via Abductive Reasoning

no code implementations • 16 Jul 2017 • Zhuo Chen, Elmer Salazar, Kyle Marple, Gopal Gupta, Lakshman Tamil, Sandeep Das, Alpesh Amin

A standard approach to managing chronic diseases by the medical community is to have a committee of experts develop guidelines that all physicians should follow.

Speaker-independent Speech Separation with Deep Attractor Network

no code implementations • 12 Jul 2017 • Yi Luo, Zhuo Chen, Nima Mesgarani

A reference point attractor is created in the embedding space to represent each speaker which is defined as the centroid of the speaker in the embedding space.

Speech Separation

End-to-End Attention based Text-Dependent Speaker Verification

no code implementations • 3 Jan 2017 • Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Yifan Gong

A new type of End-to-End system for text-dependent speaker verification is presented in this paper.

Text-Dependent Speaker Verification

Deep attractor network for single-microphone speaker separation

no code implementations • 27 Nov 2016 • Zhuo Chen, Yi Luo, Nima Mesgarani

We propose a novel deep learning framework for single channel speech separation by creating attractor points in high dimensional embedding space of the acoustic signals which pull together the time-frequency bins corresponding to each source.

Speaker Separation Speech Separation

Deep Clustering and Conventional Networks for Music Separation: Stronger Together

no code implementations • 18 Nov 2016 • Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani

Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks.

Deep Clustering Multi-Task Learning +2

A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns

no code implementations • 25 Oct 2016 • Zhuo Chen, Kyle Marple, Elmer Salazar, Gopal Gupta, Lakshman Tamil

In this paper we describe a physician-advisory system for CHF management that codes the entire set of clinical practice guidelines for CHF using answer set programming.

Single-Channel Multi-Speaker Separation using Deep Clustering

2 code implementations • 7 Jul 2016 • Yusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey

In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task.

Automatic Speech Recognition Deep Clustering +2

Deep clustering: Discriminative embeddings for segmentation and separation

7 code implementations • 18 Aug 2015 • John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe

The framework can be used without class labels, and therefore has the potential to be trained on a diverse set of sound types, and to generalize to novel sources.

Deep Clustering Semantic Segmentation +1
