Search Results for author: Zhuo Chen

Found 112 papers, 40 papers with code

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

no code implementations · 15 Sep 2023 · Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li

Token-level serialized output training (t-SOT) was recently proposed to address the challenge of streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

no code implementations · 14 Aug 2023 · Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech.

Language Modelling · Multi-Task Learning +2

MACO: A Modality Adversarial and Contrastive Framework for Modality-missing Multi-modal Knowledge Graph Completion

1 code implementation · 13 Aug 2023 · Yichi Zhang, Zhuo Chen, Wen Zhang

Nevertheless, existing methods emphasize the design of elegant KGC models to facilitate modality interaction, neglecting the real-life problem of missing modalities in KGs.

Multi-modal Knowledge Graph

ELFNet: Evidential Local-global Fusion for Stereo Matching

1 code implementation · 1 Aug 2023 · Jieming Lou, Weide Liu, Zhuo Chen, Fayao Liu, Jun Cheng

Although existing stereo matching models have achieved continuous improvement, they often face issues related to trustworthiness due to the absence of uncertainty estimation.

Domain Generalization · Stereo Matching

Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment

1 code implementation · 30 Jul 2023 · Zhuo Chen, Lingbing Guo, Yin Fang, Yichi Zhang, Jiaoyan Chen, Jeff Z. Pan, Yangning Li, Huajun Chen, Wen Zhang

As a crucial extension of entity alignment (EA), multi-modal entity alignment (MMEA) aims to identify identical entities across disparate knowledge graphs (KGs) by exploiting associated visual information.

 Ranked #1 on Multi-modal Entity Alignment on UMVM-oea-d-w-v2 (using extra training data)

Benchmarking · Knowledge Graph Embeddings +2

On decoder-only architecture for speech-to-text and large language model integration

no code implementations · 8 Jul 2023 · Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language.

Language Modelling · Large Language Model +1

AD-AutoGPT: An Autonomous GPT for Alzheimer's Disease Infodemiology

no code implementations · 16 Jun 2023 · Haixing Dai, Yiwei Li, Zhengliang Liu, Lin Zhao, Zihao Wu, Suhang Song, Ye Shen, Dajiang Zhu, Xiang Li, Sheng Li, Xiaobai Yao, Lu Shi, Quanzheng Li, Zhuo Chen, Donglan Zhang, Gengchen Mai, Tianming Liu

In this pioneering study, inspired by AutoGPT, the state-of-the-art open-source application built on the GPT-4 large language model, we develop AD-AutoGPT, a novel tool that can autonomously collect, process, and analyze complex health narratives about Alzheimer's Disease in response to users' textual prompts.

Language Modelling · Large Language Model

Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

1 code implementation · 13 Jun 2023 · Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, Huajun Chen

Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields.

Catalytic activity prediction · Chemical-Disease Interaction Extraction +14

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

no code implementations · 30 May 2023 · Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

State-of-the-art large-scale universal speech models (USMs) show a decent automatic speech recognition (ASR) performance across multiple domains and languages.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +1

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

no code implementations · 25 May 2023 · Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li, Furu Wei

Recent research shows a big convergence in model architecture, training objectives, and inference methods across various tasks for different modalities.

Language Modelling · Multi-Task Learning +3

Newton-Cotes Graph Neural Networks: On the Time Evolution of Dynamic Systems

no code implementations · 24 May 2023 · Lingbing Guo, Weiqing Wang, Zhuo Chen, Ningyu Zhang, Zequn Sun, Yixuan Lai, Qiang Zhang, Huajun Chen

Reasoning about system dynamics is one of the most important analytical approaches in many scientific studies.

Revisit and Outstrip Entity Alignment: A Perspective of Generative Models

1 code implementation · 24 May 2023 · Lingbing Guo, Zhuo Chen, Jiaoyan Chen, Huajun Chen

We then reveal that their incomplete objective limits their capacity for both entity alignment and entity synthesis (i.e., generating new entities).

Entity Alignment

HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks

no code implementations · 19 Apr 2023 · Zhuo Chen, Xudong Xu, Yichao Yan, Ye Pan, Wenhan Zhu, Wayne Wu, Bo Dai, Xiaokang Yang

While the use of 3D-aware GANs bypasses the requirement for 3D data, we further remove the need for style images by using the CLIP model as the stylization guidance.

ANTN: Bridging Autoregressive Neural Networks and Tensor Networks for Quantum Many-Body Simulation

no code implementations · 4 Apr 2023 · Zhuo Chen, Laker Newhouse, Eddie Chen, Di Luo, Marin Soljačić

Quantum many-body physics simulation has important impacts on understanding fundamental science and has applications to quantum materials design and quantum technology.

Inductive Bias · Tensor Networks

Target Sound Extraction with Variable Cross-modality Clues

1 code implementation · 15 Mar 2023 · Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng

Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources.

Target Sound Extraction

View Consistency Aware Holistic Triangulation for 3D Human Pose Estimation

no code implementations · 22 Feb 2023 · Xiaoyue Wan, Zhuo Chen, Xu Zhao

The rapid development of multi-view 3D human pose estimation (HPE) is attributed to the maturation of monocular 2D HPE and the geometry of 3D reconstruction.

3D Human Pose Estimation · 3D Reconstruction +1

Speaker Change Detection for Transformer Transducer ASR

no code implementations · 16 Feb 2023 · Jian Wu, Zhuo Chen, Min Hu, Xiong Xiao, Jinyu Li

Speaker change detection (SCD) is an important feature that improves the readability of the recognized words from an automatic speech recognition (ASR) system by breaking the word sequence into paragraphs at speaker change points.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Domain-Agnostic Molecular Generation with Self-feedback

1 code implementation · 26 Jan 2023 · Yin Fang, Ningyu Zhang, Zhuo Chen, Xiaohui Fan, Huajun Chen

To this end, we introduce MolGen, a pre-trained molecular language model tailored specifically for molecule generation.

Language Modelling · Molecular Docking +1

Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo

1 code implementation · CVPR 2023 · Yuesong Wang, Zhaojie Zeng, Tao Guan, Wei Yang, Zhuo Chen, Wenkai Liu, Luoyuan Xu, Yawei Luo

To detect more anchor pixels to ensure better adaptive patch deformation, we propose to evaluate the matching ambiguity of a certain pixel by checking the convergence of the estimated depth as optimization proceeds.
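The convergence test described above can be sketched in a few lines: a pixel whose estimated depth has stopped moving across optimization iterations is treated as a reliable anchor. The window size and tolerance below are hypothetical choices for illustration, not the paper's actual criterion.

```python
import numpy as np

def is_anchor_pixel(depth_history, tol=0.01):
    """Flag a pixel as a reliable anchor if its depth estimate has
    converged: the spread of the most recent estimates, relative to
    their mean, falls below a tolerance. Sketch only; the paper's
    exact ambiguity measure may differ."""
    recent = np.asarray(depth_history[-5:], dtype=float)
    return bool(recent.std() / recent.mean() < tol)

print(is_anchor_pixel([5.2, 5.01, 5.0, 5.0, 5.0, 5.0]))  # converged -> True
print(is_anchor_pixel([5.2, 3.1, 6.4, 2.0, 7.7, 4.9]))   # oscillating -> False
```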

MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

1 code implementation · 29 Dec 2022 · Zhuo Chen, Jiaoyan Chen, Wen Zhang, Lingbing Guo, Yin Fang, Yufeng Huang, Yichi Zhang, Yuxia Geng, Jeff Z. Pan, Wenting Song, Huajun Chen

Multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) whose entities are associated with relevant images.

 Ranked #1 on Entity Alignment on FBYG15k (using extra training data)

Knowledge Graphs · Multi-modal Entity Alignment

BEATs: Audio Pre-Training with Acoustic Tokenizers

1 code implementation · 18 Dec 2022 · Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei

In the first iteration, we use random projection as the acoustic tokenizer to train an audio SSL model in a mask and label prediction manner.

 Ranked #1 on Audio Classification on ESC-50 (using extra training data)

Audio Classification · Self-Supervised Learning
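The random-projection tokenizer mentioned above can be illustrated as follows: frame features are mapped through a fixed random matrix, and each frame is labeled with the index of the nearest vector in a random codebook. This is a minimal sketch of the idea only; BEATs' actual tokenizer (and its later learned iterations) differs in detail, and all shapes here are made up.

```python
import numpy as np

def random_projection_tokenizer(features, proj, codebook):
    """Assign each frame a discrete label: project the feature with a
    fixed random matrix, then pick the nearest codebook entry."""
    projected = features @ proj  # (frames, code_dim)
    # Euclidean distance from every frame to every codebook vector
    dists = np.linalg.norm(projected[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)  # (frames,) integer labels

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 40))    # 100 frames of 40-dim features
proj = rng.standard_normal((40, 16))      # fixed random projection
codebook = rng.standard_normal((32, 16))  # 32 random code vectors
labels = random_projection_tokenizer(feats, proj, codebook)
print(labels.shape)  # (100,)
```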

Simulating 2+1D Lattice Quantum Electrodynamics at Finite Density with Neural Flow Wavefunctions

no code implementations · 14 Dec 2022 · Zhuo Chen, Di Luo, Kaiwen Hu, Bryan K. Clark

We present a neural flow wavefunction, Gauge-Fermion FlowNet, and use it to simulate 2+1D lattice compact quantum electrodynamics with finite density dynamical fermions.


Exploring WavLM on Speech Enhancement

no code implementations · 18 Nov 2022 · Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

Recent years have seen a surge of interest in self-supervised learning approaches to end-to-end speech encoding, as they have achieved great success.

Self-Supervised Learning · Speech Enhancement +2

An Adapter based Multi-label Pre-training for Speech Separation and Enhancement

no code implementations · 11 Nov 2022 · Tianrui Wang, Xie Chen, Zhuo Chen, Shu Yu, Weibin Zhu

In recent years, self-supervised learning (SSL) has achieved tremendous success in various speech tasks due to its power to extract representations from massive unlabeled data.

Denoising · Pseudo Label +4

Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

no code implementations · 11 Nov 2022 · Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka

Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Speech separation with large-scale self-supervised learning

no code implementations · 9 Nov 2022 · Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez

Compared with a supervised baseline and the WavLM-based SS model using feature embeddings obtained with the previously released 94K-hour-trained WavLM, our proposed model obtains 15.9% and 11.2% relative word error rate (WER) reductions, respectively, on a simulated far-field speech mixture test set.

Self-Supervised Learning · Speech Separation
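The 15.9% and 11.2% figures above are relative WER reductions, i.e., the WER drop divided by the baseline WER. A quick illustration of the arithmetic, with hypothetical WER values not taken from the paper:

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction = (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical example: WER dropping from 10.0% to 8.41%
print(round(relative_wer_reduction(10.0, 8.41), 3))  # 0.159, i.e. 15.9% relative
```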

Simulating realistic speech overlaps improves multi-talker ASR

no code implementations · 27 Oct 2022 · Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka

Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Real-time Speech Interruption Analysis: From Cloud to Client Deployment

no code implementations · 24 Oct 2022 · Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu, Zhuo Chen, Jayant Gupchup, Ross Cutler

Meetings are an essential form of communication for all types of organizations, and remote collaboration systems have been much more widely used since the COVID-19 pandemic.

Tele-Knowledge Pre-training for Fault Analysis

1 code implementation · 20 Oct 2022 · Zhuo Chen, Wen Zhang, Yufeng Huang, Mingyang Chen, Yuxia Geng, Hongtao Yu, Zhen Bi, Yichi Zhang, Zhen Yao, Wenting Song, Xinliang Wu, Yi Yang, Mingyi Chen, Zhaoyang Lian, YingYing Li, Lei Cheng, Huajun Chen

In this work, we share our experience on tele-knowledge pre-training for fault analysis, a crucial task in telecommunication applications that requires a wide range of knowledge normally found in both machine log data and product documents.

Language Modelling

On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation

1 code implementation · 19 Sep 2022 · Haimei Zhao, Jing Zhang, Zhuo Chen, Bo Yuan, DaCheng Tao

Compared with the photometric consistency loss as well as the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features as well as the high tolerance of voxel density to the aforementioned challenges.

Monocular Depth Estimation

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

no code implementations · 12 Sep 2022 · Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka

To combine the best of both technologies, we newly design a t-SOT-based ASR model that generates a serialized multi-talker transcription based on two separated speech signals from VarArray.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Target-oriented Sentiment Classification with Sequential Cross-modal Semantic Graph

1 code implementation · 19 Aug 2022 · Yufeng Huang, Zhuo Chen, Jiaoyan Chen, Jeff Z. Pan, Zhen Yao, Wen Zhang

Multi-modal aspect-based sentiment classification (MABSC) is the task of classifying the sentiment of a target entity mentioned in a sentence and an image.

Image Captioning · Sentiment Analysis +1

DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

1 code implementation · 4 Jul 2022 · Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Wen Zhang, Yin Fang, Jeff Z. Pan, Huajun Chen

Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images; (2) applied an attribute-level contrastive learning strategy to further enhance the model's discrimination on fine-grained visual characteristics against the attribute co-occurrence and imbalance; (3) proposed a multi-task learning policy for considering multi-model objectives.

Contrastive Learning · Image Classification +3

Disentangled Ontology Embedding for Zero-shot Learning

1 code implementation · 8 Jun 2022 · Yuxia Geng, Jiaoyan Chen, Wen Zhang, Yajing Xu, Zhuo Chen, Jeff Z. Pan, Yufeng Huang, Feiyu Xiong, Huajun Chen

In this paper, we focus on ontologies for augmenting ZSL, and propose to learn disentangled ontology embeddings guided by ontology properties to capture and utilize more fine-grained class relationships in different aspects.

Image Classification · Ontology Embedding +2

Ultra Fast Speech Separation Model with Teacher Student Learning

no code implementations · 27 Apr 2022 · Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

In this paper, an ultra fast speech separation Transformer model is proposed to achieve both better performance and efficiency with teacher student learning (T-S learning).

Speech Separation

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

no code implementations · 27 Apr 2022 · Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.

Self-Supervised Learning · Speaker Recognition +3

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

1 code implementation · 30 Mar 2022 · Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription with low latency.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +4

Knowledge-informed Molecular Learning: A Survey on Paradigm Transfer

no code implementations · 17 Feb 2022 · Yin Fang, Zhuo Chen, Xiaohui Fan, Ningyu Zhang

To enhance the generation and decipherability of purely data-driven models, scholars have integrated biochemical domain knowledge into these molecular study models.

Molecular Property Prediction · Property Prediction

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

1 code implementation · 2 Feb 2022 · Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

This paper proposes token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +1
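The core idea of t-SOT is to serialize overlapping transcriptions into a single token stream ordered by emission time, with a special channel-change token marking switches between virtual output channels. A simplified sketch of that serialization, where the `<cc>` symbol and the (time, channel, token) input format are illustrative assumptions rather than the paper's exact formulation:

```python
def serialize_tsot(words):
    """words: list of (emit_time, channel, token) tuples pooled over all
    speakers. Sort by emission time and insert a channel-change token
    '<cc>' whenever the virtual output channel switches. Channel
    assignment and tokenization details are simplified away."""
    out, prev_ch = [], None
    for _, ch, tok in sorted(words):
        if prev_ch is not None and ch != prev_ch:
            out.append("<cc>")
        out.append(tok)
        prev_ch = ch
    return out

mix = [(0.0, 0, "hello"), (0.4, 1, "hi"), (0.6, 0, "world"), (0.9, 1, "there")]
print(serialize_tsot(mix))
# ['hello', '<cc>', 'hi', '<cc>', 'world', '<cc>', 'there']
```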

A New Image Codec Paradigm for Human and Machine Uses

no code implementations · 19 Dec 2021 · Sien Chen, Jian Jin, Lili Meng, Weisi Lin, Zhuo Chen, Tsui-Shan Chang, Zhengguang Li, Huaxiang Zhang

Meanwhile, an image predictor is designed and trained to achieve the general-quality image reconstruction with the 16-bit gray-scale profile and signal features.

Decision Making · Image Compression +7

Zero-shot and Few-shot Learning with Knowledge Graphs: A Comprehensive Survey

no code implementations · 18 Dec 2021 · Jiaoyan Chen, Yuxia Geng, Zhuo Chen, Jeff Z. Pan, Yuan He, Wen Zhang, Ian Horrocks, Huajun Chen

Machine learning, especially deep neural networks, has achieved great success, but many models rely on a large number of labeled samples for supervision.

Data Augmentation · Few-Shot Learning +10

Molecular Contrastive Learning with Chemical Element Knowledge Graph

1 code implementation · 1 Dec 2021 · Yin Fang, Qiang Zhang, Haihong Yang, Xiang Zhuang, Shumin Deng, Wen Zhang, Ming Qin, Zhuo Chen, Xiaohui Fan, Huajun Chen

To address these issues, we construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements and propose a novel Knowledge-enhanced Contrastive Learning (KCL) framework for molecular representation learning.

Contrastive Learning · Molecular Property Prediction +3

Continuous Speech Separation with Recurrent Selective Attention Network

no code implementations · 28 Oct 2021 · Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li

In this paper, we propose to apply recurrent selective attention network (RSAN) to CSS, which generates a variable number of output channels based on active speaker counting.

speech-recognition · Speech Recognition +1
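The variable number of output channels comes from extracting sources iteratively: a mask is estimated on the current residual, one source is peeled off, and the loop stops once little energy remains, so the iteration count adapts to the number of active speakers. A toy sketch of that control flow, with a placeholder function standing in for the network and all thresholds hypothetical:

```python
import numpy as np

def iterative_extraction(mixture_mag, extract_mask, energy_floor=1e-2):
    """Peel sources off one at a time: estimate a mask on the current
    residual magnitude spectrogram, extract that source, and stop once
    the residual energy falls below a fraction of the mixture energy.
    The varying number of loop passes yields a variable number of
    output channels."""
    residual, sources = mixture_mag.copy(), []
    while residual.sum() > energy_floor * mixture_mag.sum():
        mask = extract_mask(residual)        # stand-in for the network
        sources.append(mask * residual)
        residual = (1.0 - mask) * residual
    return sources

# Toy "network": claims 60% of the remaining energy on every pass.
toy_mask = lambda spec: np.full_like(spec, 0.6)
mix = np.abs(np.random.default_rng(1).standard_normal((50, 80)))
outs = iterative_extraction(mix, toy_mask)
print(len(outs))  # number of extracted channels
```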

One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

no code implementations · 20 Oct 2021 · Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang

Experimental results show that the proposed geometry agnostic model outperforms the model trained on a specific microphone array geometry in both speech quality and automatic speech recognition accuracy.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

no code implementations · 18 Oct 2021 · Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang

Our results show that the proposed models can yield better speech recognition accuracy, speech intelligibility, and perceptual quality than the baseline models, and the multi-task training can alleviate the TSOS issue in addition to improving the speech recognition accuracy.

Speech Enhancement · speech-recognition +1

All-neural beamformer for continuous speech separation

no code implementations · 13 Oct 2021 · Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

Recently, the all deep learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

no code implementations · 12 Oct 2021 · Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR

no code implementations · 7 Oct 2021 · Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Similar to the target-speaker voice activity detection (TS-VAD)-based diarization method, the E2E SA-ASR model is applied to estimate the speech activity of each speaker, while it has the advantages of (i) handling an unlimited number of speakers, (ii) leveraging linguistic information for speaker diarization, and (iii) simultaneously generating speaker-attributed transcriptions.

Action Detection · Activity Detection +6

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

no code implementations · 17 Sep 2021 · Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li

Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions.

Speech Separation

Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speakers

no code implementations · 7 Aug 2021 · Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe

Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech.

Action Detection · Activity Detection +3

Spacetime Neural Network for High Dimensional Quantum Dynamics

no code implementations · 4 Aug 2021 · Jiangran Wang, Zhuo Chen, Di Luo, Zhizhen Zhao, Vera Mikyoung Hur, Bryan K. Clark

We develop a spacetime neural network method with second order optimization for solving quantum dynamics from the high dimensional Schrödinger equation.

Vocal Bursts Intensity Prediction

Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs

no code implementations · 8 Jul 2021 · Yikang Zhang, Zhuo Chen, Zhao Zhong

Our method achieves state-of-the-art performance on ImageNet: 80.7% top-1 accuracy with 194M FLOPs.

Image Classification

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio

no code implementations · 6 Jul 2021 · Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Our evaluation on the AMI meeting corpus reveals that, after fine-tuning with a small amount of real data, the joint system performs 8.9-29.9% better in accuracy than the best modular system, while the modular system performs better before such fine-tuning.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +5

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

no code implementations · 5 Jul 2021 · Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR).

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3

Modeling and Reasoning in Event Calculus using Goal-Directed Constraint Answer Set Programming

no code implementations · 28 Jun 2021 · Joaquín Arias, Manuel Carro, Zhuo Chen, Gopal Gupta

Automated commonsense reasoning is essential for building human-like AI systems featuring, for example, explainable AI.

End-to-End Speaker-Attributed ASR with Transformer

no code implementations · 5 Apr 2021 · Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

This paper presents our recent effort on end-to-end speaker-attributed automatic speech recognition, which jointly performs speaker counting, speech recognition and speaker identification for monaural multi-talker audio.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

no code implementations · 31 Mar 2021 · Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR).

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +1

Continuous Speech Separation with Ad Hoc Microphone Arrays

no code implementations · 3 Mar 2021 · Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng

Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of an ad hoc array.

speech-recognition · Speech Recognition +1

Knowledge-aware Zero-Shot Learning: Survey and Perspective

1 code implementation · 26 Feb 2021 · Jiaoyan Chen, Yuxia Geng, Zhuo Chen, Ian Horrocks, Jeff Z. Pan, Huajun Chen

Zero-shot learning (ZSL), which aims at predicting classes that have never appeared during training using external knowledge (a.k.a.

BIG-bench Machine Learning · Zero-Shot Learning

OntoZSL: Ontology-enhanced Zero-shot Learning

1 code implementation · 15 Feb 2021 · Yuxia Geng, Jiaoyan Chen, Zhuo Chen, Jeff Z. Pan, Zhiquan Ye, Zonggang Yuan, Yantao Jia, Huajun Chen

The key to implementing ZSL is to leverage the prior knowledge of classes, which builds the semantic relationship between classes and enables the transfer of learned models (e.g., features) from training classes (i.e., seen classes) to unseen classes.

Image Classification · Knowledge Graph Completion +2

Gauge Invariant and Anyonic Symmetric Autoregressive Neural Networks for Quantum Lattice Models

no code implementations · 18 Jan 2021 · Di Luo, Zhuo Chen, Kaiwen Hu, Zhizhen Zhao, Vera Mikyoung Hur, Bryan K. Clark

Symmetries such as gauge invariance and anyonic symmetry play a crucial role in quantum many-body physics.

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

1 code implementation · 3 Nov 2020 · Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka

Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

1 code implementation · 23 Oct 2020 · Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li

With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently.

Speech Separation

Speaker Separation Using Speaker Inventories and Estimated Speech

no code implementations · 20 Oct 2020 · Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong

We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework leveraging speaker profiles and estimated speech for speaker separation.

Speaker Separation · Speech Extraction +2

An End-to-end Architecture of Online Multi-channel Speech Separation

no code implementations · 7 Sep 2020 · Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie

Previously, we introduced a system called unmixing, fixed-beamformer and extraction (UFE), which was shown to be effective in addressing the speech overlap problem in conversation transcription.

speech-recognition · Speech Recognition +1

Brain Stroke Lesion Segmentation Using Consistent Perception Generative Adversarial Network

no code implementations · 30 Aug 2020 · Shuqiang Wang, Zhuo Chen, Wen Yu, Baiying Lei

The assistant network and the discriminator are employed to jointly decide whether the segmentation results are real or fake.

Lesion Segmentation

Continuous Speech Separation with Conformer

1 code implementation · 13 Aug 2020 · Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Continuous speech separation plays a vital role in complicated speech related tasks such as conversation transcription.

 Ranked #1 on Speech Separation on LibriCSS (using extra training data)

Speech Separation

Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings

1 code implementation · 11 Aug 2020 · Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

However, the model required prior knowledge of speaker profiles to perform speaker identification, which significantly limited the application of the model.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3

Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles

no code implementations · 27 Jul 2020 · Yuxin Lu, Peng Cheng, Zhuo Chen, Wai Ho Mow, Yonghui Li, Branka Vucetic

We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.

Multi-Task Learning

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers

no code implementations · 19 Jun 2020 · Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka

We propose an end-to-end speaker-attributed automatic speech recognition model that unifies speaker counting, speech recognition, and speaker identification on monaural overlapped speech.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

PuppeteerGAN: Arbitrary Portrait Animation With Semantic-Aware Appearance Transformation

no code implementations · CVPR 2020 · Zhuo Chen, Chaoyue Wang, Bo Yuan, Dacheng Tao

Portrait animation, which aims to animate a still portrait to life using poses extracted from target frames, is an important technique for many real-world entertainment applications.

Semantic Segmentation

Neural Speech Separation Using Spatially Distributed Microphones

no code implementations · 28 Apr 2020 · Dongmei Wang, Zhuo Chen, Takuya Yoshioka

The inter-channel processing layers apply a self-attention mechanism along the channel dimension to exploit the information obtained with a varying number of microphones.

speech-recognition · Speech Recognition +1
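The self-attention along the channel dimension described above can be sketched as ordinary single-head attention applied per time frame across microphone channels, which is what makes the layer indifferent to the number of microphones. All shapes and weights below are illustrative assumptions, not the paper's model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(x, wq, wk, wv):
    """Single-head self-attention along the CHANNEL axis: for every
    time frame, each microphone channel attends to all the others, so
    the same weights handle any number of channels.
    x: (time, channels, feat) -> (time, channels, d)."""
    q, k, v = x @ wq, x @ wk, x @ wv                    # (time, channels, d)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v                  # attend across channels

rng = np.random.default_rng(0)
w = [rng.standard_normal((32, 16)) for _ in range(3)]    # shared weights
for n_mics in (2, 5):                                    # varying microphone counts
    y = channel_self_attention(rng.standard_normal((10, n_mics, 32)), *w)
    print(y.shape)  # (10, 2, 16) then (10, 5, 16)
```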

Generative Adversarial Zero-shot Learning via Knowledge Graphs

no code implementations · 7 Apr 2020 · Yuxia Geng, Jiaoyan Chen, Zhuo Chen, Zhiquan Ye, Zonggang Yuan, Yantao Jia, Huajun Chen

However, the side information of classes used so far is limited to text descriptions and attribute annotations, which fall short of capturing the semantics of the classes.

Image Classification · Knowledge Graphs +1

Continuous speech separation: dataset and analysis

1 code implementation · 30 Jan 2020 · Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li

In this paper, we define continuous speech separation (CSS) as the task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances which are partially overlapped by a varying degree.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement

no code implementations18 Nov 2019 Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey

This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation.

Speaker Separation Speech Enhancement +3
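The alternation described above couples a spectral mask from a neural network with a spatial beamformer. A rough numpy sketch of one such mask-driven beamforming step, using mask-weighted covariances and an MVDR solution (the steering-vector estimate and regularization here are illustrative assumptions, not the paper's exact recipe):

```python
import numpy as np

def mvdr_weights(X, mask):
    """X: (C, T) complex STFT values of one frequency bin across C mics;
    mask: (T,) speech-presence mask in [0, 1] from the spectral network.
    Returns MVDR weights w and the estimated steering vector d."""
    C = X.shape[0]
    # mask-weighted spatial covariance matrices: sum_t m(t) x(t) x(t)^H / sum_t m(t)
    phi_s = (X * mask) @ X.conj().T / max(mask.sum(), 1e-8)
    phi_n = (X * (1.0 - mask)) @ X.conj().T / max((1.0 - mask).sum(), 1e-8)
    # steering vector: principal eigenvector of the speech covariance
    d = np.linalg.eigh(phi_s)[1][:, -1]
    # MVDR: w = phi_n^{-1} d / (d^H phi_n^{-1} d), with diagonal loading
    num = np.linalg.solve(phi_n + 1e-6 * np.eye(C), d)
    return num / (d.conj() @ num), d
```

The normalization enforces the distortionless constraint w^H d = 1, so the beamformer passes the target direction while suppressing masked-out noise.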

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation

2 code implementations30 Oct 2019 Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka

An important problem in ad-hoc microphone speech separation is how to guarantee the robustness of a system with respect to the locations and numbers of microphones.

Speech Separation

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation

8 code implementations14 Oct 2019 Yi Luo, Zhuo Chen, Takuya Yoshioka

Recent studies in deep learning-based speech separation have proven the superiority of time-domain approaches over conventional time-frequency-based methods.

Speech Separation
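The dual-path idea rests on splitting a long sequence into short overlapping chunks, so intra-chunk and inter-chunk RNNs each see a path of roughly sqrt(T) steps. A minimal numpy sketch of that segmentation and its exact inverse (chunk sizes and helper names are illustrative, not from the paper):

```python
import numpy as np

def segment(x, chunk, hop):
    """Split a long sequence x of shape (T, N) into overlapping
    chunks of shape (num_chunks, chunk, N), zero-padding the tail."""
    T, N = x.shape
    pad = (chunk - T) if T < chunk else (-(T - chunk)) % hop
    xp = np.pad(x, ((0, pad), (0, 0)))
    starts = range(0, xp.shape[0] - chunk + 1, hop)
    return np.stack([xp[s:s + chunk] for s in starts]), T

def overlap_add(chunks, hop, T):
    """Inverse of segment: overlap-add the chunks, renormalize by the
    per-sample overlap count, and trim the padding."""
    num, chunk, N = chunks.shape
    out = np.zeros(((num - 1) * hop + chunk, N))
    count = np.zeros_like(out)
    for i, c in enumerate(chunks):
        out[i * hop:i * hop + chunk] += c
        count[i * hop:i * hop + chunk] += 1.0
    return (out / count)[:T]
```

In the model itself, one RNN runs along each chunk and another runs across chunks at each within-chunk position; the round trip above is lossless, so separation quality depends only on what the RNNs do between the two steps.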

A Learning-Based Two-Stage Spectrum Sharing Strategy with Multiple Primary Transmit Power Levels

no code implementations21 Jul 2019 Rui Zhang, Peng Cheng, Zhuo Chen, Yonghui Li, Branka Vucetic

Then, based on a novel normalized power level alignment metric, we propose two prediction-transmission structures, namely periodic and non-periodic, for spectrum access (the second part in Stage II), which enable the secondary transmitter (ST) to closely follow the primary transmitter (PT) power level variation.

PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch

1 code implementation12 Jul 2019 Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong

While similar toolkits are available built on top of the two, a key feature of PyKaldi2 is sequence training with criteria such as MMI, sMBR and MPE.

Speech Recognition

ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection

1 code implementation19 Jun 2019 Zhuo Chen, Jiyuan Zhang, Ruizhou Ding, Diana Marculescu

In this paper, we propose Virtual Pooling (ViP), a model-level approach to improve speed and energy consumption of CNN-based image classification and object detection tasks, with a provable error bound.

General Classification Image Classification +2
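The core trade made by Virtual Pooling is to compute only a subset of output activations exactly and fill the rest by cheap linear interpolation. A rough numpy sketch of that idea on a 2-D feature map (a simplification for illustration; the actual ViP method operates inside convolutional layers and comes with a provable error bound):

```python
import numpy as np

def virtual_pool_rows(fmap):
    """'Compute' only the even-indexed rows of a feature map and fill
    each odd row by linearly interpolating its vertical neighbours,
    halving the rows that need exact computation."""
    H, W = fmap.shape
    out = np.empty((H, W))
    out[0::2] = fmap[0::2]                       # rows computed exactly
    for r in range(1, H, 2):                     # rows filled by interpolation
        below = fmap[r + 1] if r + 1 < H else fmap[r - 1]
        out[r] = 0.5 * (fmap[r - 1] + below)
    return out
```

Wherever the feature map varies smoothly (as it often does after convolution), the interpolated rows are close to the exact ones, which is what bounds the induced error.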

Low-Latency Speaker-Independent Continuous Speech Separation

no code implementations13 Apr 2019 Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis

Speaker independent continuous speech separation (SI-CSS) is a task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals each of which contains no overlapping speech segment.

Speech Recognition +1

Understanding the Impact of Label Granularity on CNN-based Image Classification

1 code implementation21 Jan 2019 Zhuo Chen, Ruizhou Ding, Ting-Wu Chin, Diana Marculescu

In this paper, we conduct extensive experiments using various datasets to demonstrate and analyze how and why training based on fine-grain labeling, such as "Persian cat" can improve CNN accuracy on classifying coarse-grain classes, in this case "cat."

General Classification Image Classification

Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks

no code implementations8 Oct 2018 Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao, Fil Alleva

The goal of this work is to develop a meeting transcription system that can recognize speech even when utterances of different speakers are overlapped.

Speech Recognition +1

Intermediate Deep Feature Compression: the Next Battlefield of Intelligent Sensing

no code implementations17 Sep 2018 Zhuo Chen, Weisi Lin, Shiqi Wang, Ling-Yu Duan, Alex C. Kot

Recent advances in hardware technology have made intelligent analysis equipped with deep learning at the front-end more prevalent and practical.

Data Compression Feature Compression

Developing Far-Field Speaker System Via Teacher-Student Learning

no code implementations14 Apr 2018 Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong

In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.

Keyword Spotting Model Compression

Speaker-Invariant Training via Adversarial Learning

no code implementations2 Apr 2018 Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang Juang

We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system.

General Classification Multi-Task Learning

Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition

no code implementations21 Nov 2017 Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong

Unsupervised domain adaptation of speech signals aims at adapting a well-trained source-domain acoustic model to unlabeled data from the target domain.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Image Quality Assessment Guided Deep Neural Networks Training

3 code implementations13 Aug 2017 Zhuo Chen, Weisi Lin, Shiqi Wang, Long Xu, Leida Li

For many computer vision problems, the deep neural networks are trained and validated based on the assumption that the input images are pristine (i.e., artifact-free).

Data Augmentation Image Classification +1

Improving Adherence to Heart Failure Management Guidelines via Abductive Reasoning

no code implementations16 Jul 2017 Zhuo Chen, Elmer Salazar, Kyle Marple, Gopal Gupta, Lakshman Tamil, Sandeep Das, Alpesh Amin

A standard approach to managing chronic diseases by the medical community is to have a committee of experts develop guidelines that all physicians should follow.


Speaker-independent Speech Separation with Deep Attractor Network

no code implementations12 Jul 2017 Yi Luo, Zhuo Chen, Nima Mesgarani

A reference point, called an attractor, is created in the embedding space to represent each speaker, defined as the centroid of that speaker's time-frequency embeddings.

Speech Separation
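Given embeddings and ideal speaker assignments, the attractor computation and the resulting masks reduce to a centroid and a softmax over similarities. A small numpy sketch of that step (function name and the oracle assignments are illustrative; at test time the paper estimates attractors without oracle labels):

```python
import numpy as np

def attractor_masks(V, Y):
    """V: (TF, D) time-frequency embeddings; Y: (TF, C) one-hot ideal
    speaker assignments. Each attractor is the centroid of one speaker's
    embeddings; masks come from similarity of each bin to each attractor."""
    A = (Y.T @ V) / np.maximum(Y.sum(0)[:, None], 1e-8)   # (C, D) centroids
    logits = V @ A.T                                       # (TF, C) similarities
    e = np.exp(logits - logits.max(1, keepdims=True))
    return e / e.sum(1, keepdims=True)                     # softmax masks
```

When the network learns embeddings that cluster by speaker, each bin's mask concentrates on the attractor of its own source, which is what makes the masks usable for separation.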

End-to-End Attention based Text-Dependent Speaker Verification

no code implementations3 Jan 2017 Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Yifan Gong

A new type of End-to-End system for text-dependent speaker verification is presented in this paper.

Text-Dependent Speaker Verification

Deep attractor network for single-microphone speaker separation

1 code implementation27 Nov 2016 Zhuo Chen, Yi Luo, Nima Mesgarani

We propose a novel deep learning framework for single channel speech separation by creating attractor points in high dimensional embedding space of the acoustic signals which pull together the time-frequency bins corresponding to each source.

Speaker Separation Speech Separation

Deep Clustering and Conventional Networks for Music Separation: Stronger Together

no code implementations18 Nov 2016 Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani

Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks.

Clustering Deep Clustering +3

A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns

no code implementations25 Oct 2016 Zhuo Chen, Kyle Marple, Elmer Salazar, Gopal Gupta, Lakshman Tamil

In this paper we describe a physician-advisory system for CHF management that codes the entire set of clinical practice guidelines for CHF using answer set programming.


Single-Channel Multi-Speaker Separation using Deep Clustering

2 code implementations7 Jul 2016 Yusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey

In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Deep clustering: Discriminative embeddings for segmentation and separation

8 code implementations18 Aug 2015 John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe

The framework can be used without class labels, and therefore has the potential to be trained on a diverse set of sound types, and to generalize to novel sources.

Clustering Deep Clustering +3
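Deep clustering trains embeddings V so that the affinity VV^T matches the ideal affinity YY^T built from class labels, and the objective can be evaluated without forming either (TF x TF) matrix. A small numpy sketch of that permutation-free loss (a standard algebraic rearrangement; the variable names are illustrative):

```python
import numpy as np

def dpcl_loss(V, Y):
    """||V V^T - Y Y^T||_F^2 for V: (TF, D) embeddings and Y: (TF, C)
    one-hot source labels, expanded as
    ||V^T V||_F^2 - 2 ||V^T Y||_F^2 + ||Y^T Y||_F^2
    so only small D x D, D x C, and C x C matrices are ever formed."""
    return (np.linalg.norm(V.T @ V, 'fro') ** 2
            - 2 * np.linalg.norm(V.T @ Y, 'fro') ** 2
            + np.linalg.norm(Y.T @ Y, 'fro') ** 2)
```

Because the loss compares pairwise affinities rather than labeled outputs, it is invariant to source permutation, which is why the framework needs no fixed class vocabulary and can generalize to novel sound types.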
