Search Results for author: Zhuo Chen

Found 112 papers, 40 papers with code

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

no code implementations · 15 Sep 2023 · Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li

Token-level serialized output training (t-SOT) was recently proposed to address the challenge of streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

no code implementations · 14 Aug 2023 · Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech.

Language Modelling · Multi-Task Learning +2

MACO: A Modality Adversarial and Contrastive Framework for Modality-missing Multi-modal Knowledge Graph Completion

1 code implementation · 13 Aug 2023 · Yichi Zhang, Zhuo Chen, Wen Zhang

Nevertheless, existing methods emphasize the design of elegant KGC models to facilitate modality interaction, neglecting the real-life problem of missing modalities in KGs.

Multi-modal Knowledge Graph

ELFNet: Evidential Local-global Fusion for Stereo Matching

1 code implementation · 1 Aug 2023 · Jieming Lou, Weide Liu, Zhuo Chen, Fayao Liu, Jun Cheng

Although existing stereo matching models have achieved continuous improvement, they often face issues related to trustworthiness due to the absence of uncertainty estimation.

Domain Generalization · Stereo Matching

Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment

1 code implementation · 30 Jul 2023 · Zhuo Chen, Lingbing Guo, Yin Fang, Yichi Zhang, Jiaoyan Chen, Jeff Z. Pan, Yangning Li, Huajun Chen, Wen Zhang

As a crucial extension of entity alignment (EA), multi-modal entity alignment (MMEA) aims to identify identical entities across disparate knowledge graphs (KGs) by exploiting associated visual information.

 Ranked #1 on Multi-modal Entity Alignment on UMVM-oea-d-w-v2 (using extra training data)

Benchmarking · Knowledge Graph Embeddings +2

On decoder-only architecture for speech-to-text and large language model integration

no code implementations · 8 Jul 2023 · Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language.

Language Modelling · Large Language Model +1

AD-AutoGPT: An Autonomous GPT for Alzheimer's Disease Infodemiology

no code implementations · 16 Jun 2023 · Haixing Dai, Yiwei Li, Zhengliang Liu, Lin Zhao, Zihao Wu, Suhang Song, Ye Shen, Dajiang Zhu, Xiang Li, Sheng Li, Xiaobai Yao, Lu Shi, Quanzheng Li, Zhuo Chen, Donglan Zhang, Gengchen Mai, Tianming Liu

In this pioneering study, inspired by AutoGPT, the state-of-the-art open-source application built on the GPT-4 large language model, we develop AD-AutoGPT, a novel tool that can autonomously collect, process, and analyze complex health narratives about Alzheimer's Disease in response to users' textual prompts.

Language Modelling · Large Language Model

Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

1 code implementation · 13 Jun 2023 · Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, Huajun Chen

Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields.

Catalytic activity prediction · Chemical-Disease Interaction Extraction +14

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

no code implementations · 30 May 2023 · Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

State-of-the-art large-scale universal speech models (USMs) show a decent automatic speech recognition (ASR) performance across multiple domains and languages.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +1

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

no code implementations · 25 May 2023 · Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li, Furu Wei

Recent research shows a big convergence in model architecture, training objectives, and inference methods across various tasks for different modalities.

Language Modelling · Multi-Task Learning +3

Newton-Cotes Graph Neural Networks: On the Time Evolution of Dynamic Systems

no code implementations · 24 May 2023 · Lingbing Guo, Weiqing Wang, Zhuo Chen, Ningyu Zhang, Zequn Sun, Yixuan Lai, Qiang Zhang, Huajun Chen

Reasoning about system dynamics is one of the most important analytical approaches in many scientific studies.

Revisit and Outstrip Entity Alignment: A Perspective of Generative Models

1 code implementation · 24 May 2023 · Lingbing Guo, Zhuo Chen, Jiaoyan Chen, Huajun Chen

We then reveal that their incomplete objective limits their capacity for both entity alignment and entity synthesis (i.e., generating new entities).

Entity Alignment

HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks

no code implementations · 19 Apr 2023 · Zhuo Chen, Xudong Xu, Yichao Yan, Ye Pan, Wenhan Zhu, Wayne Wu, Bo Dai, Xiaokang Yang

While the use of 3D-aware GANs bypasses the requirement for 3D data, we further remove the need for style images by using the CLIP model as the stylization guidance.

ANTN: Bridging Autoregressive Neural Networks and Tensor Networks for Quantum Many-Body Simulation

no code implementations · 4 Apr 2023 · Zhuo Chen, Laker Newhouse, Eddie Chen, Di Luo, Marin Soljačić

Quantum many-body physics simulation has important impacts on understanding fundamental science and has applications to quantum materials design and quantum technology.

Inductive Bias · Tensor Networks

Target Sound Extraction with Variable Cross-modality Clues

1 code implementation · 15 Mar 2023 · Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng

Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources.

Target Sound Extraction

View Consistency Aware Holistic Triangulation for 3D Human Pose Estimation

no code implementations · 22 Feb 2023 · Xiaoyue Wan, Zhuo Chen, Xu Zhao

The rapid development of multi-view 3D human pose estimation (HPE) is attributed to the maturation of monocular 2D HPE and the geometry of 3D reconstruction.

3D Human Pose Estimation · 3D Reconstruction +1

Speaker Change Detection for Transformer Transducer ASR

no code implementations · 16 Feb 2023 · Jian Wu, Zhuo Chen, Min Hu, Xiong Xiao, Jinyu Li

Speaker change detection (SCD) is an important feature that improves the readability of the recognized words from an automatic speech recognition (ASR) system by breaking the word sequence into paragraphs at speaker change points.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Domain-Agnostic Molecular Generation with Self-feedback

1 code implementation · 26 Jan 2023 · Yin Fang, Ningyu Zhang, Zhuo Chen, Xiaohui Fan, Huajun Chen

To this end, we introduce MolGen, a pre-trained molecular language model tailored specifically for molecule generation.

Language Modelling · Molecular Docking +1

Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo

1 code implementation · CVPR 2023 · Yuesong Wang, Zhaojie Zeng, Tao Guan, Wei Yang, Zhuo Chen, Wenkai Liu, Luoyuan Xu, Yawei Luo

To detect more anchor pixels to ensure better adaptive patch deformation, we propose to evaluate the matching ambiguity of a certain pixel by checking the convergence of the estimated depth as optimization proceeds.
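The convergence test described above can be sketched in a few lines: a pixel whose estimated depth has stopped moving across optimization iterations is treated as a reliable anchor. The window size and tolerance below are hypothetical choices for illustration, not the paper's actual criterion.

```python
import numpy as np

def is_anchor_pixel(depth_history, tol=0.01):
    """Flag a pixel as a reliable anchor if its depth estimate has
    converged: the spread of the most recent estimates, relative to
    their mean, falls below a tolerance. Sketch only; the paper's
    exact ambiguity measure may differ."""
    recent = np.asarray(depth_history[-5:], dtype=float)
    return bool(recent.std() / recent.mean() < tol)

print(is_anchor_pixel([5.2, 5.01, 5.0, 5.0, 5.0, 5.0]))  # converged -> True
print(is_anchor_pixel([5.2, 3.1, 6.4, 2.0, 7.7, 4.9]))   # oscillating -> False
```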

MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

1 code implementation · 29 Dec 2022 · Zhuo Chen, Jiaoyan Chen, Wen Zhang, Lingbing Guo, Yin Fang, Yufeng Huang, Yichi Zhang, Yuxia Geng, Jeff Z. Pan, Wenting Song, Huajun Chen

Multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) whose entities are associated with relevant images.

 Ranked #1 on Entity Alignment on FBYG15k (using extra training data)

Knowledge Graphs · Multi-modal Entity Alignment

BEATs: Audio Pre-Training with Acoustic Tokenizers

1 code implementation · 18 Dec 2022 · Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei

In the first iteration, we use random projection as the acoustic tokenizer to train an audio SSL model in a mask and label prediction manner.

 Ranked #1 on Audio Classification on ESC-50 (using extra training data)

Audio Classification · Self-Supervised Learning
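The random-projection tokenizer mentioned above can be illustrated as follows: frame features are mapped through a fixed random matrix, and each frame is labeled with the index of the nearest vector in a random codebook. This is a minimal sketch of the idea only; BEATs' actual tokenizer (and its later learned iterations) differs in detail, and all shapes here are made up.

```python
import numpy as np

def random_projection_tokenizer(features, proj, codebook):
    """Assign each frame a discrete label: project the feature with a
    fixed random matrix, then pick the nearest codebook entry."""
    projected = features @ proj  # (frames, code_dim)
    # Euclidean distance from every frame to every codebook vector
    dists = np.linalg.norm(projected[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)  # (frames,) integer labels

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 40))    # 100 frames of 40-dim features
proj = rng.standard_normal((40, 16))      # fixed random projection
codebook = rng.standard_normal((32, 16))  # 32 random code vectors
labels = random_projection_tokenizer(feats, proj, codebook)
print(labels.shape)  # (100,)
```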

Simulating 2+1D Lattice Quantum Electrodynamics at Finite Density with Neural Flow Wavefunctions

no code implementations · 14 Dec 2022 · Zhuo Chen, Di Luo, Kaiwen Hu, Bryan K. Clark

We present a neural flow wavefunction, Gauge-Fermion FlowNet, and use it to simulate 2+1D lattice compact quantum electrodynamics with finite density dynamical fermions.


Exploring WavLM on Speech Enhancement

no code implementations · 18 Nov 2022 · Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

Recent years have seen a surge of interest in self-supervised learning approaches to end-to-end speech encoding, as they have achieved great success.

Self-Supervised Learning · Speech Enhancement +2

An Adapter based Multi-label Pre-training for Speech Separation and Enhancement

no code implementations · 11 Nov 2022 · Tianrui Wang, Xie Chen, Zhuo Chen, Shu Yu, Weibin Zhu

In recent years, self-supervised learning (SSL) has achieved tremendous success in various speech tasks due to its power to extract representations from massive unlabeled data.

Denoising · Pseudo Label +4

Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

no code implementations · 11 Nov 2022 · Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka

Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Speech separation with large-scale self-supervised learning

no code implementations · 9 Nov 2022 · Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez

Compared with a supervised baseline and the WavLM-based SS model using feature embeddings obtained with the previously released 94K-hour-trained WavLM, our proposed model obtains 15.9% and 11.2% relative word error rate (WER) reductions, respectively, on a simulated far-field speech mixture test set.

Self-Supervised Learning · Speech Separation
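The 15.9% and 11.2% figures above are relative WER reductions, i.e., the WER drop divided by the baseline WER. A quick illustration of the arithmetic, with hypothetical WER values not taken from the paper:

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction = (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical example: WER dropping from 10.0% to 8.41%
print(round(relative_wer_reduction(10.0, 8.41), 3))  # 0.159, i.e. 15.9% relative
```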

Simulating realistic speech overlaps improves multi-talker ASR

no code implementations · 27 Oct 2022 · Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka

Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Real-time Speech Interruption Analysis: From Cloud to Client Deployment

no code implementations · 24 Oct 2022 · Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu, Zhuo Chen, Jayant Gupchup, Ross Cutler

Meetings are an essential form of communication for all types of organizations, and remote collaboration systems have been much more widely used since the COVID-19 pandemic.

Tele-Knowledge Pre-training for Fault Analysis

1 code implementation · 20 Oct 2022 · Zhuo Chen, Wen Zhang, Yufeng Huang, Mingyang Chen, Yuxia Geng, Hongtao Yu, Zhen Bi, Yichi Zhang, Zhen Yao, Wenting Song, Xinliang Wu, Yi Yang, Mingyi Chen, Zhaoyang Lian, YingYing Li, Lei Cheng, Huajun Chen

In this work, we share our experience on tele-knowledge pre-training for fault analysis, a crucial task in telecommunication applications that requires a wide range of knowledge normally found in both machine log data and product documents.

Language Modelling

On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation

1 code implementation · 19 Sep 2022 · Haimei Zhao, Jing Zhang, Zhuo Chen, Bo Yuan, DaCheng Tao

Compared with the photometric consistency loss as well as the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features as well as the high tolerance of voxel density to the aforementioned challenges.

Monocular Depth Estimation

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

no code implementations · 12 Sep 2022 · Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka

To combine the best of both technologies, we newly design a t-SOT-based ASR model that generates a serialized multi-talker transcription based on two separated speech signals from VarArray.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Target-oriented Sentiment Classification with Sequential Cross-modal Semantic Graph

1 code implementation · 19 Aug 2022 · Yufeng Huang, Zhuo Chen, Jiaoyan Chen, Jeff Z. Pan, Zhen Yao, Wen Zhang

Multi-modal aspect-based sentiment classification (MABSC) is the task of classifying the sentiment of a target entity mentioned in a sentence and an image.

Image Captioning · Sentiment Analysis +1

DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

1 code implementation · 4 Jul 2022 · Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Wen Zhang, Yin Fang, Jeff Z. Pan, Huajun Chen

Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images; (2) applied an attribute-level contrastive learning strategy to further enhance the model's discrimination on fine-grained visual characteristics against the attribute co-occurrence and imbalance; (3) proposed a multi-task learning policy for considering multi-model objectives.

Contrastive Learning · Image Classification +3

Disentangled Ontology Embedding for Zero-shot Learning

1 code implementation · 8 Jun 2022 · Yuxia Geng, Jiaoyan Chen, Wen Zhang, Yajing Xu, Zhuo Chen, Jeff Z. Pan, Yufeng Huang, Feiyu Xiong, Huajun Chen

In this paper, we focus on ontologies for augmenting ZSL, and propose to learn disentangled ontology embeddings guided by ontology properties to capture and utilize more fine-grained class relationships in different aspects.

Image Classification · Ontology Embedding +2

Ultra Fast Speech Separation Model with Teacher Student Learning

no code implementations · 27 Apr 2022 · Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

In this paper, an ultra fast speech separation Transformer model is proposed to achieve both better performance and efficiency with teacher student learning (T-S learning).

Speech Separation

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

no code implementations · 27 Apr 2022 · Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.

Self-Supervised Learning · Speaker Recognition +3

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

1 code implementation · 30 Mar 2022 · Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription with low latency.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +4

Knowledge-informed Molecular Learning: A Survey on Paradigm Transfer

no code implementations · 17 Feb 2022 · Yin Fang, Zhuo Chen, Xiaohui Fan, Ningyu Zhang

To enhance the generation and decipherability of purely data-driven models, scholars have integrated biochemical domain knowledge into these molecular study models.

Molecular Property Prediction · Property Prediction

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

1 code implementation · 2 Feb 2022 · Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

This paper proposes token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +1
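The core idea of t-SOT is to serialize overlapping transcriptions into a single token stream ordered by emission time, with a special channel-change token marking switches between virtual output channels. A simplified sketch of that serialization, where the `<cc>` symbol and the (time, channel, token) input format are illustrative assumptions rather than the paper's exact formulation:

```python
def serialize_tsot(words):
    """words: list of (emit_time, channel, token) tuples pooled over all
    speakers. Sort by emission time and insert a channel-change token
    '<cc>' whenever the virtual output channel switches. Channel
    assignment and tokenization details are simplified away."""
    out, prev_ch = [], None
    for _, ch, tok in sorted(words):
        if prev_ch is not None and ch != prev_ch:
            out.append("<cc>")
        out.append(tok)
        prev_ch = ch
    return out

mix = [(0.0, 0, "hello"), (0.4, 1, "hi"), (0.6, 0, "world"), (0.9, 1, "there")]
print(serialize_tsot(mix))
# ['hello', '<cc>', 'hi', '<cc>', 'world', '<cc>', 'there']
```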

A New Image Codec Paradigm for Human and Machine Uses

no code implementations · 19 Dec 2021 · Sien Chen, Jian Jin, Lili Meng, Weisi Lin, Zhuo Chen, Tsui-Shan Chang, Zhengguang Li, Huaxiang Zhang

Meanwhile, an image predictor is designed and trained to achieve the general-quality image reconstruction with the 16-bit gray-scale profile and signal features.

Decision Making · Image Compression +7

Zero-shot and Few-shot Learning with Knowledge Graphs: A Comprehensive Survey

no code implementations · 18 Dec 2021 · Jiaoyan Chen, Yuxia Geng, Zhuo Chen, Jeff Z. Pan, Yuan He, Wen Zhang, Ian Horrocks, Huajun Chen

Machine learning, especially deep neural networks, has achieved great success, but many models rely on a large number of labeled samples for supervision.

Data Augmentation · Few-Shot Learning +10

Molecular Contrastive Learning with Chemical Element Knowledge Graph

1 code implementation · 1 Dec 2021 · Yin Fang, Qiang Zhang, Haihong Yang, Xiang Zhuang, Shumin Deng, Wen Zhang, Ming Qin, Zhuo Chen, Xiaohui Fan, Huajun Chen

To address these issues, we construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements and propose a novel Knowledge-enhanced Contrastive Learning (KCL) framework for molecular representation learning.

Contrastive Learning · Molecular Property Prediction +3

Continuous Speech Separation with Recurrent Selective Attention Network

no code implementations · 28 Oct 2021 · Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li

In this paper, we propose to apply recurrent selective attention network (RSAN) to CSS, which generates a variable number of output channels based on active speaker counting.

speech-recognition · Speech Recognition +1
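The variable number of output channels comes from extracting sources iteratively: a mask is estimated on the current residual, one source is peeled off, and the loop stops once little energy remains, so the iteration count adapts to the number of active speakers. A toy sketch of that control flow, with a placeholder function standing in for the network and all thresholds hypothetical:

```python
import numpy as np

def iterative_extraction(mixture_mag, extract_mask, energy_floor=1e-2):
    """Peel sources off one at a time: estimate a mask on the current
    residual magnitude spectrogram, extract that source, and stop once
    the residual energy falls below a fraction of the mixture energy.
    The varying number of loop passes yields a variable number of
    output channels."""
    residual, sources = mixture_mag.copy(), []
    while residual.sum() > energy_floor * mixture_mag.sum():
        mask = extract_mask(residual)        # stand-in for the network
        sources.append(mask * residual)
        residual = (1.0 - mask) * residual
    return sources

# Toy "network": claims 60% of the remaining energy on every pass.
toy_mask = lambda spec: np.full_like(spec, 0.6)
mix = np.abs(np.random.default_rng(1).standard_normal((50, 80)))
outs = iterative_extraction(mix, toy_mask)
print(len(outs))  # number of extracted channels
```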

One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

no code implementations · 20 Oct 2021 · Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang

Experimental results show that the proposed geometry agnostic model outperforms the model trained on a specific microphone array geometry in both speech quality and automatic speech recognition accuracy.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

no code implementations · 18 Oct 2021 · Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang

Our results show that the proposed models can yield better speech recognition accuracy, speech intelligibility, and perceptual quality than the baseline models, and the multi-task training can alleviate the TSOS issue in addition to improving the speech recognition accuracy.

Speech Enhancement · speech-recognition +1

All-neural beamformer for continuous speech separation

no code implementations · 13 Oct 2021 · Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

Recently, the all deep learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

no code implementations · 12 Oct 2021 · Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR

no code implementations · 7 Oct 2021 · Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Similar to the target-speaker voice activity detection (TS-VAD)-based diarization method, the E2E SA-ASR model is applied to estimate the speech activity of each speaker, while it has the advantages of (i) handling an unlimited number of speakers, (ii) leveraging linguistic information for speaker diarization, and (iii) simultaneously generating speaker-attributed transcriptions.

Action Detection · Activity Detection +6

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

no code implementations · 17 Sep 2021 · Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li

Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions.

Speech Separation

Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speakers

no code implementations · 7 Aug 2021 · Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe

Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech.

Action Detection · Activity Detection +3

Spacetime Neural Network for High Dimensional Quantum Dynamics

no code implementations · 4 Aug 2021 · Jiangran Wang, Zhuo Chen, Di Luo, Zhizhen Zhao, Vera Mikyoung Hur, Bryan K. Clark

We develop a spacetime neural network method with second order optimization for solving quantum dynamics from the high dimensional Schrödinger equation.

Vocal Bursts Intensity Prediction

Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs

no code implementations · 8 Jul 2021 · Yikang Zhang, Zhuo Chen, Zhao Zhong

Our method achieves state-of-the-art performance on ImageNet: 80.7% top-1 accuracy with 194M FLOPs.

Image Classification

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio

no code implementations · 6 Jul 2021 · Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Our evaluation on the AMI meeting corpus reveals that, after fine-tuning with a small amount of real data, the joint system performs 8.9-29.9% better in accuracy than the best modular system, while the modular system performs better before such fine-tuning.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +5

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

no code implementations · 5 Jul 2021 · Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR).

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3

Modeling and Reasoning in Event Calculus using Goal-Directed Constraint Answer Set Programming

no code implementations · 28 Jun 2021 · Joaquín Arias, Manuel Carro, Zhuo Chen, Gopal Gupta

Automated commonsense reasoning is essential for building human-like AI systems featuring, for example, explainable AI.

End-to-End Speaker-Attributed ASR with Transformer

no code implementations · 5 Apr 2021 · Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

This paper presents our recent effort on end-to-end speaker-attributed automatic speech recognition, which jointly performs speaker counting, speech recognition and speaker identification for monaural multi-talker audio.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

no code implementations · 31 Mar 2021 · Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR).

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +1

Continuous Speech Separation with Ad Hoc Microphone Arrays

no code implementations · 3 Mar 2021 · Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng

Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of an ad hoc array.

speech-recognition · Speech Recognition +1

Knowledge-aware Zero-Shot Learning: Survey and Perspective

1 code implementation · 26 Feb 2021 · Jiaoyan Chen, Yuxia Geng, Zhuo Chen, Ian Horrocks, Jeff Z. Pan, Huajun Chen

Zero-shot learning (ZSL), which aims at predicting classes that have never appeared during training using external knowledge (a.k.a.

BIG-bench Machine Learning · Zero-Shot Learning

OntoZSL: Ontology-enhanced Zero-shot Learning

1 code implementation · 15 Feb 2021 · Yuxia Geng, Jiaoyan Chen, Zhuo Chen, Jeff Z. Pan, Zhiquan Ye, Zonggang Yuan, Yantao Jia, Huajun Chen

The key to implementing ZSL is to leverage the prior knowledge of classes, which builds the semantic relationship between classes and enables the transfer of learned models (e.g., features) from training classes (i.e., seen classes) to unseen classes.

Image Classification · Knowledge Graph Completion +2

Gauge Invariant and Anyonic Symmetric Autoregressive Neural Networks for Quantum Lattice Models

no code implementations · 18 Jan 2021 · Di Luo, Zhuo Chen, Kaiwen Hu, Zhizhen Zhao, Vera Mikyoung Hur, Bryan K. Clark

Symmetries such as gauge invariance and anyonic symmetry play a crucial role in quantum many-body physics.

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

1 code implementation · 3 Nov 2020 · Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka

Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

1 code implementation · 23 Oct 2020 · Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li

With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently.

Speech Separation

Speaker Separation Using Speaker Inventories and Estimated Speech

no code implementations · 20 Oct 2020 · Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong

We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework leveraging speaker profiles and estimated speech for speaker separation.

Speaker Separation · Speech Extraction +2

An End-to-end Architecture of Online Multi-channel Speech Separation

no code implementations · 7 Sep 2020 · Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie

Previously, we introduced a system called unmixing, fixed-beamformer and extraction (UFE), which was shown to be effective in addressing the speech overlap problem in conversation transcription.

speech-recognition · Speech Recognition +1

Brain Stroke Lesion Segmentation Using Consistent Perception Generative Adversarial Network

no code implementations · 30 Aug 2020 · Shuqiang Wang, Zhuo Chen, Wen Yu, Baiying Lei

The assistant network and the discriminator are employed to jointly decide whether the segmentation results are real or fake.

Lesion Segmentation

Continuous Speech Separation with Conformer

1 code implementation · 13 Aug 2020 · Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Continuous speech separation plays a vital role in complicated speech related tasks such as conversation transcription.

 Ranked #1 on Speech Separation on LibriCSS (using extra training data)

Speech Separation

Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings

1 code implementation · 11 Aug 2020 · Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

However, the model required prior knowledge of speaker profiles to perform speaker identification, which significantly limited the application of the model.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3

Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles

no code implementations · 27 Jul 2020 · Yuxin Lu, Peng Cheng, Zhuo Chen, Wai Ho Mow, Yonghui Li, Branka Vucetic

We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.

Multi-Task Learning

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers

no code implementations · 19 Jun 2020 · Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka

We propose an end-to-end speaker-attributed automatic speech recognition model that unifies speaker counting, speech recognition, and speaker identification on monaural overlapped speech.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

PuppeteerGAN: Arbitrary Portrait Animation With Semantic-Aware Appearance Transformation

no code implementations · CVPR 2020 · Zhuo Chen, Chaoyue Wang, Bo Yuan, Dacheng Tao

Portrait animation, which aims to animate a still portrait to life using poses extracted from target frames, is an important technique for many real-world entertainment applications.

Semantic Segmentation

Neural Speech Separation Using Spatially Distributed Microphones

no code implementations · 28 Apr 2020 · Dongmei Wang, Zhuo Chen, Takuya Yoshioka

The inter-channel processing layers apply a self-attention mechanism along the channel dimension to exploit the information obtained with a varying number of microphones.

speech-recognition · Speech Recognition +1
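The self-attention along the channel dimension described above can be sketched as ordinary single-head attention applied per time frame across microphone channels, which is what makes the layer indifferent to the number of microphones. All shapes and weights below are illustrative assumptions, not the paper's model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(x, wq, wk, wv):
    """Single-head self-attention along the CHANNEL axis: for every
    time frame, each microphone channel attends to all the others, so
    the same weights handle any number of channels.
    x: (time, channels, feat) -> (time, channels, d)."""
    q, k, v = x @ wq, x @ wk, x @ wv                    # (time, channels, d)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v                  # attend across channels

rng = np.random.default_rng(0)
w = [rng.standard_normal((32, 16)) for _ in range(3)]    # shared weights
for n_mics in (2, 5):                                    # varying microphone counts
    y = channel_self_attention(rng.standard_normal((10, n_mics, 32)), *w)
    print(y.shape)  # (10, 2, 16) then (10, 5, 16)
```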

Generative Adversarial Zero-shot Learning via Knowledge Graphs

no code implementations · 7 Apr 2020 · Yuxia Geng, Jiaoyan Chen, Zhuo Chen, Zhiquan Ye, Zonggang Yuan, Yantao Jia, Huajun Chen

However, the side information of classes used so far is limited to text descriptions and attribute annotations, which fall short of capturing the semantics of the classes.

Image Classification · Knowledge Graphs +1

Continuous speech separation: dataset and analysis

1 code implementation · 30 Jan 2020 · Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li

In this paper, we define continuous speech separation (CSS) as the task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances which are partially overlapped by a varying degree.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement

no code implementations18 Nov 2019 Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey

This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation.

Speaker Separation Speech Enhancement +3
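The alternation described above couples a spectral mask from a neural network with a spatial beamformer. A rough numpy sketch of one such mask-driven beamforming step, using mask-weighted covariances and an MVDR solution (the steering-vector estimate and regularization here are illustrative assumptions, not the paper's exact recipe):

```python
import numpy as np

def mvdr_weights(X, mask):
    """X: (C, T) complex STFT values of one frequency bin across C mics;
    mask: (T,) speech-presence mask in [0, 1] from the spectral network.
    Returns MVDR weights w and the estimated steering vector d."""
    C = X.shape[0]
    # mask-weighted spatial covariance matrices: sum_t m(t) x(t) x(t)^H / sum_t m(t)
    phi_s = (X * mask) @ X.conj().T / max(mask.sum(), 1e-8)
    phi_n = (X * (1.0 - mask)) @ X.conj().T / max((1.0 - mask).sum(), 1e-8)
    # steering vector: principal eigenvector of the speech covariance
    d = np.linalg.eigh(phi_s)[1][:, -1]
    # MVDR: w = phi_n^{-1} d / (d^H phi_n^{-1} d), with diagonal loading
    num = np.linalg.solve(phi_n + 1e-6 * np.eye(C), d)
    return num / (d.conj() @ num), d
```

The normalization enforces the distortionless constraint w^H d = 1, so the beamformer passes the target direction while suppressing masked-out noise.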

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation

2 code implementations30 Oct 2019 Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka

An important problem in ad-hoc microphone speech separation is how to guarantee the robustness of a system with respect to the locations and numbers of microphones.

Speech Separation

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation

8 code implementations14 Oct 2019 Yi Luo, Zhuo Chen, Takuya Yoshioka

Recent studies in deep learning-based speech separation have proven the superiority of time-domain approaches over conventional time-frequency-based methods.

Speech Separation
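The dual-path idea rests on splitting a long sequence into short overlapping chunks, so intra-chunk and inter-chunk RNNs each see a path of roughly sqrt(T) steps. A minimal numpy sketch of that segmentation and its exact inverse (chunk sizes and helper names are illustrative, not from the paper):

```python
import numpy as np

def segment(x, chunk, hop):
    """Split a long sequence x of shape (T, N) into overlapping
    chunks of shape (num_chunks, chunk, N), zero-padding the tail."""
    T, N = x.shape
    pad = (chunk - T) if T < chunk else (-(T - chunk)) % hop
    xp = np.pad(x, ((0, pad), (0, 0)))
    starts = range(0, xp.shape[0] - chunk + 1, hop)
    return np.stack([xp[s:s + chunk] for s in starts]), T

def overlap_add(chunks, hop, T):
    """Inverse of segment: overlap-add the chunks, renormalize by the
    per-sample overlap count, and trim the padding."""
    num, chunk, N = chunks.shape
    out = np.zeros(((num - 1) * hop + chunk, N))
    count = np.zeros_like(out)
    for i, c in enumerate(chunks):
        out[i * hop:i * hop + chunk] += c
        count[i * hop:i * hop + chunk] += 1.0
    return (out / count)[:T]
```

In the model itself, one RNN runs along each chunk and another runs across chunks at each within-chunk position; the round trip above is lossless, so separation quality depends only on what the RNNs do between the two steps.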

A Learning-Based Two-Stage Spectrum Sharing Strategy with Multiple Primary Transmit Power Levels

no code implementations21 Jul 2019 Rui Zhang, Peng Cheng, Zhuo Chen, Yonghui Li, Branka Vucetic

Then, based on a novel normalized power level alignment metric, we propose two prediction-transmission structures, namely periodic and non-periodic, for spectrum access (the second part in Stage II), which enable the secondary transmitter (ST) to closely follow the primary transmitter (PT) power level variation.

PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch

1 code implementation12 Jul 2019 Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong

While similar toolkits are available built on top of the two, a key feature of PyKaldi2 is sequence training with criteria such as MMI, sMBR and MPE.

Speech Recognition

ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection

1 code implementation19 Jun 2019 Zhuo Chen, Jiyuan Zhang, Ruizhou Ding, Diana Marculescu

In this paper, we propose Virtual Pooling (ViP), a model-level approach to improve speed and energy consumption of CNN-based image classification and object detection tasks, with a provable error bound.

General Classification Image Classification +2
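The core trade made by Virtual Pooling is to compute only a subset of output activations exactly and fill the rest by cheap linear interpolation. A rough numpy sketch of that idea on a 2-D feature map (a simplification for illustration; the actual ViP method operates inside convolutional layers and comes with a provable error bound):

```python
import numpy as np

def virtual_pool_rows(fmap):
    """'Compute' only the even-indexed rows of a feature map and fill
    each odd row by linearly interpolating its vertical neighbours,
    halving the rows that need exact computation."""
    H, W = fmap.shape
    out = np.empty((H, W))
    out[0::2] = fmap[0::2]                       # rows computed exactly
    for r in range(1, H, 2):                     # rows filled by interpolation
        below = fmap[r + 1] if r + 1 < H else fmap[r - 1]
        out[r] = 0.5 * (fmap[r - 1] + below)
    return out
```

Wherever the feature map varies smoothly (as it often does after convolution), the interpolated rows are close to the exact ones, which is what bounds the induced error.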

Low-Latency Speaker-Independent Continuous Speech Separation

no code implementations13 Apr 2019 Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis

Speaker independent continuous speech separation (SI-CSS) is a task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals each of which contains no overlapping speech segment.

Speech Recognition +1

Understanding the Impact of Label Granularity on CNN-based Image Classification

1 code implementation21 Jan 2019 Zhuo Chen, Ruizhou Ding, Ting-Wu Chin, Diana Marculescu

In this paper, we conduct extensive experiments using various datasets to demonstrate and analyze how and why training based on fine-grain labeling, such as "Persian cat" can improve CNN accuracy on classifying coarse-grain classes, in this case "cat."

General Classification Image Classification

Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks

no code implementations8 Oct 2018 Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao, Fil Alleva

The goal of this work is to develop a meeting transcription system that can recognize speech even when utterances of different speakers are overlapped.

Speech Recognition +1

Intermediate Deep Feature Compression: the Next Battlefield of Intelligent Sensing

no code implementations17 Sep 2018 Zhuo Chen, Weisi Lin, Shiqi Wang, Ling-Yu Duan, Alex C. Kot

Recent advances in hardware technology have made intelligent analysis equipped with deep learning at the front-end more prevalent and practical.

Data Compression Feature Compression

Developing Far-Field Speaker System Via Teacher-Student Learning

no code implementations14 Apr 2018 Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong

In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.

Keyword Spotting Model Compression

Speaker-Invariant Training via Adversarial Learning

no code implementations2 Apr 2018 Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang Juang

We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system.

General Classification Multi-Task Learning

Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition

no code implementations21 Nov 2017 Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong

Unsupervised domain adaptation of speech signals aims at adapting a well-trained source-domain acoustic model to unlabeled data from the target domain.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Image Quality Assessment Guided Deep Neural Networks Training

3 code implementations13 Aug 2017 Zhuo Chen, Weisi Lin, Shiqi Wang, Long Xu, Leida Li

For many computer vision problems, the deep neural networks are trained and validated based on the assumption that the input images are pristine (i.e., artifact-free).

Data Augmentation Image Classification +1

Improving Adherence to Heart Failure Management Guidelines via Abductive Reasoning

no code implementations16 Jul 2017 Zhuo Chen, Elmer Salazar, Kyle Marple, Gopal Gupta, Lakshman Tamil, Sandeep Das, Alpesh Amin

A standard approach to managing chronic diseases by the medical community is to have a committee of experts develop guidelines that all physicians should follow.


Speaker-independent Speech Separation with Deep Attractor Network

no code implementations12 Jul 2017 Yi Luo, Zhuo Chen, Nima Mesgarani

A reference point, called an attractor, is created in the embedding space to represent each speaker, defined as the centroid of that speaker's time-frequency embeddings.

Speech Separation
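Given embeddings and ideal speaker assignments, the attractor computation and the resulting masks reduce to a centroid and a softmax over similarities. A small numpy sketch of that step (function name and the oracle assignments are illustrative; at test time the paper estimates attractors without oracle labels):

```python
import numpy as np

def attractor_masks(V, Y):
    """V: (TF, D) time-frequency embeddings; Y: (TF, C) one-hot ideal
    speaker assignments. Each attractor is the centroid of one speaker's
    embeddings; masks come from similarity of each bin to each attractor."""
    A = (Y.T @ V) / np.maximum(Y.sum(0)[:, None], 1e-8)   # (C, D) centroids
    logits = V @ A.T                                       # (TF, C) similarities
    e = np.exp(logits - logits.max(1, keepdims=True))
    return e / e.sum(1, keepdims=True)                     # softmax masks
```

When the network learns embeddings that cluster by speaker, each bin's mask concentrates on the attractor of its own source, which is what makes the masks usable for separation.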

End-to-End Attention based Text-Dependent Speaker Verification

no code implementations3 Jan 2017 Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Yifan Gong

A new type of End-to-End system for text-dependent speaker verification is presented in this paper.

Text-Dependent Speaker Verification

Deep attractor network for single-microphone speaker separation

1 code implementation27 Nov 2016 Zhuo Chen, Yi Luo, Nima Mesgarani

We propose a novel deep learning framework for single channel speech separation by creating attractor points in high dimensional embedding space of the acoustic signals which pull together the time-frequency bins corresponding to each source.

Speaker Separation Speech Separation

Deep Clustering and Conventional Networks for Music Separation: Stronger Together

no code implementations18 Nov 2016 Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani

Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks.

Clustering Deep Clustering +3

A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns

no code implementations25 Oct 2016 Zhuo Chen, Kyle Marple, Elmer Salazar, Gopal Gupta, Lakshman Tamil

In this paper we describe a physician-advisory system for CHF management that codes the entire set of clinical practice guidelines for CHF using answer set programming.


Single-Channel Multi-Speaker Separation using Deep Clustering

2 code implementations7 Jul 2016 Yusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey

In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Deep clustering: Discriminative embeddings for segmentation and separation

8 code implementations18 Aug 2015 John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe

The framework can be used without class labels, and therefore has the potential to be trained on a diverse set of sound types, and to generalize to novel sources.

Clustering Deep Clustering +3
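Deep clustering trains embeddings V so that the affinity VV^T matches the ideal affinity YY^T built from class labels, and the objective can be evaluated without forming either (TF x TF) matrix. A small numpy sketch of that permutation-free loss (a standard algebraic rearrangement; the variable names are illustrative):

```python
import numpy as np

def dpcl_loss(V, Y):
    """||V V^T - Y Y^T||_F^2 for V: (TF, D) embeddings and Y: (TF, C)
    one-hot source labels, expanded as
    ||V^T V||_F^2 - 2 ||V^T Y||_F^2 + ||Y^T Y||_F^2
    so only small D x D, D x C, and C x C matrices are ever formed."""
    return (np.linalg.norm(V.T @ V, 'fro') ** 2
            - 2 * np.linalg.norm(V.T @ Y, 'fro') ** 2
            + np.linalg.norm(Y.T @ Y, 'fro') ** 2)
```

Because the loss compares pairwise affinities rather than labeled outputs, it is invariant to source permutation, which is why the framework needs no fixed class vocabulary and can generalize to novel sound types.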
