no code implementations • Signal Processing Magazine 2012 • Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input.
no code implementations • 30 Oct 2015 • Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, James Glass
In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers.
no code implementations • 30 Oct 2015 • Yu Zhang, Ekapol Chuangsuwanich, James Glass, Dong Yu
In this paper, we investigate the use of prediction-adaptation-correction recurrent neural networks (PAC-RNNs) for low-resource speech recognition.
2 code implementations • KDD 2016 • Ying Shan, T. Ryan Hoens, Jian Jiao, Haijing Wang, Dong Yu, JC Mao
Manually crafted combinatorial features have been the “secret sauce” behind many successful models.
1 code implementation • 1 Jul 2016 • Dong Yu, Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen
We propose a novel deep learning model, which supports permutation invariant training (PIT), for speaker independent multi-talker speech separation, commonly known as the cocktail-party problem.
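The core of PIT is simple to sketch: compute the training loss under every possible speaker-to-output assignment and optimize the minimum, which removes the label-permutation ambiguity. A minimal NumPy illustration (the function name and toy data are ours, not the authors' implementation):

```python
from itertools import permutations

import numpy as np

def pit_mse_loss(estimates: np.ndarray, targets: np.ndarray) -> float:
    """estimates, targets: (num_speakers, num_frames) separation outputs/references."""
    n = estimates.shape[0]
    best = float("inf")
    for perm in permutations(range(n)):       # try every speaker assignment
        loss = float(np.mean((estimates[list(perm)] - targets) ** 2))
        best = min(best, loss)                # train on the best assignment
    return best

# Two-talker toy: the model emits the speakers in swapped order; PIT is unaffected.
targets = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
estimates = targets[::-1].copy()
assert pit_mse_loss(estimates, targets) == 0.0
```

In practice the same idea is applied to per-frame or, in the utterance-level variant (uPIT), per-utterance losses on masks or spectrograms.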
no code implementations • 15 Mar 2017 • Jie Zhu, Ying Shan, JC Mao, Dong Yu, Holakou Rahmanian, Yi Zhang
Built on top of a representative DNN model, Deep Crossing, and two forest/tree-based models, XGBoost and LightGBM, the two-step Deep Embedding Forest algorithm is demonstrated to achieve on-par or slightly better performance than its DNN counterpart, at only a fraction of the serving time on conventional hardware.
no code implementations • 17 Mar 2017 • Wenpeng Li, Bin-Bin Zhang, Lei Xie, Dong Yu
Deep learning models (DLMs) are state-of-the-art techniques in speech recognition.
3 code implementations • 18 Mar 2017 • Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen
We evaluated uPIT on the WSJ0 and Danish two- and three-talker mixed-speech separation tasks and found that uPIT outperforms techniques based on Non-negative Matrix Factorization (NMF) and Computational Auditory Scene Analysis (CASA), and compares favorably with Deep Clustering (DPCL) and the Deep Attractor Network (DANet).
no code implementations • 22 Mar 2017 • Dong Yu, Xuankai Chang, Yanmin Qian
Our technique is based on permutation invariant training (PIT) for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +1
no code implementations • 19 Jul 2017 • Yanmin Qian, Xuankai Chang, Dong Yu
Although great progress has been made in automatic speech recognition (ASR), significant performance degradation is still observed when recognizing multi-talker mixed speech.
Automatic Speech Recognition (ASR) +2
no code implementations • SEMEVAL 2017 • Yukun Feng, Dong Yu, Jian Xu, Chunhua Liu
This paper explores the automatic learning of distributed representations of the target's context for semantic frame labeling with a target-based neural model.
no code implementations • 31 Aug 2017 • Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen
We show that deep bi-directional LSTM RNNs trained using uPIT in noisy environments can improve the Signal-to-Distortion Ratio (SDR) as well as the Extended Short-Time Objective Intelligibility (ESTOI) measure, on the speaker independent multi-talker speech separation and denoising task, for various noise types and Signal-to-Noise Ratios (SNRs).
Sound
no code implementations • 25 Apr 2018 • Dong Yu, Jinyu Li
In this paper, we summarize recent progress made in deep-learning-based acoustic models, together with the motivation and insights behind the surveyed techniques.
no code implementations • SEMEVAL 2018 • Meiqian Zhao, Chunhua Liu, Lu Liu, Yan Zhao, Dong Yu
To comprehend an argument and fill the gap between claims and reasons, it is vital to find the implicit supporting warrants behind them.
1 code implementation • EMNLP 2018 • Wenhu Chen, Jianshu Chen, Yu Su, Xin Wang, Dong Yu, Xifeng Yan, William Yang Wang
Then, we pre-train a state tracker for the source language as a teacher, which is able to exploit easy-to-access parallel data.
no code implementations • 14 Oct 2018 • Lisa Fan, Dong Yu, Lu Wang
Sequence-to-sequence (seq2seq) neural models have been actively investigated for abstractive summarization.
1 code implementation • NAACL 2019 • Kai Sun, Dian Yu, Dong Yu, Claire Cardie
Reading strategies have been shown to improve comprehension levels, especially for readers lacking adequate prior knowledge.
Ranked #7 on Question Answering on StoryCloze
no code implementations • 8 Nov 2018 • Chao Weng, Dong Yu
In this work, three lattice-free (LF) discriminative training criteria for purely sequence-trained neural network acoustic models are compared on LVCSR tasks, namely maximum mutual information (MMI), boosted maximum mutual information (bMMI) and state-level minimum Bayes risk (sMBR).
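For orientation, the standard (lattice-based) forms of these criteria, of which the paper derives lattice-free variants, can be written as follows (notation ours; $S_W$ is the state sequence for word sequence $W$, $A$ an accuracy function, $b \geq 0$ a boosting factor):

```latex
% MMI: log posterior of the reference word sequence W_u for utterance X_u
\mathcal{F}_{\mathrm{MMI}} = \sum_{u}\log
  \frac{p_\theta(X_u \mid S_{W_u})\, P(W_u)}
       {\sum_{W} p_\theta(X_u \mid S_W)\, P(W)}

% bMMI: competing hypotheses are boosted in proportion to their errors
\mathcal{F}_{\mathrm{bMMI}} = \sum_{u}\log
  \frac{p_\theta(X_u \mid S_{W_u})\, P(W_u)}
       {\sum_{W} p_\theta(X_u \mid S_W)\, P(W)\, e^{-b\,A(W,\,W_u)}}

% sMBR: expected state-level accuracy under the model posterior
\mathcal{F}_{\mathrm{sMBR}} = \sum_{u}
  \frac{\sum_{W} p_\theta(X_u \mid S_W)\, P(W)\, A(S_W,\,S_{W_u})}
       {\sum_{W'} p_\theta(X_u \mid S_{W'})\, P(W')}
```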
no code implementations • ICLR 2019 • Chih-Kuan Yeh, Jianshu Chen, Chengzhu Yu, Dong Yu
We consider the problem of training speech recognition systems without using any labeled data, under the assumption that the learner can only access the input utterances and a phoneme language model estimated from a non-overlapping corpus.
1 code implementation • 8 Jan 2019 • Chunhua Liu, Shan Jiang, Hainan Yu, Dong Yu
The inference of each turn is performed on the current matching feature and the memory.
no code implementations • 8 Jan 2019 • Chunhua Liu, Yan Zhao, Qingyi Si, Haiou Zhang, Bohan Li, Dong Yu
From the experimental results, we can conclude that the difference fusion is comparable with union fusion, and the similarity fusion needs to be activated by the union fusion.
no code implementations • PACLIC 2018 • Chunhua Liu, Haiou Zhang, Shan Jiang, Dong Yu
We divide a complete story into three narrative segments: an exposition, a climax, and an ending.
no code implementations • 11 Jan 2019 • Yan Zhao, Lu Liu, Chunhua Liu, Ruoyao Yang, Dong Yu
We introduce a new task named Story Ending Generation (SEG), which aims at generating a coherent story ending from a sequence of story plots.
1 code implementation • 1 Feb 2019 • Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, Claire Cardie
DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge.
1 code implementation • WS 2019 • Xiaoman Pan, Kai Sun, Dian Yu, Jianshu Chen, Heng Ji, Claire Cardie, Dong Yu
We focus on multiple-choice question answering (QA) tasks in subject areas such as science, where we require both broad background knowledge and the facts from the given subject-area reference corpus.
1 code implementation • CONLL 2019 • Hai Wang, Dian Yu, Kai Sun, Jianshu Chen, Dong Yu, David Mcallester, Dan Roth
Remarkable success has been achieved in the last few years on some limited machine reading comprehension (MRC) tasks.
no code implementations • TACL 2019 • Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, Claire Cardie
We present DREAM, the first dialogue-based multiple-choice reading comprehension data set.
no code implementations • 7 Apr 2019 • Jian Wu, Yong Xu, Shi-Xiong Zhang, Lian-Wu Chen, Meng Yu, Lei Xie, Dong Yu
Audio-visual multi-modal modeling has been demonstrated to be effective in many speech related tasks, such as speech recognition and speech enhancement.
Audio and Speech Processing Sound
1 code implementation • TACL 2020 • Kai Sun, Dian Yu, Dong Yu, Claire Cardie
Machine reading comprehension tasks require a machine reader to answer questions relevant to the given document.
no code implementations • 11 May 2019 • Shi-Xiong Zhang, Yifan Gong, Dong Yu
One good property of the DPN is that it can be trained on unencrypted speech features in the traditional way.
no code implementations • 15 May 2019 • Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
This paper extends the previous approach and proposes a new end-to-end model for multi-channel speech separation.
no code implementations • 16 May 2019 • Jun Wang, Dan Su, Jie Chen, Shulin Feng, Dongpeng Ma, Na Li, Dong Yu
We propose a novel method that simultaneously models sequence discriminative training and feature discriminative learning within a single network architecture, so that it can learn discriminative deep features during sequence training, obviating the need for presegmented training data.
no code implementations • 17 May 2019 • Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu
We study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequency- and time-domain separators, utilizing spectral, spatial, and speaker location information.
1 code implementation • ACL 2019 • Kun Xu, Li-Wei Wang, Mo Yu, Yansong Feng, Yan Song, Zhiguo Wang, Dong Yu
Previous cross-lingual knowledge graph (KG) alignment studies rely on entity embeddings derived only from monolingual KG structural information, which may fail at matching entities that have different facts in two KGs.
no code implementations • SEMEVAL 2019 • Ruoyao Yang, Wanying Xie, Chunhua Liu, Dong Yu
Researchers have been paying increasing attention to rumour evaluation due to the rapid spread of unsubstantiated rumours on social media platforms, including SemEval 2019 task 7.
no code implementations • SEMEVAL 2019 • Wanying Xie, Mengxi Que, Ruoyao Yang, Chunhua Liu, Dong Yu
For contextual knowledge enhancement, we extend the training set of subtask A, use several features to improve the results of our system and adapt the input formats to be more suitable for this task.
1 code implementation • 7 Jun 2019 • Guoyin Wang, Yan Song, Yue Zhang, Dong Yu
Word embeddings are traditionally trained on a large corpus in an unsupervised setting, with no specific design for incorporating domain knowledge.
1 code implementation • ACL 2019 • Ying Lin, Liyuan Liu, Heng Ji, Dong Yu, Jiawei Han
We design a set of word frequency-based reliability signals to indicate the quality of each word embedding.
1 code implementation • ACL 2019 • Hongming Zhang, Yan Song, Yangqiu Song, Dong Yu
Resolving pronoun coreference requires knowledge support, especially for particular domains (e.g., medicine).
no code implementations • 9 Jul 2019 • Zhao You, Dan Su, Dong Yu
First, for each domain, a teacher model (domain-dependent model) is trained by fine-tuning a multi-condition model with a domain-specific subset.
Automatic Speech Recognition (ASR) +1
no code implementations • 9 Jul 2019 • Zhenyu Tang, Lian-Wu Chen, Bo Wu, Dong Yu, Dinesh Manocha
We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks.
no code implementations • IJCAI 2019 • Ling Luo, Xiang Ao, Yan Song, Jinyao Li, Xiaopeng Yang, Qing He, Dong Yu
Aspect extraction relies on identifying aspects by discovering coherence among words, which is challenging when word meanings are diversified and processing on short texts.
Aspect Extraction Aspect Term Extraction and Sentiment Classification +1
2 code implementations • ICCV 2019 • Zhengyuan Yang, Boqing Gong, Li-Wei Wang, Wenbing Huang, Dong Yu, Jiebo Luo
We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight.
2 code implementations • 30 Aug 2019 • Peng Liu, Xixin Wu, Shiyin Kang, Guangzhi Li, Dan Su, Dong Yu
End-to-end speech synthesis methods already achieve close-to-human quality performance.
4 code implementations • 4 Sep 2019 • Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu
In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously.
no code implementations • 16 Sep 2019 • Ke Tan, Yong Xu, Shi-Xiong Zhang, Meng Yu, Dong Yu
Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments.
Audio and Speech Processing Sound Signal Processing
no code implementations • 19 Sep 2019 • Yiheng Huang, Jinchuan Tian, Lei Han, Guangsen Wang, Xingcheng Song, Dan Su, Dong Yu
One important challenge in training an NNLM is balancing the scaling of the learning process against the handling of big data.
no code implementations • 26 Sep 2019 • Hai Wang, Dian Yu, Kai Sun, Jianshu Chen, Dong Yu
However, in a multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language.
no code implementations • WS 2019 • Lifu Tu, Xiaoan Ding, Dong Yu, Kevin Gimpel
We propose a simple and effective modeling framework for controlled generation of multiple, diverse outputs.
no code implementations • 23 Oct 2019 • Xingchen Song, Guangsen Wang, Zhiyong Wu, Yiheng Huang, Dan Su, Dong Yu, Helen Meng
Our best systems achieve a relative improvement of 11.9% and 8.3% on the TIMIT and WSJ tasks respectively.
no code implementations • WS 2019 • Sangwoo Cho, Chen Li, Dong Yu, Hassan Foroosh, Fei Liu
Having emerged as one of the best-performing techniques for extractive summarization, determinantal point processes select the most probable set of sentences to form a summary according to a probability measure defined by modeling sentence prominence and pairwise repulsion.
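A DPP over sentences is commonly parameterized by a kernel that combines a per-sentence quality score with pairwise similarity, and a summary is chosen by greedy MAP inference. A minimal sketch under that standard parameterization (names and toy data are illustrative, not the paper's code):

```python
import numpy as np

def greedy_dpp_map(q: np.ndarray, S: np.ndarray, k: int) -> list:
    """q: (n,) sentence quality scores; S: (n, n) similarity matrix."""
    L = np.diag(q) @ S @ np.diag(q)          # DPP kernel: quality x repulsion
    selected = []
    for _ in range(k):
        best_i, best_gain = -1, -np.inf
        for i in range(len(q)):
            if i in selected:
                continue
            idx = selected + [i]             # candidate subset
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:
                best_i, best_gain = i, logdet
        if best_i < 0:
            break
        selected.append(best_i)
    return selected

# Toy: sentences 0 and 1 are near-duplicates; the DPP prefers a diverse pair.
q = np.array([1.0, 0.95, 0.8])
S = np.array([[1.0, 0.99, 0.1],
              [0.99, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
assert greedy_dpp_map(q, S, 2) == [0, 2]
```

The determinant of the selected submatrix grows with both the quality of the chosen sentences and their mutual dissimilarity, which is why near-duplicates repel each other.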
no code implementations • 28 Oct 2019 • Zhao You, Dan Su, Jie Chen, Chao Weng, Dong Yu
Self-attention networks (SANs) have been introduced into automatic speech recognition (ASR) and have achieved state-of-the-art performance owing to their superior ability to capture long-term dependencies.
Automatic Speech Recognition (ASR) +1
no code implementations • 28 Oct 2019 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
Deep-learning-based speech separation models suffer from a poor generalization problem: even state-of-the-art models can abruptly fail when evaluated under mismatched conditions.
no code implementations • CONLL 2019 • Hai Wang, Dian Yu, Kai Sun, Jianshu Chen, Dong Yu
However, in a multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language.
no code implementations • WS 2019 • Chunhua Liu, Dong Yu
This paper describes our system for COIN Shared Task 1: Commonsense Inference in Everyday Narrations.
2 code implementations • 23 Nov 2019 • Kaiqiang Song, Logan Lebanoff, Qipeng Guo, Xipeng Qiu, Xiangyang Xue, Chen Li, Dong Yu, Fei Liu
If generating a word can introduce an erroneous relation to the summary, the behavior must be discouraged.
Ranked #27 on Text Summarization on GigaWord
no code implementations • 28 Nov 2019 • Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu
In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for end-to-end speech recognition.
1 code implementation • 30 Nov 2019 • Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang, Dong Yu
Neural machine translation models usually adopt the teacher forcing strategy for training, which requires that the predicted sequence match the ground truth word by word and forces the probability of each prediction to approach a 0-1 distribution.
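The exposure-bias issue motivating this line of work can be seen in a toy decoder: under teacher forcing the ground-truth token is fed back at each step, while at inference the model consumes its own predictions, so a single error can compound. A hypothetical table-lookup "model" makes this concrete (illustrative only, not the paper's NMT system):

```python
def decode(bigram: dict, start: str, targets: list, teacher_forcing: bool) -> list:
    """Greedy next-token decoding with a bigram lookup table as the 'model'."""
    preds, prev = [], start
    for gold in targets:
        pred = bigram.get(prev, "<unk>")   # model's next-token prediction
        preds.append(pred)
        # Teacher forcing feeds the ground truth back in; free running feeds
        # the model's own (possibly wrong) prediction, so errors compound.
        prev = gold if teacher_forcing else pred
    return preds

# The model wrongly predicts "dog" after "the"; gold continuation is "cat sat".
bigram = {"<s>": "the", "the": "dog", "dog": "ran", "cat": "sat"}
gold = ["the", "cat", "sat"]
assert decode(bigram, "<s>", gold, teacher_forcing=True) == ["the", "dog", "sat"]
assert decode(bigram, "<s>", gold, teacher_forcing=False) == ["the", "dog", "ran"]
```

Under teacher forcing the decoder recovers after its one error; free running drifts onto an entirely different continuation, which is the train/inference mismatch such work aims to bridge.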
no code implementations • 4 Dec 2019 • Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
However, the converted singing voice can be easily out of key, showing that the existing approach cannot model the pitch information precisely.
no code implementations • 17 Dec 2019 • Fahimeh Bahmaninezhad, Shi-Xiong Zhang, Yong Xu, Meng Yu, John H. L. Hansen, Dong Yu
The initial solutions introduced for deep-learning-based speech separation analyzed the speech signals in the time-frequency domain with the STFT; the encoded mixed signals were then fed into a deep-neural-network-based separator.
no code implementations • 20 Dec 2019 • Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu
The proposed algorithm first integrates speech and singing synthesis into a unified framework, and then learns universal speaker embeddings that are shareable between speech and singing synthesis tasks.
no code implementations • 27 Dec 2019 • Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu
This paper presents a method that generates expressive singing voice of Peking opera.
no code implementations • 6 Jan 2020 • Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu
Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio-only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.
Ranked #4 on Audio-Visual Speech Recognition on LRS2
Audio-Visual Speech Recognition Automatic Speech Recognition (ASR) +4
1 code implementation • IJCNLP 2019 • Hongming Zhang, Jiaxin Bai, Yan Song, Kun Xu, Changlong Yu, Yangqiu Song, Wilfred Ng, Dong Yu
Therefore, in this paper, we propose a multiplex word embedding model, which can be easily extended according to various relations among words.
no code implementations • 23 Jan 2020 • Kun Xu, Linfeng Song, Yansong Feng, Yan Song, Dong Yu
Existing entity alignment methods mainly vary on the choices of encoding the knowledge graph, but they typically use the same decoding method, which independently chooses the local optimal match for each source entity.
1 code implementation • 6 Mar 2020 • Mutian He, Yangqiu Song, Kun Xu, Dong Yu
Commonsense knowledge graphs (CKGs) like Atomic and ASER are substantially different from conventional KGs, as they consist of a much larger number of nodes formed by loosely structured text. While this enables them to handle highly diverse natural-language queries related to commonsense, it also leads to unique challenges for automatic KG construction methods.
no code implementations • 9 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods.
no code implementations • 16 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Lian-Wu Chen, Yuexian Zou, Dong Yu
Target speech separation refers to extracting a target speaker's voice from an overlapped audio of simultaneous talkers.
3 code implementations • ACL 2020 • Dian Yu, Kai Sun, Claire Cardie, Dong Yu
We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE, aiming to support the prediction of relation(s) between two arguments that appear in a dialogue.
Ranked #6 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)
no code implementations • ACL 2020 • Zhenyi Wang, Xiaoyang Wang, Bang An, Dong Yu, Changyou Chen
Text generation from a knowledge base aims to translate knowledge triples to natural language descriptions.
1 code implementation • 8 May 2020 • Yong Xu, Meng Yu, Shi-Xiong Zhang, Lian-Wu Chen, Chao Weng, Jianming Liu, Dong Yu
Purely neural network (NN) based speech separation and enhancement methods, although they can achieve good objective scores, inevitably cause nonlinear speech distortions that are harmful to automatic speech recognition (ASR).
Audio and Speech Processing Sound
1 code implementation • ACL 2020 • Jie Lei, Li-Wei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal
Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph.
Ranked #5 on Video Captioning on ActivityNet Captions
1 code implementation • ACL 2020 • Hongyu Gong, Yelong Shen, Dian Yu, Jianshu Chen, Dong Yu
In this paper, we study machine reading comprehension (MRC) on long texts, where a model takes as inputs a lengthy document and a question and then extracts a text span from the document as an answer.
no code implementations • 18 May 2020 • Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, LianWu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu, Xunying Liu, Helen Meng
Automatic speech recognition (ASR) of overlapped speech remains a highly challenging task to date.
Automatic Speech Recognition (ASR) +4
no code implementations • 11 Jun 2020 • Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng
Orthogonal to prior approaches, this work proposes to defend ASV systems against adversarial attacks with a separate detection network, rather than augmenting adversarial data into ASV training.
no code implementations • 20 Jun 2020 • Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng
Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input.
no code implementations • ACL 2020 • Linfeng Song, Kun Xu, Yue Zhang, Jianshu Chen, Dong Yu
Zero pronoun recovery and resolution aim at recovering the dropped pronoun and pointing out its anaphoric mentions, respectively.
1 code implementation • CVPR 2021 • Liwei Wang, Jing Huang, Yin Li, Kun Xu, Zhengyuan Yang, Dong Yu
Our core innovation is the learning of a region-phrase score function, based on which an image-sentence score function is further constructed.
1 code implementation • ECCV 2020 • Yiwu Zhong, Li-Wei Wang, Jianshu Chen, Dong Yu, Yin Li
We address the challenging problem of image captioning by revisiting the representation of image scene graph.
no code implementations • 7 Aug 2020 • Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu
In this work, we propose to deal with this issue and synthesize expressive Peking Opera singing from the music score based on the Duration Informed Attention Network (DurIAN) framework.
1 code implementation • 16 Aug 2020 • Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Dong Yu
Speech separation algorithms are often used to separate the target speech from other interfering sources.
1 code implementation • 21 Aug 2020 • Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen
Speech enhancement and speech separation are two related tasks, whose purpose is to extract one or more target speech signals, respectively, from a mixture of sounds generated by several sources.
no code implementations • 25 Aug 2020 • Liqiang He, Dan Su, Dong Yu
Extensive experiments show that: (i) the architecture searched on the small proxy dataset can be transferred to the large dataset for the speech recognition tasks.
Automatic Speech Recognition (ASR) +2
no code implementations • ACL 2022 • Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Claire Cardie
In this paper, we aim to extract commonsense knowledge to improve machine reading comprehension.
no code implementations • EMNLP 2020 • Kun Xu, Haochen Tan, Linfeng Song, Han Wu, Haisong Zhang, Linqi Song, Dong Yu
For multi-turn dialogue rewriting, the capacity to effectively model the linguistic knowledge in the dialogue context and get rid of noise is essential to improving performance.
1 code implementation • EMNLP 2020 • Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie zhou, Dong Yu
The vanilla NMT model usually adopts trivial equal-weighted objectives for target tokens with different frequencies and tends to generate more high-frequency tokens and less low-frequency tokens compared with the golden token distribution.
2 code implementations • 12 Oct 2020 • Linchao Bao, Xiangkai Lin, Yajing Chen, Haoxian Zhang, Sheng Wang, Xuefei Zhe, Di Kang, HaoZhi Huang, Xinwei Jiang, Jue Wang, Dong Yu, Zhengyou Zhang
We present a fully automatic system that can produce high-fidelity, photo-realistic 3D digital human heads with a consumer RGB-D selfie camera.
1 code implementation • EMNLP 2020 • Sangwoo Cho, Kaiqiang Song, Chen Li, Dong Yu, Hassan Foroosh, Fei Liu
Amongst the best means to summarize is highlighting.
no code implementations • 23 Oct 2020 • Saurabh Kataria, Shi-Xiong Zhang, Dong Yu
We find the improvements from speaker-dependent directional features to be more consistent in multi-talker conditions than in clean conditions.
2 code implementations • 28 Oct 2020 • Xu Li, Na Li, Chao Weng, Xunying Liu, Dan Su, Dong Yu, Helen Meng
This multiple scaling mechanism significantly improves the countermeasure's generalizability to unseen spoofing attacks.
no code implementations • 30 Oct 2020 • Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu
The advantages of D-ASR over existing methods are threefold: (1) it provides explicit speaker locations, (2) it improves the explainability factor, and (3) it achieves better ASR performance as the process is more streamlined.
Automatic Speech Recognition (ASR) +1
no code implementations • 9 Nov 2020 • Kaiqiang Song, Chen Li, Xiaoyang Wang, Dong Yu, Fei Liu
Instead, we investigate several less-studied aspects of neural abstractive summarization, including (i) the importance of selecting important segments from transcripts to serve as input to the summarizer; (ii) striking a balance between the amount and quality of training instances; (iii) the appropriate summary length and start/end points.
no code implementations • 16 Nov 2020 • Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu
Automatic speech recognition (ASR) technologies have been significantly advanced in the past few decades.
Automatic Speech Recognition (ASR) +2
no code implementations • 26 Nov 2020 • Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu
Target-speaker speech recognition aims to recognize target-speaker speech from noisy environments with background noise and interfering speakers.
Speech Enhancement Speech Extraction +1 Sound Audio and Speech Processing
no code implementations • SEMEVAL 2020 • Chang Liu, Dong Yu
We demonstrate the effectiveness of our approaches, which achieve an F1 of 0.95 on subtask 1 while fine-tuning the BERT model on only a subset of the given training set; our official submission achieves an F1 of 0.802, ranking us 16th in the competition.
no code implementations • SEMEVAL 2020 • Shike Wang, Yuchen Fan, Xiangying Luo, Dong Yu
In our system, we expand the number of external constraints in multiple languages to obtain more specialised multilingual word embeddings.
1 code implementation • 3 Dec 2020 • Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu
In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.
1 code implementation • 13 Dec 2020 • Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu, Dong Yu
First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo speaker embedding system utilizes a queue to maintain a large set of negative examples.
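Both SimCLR- and MoCo-style training optimize the contrastive (InfoNCE) loss; MoCo's distinguishing feature is the queue that supplies a large set of negative embeddings. A minimal NumPy sketch assuming unit-normalized speaker embeddings (names and setup are ours, not the paper's code):

```python
import numpy as np

def info_nce(query: np.ndarray, positive: np.ndarray,
             queue: np.ndarray, temperature: float = 0.07) -> float:
    """query, positive: (dim,) embeddings; queue: (num_negatives, dim)."""
    # Similarity logits: positive pair first, then every queued negative.
    logits = np.concatenate(([query @ positive], queue @ query)) / temperature
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))            # positive sits at index 0

rng = np.random.default_rng(0)
dim = 8
q = rng.normal(size=dim); q /= np.linalg.norm(q)
neg = rng.normal(size=(16, dim))
neg /= np.linalg.norm(neg, axis=1, keepdims=True)
# A positive identical to the query incurs a much lower loss than a mismatched one:
assert info_nce(q, q, neg) < info_nce(q, neg[0], neg)
```

In MoCo the queue is refreshed with embeddings from a slowly updated momentum encoder, which is what keeps the large negative set consistent across training steps.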
no code implementations • 24 Dec 2020 • Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Donald S. Williamson, Dong Yu
Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR) +2
1 code implementation • 29 Dec 2020 • Jie Hao, Linfeng Song, LiWei Wang, Kun Xu, Zhaopeng Tu, Dong Yu
The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context.
no code implementations • 31 Dec 2020 • Haisong Zhang, Lemao Liu, Haiyun Jiang, Yangming Li, Enbo Zhao, Kun Xu, Linfeng Song, Suncong Zheng, Botong Zhou, Jianchen Zhu, Xiao Feng, Tao Chen, Tao Yang, Dong Yu, Feng Zhang, Zhanhui Kang, Shuming Shi
This technical report introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities.
2 code implementations • 13 Jan 2021 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
Recent research on the time-domain audio separation networks (TasNets) has brought great success to speech separation.
Ranked #14 on Speech Separation on WSJ0-2mix
no code implementations • Findings (EMNLP) 2021 • Dian Yu, Kai Sun, Dong Yu, Claire Cardie
In spite of much recent research in the area, it is still unclear whether subject-area question-answering data is useful for machine reading comprehension (MRC) tasks.
1 code implementation • ACL 2020 • Linfeng Song, Ante Wang, Jinsong Su, Yue Zhang, Kun Xu, Yubin Ge, Dong Yu
The task of graph-to-text generation aims at producing sentences that preserve the meaning of input graphs.
Ranked #10 on Data-to-Text Generation on WebNLG
no code implementations • 16 Feb 2021 • Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu
In addition to using the prediction error as a metric for evaluating our localization model, we also establish its potency as a frontend with automatic speech recognition (ASR) as the downstream task.
Automatic Speech Recognition (ASR) +2
no code implementations • 1 Mar 2021 • Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu
To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC).
2 code implementations • 1 Mar 2021 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers.
Ranked #8 on Speech Separation on WSJ0-3mix
no code implementations • 2 Mar 2021 • Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu
We study the cocktail party problem and propose a novel attention network called Tune-In, abbreviated for training under negative environments with interference.
no code implementations • 3 Mar 2021 • Xiaoyang Wang, Chen Li, Jianqiao Zhao, Dong Yu
To facilitate the research on this corpus, we provide results of several benchmark models.
no code implementations • 16 Mar 2021 • Chunlei Zhang, Meng Yu, Chao Weng, Dong Yu
This paper proposes the target speaker enhancement based speaker verification network (TASE-SVNet), an all neural model that couples target speaker enhancement and speaker embedding extraction for robust speaker verification (SV).
no code implementations • 31 Mar 2021 • Helin Wang, Bo Wu, LianWu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu
In this paper, we explore an effective way to leverage contextual information to improve speech dereverberation performance in real-world reverberant environments.
no code implementations • 2 Apr 2021 • Meng Yu, Chunlei Zhang, Yong Xu, ShiXiong Zhang, Dong Yu
Objective speech quality assessment is usually conducted by comparing the received speech signal with its clean reference, whereas human listeners can evaluate speech quality without any reference, as in mean opinion score (MOS) tests.
1 code implementation • NAACL 2021 • Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, Jiebo Luo
We investigate video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video.
no code implementations • 11 Apr 2021 • Kun Xu, Han Wu, Linfeng Song, Haisong Zhang, Linqi Song, Dong Yu
Semantic role labeling (SRL) aims to extract the arguments for each predicate in an input sentence.
no code implementations • 17 Apr 2021 • Xiyun Li, Yong Xu, Meng Yu, Shi-Xiong Zhang, Jiaming Xu, Bo Xu, Dong Yu
The spatial self-attention module is designed to attend on the cross-channel correlation in the covariance matrices.
Automatic Speech Recognition (ASR)
1 code implementation • 7 May 2021 • Zhao You, Shulin Feng, Dan Su, Dong Yu
Recently, Mixture of Experts (MoE) based Transformer has shown promising results in many domains.
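The core of an MoE layer is a router that sends each token to a few experts. A toy top-2 softmax gate, sketched here for illustration (this is a common gating scheme, not necessarily this paper's exact router):

```python
import numpy as np

def top2_gate(logits):
    """Route a token to its two highest-scoring experts.

    Returns the chosen expert indices and their renormalized
    softmax weights over just those two experts.
    """
    top2 = np.argsort(logits)[-2:][::-1]           # indices of the 2 largest logits
    w = np.exp(logits[top2] - logits[top2].max())  # numerically stable softmax
    w /= w.sum()
    return top2, w

logits = np.array([0.1, 2.0, -1.0, 1.5])   # router scores for 4 experts
experts, weights = top2_gate(logits)
print(experts)  # [1 3]
```

Only the selected experts run for this token, which is how MoE scales parameter count without scaling per-token compute.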
no code implementations • 8 May 2021 • Liqiang He, Shulin Feng, Dan Su, Dong Yu
Extensive experiments show that: 1) based on the proposed neural architecture, neural networks with a medium latency of 550 ms and a low latency of 190 ms can be learned in the vanilla and revised operation spaces, respectively.
Automatic Speech Recognition (ASR)
no code implementations • 8 Jun 2021 • Max W. Y. Lam, Jun Wang, Chao Weng, Dan Su, Dong Yu
End-to-end speech recognition generally uses hand-engineered acoustic features as input and excludes the feature extraction module from its joint optimization.
1 code implementation • AKBC 2021 • Tianqing Fang, Haojie Pan, Hongming Zhang, Yangqiu Song, Kun Xu, Dong Yu
To evaluate the inference capability of different methods, we also propose a new evaluation metric based on CODC.
1 code implementation • ACL 2021 • Wanying Xie, Yang Feng, Shuhao Gu, Dong Yu
Multilingual neural machine translation with a single model has drawn much attention due to its capability to deal with multiple languages.
no code implementations • ACL 2021 • Lemao Liu, Haisong Zhang, Haiyun Jiang, Yangming Li, Enbo Zhao, Kun Xu, Linfeng Song, Suncong Zheng, Botong Zhou, Dick Zhu, Xiao Feng, Tao Chen, Tao Yang, Dong Yu, Feng Zhang, Zhanhui Kang, Shuming Shi
This paper introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities.
no code implementations • 26 Aug 2021 • Max W. Y. Lam, Jun Wang, Rongjie Huang, Dan Su, Dong Yu
In this paper, we propose novel bilateral denoising diffusion models (BDDMs), which take significantly fewer steps to generate high-quality samples.
no code implementations • 8 Sep 2021 • Songxiang Liu, Shan Yang, Dan Su, Dong Yu
The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice.
1 code implementation • EMNLP 2021 • Xintong Yu, Hongming Zhang, Yangqiu Song, ChangShui Zhang, Kun Xu, Dong Yu
Resolving pronouns to their referents has long been studied as a fundamental natural language understanding problem.
no code implementations • 29 Sep 2021 • Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Zhou Zhao, Yi Ren
Learning generalizable speech representations for unseen samples in different domains has been a challenge with ever increasing importance to date.
2 code implementations • 7 Oct 2021 • Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu
We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Automatic Speech Recognition (ASR)
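Whatever model generates the RIR, applying it to dry speech is a plain convolution. A minimal sketch with a synthetic exponentially decaying toy RIR (not FAST-RIR's output):

```python
import numpy as np

rng = np.random.default_rng(0)
dry = rng.standard_normal(16000)            # 1 s of toy "speech" at 16 kHz

# toy RIR: direct path plus an exponentially decaying diffuse tail
t = np.arange(4000) / 16000.0
rir = np.exp(-t / 0.05) * rng.standard_normal(4000)
rir[0] = 1.0                                # direct path

reverberant = np.convolve(dry, rir)         # full convolution
print(reverberant.shape)                    # (19999,) = 16000 + 4000 - 1
```

Generators like the one in the entry above exist precisely to produce the `rir` array for a target room, so that clean corpora can be reverberated for ASR training.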
2 code implementations • EMNLP 2021 • Wenlin Yao, Xiaoman Pan, Lifeng Jin, Jianshu Chen, Dian Yu, Dong Yu
We then train a model to identify semantic equivalence between a target word in context and one of its glosses using these aligned inventories, which exhibits strong transfer capability to many WSD tasks.
no code implementations • 9 Nov 2021 • Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu
We train the proposed model in an end-to-end manner to eliminate background noise and echoes from far-end audio devices, which include nonlinear distortions.
no code implementations • 14 Nov 2021 • Songxiang Liu, Dan Su, Dong Yu
The task of few-shot style transfer for voice cloning in text-to-speech (TTS) synthesis aims at transferring speaking styles of an arbitrary source speaker to a target speaker's voice using a very limited amount of neutral data.
no code implementations • 22 Nov 2021 • Yiwen Shao, Shi-Xiong Zhang, Dong Yu
Experimental results show that 1) the proposed ALL-In-One model achieved an error rate comparable to the pipelined system while halving the inference time; 2) the proposed 3D spatial feature significantly outperformed (31% CERR) all previous works using 1D directional information in both paradigms.
Automatic Speech Recognition (ASR)
no code implementations • 23 Nov 2021 • Zhao You, Shulin Feng, Dan Su, Dong Yu
Mixture-of-experts based acoustic models with dynamic routing mechanisms have shown promising results for speech recognition.
no code implementations • 29 Nov 2021 • Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu
Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type.
1 code implementation • 5 Dec 2021 • Jinchuan Tian, Jianwei Yu, Chao Weng, Shi-Xiong Zhang, Dan Su, Dong Yu, Yuexian Zou
Recently, End-to-End (E2E) frameworks have achieved remarkable results on various Automatic Speech Recognition (ASR) tasks.
Automatic Speech Recognition (ASR)
1 code implementation • 6 Jan 2022 • Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu
Then, the LM score of the hypothesis is obtained by intersecting the generated lattice with an external word N-gram LM.
Automatic Speech Recognition (ASR)
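As a toy stand-in for the lattice-with-N-gram intersection described in the entry above, scoring a single hypothesis string with an add-one-smoothed word bigram LM looks like this (hand-made counts and sentences are hypothetical; the paper scores whole lattices, not single strings):

```python
import math
from collections import Counter

# toy training corpus for the bigram counts
corpus = "the cat sat on the mat the cat ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab = len(unigrams)

def bigram_logprob(hyp):
    """Add-one-smoothed bigram log-probability of a word sequence."""
    score = 0.0
    for prev, word in zip(hyp, hyp[1:]):
        score += math.log((bigrams[(prev, word)] + 1) /
                          (unigrams[prev] + vocab))
    return score

good = bigram_logprob("the cat sat".split())
bad = bigram_logprob("cat the sat".split())
print(good > bad)  # True: the LM prefers the seen word order
```

Intersecting a lattice with such an LM amounts to running this scoring over every path in the lattice at once, sharing prefixes.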
2 code implementations • 28 Jan 2022 • Songxiang Liu, Dan Su, Dong Yu
Denoising diffusion probabilistic models (DDPMs) are expressive generative models that have been used to solve a variety of speech synthesis problems.
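The forward (noising) half of a DDPM has a closed form that most speech-synthesis variants, including this line of work, build on. A minimal sketch with the standard linear beta schedule (schedule values are the common defaults, assumed here, not this paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # standard linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # cumulative product, a.k.a. alpha-bar_t

def q_sample(x0, t):
    """Closed form: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.sin(np.linspace(0, 8 * np.pi, 256))   # toy "waveform"
early, late = q_sample(x0, 10), q_sample(x0, T - 1)
print(alpha_bar[T - 1] < 1e-4)  # True: by the last step the signal is nearly pure noise
```

Generation then reverses this chain step by step with a learned score/noise predictor; the schedule-network idea in BDDM-style models is about shortening that reverse pass.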
no code implementations • 14 Feb 2022 • Jianqiao Zhao, Yanyang Li, Wanyu Du, Yangfeng Ji, Dong Yu, Michael R. Lyu, LiWei Wang
Hence, we propose segment act, an extension of dialog act from utterance level to segment level, and crowdsource a large-scale dataset for it.
no code implementations • 18 Feb 2022 • Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng
Though significant progress has been made on speaker-dependent Video-to-Speech (VTS) synthesis, little attention has been devoted to multi-speaker VTS, which can map silent video to speech while allowing flexible control of speaker identity, all in a single system.
no code implementations • 1 Mar 2022 • Jian Jin, Dong Yu, Weisi Lin, Lili Meng, Hao Wang, Huaxiang Zhang
Moreover, according to the experimental results of the proposed model, the JNDs of the red and blue channels are larger than that of the green channel. This indicates that more change can be tolerated in the red and blue channels, in line with the well-known fact that the human visual system is more sensitive to the green channel than to the red and blue ones.
1 code implementation • ACL 2022 • Xiang Yue, Xiaoman Pan, Wenlin Yao, Dian Yu, Dong Yu, Jianshu Chen
And with our pretrained reader, the entire system improves by up to 4% in exact match.
1 code implementation • ACL 2022 • Chao Zhao, Wenlin Yao, Dian Yu, Kaiqiang Song, Dong Yu, Jianshu Chen
Comprehending a dialogue requires a model to capture diverse kinds of key information in the utterances, which are either scattered across or only implicitly conveyed in different turns of the conversation.
1 code implementation • ACL 2022 • Kaiqiang Song, Chen Li, Xiaoyang Wang, Dong Yu, Fei Liu
Summarization of podcast transcripts is of practical benefit to both content providers and consumers.
1 code implementation • ICLR 2022 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can train with a novel bilateral modeling objective.
Ranked #1 on Speech Synthesis on LJSpeech
1 code implementation • 29 Mar 2022 • Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu
However, the effectiveness and efficiency of the MBR-based methods are compromised: the MBR criterion is only used in system training, which creates a mismatch between training and decoding; the on-the-fly decoding process in MBR-based methods results in the need for pre-trained models and slow training speeds.
Automatic Speech Recognition (ASR)
1 code implementation • 30 Mar 2022 • Jiachen Lian, Chunlei Zhang, Dong Yu
A zero-shot voice conversion is performed by feeding an arbitrary speaker embedding and content embeddings to the VAE decoder.
1 code implementation • 7 Apr 2022 • Zhao You, Shulin Feng, Dan Su, Dong Yu
Recently, the Conformer-based CTC/AED model has become a mainstream architecture for ASR.
Ranked #2 on Speech Recognition on WenetSpeech
2 code implementations • 21 Apr 2022 • Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao
Also, FastDiff enables sampling 58x faster than real time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time.
Ranked #7 on Text-To-Speech Synthesis on LJSpeech (using extra training data)
no code implementations • 27 Apr 2022 • Lifeng Jin, Kun Xu, Linfeng Song, Dong Yu
Approaches for the stance classification task, an important task for understanding argumentation in debates and detecting fake news, have been relying on models which deal with individual debate topics.
1 code implementation • 11 May 2022 • Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu
In our experiment on the VCTK dataset, we demonstrate that content embeddings derived from the conditional DSVAE overcome the randomness and achieve a much better phoneme classification accuracy, a stabilized vocalization and a better zero-shot VC performance compared with the competitive DSVAE baseline.
no code implementations • 20 May 2022 • Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu
Acoustic echo cancellation (AEC) plays an important role in full-duplex speech communication, as well as in front-end speech enhancement for recognition, when the loudspeaker plays back.
1 code implementation • 5 Jun 2022 • Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level and shows superior performance on both monolingual and multilingual ASR tasks.
Automatic Speech Recognition (ASR)
no code implementations • 6 Jun 2022 • Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu
We leverage recent advancements in self-supervised speech representation learning as well as speech synthesis front-end techniques for system development.
1 code implementation • 16 Jun 2022 • Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai, Dong Yu
Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability.
1 code implementation • 22 Jun 2022 • Lisa Jin, Linfeng Song, Lifeng Jin, Dong Yu, Daniel Gildea
HCT (i) tags the source string with token-level edit actions and slotted rules and (ii) fills in the resulting rule slots with spans from the dialogue context.
1 code implementation • 20 Jul 2022 • Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu
In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder.
Ranked #13 on Audio Generation on AudioCaps
no code implementations • 15 Aug 2022 • Chunlei Zhang, Dong Yu
On the basis of the pretrained CSSL model, we further propose to employ a negative-sample-free SSL objective (i.e., DINO) to fine-tune the speaker embedding network.
1 code implementation • 1 Oct 2022 • Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, Heng Ji
Notably, our proposed $\text{Zemi}_\text{LARGE}$ outperforms T0-3B by 16% on all seven evaluation tasks while being 3.9x smaller in model size.
1 code implementation • 11 Oct 2022 • Ben Zhou, Dian Yu, Dong Yu, Dan Roth
Speaker identification, determining which character said each utterance in literary text, benefits many downstream tasks.
no code implementations • 14 Oct 2022 • Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, Shinji Watanabe
Besides predicting the target sequence, a by-product of CTC is predicting the alignment: the most probable input-length sequence that specifies a hard alignment between the input and the target units.
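The mapping from such a frame-level alignment back to the label sequence is CTC's collapse rule: merge consecutive repeats, then drop blanks. A small self-contained sketch:

```python
def ctc_collapse(alignment, blank="-"):
    """Map a frame-level CTC alignment to its label sequence:
    merge consecutive repeated symbols, then drop blanks."""
    out, prev = [], None
    for sym in alignment:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out

# "hh-ee-ll-lo" -> "hello": the doubled 'l' survives because a blank
# separates the two l-runs in the alignment
print(ctc_collapse(list("hh-ee-ll-lo")))  # ['h', 'e', 'l', 'l', 'o']
```

Every alignment that collapses to the same target is a "valid path" in the sense used by the entry above; controllable alignment prediction means biasing which of those paths the model prefers.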
1 code implementation • 21 Oct 2022 • Yue Yang, Wenlin Yao, Hongming Zhang, Xiaoyang Wang, Dong Yu, Jianshu Chen
Large-scale pretrained language models have made significant advances in solving downstream language understanding tasks.
Ranked #2 on Visual Commonsense Tests on ViComTe-color
1 code implementation • 22 Oct 2022 • Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, Jiebo Luo
While previous work focuses on building systems for inducing grammars on text that are well-aligned with video content, we investigate the scenario, in which text and video are only in loose correspondence.
3 code implementations • 22 Oct 2022 • Yinya Huang, Hongming Zhang, Ruixin Hong, Xiaodan Liang, ChangShui Zhang, Dong Yu
To this end, we propose a comprehensive logical reasoning explanation form.
1 code implementation • 22 Oct 2022 • Fei Wang, Kaiqiang Song, Hongming Zhang, Lifeng Jin, Sangwoo Cho, Wenlin Yao, Xiaoyang Wang, Muhao Chen, Dong Yu
Recent literature adds extractive summaries as guidance for abstractive summarization models to provide hints of salient content and achieves better performance.
Ranked #7 on Abstractive Text Summarization on CNN / Daily Mail
no code implementations • 28 Oct 2022 • Xiaoman Pan, Wenlin Yao, Hongming Zhang, Dian Yu, Dong Yu, Jianshu Chen
In this paper, we develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC), which empowers a parametric text-to-text language model with a knowledge-rich external memory.
Ranked #5 on Question Answering on StoryCloze
1 code implementation • 28 Oct 2022 • Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Fei Liu, Dong Yu
The problem is only exacerbated by a lack of segmentation in transcripts of audio/video recordings.
Ranked #5 on Text Summarization on Pubmed
no code implementations • 8 Nov 2022 • Wenyue Hua, Lifeng Jin, Linfeng Song, Haitao Mi, Yongfeng Zhang, Dong Yu
Pretrained natural language processing (NLP) models have achieved high overall performance, but they still make systematic errors.
1 code implementation • 9 Nov 2022 • Hongming Zhang, Wenlin Yao, Dong Yu
We argue that using the static embedding of the event type name might not be enough because a single word could be ambiguous, and we need a sentence to define the type semantics accurately.
no code implementations • 22 Nov 2022 • Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu
While current deep learning (DL)-based beamforming techniques have proven effective for speech separation, they are often designed to process narrow-band (NB) frequencies independently, which results in higher computational costs and inference times, making them unsuitable for real-world use.
1 code implementation • 6 Dec 2022 • Pei Chen, Wenlin Yao, Hongming Zhang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen
However, there has been limited research on the zero-shot KBC settings, where we need to deal with unseen entities and relations that emerge in a constantly growing knowledge base.
no code implementations • 12 Dec 2022 • Lixin Cao, Jun Wang, Ben Yang, Dan Su, Dong Yu
Self-supervised learning (SSL) models confront challenges of abrupt informational collapse or slow dimensional collapse.
1 code implementation • 19 Dec 2022 • Xianjun Yang, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Xiaoman Pan, Linda Petzold, Dong Yu
Specifically, zero/few-shot and fine-tuning results show that the model pre-trained on our corpus demonstrates a strong aspect or query-focused generation ability compared with the backbone model.
no code implementations • 29 Jan 2023 • Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang
The robustness of the Kalman filter to double talk and its rapid convergence make it a popular approach for addressing acoustic echo cancellation (AEC) challenges.
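The Kalman-filter view of AEC treats the echo-path taps as a random-walk state and the microphone sample as the observation. A time-domain toy sketch (tap values, noise covariances, and filter length are all assumed for illustration; practical AEC uses frequency-domain variants):

```python
import numpy as np

rng = np.random.default_rng(1)

L = 4                                    # toy echo-path length
true_w = np.array([0.5, -0.3, 0.2, 0.1]) # hypothetical echo path
N = 2000
far = rng.standard_normal(N)             # far-end (loudspeaker) signal
mic = np.convolve(far, true_w)[:N] + 0.01 * rng.standard_normal(N)

# Kalman filter over the echo-path taps w with a random-walk state model
w = np.zeros(L)
P = np.eye(L)                            # state covariance
Q, R = 1e-6 * np.eye(L), 1e-2            # assumed process / observation noise

errors = []
for n in range(L, N):
    x = far[n - L + 1 : n + 1][::-1]     # most recent L far-end samples
    P = P + Q                            # predict
    e = mic[n] - x @ w                   # innovation = echo-cancelled output
    K = (P @ x) / (x @ P @ x + R)        # Kalman gain
    w = w + K * e                        # update tap estimate
    P = P - np.outer(K, x @ P)
    errors.append(e * e)

print(np.mean(errors[:100]) > np.mean(errors[-100:]))  # True: residual echo shrinks
```

The data-driven Kalman filtering the entry describes replaces hand-tuned quantities like `Q` and `R` (or the whole update) with learned components.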
no code implementations • 31 Jan 2023 • Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Xiabing Zhou, Dong Yu
Current self-training methods, such as standard self-training, co-training, and tri-training, often focus on improving model performance on a single task, utilizing differences in input features, model architectures, and training processes.
1 code implementation • 31 Jan 2023 • Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Černocký, Dong Yu
Humans can listen to a target speaker even in challenging acoustic conditions that have noise, reverberation, and interfering speakers.
1 code implementation • 16 Feb 2023 • Ante Wang, Linfeng Song, Qi Liu, Haitao Mi, Longyue Wang, Zhaopeng Tu, Jinsong Su, Dong Yu
We propose a dialogue model that can access the vast and dynamic information from any search engine for response generation.
no code implementations • 18 Feb 2023 • Hao Zhang, Meng Yu, Dong Yu
In this paper, we formulate acoustic howling suppression (AHS) as a supervised learning problem and propose a deep learning approach, called Deep AHS, to address it.
no code implementations • 2 May 2023 • Hao Zhang, Meng Yu, Dong Yu
In particular, the interplay between acoustic echo and acoustic howling in a hybrid meeting makes the joint suppression of them difficult.
no code implementations • 4 May 2023 • Hao Zhang, Meng Yu, Yuzhong Wu, Tao Yu, Dong Yu
During offline training, a pre-processed signal obtained from the Kalman filter and an ideal microphone signal generated via a teacher-forcing strategy are used to train the deep neural network (DNN).
1 code implementation • 4 May 2023 • Ruixin Hong, Hongming Zhang, Hong Zhao, Dong Yu, ChangShui Zhang
In this paper, we propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps.
no code implementations • 22 May 2023 • Siyi Liu, Hongming Zhang, Hongwei Wang, Kaiqiang Song, Dan Roth, Dong Yu
However, none of the existing methods have explicitly addressed the issue of framing bias that is inherent in news articles.
1 code implementation • 24 May 2023 • Keming Lu, Xiaoman Pan, Kaiqiang Song, Hongming Zhang, Dong Yu, Jianshu Chen
In particular, we construct INSTRUCTOPENWIKI, a substantial instruction tuning dataset for Open-world IE enriched with a comprehensive corpus, extensive annotations, and diverse instructions.
1 code implementation • 24 May 2023 • James Y. Huang, Wenlin Yao, Kaiqiang Song, Hongming Zhang, Muhao Chen, Dong Yu
It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space.
no code implementations • 30 May 2023 • Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Luping Liu, Zhenhui Ye, Ziyue Jiang, Chao Weng, Zhou Zhao, Dong Yu
Various voice synthesis applications have been developed independently, despite the fact that they all generate "voice" as output.
no code implementations • 8 Jul 2023 • Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, Dong Yu
Specifically, the detection technique achieves a recall of ~88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations.
no code implementations • 16 Jul 2023 • Zhenwen Liang, Dian Yu, Xiaoman Pan, Wenlin Yao, Qingkai Zeng, Xiangliang Zhang, Dong Yu
Our approach uniquely considers the various annotation formats as different "views" and leverages them in training the model.
no code implementations • 1 Aug 2023 • Jiaao Chen, Xiaoman Pan, Dian Yu, Kaiqiang Song, Xiaoyang Wang, Dong Yu, Jianshu Chen
Compositional generalization empowers the LLMs to solve problems that are harder than the ones they have seen (i.e., easy-to-hard generalization), which is a critical reasoning capability of human-like intelligence.
Ranked #17 on Math Word Problem Solving on MATH
1 code implementation • 19 Aug 2023 • Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe
While the vanilla transducer does not have a prior preference for any of the valid paths, this work intends to enforce the preferred paths and achieve controllable alignment prediction.
Automatic Speech Recognition (ASR)
no code implementations • 4 Sep 2023 • Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng
Mapping the two modalities, speech and text, into a shared representation space is an active research direction for using text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains.
Automatic Speech Recognition (ASR)
no code implementations • 8 Sep 2023 • Haopeng Zhang, Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Hongwei Wang, Jiawei Zhang, Dong Yu
SRI balances the importance and diversity of a subset of sentences from the source documents and can be calculated in unsupervised and adaptive manners.
1 code implementation • 15 Sep 2023 • Kaixin Ma, Hongming Zhang, Hongwei Wang, Xiaoman Pan, Wenhao Yu, Dong Yu
We evaluate our proposed LLM Agent with State-Space ExploRation (LASER) on both the WebShop task and amazon.com.
no code implementations • 16 Sep 2023 • Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu
Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing.
no code implementations • 18 Sep 2023 • Conghui Niu, Mengyang Hu, Lin Bo, Xiaoli He, Dong Yu, Pengyuan Liu
Existing propositions often rely on logical constants for classification.
no code implementations • 18 Sep 2023 • Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu
Large Language Models (LLMs) have revolutionized natural language processing, yet aligning these models with human values and preferences using RLHF remains a significant challenge.
no code implementations • 27 Sep 2023 • Hao Zhang, Yixuan Zhang, Meng Yu, Dong Yu
In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process.
no code implementations • 27 Sep 2023 • Yixuan Zhang, Hao Zhang, Meng Yu, Dong Yu
Acoustic howling suppression (AHS) is a critical challenge in audio communication systems.
1 code implementation • 28 Sep 2023 • Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu
We propose Contrast Instructions -- a benchmarking strategy for the consistency of RM.
1 code implementation • 30 Sep 2023 • Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu
In this work, we investigate how the instruction tuning adjusts pre-trained models with a focus on intrinsic changes.