Search Results for author: Dong Yu

Found 159 papers, 55 papers with code

Variational Graph Autoencoding as Cheap Supervision for AMR Coreference Resolution

no code implementations ACL 2022 Irene Li, Linfeng Song, Kun Xu, Dong Yu

Coreference resolution over semantic graphs like AMRs aims to group the graph nodes that represent the same entity.

Coreference Resolution

The method of calculating sentence readability combined with deep learning and language difficulty characteristics

no code implementations CCL 2020 Yuling Tang, Dong Yu

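This paper proposes an improved method for constructing readability corpora and uses it to build a larger Chinese sentence readability corpus. On the task of assessing absolute sentence difficulty, the corpus reaches an accuracy of 0.7869, an improvement of more than 0.15 over previous work, which demonstrates the effectiveness of the improved method. The paper applies deep learning methods to Chinese readability assessment and examines how well different deep learning methods automatically capture difficulty features. It also investigates how adding linguistic difficulty features at different levels to the deep learning features affects overall model performance. Experimental results show that deep learning models differ in their ability to capture difficulty features, and that linguistic difficulty features improve the difficulty representation of deep learning models to varying degrees.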

Construction of a Chinese Moral Dictionary for Artificial Intelligence Ethical Computing

no code implementations CCL 2020 Hongrui Wang, Chang Liu, Dong Yu

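Building moral dictionary resources is a key research focus in artificial intelligence ethical computing. Because moral behavior is complex and diverse, the classification schemes of existing English moral dictionaries are incomplete. No comparable resource yet exists for Chinese, and its theoretical framework and construction methods remain to be explored. To address these problems, this paper proposes the task of constructing a Chinese moral dictionary for AI ethical computing. It designs four categories of labels and four types, and the result is a Chinese moral dictionary of 25,012 words. Experimental results show that the dictionary lets machines learn moral knowledge and judge the moral label and type of a word. It also provides data support for sentence-level moral text analysis.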

Instance-adaptive training with noise-robust losses against noisy labels

no code implementations EMNLP 2021 Lifeng Jin, Linfeng Song, Kun Xu, Dong Yu

In order to alleviate the huge demand for annotated datasets for different tasks, many recent natural language processing datasets have adopted automated pipelines for fast-tracking usable data.

RAST: Domain-Robust Dialogue Rewriting as Sequence Tagging

no code implementations EMNLP 2021 Jie Hao, Linfeng Song, LiWei Wang, Kun Xu, Zhaopeng Tu, Dong Yu

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context.

Dialogue Rewriting Text Generation

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

no code implementations 20 May 2022 Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu

Acoustic echo cancellation (AEC) plays an important role in full-duplex speech communication, and in front-end speech enhancement for recognition, whenever the loudspeaker is playing back audio.

Acoustic echo cancellation Speech Enhancement +1

Towards Improved Zero-shot Voice Conversion with Conditional DSVAE

no code implementations 11 May 2022 Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

In our experiment on the VCTK dataset, we demonstrate that content embeddings derived from the conditional DSVAE overcome the randomness and achieve a much better phoneme classification accuracy, a stabilized vocalization and a better zero-shot VC performance compared with the competitive DSVAE baseline.

Voice Conversion

Distant finetuning with discourse relations for stance classification

no code implementations 27 Apr 2022 Lifeng Jin, Kun Xu, Linfeng Song, Dong Yu

Approaches for the stance classification task, an important task for understanding argumentation in debates and detecting fake news, have been relying on models which deal with individual debate topics.

Classification Stance Classification

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

1 code implementation 21 Apr 2022 Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao

Also, FastDiff samples 58x faster than real time on a V100 GPU, which makes diffusion models practical for deployment in speech synthesis for the first time.

Denoising Speech Synthesis +1
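
Speed claims like "58x faster than real time" are commonly expressed as a real-time factor (RTF): synthesis time divided by the duration of the generated audio. Assuming the figure follows that convention (the snippet does not define it), the claim works out as follows:

```latex
\mathrm{RTF} = \frac{T_{\text{synthesis}}}{T_{\text{audio}}} = \frac{1}{58} \approx 0.0172,
\qquad
T_{\text{audio}} = 10\,\text{s} \;\Rightarrow\; T_{\text{synthesis}} \approx \frac{10}{58}\,\text{s} \approx 0.17\,\text{s}.
```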

Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion

no code implementations 30 Mar 2022 Jiachen Lian, Chunlei Zhang, Dong Yu

Zero-shot voice conversion is performed by feeding an arbitrary speaker embedding and content embeddings to the VAE decoder.

Data Augmentation Disentanglement +2

Integrate Lattice-Free MMI into End-to-End Speech Recognition

1 code implementation 29 Mar 2022 Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu

However, the effectiveness and efficiency of the MBR-based methods are compromised: the MBR criterion is only used in system training, which creates a mismatch between training and decoding; the on-the-fly decoding process in MBR-based methods results in the need for pre-trained models and slow training speeds.

Automatic Speech Recognition

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

1 code implementation ICLR 2022 Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, and that can be trained with a novel bilateral modeling objective.

Image Generation Speech Synthesis

Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension

1 code implementation ACL 2022 Chao Zhao, Wenlin Yao, Dian Yu, Kaiqiang Song, Dong Yu, Jianshu Chen

Comprehending a dialogue requires a model to capture diverse kinds of key information in the utterances, which are either scattered across or only implied in different turns of the conversation.

Full RGB Just Noticeable Difference (JND) Modelling

no code implementations 1 Mar 2022 Jian Jin, Dong Yu, Weisi Lin, Lili Meng, Hao Wang, Huaxiang Zhang

Besides, the experimental results of the proposed model show that the JND of the red and blue channels is larger than that of the green channel. This indicates that more changes can be tolerated in the red and blue channels, which is consistent with the well-known fact that the human visual system is more sensitive to the green channel than to the red and blue ones.

Image Quality Assessment

VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion

no code implementations 18 Feb 2022 Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng

Although significant progress has been made in speaker-dependent Video-to-Speech (VTS) synthesis, little attention has been devoted to multi-speaker VTS. Such a system would map silent video to speech while allowing flexible control of speaker identity, all within a single system.

Quantization Speech Synthesis +2

FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows

no code implementations 14 Feb 2022 Jianqiao Zhao, Yanyang Li, Wanyu Du, Yangfeng Ji, Dong Yu, Michael R. Lyu, LiWei Wang

Hence, we propose segment act, an extension of dialog act from utterance level to segment level, and crowdsource a large-scale dataset for it.

Dialogue Evaluation

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

1 code implementation 28 Jan 2022 Songxiang Liu, Dan Su, Dong Yu

Denoising diffusion probabilistic models (DDPMs) are expressive generative models that have been used to solve a variety of speech synthesis problems.

Denoising Speech Synthesis

Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model

1 code implementation 6 Jan 2022 Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu

Then, the LM score of the hypothesis is obtained by intersecting the generated lattice with an external word N-gram LM.

Automatic Speech Recognition

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

1 code implementation 5 Dec 2021 Jinchuan Tian, Jianwei Yu, Chao Weng, Shi-Xiong Zhang, Dan Su, Dong Yu, Yuexian Zou

Recently, End-to-End (E2E) frameworks have achieved remarkable results on various Automatic Speech Recognition (ASR) tasks.

Automatic Speech Recognition

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

no code implementations 29 Nov 2021 Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type.

Frame Speech Recognition

SpeechMoE2: Mixture-of-Experts Model with Improved Routing

no code implementations 23 Nov 2021 Zhao You, Shulin Feng, Dan Su, Dong Yu

Mixture-of-experts based acoustic models with dynamic routing mechanisms have shown promising results for speech recognition.

Speech Recognition

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature

no code implementations 22 Nov 2021 Yiwen Shao, Shi-Xiong Zhang, Dong Yu

Experimental results show that 1) the proposed ALL-In-One model achieved an error rate comparable to the pipelined system while halving the inference time; 2) the proposed 3D spatial feature significantly outperformed (31% CERR) all previous works that use 1D directional information, in both paradigms.

Automatic Speech Recognition Speech Separation

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

no code implementations 14 Nov 2021 Songxiang Liu, Dan Su, Dong Yu

The task of few-shot style transfer for voice cloning in text-to-speech (TTS) synthesis aims to transfer the speaking styles of an arbitrary source speaker to a target speaker's voice using a very limited amount of neutral data.

Disentanglement Meta-Learning +1

Joint AEC AND Beamforming with Double-Talk Detection using RNN-Transformer

no code implementations 9 Nov 2021 Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

Acoustic echo cancellation (AEC) is a technique used in full-duplex communication systems to eliminate acoustic feedback of far-end speech.

Acoustic echo cancellation Denoising +2

Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories

2 code implementations EMNLP 2021 Wenlin Yao, Xiaoman Pan, Lifeng Jin, Jianshu Chen, Dian Yu, Dong Yu

We then train a model to identify semantic equivalence between a target word in context and one of its glosses using these aligned inventories, which exhibits strong transfer capability to many WSD tasks.

Word Sense Disambiguation

FAST-RIR: Fast neural diffuse room impulse response generator

2 code implementations 7 Oct 2021 Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu

We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Automatic Speech Recognition

SynCLR: A Synthesis Framework for Contrastive Learning of out-of-domain Speech Representations

no code implementations 29 Sep 2021 Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Zhou Zhao, Yi Ren

Learning speech representations that generalize to unseen samples from different domains has become an increasingly important challenge.

Contrastive Learning Data Augmentation +4

Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis

no code implementations 8 Sep 2021 Songxiang Liu, Shan Yang, Dan Su, Dong Yu

The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice.

Expressive Speech Synthesis Style Transfer

Bilateral Denoising Diffusion Models

no code implementations 26 Aug 2021 Max W. Y. Lam, Jun Wang, Rongjie Huang, Dan Su, Dong Yu

In this paper, we propose novel bilateral denoising diffusion models (BDDMs), which take significantly fewer steps to generate high-quality samples.


Importance-based Neuron Allocation for Multilingual Neural Machine Translation

1 code implementation ACL 2021 Wanying Xie, Yang Feng, Shuhao Gu, Dong Yu

Multilingual neural machine translation with a single model has drawn much attention due to its capability to deal with multiple languages.

Machine Translation Translation

Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition

no code implementations 8 Jun 2021 Max W. Y. Lam, Jun Wang, Chao Weng, Dan Su, Dong Yu

End-to-end speech recognition generally uses hand-engineered acoustic features as input and excludes the feature extraction module from its joint optimization.

Speech Recognition

Latency-Controlled Neural Architecture Search for Streaming Speech Recognition

no code implementations 8 May 2021 Liqiang He, Shulin Feng, Dan Su, Dong Yu

Extensive experiments show that: 1) based on the proposed neural architecture, networks with a medium latency of 550 ms and a low latency of 190 ms can be learned in the vanilla and revised operation spaces, respectively.

Automatic Speech Recognition Neural Architecture Search

SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts

1 code implementation 7 May 2021 Zhao You, Shulin Feng, Dan Su, Dong Yu

Recently, Mixture of Experts (MoE) based Transformers have shown promising results in many domains.

Speech Recognition

MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation

no code implementations 17 Apr 2021 Xiyun Li, Yong Xu, Meng Yu, Shi-Xiong Zhang, Jiaming Xu, Bo Xu, Dong Yu

The spatial self-attention module is designed to attend to the cross-channel correlation in the covariance matrices.

Automatic Speech Recognition Speech Separation

Conversational Semantic Role Labeling

no code implementations 11 Apr 2021 Kun Xu, Han Wu, Linfeng Song, Haisong Zhang, Linqi Song, Dong Yu

Semantic role labeling (SRL) aims to extract the arguments for each predicate in an input sentence.

Coreference Resolution Dialogue Understanding +2

Video-aided Unsupervised Grammar Induction

1 code implementation NAACL 2021 Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, Jiebo Luo

We investigate video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video.

Optical Character Recognition

MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment

no code implementations 2 Apr 2021 Meng Yu, Chunlei Zhang, Yong Xu, ShiXiong Zhang, Dong Yu

Objective speech quality assessment is usually conducted by comparing the received speech signal with its clean reference, whereas humans can evaluate speech quality without any reference, as in mean opinion score (MOS) tests.

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation

no code implementations 31 Mar 2021 Helin Wang, Bo Wu, LianWu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu

In this paper, we explore effective ways to leverage contextual information to improve speech dereverberation performance in real-world reverberant environments.

Speech Dereverberation

Towards Robust Speaker Verification with Target Speaker Enhancement

no code implementations 16 Mar 2021 Chunlei Zhang, Meng Yu, Chao Weng, Dong Yu

This paper proposes the target speaker enhancement based speaker verification network (TASE-SVNet), an all neural model that couples target speaker enhancement and speaker embedding extraction for robust speaker verification (SV).

Speaker Verification Speech Enhancement

NaturalConv: A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation

no code implementations 3 Mar 2021 Xiaoyang Wang, Chen Li, Jianqiao Zhao, Dong Yu

To facilitate the research on this corpus, we provide results of several benchmark models.

Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect

no code implementations 2 Mar 2021 Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu

We study the cocktail party problem and propose a novel attention network called Tune-In, short for training under negative environments with interference.

Speaker Verification Speech Separation

Contrastive Separative Coding for Self-supervised Representation Learning

no code implementations 1 Mar 2021 Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu

To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC).

Representation Learning Self-Supervised Learning +1

Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation

2 code implementations 1 Mar 2021 Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers.

Speech Separation
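
The dual-path segmentation mentioned above, in which each segment keeps the same size throughout all layers, can be illustrated with a minimal sketch. It assumes 50% overlap between consecutive segments and zero padding at both ends, which is a common choice in dual-path models and not necessarily the exact recipe of any paper listed here. It also shows only the fixed-size baseline: the multi-granularity segmentation that Sandglasset proposes is not implemented. `segment` and `overlap_add` are hypothetical helper names.

```python
import math


def segment(frames, seg_size):
    """Split frames into fixed-size segments with 50% overlap (dual-path style).

    Zero-pads `hop` frames on the left and enough frames on the right that
    every real frame falls inside exactly two segments. seg_size must be even.
    Returns (segments, left_pad, right_pad).
    """
    if seg_size % 2:
        raise ValueError("seg_size must be even")
    hop = seg_size // 2
    n_segments = math.ceil(len(frames) / hop) + 1
    padded_len = (n_segments + 1) * hop
    right_pad = padded_len - hop - len(frames)
    padded = [0.0] * hop + list(frames) + [0.0] * right_pad
    segments = [padded[s * hop: s * hop + seg_size] for s in range(n_segments)]
    return segments, hop, right_pad


def overlap_add(segments, seg_size, n_frames):
    """Invert segment(): sum the overlapping halves, then strip the padding."""
    hop = seg_size // 2
    out = [0.0] * ((len(segments) + 1) * hop)
    for s, seg in enumerate(segments):
        for i, v in enumerate(seg):
            out[s * hop + i] += v
    # Each real frame was added twice (once per overlapping segment).
    return [v / 2 for v in out[hop: hop + n_frames]]


frames = [1.0, 2.0, 3.0, 4.0, 5.0]
segs, left, right = segment(frames, seg_size=4)
print(len(segs), segs[0])  # 4 [0.0, 0.0, 1.0, 2.0]
```

In a real separator, the segments would be stacked into a 3-D tensor so that intra-segment (local) and inter-segment (global) modules can alternate. Here they are plain lists so the bookkeeping stays visible.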

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

no code implementations 16 Feb 2021 Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

In addition to using the prediction error as a metric for evaluating our localization model, we also establish its potency as a frontend with automatic speech recognition (ASR) as the downstream task.

Automatic Speech Recognition Multi-Label Classification

Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data

no code implementations Findings (EMNLP) 2021 Dian Yu, Kai Sun, Dong Yu, Claire Cardie

In spite of much recent research in the area, it is still unclear whether subject-area question-answering data is useful for machine reading comprehension (MRC) tasks.

Machine Reading Comprehension Multiple-choice +1

Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks

2 code implementations 13 Jan 2021 Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

Recent research on the time-domain audio separation networks (TasNets) has brought great success to speech separation.

Speech Separation

TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced Semantic Analysis

no code implementations 31 Dec 2020 Haisong Zhang, Lemao Liu, Haiyun Jiang, Yangming Li, Enbo Zhao, Kun Xu, Linfeng Song, Suncong Zheng, Botong Zhou, Jianchen Zhu, Xiao Feng, Tao Chen, Tao Yang, Dong Yu, Feng Zhang, Zhanhui Kang, Shuming Shi

This technical report introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities.

Named Entity Recognition NER

Robust Dialogue Utterance Rewriting as Sequence Tagging

1 code implementation 29 Dec 2020 Jie Hao, Linfeng Song, LiWei Wang, Kun Xu, Zhaopeng Tu, Dong Yu

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context.

Dialogue Rewriting Text Generation

Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

no code implementations 24 Dec 2020 Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Donald S. Williamson, Dong Yu

Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems.

Automatic Speech Recognition Frame +1

Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

1 code implementation 13 Dec 2020 Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu, Dong Yu

First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo speaker embedding system utilizes a queue to maintain a large set of negative examples.

Contrastive Learning Representation Learning +1
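
The queue of negative examples described above can be illustrated with a minimal pure-Python sketch of a MoCo-style setup. It contains a fixed-size FIFO of past key embeddings, an InfoNCE-style loss that scores one positive key against everything in the queue, and the exponential-moving-average update that MoCo uses for the key encoder. The temperature of 0.07, the toy 2-D vectors, and the helper names are illustrative assumptions. None of them are taken from the paper, and the prototypical part of the method is not shown.

```python
import math
from collections import deque


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


class NegativeQueue:
    """FIFO of past (normalized) key embeddings reused as negatives."""

    def __init__(self, max_size):
        # deque(maxlen=...) silently drops the oldest key when full.
        self.keys = deque(maxlen=max_size)

    def enqueue(self, key):
        self.keys.append(normalize(key))


def info_nce_loss(query, positive_key, queue, temperature=0.07):
    """-log( exp(q.k+/t) / (exp(q.k+/t) + sum_neg exp(q.k-/t)) )."""
    q = normalize(query)
    logits = [dot(q, normalize(positive_key)) / temperature]
    logits += [dot(q, k) / temperature for k in queue.keys]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


def momentum_update(key_params, query_params, m=0.999):
    """Key-encoder weights follow the query encoder as an EMA."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


queue = NegativeQueue(max_size=3)
for neg in ([0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]):
    queue.enqueue(neg)  # after 4 pushes only the 3 newest remain
loss = info_nce_loss([1.0, 0.1], [1.0, 0.0], queue)
print(len(queue.keys))  # 3
```

The point of the queue is that the number of negatives is decoupled from the batch size. Because keys come from a slowly updated momentum encoder, older entries stay roughly consistent with newer ones.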

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation 3 Dec 2020 Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

To make timbre conversion more stable and controllable, the speaker embedding is further decomposed into a weighted sum of a group of trainable vectors representing different timbre clusters.

Audio Generation Disentanglement +1
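
The idea of a speaker embedding formed as a weighted sum of trainable timbre-cluster vectors can be illustrated with a minimal sketch. It assumes the weights come from a softmax over per-cluster scores. That is a common way to obtain convex weights, but the snippet does not say how the paper computes them. The cluster vectors and scores below are hypothetical toy values, not learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose_speaker_embedding(cluster_vectors, scores):
    """Return (sum_k w_k * v_k, w) with w = softmax(scores).

    cluster_vectors: K timbre-cluster vectors, each of length D
                     (trainable parameters in a real model).
    scores: K unnormalized weights (hypothetical; a real system would
            predict or learn them per speaker).
    """
    weights = softmax(scores)
    dim = len(cluster_vectors[0])
    embedding = [0.0] * dim
    for w, vec in zip(weights, cluster_vectors):
        for i in range(dim):
            embedding[i] += w * vec[i]
    return embedding, weights


# Toy example: three 2-D timbre clusters; equal scores give a plain average.
clusters = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
emb, w = compose_speaker_embedding(clusters, [0.0, 0.0, 0.0])
print(emb)  # roughly [0.667, 0.667]
```

Because the weights sum to one, every composed embedding lies inside the convex hull of the cluster vectors. This is one plausible reason the decomposition makes conversion more controllable.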

BLCU-NLP at SemEval-2020 Task 5: Data Augmentation for Efficient Counterfactual Detecting

no code implementations SEMEVAL 2020 Chang Liu, Dong Yu

We demonstrate the effectiveness of our approaches, which achieve an F1 of 0.95 on subtask 1 while using only a subset of the given training set to fine-tune the BERT model. Our official submission achieves an F1 of 0.802, ranking 16th in the competition.

Common Sense Reasoning Data Augmentation

Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation

no code implementations 26 Nov 2020 Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

Target-speaker speech recognition aims to recognize target-speaker speech from noisy environments with background noise and interfering speakers.

Speech Enhancement Speech Extraction +1 Sound Audio and Speech Processing

Automatic Summarization of Open-Domain Podcast Episodes

no code implementations 9 Nov 2020 Kaiqiang Song, Chen Li, Xiaoyang Wang, Dong Yu, Fei Liu

Instead, we investigate several less-studied aspects of neural abstractive summarization, including (i) the importance of selecting important segments from transcripts to serve as input to the summarizer; (ii) striking a balance between the amount and quality of training instances; (iii) the appropriate summary length and start/end points.

Abstractive Text Summarization

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization

no code implementations 30 Oct 2020 Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu

The advantages of D-ASR over existing methods are threefold: (1) it provides explicit speaker locations, (2) it improves the explainability factor, and (3) it achieves better ASR performance as the process is more streamlined.

Automatic Speech Recognition

Replay and Synthetic Speech Detection with Res2net Architecture

2 code implementations 28 Oct 2020 Xu Li, Na Li, Chao Weng, Xunying Liu, Dan Su, Dong Yu, Helen Meng

This multiple scaling mechanism significantly improves the countermeasure's generalizability to unseen spoofing attacks.

Feature Engineering Synthetic Speech Detection

Multi-Channel Speaker Verification for Single and Multi-talker Speech

no code implementations 23 Oct 2020 Saurabh Kataria, Shi-Xiong Zhang, Dong Yu

We find that the improvements from speaker-dependent directional features are more consistent in multi-talker conditions than in clean conditions.

Action Detection Activity Detection +2

High-Fidelity 3D Digital Human Head Creation from RGB-D Selfies

2 code implementations 12 Oct 2020 Linchao Bao, Xiangkai Lin, Yajing Chen, Haoxian Zhang, Sheng Wang, Xuefei Zhe, Di Kang, HaoZhi Huang, Xinwei Jiang, Jue Wang, Dong Yu, Zhengyou Zhang

We present a fully automatic system that can produce high-fidelity, photo-realistic 3D digital human heads with a consumer RGB-D selfie camera.


Token-level Adaptive Training for Neural Machine Translation

1 code implementation EMNLP 2020 Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie Zhou, Dong Yu

The vanilla NMT model usually adopts trivial equal-weighted objectives for target tokens of different frequencies. As a result, it tends to generate more high-frequency tokens and fewer low-frequency tokens than the gold token distribution contains.

Machine Translation Translation

Semantic Role Labeling Guided Multi-turn Dialogue ReWriter

no code implementations EMNLP 2020 Kun Xu, Haochen Tan, Linfeng Song, Han Wu, Haisong Zhang, Linqi Song, Dong Yu

For multi-turn dialogue rewriting, the capacity of effectively modeling the linguistic knowledge in dialog context and getting rid of the noises is essential to improve its performance.

Dialogue Rewriting Semantic Role Labeling

Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition

no code implementations 25 Aug 2020 Liqiang He, Dan Su, Dong Yu

Extensive experiments show that: (i) the architecture searched on the small proxy dataset can be transferred to the large dataset for the speech recognition tasks.

Automatic Speech Recognition Neural Architecture Search

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

1 code implementation 21 Aug 2020 Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen

Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources.

Speech Enhancement Speech Separation

ADL-MVDR: All deep learning MVDR beamformer for target speech separation

1 code implementation 16 Aug 2020 Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Dong Yu

Speech separation algorithms are often used to separate the target speech from other interfering sources.

Frame Speech Separation

Peking Opera Synthesis via Duration Informed Attention Network

no code implementations 7 Aug 2020 Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu

In this work, we propose to deal with this issue and synthesize expressive Peking Opera singing from the music score based on the Duration Informed Attention Network (DurIAN) framework.

Comprehensive Image Captioning via Scene Graph Decomposition

1 code implementation ECCV 2020 Yiwu Zhong, Li-Wei Wang, Jianshu Chen, Dong Yu, Yin Li

We address the challenging problem of image captioning by revisiting the representation of image scene graph.

Image Captioning

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

no code implementations CVPR 2021 Liwei Wang, Jing Huang, Yin Li, Kun Xu, Zhengyuan Yang, Dong Yu

Our core innovation is the learning of a region-phrase score function, based on which an image-sentence score function is further constructed.

Contrastive Learning Knowledge Distillation +3

ZPR2: Joint Zero Pronoun Recovery and Resolution using Multi-Task Learning and BERT

no code implementations ACL 2020 Linfeng Song, Kun Xu, Yue Zhang, Jianshu Chen, Dong Yu

Zero pronoun recovery and resolution aim at recovering the dropped pronoun and pointing out its anaphoric mentions, respectively.

Multi-Task Learning

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

no code implementations 20 Jun 2020 Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng

Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input.

Talking Head Generation

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification

no code implementations 11 Jun 2020 Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng

Orthogonal to prior approaches, this work proposes to defend ASV systems against adversarial attacks with a separate detection network, rather than augmenting adversarial data into ASV training.

Data Augmentation Speaker Verification

Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension

1 code implementation ACL 2020 Hongyu Gong, Yelong Shen, Dian Yu, Jianshu Chen, Dong Yu

In this paper, we study machine reading comprehension (MRC) on long texts, where a model takes as inputs a lengthy document and a question and then extracts a text span from the document as an answer.

Chunking Machine Reading Comprehension

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

1 code implementation ACL 2020 Jie Lei, Li-Wei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal

Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph.

Neural Spatio-Temporal Beamformer for Target Speech Separation

1 code implementation 8 May 2020 Yong Xu, Meng Yu, Shi-Xiong Zhang, Lian-Wu Chen, Chao Weng, Jianming Liu, Dong Yu

Purely neural network (NN) based speech separation and enhancement methods can achieve good objective scores, but they inevitably cause nonlinear speech distortions that are harmful to automatic speech recognition (ASR).

Audio and Speech Processing Sound

Dialogue-Based Relation Extraction

2 code implementations ACL 2020 Dian Yu, Kai Sun, Claire Cardie, Dong Yu

We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE, aiming to support the prediction of relation(s) between two arguments that appear in a dialogue.

Ranked #4 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)

Dialog Relation Extraction

Multi-modal Multi-channel Target Speech Separation

no code implementations 16 Mar 2020 Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Lian-Wu Chen, Yuexian Zou, Dong Yu

Target speech separation refers to extracting a target speaker's voice from an overlapped audio of simultaneous talkers.

Speech Separation

Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

no code implementations 9 Mar 2020 Rongzhi Gu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods.

Speech Separation
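
The inter-channel phase difference (IPD) named above can be computed per time-frequency bin as the phase of one channel's STFT coefficient minus the phase of another's. Taking the phase of a·conj(b) gives that difference already wrapped to [-π, π]. The minimal sketch below starts from complex STFT values supplied directly and does not compute an STFT. The toy values are hypothetical, and the (cos, sin) encoding is a common way to feed IPD to a network, not necessarily the encoding used in the paper.

```python
import cmath
import math


def ipd(stft_ch1, stft_ch2):
    """Inter-channel phase difference per bin, wrapped to [-pi, pi].

    stft_ch1, stft_ch2: complex STFT coefficients for the same
    time-frequency bins, recorded by two microphones.
    """
    # phase(a * conj(b)) equals phase(a) - phase(b), wrapped by cmath.phase.
    return [cmath.phase(a * b.conjugate()) for a, b in zip(stft_ch1, stft_ch2)]


def ipd_features(stft_ch1, stft_ch2):
    """Encode each IPD as (cos, sin) to avoid the 2*pi wrap-around jump."""
    return [(math.cos(d), math.sin(d)) for d in ipd(stft_ch1, stft_ch2)]


# Toy bins: in the first bin channel 1 leads channel 2 by pi/2;
# in the second bin the two channels are in phase.
ch1 = [cmath.rect(1.0, math.pi / 2), 2 + 0j]
ch2 = [1 + 0j, 3 + 0j]
print(ipd(ch1, ch2))
```

Note that magnitudes cancel out of the IPD: only relative phase survives. This is why the IPD carries direction-of-arrival information independently of loudness.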

On the Role of Conceptualization in Commonsense Knowledge Graph Construction

1 code implementation 6 Mar 2020 Mutian He, Yangqiu Song, Kun Xu, Dong Yu

Commonsense knowledge graphs (CKGs) like Atomic and ASER differ substantially from conventional KGs: they consist of a much larger number of nodes formed by loosely structured text. This lets them handle highly diverse natural-language queries related to commonsense, but it also poses unique challenges for automatic KG construction methods.

graph construction Knowledge Graphs +2

Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment

no code implementations 23 Jan 2020 Kun Xu, Linfeng Song, Yansong Feng, Yan Song, Dong Yu

Existing entity alignment methods mainly differ in how they encode the knowledge graph. However, they typically share the same decoding method, which independently chooses the locally optimal match for each source entity.

Entity Alignment
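
The weakness of independent decoding, in which each source entity picks its locally optimal match on its own, is easiest to see on a toy similarity matrix. The minimal sketch below contrasts that decoding with an exhaustive one-to-one assignment that maximizes total similarity. The similarity values are hypothetical, the brute-force search over permutations is only feasible for tiny square matrices, and neither function is the coordinated reasoning method the paper proposes.

```python
from itertools import permutations


def independent_decode(similarity):
    """For each source entity (row), pick the best target (column) on its own."""
    return [max(range(len(row)), key=row.__getitem__) for row in similarity]


def best_one_to_one(similarity):
    """Exhaustive one-to-one assignment maximizing total similarity.

    Assumes a square matrix; only practical for tiny toy inputs.
    """
    n = len(similarity)
    best = max(permutations(range(n)),
               key=lambda perm: sum(similarity[i][perm[i]] for i in range(n)))
    return list(best)


# Hypothetical scores: both source entities prefer target 0.
sim = [[0.9, 0.8],
       [0.85, 0.1]]
print(independent_decode(sim))  # [0, 0]: two sources claim the same target
print(best_one_to_one(sim))     # [1, 0]: total 1.65 instead of a collision
```

At realistic sizes, a global assignment would typically rely on a polynomial-time method such as the Hungarian algorithm rather than enumerating permutations.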

Multiplex Word Embeddings for Selectional Preference Acquisition

1 code implementation IJCNLP 2019 Hongming Zhang, Jiaxin Bai, Yan Song, Kun Xu, Changlong Yu, Yangqiu Song, Wilfred Ng, Dong Yu

Therefore, in this paper, we propose a multiplex word embedding model, which can be easily extended according to various relations among words.

Word Embeddings Word Similarity

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

no code implementations 6 Jan 2020 Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

Experiments on overlapped speech simulated from the LRS2 dataset suggest that the proposed AVSR system outperformed the audio-only baseline LF-MMI DNN system by up to 29.98% absolute word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.

Ranked #7 on Lipreading on LRS2 (using extra training data)

Audio-Visual Speech Recognition Lipreading +2

Learning Singing From Speech

no code implementations 20 Dec 2019 Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu

The proposed algorithm first integrate speech and singing synthesis into a unified framework, and learns universal speaker embeddings that are shareable between speech and singing synthesis tasks.

Speech Synthesis Voice Conversion

A Unified Framework for Speech Separation

no code implementations 17 Dec 2019 Fahimeh Bahmaninezhad, Shi-Xiong Zhang, Yong Xu, Meng Yu, John H. L. Hansen, Dong Yu

The initial solutions introduced for deep learning based speech separation transformed the speech signals into the time-frequency domain with the STFT, and the encoded mixed signals were then fed into a deep neural network based separator.

Speech Separation

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network

no code implementations 4 Dec 2019 Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu

However, the converted singing voice can easily be out of key, showing that the existing approach cannot model pitch information precisely.

Music Generation Translation +1

Modeling Fluency and Faithfulness for Diverse Neural Machine Translation

1 code implementation 30 Nov 2019 Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang, Dong Yu

Neural machine translation models are usually trained with the teacher forcing strategy, which requires the predicted sequence to match the ground truth word by word and forces the probability of each prediction toward a 0-1 distribution.

Machine Translation Translation

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

no code implementations 28 Nov 2019 Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu

In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for end-to-end speech recognition.

Speech Recognition
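A minimal sketch of the quantity that MBR training minimizes, assuming an N-best approximation: model scores over a list of hypotheses are renormalized into posteriors, and the risk is the posterior-weighted word edit distance to the reference. The helper names and the toy N-best list are hypothetical, and how the paper generates and scores RNN-T hypotheses is not reproduced here.

```python
import math


def word_edit_distance(hyp, ref):
    """Levenshtein distance between two whitespace-tokenized word sequences."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i in range(1, len(h) + 1):
        cur = [i] + [0] * len(r)
        for j in range(1, len(r) + 1):
            cur[j] = min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (h[i - 1] != r[j - 1]),  # substitution or match
            )
        prev = cur
    return prev[-1]


def expected_risk(nbest, ref):
    """nbest: list of (hypothesis, log_prob); posteriors are renormalized over the list."""
    top = max(lp for _, lp in nbest)
    weights = [math.exp(lp - top) for _, lp in nbest]  # shift by max for stability
    z = sum(weights)
    return sum(w / z * word_edit_distance(h, ref) for (h, _), w in zip(nbest, weights))


nbest = [("the cat sat", math.log(0.5)), ("the cat sad", math.log(0.5))]
print(expected_risk(nbest, "the cat sat"))  # 0.5
```

Minimizing this expectation, rather than the likelihood of the reference alone, is what ties the training objective to the word error metric.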

Improving Pre-Trained Multilingual Model with Vocabulary Expansion

no code implementations CONLL 2019 Hai Wang, Dian Yu, Kai Sun, Jianshu Chen, Dong Yu

However, in a multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language.

Language Modelling Machine Reading Comprehension +3

DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition

no code implementations 28 Oct 2019 Zhao You, Dan Su, Jie Chen, Chao Weng, Dong Yu

Self-attention networks (SAN) have been introduced into automatic speech recognition (ASR) and achieved state-of-the-art performance owing to their superior ability to capture long-term dependencies.

Automatic Speech Recognition

Mixup-breakdown: a consistency training method for improving generalization of speech separation models

no code implementations 28 Oct 2019 Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

Deep-learning based speech separation models suffer from poor generalization: even state-of-the-art models can fail abruptly when evaluated under mismatched conditions.

Speech Separation

Multi-Document Summarization with Determinantal Point Processes and Contextualized Representations

no code implementations WS 2019 Sangwoo Cho, Chen Li, Dong Yu, Hassan Foroosh, Fei Liu

Having emerged as one of the best-performing techniques for extractive summarization, determinantal point processes select the most probable set of sentences to form a summary, according to a probability measure defined by modeling sentence prominence and pairwise repulsion.

Document Summarization Extractive Summarization +2
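The prominence-plus-repulsion idea can be sketched with a standard L-ensemble DPP and greedy MAP selection, assuming a kernel L_ij = q_i * S_ij * q_j, where q is a per-sentence quality (prominence) score and S a pairwise similarity. The quality and similarity values below are toy numbers, the function names are hypothetical, and the paper's contextualized representations are not reproduced.

```python
def det(m):
    """Determinant of a small dense matrix via Gaussian elimination with partial pivoting."""
    n = len(m)
    a = [row[:] for row in m]
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def build_kernel(quality, similarity):
    """L_ij = q_i * S_ij * q_j: quality rewards prominence, similarity encodes repulsion."""
    n = len(quality)
    return [[quality[i] * similarity[i][j] * quality[j] for j in range(n)] for i in range(n)]


def greedy_dpp(L, k):
    """Greedily add the sentence that maximizes det(L_Y) of the selected subset Y."""
    selected = []
    for _ in range(k):
        best, best_score = None, -1.0
        for i in range(len(L)):
            if i in selected:
                continue
            idx = selected + [i]
            score = det([[L[a][b] for b in idx] for a in idx])
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


# Toy input: sentences 0 and 1 are near-duplicates; sentence 2 is less prominent but distinct.
quality = [1.0, 0.95, 0.6]
similarity = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
L = build_kernel(quality, similarity)
print(greedy_dpp(L, 2))  # [0, 2]
```

Sentence 1 scores higher than sentence 2 on quality alone, but its near-duplication of sentence 0 shrinks the determinant, so the more diverse sentence is chosen.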

Generating Diverse Story Continuations with Controllable Semantics

no code implementations WS 2019 Lifu Tu, Xiaoan Ding, Dong Yu, Kevin Gimpel

We propose a simple and effective modeling framework for controlled generation of multiple, diverse outputs.

Improving Pre-Trained Multilingual Models with Vocabulary Expansion

no code implementations 26 Sep 2019 Hai Wang, Dian Yu, Kai Sun, Jianshu Chen, Dong Yu

However, in a multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language.

Language Modelling Machine Reading Comprehension +3

A Random Gossip BMUF Process for Neural Language Modeling

no code implementations 19 Sep 2019 Yiheng Huang, Jinchuan Tian, Lei Han, Guangsen Wang, Xingcheng Song, Dan Su, Dong Yu

One important challenge in training an NNLM is balancing the scaling of the learning process against the handling of big data.

Speech Recognition

Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network

no code implementations 16 Sep 2019 Ke Tan, Yong Xu, Shi-Xiong Zhang, Meng Yu, Dong Yu

Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments.

Audio and Speech Processing Sound Signal Processing

DurIAN: Duration Informed Attention Network For Multimodal Synthesis

3 code implementations 4 Sep 2019 Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu

In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously.

Speech Synthesis

Maximizing Mutual Information for Tacotron

2 code implementations 30 Aug 2019 Peng Liu, Xixin Wu, Shiyin Kang, Guangzhi Li, Dan Su, Dong Yu

End-to-end speech synthesis methods already achieve close-to-human quality performance.

Frame Speech Synthesis

Improving Reverberant Speech Training Using Diffuse Acoustic Simulation

no code implementations 9 Jul 2019 Zhenyu Tang, Lian-Wu Chen, Bo Wu, Dong Yu, Dinesh Manocha

We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks.

Keyword Spotting Speech Recognition

Teach an all-rounder with experts in different domains

no code implementations 9 Jul 2019 Zhao You, Dan Su, Dong Yu

First, for each domain, a teacher model (domain-dependent model) is trained by fine-tuning a multi-condition model with the domain-specific subset.

Automatic Speech Recognition

Knowledge-aware Pronoun Coreference Resolution

1 code implementation ACL 2019 Hongming Zhang, Yan Song, Yangqiu Song, Dong Yu

Resolving pronoun coreference requires knowledge support, especially for particular domains (e.g., medicine).

Coreference Resolution Knowledge Graphs

Learning Word Embeddings with Domain Awareness

1 code implementation 7 Jun 2019 Guoyin Wang, Yan Song, Yue Zhang, Dong Yu

Word embeddings are traditionally trained on a large corpus in an unsupervised setting, with no specific design for incorporating domain knowledge.

Learning Word Embeddings

BLCU_NLP at SemEval-2019 Task 7: An Inference Chain-based GPT Model for Rumour Evaluation

no code implementations SEMEVAL 2019 Ruoyao Yang, Wanying Xie, Chunhua Liu, Dong Yu

Researchers have been paying increasing attention to rumour evaluation, including SemEval-2019 Task 7, due to the rapid spread of unsubstantiated rumours on social media platforms.

Rumour Detection

BLCU_NLP at SemEval-2019 Task 8: A Contextual Knowledge-enhanced GPT Model for Fact Checking

no code implementations SEMEVAL 2019 Wanying Xie, Mengxi Que, Ruoyao Yang, Chunhua Liu, Dong Yu

For contextual knowledge enhancement, we extend the training set of subtask A, use several features to improve the results of our system and adapt the input formats to be more suitable for this task.

Community Question Answering Fact Checking

Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network

2 code implementations ACL 2019 Kun Xu, Li-Wei Wang, Mo Yu, Yansong Feng, Yan Song, Zhiguo Wang, Dong Yu

Previous cross-lingual knowledge graph (KG) alignment studies rely on entity embeddings derived only from monolingual KG structural information, which may fail at matching entities that have different facts in two KGs.

Entity Embeddings Graph Attention +1

A comprehensive study of speech separation: spectrogram vs waveform separation

no code implementations 17 May 2019 Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu

We study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequency- and time-domain separators that utilize spectral, spatial, and speaker location information.

Speech Recognition Speech Separation

Learning discriminative features in sequence training without requiring framewise labelled data

no code implementations 16 May 2019 Jun Wang, Dan Su, Jie Chen, Shulin Feng, Dongpeng Ma, Na Li, Dong Yu

We propose a novel method which simultaneously models both the sequence discriminative training and the feature discriminative learning within a single network architecture, so that it can learn discriminative deep features in sequence training that obviates the need for presegmented training data.

End-to-End Multi-Channel Speech Separation

no code implementations 15 May 2019 Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

This paper extends the previous approach and proposes a new end-to-end model for multi-channel speech separation.

Speech Separation

Encrypted Speech Recognition using Deep Polynomial Networks

no code implementations 11 May 2019 Shi-Xiong Zhang, Yifan Gong, Dong Yu

One good property of the DPN is that it can be trained on unencrypted speech features in the traditional way.

Frame Speech Recognition +1

Time Domain Audio Visual Speech Separation

no code implementations 7 Apr 2019 Jian Wu, Yong Xu, Shi-Xiong Zhang, Lian-Wu Chen, Meng Yu, Lei Xie, Dong Yu

Audio-visual multi-modal modeling has been demonstrated to be effective in many speech related tasks, such as speech recognition and speech enhancement.

Audio and Speech Processing Sound

Improving Question Answering with External Knowledge

1 code implementation WS 2019 Xiaoman Pan, Kai Sun, Dian Yu, Jianshu Chen, Heng Ji, Claire Cardie, Dong Yu

We focus on multiple-choice question answering (QA) tasks in subject areas such as science, where we require both broad background knowledge and the facts from the given subject-area reference corpus.

Multiple-choice Question Answering

DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension

1 code implementation 1 Feb 2019 Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, Claire Cardie

DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge.

Dialogue Understanding Multiple-choice +1

From Plots to Endings: A Reinforced Pointer Generator for Story Ending Generation

no code implementations 11 Jan 2019 Yan Zhao, Lu Liu, Chunhua Liu, Ruoyao Yang, Dong Yu

We introduce a new task named Story Ending Generation (SEG), which aims at generating a coherent story ending from a sequence of story plots.

Multi-Perspective Fusion Network for Commonsense Reading Comprehension

no code implementations 8 Jan 2019 Chunhua Liu, Yan Zhao, Qingyi Si, Haiou Zhang, Bohan Li, Dong Yu

From the experimental results, we can conclude that the difference fusion is comparable with union fusion, and the similarity fusion needs to be activated by the union fusion.

Reading Comprehension

DEMN: Distilled-Exposition Enhanced Matching Network for Story Comprehension

no code implementations PACLIC 2018 Chunhua Liu, Haiou Zhang, Shan Jiang, Dong Yu

We divide a complete story into three narrative segments: an exposition, a climax, and an ending.

Cloze Test

Multi-turn Inference Matching Network for Natural Language Inference

1 code implementation 8 Jan 2019 Chunhua Liu, Shan Jiang, Hainan Yu, Dong Yu

The inference of each turn is performed on the current matching feature and the memory.

Natural Language Inference

Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching

no code implementations ICLR 2019 Chih-Kuan Yeh, Jianshu Chen, Chengzhu Yu, Dong Yu

We consider the problem of training speech recognition systems without using any labeled data, under the assumption that the learner can only access the input utterances and a phoneme language model estimated from a non-overlapping corpus.

Speech Recognition Unsupervised Speech Recognition

A Comparison of Lattice-free Discriminative Training Criteria for Purely Sequence-Trained Neural Network Acoustic Models

no code implementations 8 Nov 2018 Chao Weng, Dong Yu

In this work, three lattice-free (LF) discriminative training criteria for purely sequence-trained neural network acoustic models are compared on LVCSR tasks, namely maximum mutual information (MMI), boosted maximum mutual information (bMMI) and state-level minimum Bayes risk (sMBR).

Robust Neural Abstractive Summarization Systems and Evaluation against Adversarial Information

no code implementations 14 Oct 2018 Lisa Fan, Dong Yu, Lu Wang

Sequence-to-sequence (seq2seq) neural models have been actively investigated for abstractive summarization.

Abstractive Text Summarization

XL-NBT: A Cross-lingual Neural Belief Tracking Framework

1 code implementation EMNLP 2018 Wenhu Chen, Jianshu Chen, Yu Su, Xin Wang, Dong Yu, Xifeng Yan, William Yang Wang

Then, we pre-train a state tracker for the source language as a teacher, which is able to exploit easy-to-access parallel data.

Transfer Learning

Recent Progresses in Deep Learning based Acoustic Models (Updated)

no code implementations 25 Apr 2018 Dong Yu, Jinyu Li

In this paper, we summarize recent progress in deep learning based acoustic models and the motivation and insights behind the surveyed techniques.

General Classification Speech Enhancement +1

Joint Separation and Denoising of Noisy Multi-talker Speech using Recurrent Neural Networks and Permutation Invariant Training

no code implementations 31 Aug 2017 Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen

We show that deep bi-directional LSTM RNNs trained using uPIT in noisy environments can improve the Signal-to-Distortion Ratio (SDR) as well as the Extended Short-Time Objective Intelligibility (ESTOI) measure, on the speaker independent multi-talker speech separation and denoising task, for various noise types and Signal-to-Noise Ratios (SNRs).


Semantic Frame Labeling with Target-based Neural Model

no code implementations SEMEVAL 2017 Yukun Feng, Dong Yu, Jian Xu, Chunhua Liu

This paper explores the automatic learning of distributed representations of the target's context for semantic frame labeling with a target-based neural model.

Feature Engineering Frame +2

Single-Channel Multi-talker Speech Recognition with Permutation Invariant Training

no code implementations 19 Jul 2017 Yanmin Qian, Xuankai Chang, Dong Yu

Although great progress has been made in automatic speech recognition (ASR), significant performance degradation is still observed when recognizing multi-talker mixed speech.

Automatic Speech Recognition Speech Separation

Recognizing Multi-talker Speech with Permutation Invariant Training

no code implementations 22 Mar 2017 Dong Yu, Xuankai Chang, Yanmin Qian

Our technique is based on permutation invariant training (PIT) for automatic speech recognition (ASR).

Automatic Speech Recognition

Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks

3 code implementations 18 Mar 2017 Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen

We evaluated uPIT on the WSJ0 and Danish two- and three-talker mixed-speech separation tasks and found that uPIT outperforms techniques based on Non-negative Matrix Factorization (NMF) and Computational Auditory Scene Analysis (CASA), and compares favorably with Deep Clustering (DPCL) and the Deep Attractor Network (DANet).

Deep Clustering Frame +1
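The core of utterance-level PIT can be illustrated with a small Python sketch: the loss is evaluated under every assignment of model outputs to reference speakers, and the cheapest assignment is kept for the whole utterance. Mean squared error on toy sequences stands in for the spectral objectives used in practice; the function names and data are hypothetical.

```python
from itertools import permutations


def mse(estimate, reference):
    """Mean squared error between two equal-length sequences."""
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)


def upit_loss(estimates, references):
    """Return (loss, permutation) for the best output-to-speaker assignment.

    The permutation is chosen once for the whole utterance, so every frame
    of output k is compared with reference speaker perm[k].
    """
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(len(references))):
        loss = sum(mse(estimates[k], references[p]) for k, p in enumerate(perm)) / len(perm)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy two-speaker example: output 0 resembles speaker 1 and vice versa.
references = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
estimates = [[0.0, 0.0, 0.1, 0.0], [1.0, 0.9, 1.0, 1.0]]
loss, perm = upit_loss(estimates, references)
print(perm)  # (1, 0)
```

Because the minimum is taken over assignments, the model is not penalized for emitting speakers in an arbitrary order; the cost is a search over S! permutations for S speakers, which stays small for two- and three-talker mixtures.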

Deep Embedding Forest: Forest-based Serving with Deep Embedding Features

no code implementations 15 Mar 2017 Jie Zhu, Ying Shan, JC Mao, Dong Yu, Holakou Rahmanian, Yi Zhang

Built on top of a representative DNN model called Deep Crossing, and two forest/tree-based models including XGBoost and LightGBM, a two-step Deep Embedding Forest algorithm is demonstrated to achieve on-par or slightly better performance as compared with the DNN counterpart, with only a fraction of serving time on conventional hardware.

Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

1 code implementation 1 Jul 2016 Dong Yu, Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen

We propose a novel deep learning model, which supports permutation invariant training (PIT), for speaker independent multi-talker speech separation, commonly known as the cocktail-party problem.

Deep Clustering Speech Separation

Prediction-Adaptation-Correction Recurrent Neural Networks for Low-Resource Language Speech Recognition

no code implementations 30 Oct 2015 Yu Zhang, Ekapol Chuangsuwanich, James Glass, Dong Yu

In this paper, we investigate the use of prediction-adaptation-correction recurrent neural networks (PAC-RNNs) for low-resource speech recognition.

Speech Recognition Transfer Learning

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

no code implementations 30 Oct 2015 Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, James Glass

In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers.

Distant Speech Recognition Frame
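As an illustration of a gated direct connection between memory cells, the Python sketch below updates one layer's cells elementwise. It assumes the common highway LSTM form, in which a depth gate d scales the lower layer's cell state and adds it to the usual forget and input terms: c = f * c_prev + i * g + d * c_lower. The gate values are assumed to be precomputed, and all names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def highway_cell_update(c_prev, c_lower, input_gate, forget_gate, candidate, depth_preact):
    """Elementwise cell update for one layer at one time step.

    c_prev: this layer's cell state at the previous time step
    c_lower: the lower layer's cell state at the current time step
    depth_preact: pre-activation of the depth gate (assumed precomputed)
    """
    depth_gate = [sigmoid(z) for z in depth_preact]
    return [
        f * cp + i * g + d * cl
        for f, cp, i, g, d, cl in zip(
            forget_gate, c_prev, input_gate, candidate, depth_gate, c_lower
        )
    ]


# sigmoid(0) = 0.5, so half of the lower layer's cell state is carried upward.
print(highway_cell_update([1.0], [2.0], [0.5], [0.5], [0.0], [0.0]))  # [1.5]
```

When the depth gate saturates near zero, the update reduces to an ordinary LSTM cell; when it opens, information from the lower layer flows directly into this layer's memory, which is the intent of the gated connection.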

Deep Neural Networks for Acoustic Modeling in Speech Recognition

no code implementations Signal Processing Magazine 2012 Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input.

Frame Speech Recognition
