Search Results for author: Xu Tan

Found 131 papers, 61 papers with code

Machine Translation With Weakly Paired Bilingual Documents

no code implementations ICLR 2019 Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Xu Tan, Tao Qin, Tie-Yan Liu

Neural machine translation, which achieves near human-level performance in some languages, strongly relies on the availability of large amounts of parallel sentences, which hinders its applicability to low-resource language pairs.

Sentence Translation +1

Non-Autoregressive Sequence Generation

no code implementations ACL 2022 Jiatao Gu, Xu Tan

Non-autoregressive sequence generation (NAR) attempts to generate the entire or partial output sequences in parallel to speed up the generation process and avoid potential issues (e.g., label bias, exposure bias) in autoregressive generation.

ProphetChat: Enhancing Dialogue Generation with Simulation of Future Conversation

no code implementations ACL 2022 Chang Liu, Xu Tan, Chongyang Tao, Zhenxin Fu, Dongyan Zhao, Tie-Yan Liu, Rui Yan

To enable the chatbot to foresee the dialogue future, we design a beam-search-like roll-out strategy for dialogue future simulation using a typical dialogue generation model and a dialogue selector.

Dialogue Generation Response Generation

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

no code implementations 5 Mar 2024 Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt.

Quantization Speech Synthesis

Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training

no code implementations 1 Mar 2024 Qingyan Guo, Rui Wang, Junliang Guo, Xu Tan, Jiang Bian, Yujiu Yang

Accordingly, permutation on the training data is considered as a potential solution, since this can make the model predict antecedent words or tokens.

Language Modelling

Beyond Language Models: Byte Models are Digital World Simulators

no code implementations 29 Feb 2024 Shangda Wu, Xu Tan, Zili Wang, Rui Wang, Xiaobing Li, Maosong Sun

Traditional deep learning often overlooks bytes, the basic units of the digital world, where all forms of information and operations are encoded and manipulated in binary format.

xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein

no code implementations 11 Jan 2024 Bo Chen, Xingyi Cheng, Pan Li, Yangli-ao Geng, Jing Gong, Shen Li, Zhilei Bei, Xu Tan, Boyan Wang, Xin Zeng, Chiming Liu, Aohan Zeng, Yuxiao Dong, Jie Tang, Le Song

We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework.

Protein Language Model

EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction

1 code implementation 11 Jan 2024 Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Ren Kan, Dongsheng Li, Deqing Yang

EasyTool purifies essential information from extensive tool documentation of different sources, and elaborates a unified interface (i.e., tool instruction) to offer standardized tool descriptions and functionalities for LLM-based agents.

CoMoSVC: Consistency Model-based Singing Voice Conversion

no code implementations 3 Jan 2024 Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo

The diffusion-based Singing Voice Conversion (SVC) methods have achieved remarkable performances, producing natural audios with high similarity to the target timbre.

Voice Conversion

Unraveling Key Factors of Knowledge Distillation

no code implementations 14 Dec 2023 Jingxuan Wei, Linzhuang Sun, Xu Tan, Bihui Yu, Ruifeng Guo

Knowledge distillation, a technique for model compression and performance enhancement, has gained significant traction in Neural Machine Translation (NMT).

Knowledge Distillation Machine Translation +3

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis

no code implementations 6 Dec 2023 Zehua Chen, Guande He, Kaiwen Zheng, Xu Tan, Jun Zhu

Specifically, we leverage the latent representation obtained from text input as our prior, and build a fully tractable Schrodinger bridge between it and the ground-truth mel-spectrogram, leading to a data-to-data process.

Speech Synthesis Text-To-Speech Synthesis

GAIA: Zero-shot Talking Avatar Generation

no code implementations 26 Nov 2023 Tianyu He, Junliang Guo, Runyi Yu, Yuchi Wang, Jialiang Zhu, Kaikai An, Leyi Li, Xu Tan, Chunyu Wang, Han Hu, HsiangTao Wu, Sheng Zhao, Jiang Bian

Zero-shot talking avatar generation aims at synthesizing natural talking videos from speech and a single portrait image.

MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

1 code implementation 18 Oct 2023 Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian

For developers and amateurs, it is very difficult to grasp all of these tasks to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data and the model applicability across platforms among various tasks.

Music Classification

MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation

1 code implementation 19 Sep 2023 Xinda Wu, Zhijie Huang, Kejun Zhang, Jiaxing Yu, Xu Tan, Tieyao Zhang, ZiHao Wang, Lingyun Sun

In particular, subjective evaluations show that, on the melody continuation task, MelodyGLM gains average improvements of 0.82, 0.87, 0.78, and 0.94 in consistency, rhythmicity, structure, and overall quality, respectively.

Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

1 code implementation 15 Sep 2023 Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang

Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort.

Evolutionary Algorithms

PromptTTS 2: Describing and Generating Voices with Text Prompt

no code implementations 5 Sep 2023 Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompts for speech.

Language Modelling Large Language Model

VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer

no code implementations 9 Aug 2023 Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao

With our essential designs on facial style learning, our model is able to flexibly capture the expressive facial style from arbitrary video prompts and transfer it onto a personalized image renderer in a zero-shot manner.

Style Transfer Talking Face Generation

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

no code implementations 3 Jul 2023 Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee

Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency.

Sentence

EmoGen: Eliminating Subjective Bias in Emotional Music Generation

1 code implementation 3 Jul 2023 Chenfei Kang, Peiling Lu, Botao Yu, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian

In this paper, we propose EmoGen, an emotional music generation system that leverages a set of emotion-related music attributes as the bridge between emotion and music, and divides the generation into two stages: emotion-to-attribute mapping with supervised clustering, and attribute-to-music generation with self-supervised learning.

Attribute Clustering +2

Extract and Attend: Improving Entity Translation in Neural Machine Translation

no code implementations 4 Jun 2023 Zixin Zeng, Rui Wang, Yichong Leng, Junliang Guo, Xu Tan, Tao Qin, Tie-Yan Liu

Inspired by this translation process, we propose an Extract-and-Attend approach to enhance entity translation in NMT, where the translation candidates of source entities are first extracted from a dictionary and then attended to by the NMT model to generate the target sentence.

Machine Translation NMT +2

Deliberate then Generate: Enhanced Prompting Framework for Text Generation

no code implementations 31 May 2023 Bei Li, Rui Wang, Junliang Guo, Kaitao Song, Xu Tan, Hany Hassan, Arul Menezes, Tong Xiao, Jiang Bian, Jingbo Zhu

Large language models (LLMs) have shown remarkable success across a wide range of natural language generation tasks, where proper prompt designs make great impacts.

Text Generation

MuseCoco: Generating Symbolic Music from Text

1 code implementation 31 May 2023 Peiling Lu, Xin Xu, Chenfei Kang, Botao Yu, Chengyi Xing, Xu Tan, Jiang Bian

In contrast, symbolic music offers ease of editing, making it more accessible for users to manipulate specific musical elements.

Attribute Audio Generation +1

TranSFormer: Slow-Fast Transformer for Machine Translation

no code implementations 26 May 2023 Bei Li, Yi Jing, Xu Tan, Zhen Xing, Tong Xiao, Jingbo Zhu

Learning multiscale Transformer models has been evidenced as a viable approach to augmenting machine translation systems.

Machine Translation Translation

NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation

no code implementations 22 May 2023 Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo

Since expert knowledge is hard to acquire, it hinders the flexibility to quickly design and tune digital synthesizers for diverse sounds.

Neural Architecture Search

DiffusionNER: Boundary Diffusion for Named Entity Recognition

2 code implementations 22 May 2023 Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang

In this paper, we propose DiffusionNER, which formulates the named entity recognition task as a boundary-denoising diffusion process and thus generates named entities from noisy spans.

Chinese Named Entity Recognition Denoising +4

GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework

1 code implementation 18 May 2023 Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan

Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.

Denoising Music Generation

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

1 code implementation 11 May 2023 Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo

In this paper, we propose a "Co"nsistency "Mo"del-based "Speech" synthesis method, CoMoSpeech, which achieves speech synthesis through a single diffusion sampling step while maintaining high audio quality.

Denoising Singing Voice Synthesis +1

ResiDual: Transformer with Dual Residual Connections

1 code implementation 28 Apr 2023 Shufang Xie, Huishuai Zhang, Junliang Guo, Xu Tan, Jiang Bian, Hany Hassan Awadalla, Arul Menezes, Tao Qin, Rui Yan

In this paper, we propose ResiDual, a novel Transformer architecture with Pre-Post-LN (PPLN), which fuses the connections in Post-LN and Pre-LN together and inherits their advantages while avoiding their limitations.

Machine Translation

CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval

2 code implementations 21 Apr 2023 Shangda Wu, Dingyao Yu, Xu Tan, Maosong Sun

We introduce CLaMP: Contrastive Language-Music Pre-training, which learns cross-modal representations between natural language and symbolic music using a music encoder and a text encoder trained jointly with a contrastive loss.

Data Augmentation Information Retrieval +4

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

1 code implementation 18 Apr 2023 Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian

To enhance the zero-shot capability that is important to achieve diverse speech synthesis, we design a speech prompting mechanism to facilitate in-context learning in the diffusion model and the duration/pitch predictor.

In-Context Learning Speech Synthesis

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

1 code implementation NeurIPS 2023 Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang

Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence.

Philosophy

FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model

no code implementations 6 Mar 2023 Ruiqing Xue, Yanqing Liu, Lei He, Xu Tan, Linquan Liu, Edward Lin, Sheng Zhao

Neural text-to-speech (TTS) generally consists of cascaded architecture with separately optimized acoustic model and vocoder, or end-to-end architecture with continuous mel-spectrograms or self-extracted speech frames as the intermediate representations to bridge acoustic model and vocoder, which suffers from two limitations: 1) the continuous acoustic frames are hard to predict with phoneme only, and acoustic information like duration or pitch is also needed to solve the one-to-many problem, which is not easy to scale on large scale and noise datasets; 2) to achieve diverse speech output based on continuous speech features, complex VAE or flow-based models are usually required.

Language Modelling Large Language Model +1

A Study on ReLU and Softmax in Transformer

no code implementations 13 Feb 2023 Kai Shen, Junliang Guo, Xu Tan, Siliang Tang, Rui Wang, Jiang Bian

This paper sheds light on the following points: 1) Softmax and ReLU use different normalization methods over elements which lead to different variances of results, and ReLU is good at dealing with a large number of key-value slots; 2) FFN and key-value memory are equivalent, and thus the Transformer can be viewed as a memory network where FFNs and self-attention networks are both key-value memories.

Document Translation

N-Gram Nearest Neighbor Machine Translation

no code implementations 30 Jan 2023 Rui Lv, Junliang Guo, Rui Wang, Xu Tan, Qi Liu, Tao Qin

Nearest neighbor machine translation augments the Autoregressive Translation (AT) with k-nearest-neighbor retrieval, by comparing the similarity between the token-level context representations of the target tokens in the query and the datastore.

Domain Adaptation Machine Translation +2

ERA-Solver: Error-Robust Adams Solver for Fast Sampling of Diffusion Probabilistic Models

no code implementations 30 Jan 2023 Shengmeng Li, Luping Liu, Zenghao Chai, Runnan Li, Xu Tan

Different from the traditional predictor based on explicit Adams methods, we leverage a Lagrange interpolation function as the predictor, which is further enhanced with an error-robust strategy to adaptively select the Lagrange bases with lower error in the estimated noise.

Denoising Image Generation

Regeneration Learning: A Learning Paradigm for Data Generation

no code implementations 21 Jan 2023 Xu Tan, Tao Qin, Jiang Bian, Tie-Yan Liu, Yoshua Bengio

Regeneration learning extends the concept of representation learning to data generation tasks, and can be regarded as a counterpart of traditional representation learning, since 1) regeneration learning handles the abstraction (Y') of the target data Y for data generation while traditional representation learning handles the abstraction (X') of source data X for data understanding; 2) both the processes of Y'-->Y in regeneration learning and X-->X' in representation learning can be learned in a self-supervised way (e.g., pre-training); 3) both the mappings from X to Y' in regeneration learning and from X' to Y in representation learning are simpler than the direct mapping from X to Y.

Image Generation Representation Learning +6

WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning

1 code implementation 11 Jan 2023 Kejun Zhang, Xinda Wu, Tieyao Zhang, Zhijie Huang, Xu Tan, Qihao Liang, Songruoyao Wu, Lingyun Sun

Although deep learning has revolutionized music generation, existing methods for structured melody generation follow an end-to-end left-to-right note-by-note generative paradigm and treat each note equally.

Music Generation

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

1 code implementation 30 Dec 2022 Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic

Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples.

Denoising

Difformer: Empowering Diffusion Models on the Embedding Space for Text Generation

1 code implementation 19 Dec 2022 Zhujin Gao, Junliang Guo, Xu Tan, Yongxin Zhu, Fang Zhang, Jiang Bian, Linli Xu

Diffusion models have achieved state-of-the-art synthesis quality on both visual and audio tasks, and recent works further adapt them to textual data by diffusing on the embedding space.

Denoising Machine Translation +2

Memories are One-to-Many Mapping Alleviators in Talking Face Generation

no code implementations 9 Dec 2022 Anni Tang, Tianyu He, Xu Tan, Jun Ling, Li Song

More specifically, the implicit memory is employed in the audio-to-expression model to capture high-level semantics in the audio-expression shared space, while the explicit memory is employed in the neural-rendering model to help synthesize pixel-level details.

Neural Rendering Talking Face Generation

SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition

1 code implementation 2 Dec 2022 Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu

In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

1 code implementation 30 Nov 2022 Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian

In this paper, we propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech.

Machine Translation Sentence +4

Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction

1 code implementation 23 Nov 2022 Kai Shen, Yichong Leng, Xu Tan, Siliang Tang, Yuan Zhang, Wenjie Liu, Edward Lin

Since the error rate of the incorrect sentence is usually low (e.g., 10%), the correction model can only learn to correct on limited error tokens but trivially copy on most tokens (correct tokens), which harms the effective training of error correction.

Sentence speech-recognition +1

PromptTTS: Controllable Text-to-Speech with Text Descriptions

no code implementations 22 Nov 2022 Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xu Tan

Thus, we develop a text-to-speech (TTS) system (dubbed as PromptTTS) that takes a prompt with both style and content descriptions as input to synthesize the corresponding speech.

Speech Synthesis

Towards Understanding Omission in Dialogue Summarization

1 code implementation 14 Nov 2022 Yicheng Zou, Kaitao Song, Xu Tan, Zhongkai Fu, Qi Zhang, Dongsheng Li, Tao Gui

By analyzing this dataset, we find that a large improvement in summarization quality can be achieved by providing ground-truth omission labels for the summarization model to recover omission information, which demonstrates the importance of omission detection for omission mitigation in dialogue summarization.

Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation

1 code implementation 19 Oct 2022 Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu

A recent trend is to use Transformer or its variants in music generation, which is, however, suboptimal, because the full attention cannot efficiently model the typically long music sequences (e.g., over 10,000 tokens), and the existing models have shortcomings in generating musical repetition structures.

Music Generation

MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks

1 code implementation 30 Aug 2022 Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu

Specifically, 1) we design an expert system to generate a melody by developing musical elements from motifs to phrases then to sections with repetitions and variations according to pre-given musical form; 2) considering that the generated melody lacks musical richness, we design a Transformer-based refinement model to improve the melody without changing its musical form.

Music Generation

StableFace: Analyzing and Improving Motion Stability for Talking Face Generation

no code implementations 29 Aug 2022 Jun Ling, Xu Tan, Liyang Chen, Runnan Li, Yuchao Zhang, Sheng Zhao, Li Song

In this paper, we conduct systematic analyses on the motion jittering problem based on a state-of-the-art pipeline that uses 3D face representations to bridge the input audio and output video, and improve the motion stability with a series of effective designs.

Talking Face Generation Video Generation

Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation

1 code implementation 11 Aug 2022 Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan

These characteristics cannot be well handled by neural generation models that learn lyric-to-melody mapping in an end-to-end way, due to several issues: (1) lack of aligned lyric-melody training data to sufficiently learn lyric-melody feature alignment; (2) lack of controllability in generation to better and explicitly align the lyric-melody features.

Language Modelling Retrieval

A Study of Syntactic Multi-Modality in Non-Autoregressive Machine Translation

no code implementations NAACL 2022 Kexun Zhang, Rui Wang, Xu Tan, Junliang Guo, Yi Ren, Tao Qin, Tie-Yan Liu

Furthermore, we take the best of both and design a new loss function to better handle the complicated syntactic multi-modality in real-world datasets.

Machine Translation Translation

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

1 code implementation 30 May 2022 Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu

Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., the diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples.

Audio Synthesis

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling

1 code implementation 25 May 2022 Kaitao Song, Yichong Leng, Xu Tan, Yicheng Zou, Tao Qin, Dongsheng Li

Previous works on sentence scoring mainly adopted either causal language modeling (CLM) like GPT or masked language modeling (MLM) like BERT, which have some limitations: 1) CLM only utilizes unidirectional information for the probability estimation of a sentence without considering bidirectional context, which affects the scoring quality; 2) MLM can only estimate the probability of partial tokens at a time and thus requires multiple forward passes to estimate the probability of the whole sentence, which incurs large computation and time cost.

Causal Language Modeling Language Modelling +2

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

3 code implementations 9 May 2022 Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu

In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.

 Ranked #1 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Sentence Speech Synthesis +1

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

no code implementations 1 Apr 2022 Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu

We model the speaker characteristics systematically to improve the generalization on new speakers.

Speech Synthesis

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

no code implementations 31 Mar 2022 Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao

However, the works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with the TTS fine-tuning that takes phonemes as input.

Revisiting Over-Smoothness in Text to Speech

no code implementations ACL 2022 Yi Ren, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu

Then we conduct a comprehensive study on NAR-TTS models that use some advanced modeling methods.

InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training

no code implementations 8 Feb 2022 Zehua Chen, Xu Tan, Ke Wang, Shifeng Pan, Danilo Mandic, Lei He, Sheng Zhao

In this paper, we propose InferGrad, a diffusion model for vocoder that incorporates inference process into training, to reduce the inference iterations while maintaining high generation quality.

Denoising

Speech-T: Transducer for Text to Speech and Beyond

no code implementations NeurIPS 2021 Jiawei Chen, Xu Tan, Yichong Leng, Jin Xu, Guihua Wen, Tao Qin, Tie-Yan Liu

Experiments on LJSpeech datasets demonstrate that Speech-T 1) is more robust than the attention based autoregressive TTS model due to its inherent monotonic alignments between text and speech; 2) naturally supports streaming TTS with good voice quality; and 3) enjoys the benefit of joint modeling TTS and ASR in a single network.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

1 code implementation 25 Oct 2021 Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao

The goal of this challenge is to synthesize natural and high-quality speech from text, and we approach this goal in two perspectives: The first is to directly model and generate waveform in 48 kHz sampling rate, which brings higher perception quality than previous systems with 16 kHz or 24 kHz sampling rate; The second is to model the variation information in speech through a systematic design, which improves the prosody and naturalness.

Speech Synthesis

A study on the efficacy of model pre-training in developing neural text-to-speech system

no code implementations 8 Oct 2021 Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee

However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data.

Computational Efficiency

FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition

1 code implementation Findings (EMNLP) 2021 Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu, Linquan Liu, Tao Qin, Xiang-Yang Li, Edward Lin, Tie-Yan Liu

Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method

1 code implementation 20 Sep 2021 Zeqian Ju, Peiling Lu, Xu Tan, Rui Wang, Chen Zhang, Songruoyao Wu, Kejun Zhang, Xiangyang Li, Tao Qin, Tie-Yan Liu

In this paper, we develop TeleMelody, a two-stage lyric-to-melody generation system with music template (e.g., tonality, chord progression, rhythm pattern, and cadence) to bridge the gap between lyrics and melodies (i.e., the system consists of a lyric-to-template module and a template-to-melody module).

Analyzing and Mitigating Interference in Neural Architecture Search

no code implementations 29 Aug 2021 Jin Xu, Xu Tan, Kaitao Song, Renqian Luo, Yichong Leng, Tao Qin, Tie-Yan Liu, Jian Li

In this paper, we investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators, and observe: 1) the interference on a shared operator between two child models is positively correlated with the number of different operators; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar.

Neural Architecture Search Reading Comprehension

A Survey on Low-Resource Neural Machine Translation

no code implementations 9 Jul 2021 Rui Wang, Xu Tan, Renqian Luo, Tao Qin, Tie-Yan Liu

Neural approaches have achieved state-of-the-art accuracy on machine translation but suffer from the high cost of collecting large scale parallel data.

Low-Resource Neural Machine Translation NMT +1

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

no code implementations 6 Jul 2021 Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu

While recent text to speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly because of two reasons: 1) the lack of training data for spontaneous speech; 2) the difficulty in modeling the filled pauses (um and uh) and diverse rhythms in spontaneous speech.

DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling

1 code implementation ACL 2021 Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu

In this paper, we develop DeepRapper, a Transformer-based rap generation system that can model both rhymes and rhythms.

Language Modelling

A Survey on Neural Speech Synthesis

1 code implementation 29 Jun 2021 Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in speech, language, and machine learning communities and has broad applications in the industry.

Speech Synthesis

MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training

2 code implementations Findings (ACL) 2021 Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu

Inspired by the success of pre-training models in natural language processing, in this paper, we develop MusicBERT, a large-scale pre-trained model for music understanding.

Classification Emotion Classification +2

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

1 code implementation NeurIPS 2021 Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu

A straightforward solution to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of significantly increased ASR error rate.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

A Rational Inattention Theory of Echo Chamber

no code implementations 21 Apr 2021 Lin Hu, Anqi Li, Xu Tan

We develop a rational inattention theory of echo chamber, whereby players gather information about an uncertain state by allocating limited attention capacities across biased primary sources and other players.

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

1 code implementation 20 Apr 2021 Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu

In adaptation, we use untranscribed speech data for speech reconstruction and only fine-tune the TTS decoder.

Adaptive Logit Adjustment Loss for Long-Tailed Visual Recognition

no code implementations 13 Apr 2021 Yan Zhao, Weicong Chen, Xu Tan, Kai Huang, Jihong Zhu

The adaptive adjusting term is composed of two complementary factors: 1) quantity factor, which pays more attention to tail classes, and 2) difficulty factor, which adaptively pays more attention to hard instances in the training process.

General Classification Semantic Similarity +1

AdaSpeech: Adaptive Text to Speech for Custom Voice

2 code implementations ICLR 2021 Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu

2) To better trade off the adaptation parameters and voice quality, we introduce conditional layer normalization in the mel-spectrogram decoder of AdaSpeech, and fine-tune this part in addition to speaker embedding for adaptation.

MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition

no code implementations 25 Feb 2021 Linghui Meng, Jin Xu, Xu Tan, Jindong Wang, Tao Qin, Bo Xu

In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

3 code implementations ICLR 2021 Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, Shi Gu

To further employ the power of quantization, the mixed precision technique is incorporated in our framework by approximating the inter-layer and intra-layer sensitivity.

Image Classification object-detection +2

Task-Agnostic and Adaptive-Size BERT Compression

no code implementations 1 Jan 2021 Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Li Jian, Tao Qin, Tie-Yan Liu

NAS-BERT trains a big supernet on a carefully designed search space containing various architectures and outputs multiple compressed models with adaptive sizes and latency.

Language Modelling Model Compression +1

Denoising Text to Speech with Frame-Level Noise Modeling

no code implementations 17 Dec 2020 Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu

In DenoiSpeech, we handle real-world noisy speech by modeling the fine-grained frame-level noise with a noise condition module, which is jointly trained with the TTS model.

Denoising

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

1 code implementation 9 Dec 2020 Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin

Automatic song writing aims to compose a song (lyric and/or melody) by machine, which is an interesting topic in both academia and industry.

Sentence

Speech enhancement aided end-to-end multi-task learning for voice activity detection

no code implementations 23 Oct 2020 Xu Tan, Xiao-Lei Zhang

Recent studies show that speech enhancement is helpful to VAD, but the performance improvement is limited.

Action Detection Activity Detection +3

HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis

1 code implementation 3 Sep 2020 Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu

To tackle the difficulty of singing modeling caused by high sampling rate (wider frequency band and longer waveform), we introduce multi-scale adversarial training in both the acoustic model and vocoder to improve singing modeling.

Singing Voice Synthesis Vocal Bursts Intensity Prediction

PopMAG: Pop Music Accompaniment Generation

1 code implementation 18 Aug 2020 Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu

To improve harmony, in this paper, we propose a novel MUlti-track MIDI representation (MuMIDI), which enables simultaneous multi-track generation in a single sequence and explicitly models the dependency of the notes from different tracks.

Music Modeling

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition

no code implementations 9 Aug 2020 Jin Xu, Xu Tan, Yi Ren, Tao Qin, Jian Li, Sheng Zhao, Tie-Yan Liu

However, there are more than 6,000 languages in the world and most of them lack speech training data, which poses significant challenges when building TTS and ASR systems for extremely low-resource languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Neural Machine Translation with Error Correction

1 code implementation 21 Jul 2020 Kaitao Song, Xu Tan, Jianfeng Lu

Neural machine translation (NMT) generates the next target token conditioned on the previous ground-truth target tokens during training but on the previously generated target tokens during inference, which causes a discrepancy between training and inference as well as error propagation, and hurts translation accuracy.

Machine Translation NMT +1

DeepSinger: Singing Voice Synthesis with Data Mined From the Web

no code implementations 9 Jul 2020 Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, Tie-Yan Liu

DeepSinger has several advantages over previous SVS systems: 1) to the best of our knowledge, it is the first SVS system that directly mines training data from music websites, 2) the lyrics-to-singing alignment model further avoids any human efforts for alignment labeling and greatly reduces labeling cost, 3) the singing model based on a feed-forward Transformer is simple and efficient, by removing the complicated acoustic feature modeling in parametric synthesis and leveraging a reference encoder to capture the timbre of a singer from noisy singing data, and 4) it can synthesize singing voices in multiple languages and multiple singers.

Sentence Singing Voice Synthesis

Accuracy Prediction with Non-neural Model for Neural Architecture Search

1 code implementation 9 Jul 2020 Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu

Considering that most architectures are represented as sequences of discrete symbols, which resemble tabular data and are better suited to non-neural predictors, in this paper we study an alternative approach that uses a non-neural model for accuracy prediction.

Neural Architecture Search

SimulSpeech: End-to-End Simultaneous Speech to Text Translation

no code implementations ACL 2020 Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

In this work, we develop SimulSpeech, an end-to-end simultaneous speech to text translation system which translates speech in source language to text in target language concurrently.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

UWSpeech: Speech to Speech Translation for Unwritten Languages

no code implementations 14 Jun 2020 Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Ke-jun Zhang, Tie-Yan Liu

Existing speech to speech translation systems heavily rely on the text of target language: they usually translate source language either to target text and then synthesize target speech from text, or directly to target speech with target text for auxiliary training.

speech-recognition Speech Recognition +2

XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

no code implementations 11 Jun 2020 Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou

This paper presents XiaoiceSing, a high-quality singing voice synthesis system which employs an integrated network for spectrum, F0 and duration modeling.

Singing Voice Synthesis Vocal Bursts Intensity Prediction

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

32 code implementations ICLR 2021 Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with the ground-truth target instead of the simplified output from the teacher, and 2) introducing more variation information of speech (e.g., pitch, energy and more accurate duration) as conditional inputs.

Ranked #6 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Knowledge Distillation Speech Synthesis +1

MultiSpeech: Multi-Speaker Text to Speech with Transformer

1 code implementation 8 Jun 2020 Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin, Tie-Yan Liu

Transformer-based text to speech (TTS) models (e.g., Transformer TTS, FastSpeech) have shown advantages in training and inference efficiency over RNN-based models (e.g., Tacotron) due to their parallel computation in training and/or inference.

LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning

no code implementations 27 Apr 2020 Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu

While pre-training and fine-tuning approaches (e.g., BERT, GPT-2) have achieved great success in language understanding and generation tasks, the pre-trained models are usually too big for online deployment in terms of both memory cost and inference speed, which hinders their practical online usage.

Knowledge Distillation Language Modelling

A Study of Non-autoregressive Model for Sequence Generation

no code implementations ACL 2020 Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao, Tie-Yan Liu

In this work, we conduct a study to understand the difficulty of NAR sequence generation and try to answer: (1) Why can NAR models catch up with AR models in some tasks but not all?

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

MPNet: Masked and Permuted Pre-training for Language Understanding

6 code implementations NeurIPS 2020 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem.

Ranked #16 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)

Language Modelling Masked Language Modeling +3

A Study of Multilingual Neural Machine Translation

no code implementations 25 Dec 2019 Xu Tan, Yichong Leng, Jiale Chen, Yi Ren, Tao Qin, Tie-Yan Liu

Multilingual neural machine translation (NMT) has recently been investigated from different aspects (e.g., pivot translation, zero-shot translation, fine-tuning, or training from scratch) and in different settings (e.g., rich resource and low resource, one-to-many, and many-to-one translation).

Machine Translation NMT +1

Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

2 code implementations 20 Nov 2019 Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu

Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models.

Machine Translation Translation

ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit

3 code implementations 24 Oct 2019 Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan

Furthermore, the unified design enables the integration of ASR functions with TTS, e.g., ASR-based objective evaluation and semi-supervised learning with both ASR and TTS models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Multilingual Neural Machine Translation with Language Clustering

no code implementations IJCNLP 2019 Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, Tie-Yan Liu

We study two methods for language clustering: (1) using prior knowledge, where we cluster languages according to language family, and (2) using language embedding, in which we represent each language by an embedding vector and cluster them in the embedding space.

Clustering Machine Translation +2

Efficient Bidirectional Neural Machine Translation

no code implementations 25 Aug 2019 Xu Tan, Yingce Xia, Lijun Wu, Tao Qin

In this paper, we propose an efficient method to generate a sequence in both left-to-right and right-to-left manners using a single encoder and decoder, combining the advantages of both generation directions.

Machine Translation Translation

Language Graph Distillation for Low-Resource Machine Translation

no code implementations 17 Aug 2019 Tianyu He, Jiale Chen, Xu Tan, Tao Qin

Neural machine translation for low-resource languages is challenging due to the lack of bilingual sentence pairs.

Knowledge Distillation Machine Translation +3

Representation Degeneration Problem in Training Natural Language Generation Models

1 code implementation ICLR 2019 Jun Gao, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu

We study an interesting problem in training neural network-based models for natural language generation tasks, which we call the representation degeneration problem.

Language Modelling Machine Translation +3

Unsupervised Pivot Translation for Distant Languages

no code implementations ACL 2019 Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

In this work, we introduce unsupervised pivot translation for distant languages, which translates a language to a distant language through multiple hops, and the unsupervised translation on each hop is relatively easier than the original direct translation.

Machine Translation NMT +1

FastSpeech: Fast, Robust and Controllable Text-to-Speech

11 code implementations 22 May 2019 Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control).

Text-To-Speech Synthesis

FastSpeech: Fast, Robust and Controllable Text to Speech

21 code implementations NeurIPS 2019 Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS.

Ranked #10 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Speech Synthesis Text-To-Speech Synthesis

Almost Unsupervised Text to Speech and Automatic Speech Recognition

no code implementations 13 May 2019 Yi Ren, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech processing, and both achieve impressive performance thanks to recent advances in deep learning and large amounts of aligned speech and text data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

MASS: Masked Sequence to Sequence Pre-training for Language Generation

7 code implementations 7 May 2019 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Pre-training and fine-tuning approaches (e.g., BERT) have achieved great success in language understanding by transferring knowledge from a rich-resource pre-training task to low/zero-resource downstream tasks.

Conversational Response Generation Response Generation +5

Multilingual Neural Machine Translation with Knowledge Distillation

1 code implementation ICLR 2019 Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu

Multilingual machine translation, which translates multiple languages with a single model, has attracted much attention due to its efficiency of offline training and online serving.

Knowledge Distillation Machine Translation +1

Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

no code implementations 23 Dec 2018 Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu

Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models.

Machine Translation Sentence +2

Hybrid Self-Attention Network for Machine Translation

no code implementations 1 Nov 2018 Kaitao Song, Xu Tan, Furong Peng, Jianfeng Lu

The encoder-decoder is the typical framework for Neural Machine Translation (NMT), and different structures have been developed for improving the translation performance.

Machine Translation NMT +1

FRAGE: Frequency-Agnostic Word Representation

2 code implementations NeurIPS 2018 Chengyue Gong, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks.

Language Modelling Machine Translation +5

Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter

no code implementations EMNLP 2018 Lijun Wu, Xu Tan, Di He, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

Many previous works have discussed the relationship between error propagation and the accuracy drop problem (i.e., the left part of the translated sentence is often better than its right part in left-to-right decoding models).

Machine Translation Sentence +2

Model-Level Dual Learning

no code implementations ICML 2018 Yingce Xia, Xu Tan, Fei Tian, Tao Qin, Nenghai Yu, Tie-Yan Liu

Many artificial intelligence tasks appear in dual forms like English↔French translation and speech↔text transformation.

Machine Translation Sentiment Analysis +1

Double Path Networks for Sequence to Sequence Learning

1 code implementation COLING 2018 Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, Tie-Yan Liu

In this work we propose Double Path Networks for Sequence to Sequence learning (DPN-S2S), which leverage the advantages of both models by using double path information fusion.

Dense Information Flow for Neural Machine Translation

1 code implementation NAACL 2018 Yanyao Shen, Xu Tan, Di He, Tao Qin, Tie-Yan Liu

Recently, neural machine translation has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework.

Machine Translation NMT +1
