Search Results for author: Tie-Yan Liu

Found 247 papers, 86 papers with code

Machine Translation With Weakly Paired Bilingual Documents

no code implementations ICLR 2019 Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Xu Tan, Tao Qin, Tie-Yan Liu

Neural machine translation, which achieves near human-level performance in some languages, strongly relies on the availability of large amounts of parallel sentences, which hinders its applicability to low-resource language pairs.

Translation Unsupervised Machine Translation

Finding the Dominant Winning Ticket in Pre-Trained Language Models

no code implementations Findings (ACL) 2022 Zhuocheng Gong, Di He, Yelong Shen, Tie-Yan Liu, Weizhu Chen, Dongyan Zhao, Ji-Rong Wen, Rui Yan

Empirically, we show that (a) the dominant winning ticket can achieve performance that is comparable with that of the full-parameter model, (b) the dominant winning ticket is transferable across different tasks, and (c) the dominant winning ticket has a natural structure within each parameter matrix.

ProphetChat: Enhancing Dialogue Generation with Simulation of Future Conversation

no code implementations ACL 2022 Chang Liu, Xu Tan, Chongyang Tao, Zhenxin Fu, Dongyan Zhao, Tie-Yan Liu, Rui Yan

To enable the chatbot to foresee the dialogue future, we design a beam-search-like roll-out strategy for dialogue future simulation using a typical dialogue generation model and a dialogue selector.

Dialogue Generation Response Generation

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

no code implementations 9 May 2022 Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu

In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.

Speech Synthesis Text-To-Speech Synthesis

A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond

1 code implementation 20 Apr 2022 Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-Yan Liu

While NAR generation can significantly accelerate inference speed for machine translation, the speedup comes at the cost of sacrificed translation accuracy compared to its counterpart, auto-regressive (AR) generation.

Automatic Speech Recognition Dialogue Generation +5

Neural Operator with Regularity Structure for Modeling Dynamics Driven by SPDEs

1 code implementation 13 Apr 2022 Peiyan Hu, Qi Meng, Bingguang Chen, Shiqi Gong, Yue Wang, Wei Chen, Rongchan Zhu, Zhi-Ming Ma, Tie-Yan Liu

Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics.

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

no code implementations 1 Apr 2022 Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu

We model the speaker characteristics systematically to improve the generalization on new speakers.

Speech Synthesis

Multi-View Substructure Learning for Drug-Drug Interaction Prediction

no code implementations 28 Mar 2022 Zimeng Li, Shichao Zhu, Bin Shao, Tie-Yan Liu, Xiangxiang Zeng, Tong Wang

Drug-drug interaction (DDI) prediction provides a drug combination strategy for systemically effective treatment.

DEPTS: Deep Expansion Learning for Periodic Time Series Forecasting

1 code implementation ICLR 2022 Wei Fan, Shun Zheng, Xiaohan Yi, Wei Cao, Yanjie Fu, Jiang Bian, Tie-Yan Liu

However, the complicated dependencies of the PTS signal on its inherent periodicity as well as the sophisticated composition of various periods hinder the performance of PTS forecasting.

Time Series Time Series Forecasting

Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets

1 code implementation 9 Mar 2022 Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu

This technical note describes the recent updates of Graphormer, including architecture design modifications, and the adaption to 3D molecular dynamics simulation.

An Empirical Study of Graphormer on Large-Scale Molecular Modeling Datasets

no code implementations 28 Feb 2022 Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu

This technical note describes the recent updates of Graphormer, including architecture design modifications, and the adaption to 3D molecular dynamics simulation.

Revisiting Over-Smoothness in Text to Speech

no code implementations ACL 2022 Yi Ren, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu

Then we conduct a comprehensive study on NAR-TTS models that use some advanced modeling methods.

Learning Physics-Informed Neural Networks without Stacked Back-propagation

no code implementations 18 Feb 2022 Di He, Wenlei Shi, Shanda Li, Xiaotian Gao, Jia Zhang, Jiang Bian, LiWei Wang, Tie-Yan Liu

Physics-Informed Neural Network (PINN) has become a commonly used machine learning approach to solve partial differential equations (PDE).

Dynamic Relation Discovery and Utilization in Multi-Entity Time Series Forecasting

no code implementations 18 Feb 2022 Lin Huang, Lijun Wu, Jia Zhang, Jiang Bian, Tie-Yan Liu

How to discover the useful implicit relations between entities and effectively utilize those relations for each entity under various circumstances is crucial.

Graph Learning Time Series +1

AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation

no code implementations 18 Feb 2022 Lin Huang, Qiyuan Dong, Lijun Wu, Jia Zhang, Jiang Bian, Tie-Yan Liu

As a specific semantic segmentation task, aerial imagery segmentation has been widely employed in high spatial resolution (HSR) remote sensing image understanding.

Semantic Segmentation

Direct Molecular Conformation Generation

1 code implementation 3 Feb 2022 Jinhua Zhu, Yingce Xia, Chang Liu, Lijun Wu, Shufang Xie, Tong Wang, Yusong Wang, Wengang Zhou, Tao Qin, Houqiang Li, Tie-Yan Liu

In this work, we propose a method that directly predicts the coordinates of atoms.

SHGNN: Structure-Aware Heterogeneous Graph Neural Network

1 code implementation 12 Dec 2021 Wentao Xu, Yingce Xia, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu

Next, we use a tree-attention aggregator to incorporate the graph structure information into the aggregation module on the meta-path.

Graph Embedding Node Classification

KGE-CL: Contrastive Learning of Knowledge Graph Embeddings

1 code implementation 9 Dec 2021 Wentao Xu, Zhiping Luo, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu

To address this problem, we propose a simple yet efficient contrastive learning framework for knowledge graph embeddings, which can shorten the semantic distance of the related entities and entity-relation couples in different triples and thus improve the expressiveness of knowledge graph embeddings.

Knowledge Graph Embedding Knowledge Graph Embeddings +4

Stylized Dialogue Generation with Multi-Pass Dual Learning

1 code implementation NeurIPS 2021 Jinpeng Li, Yingce Xia, Rui Yan, Hongda Sun, Dongyan Zhao, Tie-Yan Liu

Considering there is no parallel data between the contexts and the responses of target style S1, existing works mainly use back translation to generate stylized synthetic data for training, where the data about context, target style S1 and an intermediate style S0 is used.

Dialogue Generation

Curriculum Offline Imitating Learning

no code implementations NeurIPS 2021 Minghuan Liu, Hanye Zhao, Zhengyu Yang, Jian Shen, Weinan Zhang, Li Zhao, Tie-Yan Liu

However, IL is usually limited in the capability of the behavioral policy and tends to learn a mediocre behavior from the dataset collected by the mixture of policies.

Continuous Control Imitation Learning +1

Speech-T: Transducer for Text to Speech and Beyond

no code implementations NeurIPS 2021 Jiawei Chen, Xu Tan, Yichong Leng, Jin Xu, Guihua Wen, Tao Qin, Tie-Yan Liu

Experiments on LJSpeech datasets demonstrate that Speech-T 1) is more robust than the attention based autoregressive TTS model due to its inherent monotonic alignments between text and speech; 2) naturally supports streaming TTS with good voice quality; and 3) enjoys the benefit of joint modeling TTS and ASR in a single network.

Automatic Speech Recognition

Co-evolution Transformer for Protein Contact Prediction

1 code implementation NeurIPS 2021 He Zhang, Fusong Ju, Jianwei Zhu, Liang He, Bin Shao, Nanning Zheng, Tie-Yan Liu

These methods generally derive coevolutionary features by aggregating the learned residue representations from individual sequences with equal weights, which is inconsistent with the premise that residue co-evolutions are a reflection of collective covariation patterns of numerous homologous proteins.

Recovering Latent Causal Factor for Generalization to Distributional Shifts

1 code implementation NeurIPS 2021 Xinwei Sun, Botong Wu, Xiangyu Zheng, Chang Liu, Wei Chen, Tao Qin, Tie-Yan Liu

To avoid such a spurious correlation, we propose Latent Causal Invariance Models (LaCIM) that specify the underlying causal structure of the data and the source of distributional shifts, guiding us to pursue only the causal factor for prediction.

Do Transformers Really Perform Badly for Graph Representation?

no code implementations NeurIPS 2021 Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.

Graph Representation Learning

Curriculum Offline Imitation Learning

1 code implementation 3 Nov 2021 Minghuan Liu, Hanye Zhao, Zhengyu Yang, Jian Shen, Weinan Zhang, Li Zhao, Tie-Yan Liu

However, IL is usually limited in the capability of the behavioral policy and tends to learn a mediocre behavior from the dataset collected by the mixture of policies.

Continuous Control Imitation Learning +1

Indiscriminate Poisoning Attacks Are Shortcuts

no code implementations 1 Nov 2021 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

We find that the perturbations of advanced poisoning attacks are almost linearly separable when assigned the target labels of the corresponding samples, and hence can work as shortcuts for the learning objective.

Data Poisoning

Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model

no code implementations 29 Oct 2021 Liang He, Shizhuo Zhang, Lijun Wu, Huanhuan Xia, Fusong Ju, He Zhang, Siyuan Liu, Yingce Xia, Jianwei Zhu, Pan Deng, Bin Shao, Tao Qin, Tie-Yan Liu

The key problem in the protein sequence representation learning is to capture the co-evolutionary information reflected by the inter-residue co-variation in the sequences.

Language Modelling Multiple Sequence Alignment +1

HIST: A Graph-based Framework for Stock Trend Forecasting via Mining Concept-Oriented Shared Information

1 code implementation 26 Oct 2021 Wentao Xu, Weiqing Liu, Lewen Wang, Yingce Xia, Jiang Bian, Jian Yin, Tie-Yan Liu

To overcome the shortcomings of previous work, we proposed a novel stock trend forecasting framework that can adequately mine the concept-oriented shared information from predefined concepts and hidden concepts.

Equivariant vector field network for many-body system modeling

no code implementations 26 Oct 2021 Weitao Du, He Zhang, Yuanqi Du, Qi Meng, Wei Chen, Bin Shao, Tie-Yan Liu

Some general equivariant models that are computationally efficient have been proposed; however, these models have no guarantee on the approximation power and may suffer information loss.

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

no code implementations NeurIPS 2021 Pushi Zhang, Xiaoyu Chen, Li Zhao, Wei Xiong, Tao Qin, Tie-Yan Liu

To fully inherit the benefits of distributional RL and hybrid reward architectures, we introduce Multi-Dimensional Distributional DQN (MD3QN), which extends distributional RL to model the joint return distribution from multiple reward sources.

Distributional Reinforcement Learning reinforcement-learning

Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD

no code implementations NeurIPS 2021 Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

We prove that with constraint to guarantee low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized.

Generalization Bounds

Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer

no code implementations 14 Oct 2021 Siyuan Liu, Yusong Wang, Tong Wang, Yifan Deng, Liang He, Bin Shao, Jian Yin, Nanning Zheng, Tie-Yan Liu

The identification of active binding drugs for target proteins (termed as drug-target interaction prediction) is the key challenge in virtual screening, which plays an essential role in drug discovery.

Drug Discovery Pose Prediction

FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition

no code implementations Findings (EMNLP) 2021 Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu, Linquan Liu, Tao Qin, Xiang-Yang Li, Edward Lin, Tie-Yan Liu

Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens.

Automatic Speech Recognition

Regularized-OFU: an efficient algorithm for general contextual bandit with optimization oracles

no code implementations 29 Sep 2021 Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

In contextual bandit, one major challenge is to develop theoretically solid and empirically efficient algorithms for general function classes.

Multi-Armed Bandits

Target-Side Data Augmentation for Sequence Generation

1 code implementation ICLR 2022 Shufang Xie, Ang Lv, Yingce Xia, Lijun Wu, Tao Qin, Rui Yan, Tie-Yan Liu

Autoregressive sequence generation, a prevalent task in machine learning and natural language processing, generates every target token conditioned on both a source input and previously generated target tokens.

Abstractive Text Summarization Data Augmentation +2

Particle Based Stochastic Policy Optimization

no code implementations 29 Sep 2021 Qiwei Ye, Yuxuan Song, Chang Liu, Fangyun Wei, Tao Qin, Tie-Yan Liu

Stochastic policies have been widely applied for their good properties in exploration and uncertainty quantification.

MuJoCo Games Offline RL

Multi-Agent Reinforcement Learning with Shared Resource in Inventory Management

no code implementations 29 Sep 2021 Mingxiao Feng, Guozi Liu, Li Zhao, Lei Song, Jiang Bian, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

In this paper, we consider the inventory management (IM) problem for a single store with a large number of SKUs (stock keeping units), where we need to make replenishment decisions for each SKU to balance its supply and demand.

Multi-agent Reinforcement Learning reinforcement-learning

Discovering Drug-Target Interaction Knowledge from Biomedical Literature

no code implementations 27 Sep 2021 Yutai Hou, Yingce Xia, Lijun Wu, Shufang Xie, Yang Fan, Jinhua Zhu, Wanxiang Che, Tao Qin, Tie-Yan Liu

We regard the DTI triplets as a sequence and use a Transformer-based model to directly generate them without using the detailed annotations of entities and relations.

TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method

no code implementations 20 Sep 2021 Zeqian Ju, Peiling Lu, Xu Tan, Rui Wang, Chen Zhang, Songruoyao Wu, Kejun Zhang, Xiangyang Li, Tao Qin, Tie-Yan Liu

In this paper, we develop TeleMelody, a two-stage lyric-to-melody generation system with a music template (e.g., tonality, chord progression, rhythm pattern, and cadence) to bridge the gap between lyrics and melodies (i.e., the system consists of a lyric-to-template module and a template-to-melody module).

Instance-wise Graph-based Framework for Multivariate Time Series Forecasting

1 code implementation 14 Sep 2021 Wentao Xu, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu

In this paper, we propose a simple yet efficient instance-wise graph-based framework to utilize the inter-dependencies of different variables at different time stamps for multivariate time series forecasting.

Multivariate Time Series Forecasting Time Series

Analyzing and Mitigating Interference in Neural Architecture Search

no code implementations 29 Aug 2021 Jin Xu, Xu Tan, Kaitao Song, Renqian Luo, Yichong Leng, Tao Qin, Tie-Yan Liu, Jian Li

Weight sharing has become the de facto approach to reduce the training cost of neural architecture search (NAS) by reusing the weights of shared operators from previously trained child models.

Neural Architecture Search

A Survey on Low-Resource Neural Machine Translation

no code implementations 9 Jul 2021 Rui Wang, Xu Tan, Renqian Luo, Tao Qin, Tie-Yan Liu

Neural approaches have achieved state-of-the-art accuracy on machine translation but suffer from the high cost of collecting large scale parallel data.

Low-Resource Neural Machine Translation Translation

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

no code implementations 6 Jul 2021 Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu

While recent text to speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly for two reasons: 1) the lack of training data for spontaneous speech; 2) the difficulty in modeling the filled pauses (um and uh) and diverse rhythms in spontaneous speech.

Causally Invariant Predictor with Shift-Robustness

no code implementations 5 Jul 2021 Xiangyu Zheng, Xinwei Sun, Wei Chen, Tie-Yan Liu

Instead of imposing regularizations to constrain the invariance of the predictor, we propose to predict by the intervened conditional expectation based on the do-operator and then prove that it is invariant across domains.

Causal Discovery

Supervised Off-Policy Ranking

1 code implementation 3 Jul 2021 Yue Jin, Yue Zhang, Tao Qin, Xudong Zhang, Jian Yuan, Houqiang Li, Tie-Yan Liu

Off-policy evaluation (OPE) leverages data generated by other policies to evaluate a target policy.

On the Generative Utility of Cyclic Conditionals

1 code implementation NeurIPS 2021 Chang Liu, Haoyue Tang, Tao Qin, Jintao Wang, Tie-Yan Liu

This is motivated by the observation that deep generative models, in addition to a likelihood model $p(x|z)$, often also use an inference model $q(z|x)$ for extracting representation, but they rely on a usually uninformative prior distribution $p(z)$ to define a joint distribution, which may render problems like posterior collapse and manifold mismatch.

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

no code implementations 29 Jun 2021 Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

However, it is in general unknown how to derive efficient and effective EE trade-off methods for non-linear complex tasks, such as contextual bandit with deep neural network as the reward function.

Multi-Armed Bandits

A Survey on Neural Speech Synthesis

4 code implementations 29 Jun 2021 Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in speech, language, and machine learning communities and has broad applications in the industry.

Speech Synthesis

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

no code implementations NeurIPS 2021 Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, LiWei Wang, Tie-Yan Liu

Since in many state-of-the-art models, relative positional encoding is used as default, designing efficient Transformers that can incorporate RPE is appealing.

Dual-view Molecule Pre-training

no code implementations 17 Jun 2021 Jinhua Zhu, Yingce Xia, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

After pre-training, we can use either the Transformer branch (this one is recommended according to empirical results), the GNN branch, or both for downstream tasks.

Molecular Property Prediction Single-step retrosynthesis

Large Scale Private Learning via Low-rank Reparametrization

1 code implementation 17 Jun 2021 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

We propose a reparametrization scheme to address the challenges of applying differentially private SGD on large neural networks, which are 1) the huge memory cost of storing individual gradients and 2) the added noise, which suffers from notorious dimensional dependence.

MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training

1 code implementation Findings (ACL) 2021 Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu

Inspired by the success of pre-training models in natural language processing, in this paper, we develop MusicBERT, a large-scale pre-trained model for music understanding.

Classification Emotion Classification +2

Do Transformers Really Perform Bad for Graph Representation?

4 code implementations 9 Jun 2021 Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.

Graph Classification Graph Regression +1

Incorporating NODE with Pre-trained Neural Differential Operator for Learning Dynamics

no code implementations 8 Jun 2021 Shiqi Gong, Qi Meng, Yue Wang, Lijun Wu, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

In this paper, to reduce the reliance on the numerical solver, we propose to enhance the supervised signal in the training of NODE.

Machine-Learning Non-Conservative Dynamics for New-Physics Detection

no code implementations 31 May 2021 Ziming Liu, Bohan Wang, Qi Meng, Wei Chen, Max Tegmark, Tie-Yan Liu

Energy conservation is a basic physics principle, the breakdown of which often implies new physics.

Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart

1 code implementation 31 May 2021 Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen, Jun Zhu, Tie-Yan Liu

Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones.

Learning Structures for Deep Neural Networks

no code implementations 27 May 2021 Jinhui Yuan, Fei Pan, Chunting Zhou, Tao Qin, Tie-Yan Liu

We further establish connections between this principle and the theory of Bayesian optimal classification, and empirically verify that larger entropy of the outputs of a deep neural network indeed corresponds to a better classification accuracy.

Classification Image Classification

Optimizing Information-theoretical Generalization Bound via Anisotropic Noise of SGLD

no code implementations NeurIPS 2021 Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

We prove that with constraint to guarantee low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized.

Generalization Bounds

How could Neural Networks understand Programs?

1 code implementation 10 May 2021 Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

Inspired by this, we propose a novel program semantics learning paradigm in which the model should learn from information composed of (1) the representations which align well with the fundamental operations in operational semantics, and (2) the information of environment transition, which is indispensable for program understanding.

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

1 code implementation NeurIPS 2021 Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu

A straightforward solution to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of significantly increased ASR error rate.

Automatic Speech Recognition +2

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

1 code implementation 20 Apr 2021 Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu

In adaptation, we use untranscribed speech data for speech reconstruction and only fine-tune the TTS decoder.

Impact of pandemic fatigue on the spread of COVID-19: a mathematical modelling study

no code implementations 9 Apr 2021 Disheng Tang, Wei Cao, Jiang Bian, Tie-Yan Liu, Zhifeng Gao, Shun Zheng, Jue Liu

We used a stochastic metapopulation model with a hierarchical structure and fitted the model to the positive cases in the US from the start of outbreak to the end of 2020.

IOT: Instance-wise Layer Reordering for Transformer Structures

1 code implementation ICLR 2021 Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

Based on this observation, in this work, we break the assumption of the fixed layer order in the Transformer and introduce instance-wise layer reordering into the model structure.

Abstractive Text Summarization Code Generation +2

Learning Invariant Representations across Domains and Tasks

no code implementations 3 Mar 2021 Jindong Wang, Wenjie Feng, Chang Liu, Chaohui Yu, Mingxuan Du, Renjun Xu, Tao Qin, Tie-Yan Liu

Because collecting massive COVID-19 image samples to train deep classification models is expensive and time-consuming, transfer learning is a promising approach that transfers knowledge from the abundant typical pneumonia datasets to COVID-19 image classification.

Domain Adaptation Image Classification +1

AdaSpeech: Adaptive Text to Speech for Custom Voice

2 code implementations ICLR 2021 Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu

2) To better trade off the adaptation parameters and voice quality, we introduce conditional layer normalization in the mel-spectrogram decoder of AdaSpeech, and fine-tune this part in addition to speaker embedding for adaptation.

Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning

2 code implementations ICLR 2021 Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu

The privacy leakage of the model about the training data can be bounded in the differential privacy mechanism.

LazyFormer: Self Attention with Lazy Update

no code implementations 25 Feb 2021 Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu

In each lazy block, the self-attention distribution is only computed once in the first layer and then is reused in all upper layers.
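The reuse described in this snippet can be pictured with a minimal single-head sketch in plain Python. It is only an illustration under stated assumptions, not the authors' implementation: the names `lazy_block`, `w_q`, `w_k`, and `w_v_per_layer` are hypothetical, and residual connections, feed-forward sublayers, and multiple heads are left out for brevity.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain list-of-lists matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention_weights(q, k):
    # Scaled dot-product attention distribution: softmax(q k^T / sqrt(d)).
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    return [softmax([s / math.sqrt(d) for s in row]) for row in scores]

def lazy_block(x, w_q, w_k, w_v_per_layer):
    # Hypothetical "lazy block": the attention distribution is computed once
    # from the block input and then reused by every layer in the block.
    attn = attention_weights(matmul(x, w_q), matmul(x, w_k))
    h = x
    for w_v in w_v_per_layer:
        # Each layer has its own value projection but shares `attn`.
        h = matmul(attn, matmul(h, w_v))
    return h, attn

identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h, attn = lazy_block(x, identity, identity, [identity, identity, identity])
```

In this sketch the quadratic-cost score computation runs once per block instead of once per layer, which is where the saving would come from.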

Return-Based Contrastive Representation Learning for Reinforcement Learning

no code implementations ICLR 2021 Guoqing Liu, Chuheng Zhang, Li Zhao, Tao Qin, Jinhua Zhu, Jian Li, Nenghai Yu, Tie-Yan Liu

Recently, various auxiliary tasks have been proposed to accelerate representation learning and improve sample efficiency in deep reinforcement learning (RL).

Atari Games reinforcement-learning +1

Revisiting Language Encoding in Learning Multilingual Representations

1 code implementation 16 Feb 2021 Shengjie Luo, Kaiyuan Gao, Shuxin Zheng, Guolin Ke, Di He, LiWei Wang, Tie-Yan Liu

The language embedding can be either added to the word embedding or attached at the beginning of the sentence.

Word Embeddings

REST: Relational Event-driven Stock Trend Forecasting

no code implementations 15 Feb 2021 Wentao Xu, Weiqing Liu, Chang Xu, Jiang Bian, Jian Yin, Tie-Yan Liu

To remedy the first shortcoming, we propose to model the stock context and learn the effect of event information on the stocks under different contexts.

Universal Trading for Order Execution with Oracle Policy Distillation

no code implementations 28 Jan 2021 Yuchen Fang, Kan Ren, Weiqing Liu, Dong Zhou, Weinan Zhang, Jiang Bian, Yong Yu, Tie-Yan Liu

As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument.

Algorithmic Trading reinforcement-learning

BN-invariant sharpness regularizes the training model to better generalization

no code implementations 8 Jan 2021 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

However, it has been pointed out that the usual definitions of sharpness, which consider either the maxima or the integral of loss over a $\delta$ ball of parameters around minima, cannot give consistent measurement for scale invariant neural networks, e.g., networks with batch normalization layer.

Learning to Use Future Information in Simultaneous Translation

1 code implementation 1 Jan 2021 Xueqing Wu, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Tao Qin, Tie-Yan Liu

For wait-k inference, we observe that wait-m training with $m>k$ in simultaneous NMT (i.e., using more future information for training than inference) generally outperforms wait-k training.

Machine Translation Translation
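For readers unfamiliar with the wait-k terminology in the entry above, the sketch below shows the read schedule as it is commonly defined in the simultaneous translation literature: before emitting target token t, the model has read min(k + t - 1, source length) source tokens. The function name `wait_k_schedule` is hypothetical and the definition is assumed from general usage, not taken from this listing.

```python
def wait_k_schedule(k, source_len, target_len):
    # Number of source tokens visible when producing target tokens 1..target_len
    # under a wait-k policy (assumed standard definition).
    return [min(k + t - 1, source_len) for t in range(1, target_len + 1)]

# Inference with k = 3 versus training with m = 5 on a 5-token source:
inference_view = wait_k_schedule(3, 5, 6)
training_view = wait_k_schedule(5, 5, 6)
```

With m > k, every training step sees at least as many source tokens as the matching inference step, which is the "more future information for training than inference" the snippet refers to.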

Task-Agnostic and Adaptive-Size BERT Compression

no code implementations 1 Jan 2021 Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Li Jian, Tao Qin, Tie-Yan Liu

NAS-BERT trains a big supernet on a carefully designed search space containing various architectures and outputs multiple compressed models with adaptive sizes and latency.

Language Modelling Model Compression +1

On the Stability of Multi-branch Network

no code implementations 1 Jan 2021 Huishuai Zhang, Da Yu, Wei Chen, Tie-Yan Liu

More importantly, we propose a new design, "STAM aggregation", that is guaranteed to STAbilize the forward/backward process of Multi-branch networks irrespective of the number of branches.

Taking Notes on the Fly Helps Language Pre-Training

no code implementations ICLR 2021 Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization.

Cooperative Policy Learning with Pre-trained Heterogeneous Observation Representations

1 code implementation 24 Dec 2020 Wenlei Shi, Xinran Wei, Jia Zhang, Xiaoyuan Ni, Arthur Jiang, Jiang Bian, Tie-Yan Liu

While adopting complex GNN models with more informative message passing and aggregation mechanisms can obviously benefit heterogeneous vertex representations and cooperative policy learning, it could, on the other hand, increase the training difficulty of MARL and demand more intense and direct reward signals compared to the original global reward.

Graph Attention Multi-agent Reinforcement Learning

Denoising Text to Speech with Frame-Level Noise Modeling

no code implementations 17 Dec 2020 Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu

In DenoiSpeech, we handle real-world noisy speech by modeling the fine-grained frame-level noise with a noise condition module, which is jointly trained with the TTS model.

Denoising Frame

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

1 code implementation 11 Dec 2020 Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu

Besides GD, adaptive algorithms such as AdaGrad, RMSProp and Adam are popular owing to their rapid training process.

RD$^2$: Reward Decomposition with Representation Decomposition

no code implementations NeurIPS 2020 Zichuan Lin, Derek Yang, Li Zhao, Tao Qin, Guangwen Yang, Tie-Yan Liu

In this work, we propose a set of novel reward decomposition principles by constraining uniqueness and compactness of different state features/representations relevant to different sub-rewards.

Latent Causal Invariant Model

no code implementations 4 Nov 2020 Xinwei Sun, Botong Wu, Xiangyu Zheng, Chang Liu, Wei Chen, Tao Qin, Tie-Yan Liu

To avoid spurious correlation, we propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction.

Disentanglement

Learning Causal Semantic Representation for Out-of-Distribution Prediction

1 code implementation NeurIPS 2021 Chang Liu, Xinwei Sun, Jindong Wang, Haoyue Tang, Tao Li, Tao Qin, Wei Chen, Tie-Yan Liu

Conventional supervised learning methods, especially deep ones, are found to be sensitive to out-of-distribution (OOD) examples, largely because the learned representation mixes the semantic factor with the variation factor due to their domain-specific correlation, while only the semantic factor causes the output.

Domain Adaptation

COSEA: Convolutional Code Search with Layer-wise Attention

no code implementations 19 Oct 2020 Hao Wang, Jia Zhang, Yingce Xia, Jiang Bian, Chao Zhang, Tie-Yan Liu

However, most existing studies overlook the code's intrinsic structural logic, which indeed contains a wealth of semantic information, and fail to capture intrinsic features of codes.

Code Search

Qlib: An AI-oriented Quantitative Investment Platform

1 code implementation 22 Sep 2020 Xiao Yang, Weiqing Liu, Dong Zhou, Jiang Bian, Tie-Yan Liu

Quantitative investment aims to maximize the return and minimize the risk in a sequential trading period over a set of financial instruments.

Portfolio Optimization Stock Market Prediction

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

1 code implementation 7 Sep 2020 Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Li-Wei Wang

We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets.

Graph Classification Graph Representation Learning

HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis

no code implementations 3 Sep 2020 Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu

To tackle the difficulty of singing modeling caused by high sampling rate (wider frequency band and longer waveform), we introduce multi-scale adversarial training in both the acoustic model and vocoder to improve singing modeling.

PopMAG: Pop Music Accompaniment Generation

1 code implementation 18 Aug 2020 Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu

To improve harmony, in this paper, we propose a novel MUlti-track MIDI representation (MuMIDI), which enables simultaneous multi-track generation in a single sequence and explicitly models the dependency of the notes from different tracks.

Music Modeling

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition

no code implementations 9 Aug 2020 Jin Xu, Xu Tan, Yi Ren, Tao Qin, Jian Li, Sheng Zhao, Tie-Yan Liu

However, there are more than 6,000 languages in the world and most languages lack speech training data, which poses significant challenges when building TTS and ASR systems for extremely low-resource languages.

Automatic Speech Recognition Knowledge Distillation +1

Taking Notes on the Fly Helps BERT Pre-training

no code implementations 4 Aug 2020 Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization.

How Does Data Augmentation Affect Privacy in Machine Learning?

1 code implementation 21 Jul 2020 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation.

Data Augmentation

Learning to Match Distributions for Domain Adaptation

1 code implementation 17 Jul 2020 Chaohui Yu, Jindong Wang, Chang Liu, Tao Qin, Renjun Xu, Wenjie Feng, Yiqiang Chen, Tie-Yan Liu

However, it remains challenging to determine which method is suitable for a given application since they are built with certain priors or bias.

Domain Adaptation

Temporally Correlated Task Scheduling for Sequence Learning

2 code implementations 10 Jul 2020 Xueqing Wu, Lewen Wang, Yingce Xia, Weiqing Liu, Lijun Wu, Shufang Xie, Tao Qin, Tie-Yan Liu

In many applications, a sequence learning task is usually associated with multiple temporally correlated auxiliary tasks, which are different in terms of how much input information to use or which future step to predict.

Machine Translation Translation

DeepSinger: Singing Voice Synthesis with Data Mined From the Web

no code implementations 9 Jul 2020 Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, Tie-Yan Liu

DeepSinger has several advantages over previous SVS systems: 1) to the best of our knowledge, it is the first SVS system that directly mines training data from music websites, 2) the lyrics-to-singing alignment model further avoids any human efforts for alignment labeling and greatly reduces labeling cost, 3) the singing model based on a feed-forward Transformer is simple and efficient, by removing the complicated acoustic feature modeling in parametric synthesis and leveraging a reference encoder to capture the timbre of a singer from noisy singing data, and 4) it can synthesize singing voices in multiple languages and multiple singers.

Accuracy Prediction with Non-neural Model for Neural Architecture Search

1 code implementation 9 Jul 2020 Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu

Considering that most architectures are represented as sequences of discrete symbols, which are more like tabular data and preferred by non-neural predictors, in this paper we study an alternative approach that uses a non-neural model for accuracy prediction.

Neural Architecture Search

SimulSpeech: End-to-End Simultaneous Speech to Text Translation

no code implementations ACL 2020 Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

In this work, we develop SimulSpeech, an end-to-end simultaneous speech to text translation system which translates speech in source language to text in target language concurrently.

Automatic Speech Recognition Knowledge Distillation +3

Rethinking Positional Encoding in Language Pre-training

2 code implementations ICLR 2021 Guolin Ke, Di He, Tie-Yan Liu

In this work, we investigate the positional encoding methods used in language pre-training (e.g., BERT) and identify several problems in the existing formulations.

Natural Language Understanding Word Embeddings

Dynamic of Stochastic Gradient Descent with State-Dependent Noise

no code implementations 24 Jun 2020 Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Specifically, we show that the covariance of the noise of SGD in the local region of the local minima is a quadratic function of the state.

Modeling Lost Information in Lossy Image Compression

no code implementations 22 Jun 2020 Yaolong Wang, Mingqing Xiao, Chang Liu, Shuxin Zheng, Tie-Yan Liu

Specifically, ILC introduces an invertible encoding module to replace the encoder-decoder structure and produce the low-dimensional informative latent representation, while transforming the lost information into an auxiliary latent variable that won't be further coded or stored.

Image Compression

Multi-branch Attentive Transformer

1 code implementation 18 Jun 2020 Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

While the multi-branch architecture is one of the key ingredients to the success of computer vision tasks, it has not been well investigated in natural language processing, especially sequence learning tasks.

Code Generation Machine Translation +2

UWSpeech: Speech to Speech Translation for Unwritten Languages

no code implementations 14 Jun 2020 Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Ke-jun Zhang, Tie-Yan Liu

Existing speech to speech translation systems heavily rely on the text of target language: they usually translate source language either to target text and then synthesize target speech from text, or directly to target speech with target text for auxiliary training.

Speech Recognition Speech-to-Speech Translation +1

MC-BERT: Efficient Language Pre-Training via a Meta Controller

1 code implementation 10 Jun 2020 Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Li-Wei Wang, Jiang Bian, Tie-Yan Liu

Pre-trained contextual representations (e.g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks.

Cloze Test Language Modelling +3

MultiSpeech: Multi-Speaker Text to Speech with Transformer

no code implementations 8 Jun 2020 Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin, Tie-Yan Liu

Transformer-based text to speech (TTS) models (e.g., Transformer TTS, FastSpeech) have shown advantages in training and inference efficiency over RNN-based models (e.g., Tacotron) due to their parallel computation in training and/or inference.

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

25 code implementations ICLR 2021 Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with ground-truth target instead of the simplified output from teacher, and 2) introducing more variation information of speech (e.g., pitch, energy and more accurate duration) as conditional inputs.

Knowledge Distillation Speech Synthesis

Dual Learning: Theoretical Study and an Algorithmic Extension

no code implementations 17 May 2020 Zhibing Zhao, Yingce Xia, Tao Qin, Lirong Xia, Tie-Yan Liu

Dual learning has been successfully applied in many machine learning applications including machine translation, image-to-image transformation, etc.

14 Machine Translation +1

Invertible Image Rescaling

3 code implementations ECCV 2020 Mingqing Xiao, Shuxin Zheng, Chang Liu, Yaolong Wang, Di He, Guolin Ke, Jiang Bian, Zhouchen Lin, Tie-Yan Liu

High-resolution digital images are usually downscaled to fit various display screens or save the cost of storage and bandwidth, while post-upscaling is adopted to recover the original resolutions or the details in the zoom-in images.

Image Super-Resolution

SEEK: Segmented Embedding of Knowledge Graphs

1 code implementation ACL 2020 Wentao Xu, Shun Zheng, Liang He, Bin Shao, Jian Yin, Tie-Yan Liu

In recent years, knowledge graph embedding has become a hot research topic in artificial intelligence and plays increasingly vital roles in various downstream applications, such as recommendation and question answering.

Knowledge Graph Embedding Knowledge Graphs +2

LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning

no code implementations 27 Apr 2020 Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu

While pre-training and fine-tuning, e.g., BERT and GPT-2, have achieved great success in language understanding and generation tasks, the pre-trained models are usually too big for online deployment in terms of both memory cost and inference speed, which hinders their practical online usage.

Knowledge Distillation Language Modelling

A Study of Non-autoregressive Model for Sequence Generation

no code implementations ACL 2020 Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao, Tie-Yan Liu

In this work, we conduct a study to understand the difficulty of NAR sequence generation and try to answer: (1) Why can NAR models catch up with AR models in some tasks but not all?

Automatic Speech Recognition Knowledge Distillation +1

MPNet: Masked and Permuted Pre-training for Language Understanding

6 code implementations NeurIPS 2020 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem.

Language Modelling Masked Language Modeling

Suphx: Mastering Mahjong with Deep Reinforcement Learning

no code implementations 30 Mar 2020 Junjie Li, Sotetsu Koyamada, Qiwei Ye, Guoqing Liu, Chao Wang, Ruihan Yang, Li Zhao, Tao Qin, Tie-Yan Liu, Hsiao-Wuen Hon

Artificial Intelligence (AI) has achieved great success in many domains, and game AI is widely regarded as its beachhead since the dawn of AI.

reinforcement-learning

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View.

no code implementations ICLR Workshop DeepDiffEq 2019 Yiping Lu*, Zhuohan Li*, Di He, Zhiqing Sun, Bin Dong, Tao Qin, LiWei Wang, Tie-Yan Liu

In particular, how words in a sentence are abstracted into contexts by passing through the layers of the Transformer can be interpreted as approximating multiple particles' movement in the space using the Lie-Trotter splitting scheme and Euler's method.

Incorporating BERT into Neural Machine Translation

3 code implementations ICLR 2020 Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

While BERT is more commonly used for fine-tuning than as a contextual embedding in downstream language understanding tasks, in NMT our preliminary exploration shows that using BERT as a contextual embedding works better than using it for fine-tuning.

Natural Language Understanding Reading Comprehension +3

A Study of Multilingual Neural Machine Translation

no code implementations 25 Dec 2019 Xu Tan, Yichong Leng, Jiale Chen, Yi Ren, Tao Qin, Tie-Yan Liu

Multilingual neural machine translation (NMT) has recently been investigated from different aspects (e.g., pivot translation, zero-shot translation, fine-tuning, or training from scratch) and in different settings (e.g., rich resource and low resource, one-to-many, and many-to-one translation).

Machine Translation Translation

Neural Machine Translation with Soft Prototype

1 code implementation NeurIPS 2019 Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Cheng Xiang Zhai, Tie-Yan Liu

Neural machine translation models usually use the encoder-decoder framework and generate translation from left to right (or right to left) without fully utilizing the target-side global information.

Machine Translation Translation

Normalization Helps Training of Quantized LSTM

1 code implementation NeurIPS 2019 Lu Hou, Jinhua Zhu, James Kwok, Fei Gao, Tao Qin, Tie-Yan Liu

The long-short-term memory (LSTM), though powerful, is memory and computation expensive.

Quantization

Gradient Perturbation is Underrated for Differentially Private Convex Optimization

no code implementations 26 Nov 2019 Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu, Jian Yin

By using the expected curvature, we show that gradient perturbation can achieve a significantly improved utility guarantee that can theoretically justify the advantage of gradient perturbation over other perturbation methods.

Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

2 code implementations 20 Nov 2019 Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu

Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models.

Machine Translation Translation

Distributional Reward Decomposition for Reinforcement Learning

no code implementations NeurIPS 2019 Zichuan Lin, Li Zhao, Derek Yang, Tao Qin, Guangwen Yang, Tie-Yan Liu

Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and a general class of such properties is the multiple reward channel.

reinforcement-learning

Fully Parameterized Quantile Function for Distributional Reinforcement Learning

4 code implementations NeurIPS 2019 Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, Tie-Yan Liu

The key challenge in practical distributional RL algorithms lies in how to parameterize estimated distributions so as to better approximate the true continuous distribution.

Ranked #3 on Atari Games on Atari 2600 Skiing (using extra training data)

Atari Games Distributional Reinforcement Learning +1

Exploiting Monolingual Data at Scale for Neural Machine Translation

no code implementations IJCNLP 2019 Lijun Wu, Yiren Wang, Yingce Xia, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

In this work, we study how to use both the source-side and target-side monolingual data for NMT, and propose an effective strategy leveraging both of them.

Ranked #1 on Machine Translation on WMT2016 English-German (SacreBLEU metric, using extra training data)

Machine Translation Translation

Machine Translation With Weakly Paired Documents

no code implementations IJCNLP 2019 Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

1) We provide a simple approach to mine implicitly bilingual sentence pairs from document pairs which can then be used as supervised training signals.

Translation Unsupervised Machine Translation

Path Space for Recurrent Neural Networks with ReLU Activations

no code implementations 25 Sep 2019 Yue Wang, Qi Meng, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu

Optimization algorithms like stochastic gradient descent optimize neural networks in the vector space of weights, which is not positively scale-invariant.

The Effect of Adversarial Training: A Theoretical Characterization

no code implementations 25 Sep 2019 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

It has been widely shown that adversarial training (Madry et al., 2018) is empirically effective in defending against adversarial attacks.

Adversarial Attack

P-BN: Towards Effective Batch Normalization in the Path Space

no code implementations 25 Sep 2019 Xufang Luo, Qi Meng, Wei Chen, Tie-Yan Liu

Hence, some new algorithms that conduct optimizations directly in the path space (the path space is proven to be PSI) were developed, such as Stochastic Gradient Descent (SGD) in the path space, and it was shown that SGD in the path space is superior to that in the weight space.

Independence-aware Advantage Estimation

no code implementations 25 Sep 2019 Pushi Zhang, Li Zhao, Guoqing Liu, Jiang Bian, Minglie Huang, Tao Qin, Tie-Yan Liu

Most existing advantage function estimation methods in reinforcement learning suffer from the problem of high variance, which scales unfavorably with the time horizon.

reinforcement-learning

Demonstration Actor Critic

no code implementations 25 Sep 2019 Guoqing Liu, Li Zhao, Pushi Zhang, Jiang Bian, Tao Qin, Nenghai Yu, Tie-Yan Liu

One approach leverages demonstration data in a supervised manner, which is simple and direct, but can only provide supervision signal over those states seen in the demonstrations.

Stability and Convergence Theory for Learning ResNet: A Full Characterization

no code implementations 25 Sep 2019 Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu

We show that for standard initialization used in practice, $\tau =1/\Omega(\sqrt{L})$ is a sharp value in characterizing the stability of forward/backward process of ResNet, where $L$ is the number of residual blocks.

Hint-Based Training for Non-Autoregressive Machine Translation

1 code implementation IJCNLP 2019 Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Due to the unparallelizable nature of the autoregressive factorization, AutoRegressive Translation (ART) models have to generate tokens sequentially during decoding and thus suffer from high inference latency.

Machine Translation Translation

Self-paced Ensemble for Highly Imbalanced Massive Data Classification

1 code implementation 8 Sep 2019 Zhining Liu, Wei Cao, Zhifeng Gao, Jiang Bian, Hechang Chen, Yi Chang, Tie-Yan Liu

To tackle this problem, we conduct deep investigations into the nature of class imbalance, which reveals that not only the disproportion between classes, but also other difficulties embedded in the nature of data, especially, noises and class overlapping, prevent us from learning effective classifiers.

Classification General Classification +1

Multilingual Neural Machine Translation with Language Clustering

no code implementations IJCNLP 2019 Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, Tie-Yan Liu

We study two methods for language clustering: (1) using prior knowledge, where we cluster languages according to language family, and (2) using language embedding, in which we represent each language by an embedding vector and cluster them in the embedding space.

Machine Translation Translation

LightMC: A Dynamic and Efficient Multiclass Decomposition Algorithm

no code implementations 25 Aug 2019 Ziyu Liu, Guolin Ke, Jiang Bian, Tie-Yan Liu

Instead of using fixed coding matrix and decoding strategy, LightMC uses a differentiable decoding strategy, which enables it to dynamically optimize the coding matrix and decoding strategy, toward increasing the overall accuracy of multiclass classification, via back propagation jointly with the training of base learners in an iterative way.

Classification General Classification

Representation Degeneration Problem in Training Natural Language Generation Models

no code implementations ICLR 2019 Jun Gao, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu

We study an interesting problem in training neural network-based models for natural language generation tasks, which we call the representation degeneration problem.

Language Modelling Machine Translation +3

Light Multi-segment Activation for Model Compression

2 code implementations 16 Jul 2019 Zhenhui Xu, Guolin Ke, Jia Zhang, Jiang Bian, Tie-Yan Liu

Inspired by the nature of the expressiveness ability in Neural Networks, we propose to use multi-segment activation, which can significantly improve the expressiveness ability with very little cost, in the compact student model.

Knowledge Distillation Model Compression +1

Depth Growing for Neural Machine Translation

1 code implementation ACL 2019 Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of neural machine translation (NMT) models for better translation quality remains a challenging problem.

14 Machine Translation +2

Unsupervised Pivot Translation for Distant Languages

no code implementations ACL 2019 Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

In this work, we introduce unsupervised pivot translation for distant languages, which translates a language to a distant language through multiple hops, and the unsupervised translation on each hop is relatively easier than the original direct translation.

Machine Translation Translation

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

2 code implementations ICLR 2020 Yiping Lu, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Li-Wei Wang, Tie-Yan Liu

In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system.

Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data

no code implementations 29 May 2019 Shicong Cen, Huishuai Zhang, Yuejie Chi, Wei Chen, Tie-Yan Liu

Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary.

Beyond Exponentially Discounted Sum: Automatic Learning of Return Function

no code implementations 28 May 2019 Yufei Wang, Qiwei Ye, Tie-Yan Liu

In reinforcement learning, Return, which is the weighted accumulated future rewards, and Value, which is the expected return, serve as the objective that guides the learning of the policy.

Atari Games Meta-Learning +1

Soft Contextual Data Augmentation for Neural Machine Translation

1 code implementation ACL 2019 Jinhua Zhu, Fei Gao, Lijun Wu, Yingce Xia, Tao Qin, Wengang Zhou, Xue-Qi Cheng, Tie-Yan Liu

While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited.

Data Augmentation Language Modelling +2

FastSpeech: Fast, Robust and Controllable Text-to-Speech

10 code implementations 22 May 2019 Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control).

Text-To-Speech Synthesis

Almost Unsupervised Text to Speech and Automatic Speech Recognition

no code implementations 13 May 2019 Yi Ren, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech processing, and both achieve impressive performance thanks to recent advances in deep learning and large amounts of aligned speech and text data.

Automatic Speech Recognition Denoising