Search Results for author: Tie-Yan Liu

Found 283 papers, 118 papers with code

Global Ranking Using Continuous Conditional Random Fields

no code implementations NeurIPS 2008 Tao Qin, Tie-Yan Liu, Xu-Dong Zhang, De-Sheng Wang, Hang Li

It can naturally represent both the content information of objects and the relation information between objects, which is necessary for global ranking.

Information Retrieval Learning-To-Rank +1

Statistical Consistency of Top-k Ranking

no code implementations NeurIPS 2009 Fen Xia, Tie-Yan Liu, Hang Li

This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting.

Information Retrieval Retrieval

A New Probabilistic Model for Rank Aggregation

no code implementations NeurIPS 2010 Tao Qin, Xiubo Geng, Tie-Yan Liu

To avoid these limitations, in this paper, we propose a new model, which is defined with a coset-permutation distance, and models the generation of a permutation as a stagewise process.

A Theoretical Analysis of NDCG Type Ranking Measures

no code implementations24 Apr 2013 Yining Wang, Li-Wei Wang, Yuanzhi Li, Di He, Tie-Yan Liu, Wei Chen

We show that NDCG with logarithmic discount has consistent distinguishability although it converges to the same limit for all ranking functions.

Vocal Bursts Type Prediction
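
For reference, the NDCG@k measure with the logarithmic discount analyzed here can be written in its standard textbook form (not a formula specific to this paper) as

    $\mathrm{DCG@k} = \sum_{i=1}^{k} \frac{2^{r_i} - 1}{\log_2(i + 1)}, \qquad \mathrm{NDCG@k} = \frac{\mathrm{DCG@k}}{\mathrm{IDCG@k}},$

where $r_i$ is the relevance grade of the document ranked at position i and IDCG@k is the DCG@k of the ideal ranking; the analysis asks which discount functions let such measures distinguish different ranking functions in the limit.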

Pure Price of Anarchy for Generalized Second Price Auction

no code implementations23 May 2013 Wenkui Ding, Tao Wu, Tao Qin, Tie-Yan Liu

Previous studies have shown that the pure Price Of Anarchy (POA) of GSP is 1.25 when there are two ad slots and 1.259 when there are three.

Computer Science and Game Theory

Introducing LETOR 4.0 Datasets

3 code implementations9 Jun 2013 Tao Qin, Tie-Yan Liu

We call the two query sets MQ2007 and MQ2008 for short.

Learning-To-Rank

Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising

no code implementations NeurIPS 2013 Min Xu, Tao Qin, Tie-Yan Liu

In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem.

Selection bias

Agent Behavior Prediction and Its Generalization Analysis

no code implementations19 Apr 2014 Fei Tian, Haifang Li, Wei Chen, Tao Qin, Enhong Chen, Tie-Yan Liu

Then we prove a generalization bound for the machine learning algorithms on the behavior data generated by the new Markov chain, which depends on both the Markovian parameters and the covering number of the function class compounded by the loss function for behavior prediction and the behavior prediction model.

BIG-bench Machine Learning

A Game-theoretic Machine Learning Approach for Revenue Maximization in Sponsored Search

no code implementations3 Jun 2014 Di He, Wei Chen, Li-Wei Wang, Tie-Yan Liu

Sponsored search is an important monetization channel for search engines, in which an auction mechanism is used to select the ads shown to users and determine the prices charged to advertisers.

BIG-bench Machine Learning Bilevel Optimization

WordRep: A Benchmark for Research on Learning Word Representations

no code implementations7 Jul 2014 Bin Gao, Jiang Bian, Tie-Yan Liu

In this paper, we describe the details of the WordRep collection and show how to use it in different types of machine learning research related to word embedding.

Word Embeddings

KNET: A General Framework for Learning Word Embedding using Morphological Knowledge

no code implementations7 Jul 2014 Qing Cui, Bin Gao, Jiang Bian, Siyu Qiu, Tie-Yan Liu

In particular, we introduce a novel neural network architecture called KNET that leverages both contextual information and morphological word similarity built based on morphological knowledge to learn word embeddings.

Information Retrieval Retrieval +2

Generalization Analysis for Game-Theoretic Machine Learning

no code implementations9 Oct 2014 Haifang Li, Fei Tian, Wei Chen, Tao Qin, Tie-Yan Liu

For Internet applications like sponsored search, caution needs to be taken when using machine learning to optimize their mechanisms (e.g., auction), since self-interested agents in these applications may change their behaviors (and thus the data distribution) in response to the mechanisms.

BIG-bench Machine Learning

LightLDA: Big Topic Models on Modest Compute Clusters

1 code implementation4 Dec 2014 Jinhui Yuan, Fei Gao, Qirong Ho, Wei Dai, Jinliang Wei, Xun Zheng, Eric P. Xing, Tie-Yan Liu, Wei-Ying Ma

When building large-scale machine learning (ML) programs, such as big topic models or deep neural nets, one usually assumes such tasks can only be attempted with industrial-sized clusters with thousands of nodes, which are out of reach for most practitioners or academic researchers.

Topic Models

Thompson Sampling for Budgeted Multi-armed Bandits

no code implementations1 May 2015 Yingce Xia, Haifang Li, Tao Qin, Nenghai Yu, Tie-Yan Liu

In this paper, we extend the Thompson sampling to Budgeted MAB, where there is random cost for pulling an arm and the total cost is constrained by a budget.

Multi-Armed Bandits Thompson Sampling
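
For illustration, a minimal Python sketch of Thompson sampling under a budget constraint, assuming Bernoulli rewards and Bernoulli costs with Beta posteriors; selecting the arm with the highest sampled reward-to-cost ratio is an illustrative rule rather than the exact algorithm of the paper.

    import numpy as np

    def budgeted_thompson_sampling(pull, n_arms, budget, rng=np.random.default_rng(0)):
        """Run Thompson sampling until the budget is exhausted.

        `pull(arm)` is assumed to return a (reward, cost) pair with values in {0, 1}.
        """
        # Beta(1, 1) priors for both the reward and the cost of every arm.
        r_a, r_b = np.ones(n_arms), np.ones(n_arms)
        c_a, c_b = np.ones(n_arms), np.ones(n_arms)
        total_reward, spent = 0.0, 0.0
        while spent < budget:
            theta_r = rng.beta(r_a, r_b)                 # sampled reward rates
            theta_c = rng.beta(c_a, c_b)                 # sampled cost rates
            arm = int(np.argmax(theta_r / np.maximum(theta_c, 1e-6)))
            reward, cost = pull(arm)
            spent += cost
            total_reward += reward
            r_a[arm] += reward; r_b[arm] += 1 - reward   # posterior updates
            c_a[arm] += cost;   c_b[arm] += 1 - cost
        return total_reward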

Learning Better Word Embedding by Asymmetric Low-Rank Projection of Knowledge Graph

no code implementations19 May 2015 Fei Tian, Bin Gao, Enhong Chen, Tie-Yan Liu

Although these works have achieved certain success, they have neglected some important facts about knowledge graphs: (i) many relationships in knowledge graphs are \emph{many-to-one}, \emph{one-to-many} or even \emph{many-to-many}, rather than simply \emph{one-to-one}; (ii) most head entities and tail entities in knowledge graphs come from very different semantic spaces.

Knowledge Graphs

Solving Verbal Comprehension Questions in IQ Test by Knowledge-Powered Word Embedding

no code implementations29 May 2015 Huazheng Wang, Fei Tian, Bin Gao, Jiang Bian, Tie-Yan Liu

Second, we obtain distributed representations of words and relations by leveraging a novel word embedding method that considers the multi-sense nature of words and the relational knowledge among words (or their senses) contained in dictionaries.

On the Depth of Deep Neural Networks: A Theoretical View

no code implementations17 Jun 2015 Shizhao Sun, Wei Chen, Li-Wei Wang, Xiaoguang Liu, Tie-Yan Liu

First, we derive an upper bound for RA of DNN, and show that it increases with increasing depth.

Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves

no code implementations7 Apr 2016 Fei Tian, Bin Gao, Di He, Tie-Yan Liu

We propose Sentence Level Recurrent Topic Model (SLRTM), a new topic model that assumes the generation of each word within a sentence to depend on both the topic of the sentence and the whole history of its preceding words in the sentence.

Sentence Short-Text Conversation +1

Ensemble-Compression: A New Method for Parallel Training of Deep Neural Networks

no code implementations2 Jun 2016 Shizhao Sun, Wei Chen, Jiang Bian, Xiaoguang Liu, Tie-Yan Liu

In this framework, we propose to aggregate the local models by ensemble, i.e., averaging the outputs of local models instead of the parameters.

Model Compression
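
The contrast between the two aggregation schemes can be shown with a toy sketch; the `models` objects with a `predict` method and a `params` attribute are hypothetical stand-ins for local workers' models.

    import numpy as np

    def aggregate_by_parameters(models):
        # Conventional data parallelism: average the parameter vectors of the local models.
        return np.mean([m.params for m in models], axis=0)

    def aggregate_by_ensemble(models, x):
        # Ensemble aggregation: average the *outputs* of the local models on input x.
        return np.mean([m.predict(x) for m in models], axis=0)

The compression step of the method would then reduce such an ensemble back to a single model of the original size, which is why the entry is tagged with model compression.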

Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction

no code implementations27 Sep 2016 Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu

The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.

Asynchronous Stochastic Gradient Descent with Delay Compensation

no code implementations ICML 2017 Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu

We propose a novel technology to compensate this delay, so as to make the optimization behavior of ASGD closer to that of sequential SGD.
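
A minimal sketch of the delay-compensation idea, assuming a first-order Taylor correction of the stale gradient with a diagonal, gradient-outer-product approximation of the Hessian; the coefficient name `lam` is illustrative.

    import numpy as np

    def delay_compensated_update(w_now, w_stale, g_stale, lr, lam=0.04):
        """Apply a stale gradient computed at w_stale to the current weights w_now.

        The correction approximates g(w_now) ~ g(w_stale) + H (w_now - w_stale),
        with the Hessian H approximated by diag(g_stale * g_stale).
        """
        g_compensated = g_stale + lam * g_stale * g_stale * (w_now - w_stale)
        return w_now - lr * g_compensated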

Generalization Error Bounds for Optimization Algorithms via Stability

no code implementations27 Sep 2016 Qi Meng, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu

Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM), and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduction (SVRG).

BIG-bench Machine Learning

LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

no code implementations NeurIPS 2016 Xiang Li, Tao Qin, Jian Yang, Tie-Yan Liu

Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it using the language modeling task on several benchmark datasets.

Language Modelling Machine Translation
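
The 2-Component shared embedding places a vocabulary of size V into a roughly sqrt(V)-by-sqrt(V) word table, so only about 2*sqrt(V) embedding vectors are needed; the sketch below illustrates the allocation, and composing a word vector by concatenation is a simplification rather than the exact two-step scheme of LightRNN.

    import math
    import numpy as np

    def build_2component_table(vocab_size, dim, rng=np.random.default_rng(0)):
        side = math.ceil(math.sqrt(vocab_size))
        row_emb = rng.normal(size=(side, dim))   # shared by all words in the same row
        col_emb = rng.normal(size=(side, dim))   # shared by all words in the same column
        return row_emb, col_emb, side

    def word_vector(word_id, row_emb, col_emb, side):
        r, c = divmod(word_id, side)
        return np.concatenate([row_emb[r], col_emb[c]])  # simplified composition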

Dual Learning for Machine Translation

1 code implementation NeurIPS 2016 Yingce Xia, Di He, Tao Qin, Li-Wei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma

Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using the policy gradient methods).

Language Modelling Machine Translation +4
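
A compact sketch of one dual-learning round for an English-to-French pair; `primal`, `dual`, and `lm_fr` are hypothetical model objects, and the linear reward combination is illustrative rather than the paper's exact objective.

    def dual_learning_step(primal, dual, lm_fr, src_sentence, alpha=0.5):
        """One round of the dual game: translate, score with the feedback signals,
        and return a reward usable for policy-gradient updates of both models."""
        # Primal model translates the (possibly monolingual) English sentence to French.
        mid_sentence, logp_primal = primal.sample_translation(src_sentence)
        # Feedback signal 1: how natural the intermediate output looks to a French LM.
        lm_reward = lm_fr.log_prob(mid_sentence)
        # Dual model translates back; feedback signal 2: reconstruction likelihood.
        recon_reward = dual.log_prob(target=src_sentence, source=mid_sentence)
        reward = alpha * lm_reward + (1.0 - alpha) * recon_reward
        # Both models would be updated with policy gradients weighted by `reward`.
        return reward, logp_primal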

A Communication-Efficient Parallel Algorithm for Decision Tree

no code implementations NeurIPS 2016 Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu

After partitioning the training data onto a number of (e.g., $M$) machines, this algorithm performs both local voting and global voting in each iteration.

2k Attribute

Randomized Mechanisms for Selling Reserved Instances in Cloud

no code implementations22 Nov 2016 Jia Zhang, Weidong Ma, Tao Qin, Xiaoming Sun, Tie-Yan Liu

We then extend our mechanism to the general case and achieve a competitive ratio $\frac{1}{42\log k\log T}$ for both social welfare and revenue, where $T$ is the ratio of the maximum request length to the minimum request length and $k$ is the ratio of the maximum request value density to the minimum request value density.

Cloud Computing

Efficient Inexact Proximal Gradient Algorithm for Nonconvex Problems

no code implementations29 Dec 2016 Quanming Yao, James T. Kwok, Fei Gao, Wei Chen, Tie-Yan Liu

The proximal gradient algorithm has been popularly used for convex optimization.

Optimization and Control

Adversarial Neural Machine Translation

no code implementations20 Apr 2017 Lijun Wu, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

The goal of the adversary is to differentiate the translation results generated by the NMT model from those produced by humans.

Machine Translation NMT +1

Reinforcement Learning for Learning Rate Control

no code implementations31 May 2017 Chang Xu, Tao Qin, Gang Wang, Tie-Yan Liu

Stochastic gradient descent (SGD), which updates the model parameters by adding a local gradient times a learning rate at each step, is widely used in model training of machine learning algorithms such as neural networks.

reinforcement-learning Reinforcement Learning (RL)

Word-Entity Duet Representations for Document Ranking

no code implementations20 Jun 2017 Chenyan Xiong, Jamie Callan, Tie-Yan Liu

This paper presents a word-entity duet framework for utilizing knowledge bases in ad-hoc retrieval.

Document Ranking Learning-To-Rank +1

Dual Supervised Learning

1 code implementation ICML 2017 Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, Tie-Yan Liu

Many supervised learning tasks emerge in dual forms, e.g., English-to-French translation vs. French-to-English translation, speech recognition vs. text to speech, and image classification vs. image generation.

General Classification Image Classification +6

Large-Scale Low-Rank Matrix Learning with Nonconvex Regularizers

no code implementations1 Aug 2017 Quanming Yao, James T. Kwok, Taifeng Wang, Tie-Yan Liu

Based on it, we develop a proximal gradient algorithm (and its accelerated variant) with inexact proximal splitting and prove that a convergence rate of O(1/T), where T is the number of iterations, is guaranteed.

Matrix Completion

Slim-DP: A Light Communication Data Parallelism for DNN

no code implementations27 Sep 2017 Shizhao Sun, Wei Chen, Jiang Bian, Xiaoguang Liu, Tie-Yan Liu

However, with the increasing size of DNN models and the large number of workers in practice, this typical data parallelism cannot achieve satisfactory training acceleration, since it usually suffers from the heavy communication cost due to transferring a huge amount of information between workers and the parameter server.

Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling

no code implementations29 Sep 2017 Qi Meng, Wei Chen, Yue Wang, Zhi-Ming Ma, Tie-Yan Liu

First, we give a mathematical formulation for the practical data processing procedure in distributed machine learning, which we call data partition with global/local shuffling.

BIG-bench Machine Learning

Decoding with Value Networks for Neural Machine Translation

no code implementations NeurIPS 2017 Di He, Hanqing Lu, Yingce Xia, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Inspired by the success and methodology of AlphaGo, in this paper we propose using a prediction network to improve beam search, which takes the source sentence $x$, the currently available decoding output $y_1,\cdots, y_{t-1}$ and a candidate word $w$ at step $t$ as inputs and predicts the long-term value (e.g., BLEU score) of the partial target sentence if it is completed by the NMT model.

Machine Translation NMT +2
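
A small sketch of the scoring rule this suggests for beam search, interpolating the decoder log-probability of a partial translation with the value network's long-term prediction; the interpolation weight is illustrative.

    import heapq

    def value_aware_beam_step(candidates, beam_size, alpha=0.85):
        """candidates: list of (partial_hypothesis, log_prob, predicted_value) tuples."""
        def score(c):
            _, log_prob, value = c
            return alpha * log_prob + (1.0 - alpha) * value
        # Keep the best-scoring partial hypotheses for the next decoding step.
        return heapq.nlargest(beam_size, candidates, key=score)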

LightGBM: A Highly Efficient Gradient Boosting Decision Tree

1 code implementation NeurIPS 2017 Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu

We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size.
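
A NumPy sketch of the Gradient-based One-Side Sampling (GOSS) step described here: keep the instances with the largest gradients, randomly sample a fraction of the rest, and up-weight the sampled small-gradient instances to keep the estimated information gain approximately unbiased. The fractions `a` and `b` follow the usual description; this is an illustration rather than the library code.

    import numpy as np

    def goss_sample(gradients, a=0.2, b=0.1, rng=np.random.default_rng(0)):
        """Return (selected_indices, weights) for one boosting iteration."""
        n = len(gradients)
        order = np.argsort(-np.abs(gradients))           # sort by |gradient|, descending
        top_k = int(a * n)
        rest_k = int(b * n)
        top_idx = order[:top_k]                          # always keep large-gradient instances
        rest_idx = rng.choice(order[top_k:], size=rest_k, replace=False)
        idx = np.concatenate([top_idx, rest_idx])
        weights = np.ones(len(idx))
        weights[top_k:] = (1.0 - a) / b                  # amplify the sampled small gradients
        return idx, weights

In the released LightGBM package, GOSS is switched on through a boosting/sampling option rather than implemented by hand; the exact parameter name varies across versions.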

Deliberation Networks: Sequence Generation Beyond One-Pass Decoding

no code implementations NeurIPS 2017 Yingce Xia, Fei Tian, Lijun Wu, Jianxin Lin, Tao Qin, Nenghai Yu, Tie-Yan Liu

In this work, we introduce the deliberation process into the encoder-decoder framework and propose deliberation networks for sequence generation.

Image Captioning Machine Translation +3

$\mathcal{G}$-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations11 Feb 2018 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process }?

Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation

no code implementations27 Feb 2018 Huishuai Zhang, Wei Chen, Tie-Yan Liu

This inconsistency of gradient magnitude across different layers renders optimization of a deep neural network with a single learning rate problematic.

Conditional Image-to-Image Translation

no code implementations CVPR 2018 Jianxin Lin, Yingce Xia, Tao Qin, Zhibo Chen, Tie-Yan Liu

In this paper, we study a new problem, conditional image-to-image translation, which is to translate an image from the source domain to the target domain conditioned on a given image in the target domain.

Image-to-Image Translation Translation

Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling

no code implementations3 May 2018 Chenyan Xiong, Zhengzhong Liu, Jamie Callan, Tie-Yan Liu

The salience model also improves ad hoc search accuracy, providing effective ranking features by modeling the salience of query entities in candidate documents.

Retrieval

Differential Equations for Modeling Asynchronous Algorithms

no code implementations8 May 2018 Li He, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then we conduct theoretical analysis on the convergence rates of ASGD algorithm based on the continuous approximation.

Learning to Teach

no code implementations ICLR 2018 Yang Fan, Fei Tian, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

Teaching plays a very important role in our society, by spreading human knowledge and educating our next generations.

BIG-bench Machine Learning Image Classification

Efficient Sequence Learning with Group Recurrent Networks

no code implementations NAACL 2018 Fei Gao, Lijun Wu, Li Zhao, Tao Qin, Xue-Qi Cheng, Tie-Yan Liu

Recurrent neural networks have achieved state-of-the-art results in many artificial intelligence tasks, such as language modeling, neural machine translation, speech recognition and so on.

Language Modelling Machine Translation +3

Dense Information Flow for Neural Machine Translation

1 code implementation NAACL 2018 Yanyao Shen, Xu Tan, Di He, Tao Qin, Tie-Yan Liu

Recently, neural machine translation has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework.

Machine Translation NMT +1

Towards Binary-Valued Gates for Robust LSTM Training

1 code implementation ICML 2018 Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling.

Double Path Networks for Sequence to Sequence Learning

1 code implementation COLING 2018 Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, Tie-Yan Liu

In this work we propose Double Path Networks for Sequence to Sequence learning (DPN-S2S), which leverage the advantages of both models by using double path information fusion.

Model-Level Dual Learning

no code implementations ICML 2018 Yingce Xia, Xu Tan, Fei Tian, Tao Qin, Nenghai Yu, Tie-Yan Liu

Many artificial intelligence tasks appear in dual forms like English$\leftrightarrow$French translation and speech$\leftrightarrow$text transformation.

Machine Translation Sentiment Analysis +1

Neural Architecture Optimization

5 code implementations NeurIPS 2018 Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu

The performance predictor and the encoder enable us to perform gradient based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy.

Evolutionary Algorithms General Classification +3

A Study of Reinforcement Learning for Neural Machine Translation

1 code implementation EMNLP 2018 Lijun Wu, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

Recent studies have shown that reinforcement learning (RL) is an effective approach for improving the performance of neural machine translation (NMT) systems.

Machine Translation NMT +3

Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter

no code implementations EMNLP 2018 Lijun Wu, Xu Tan, Di He, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

Many previous works have discussed the relationship between error propagation and the \emph{accuracy drop} (i. e., the left part of the translated sentence is often better than its right part in left-to-right decoding models) problem.

Machine Translation Sentence +2

FRAGE: Frequency-Agnostic Word Representation

2 code implementations NeurIPS 2018 Chengyue Gong, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks.

Language Modelling Machine Translation +5

Capacity Control of ReLU Neural Networks by Basis-path Norm

no code implementations19 Sep 2018 Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu

Motivated by this, we propose a new norm \emph{Basis-path Norm} based on a group of linearly independent paths to measure the capacity of neural networks more accurately.

Target Transfer Q-Learning and Its Convergence Analysis

no code implementations21 Sep 2018 Yue Wang, Qi Meng, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu

In this paper, we propose to transfer the Q-function learned in the source task to the target of the Q-learning in the new task when certain safe conditions are satisfied.

Q-Learning Reinforcement Learning (RL) +1

A Convergent Variant of the Boltzmann Softmax Operator in Reinforcement Learning

no code implementations27 Sep 2018 Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Tie-Yan Liu

We then propose the dynamic Boltzmann softmax (DBS) operator to enable the convergence to the optimal value function in value iteration.

Atari Games Q-Learning +2
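
For concreteness, the Boltzmann softmax operator over action values can be written as below; the dynamic variant lets the inverse temperature grow with the iteration count, and the quadratic schedule used here is only a placeholder.

    import numpy as np

    def boltzmann_softmax(q_values, beta):
        """Softmax-weighted average of action values (the Boltzmann softmax operator)."""
        w = np.exp(beta * (q_values - np.max(q_values)))  # shift for numerical stability
        return float(np.sum(w * q_values) / np.sum(w))

    def dbs_backup(q_values, t):
        beta_t = t ** 2  # placeholder increasing schedule; the paper's schedule may differ
        return boltzmann_softmax(np.asarray(q_values, dtype=float), beta_t)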

Expressiveness in Deep Reinforcement Learning

no code implementations27 Sep 2018 Xufang Luo, Qi Meng, Di He, Wei Chen, Yunhong Wang, Tie-Yan Liu

Based on our observations, we formally define expressiveness of the state extractor as the rank of the matrix composed by representations.

Atari Games reinforcement-learning +2

Learning to Teach with Dynamic Loss Functions

no code implementations NeurIPS 2018 Lijun Wu, Fei Tian, Yingce Xia, Yang Fan, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

Different from typical learning settings in which the loss function of a machine learning model is predefined and fixed, in our framework, the loss function of a machine learning model (we call it student) is defined by another machine learning model (we call it teacher).

BIG-bench Machine Learning Image Classification +1

Modeling Local Dependence in Natural Language with Multi-channel Recurrent Neural Networks

no code implementations13 Nov 2018 Chang Xu, Weiran Huang, Hongwei Wang, Gang Wang, Tie-Yan Liu

In this paper, we propose an improved variant of RNN, Multi-Channel RNN (MC-RNN), to dynamically capture and leverage local semantic structure information.

Abstractive Text Summarization Language Modelling +2

On the Local Hessian in Back-propagation

no code implementations NeurIPS 2018 Huishuai Zhang, Wei Chen, Tie-Yan Liu

We study the Hessian of the local back-matching loss (local Hessian) and connect it to the efficiency of BP.

Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

no code implementations23 Dec 2018 Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu

Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models.

Machine Translation Sentence +2

Non-Autoregressive Machine Translation with Auxiliary Regularization

no code implementations22 Feb 2019 Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, Tie-Yan Liu

However, the high efficiency has come at the cost of not capturing the sequential dependency on the target side of translation, which causes NAT to suffer from two kinds of translation errors: 1) repeated translations (due to indistinguishable adjacent decoder hidden states), and 2) incomplete translations (due to incomplete transfer of source side information via the decoder hidden states).

Machine Translation Sentence +1

Multilingual Neural Machine Translation with Knowledge Distillation

1 code implementation ICLR 2019 Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu

Multilingual machine translation, which translates multiple languages with a single model, has attracted much attention due to its efficiency of offline training and online serving.

Knowledge Distillation Machine Translation +1

Positively Scale-Invariant Flatness of ReLU Neural Networks

no code implementations6 Mar 2019 Mingyang Yi, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

That is to say, a minimum with balanced values of basis paths is more likely to be flatter and to generalize better.

Reinforcement Learning with Dynamic Boltzmann Softmax Updates

1 code implementation14 Mar 2019 Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu

In this paper, we propose to update the value function with dynamic Boltzmann softmax (DBS) operator, which has good convergence property in the setting of planning and learning.

Atari Games Q-Learning +2

Stabilize Deep ResNet with a Sharp Scaling Factor $\tau$

1 code implementation17 Mar 2019 Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu

Moreover, for ResNets with normalization layer, adding such a factor $\tau$ also stabilizes the training and obtains significant performance gain for deep ResNet.

Adaptive Regret of Convex and Smooth Functions

no code implementations26 Apr 2019 Lijun Zhang, Tie-Yan Liu, Zhi-Hua Zhou

We investigate online convex optimization in changing environments, and choose the adaptive regret as the performance measure.

TabNN: A Universal Neural Network Solution for Tabular Data

no code implementations ICLR 2019 Guolin Ke, Jia Zhang, Zhenhui Xu, Jiang Bian, Tie-Yan Liu

Since there are no shared patterns among these diverse tabular data, it is hard to design specific structures to fit them all.

Dual Learning: Theoretical Study and Algorithmic Extensions

no code implementations ICLR 2019 Zhibing Zhao, Yingce Xia, Tao Qin, Tie-Yan Liu

Based on the theoretical discoveries, we extend dual learning by introducing more related mappings and propose highly symmetric frameworks, cycle dual learning and multipath dual learning, in both of which we can leverage the feedback signals from additional domains to improve the qualities of the mappings.

Machine Translation Translation

Multi-Agent Dual Learning

no code implementations ICLR 2019 Yiren Wang, Yingce Xia, Tianyu He, Fei Tian, Tao Qin, ChengXiang Zhai, Tie-Yan Liu

Dual learning has attracted much attention in machine learning, computer vision and natural language processing communities.

Machine Translation Translation

G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations ICLR 2019 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process }?

Optimization on Multiple Manifolds

no code implementations ICLR 2019 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Optimization on manifolds has been widely used in machine learning to handle optimization problems with constraints.

Hint-based Training for Non-Autoregressive Translation

no code implementations ICLR 2019 Zhuohan Li, Di He, Fei Tian, Tao Qin, Li-Wei Wang, Tie-Yan Liu

To improve the accuracy of NART models, in this paper, we propose to leverage the hints from a well-trained ART model to train the NART model.

Machine Translation Translation

MASS: Masked Sequence to Sequence Pre-training for Language Generation

7 code implementations7 May 2019 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Pre-training and fine-tuning, e.g., BERT, have achieved great success in language understanding by transferring knowledge from a rich-resource pre-training task to low/zero-resource downstream tasks.

Conversational Response Generation Response Generation +5

Almost Unsupervised Text to Speech and Automatic Speech Recognition

no code implementations13 May 2019 Yi Ren, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech processing and both achieve impressive performance thanks to the recent advance in deep learning and large amount of aligned speech and text data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

FastSpeech: Fast, Robust and Controllable Text-to-Speech

11 code implementations22 May 2019 Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control).

Text-To-Speech Synthesis

FastSpeech: Fast, Robust and Controllable Text to Speech

21 code implementations NeurIPS 2019 Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS.

Ranked #10 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Speech Synthesis Text-To-Speech Synthesis
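
The parallel mel-spectrogram generation relies on a length regulator that expands each phoneme-level hidden state according to its predicted duration; a minimal NumPy version of that expansion, with a simple speed-control factor, might look as follows.

    import numpy as np

    def length_regulate(hidden_states, durations, speed=1.0):
        """Expand phoneme-level hidden states to frame level.

        hidden_states: array of shape (num_phonemes, hidden_dim)
        durations:     predicted number of mel frames per phoneme
        speed:         scales the durations (here, larger values yield longer, slower speech)
        """
        reps = np.maximum(np.round(np.asarray(durations) * speed).astype(int), 0)
        return np.repeat(hidden_states, reps, axis=0)   # shape (num_frames, hidden_dim)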

Soft Contextual Data Augmentation for Neural Machine Translation

1 code implementation ACL 2019 Jinhua Zhu, Fei Gao, Lijun Wu, Yingce Xia, Tao Qin, Wengang Zhou, Xue-Qi Cheng, Tie-Yan Liu

While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited.

Data Augmentation Language Modelling +3

Beyond Exponentially Discounted Sum: Automatic Learning of Return Function

no code implementations28 May 2019 Yufei Wang, Qiwei Ye, Tie-Yan Liu

In reinforcement learning, Return, which is the weighted accumulated future rewards, and Value, which is the expected return, serve as the objective that guides the learning of the policy.

Atari Games Meta-Learning +2

Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data

no code implementations29 May 2019 Shicong Cen, Huishuai Zhang, Yuejie Chi, Wei Chen, Tie-Yan Liu

Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary.

Unsupervised Pivot Translation for Distant Languages

no code implementations ACL 2019 Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

In this work, we introduce unsupervised pivot translation for distant languages, which translates a language to a distant language through multiple hops, and the unsupervised translation on each hop is relatively easier than the original direct translation.

Machine Translation NMT +1

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

2 code implementations ICLR 2020 Yiping Lu, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Li-Wei Wang, Tie-Yan Liu

In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system.

Position Sentence

Depth Growing for Neural Machine Translation

1 code implementation ACL 2019 Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of neural machine translation (NMT) models for better translation quality remains a challenging problem.

Machine Translation NMT +3

Light Multi-segment Activation for Model Compression

2 code implementations16 Jul 2019 Zhenhui Xu, Guolin Ke, Jia Zhang, Jiang Bian, Tie-Yan Liu

Inspired by the nature of the expressiveness ability in Neural Networks, we propose to use multi-segment activation, which can significantly improve the expressiveness ability with very little cost, in the compact student model.

Knowledge Distillation Model Compression +1

Representation Degeneration Problem in Training Natural Language Generation Models

1 code implementation ICLR 2019 Jun Gao, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu

We study an interesting problem in training neural network-based models for natural language generation tasks, which we call the \emph{representation degeneration problem}.

Language Modelling Machine Translation +3

LightMC: A Dynamic and Efficient Multiclass Decomposition Algorithm

no code implementations25 Aug 2019 Ziyu Liu, Guolin Ke, Jiang Bian, Tie-Yan Liu

Instead of using fixed coding matrix and decoding strategy, LightMC uses a differentiable decoding strategy, which enables it to dynamically optimize the coding matrix and decoding strategy, toward increasing the overall accuracy of multiclass classification, via back propagation jointly with the training of base learners in an iterative way.

Classification General Classification

Multilingual Neural Machine Translation with Language Clustering

no code implementations IJCNLP 2019 Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, Tie-Yan Liu

We study two methods for language clustering: (1) using prior knowledge, where we cluster languages according to language family, and (2) using language embedding, in which we represent each language by an embedding vector and cluster them in the embedding space.

Clustering Machine Translation +2

Self-paced Ensemble for Highly Imbalanced Massive Data Classification

1 code implementation8 Sep 2019 Zhining Liu, Wei Cao, Zhifeng Gao, Jiang Bian, Hechang Chen, Yi Chang, Tie-Yan Liu

To tackle this problem, we conduct deep investigations into the nature of class imbalance, which reveals that not only the disproportion between classes, but also other difficulties embedded in the nature of data, especially, noises and class overlapping, prevent us from learning effective classifiers.

Classification General Classification +1

Hint-Based Training for Non-Autoregressive Machine Translation

1 code implementation IJCNLP 2019 Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Due to the unparallelizable nature of the autoregressive factorization, AutoRegressive Translation (ART) models have to generate tokens sequentially during decoding and thus suffer from high inference latency.

Machine Translation Translation

Path Space for Recurrent Neural Networks with ReLU Activations

no code implementations25 Sep 2019 Yue Wang, Qi Meng, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu

Optimization algorithms like stochastic gradient descent optimize neural networks in the vector space of weights, which is not positively scale-invariant.

The Effect of Adversarial Training: A Theoretical Characterization

no code implementations25 Sep 2019 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

It has been widely shown that adversarial training (Madry et al., 2018) is empirically effective in defending against adversarial attacks.

Adversarial Attack

Independence-aware Advantage Estimation

no code implementations25 Sep 2019 Pushi Zhang, Li Zhao, Guoqing Liu, Jiang Bian, Minlie Huang, Tao Qin, Tie-Yan Liu

Most existing advantage function estimation methods in reinforcement learning suffer from high variance, which scales unfavorably with the time horizon.

Stability and Convergence Theory for Learning ResNet: A Full Characterization

no code implementations25 Sep 2019 Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu

We show that for standard initialization used in practice, $\tau =1/\Omega(\sqrt{L})$ is a sharp value in characterizing the stability of forward/backward process of ResNet, where $L$ is the number of residual blocks.

Demonstration Actor Critic

no code implementations25 Sep 2019 Guoqing Liu, Li Zhao, Pushi Zhang, Jiang Bian, Tao Qin, Nenghai Yu, Tie-Yan Liu

One approach leverages demonstration data in a supervised manner, which is simple and direct, but can only provide supervision signal over those states seen in the demonstrations.

P-BN: Towards Effective Batch Normalization in the Path Space

no code implementations25 Sep 2019 Xufang Luo, Qi Meng, Wei Chen, Tie-Yan Liu

Hence, new algorithms that conduct optimization directly in the path space (which is proven to be PSI) have been developed, such as Stochastic Gradient Descent (SGD) in the path space, and it was shown that SGD in the path space is superior to SGD in the weight space.

Machine Translation With Weakly Paired Documents

no code implementations IJCNLP 2019 Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

1) We provide a simple approach to mine implicitly bilingual sentence pairs from document pairs which can then be used as supervised training signals.

Sentence Translation +1

Exploiting Monolingual Data at Scale for Neural Machine Translation

no code implementations IJCNLP 2019 Lijun Wu, Yiren Wang, Yingce Xia, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

In this work, we study how to use both the source-side and target-side monolingual data for NMT, and propose an effective strategy leveraging both of them.

 Ranked #1 on Machine Translation on WMT2016 English-German (SacreBLEU metric, using extra training data)

Machine Translation NMT +1

Fully Parameterized Quantile Function for Distributional Reinforcement Learning

6 code implementations NeurIPS 2019 Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, Tie-Yan Liu

The key challenge in practical distributional RL algorithms lies in how to parameterize estimated distributions so as to better approximate the true continuous distribution.

Ranked #3 on Atari Games on Atari 2600 Skiing (using extra training data)

Atari Games Distributional Reinforcement Learning +2

Distributional Reward Decomposition for Reinforcement Learning

no code implementations NeurIPS 2019 Zichuan Lin, Li Zhao, Derek Yang, Tao Qin, Guangwen Yang, Tie-Yan Liu

Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and a general class of such properties is the multiple reward channel.

reinforcement-learning Reinforcement Learning (RL)

Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

2 code implementations20 Nov 2019 Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu

Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models.

Machine Translation Translation

Gradient Perturbation is Underrated for Differentially Private Convex Optimization

no code implementations26 Nov 2019 Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu, Jian Yin

By using the \emph{expected curvature}, we show that gradient perturbation can achieve a significantly improved utility guarantee that can theoretically justify the advantage of gradient perturbation over other perturbation methods.

Normalization Helps Training of Quantized LSTM

1 code implementation NeurIPS 2019 Lu Hou, Jinhua Zhu, James Kwok, Fei Gao, Tao Qin, Tie-Yan Liu

The long short-term memory (LSTM), though powerful, is memory and computation expensive.

Quantization

Neural Machine Translation with Soft Prototype

1 code implementation NeurIPS 2019 Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Cheng Xiang Zhai, Tie-Yan Liu

Neural machine translation models usually use the encoder-decoder framework and generate translation from left to right (or right to left) without fully utilizing the target-side global information.

Machine Translation Translation

A Study of Multilingual Neural Machine Translation

no code implementations25 Dec 2019 Xu Tan, Yichong Leng, Jiale Chen, Yi Ren, Tao Qin, Tie-Yan Liu

Multilingual neural machine translation (NMT) has recently been investigated from different aspects (e.g., pivot translation, zero-shot translation, fine-tuning, or training from scratch) and in different settings (e.g., rich resource and low resource, one-to-many, and many-to-one translation).

Machine Translation NMT +1

Incorporating BERT into Neural Machine Translation

3 code implementations ICLR 2020 Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

While BERT is more commonly used for fine-tuning instead of as a contextual embedding for downstream language understanding tasks, in NMT our preliminary exploration shows that using BERT as a contextual embedding is better than using it for fine-tuning.

Natural Language Understanding NMT +5

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View.

no code implementations ICLR Workshop DeepDiffEq 2019 Yiping Lu*, Zhuohan Li*, Di He, Zhiqing Sun, Bin Dong, Tao Qin, LiWei Wang, Tie-Yan Liu

In particular, how words in a sentence are abstracted into contexts by passing through the layers of the Transformer can be interpreted as approximating multiple particles' movement in the space using the Lie-Trotter splitting scheme and Euler's method.

Sentence

Suphx: Mastering Mahjong with Deep Reinforcement Learning

no code implementations30 Mar 2020 Junjie Li, Sotetsu Koyamada, Qiwei Ye, Guoqing Liu, Chao Wang, Ruihan Yang, Li Zhao, Tao Qin, Tie-Yan Liu, Hsiao-Wuen Hon

Artificial Intelligence (AI) has achieved great success in many domains, and game AI is widely regarded as its beachhead since the dawn of AI.

reinforcement-learning Reinforcement Learning (RL)

MPNet: Masked and Permuted Pre-training for Language Understanding

6 code implementations NeurIPS 2020 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem.

Ranked #16 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)

Language Modelling Masked Language Modeling +3

A Study of Non-autoregressive Model for Sequence Generation

no code implementations ACL 2020 Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao, Tie-Yan Liu

In this work, we conduct a study to understand the difficulty of NAR sequence generation and try to answer: (1) Why can NAR models catch up with AR models in some tasks but not all?

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning

no code implementations27 Apr 2020 Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu

While pre-training and fine-tuning, e.g., BERT and GPT-2, have achieved great success in language understanding and generation tasks, the pre-trained models are usually too big for online deployment in terms of both memory cost and inference speed, which hinders them from practical online usage.

Knowledge Distillation Language Modelling

SEEK: Segmented Embedding of Knowledge Graphs

1 code implementation ACL 2020 Wentao Xu, Shun Zheng, Liang He, Bin Shao, Jian Yin, Tie-Yan Liu

In recent years, knowledge graph embedding has become a hot research topic in artificial intelligence and plays increasingly vital roles in various downstream applications, such as recommendation and question answering.

Knowledge Graph Embedding Knowledge Graphs +2

Invertible Image Rescaling

10 code implementations ECCV 2020 Mingqing Xiao, Shuxin Zheng, Chang Liu, Yaolong Wang, Di He, Guolin Ke, Jiang Bian, Zhouchen Lin, Tie-Yan Liu

High-resolution digital images are usually downscaled to fit various display screens or to save the cost of storage and bandwidth, while post-upscaling is adopted to recover the original resolutions or the details in the zoom-in images.

Image Super-Resolution

Dual Learning: Theoretical Study and an Algorithmic Extension

no code implementations17 May 2020 Zhibing Zhao, Yingce Xia, Tao Qin, Lirong Xia, Tie-Yan Liu

Dual learning has been successfully applied in many machine learning applications including machine translation, image-to-image transformation, etc.

Machine Translation Translation

MultiSpeech: Multi-Speaker Text to Speech with Transformer

1 code implementation8 Jun 2020 Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin, Tie-Yan Liu

Transformer-based text to speech (TTS) models (e.g., Transformer TTS, FastSpeech) have shown the advantages of training and inference efficiency over RNN-based models (e.g., Tacotron) due to their parallel computation in training and/or inference.

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

32 code implementations ICLR 2021 Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with the ground-truth target instead of the simplified output from the teacher, and 2) introducing more variation information of speech (e.g., pitch, energy and more accurate duration) as conditional inputs.

Ranked #6 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Knowledge Distillation Speech Synthesis +1

MC-BERT: Efficient Language Pre-Training via a Meta Controller

1 code implementation10 Jun 2020 Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Li-Wei Wang, Jiang Bian, Tie-Yan Liu

Pre-trained contextual representations (e.g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks.

Binary Classification Cloze Test +4

UWSpeech: Speech to Speech Translation for Unwritten Languages

no code implementations14 Jun 2020 Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Ke-jun Zhang, Tie-Yan Liu

Existing speech to speech translation systems heavily rely on the text of target language: they usually translate source language either to target text and then synthesize target speech from text, or directly to target speech with target text for auxiliary training.

speech-recognition Speech Recognition +2

Multi-branch Attentive Transformer

1 code implementation18 Jun 2020 Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

While the multi-branch architecture is one of the key ingredients to the success of computer vision tasks, it has not been well investigated in natural language processing, especially sequence learning tasks.

Code Generation Machine Translation +2

Modeling Lost Information in Lossy Image Compression

no code implementations22 Jun 2020 Yaolong Wang, Mingqing Xiao, Chang Liu, Shuxin Zheng, Tie-Yan Liu

Specifically, ILC introduces an invertible encoding module to replace the encoder-decoder structure, producing a low-dimensional informative latent representation while transforming the lost information into an auxiliary latent variable that will not be further coded or stored.

Image Compression

Dynamic of Stochastic Gradient Descent with State-Dependent Noise

no code implementations24 Jun 2020 Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Specifically, we show that the covariance of the noise of SGD in the local region of the local minima is a quadratic function of the state.

Rethinking Positional Encoding in Language Pre-training

3 code implementations ICLR 2021 Guolin Ke, Di He, Tie-Yan Liu

In this work, we investigate the positional encoding methods used in language pre-training (e.g., BERT) and identify several problems in the existing formulations.

Natural Language Understanding Sentence +1

SimulSpeech: End-to-End Simultaneous Speech to Text Translation

no code implementations ACL 2020 Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

In this work, we develop SimulSpeech, an end-to-end simultaneous speech to text translation system which translates speech in source language to text in target language concurrently.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

DeepSinger: Singing Voice Synthesis with Data Mined From the Web

no code implementations9 Jul 2020 Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, Tie-Yan Liu

DeepSinger has several advantages over previous SVS systems: 1) to the best of our knowledge, it is the first SVS system that directly mines training data from music websites, 2) the lyrics-to-singing alignment model further avoids any human efforts for alignment labeling and greatly reduces labeling cost, 3) the singing model based on a feed-forward Transformer is simple and efficient, by removing the complicated acoustic feature modeling in parametric synthesis and leveraging a reference encoder to capture the timbre of a singer from noisy singing data, and 4) it can synthesize singing voices in multiple languages and multiple singers.

Sentence Singing Voice Synthesis

Accuracy Prediction with Non-neural Model for Neural Architecture Search

1 code implementation9 Jul 2020 Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu

Considering that most architectures are represented as sequences of discrete symbols which are more like tabular data and preferred by non-neural predictors, in this paper, we study an alternative approach which uses non-neural model for accuracy prediction.

Neural Architecture Search

Temporally Correlated Task Scheduling for Sequence Learning

2 code implementations10 Jul 2020 Xueqing Wu, Lewen Wang, Yingce Xia, Weiqing Liu, Lijun Wu, Shufang Xie, Tao Qin, Tie-Yan Liu

In many applications, a sequence learning task is usually associated with multiple temporally correlated auxiliary tasks, which are different in terms of how much input information to use or which future step to predict.

Machine Translation Scheduling +1

Learning to Match Distributions for Domain Adaptation

1 code implementation17 Jul 2020 Chaohui Yu, Jindong Wang, Chang Liu, Tao Qin, Renjun Xu, Wenjie Feng, Yiqiang Chen, Tie-Yan Liu

However, it remains challenging to determine which method is suitable for a given application since they are built with certain priors or bias.

Domain Adaptation Inductive Bias

How Does Data Augmentation Affect Privacy in Machine Learning?

1 code implementation21 Jul 2020 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation.

BIG-bench Machine Learning Data Augmentation

Taking Notes on the Fly Helps BERT Pre-training

no code implementations4 Aug 2020 Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization.

Sentence

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition

no code implementations9 Aug 2020 Jin Xu, Xu Tan, Yi Ren, Tao Qin, Jian Li, Sheng Zhao, Tie-Yan Liu

However, there are more than 6,000 languages in the world and most of them lack speech training data, which poses significant challenges when building TTS and ASR systems for extremely low-resource languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

PopMAG: Pop Music Accompaniment Generation

1 code implementation18 Aug 2020 Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu

To improve harmony, in this paper, we propose a novel MUlti-track MIDI representation (MuMIDI), which enables simultaneous multi-track generation in a single sequence and explicitly models the dependency of the notes from different tracks.

Music Modeling

HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis

1 code implementation3 Sep 2020 Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu

To tackle the difficulty of singing modeling caused by high sampling rate (wider frequency band and longer waveform), we introduce multi-scale adversarial training in both the acoustic model and vocoder to improve singing modeling.

Singing Voice Synthesis Vocal Bursts Intensity Prediction

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

1 code implementation7 Sep 2020 Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Li-Wei Wang

We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets.

Graph Classification Graph Representation Learning

Qlib: An AI-oriented Quantitative Investment Platform

2 code implementations22 Sep 2020 Xiao Yang, Weiqing Liu, Dong Zhou, Jiang Bian, Tie-Yan Liu

Quantitative investment aims to maximize the return and minimize the risk in a sequential trading period over a set of financial instruments.

Portfolio Optimization Stock Market Prediction
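
A minimal usage sketch of the released Qlib package, assuming the public China-market dataset has been downloaded as described in the project README; the calls follow the public examples, but argument names may differ across versions and the data path is an assumption.

    import qlib
    from qlib.data import D

    # Initialize Qlib with a local data provider (the path here is an assumption).
    qlib.init(provider_uri="~/.qlib/qlib_data/cn_data")

    # Query daily close prices and volumes for the CSI300 universe over a sample period.
    instruments = D.instruments(market="csi300")
    df = D.features(instruments, ["$close", "$volume"],
                    start_time="2019-01-01", end_time="2019-12-31")
    print(df.head())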

COSEA: Convolutional Code Search with Layer-wise Attention

no code implementations19 Oct 2020 Hao Wang, Jia Zhang, Yingce Xia, Jiang Bian, Chao Zhang, Tie-Yan Liu

However, most existing studies overlook the code's intrinsic structural logic, which contains a wealth of semantic information, and thus fail to capture the intrinsic features of code.

Code Search

Learning Causal Semantic Representation for Out-of-Distribution Prediction

1 code implementation NeurIPS 2021 Chang Liu, Xinwei Sun, Jindong Wang, Haoyue Tang, Tao Li, Tao Qin, Wei Chen, Tie-Yan Liu

Conventional supervised learning methods, especially deep ones, are found to be sensitive to out-of-distribution (OOD) examples, largely because the learned representation mixes the semantic factor with the variation factor due to their domain-specific correlation, while only the semantic factor causes the output.

Domain Adaptation

Latent Causal Invariant Model

no code implementations4 Nov 2020 Xinwei Sun, Botong Wu, Xiangyu Zheng, Chang Liu, Wei Chen, Tao Qin, Tie-Yan Liu

To avoid spurious correlation, we propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction.

Disentanglement

RD$^2$: Reward Decomposition with Representation Decomposition

no code implementations NeurIPS 2020 Zichuan Lin, Derek Yang, Li Zhao, Tao Qin, Guangwen Yang, Tie-Yan Liu

In this work, we propose a set of novel reward decomposition principles by constraining uniqueness and compactness of different state features/representations relevant to different sub-rewards.

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

1 code implementation11 Dec 2020 Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu

Besides GD, adaptive algorithms such as AdaGrad, RMSProp, and Adam are popular owing to their rapid training process.

Denoising Text to Speech with Frame-Level Noise Modeling

no code implementations17 Dec 2020 Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu

In DenoiSpeech, we handle real-world noisy speech by modeling the fine-grained frame-level noise with a noise condition module, which is jointly trained with the TTS model.

Denoising

Cooperative Policy Learning with Pre-trained Heterogeneous Observation Representations

1 code implementation24 Dec 2020 Wenlei Shi, Xinran Wei, Jia Zhang, Xiaoyuan Ni, Arthur Jiang, Jiang Bian, Tie-Yan Liu

While adopting complex GNN models with more informative message passing and aggregation mechanisms can obviously benefit heterogeneous vertex representations and cooperative policy learning, it could, on the other hand, increase the training difficulty of MARL and demand more intense and direct reward signals compared to the original global reward.

Graph Attention Multi-agent Reinforcement Learning

On the Stability of Multi-branch Network

no code implementations1 Jan 2021 Huishuai Zhang, Da Yu, Wei Chen, Tie-Yan Liu

More importantly, we propose a new design, "STAM aggregation", which is guaranteed to STAbilize the forward/backward process of Multi-branch networks irrespective of the number of branches.

Taking Notes on the Fly Helps Language Pre-Training

no code implementations ICLR 2021 Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization.

Sentence

Task-Agnostic and Adaptive-Size BERT Compression

no code implementations1 Jan 2021 Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu

NAS-BERT trains a big supernet on a carefully designed search space containing various architectures and outputs multiple compressed models with adaptive sizes and latency.

Language Modelling Model Compression +1

Learning to Use Future Information in Simultaneous Translation

1 code implementation1 Jan 2021 Xueqing Wu, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Tao Qin, Tie-Yan Liu

For wait-k inference, we observe that wait-m training with $m>k$ in simultaneous NMT (i.e., using more future information for training than inference) generally outperforms wait-k training.

Machine Translation NMT +2
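
A schematic wait-k decoding loop: the i-th target token is emitted once the first k+i-1 source tokens have arrived; `model.predict_next` is a hypothetical interface standing in for an incremental NMT decoder.

    def wait_k_decode(model, source_stream, k, max_len=200, eos="</s>"):
        """source_stream: list of source tokens revealed incrementally (prefix-to-prefix)."""
        target = []
        for i in range(max_len):
            visible = source_stream[: min(k + i, len(source_stream))]  # wait-k source prefix
            token = model.predict_next(visible, target)                # hypothetical call
            if token == eos:
                break
            target.append(token)
        return target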

BN-invariant sharpness regularizes the training model to better generalization

no code implementations8 Jan 2021 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

However, it has been pointed out that the usual definitions of sharpness, which consider either the maxima or the integral of loss over a $\delta$ ball of parameters around minima, cannot give a consistent measurement for scale-invariant neural networks, e.g., networks with a batch normalization layer.

Universal Trading for Order Execution with Oracle Policy Distillation

no code implementations28 Jan 2021 Yuchen Fang, Kan Ren, Weiqing Liu, Dong Zhou, Weinan Zhang, Jiang Bian, Yong Yu, Tie-Yan Liu

As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument.

Algorithmic Trading reinforcement-learning +1

REST: Relational Event-driven Stock Trend Forecasting

no code implementations15 Feb 2021 Wentao Xu, Weiqing Liu, Chang Xu, Jiang Bian, Jian Yin, Tie-Yan Liu

To remedy the first shortcoming, we propose to model the stock context and learn the effect of event information on the stocks under different contexts.

Revisiting Language Encoding in Learning Multilingual Representations

1 code implementation16 Feb 2021 Shengjie Luo, Kaiyuan Gao, Shuxin Zheng, Guolin Ke, Di He, LiWei Wang, Tie-Yan Liu

The language embedding can be either added to the word embedding or attached at the beginning of the sentence.

Sentence Word Embeddings
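
The two language-encoding options mentioned above can be sketched as follows (shapes and module names are illustrative assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

# (a) add a language embedding to every word embedding, or
# (b) attach it as an extra token at the beginning of the sentence.
vocab_size, num_langs, d_model = 30000, 4, 512
word_emb = nn.Embedding(vocab_size, d_model)
lang_emb = nn.Embedding(num_langs, d_model)

tokens = torch.randint(0, vocab_size, (2, 16))      # (batch, seq_len)
lang_ids = torch.tensor([0, 1])                     # one language id per sentence

x = word_emb(tokens)                                # (2, 16, 512)
l = lang_emb(lang_ids)                              # (2, 512)

added = x + l.unsqueeze(1)                          # option (a): add to every position
prefixed = torch.cat([l.unsqueeze(1), x], dim=1)    # option (b): prepend as a token
```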

Return-Based Contrastive Representation Learning for Reinforcement Learning

no code implementations ICLR 2021 Guoqing Liu, Chuheng Zhang, Li Zhao, Tao Qin, Jinhua Zhu, Jian Li, Nenghai Yu, Tie-Yan Liu

Recently, various auxiliary tasks have been proposed to accelerate representation learning and improve sample efficiency in deep reinforcement learning (RL).

Atari Games reinforcement-learning +2

Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning

2 code implementations ICLR 2021 Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu

The privacy leakage of the model about the training data can be bounded by the differential privacy mechanism.
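
A heavily simplified, hypothetical sketch of the gradient-embedding idea (the paper's actual algorithm constructs anchor subspaces and handles residual gradients differently): project per-example gradients onto a low-dimensional subspace, clip and noise them there, then map back to parameter space.

```python
import numpy as np

# Simplified sketch: low-dimensional gradient embedding + Gaussian perturbation.
def private_gradient(per_example_grads, anchors, clip=1.0, sigma=1.0, rng=None):
    """per_example_grads: (n, d); anchors: (k, d) with orthonormal rows."""
    rng = rng or np.random.default_rng(0)
    emb = per_example_grads @ anchors.T                          # (n, k) embeddings
    norms = np.maximum(np.linalg.norm(emb, axis=1, keepdims=True) / clip, 1.0)
    emb = emb / norms                                            # per-example clipping
    noisy = emb.sum(axis=0) + rng.normal(0.0, sigma * clip, size=emb.shape[1])
    return (noisy @ anchors) / len(per_example_grads)            # back to parameter space
```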

LazyFormer: Self Attention with Lazy Update

no code implementations25 Feb 2021 Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu

In each lazy block, the self-attention distribution is only computed once in the first layer and then is reused in all upper layers.
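
A minimal sketch of a lazy block, assuming plain single-head attention (the actual model uses multi-head attention, layer normalization, and more), computes the attention distribution once and reuses it in the remaining layers:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: attention probabilities are computed once in the block's
# first layer and reused by the remaining layers of the block.
class LazyBlock(nn.Module):
    def __init__(self, d_model=512, n_layers=3):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.values = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_layers))
        self.ffns = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_layers))

    def forward(self, x):
        attn = torch.softmax(
            self.q(x) @ self.k(x).transpose(-2, -1) / x.size(-1) ** 0.5, dim=-1)
        for v, ffn in zip(self.values, self.ffns):
            x = x + attn @ v(x)       # reuse the same attention distribution
            x = x + ffn(x)
        return x
```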

AdaSpeech: Adaptive Text to Speech for Custom Voice

2 code implementations ICLR 2021 Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu

2) To better trade off the adaptation parameters and voice quality, we introduce conditional layer normalization in the mel-spectrogram decoder of AdaSpeech, and fine-tune this part in addition to speaker embedding for adaptation.
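
A minimal sketch of conditional layer normalization, under the assumption that the scale and bias are generated from a speaker embedding by small linear layers (module names here are illustrative); only these small generators plus the speaker embedding would then be fine-tuned during adaptation:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: layer-norm scale and bias conditioned on a speaker embedding.
class ConditionalLayerNorm(nn.Module):
    def __init__(self, hidden, spk_dim):
        super().__init__()
        self.norm = nn.LayerNorm(hidden, elementwise_affine=False)
        self.to_scale = nn.Linear(spk_dim, hidden)
        self.to_bias = nn.Linear(spk_dim, hidden)

    def forward(self, x, spk_emb):
        # x: (batch, frames, hidden), spk_emb: (batch, spk_dim)
        scale = self.to_scale(spk_emb).unsqueeze(1)
        bias = self.to_bias(spk_emb).unsqueeze(1)
        return self.norm(x) * (1 + scale) + bias
```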

Learning Invariant Representations across Domains and Tasks

no code implementations3 Mar 2021 Jindong Wang, Wenjie Feng, Chang Liu, Chaohui Yu, Mingxuan Du, Renjun Xu, Tao Qin, Tie-Yan Liu

Since it is expensive and time-consuming to collect massive COVID-19 image samples for training deep classification models, transfer learning is a promising approach that transfers knowledge from the abundant typical pneumonia datasets to COVID-19 image classification.

Domain Adaptation Image Classification +1

IOT: Instance-wise Layer Reordering for Transformer Structures

1 code implementation ICLR 2021 Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

Based on this observation, in this work, we break the assumption of the fixed layer order in the Transformer and introduce instance-wise layer reordering into the model structure.

Abstractive Text Summarization Code Generation +2
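
A toy sketch of instance-wise layer reordering (the class and scorer below are hypothetical, and the real model reorders Transformer sub-layers rather than plain linear layers): a light predictor scores candidate layer orders per input, and the layers are applied in the selected permutation instead of a fixed order.

```python
import itertools
import torch
import torch.nn as nn

class ReorderedEncoder(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.orders = list(itertools.permutations(range(len(layers))))
        self.scorer = nn.Linear(layers[0].in_features, len(self.orders))

    def forward(self, x):
        # x: (batch, d); for simplicity, choose one order for the whole batch.
        order = self.orders[int(self.scorer(x.mean(dim=0)).argmax())]
        for i in order:
            x = torch.relu(self.layers[i](x))
        return x

enc = ReorderedEncoder([nn.Linear(32, 32) for _ in range(3)])
out = enc(torch.randn(4, 32))
```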

Impact of pandemic fatigue on the spread of COVID-19: a mathematical modelling study

no code implementations9 Apr 2021 Disheng Tang, Wei Cao, Jiang Bian, Tie-Yan Liu, Zhifeng Gao, Shun Zheng, Jue Liu

We used a stochastic metapopulation model with a hierarchical structure and fitted the model to the positive cases in the US from the start of outbreak to the end of 2020.

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

1 code implementation20 Apr 2021 Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu

In adaptation, we use untranscribed speech data for speech reconstruction and only fine-tune the TTS decoder.

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

1 code implementation NeurIPS 2021 Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu

A straightforward solution to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of significantly increased ASR error rate.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
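
To illustrate the edit-alignment idea in a hedged way (the paper's alignment algorithm differs in detail), one can align the ASR hypothesis with the reference and derive, for each hypothesis token, how many corrected tokens it should expand to; a non-autoregressive corrector then predicts these lengths and tokens in parallel:

```python
import difflib

# Hypothetical sketch: per-token target lengths from an edit alignment.
def edit_alignment(hyp, ref):
    counts = [1] * len(hyp)
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, hyp, ref).get_opcodes():
        if tag == "equal":
            continue
        if i1 == i2:                      # pure insertion: attach to a neighboring token
            counts[max(i1 - 1, 0)] += j2 - j1
        else:                             # substitution or deletion over hyp[i1:i2]
            for idx in range(i1, i2):
                counts[idx] = 0
            counts[i1] = j2 - j1
    return counts

print(edit_alignment("i like eat apple".split(), "i like to eat apples".split()))
# [1, 2, 1, 1] -> "like" expands to two tokens ("like to"), "apple" is rewritten.
```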

How could Neural Networks understand Programs?

1 code implementation10 May 2021 Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

Inspired by this, we propose a novel program semantics learning paradigm, that the model should learn from information composed of (1) the representations which align well with the fundamental operations in operational semantics, and (2) the information of environment transition, which is indispensable for program understanding.

valid

Optimizing Information-theoretical Generalization Bound via Anisotropic Noise of SGLD

no code implementations NeurIPS 2021 Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

We prove that, under a constraint that guarantees low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized.

Generalization Bounds
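
Written as a formula (the symbol $\Sigma^{*}$ and the proportionality constant are notational assumptions), the statement above says the optimal SGLD noise covariance is, up to scaling, the matrix square root of the expected gradient covariance: $\Sigma^{*} \propto \big(\mathbb{E}[\nabla \ell(\theta)\,\nabla \ell(\theta)^{\top}]\big)^{1/2}$.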

Learning Structures for Deep Neural Networks

no code implementations27 May 2021 Jinhui Yuan, Fei Pan, Chunting Zhou, Tao Qin, Tie-Yan Liu

We further establish connections between this principle and the theory of Bayesian optimal classification, and empirically verify that larger entropy of the outputs of a deep neural network indeed corresponds to a better classification accuracy.

Classification Image Classification

Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart

1 code implementation CVPR 2022 Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen, Jun Zhu, Tie-Yan Liu

Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones.

Vocal Bursts Valence Prediction
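
Assuming the confidence and the rectified confidence (R-Con) are both available as model outputs, a rejection rule in the spirit of the snippet might look like the following sketch (thresholds are illustrative, not the paper's):

```python
# Hypothetical rejection rule with two coupled metrics: accept a prediction only
# if both the confidence and the rectified confidence clear their thresholds.
def accept(confidence: float, r_con: float, t_con: float = 0.5, t_rcon: float = 0.5) -> bool:
    return confidence >= t_con and r_con >= t_rcon
```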

Incorporating NODE with Pre-trained Neural Differential Operator for Learning Dynamics

no code implementations8 Jun 2021 Shiqi Gong, Qi Meng, Yue Wang, Lijun Wu, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

In this paper, to reduce the reliance on the numerical solver, we propose to enhance the supervised signal in the training of NODE.

Do Transformers Really Perform Bad for Graph Representation?

4 code implementations9 Jun 2021 Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.

Graph Classification Graph Property Prediction +2
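
One common way to encode structural information into attention, sketched below with hypothetical names (the actual model combines several encodings, such as centrality and spatial encodings), is to add a learnable bias indexed by the shortest-path distance between nodes to the attention logits:

```python
import torch

# Hypothetical sketch: shortest-path-distance bias added to attention logits.
def structural_attention(q, k, spd, dist_bias):
    """q, k: (n_nodes, d); spd: (n_nodes, n_nodes) integer distances;
    dist_bias: (max_dist + 1,) learnable scalars."""
    logits = q @ k.T / q.size(-1) ** 0.5
    logits = logits + dist_bias[spd.clamp(max=dist_bias.numel() - 1)]
    return torch.softmax(logits, dim=-1)

n, d, max_dist = 6, 16, 4
attn = structural_attention(torch.randn(n, d), torch.randn(n, d),
                            torch.randint(0, 8, (n, n)), torch.randn(max_dist + 1))
```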

MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training

2 code implementations Findings (ACL) 2021 Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu

Inspired by the success of pre-training models in natural language processing, in this paper, we develop MusicBERT, a large-scale pre-trained model for music understanding.

Classification Emotion Classification +2

Dual-view Molecule Pre-training

1 code implementation17 Jun 2021 Jinhua Zhu, Yingce Xia, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

After pre-training, we can use either the Transformer branch (this one is recommended according to empirical results), the GNN branch, or both for downstream tasks.

Molecular Property Prediction Property Prediction +2

Large Scale Private Learning via Low-rank Reparametrization

1 code implementation17 Jun 2021 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

We propose a reparametrization scheme to address the challenges of applying differentially private SGD to large neural networks, which are 1) the huge memory cost of storing individual gradients and 2) the added noise suffering from notorious dimensional dependence.
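
A minimal sketch of the general low-rank idea (the paper's gradient-carrier construction is more involved than this): reparametrize a large weight as a frozen base plus a low-rank trainable update, so the gradients that must be stored and clipped per example live in a much smaller space.

```python
import torch
import torch.nn as nn

# Hypothetical low-rank reparametrization of a linear layer for private training.
class LowRankLinear(nn.Module):
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.left = nn.Parameter(torch.zeros(d_out, rank))
        self.right = nn.Parameter(torch.randn(rank, d_in) * 0.01)

    def forward(self, x):
        return self.base(x) + x @ (self.left @ self.right).T
```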

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

no code implementations NeurIPS 2021 Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, LiWei Wang, Tie-Yan Liu

Since relative positional encoding (RPE) is used by default in many state-of-the-art models, designing efficient Transformers that can incorporate RPE is appealing.

A Survey on Neural Speech Synthesis

1 code implementation29 Jun 2021 Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in speech, language, and machine learning communities and has broad applications in the industry.

Speech Synthesis

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

no code implementations29 Jun 2021 Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

However, it is in general unknown how to derive efficient and effective EE trade-off methods for non-linear complex tasks, such as contextual bandit with a deep neural network as the reward function.

Multi-Armed Bandits
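
For orientation, a generic OFU/UCB selection rule for a linear contextual bandit looks like the sketch below (a textbook LinUCB-style estimator, not the paper's regularized non-linear estimator): pick the arm maximizing the predicted reward plus an exploration bonus from the ridge-regression covariance.

```python
import numpy as np

# Generic UCB arm selection for a linear contextual bandit (illustrative only).
def ucb_select(contexts, A, b, alpha=1.0):
    """contexts: (n_arms, d); A: (d, d) ridge Gram matrix; b: (d,) reward sums."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b
    means = contexts @ theta
    bonus = alpha * np.sqrt(np.einsum("ad,dk,ak->a", contexts, A_inv, contexts))
    return int(np.argmax(means + bonus))

d = 5
arm = ucb_select(np.random.randn(10, d), np.eye(d), np.zeros(d))
```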

On the Generative Utility of Cyclic Conditionals

1 code implementation NeurIPS 2021 Chang Liu, Haoyue Tang, Tao Qin, Jintao Wang, Tie-Yan Liu

This is motivated by the observation that deep generative models, in addition to a likelihood model $p(x|z)$, often also use an inference model $q(z|x)$ for extracting representation, but they rely on a usually uninformative prior distribution $p(z)$ to define a joint distribution, which may render problems like posterior collapse and manifold mismatch.

Supervised Off-Policy Ranking

1 code implementation3 Jul 2021 Yue Jin, Yue Zhang, Tao Qin, Xudong Zhang, Jian Yuan, Houqiang Li, Tie-Yan Liu

Inspired by the two observations, in this work, we study a new problem, supervised off-policy ranking (SOPR), which aims to rank a set of target policies based on supervised learning by leveraging off-policy data and policies with known performance.

Off-policy evaluation

DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling

1 code implementation ACL 2021 Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu

In this paper, we develop DeepRapper, a Transformer-based rap generation system that can model both rhymes and rhythms.

Language Modelling

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

no code implementations6 Jul 2021 Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu

While recent text to speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly because of two reasons: 1) the lack of training data for spontaneous speech; 2) the difficulty in modeling the filled pauses (um and uh) and diverse rhythms in spontaneous speech.

A Survey on Low-Resource Neural Machine Translation

no code implementations9 Jul 2021 Rui Wang, Xu Tan, Renqian Luo, Tao Qin, Tie-Yan Liu

Neural approaches have achieved state-of-the-art accuracy on machine translation but suffer from the high cost of collecting large-scale parallel data.

Low-Resource Neural Machine Translation NMT +1
