6 code implementations • NeurIPS 2020 • Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu
Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem.
Ranked #16 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)
2 code implementations • 19 Oct 2022 • Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu
Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain.
Ranked #1 on Document Classification on HOC (Micro F1 metric)
21 code implementations • NeurIPS 2019 • Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS.
Ranked #10 on Text-To-Speech Synthesis on LJSpeech (using extra training data)
32 code implementations • ICLR 2021 • Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with ground-truth target instead of the simplified output from teacher, and 2) introducing more variation information of speech (e. g., pitch, energy and more accurate duration) as conditional inputs.
Ranked #6 on Text-To-Speech Synthesis on LJSpeech (using extra training data)
1 code implementation • NeurIPS 2017 • Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu
We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size.
2 code implementations • 20 Jul 2022 • Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu
Recent years have witnessed significant success in Gradient Boosting Decision Trees (GBDT) for a wide range of machine learning applications.
2 code implementations • 22 Sep 2020 • Xiao Yang, Weiqing Liu, Dong Zhou, Jiang Bian, Tie-Yan Liu
Quantitative investment aims to maximize the return and minimize the risk in a sequential trading period over a set of financial instruments.
2 code implementations • 10 Jul 2020 • Xueqing Wu, Lewen Wang, Yingce Xia, Weiqing Liu, Lijun Wu, Shufang Xie, Tao Qin, Tie-Yan Liu
In many applications, a sequence learning task is usually associated with multiple temporally correlated auxiliary tasks, which are different in terms of how much input information to use or which future step to predict.
1 code implementation • 17 Jul 2020 • Chaohui Yu, Jindong Wang, Chang Liu, Tao Qin, Renjun Xu, Wenjie Feng, Yiqiang Chen, Tie-Yan Liu
However, it remains challenging to determine which method is suitable for a given application since they are built with certain priors or bias.
11 code implementations • 22 May 2019 • Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i. e., some words are skipped or repeated) and lack of controllability (voice speed or prosody control).
8 code implementations • ICML 2020 • Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Li-Wei Wang, Tie-Yan Liu
This motivates us to remove the warm-up stage for the training of Pre-LN Transformers.
2 code implementations • Findings (ACL) 2021 • Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu
Inspired by the success of pre-training models in natural language processing, in this paper, we develop MusicBERT, a large-scale pre-trained model for music understanding.
1 code implementation • ACL 2021 • Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu
In this paper, we develop DeepRapper, a Transformer-based rap generation system that can model both rhymes and rhythms.
1 code implementation • 20 Sep 2021 • Zeqian Ju, Peiling Lu, Xu Tan, Rui Wang, Chen Zhang, Songruoyao Wu, Kejun Zhang, Xiangyang Li, Tao Qin, Tie-Yan Liu
In this paper, we develop TeleMelody, a two-stage lyric-to-melody generation system with music template (e. g., tonality, chord progression, rhythm pattern, and cadence) to bridge the gap between lyrics and melodies (i. e., the system consists of a lyric-to-template module and a template-to-melody module).
1 code implementation • 11 Aug 2022 • Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan
These characteristics cannot be well handled by neural generation models that learn lyric-to-melody mapping in an end-to-end way, due to several issues: (1) lack of aligned lyric-melody training data to sufficiently learn lyric-melody feature alignment; (2) lack of controllability in generation to better and explicitly align the lyric-melody features.
1 code implementation • 30 Aug 2022 • Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu
Specifically, 1) we design an expert system to generate a melody by developing musical elements from motifs to phrases then to sections with repetitions and variations according to pre-given musical form; 2) considering the generated melody is lack of musical richness, we design a Transformer based refinement model to improve the melody without changing its musical form.
1 code implementation • 19 Oct 2022 • Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu
A recent trend is to use Transformer or its variants in music generation, which is, however, suboptimal, because the full attention cannot efficiently model the typically long music sequences (e. g., over 10, 000 tokens), and the existing models have shortcomings in generating musical repetition structures.
6 code implementations • NeurIPS 2019 • Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, Tie-Yan Liu
The key challenge in practical distributional RL algorithms lies in how to parameterize estimated distributions so as to better approximate the true continuous distribution.
Ranked #3 on Atari Games on Atari 2600 Skiing (using extra training data)
2 code implementations • ICLR 2022 • Chongchong Li, Yue Wang, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu
Then we proposed a two-model-based learning method to control the prediction error and the gradient error.
4 code implementations • 9 Jun 2021 • Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu
Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.
Ranked #1 on Graph Regression on PCQM4M-LSC
3 code implementations • 9 Mar 2022 • Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu
This technical note describes the recent updates of Graphormer, including architecture design modifications, and the adaption to 3D molecular dynamics simulation.
4 code implementations • 8 Feb 2021 • Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu
Text to speech (TTS) has been broadly used to synthesize natural and intelligible speech in different scenarios.
1 code implementation • NeurIPS 2021 • Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu
A straightforward solution to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of significantly increased ASR error rate.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
1 code implementation • ICLR 2022 • Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu
Denoising diffusion probabilistic models have been recently proposed to generate high-quality samples by estimating the gradient of the data density.
1 code implementation • Findings (EMNLP) 2021 • Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu, Linquan Liu, Tao Qin, Xiang-Yang Li, Edward Lin, Tie-Yan Liu
Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
3 code implementations • 9 May 2022 • Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu
In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Ranked #1 on Text-To-Speech Synthesis on LJSpeech (using extra training data)
1 code implementation • 30 May 2022 • Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu
Combining this novel perspective of two-stage synthesis with advanced generative models (i. e., the diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples.
1 code implementation • 2 Dec 2022 • Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu
In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
7 code implementations • 7 May 2019 • Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu
Pre-training and fine-tuning, e. g., BERT, have achieved great success in language understanding by transferring knowledge from rich-resource pre-training task to the low/zero-resource downstream tasks.
9 code implementations • NeurIPS 2021 • Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu
Dropout is a powerful and widely used technique to regularize the training of deep neural networks.
Ranked #4 on Machine Translation on WMT2014 English-French
1 code implementation • 2 Mar 2019 • Xihan Li, Jia Zhang, Jiang Bian, Yunhai Tong, Tie-Yan Liu
Traditional solutions on these problems leverage combinatorial optimization with demand and supply forecasting.
1 code implementation • 24 Dec 2020 • Wenlei Shi, Xinran Wei, Jia Zhang, Xiaoyuan Ni, Arthur Jiang, Jiang Bian, Tie-Yan Liu
While adopting complex GNN models with more informative message passing and aggregation mechanisms can obviously benefit heterogeneous vertex representations and cooperative policy learning, it could, on the other hand, increase the training difficulty of MARL and demand more intense and direct reward signals compared to the original global reward.
10 code implementations • ECCV 2020 • Mingqing Xiao, Shuxin Zheng, Chang Liu, Yaolong Wang, Di He, Guolin Ke, Jiang Bian, Zhouchen Lin, Tie-Yan Liu
High-resolution digital images are usually downscaled to fit various display screens or save the cost of storage and bandwidth, meanwhile the post-upscaling is adpoted to recover the original resolutions or the details in the zoom-in images.
1 code implementation • 9 Oct 2022 • Mingqing Xiao, Shuxin Zheng, Chang Liu, Zhouchen Lin, Tie-Yan Liu
To be specific, we develop invertible models to generate valid degraded images and meanwhile transform the distribution of lost contents to the fixed distribution of a latent variable during the forward degradation.
1 code implementation • 18 Aug 2020 • Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu
To improve harmony, in this paper, we propose a novel MUlti-track MIDI representation (MuMIDI), which enables simultaneous multi-track generation in a single sequence and explicitly models the dependency of the notes from different tracks.
3 code implementations • 9 Jun 2013 • Tao Qin, Tie-Yan Liu
We call the two query sets MQ2007 and MQ2008 for short.
1 code implementation • 29 Jun 2021 • Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in speech, language, and machine learning communities and has broad applications in the industry.
3 code implementations • ICLR 2020 • Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
While BERT is more commonly used as fine-tuning instead of contextual embedding for downstream language understanding tasks, in NMT, our preliminary exploration of using BERT as contextual embedding is better than using for fine-tuning.
5 code implementations • NeurIPS 2018 • Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu
The performance predictor and the encoder enable us to perform gradient based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy.
3 code implementations • ICLR 2021 • Guolin Ke, Di He, Tie-Yan Liu
In this work, we investigate the positional encoding methods used in language pre-training (e. g., BERT) and identify several problems in the existing formulations.
1 code implementation • 8 Sep 2019 • Zhining Liu, Wei Cao, Zhifeng Gao, Jiang Bian, Hechang Chen, Yi Chang, Tie-Yan Liu
To tackle this problem, we conduct deep investigations into the nature of class imbalance, which reveals that not only the disproportion between classes, but also other difficulties embedded in the nature of data, especially, noises and class overlapping, prevent us from learning effective classifiers.
2 code implementations • 26 Oct 2021 • Wentao Xu, Weiqing Liu, Lewen Wang, Yingce Xia, Jiang Bian, Jian Yin, Tie-Yan Liu
To overcome the shortcomings of previous work, we proposed a novel stock trend forecasting framework that can adequately mine the concept-oriented shared information from predefined concepts and hidden concepts.
1 code implementation • 4 Oct 2022 • Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, LiWei Wang, Di He
To achieve this goal, in this work, we develop a novel Transformer-based Molecular model called Transformer-M, which can take molecular data of 2D or 3D formats as input and generate meaningful semantic representations.
Ranked #4 on Graph Regression on PCQM4Mv2-LSC
1 code implementation • EMNLP 2018 • Lijun Wu, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
Recent studies have shown that reinforcement learning (RL) is an effective approach for improving the performance of neural machine translation (NMT) system.
2 code implementations • ICLR 2021 • Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu
2) To better trade off the adaptation parameters and voice quality, we introduce conditional layer normalization in the mel-spectrogram decoder of AdaSpeech, and fine-tune this part in addition to speaker embedding for adaptation.
2 code implementations • ICLR 2020 • Yiping Lu, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Li-Wei Wang, Tie-Yan Liu
In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system.
1 code implementation • ICML 2017 • Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, Tie-Yan Liu
Many supervised learning tasks are emerged in dual forms, e. g., English-to-French translation vs. French-to-English translation, speech recognition vs. text to speech, and image classification vs. image generation.
1 code implementation • 20 Apr 2022 • Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-Yan Liu
While NAR generation can significantly accelerate inference speed for machine translation, the speedup comes at the cost of sacrificed translation accuracy compared to its counterpart, autoregressive (AR) generation.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +11
1 code implementation • 4 Dec 2014 • Jinhui Yuan, Fei Gao, Qirong Ho, Wei Dai, Jinliang Wei, Xun Zheng, Eric P. Xing, Tie-Yan Liu, Wei-Ying Ma
When building large-scale machine learning (ML) programs, such as big topic models or deep neural nets, one usually assumes such tasks can only be attempted with industrial-sized clusters with thousands of nodes, which are out of reach for most practitioners or academic researchers.
1 code implementation • 10 May 2021 • Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu
Inspired by this, we propose a novel program semantics learning paradigm, that the model should learn from information composed of (1) the representations which align well with the fundamental operations in operational semantics, and (2) the information of environment transition, which is indispensable for program understanding.
2 code implementations • NeurIPS 2018 • Chengyue Gong, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu
Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks.
Ranked #3 on Machine Translation on IWSLT2015 German-English
1 code implementation • 3 Sep 2020 • Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu
To tackle the difficulty of singing modeling caused by high sampling rate (wider frequency band and longer waveform), we introduce multi-scale adversarial training in both the acoustic model and vocoder to improve singing modeling.
1 code implementation • 7 Sep 2020 • Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Li-Wei Wang
We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets.
Ranked #25 on Graph Property Prediction on ogbg-molhiv
1 code implementation • 10 Jun 2020 • Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Li-Wei Wang, Jiang Bian, Tie-Yan Liu
Pre-trained contextual representations (e. g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks.
2 code implementations • ICLR 2021 • Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu
The privacy leakage of the model about the training data can be bounded in the differential privacy mechanism.
1 code implementation • 17 Jun 2021 • Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu
We propose a reparametrization scheme to address the challenges of applying differentially private SGD on large neural networks, which are 1) the huge memory cost of storing individual gradients, 2) the added noise suffering notorious dimensional dependence.
1 code implementation • ICML 2018 • Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Li-Wei Wang, Tie-Yan Liu
Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling.
1 code implementation • NeurIPS 2023 • Qizhi Pei, Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Yingce Xia, Shufang Xie, Tao Qin, Kun He, Tie-Yan Liu, Rui Yan
In this work, we propose $\mathbf{FABind}$, an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding.
2 code implementations • 15 Mar 2018 • Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dong-dong Zhang, Zhirui Zhang, Ming Zhou
Machine translation has made rapid advances in recent years.
Ranked #3 on Machine Translation on WMT 2017 English-Chinese
1 code implementation • NeurIPS 2021 • Chang Liu, Xinwei Sun, Jindong Wang, Haoyue Tang, Tao Li, Tao Qin, Wei Chen, Tie-Yan Liu
Conventional supervised learning methods, especially deep ones, are found to be sensitive to out-of-distribution (OOD) examples, largely because the learned representation mixes the semantic factor with the variation factor due to their domain-specific correlation, while only the semantic factor causes the output.
1 code implementation • ICLR 2019 • Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu
Multilingual machine translation, which translates multiple languages with a single model, has attracted much attention due to its efficiency of offline training and online serving.
1 code implementation • 20 Apr 2021 • Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu
In adaptation, we use untranscribed speech data for speech reconstruction and only fine-tune the TTS decoder.
4 code implementations • 6 Dec 2017 • Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, Tie-Yan Liu
Stock trend prediction plays a critical role in seeking maximized profit from stock investment.
Ranked #16 on Stock Market Prediction on Astock
1 code implementation • 2 Feb 2023 • Zijie Geng, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Jie Wang, Yongdong Zhang, Feng Wu, Tie-Yan Liu
The obtained motif vocabulary consists of not only molecular motifs (i. e., the frequent fragments), but also their connection information, indicating how the motifs are connected with each other.
1 code implementation • NeurIPS 2021 • Chang Liu, Haoyue Tang, Tao Qin, Jintao Wang, Tie-Yan Liu
This is motivated by the observation that deep generative models, in addition to a likelihood model $p(x|z)$, often also use an inference model $q(z|x)$ for extracting representation, but they rely on a usually uninformative prior distribution $p(z)$ to define a joint distribution, which may render problems like posterior collapse and manifold mismatch.
1 code implementation • 3 Feb 2022 • Jinhua Zhu, Yingce Xia, Chang Liu, Lijun Wu, Shufang Xie, Yusong Wang, Tong Wang, Tao Qin, Wengang Zhou, Houqiang Li, Haiguang Liu, Tie-Yan Liu
Molecular conformation generation aims to generate three-dimensional coordinates of all the atoms in a molecule and is an important task in bioinformatics and pharmacology.
1 code implementation • COLING 2018 • Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, Tie-Yan Liu
In this work we propose Double Path Networks for Sequence to Sequence learning (DPN-S2S), which leverage the advantages of both models by using double path information fusion.
1 code implementation • EMNLP 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk
Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.
2 code implementations • 20 Jun 2022 • Qizhi Pei, Lijun Wu, Jinhua Zhu, Yingce Xia, Shufang Xie, Tao Qin, Haiguang Liu, Tie-Yan Liu, Rui Yan
Accurate prediction of Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery, facilitating the identification of drugs that can effectively interact with specific targets and regulate their activities.
Ranked #1 on Drug Discovery on KIBA
1 code implementation • ICLR 2022 • Wei Fan, Shun Zheng, Xiaohan Yi, Wei Cao, Yanjie Fu, Jiang Bian, Tie-Yan Liu
However, the complicated dependencies of the PTS signal on its inherent periodicity as well as the sophisticated composition of various periods hinder the performance of PTS forecasting.
1 code implementation • ACL 2019 • Jinhua Zhu, Fei Gao, Lijun Wu, Yingce Xia, Tao Qin, Wengang Zhou, Xue-Qi Cheng, Tie-Yan Liu
While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited.
1 code implementation • ACL 2020 • Wentao Xu, Shun Zheng, Liang He, Bin Shao, Jian Yin, Tie-Yan Liu
In recent years, knowledge graph embedding becomes a pretty hot research topic of artificial intelligence and plays increasingly vital roles in various downstream applications, such as recommendation and question answering.
Ranked #1 on Link Prediction on YAGO37
1 code implementation • 18 Jun 2020 • Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, Tie-Yan Liu
While the multi-branch architecture is one of the key ingredients to the success of computer vision tasks, it has not been well investigated in natural language processing, especially sequence learning tasks.
Ranked #4 on Machine Translation on WMT2014 English-German (SacreBLEU metric)
1 code implementation • 17 Jun 2021 • Jinhua Zhu, Yingce Xia, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
After pre-training, we can use either the Transformer branch (this one is recommended according to empirical results), the GNN branch, or both for downstream tasks.
Ranked #1 on Molecular Property Prediction on HIV dataset
1 code implementation • 14 Sep 2021 • Wentao Xu, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu
In this paper, we propose a simple yet efficient instance-wise graph-based framework to utilize the inter-dependencies of different variables at different time stamps for multivariate time series forecasting.
1 code implementation • NeurIPS 2019 • Lu Hou, Jinhua Zhu, James Kwok, Fei Gao, Tao Qin, Tie-Yan Liu
The long-short-term memory (LSTM), though powerful, is memory and computa\x02tion expensive.
1 code implementation • CVPR 2022 • Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen, Jun Zhu, Tie-Yan Liu
Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones.
1 code implementation • 26 May 2022 • Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, LiWei Wang, Di He
Extensive experiments covering typical architectures and tasks demonstrate that our model is parameter-efficient and can achieve superior performance to strong baselines in a wide range of applications.
1 code implementation • 9 Jul 2020 • Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu
Considering that most architectures are represented as sequences of discrete symbols which are more like tabular data and preferred by non-neural predictors, in this paper, we study an alternative approach which uses non-neural model for accuracy prediction.
Ranked #81 on Neural Architecture Search on ImageNet
2 code implementations • NeurIPS 2020 • Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu
On ImageNet, it achieves 23. 5% top-1 error rate (under 600M FLOPS constraint) using 4 GPU-days for search.
Ranked #81 on Neural Architecture Search on ImageNet
1 code implementation • ACL 2019 • Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of neural machine translation (NMT) models for better translation quality remains a challenging problem.
Ranked #11 on Machine Translation on WMT2014 English-French
1 code implementation • 14 Jul 2022 • Jinhua Zhu, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
The model is pre-trained on three tasks: reconstruction of masked atoms and coordinates, 3D conformation generation conditioned on 2D graph, and 2D graph generation conditioned on 3D conformation.
1 code implementation • NeurIPS 2021 • Jongjin Park, Younggyo Seo, Chang Liu, Li Zhao, Tao Qin, Jinwoo Shin, Tie-Yan Liu
Behavioral cloning has proven to be effective for learning sequential decision-making policies from expert demonstrations.
1 code implementation • NeurIPS 2021 • Xinwei Sun, Botong Wu, Xiangyu Zheng, Chang Liu, Wei Chen, Tao Qin, Tie-Yan Liu
To avoid such a spurious correlation, we propose \textbf{La}tent \textbf{C}ausal \textbf{I}nvariance \textbf{M}odels (LaCIM) that specifies the underlying causal structure of the data and the source of distributional shifts, guiding us to pursue only causal factor for prediction.
2 code implementations • COLING 2022 • Zhiping Luo, Wentao Xu, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu
Learning the embeddings of knowledge graphs (KG) is vital in artificial intelligence, and can benefit various downstream applications, such as recommendation and question answering.
1 code implementation • ICLR 2021 • Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
Based on this observation, in this work, we break the assumption of the fixed layer order in the Transformer and introduce instance-wise layer reordering into the model structure.
1 code implementation • 26 Oct 2021 • Weitao Du, He Zhang, Yuanqi Du, Qi Meng, Wei Chen, Bin Shao, Tie-Yan Liu
In this paper, we propose a framework to construct SE(3) equivariant graph neural networks that can approximate the geometric quantities efficiently.
2 code implementations • 20 Nov 2019 • Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu
Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models.
1 code implementation • 8 Jun 2020 • Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin, Tie-Yan Liu
Transformer-based text to speech (TTS) model (e. g., Transformer TTS~\cite{li2019neural}, FastSpeech~\cite{ren2019fastspeech}) has shown the advantages of training and inference efficiency over RNN-based model (e. g., Tacotron~\cite{shen2018natural}) due to its parallel computation in training and/or inference.
1 code implementation • 30 Aug 2022 • Kehan Wu, Yingce Xia, Yang Fan, Pan Deng, Haiguang Liu, Lijun Wu, Shufang Xie, Tong Wang, Tao Qin, Tie-Yan Liu
Structure-based drug design is drawing growing attentions in computer-aided drug discovery.
1 code implementation • 3 Nov 2021 • Minghuan Liu, Hanye Zhao, Zhengyu Yang, Jian Shen, Weinan Zhang, Li Zhao, Tie-Yan Liu
However, IL is usually limited in the capability of the behavioral policy and tends to learn a mediocre behavior from the dataset collected by the mixture of policies.
1 code implementation • ICLR 2023 • Jinhua Zhu, Kehan Wu, Bohan Wang, Yingce Xia, Shufang Xie, Qi Meng, Lijun Wu, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take rings in compounds into consideration, consequently limiting the expressiveness of the models.
Ranked #1 on Graph Regression on PCQM4M-LSC (Validation MAE metric)
1 code implementation • 12 Dec 2021 • Wentao Xu, Yingce Xia, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu
Next, we use a tree-attention aggregator to incorporate the graph structure information into the aggregation module on the meta-path.
1 code implementation • 14 Mar 2019 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu
In this paper, we propose to update the value function with dynamic Boltzmann softmax (DBS) operator, which has good convergence property in the setting of planning and learning.
1 code implementation • ICLR 2022 • Shufang Xie, Ang Lv, Yingce Xia, Lijun Wu, Tao Qin, Rui Yan, Tie-Yan Liu
Autoregressive sequence generation, a prevalent task in machine learning and natural language processing, generates every target token conditioned on both a source input and previously generated target tokens.
1 code implementation • 16 Feb 2021 • Shengjie Luo, Kaiyuan Gao, Shuxin Zheng, Guolin Ke, Di He, LiWei Wang, Tie-Yan Liu
The language embedding can be either added to the word embedding or attached at the beginning of the sentence.
1 code implementation • 31 Jan 2023 • Guoqing Liu, Di Xue, Shufang Xie, Yingce Xia, Austin Tripp, Krzysztof Maziarz, Marwin Segler, Tao Qin, Zongzhang Zhang, Tie-Yan Liu
Retrosynthesis, which aims to find a route to synthesize a target molecule from commercially available starting materials, is a critical task in drug discovery and materials design.
Ranked #1 on Multi-step retrosynthesis on USPTO-190
1 code implementation • IJCNLP 2019 • Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Li-Wei Wang, Tie-Yan Liu
Due to the unparallelizable nature of the autoregressive factorization, AutoRegressive Translation (ART) models have to generate tokens sequentially during decoding and thus suffer from high inference latency.
1 code implementation • NeurIPS 2021 • He Zhang, Fusong Ju, Jianwei Zhu, Liang He, Bin Shao, Nanning Zheng, Tie-Yan Liu
These methods generally derive coevolutionary features by aggregating the learned residue representations from individual sequences with equal weights, which is inconsistent with the premise that residue co-evolutions are a reflection of collective covariation patterns of numerous homologous proteins.
1 code implementation • NeurIPS 2019 • Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Cheng Xiang Zhai, Tie-Yan Liu
Neural machine translation models usually use the encoder-decoder framework and generate translation from left to right (or right to left) without fully utilizing the target-side global information.
1 code implementation • NeurIPS 2016 • Yingce Xia, Di He, Tao Qin, Li-Wei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma
Based on the feedback signals generated during this process (e. g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e. g., using the policy gradient methods).
1 code implementation • 3 Jul 2021 • Yue Jin, Yue Zhang, Tao Qin, Xudong Zhang, Jian Yuan, Houqiang Li, Tie-Yan Liu
Inspired by the two observations, in this work, we study a new problem, supervised off-policy ranking (SOPR), which aims to rank a set of target policies based on supervised learning by leveraging off-policy data and policies with known performance.
1 code implementation • 10 Feb 2023 • Rui Zhang, Qi Meng, Rongchan Zhu, Yue Wang, Wenlei Shi, Shihua Zhang, Zhi-Ming Ma, Tie-Yan Liu
To address these limitations, we propose the Monte Carlo Neural PDE Solver (MCNP Solver) for training unsupervised neural solvers via the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
1 code implementation • 5 Apr 2020 • Yuxuan Song, Qiwei Ye, Minkai Xu, Tie-Yan Liu
Generative Adversarial Networks (GANs) have shown great promise in modeling high dimensional data.
Ranked #7 on Image Generation on STL-10
1 code implementation • 21 Jul 2020 • Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu
Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation.
2 code implementations • 16 Jul 2019 • Zhenhui Xu, Guolin Ke, Jia Zhang, Jiang Bian, Tie-Yan Liu
Inspired by the nature of the expressiveness ability in Neural Networks, we propose to use multi-segment activation, which can significantly improve the expressiveness ability with very little cost, in the compact student model.
1 code implementation • 1 Nov 2021 • Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu
We are the first to unveil an important population property of the perturbations of these attacks: they are almost \textbf{linearly separable} when assigned with the target labels of the corresponding samples, which hence can work as \emph{shortcuts} for the learning objective.
1 code implementation • 18 Feb 2022 • Di He, Shanda Li, Wenlei Shi, Xiaotian Gao, Jia Zhang, Jiang Bian, LiWei Wang, Tie-Yan Liu
In this work, we develop a novel approach that can significantly accelerate the training of Physics-Informed Neural Networks.
1 code implementation • 13 Apr 2022 • Peiyan Hu, Qi Meng, Bingguang Chen, Shiqi Gong, Yue Wang, Wei Chen, Rongchan Zhu, Zhi-Ming Ma, Tie-Yan Liu
Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics.
1 code implementation • 17 Mar 2019 • Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu
Moreover, for ResNets with normalization layer, adding such a factor $\tau$ also stabilizes the training and obtains significant performance gain for deep ResNet.
1 code implementation • NeurIPS 2021 • Jinpeng Li, Yingce Xia, Rui Yan, Hongda Sun, Dongyan Zhao, Tie-Yan Liu
Considering there is no parallel data between the contexts and the responses of target style S1, existing works mainly use back translation to generate stylized synthetic data for training, where the data about context, target style S1 and an intermediate style S0 is used.
1 code implementation • NeurIPS 2021 • Pushi Zhang, Xiaoyu Chen, Li Zhao, Wei Xiong, Tao Qin, Tie-Yan Liu
To fully inherit the benefits of distributional RL and hybrid reward architectures, we introduce Multi-Dimensional Distributional DQN (MD3QN), which extends distributional RL to model the joint return distribution from multiple reward sources.
Distributional Reinforcement Learning reinforcement-learning +1
1 code implementation • 6 Jun 2022 • Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang
Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning.
1 code implementation • NAACL 2018 • Yanyao Shen, Xu Tan, Di He, Tao Qin, Tie-Yan Liu
Recently, neural machine translation has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework.
Ranked #68 on Machine Translation on WMT2014 English-German
1 code implementation • 1 Jan 2021 • Xueqing Wu, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Tao Qin, Tie-Yan Liu
For wait-k inference, we observe that wait-m training with $m>k$ in simultaneous NMT (i. e., using more future information for training than inference) generally outperforms wait-k training.
1 code implementation • 25 May 2022 • Jiawei Huang, Li Zhao, Tao Qin, Wei Chen, Nan Jiang, Tie-Yan Liu
We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where the users can be divided into two groups based on their different tolerance on exploration risks and should be treated separately.
no code implementations • 11 Feb 2018 • Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process }?
no code implementations • ICLR 2018 • Yang Fan, Fei Tian, Tao Qin, Xiang-Yang Li, Tie-Yan Liu
Teaching plays a very important role in our society, by spreading human knowledge and educating our next generations.
no code implementations • 8 May 2018 • Li He, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Then we conduct theoretical analysis on the convergence rates of ASGD algorithm based on the continuous approximation.
no code implementations • 3 May 2018 • Chenyan Xiong, Zhengzhong Liu, Jamie Callan, Tie-Yan Liu
The salience model also improves ad hoc search accuracy, providing effective ranking features by modeling the salience of query entities in candidate documents.
no code implementations • CVPR 2018 • Jianxin Lin, Yingce Xia, Tao Qin, Zhibo Chen, Tie-Yan Liu
In this paper, we study a new problem, conditional image-to-image translation, which is to translate an image from the source domain to the target domain conditioned on a given image in the target domain.
no code implementations • 27 Feb 2018 • Huishuai Zhang, Wei Chen, Tie-Yan Liu
This inconsistence of gradient magnitude across different layers renders optimization of deep neural network with a single learning rate problematic.
no code implementations • 29 Sep 2017 • Qi Meng, Wei Chen, Yue Wang, Zhi-Ming Ma, Tie-Yan Liu
First, we give a mathematical formulation for the practical data processing procedure in distributed machine learning, which we call data partition with global/local shuffling.
no code implementations • 27 Sep 2017 • Shizhao Sun, Wei Chen, Jiang Bian, Xiaoguang Liu, Tie-Yan Liu
However, with the increasing size of DNN models and the large number of workers in practice, this typical data parallelism cannot achieve satisfactory training acceleration, since it usually suffers from the heavy communication cost due to transferring huge amount of information between workers and the parameter server.
no code implementations • 1 Aug 2017 • Quanming Yao, James T. Kwok, Taifeng Wang, Tie-Yan Liu
Based on it, we develop a proximal gradient algorithm (and its accelerated variant) with inexact proximal splitting and prove that a convergence rate of O(1/T) where T is the number of iterations is guaranteed.
no code implementations • 2 Jun 2016 • Shizhao Sun, Wei Chen, Jiang Bian, Xiaoguang Liu, Tie-Yan Liu
In this framework, we propose to aggregate the local models by ensemble, i. e., averaging the outputs of local models instead of the parameters.
no code implementations • 20 Apr 2017 • Lijun Wu, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
The goal of the adversary is to differentiate the translation result generated by the NMT model from that by human.
no code implementations • 20 Jun 2017 • Chenyan Xiong, Jamie Callan, Tie-Yan Liu
This paper presents a word-entity duet framework for utilizing knowledge bases in ad-hoc retrieval.
no code implementations • ICML 2017 • Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu
We propose a novel technology to compensate this delay, so as to make the optimization behavior of ASGD closer to that of sequential SGD.
no code implementations • 31 May 2017 • Chang Xu, Tao Qin, Gang Wang, Tie-Yan Liu
Stochastic gradient descent (SGD), which updates the model parameters by adding a local gradient times a learning rate at each step, is widely used in model training of machine learning algorithms such as neural networks.
no code implementations • 28 Feb 2017 • Yang Fan, Fei Tian, Tao Qin, Jiang Bian, Tie-Yan Liu
Machine learning is essentially the sciences of playing with data.
no code implementations • 22 Nov 2016 • Jia Zhang, Weidong Ma, Tao Qin, Xiaoming Sun, Tie-Yan Liu
We then extend our mechanism to the general case and achieve a competitive ratio $\frac{1}{42\log k\log T}$ for both social welfare and revenue, where $T$ is the ratio of the maximum request length to the minimum request length and $k$ is the ratio of the maximum request value density to the minimum request value density.
no code implementations • NeurIPS 2016 • Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu
After partitioning the training data onto a number of (e. g., $M$) machines, this algorithm performs both local voting and global voting in each iteration.
no code implementations • NeurIPS 2016 • Xiang Li, Tao Qin, Jian Yang, Tie-Yan Liu
Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it using the language modeling task on several benchmark datasets.
no code implementations • 27 Sep 2016 • Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.
no code implementations • 27 Sep 2016 • Qi Meng, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM), and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduction (SVRG).
no code implementations • 29 May 2015 • Huazheng Wang, Fei Tian, Bin Gao, Jiang Bian, Tie-Yan Liu
Second, we obtain distributed representations of words and relations by leveraging a novel word embedding method that considers the multi-sense nature of words and the relational knowledge among words (or their senses) contained in dictionaries.
no code implementations • 7 Apr 2016 • Fei Tian, Bin Gao, Di He, Tie-Yan Liu
We propose Sentence Level Recurrent Topic Model (SLRTM), a new topic model that assumes the generation of each word within a sentence to depend on both the topic of the sentence and the whole history of its preceding words in the sentence.
no code implementations • 17 Jun 2015 • Shizhao Sun, Wei Chen, Li-Wei Wang, Xiaoguang Liu, Tie-Yan Liu
First, we derive an upper bound for RA of DNN, and show that it increases with increasing depth.
no code implementations • 19 May 2015 • Fei Tian, Bin Gao, Enhong Chen, Tie-Yan Liu
Although these works have achieved certain success, they have neglected some important facts about knowledge graphs: (i) many relationships in knowledge graphs are \emph{many-to-one}, \emph{one-to-many} or even \emph{many-to-many}, rather than simply \emph{one-to-one}; (ii) most head entities and tail entities in knowledge graphs come from very different semantic spaces.
no code implementations • 1 May 2015 • Yingce Xia, Haifang Li, Tao Qin, Nenghai Yu, Tie-Yan Liu
In this paper, we extend the Thompson sampling to Budgeted MAB, where there is random cost for pulling an arm and the total cost is constrained by a budget.
no code implementations • 9 Oct 2014 • Haifang Li, Fei Tian, Wei Chen, Tao Qin, Tie-Yan Liu
For Internet applications like sponsored search, cautions need to be taken when using machine learning to optimize their mechanisms (e. g., auction) since self-interested agents in these applications may change their behaviors (and thus the data distribution) in response to the mechanisms.
no code implementations • 7 Jul 2014 • Qing Cui, Bin Gao, Jiang Bian, Siyu Qiu, Tie-Yan Liu
In particular, we introduce a novel neural network architecture called KNET that leverages both contextual information and morphological word similarity built based on morphological knowledge to learn word embeddings.
no code implementations • 23 Apr 2014 • Yuyu Zhang, Hanjun Dai, Chang Xu, Jun Feng, Taifeng Wang, Jiang Bian, Bin Wang, Tie-Yan Liu
Click prediction is one of the fundamental problems in sponsored search.
no code implementations • 19 Apr 2014 • Fei Tian, Haifang Li, Wei Chen, Tao Qin, Enhong Chen, Tie-Yan Liu
Then we prove a generalization bound for the machine learning algorithms on the behavior data generated by the new Markov chain, which depends on both the Markovian parameters and the covering number of the function class compounded by the loss function for behavior prediction and the behavior prediction model.
no code implementations • 7 Jul 2014 • Bin Gao, Jiang Bian, Tie-Yan Liu
In this paper, we describe the details of the WordRep collection and show how to use it in different types of machine learning research related to word embedding.
no code implementations • 3 Jun 2014 • Di He, Wei Chen, Li-Wei Wang, Tie-Yan Liu
Sponsored search is an important monetization channel for search engines, in which an auction mechanism is used to select the ads shown to users and determine the prices charged from advertisers.
no code implementations • 24 Apr 2013 • Yining Wang, Li-Wei Wang, Yuanzhi Li, Di He, Tie-Yan Liu, Wei Chen
We show that NDCG with logarithmic discount has consistent distinguishability although it converges to the same limit for all ranking functions.
no code implementations • EMNLP 2018 • Lijun Wu, Xu Tan, Di He, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
Many previous works have discussed the relationship between error propagation and the \emph{accuracy drop} (i. e., the left part of the translated sentence is often better than its right part in left-to-right decoding models) problem.
no code implementations • 19 Sep 2018 • Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu
Motivated by this, we propose a new norm \emph{Basis-path Norm} based on a group of linearly independent paths to measure the capacity of neural networks more accurately.
no code implementations • NeurIPS 2017 • Yue Wang, Wei Chen, Yu-Ting Liu, Zhi-Ming Ma, Tie-Yan Liu
(2) The convergence rate is determined by the step size, with the mixing time of the Markov process as the coefficient.
no code implementations • 21 Sep 2018 • Yue Wang, Qi Meng, Wei Cheng, Yuting Liug, Zhi-Ming Ma, Tie-Yan Liu
In this paper, we propose to transfer the Q-function learned in the source task to the target of the Q-learning in the new task when certain safe conditions are satisfied.
no code implementations • NeurIPS 2018 • Lijun Wu, Fei Tian, Yingce Xia, Yang Fan, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
Different from typical learning settings in which the loss function of a machine learning model is predefined and fixed, in our framework, the loss function of a machine learning model (we call it student) is defined by another machine learning model (we call it teacher).
no code implementations • 13 Nov 2018 • Chang Xu, Weiran Huang, Hongwei Wang, Gang Wang, Tie-Yan Liu
In this paper, we propose an improved variant of RNN, Multi-Channel RNN (MC-RNN), to dynamically capture and leverage local semantic structure information.
no code implementations • NAACL 2018 • Fei Gao, Lijun Wu, Li Zhao, Tao Qin, Xue-Qi Cheng, Tie-Yan Liu
Recurrent neural networks have achieved state-of-the-art results in many artificial intelligence tasks, such as language modeling, neural machine translation, speech recognition and so on.
no code implementations • NeurIPS 2018 • Huishuai Zhang, Wei Chen, Tie-Yan Liu
We study the Hessian of the local back-matching loss (local Hessian) and connect it to the efficiency of BP.
no code implementations • NeurIPS 2018 • Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, Tie-Yan Liu
Neural Machine Translation (NMT) has achieved remarkable progress with the quick evolvement of model structures.
no code implementations • NeurIPS 2017 • Di He, Hanqing Lu, Yingce Xia, Tao Qin, Li-Wei Wang, Tie-Yan Liu
Inspired by the success and methodology of AlphaGo, in this paper we propose using a prediction network to improve beam search, which takes the source sentence $x$, the currently available decoding output $y_1,\cdots, y_{t-1}$ and a candidate word $w$ at step $t$ as inputs and predicts the long-term value (e. g., BLEU score) of the partial target sentence if it is completed by the NMT model.
no code implementations • NeurIPS 2017 • Yingce Xia, Fei Tian, Lijun Wu, Jianxin Lin, Tao Qin, Nenghai Yu, Tie-Yan Liu
In this work, we introduce the deliberation process into the encoder-decoder framework and propose deliberation networks for sequence generation.
no code implementations • NeurIPS 2013 • Min Xu, Tao Qin, Tie-Yan Liu
In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem.
no code implementations • NeurIPS 2012 • Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Tie-Yan Liu
This paper is concerned with the statistical consistency of ranking methods.
no code implementations • NeurIPS 2010 • Wei Chen, Tie-Yan Liu, Zhi-Ming Ma
sampling of queries and the conditional i. i. d sampling of documents per query.
no code implementations • NeurIPS 2010 • Tao Qin, Xiubo Geng, Tie-Yan Liu
To avoid these limitations, in this paper, we propose a new model, which is defined with a coset-permutation distance, and models the generation of a permutation as a stagewise process.
no code implementations • NeurIPS 2009 • Wei Chen, Tie-Yan Liu, Yanyan Lan, Zhi-Ming Ma, Hang Li
We show that these loss functions are upper bounds of the measure-based ranking errors.
no code implementations • NeurIPS 2009 • Fen Xia, Tie-Yan Liu, Hang Li
This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting.
no code implementations • NeurIPS 2008 • Tao Qin, Tie-Yan Liu, Xu-Dong Zhang, De-Sheng Wang, Hang Li
It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking.
no code implementations • ICML 2018 • Yingce Xia, Xu Tan, Fei Tian, Tao Qin, Nenghai Yu, Tie-Yan Liu
Many artificial intelligence tasks appear in dual forms like English$\leftrightarrow$French translation and speech$\leftrightarrow$text transformation.
no code implementations • ICLR 2019 • Zhuohan Li, Di He, Fei Tian, Tao Qin, Li-Wei Wang, Tie-Yan Liu
To improve the accuracy of NART models, in this paper, we propose to leverage the hints from a well-trained ART model to train the NART model.
no code implementations • ICLR 2019 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Optimization on manifold has been widely used in machine learning, to handle optimization problems with constraint.
no code implementations • ICLR 2019 • Zhibing Zhao, Yingce Xia, Tao Qin, Tie-Yan Liu
Based on the theoretical discoveries, we extend dual learning by introducing more related mappings and propose highly symmetric frameworks, cycle dual learning and multipath dual learning, in both of which we can leverage the feedback signals from additional domains to improve the qualities of the mappings.
no code implementations • ICLR 2019 • Guolin Ke, Jia Zhang, Zhenhui Xu, Jiang Bian, Tie-Yan Liu
Since there are no shared patterns among these diverse tabular data, it is hard to design specific structures to fit them all.
no code implementations • ICLR 2019 • Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process }?
no code implementations • ICLR 2019 • Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Xu Tan, Tao Qin, Tie-Yan Liu
Neural machine translation, which achieves near human-level performance in some languages, strongly relies on the availability of large amounts of parallel sentences, which hinders its applicability to low-resource language pairs.
no code implementations • 23 Dec 2018 • Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu
Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve significantly inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models.
no code implementations • ICLR 2019 • Yiren Wang, Yingce Xia, Tianyu He, Fei Tian, Tao Qin, ChengXiang Zhai, Tie-Yan Liu
Dual learning has attracted much attention in machine learning, computer vision and natural language processing communities.
Ranked #1 on Machine Translation on WMT2016 English-German
no code implementations • 22 Feb 2019 • Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, Tie-Yan Liu
However, the high efficiency has come at the cost of not capturing the sequential dependency on the target side of translation, which causes NAT to suffer from two kinds of translation errors: 1) repeated translations (due to indistinguishable adjacent decoder hidden states), and 2) incomplete translations (due to incomplete transfer of source side information via the decoder hidden states).
no code implementations • 6 Mar 2019 • Mingyang Yi, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
That is to say, the minimum with balanced values of basis paths will more likely to be flatter and generalize better.
no code implementations • 6 Apr 2019 • Hao Sun, Xu Tan, Jun-Wei Gan, Hongzhi Liu, Sheng Zhao, Tao Qin, Tie-Yan Liu
Recently, G2P conversion is viewed as a sequence to sequence task and modeled by RNN or CNN based encoder-decoder framework.
Ranked #1 on Text-To-Speech Synthesis on CMUDict 0.7b
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 26 Apr 2019 • Lijun Zhang, Tie-Yan Liu, Zhi-Hua Zhou
We investigate online convex optimization in changing environments, and choose the adaptive regret as the performance measure.
no code implementations • 13 May 2019 • Yi Ren, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech processing and both achieve impressive performance thanks to the recent advance in deep learning and large amount of aligned speech and text data.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 28 May 2019 • Yufei Wang, Qiwei Ye, Tie-Yan Liu
In reinforcement learning, Return, which is the weighted accumulated future rewards, and Value, which is the expected return, serve as the objective that guides the learning of the policy.
no code implementations • 28 May 2019 • Ruihan Yang, Qiwei Ye, Tie-Yan Liu
Based on that, We proposed an end-to-end algorithm to learn exploration policy by meta-learning.
no code implementations • 29 May 2019 • Shicong Cen, Huishuai Zhang, Yuejie Chi, Wei Chen, Tie-Yan Liu
Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary.
no code implementations • ACL 2019 • Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li, Tie-Yan Liu
In this work, we introduce unsupervised pivot translation for distant languages, which translates a language to a distant language through multiple hops, and the unsupervised translation on each hop is relatively easier than the original direct translation.
1 code implementation • ICLR 2019 • Jun Gao, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu
We study an interesting problem in training neural network-based models for natural language generation tasks, which we call the \emph{representation degeneration problem}.
no code implementations • IJCNLP 2019 • Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, Tie-Yan Liu
We study two methods for language clustering: (1) using prior knowledge, where we cluster languages according to language family, and (2) using language embedding, in which we represent each language by an embedding vector and cluster them in the embedding space.
no code implementations • 25 Aug 2019 • Ziyu Liu, Guolin Ke, Jiang Bian, Tie-Yan Liu
Instead of using fixed coding matrix and decoding strategy, LightMC uses a differentiable decoding strategy, which enables it to dynamically optimize the coding matrix and decoding strategy, toward increasing the overall accuracy of multiclass classification, via back propagation jointly with the training of base learners in an iterative way.
no code implementations • IJCNLP 2019 • Lijun Wu, Yiren Wang, Yingce Xia, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
In this work, we study how to use both the source-side and target-side monolingual data for NMT, and propose an effective strategy leveraging both of them.
Ranked #1 on Machine Translation on WMT2016 English-German (SacreBLEU metric, using extra training data)
no code implementations • IJCNLP 2019 • Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
1) We provide a simple approach to mine implicitly bilingual sentence pairs from document pairs which can then be used as supervised training signals.
no code implementations • NeurIPS 2019 • Zichuan Lin, Li Zhao, Derek Yang, Tao Qin, Guangwen Yang, Tie-Yan Liu
Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and a general class of such properties is the multiple reward channel.
no code implementations • WS 2019 • Yingce Xia, Xu Tan, Fei Tian, Fei Gao, Weicong Chen, Yang Fan, Linyuan Gong, Yichong Leng, Renqian Luo, Yiren Wang, Lijun Wu, Jinhua Zhu, Tao Qin, Tie-Yan Liu
We Microsoft Research Asia made submissions to 11 language directions in the WMT19 news translation tasks.
no code implementations • 26 Nov 2019 • Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu, Jian Yin
By using the \emph{expected curvature}, we show that gradient perturbation can achieve a significantly improved utility guarantee that can theoretically justify the advantage of gradient perturbation over other perturbation methods.
no code implementations • 29 Dec 2016 • Quanming Yao, James T. Kwok, Fei Gao, Wei Chen, Tie-Yan Liu
The proximal gradient algorithm has been popularly used for convex optimization.
Optimization and Control
no code implementations • 25 Dec 2019 • Xu Tan, Yichong Leng, Jiale Chen, Yi Ren, Tao Qin, Tie-Yan Liu
Multilingual neural machine translation (NMT) has recently been investigated from different aspects (e. g., pivot translation, zero-shot translation, fine-tuning, or training from scratch) and in different settings (e. g., rich resource and low resource, one-to-many, and many-to-one translation).
no code implementations • 30 Mar 2020 • Junjie Li, Sotetsu Koyamada, Qiwei Ye, Guoqing Liu, Chao Wang, Ruihan Yang, Li Zhao, Tao Qin, Tie-Yan Liu, Hsiao-Wuen Hon
Artificial Intelligence (AI) has achieved great success in many domains, and game AI is widely regarded as its beachhead since the dawn of AI.
no code implementations • ACL 2020 • Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao, Tie-Yan Liu
In this work, we conduct a study to understand the difficulty of NAR sequence generation and try to answer: (1) Why NAR models can catch up with AR models in some tasks but not all?
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
no code implementations • 27 Apr 2020 • Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu
While pre-training and fine-tuning, e. g., BERT~\citep{devlin2018bert}, GPT-2~\citep{radford2019language}, have achieved great success in language understanding and generation tasks, the pre-trained models are usually too big for online deployment in terms of both memory cost and inference speed, which hinders them from practical online usage.