no code implementations • ACL 2022 • Chang Liu, Xu Tan, Chongyang Tao, Zhenxin Fu, Dongyan Zhao, Tie-Yan Liu, Rui Yan
To enable the chatbot to foresee the dialogue future, we design a beam-search-like roll-out strategy for dialogue future simulation using a typical dialogue generation model and a dialogue selector.
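A minimal sketch of such a roll-out loop, with hypothetical stand-ins (`generate_candidates` for the dialogue generation model, `score_future` for the dialogue selector):

```python
import random

random.seed(0)

def generate_candidates(context, num_candidates):
    # Hypothetical stand-in for the dialogue generation model:
    # propose candidate next utterances given the dialogue so far.
    return [f"{context[-1]}->reply{i}" for i in range(num_candidates)]

def score_future(dialogue):
    # Hypothetical stand-in for the dialogue selector:
    # score how promising a simulated dialogue future is.
    return random.random()

def rollout(context, beam_size=3, depth=2):
    # Beam-search-like roll-out: expand each kept future with candidate
    # utterances, then keep only the top-scoring futures at each depth.
    beams = [(list(context), 0.0)]
    for _ in range(depth):
        expanded = []
        for dialogue, _ in beams:
            for utterance in generate_candidates(dialogue, beam_size):
                future = dialogue + [utterance]
                expanded.append((future, score_future(future)))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams[0][0]  # best simulated future

print(rollout(["hi", "hello, how can I help?"]))
```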
no code implementations • ICLR 2019 • Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Xu Tan, Tao Qin, Tie-Yan Liu
Neural machine translation, which achieves near human-level performance in some languages, strongly relies on the availability of large amounts of parallel sentences, which hinders its applicability to low-resource language pairs.
no code implementations • Findings (ACL) 2022 • Zhuocheng Gong, Di He, Yelong Shen, Tie-Yan Liu, Weizhu Chen, Dongyan Zhao, Ji-Rong Wen, Rui Yan
Empirically, we show that (a) the dominant winning ticket can achieve performance that is comparable with that of the full-parameter model, (b) the dominant winning ticket is transferable across different tasks, (c) and the dominant winning ticket has a natural structure within each parameter matrix.
1 code implementation • EMNLP 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk
Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.
no code implementations • 6 Jul 2023 • Yuchen Fang, Zhenggang Tang, Kan Ren, Weiqing Liu, Li Zhao, Jiang Bian, Dongsheng Li, Weinan Zhang, Yong Yu, Tie-Yan Liu
Order execution is a fundamental task in quantitative finance, which aims to complete the acquisition or liquidation of a number of trading orders for specific assets.
no code implementations • 8 Jun 2023 • Shuxin Zheng, Jiyan He, Chang Liu, Yu Shi, Ziheng Lu, Weitao Feng, Fusong Ju, Jiaxi Wang, Jianwei Zhu, Yaosen Min, He Zhang, Shidi Tang, Hongxia Hao, Peiran Jin, Chi Chen, Frank Noé, Haiguang Liu, Tie-Yan Liu
In this paper, we introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
no code implementations • 4 Jun 2023 • Zixin Zeng, Rui Wang, Yichong Leng, Junliang Guo, Xu Tan, Tao Qin, Tie-Yan Liu
Inspired by this translation process, we propose an Extract-and-Attend approach to enhance entity translation in NMT, where the translation candidates of source entities are first extracted from a dictionary and then attended to by the NMT model to generate the target sentence.
no code implementations • 18 May 2023 • Zequn Liu, Wei zhang, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Ming Zhang, Tie-Yan Liu
Considering that text is the most important record for scientific discovery, in this paper, we propose MolXPT, a unified language model of text and molecules pre-trained on SMILES (a sequence representation of molecules) wrapped by text.
Ranked #1 on Molecular Property Prediction on SIDER
1 code implementation • ICLR 2023 • Jinhua Zhu, Kehan Wu, Bohan Wang, Yingce Xia, Shufang Xie, Qi Meng, Lijun Wu, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take rings in compounds into consideration, consequently limiting the expressiveness of the models.
Ranked #1 on Graph Regression on PCQM4M-LSC (Validation MAE metric)
no code implementations • 20 Feb 2023 • Xinquan Huang, Wenlei Shi, Qi Meng, Yue Wang, Xiaotian Gao, Jia Zhang, Tie-Yan Liu
Neural networks have shown great potential in accelerating the solution of partial differential equations (PDEs).
no code implementations • 10 Feb 2023 • Rui Zhang, Qi Meng, Rongchan Zhu, Yue Wang, Wenlei Shi, Shihua Zhang, Zhi-Ming Ma, Tie-Yan Liu
Neural operators, which use deep neural networks to approximate the solution mappings of partial differential equation (PDE) systems, are emerging as a new paradigm for PDE simulation.
1 code implementation • 2 Feb 2023 • Zijie Geng, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Jie Wang, Yongdong Zhang, Feng Wu, Tie-Yan Liu
The obtained motif vocabulary consists of not only molecular motifs (i.e., the frequent fragments), but also their connection information, indicating how the motifs are connected with each other.
no code implementations • 31 Jan 2023 • Guoqing Liu, Di Xue, Shufang Xie, Yingce Xia, Austin Tripp, Krzysztof Maziarz, Marwin Segler, Tao Qin, Zongzhang Zhang, Tie-Yan Liu
Here, we leverage reinforcement learning (RL) to improve the single-step predictor, by using a tree-shaped MDP to optimize complete routes.
Ranked #1 on Multi-step retrosynthesis on USPTO-190
no code implementations • 21 Jan 2023 • Xu Tan, Tao Qin, Jiang Bian, Tie-Yan Liu, Yoshua Bengio
Regeneration learning extends the concept of representation learning to data generation tasks, and can be regarded as a counterpart of traditional representation learning, since 1) regeneration learning handles the abstraction (Y') of the target data Y for data generation while traditional representation learning handles the abstraction (X') of source data X for data understanding; 2) both the processes of Y'-->Y in regeneration learning and X-->X' in representation learning can be learned in a self-supervised way (e.g., pre-training); 3) both the mappings from X to Y' in regeneration learning and from X' to Y in representation learning are simpler than the direct mapping from X to Y.
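One way to write the decomposition sketched above (our notation, not taken verbatim from the paper):

```latex
% Direct generation learns X -> Y in one step. Regeneration learning
% instead factors the mapping through the abstraction Y':
\[
  P(Y \mid X) \;=\; \sum_{Y'} \underbrace{P(Y' \mid X)}_{X \to Y'}
  \; \underbrace{P(Y \mid Y')}_{Y' \to Y,\ \text{self-supervised}}
\]
```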
no code implementations • 24 Dec 2022 • Xiaoyu Chen, Xiangming Zhu, Yufeng Zheng, Pushi Zhang, Li Zhao, Wenxue Cheng, Peng Cheng, Yongqiang Xiong, Tao Qin, Jianyu Chen, Tie-Yan Liu
One of the key challenges in deploying RL to real-world applications is to adapt to variations of unknown environment contexts, such as changing terrains in robotic tasks and fluctuating bandwidth in congestion control.
1 code implementation • 2 Dec 2022 • Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu
In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection.
Automatic Speech Recognition (ASR)
no code implementations • 23 Nov 2022 • Yusong Wang, Shaoning Li, Zun Wang, Xinheng He, Bin Shao, Tie-Yan Liu, Tong Wang
In this technical report, we provide our solution to the OGB-LSC 2022 Graph Regression Task.
no code implementations • 26 Oct 2022 • Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Tianbo Peng, Yingce Xia, Liang He, Shufang Xie, Tao Qin, Haiguang Liu, Kun He, Tie-Yan Liu
Specifically, we first pre-train an antibody language model based on the sequence data, then propose a one-shot way of generating the sequence and structure of the CDR to avoid the heavy cost and error propagation of an autoregressive manner, and finally leverage the pre-trained antibody model in the antigen-specific antibody generation model with some carefully designed modules.
2 code implementations • 19 Oct 2022 • Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu
Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain.
Ranked #1 on Question Answering on PubMedQA
1 code implementation • 19 Oct 2022 • Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu
A recent trend is to use Transformer or its variants in music generation, which is, however, suboptimal, because the full attention cannot efficiently model the typically long music sequences (e.g., over 10,000 tokens), and the existing models have shortcomings in generating musical repetition structures.
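A back-of-the-envelope check on why full attention is problematic at this length (the 10,000-token figure comes from the abstract; the float32, single-head accounting is our assumption):

```python
seq_len = 10_000        # music token sequence length cited in the abstract
bytes_per_float = 4     # float32, our assumption

# Full self-attention materializes a seq_len x seq_len score matrix.
matrix_bytes = seq_len * seq_len * bytes_per_float
print(f"{matrix_bytes / 1e6:.0f} MB per attention matrix")
# -> 400 MB, per head and per layer, before batching.
```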
1 code implementation • 9 Oct 2022 • Mingqing Xiao, Shuxin Zheng, Chang Liu, Zhouchen Lin, Tie-Yan Liu
To be specific, we develop invertible models to generate valid degraded images and meanwhile transform the distribution of lost contents to the fixed distribution of a latent variable during the forward degradation.
1 code implementation • 4 Oct 2022 • Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, LiWei Wang, Di He
To achieve this goal, in this work, we develop a novel Transformer-based Molecular model called Transformer-M, which can take molecular data of 2D or 3D formats as input and generate meaningful semantic representations.
Ranked #2 on Graph Regression on PCQM4Mv2-LSC
1 code implementation • 30 Aug 2022 • Kehan Wu, Yingce Xia, Yang Fan, Pan Deng, Haiguang Liu, Lijun Wu, Shufang Xie, Tong Wang, Tao Qin, Tie-Yan Liu
Structure-based drug design is attracting growing attention in computer-aided drug discovery.
1 code implementation • 30 Aug 2022 • Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu
Specifically, 1) we design an expert system to generate a melody by developing musical elements from motifs to phrases and then to sections, with repetitions and variations according to a pre-given musical form; 2) considering that the generated melody may lack musical richness, we design a Transformer-based refinement model to improve the melody without changing its musical form.
no code implementations • 21 Aug 2022 • Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Zhi-Ming Ma, Tie-Yan Liu, Wei Chen
In particular, the existing analysis of Adam cannot clearly demonstrate the advantage of Adam over SGD.
1 code implementation • 11 Aug 2022 • Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan
These characteristics cannot be well handled by neural generation models that learn lyric-to-melody mapping in an end-to-end way, due to several issues: (1) lack of aligned lyric-melody training data to sufficiently learn lyric-melody feature alignment; (2) lack of controllability in generation to better and explicitly align the lyric-melody features.
2 code implementations • 20 Jul 2022 • Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu
Recent years have witnessed significant success in Gradient Boosting Decision Trees (GBDT) for a wide range of machine learning applications.
no code implementations • 18 Jul 2022 • Guoqing Liu, Mengzhang Cai, Li Zhao, Tao Qin, Adrian Brown, Jimmy Bischoff, Tie-Yan Liu
In this work, we propose using only screenshots/pixels as input for automated game testing and build a general game testing agent, Inspector, that can be easily applied to different games without deep integration with games.
1 code implementation • 14 Jul 2022 • Jinhua Zhu, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
The model is pre-trained on three tasks: reconstruction of masked atoms and coordinates, 3D conformation generation conditioned on 2D graph, and 2D graph generation conditioned on 3D conformation.
no code implementations • NAACL 2022 • Kexun Zhang, Rui Wang, Xu Tan, Junliang Guo, Yi Ren, Tao Qin, Tie-Yan Liu
Furthermore, we take the best of both and design a new loss function to better handle the complicated syntactic multi-modality in real-world datasets.
no code implementations • 27 Jun 2022 • Xiaodong Yang, Huishuai Zhang, Wei Chen, Tie-Yan Liu
By ensuring differential privacy in the learning algorithms, one can rigorously mitigate the risk of large models memorizing sensitive training data.
1 code implementation • 20 Jun 2022 • Qizhi Pei, Lijun Wu, Jinhua Zhu, Yingce Xia, Shufang Xie, Tao Qin, Haiguang Liu, Tie-Yan Liu
This seriously hinders the application of deep learning based methods, which require a large scale of supervised data.
Ranked #1 on Drug Discovery on KIBA
no code implementations • 20 Jun 2022 • Rui Zhang, Peiyan Hu, Qi Meng, Yue Wang, Rongchan Zhu, Bingguang Chen, Zhi-Ming Ma, Tie-Yan Liu
To this end, we propose the Deep Random Vortex Method (DRVM), which combines the neural network with a random vortex dynamics system equivalent to the Navier-Stokes equation.
no code implementations • 19 Jun 2022 • Wenlei Shi, Xinquan Huang, Xiaotian Gao, Xinran Wei, Jia Zhang, Jiang Bian, Mao Yang, Tie-Yan Liu
Neural operators, as a powerful approximation to the non-linear operators between infinite-dimensional function spaces, have proved to be promising in accelerating the solution of partial differential equations (PDE).
1 code implementation • 6 Jun 2022 • Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang
Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning.
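For context, the two ingredients that define a DP-SGD step are per-example gradient clipping and calibrated Gaussian noise. A minimal NumPy sketch on a toy logistic regression (all hyperparameters are illustrative placeholders, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: logistic regression, purely illustrative.
X = rng.normal(size=(256, 10))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(10)

clip_norm, noise_mult, lr, batch = 1.0, 1.0, 0.5, 64

for step in range(200):
    idx = rng.choice(len(X), size=batch, replace=False)
    grads = []
    for i in idx:
        p = 1 / (1 + np.exp(-X[i] @ w))
        g = (p - y[i]) * X[i]                                   # per-example gradient
        g *= min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # clip to C
        grads.append(g)
    noise = rng.normal(scale=noise_mult * clip_norm, size=w.shape)
    w -= lr * (np.sum(grads, axis=0) + noise) / batch           # noisy average step
```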
1 code implementation • 30 May 2022 • Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu
Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., the diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples.
1 code implementation • 26 May 2022 • Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, LiWei Wang, Di He
Extensive experiments covering typical architectures and tasks demonstrate that our model is parameter-efficient and can achieve superior performance to strong baselines in a wide range of applications.
1 code implementation • 25 May 2022 • Jiawei Huang, Li Zhao, Tao Qin, Wei Chen, Nan Jiang, Tie-Yan Liu
We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where the users can be divided into two groups based on their different tolerance on exploration risks and should be treated separately.
3 code implementations • 9 May 2022 • Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu
In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Ranked #1 on Text-To-Speech Synthesis on LJSpeech (using extra training data)
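To illustrate the significance-based criterion mentioned in the abstract (not the paper's exact protocol), one could compare paired subjective ratings of recordings and TTS outputs with a signed-rank test; the scores below are simulated:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Simulated paired ratings of the same 50 utterances: human recordings
# vs. TTS outputs. A real study would collect MOS/CMOS judgments.
recordings = rng.normal(loc=4.3, scale=0.3, size=50)
tts_outputs = rng.normal(loc=4.3, scale=0.3, size=50)

stat, p_value = wilcoxon(recordings, tts_outputs)
# If p_value exceeds the significance level, the subjective scores show
# no statistically significant gap between TTS and recordings.
print(f"p = {p_value:.3f} ->", "no significant gap" if p_value > 0.05 else "gap")
```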
1 code implementation • 20 Apr 2022 • Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-Yan Liu
While NAR generation can significantly accelerate inference speed for machine translation, the speedup comes at the cost of sacrificed translation accuracy compared to its counterpart, autoregressive (AR) generation.
Automatic Speech Recognition (ASR)
no code implementations • 13 Apr 2022 • Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.
1 code implementation • 13 Apr 2022 • Peiyan Hu, Qi Meng, Bingguang Chen, Shiqi Gong, Yue Wang, Wei Chen, Rongchan Zhu, Zhi-Ming Ma, Tie-Yan Liu
Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics.
no code implementations • 1 Apr 2022 • Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu
We model the speaker characteristics systematically to improve the generalization on new speakers.
no code implementations • 28 Mar 2022 • Zimeng Li, Shichao Zhu, Bin Shao, Tie-Yan Liu, Xiangxiang Zeng, Tong Wang
Drug-drug interaction (DDI) prediction provides a drug combination strategy for systemically effective treatment.
1 code implementation • ICLR 2022 • Wei Fan, Shun Zheng, Xiaohan Yi, Wei Cao, Yanjie Fu, Jiang Bian, Tie-Yan Liu
However, the complicated dependencies of the PTS signal on its inherent periodicity as well as the sophisticated composition of various periods hinder the performance of PTS forecasting.
3 code implementations • 9 Mar 2022 • Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu
This technical note describes the recent updates of Graphormer, including architecture design modifications and the adaptation to 3D molecular dynamics simulation.
no code implementations • ACL 2022 • Yi Ren, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu
Then we conduct a comprehensive study on NAR-TTS models that use some advanced modeling methods.
no code implementations • 18 Feb 2022 • Lin Huang, Lijun Wu, Jia Zhang, Jiang Bian, Tie-Yan Liu
It is crucial to discover useful implicit relations between entities and to effectively utilize these relations for each entity under various circumstances.
1 code implementation • 18 Feb 2022 • Di He, Shanda Li, Wenlei Shi, Xiaotian Gao, Jia Zhang, Jiang Bian, LiWei Wang, Tie-Yan Liu
In this work, we develop a novel approach that can significantly accelerate the training of Physics-Informed Neural Networks.
no code implementations • 18 Feb 2022 • Lin Huang, Qiyuan Dong, Lijun Wu, Jia Zhang, Jiang Bian, Tie-Yan Liu
As a specific semantic segmentation task, aerial imagery segmentation has been widely employed in understanding high spatial resolution (HSR) remote sensing images.
no code implementations • ICLR 2022 • Jiawei Huang, Jinglin Chen, Li Zhao, Tao Qin, Nan Jiang, Tie-Yan Liu
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL).
1 code implementation • 3 Feb 2022 • Jinhua Zhu, Yingce Xia, Chang Liu, Lijun Wu, Shufang Xie, Yusong Wang, Tong Wang, Tao Qin, Wengang Zhou, Houqiang Li, Haiguang Liu, Tie-Yan Liu
Molecular conformation generation aims to generate three-dimensional coordinates of all the atoms in a molecule and is an important task in bioinformatics and pharmacology.
1 code implementation • 12 Dec 2021 • Wentao Xu, Yingce Xia, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu
Next, we use a tree-attention aggregator to incorporate the graph structure information into the aggregation module on the meta-path.
2 code implementations • COLING 2022 • Zhiping Luo, Wentao Xu, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu
Learning the embeddings of knowledge graphs (KG) is vital in artificial intelligence, and can benefit various downstream applications, such as recommendation and question answering.
1 code implementation • NeurIPS 2021 • Jinpeng Li, Yingce Xia, Rui Yan, Hongda Sun, Dongyan Zhao, Tie-Yan Liu
Considering there is no parallel data between the contexts and the responses of target style S1, existing works mainly use back translation to generate stylized synthetic data for training, where the data about context, target style S1 and an intermediate style S0 is used.
1 code implementation • NeurIPS 2021 • Xinwei Sun, Botong Wu, Xiangyu Zheng, Chang Liu, Wei Chen, Tao Qin, Tie-Yan Liu
To avoid such a spurious correlation, we propose Latent Causal Invariance Models (LaCIM), which specify the underlying causal structure of the data and the source of distributional shifts, guiding us to pursue only the causal factor for prediction.
1 code implementation • NeurIPS 2021 • He Zhang, Fusong Ju, Jianwei Zhu, Liang He, Bin Shao, Nanning Zheng, Tie-Yan Liu
These methods generally derive coevolutionary features by aggregating the learned residue representations from individual sequences with equal weights, which is inconsistent with the premise that residue co-evolutions are a reflection of collective covariation patterns of numerous homologous proteins.
no code implementations • NeurIPS 2021 • Jiawei Chen, Xu Tan, Yichong Leng, Jin Xu, Guihua Wen, Tao Qin, Tie-Yan Liu
Experiments on the LJSpeech dataset demonstrate that Speech-T 1) is more robust than the attention based autoregressive TTS model due to its inherent monotonic alignments between text and speech; 2) naturally supports streaming TTS with good voice quality; and 3) enjoys the benefit of jointly modeling TTS and ASR in a single network.
Automatic Speech Recognition (ASR)
no code implementations • NeurIPS 2021 • Minghuan Liu, Hanye Zhao, Zhengyu Yang, Jian Shen, Weinan Zhang, Li Zhao, Tie-Yan Liu
However, IL is usually limited in the capability of the behavioral policy and tends to learn a mediocre behavior from the dataset collected by the mixture of policies.
1 code implementation • 1 Nov 2021 • Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu
We are the first to unveil an important population property of the perturbations of these attacks: they are almost linearly separable when assigned the target labels of the corresponding samples, which hence can work as shortcuts for the learning objective.
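A sketch of how one could test this linear-separability claim, with synthetic stand-ins for the attack perturbations and target labels (a real test would use the perturbations produced by the availability attacks themselves):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins: `perturbations` plays the role of the per-sample
# attack noise, `labels` the target labels of the corresponding samples.
n, dim, num_classes = 1000, 64, 2
labels = rng.integers(0, num_classes, size=n)
class_directions = rng.normal(size=(num_classes, dim))
perturbations = class_directions[labels] + 0.1 * rng.normal(size=(n, dim))

# Near-perfect accuracy of a *linear* model on (perturbation, label) pairs
# is evidence that the perturbations are (almost) linearly separable.
clf = LogisticRegression(max_iter=1000).fit(perturbations, labels)
print("linear fit accuracy:", clf.score(perturbations, labels))
```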
no code implementations • 29 Oct 2021 • Liang He, Shizhuo Zhang, Lijun Wu, Huanhuan Xia, Fusong Ju, He Zhang, Siyuan Liu, Yingce Xia, Jianwei Zhu, Pan Deng, Bin Shao, Tao Qin, Tie-Yan Liu
The key problem in protein sequence representation learning is to capture the co-evolutionary information reflected by the inter-residue co-variation in the sequences.
1 code implementation • NeurIPS 2021 • Jongjin Park, Younggyo Seo, Chang Liu, Li Zhao, Tao Qin, Jinwoo Shin, Tie-Yan Liu
Behavioral cloning has proven to be effective for learning sequential decision-making policies from expert demonstrations.
1 code implementation • 26 Oct 2021 • Weitao Du, He Zhang, Yuanqi Du, Qi Meng, Wei Chen, Bin Shao, Tie-Yan Liu
In this paper, we propose a framework to construct SE(3) equivariant graph neural networks that can approximate the geometric quantities efficiently.
2 code implementations • 26 Oct 2021 • Wentao Xu, Weiqing Liu, Lewen Wang, Yingce Xia, Jiang Bian, Jian Yin, Tie-Yan Liu
To overcome the shortcomings of previous work, we proposed a novel stock trend forecasting framework that can adequately mine the concept-oriented shared information from predefined concepts and hidden concepts.
no code implementations • NeurIPS 2021 • Pushi Zhang, Xiaoyu Chen, Li Zhao, Wei Xiong, Tao Qin, Tie-Yan Liu
To fully inherit the benefits of distributional RL and hybrid reward architectures, we introduce Multi-Dimensional Distributional DQN (MD3QN), which extends distributional RL to model the joint return distribution from multiple reward sources.
Distributional Reinforcement Learning, Reinforcement Learning
no code implementations • NeurIPS 2021 • Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu
We prove that, under a constraint to guarantee low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized.
no code implementations • 14 Oct 2021 • Siyuan Liu, Yusong Wang, Tong Wang, Yifan Deng, Liang He, Bin Shao, Jian Yin, Nanning Zheng, Tie-Yan Liu
The identification of active binding drugs for target proteins (termed as drug-target interaction prediction) is the key challenge in virtual screening, which plays an essential role in drug discovery.
no code implementations • 8 Oct 2021 • Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
The momentum acceleration technique is widely adopted in many optimization algorithms.
1 code implementation • ICLR 2022 • Shufang Xie, Ang Lv, Yingce Xia, Lijun Wu, Tao Qin, Rui Yan, Tie-Yan Liu
Autoregressive sequence generation, a prevalent task in machine learning and natural language processing, generates every target token conditioned on both a source input and previously generated target tokens.
2 code implementations • ICLR 2022 • Chongchong Li, Yue Wang, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu
Then we proposed a two-model-based learning method to control the prediction error and the gradient error.
no code implementations • 29 Sep 2021 • Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu
In contextual bandit, one major challenge is to develop theoretically solid and empirically efficient algorithms for general function classes.
no code implementations • 29 Sep 2021 • Qiwei Ye, Yuxuan Song, Chang Liu, Fangyun Wei, Tao Qin, Tie-Yan Liu
Stochastic policies have been widely applied for their good properties in exploration and uncertainty quantification.
Ranked #1 on MuJoCo Games on Ant-v3
1 code implementation • Findings (EMNLP) 2021 • Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu, Linquan Liu, Tao Qin, Xiang-Yang Li, Edward Lin, Tie-Yan Liu
Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens.
Automatic Speech Recognition (ASR)
no code implementations • 29 Sep 2021 • Mingxiao Feng, Guozi Liu, Li Zhao, Lei Song, Jiang Bian, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
In this paper, we consider the inventory management (IM) problem for a single store with a large number of SKUs (stock keeping units), where we need to make replenishment decisions for each SKU to balance its supply and demand.
no code implementations • 27 Sep 2021 • Yutai Hou, Yingce Xia, Lijun Wu, Shufang Xie, Yang Fan, Jinhua Zhu, Wanxiang Che, Tao Qin, Tie-Yan Liu
We regard the DTI triplets as a sequence and use a Transformer-based model to directly generate them without using the detailed annotations of entities and relations.
1 code implementation • 20 Sep 2021 • Zeqian Ju, Peiling Lu, Xu Tan, Rui Wang, Chen Zhang, Songruoyao Wu, Kejun Zhang, Xiangyang Li, Tao Qin, Tie-Yan Liu
In this paper, we develop TeleMelody, a two-stage lyric-to-melody generation system with music template (e.g., tonality, chord progression, rhythm pattern, and cadence) to bridge the gap between lyrics and melodies (i.e., the system consists of a lyric-to-template module and a template-to-melody module).
1 code implementation • 14 Sep 2021 • Wentao Xu, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu
In this paper, we propose a simple yet efficient instance-wise graph-based framework to utilize the inter-dependencies of different variables at different time stamps for multivariate time series forecasting.
no code implementations • 29 Aug 2021 • Jin Xu, Xu Tan, Kaitao Song, Renqian Luo, Yichong Leng, Tao Qin, Tie-Yan Liu, Jian Li
In this paper, we investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators, and observe: 1) the interference on a shared operator between two child models is positively correlated with the number of different operators; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar.
no code implementations • 9 Jul 2021 • Rui Wang, Xu Tan, Renqian Luo, Tao Qin, Tie-Yan Liu
Neural approaches have achieved state-of-the-art accuracy on machine translation but suffer from the high cost of collecting large scale parallel data.
no code implementations • 6 Jul 2021 • Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu
While recent text to speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly because of two reasons: 1) the lack of training data for spontaneous speech; 2) the difficulty in modeling the filled pauses (um and uh) and diverse rhythms in spontaneous speech.
1 code implementation • ACL 2021 • Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu
In this paper, we develop DeepRapper, a Transformer-based rap generation system that can model both rhymes and rhythms.
1 code implementation • 3 Jul 2021 • Yue Jin, Yue Zhang, Tao Qin, Xudong Zhang, Jian Yuan, Houqiang Li, Tie-Yan Liu
Inspired by the two observations, in this work, we study a new problem, supervised off-policy ranking (SOPR), which aims to rank a set of target policies based on supervised learning by leveraging off-policy data and policies with known performance.
1 code implementation • NeurIPS 2021 • Chang Liu, Haoyue Tang, Tao Qin, Jintao Wang, Tie-Yan Liu
This is motivated by the observation that deep generative models, in addition to a likelihood model $p(x|z)$, often also use an inference model $q(z|x)$ for extracting representation, but they rely on a usually uninformative prior distribution $p(z)$ to define a joint distribution, which may cause problems like posterior collapse and manifold mismatch.
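For reference, the standard evidence lower bound that ties these three ingredients together (the generic VAE formulation, not this paper's specific construction):

```latex
% Evidence lower bound with likelihood p(x|z), inference model q(z|x),
% and prior p(z):
\[
  \log p(x) \;\ge\; \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]
  \;-\; \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big)
\]
% With an uninformative p(z), the KL term can push q(z|x) toward the
% prior -- one view of the posterior-collapse problem mentioned above.
```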
1 code implementation • 29 Jun 2021 • Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in speech, language, and machine learning communities and has broad applications in the industry.
no code implementations • 29 Jun 2021 • Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu
However, it is in general unknown how to derive efficient and effective EE trade-off methods for non-linear complex tasks, such as contextual bandits with a deep neural network as the reward function.
9 code implementations • NeurIPS 2021 • Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu
Dropout is a powerful and widely used technique to regularize the training of deep neural networks.
Ranked #4 on Machine Translation on WMT2014 English-French
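A minimal sketch of the bidirectional-KL consistency idea behind this dropout regularization (the toy classifier and alpha are placeholders, not the paper's setup):

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, y, alpha=1.0):
    # Two forward passes of the same model; dropout makes them stochastic.
    logits1, logits2 = model(x), model(x)
    ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)
    # Symmetric KL between the two predicted distributions.
    lp1, lp2 = F.log_softmax(logits1, dim=-1), F.log_softmax(logits2, dim=-1)
    kl = (F.kl_div(lp1, lp2, log_target=True, reduction="batchmean")
          + F.kl_div(lp2, lp1, log_target=True, reduction="batchmean"))
    return ce + alpha * kl / 2

# Toy usage: any classifier with dropout works.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Dropout(0.3),
    torch.nn.Linear(16, 4))
x, y = torch.randn(32, 8), torch.randint(0, 4, (32,))
r_drop_loss(model, x, y).backward()
```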
no code implementations • NeurIPS 2021 • Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, LiWei Wang, Tie-Yan Liu
Since in many state-of-the-art models, relative positional encoding is used as default, designing efficient Transformers that can incorporate RPE is appealing.
1 code implementation • 17 Jun 2021 • Jinhua Zhu, Yingce Xia, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
After pre-training, we can use either the Transformer branch (this one is recommended according to empirical results), the GNN branch, or both for downstream tasks.
Ranked #1 on Molecular Property Prediction on HIV dataset
1 code implementation • 17 Jun 2021 • Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu
We propose a reparametrization scheme to address the challenges of applying differentially private SGD to large neural networks, which are 1) the huge memory cost of storing individual gradients and 2) the added noise suffering from notorious dimensional dependence.
1 code implementation • ICLR 2022 • Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu
Denoising diffusion probabilistic models have been recently proposed to generate high-quality samples by estimating the gradient of the data density.
2 code implementations • Findings (ACL) 2021 • Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu
Inspired by the success of pre-training models in natural language processing, in this paper, we develop MusicBERT, a large-scale pre-trained model for music understanding.
4 code implementations • 9 Jun 2021 • Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu
Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.
Ranked #1 on Graph Regression on PCQM4M-LSC
no code implementations • 8 Jun 2021 • Shiqi Gong, Qi Meng, Yue Wang, Lijun Wu, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
In this paper, to reduce the reliance on the numerical solver, we propose to enhance the supervised signal in the training of NODE.
1 code implementation • CVPR 2022 • Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen, Jun Zhu, Tie-Yan Liu
Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones.
no code implementations • 31 May 2021 • Ziming Liu, Bohan Wang, Qi Meng, Wei Chen, Max Tegmark, Tie-Yan Liu
Energy conservation is a basic physics principle, the breakdown of which often implies new physics.
no code implementations • 30 May 2021 • Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu
The technical challenge of NAS-BERT is that training a big supernet on the pre-training task is extremely costly.
no code implementations • 27 May 2021 • Jinhui Yuan, Fei Pan, Chunting Zhou, Tao Qin, Tie-Yan Liu
We further establish connections between this principle and the theory of Bayesian optimal classification, and empirically verify that larger entropy of the outputs of a deep neural network indeed corresponds to a better classification accuracy.
1 code implementation • 10 May 2021 • Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu
Inspired by this, we propose a novel program semantics learning paradigm, that the model should learn from information composed of (1) the representations which align well with the fundamental operations in operational semantics, and (2) the information of environment transition, which is indispensable for program understanding.
1 code implementation • NeurIPS 2021 • Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu
A straightforward solution to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of significantly increased ASR error rate.
Automatic Speech Recognition (ASR)
1 code implementation • 20 Apr 2021 • Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu
In adaptation, we use untranscribed speech data for speech reconstruction and only fine-tune the TTS decoder.
no code implementations • NAACL 2021 • Zhen Wu, Lijun Wu, Qi Meng, Yingce Xia, Shufang Xie, Tao Qin, Xinyu Dai, Tie-Yan Liu
Therefore, in this paper, we integrate different dropout techniques into the training of Transformer models.
Ranked #3 on Machine Translation on IWSLT2014 English-German
no code implementations • 9 Apr 2021 • Disheng Tang, Wei Cao, Jiang Bian, Tie-Yan Liu, Zhifeng Gao, Shun Zheng, Jue Liu
We used a stochastic metapopulation model with a hierarchical structure and fitted the model to the positive cases in the US from the start of the outbreak to the end of 2020.
1 code implementation • ICLR 2021 • Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
Based on this observation, in this work, we break the assumption of the fixed layer order in the Transformer and introduce instance-wise layer reordering into the model structure.
no code implementations • 3 Mar 2021 • Jindong Wang, Wenjie Feng, Chang Liu, Chaohui Yu, Mingxuan Du, Renjun Xu, Tao Qin, Tie-Yan Liu
Since it is expensive and time-consuming to collect massive COVID-19 image samples to train deep classification models, transfer learning is a promising approach that transfers knowledge from abundant typical pneumonia datasets for COVID-19 image classification.
2 code implementations • ICLR 2021 • Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu
2) To better trade off the adaptation parameters and voice quality, we introduce conditional layer normalization in the mel-spectrogram decoder of AdaSpeech, and fine-tune this part in addition to speaker embedding for adaptation.
2 code implementations • ICLR 2021 • Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu
The privacy leakage of the model about the training data can be bounded in the differential privacy mechanism.
no code implementations • 25 Feb 2021 • Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu
In each lazy block, the self-attention distribution is only computed once in the first layer and then is reused in all upper layers.
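A simplified single-head sketch of this reuse pattern (our own toy rendering, omitting layer norm, feed-forward sublayers, and multi-head details):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LazyBlock(nn.Module):
    # Attention probabilities are computed once from the block input
    # and reused by every layer in the block.
    def __init__(self, dim, num_layers):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])

    def forward(self, x):
        scores = self.query(x) @ self.key(x).transpose(-2, -1)
        attn = F.softmax(scores / x.size(-1) ** 0.5, dim=-1)  # computed once
        for value in self.value:
            x = x + attn @ value(x)  # upper layers reuse `attn`
        return x

x = torch.randn(2, 16, 32)                       # (batch, seq, dim)
print(LazyBlock(dim=32, num_layers=3)(x).shape)  # torch.Size([2, 16, 32])
```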
no code implementations • ICLR 2021 • Guoqing Liu, Chuheng Zhang, Li Zhao, Tao Qin, Jinhua Zhu, Jian Li, Nenghai Yu, Tie-Yan Liu
Recently, various auxiliary tasks have been proposed to accelerate representation learning and improve sample efficiency in deep reinforcement learning (RL).
1 code implementation • 16 Feb 2021 • Shengjie Luo, Kaiyuan Gao, Shuxin Zheng, Guolin Ke, Di He, LiWei Wang, Tie-Yan Liu
The language embedding can be either added to the word embedding or attached at the beginning of the sentence.
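The two placements read directly as code; a minimal sketch with made-up sizes:

```python
import torch
import torch.nn as nn

vocab_size, num_langs, dim = 1000, 4, 64     # made-up sizes
word_emb = nn.Embedding(vocab_size, dim)
lang_emb = nn.Embedding(num_langs, dim)

tokens = torch.randint(0, vocab_size, (2, 10))  # (batch, seq)
lang_ids = torch.tensor([0, 2])                 # one language id per sentence

# Option 1: add the language embedding to every word embedding.
added = word_emb(tokens) + lang_emb(lang_ids)[:, None, :]

# Option 2: attach it as an extra embedding at the beginning of the sentence.
prepended = torch.cat([lang_emb(lang_ids)[:, None, :], word_emb(tokens)], dim=1)

print(added.shape, prepended.shape)  # (2, 10, 64) and (2, 11, 64)
```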
no code implementations • 15 Feb 2021 • Wentao Xu, Weiqing Liu, Chang Xu, Jiang Bian, Jian Yin, Tie-Yan Liu
To remedy the first shortcoming, we propose to model the stock context and learn the effect of event information on the stocks under different contexts.
4 code implementations • 8 Feb 2021 • Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu
Text to speech (TTS) has been broadly used to synthesize natural and intelligible speech in different scenarios.
no code implementations • 28 Jan 2021 • Yuchen Fang, Kan Ren, Weiqing Liu, Dong Zhou, Weinan Zhang, Jiang Bian, Yong Yu, Tie-Yan Liu
As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquisition, for a given instrument.
no code implementations • 8 Jan 2021 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
However, it has been pointed out that the usual definitions of sharpness, which consider either the maxima or the integral of loss over a $\delta$ ball of parameters around minima, cannot give consistent measurement for scale-invariant neural networks, e.g., networks with a batch normalization layer.
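Our rendering of the two sharpness definitions the abstract paraphrases, and the scale-invariance issue:

```latex
% Two usual sharpness definitions around a minimum theta*:
\[
  S_{\max}(\theta^*) = \max_{\|\Delta\| \le \delta} L(\theta^* + \Delta) - L(\theta^*),
  \qquad
  S_{\mathrm{avg}}(\theta^*) = \frac{1}{V_\delta} \int_{\|\Delta\| \le \delta} L(\theta^* + \Delta)\,\mathrm{d}\Delta - L(\theta^*)
\]
% If L(\alpha\theta) = L(\theta) for all \alpha > 0 (scale invariance, as
% with batch normalization), the fixed-radius ball assigns \theta^* and
% \alpha\theta^* different sharpness values -- the inconsistency noted above.
```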
no code implementations • 6 Jan 2021 • Yao Li, Tong Wang, Juanrong Zhang, Bin Shao, Haipeng Gong, Yusong Wang, Siyuan Liu, Tie-Yan Liu
We performed molecular dynamics simulation on the S protein with a focus on the function of its N-terminal domains (NTDs).
no code implementations • 1 Jan 2021 • Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu
NAS-BERT trains a big supernet on a carefully designed search space containing various architectures and outputs multiple compressed models with adaptive sizes and latency.
no code implementations • 1 Jan 2021 • Huishuai Zhang, Da Yu, Wei Chen, Tie-Yan Liu
More importantly, we propose a new design, "STAM aggregation", that can guarantee to STAbilize the forward/backward process of Multi-branch networks irrespective of the number of branches.
no code implementations • ICLR 2021 • Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu
In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization.
1 code implementation • 1 Jan 2021 • Xueqing Wu, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Tao Qin, Tie-Yan Liu
For wait-k inference, we observe that wait-m training with $m>k$ in simultaneous NMT (i.e., using more future information for training than inference) generally outperforms wait-k training.
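A minimal sketch of the wait-k visibility pattern underlying both training and inference here (our own illustrative implementation):

```python
import numpy as np

def wait_k_mask(src_len, tgt_len, k):
    # When emitting target token t (0-indexed), a wait-k model may only
    # read the first k + t source tokens.
    mask = np.zeros((tgt_len, src_len), dtype=bool)
    for t in range(tgt_len):
        mask[t, : min(k + t, src_len)] = True
    return mask

# Training with wait-m, m > k, simply uses wait_k_mask(..., k=m), which
# reveals more future source context than the wait-k inference condition.
print(wait_k_mask(src_len=6, tgt_len=4, k=2).astype(int))
```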
1 code implementation • 24 Dec 2020 • Wenlei Shi, Xinran Wei, Jia Zhang, Xiaoyuan Ni, Arthur Jiang, Jiang Bian, Tie-Yan Liu
While adopting complex GNN models with more informative message passing and aggregation mechanisms can obviously benefit heterogeneous vertex representations and cooperative policy learning, it could, on the other hand, increase the training difficulty of MARL and demand more intense and direct reward signals compared to the original global reward.
no code implementations • 17 Dec 2020 • Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu
In DenoiSpeech, we handle real-world noisy speech by modeling the fine-grained frame-level noise with a noise condition module, which is jointly trained with the TTS model.
1 code implementation • 11 Dec 2020 • Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu
Besides GD, adaptive algorithms such as AdaGrad, RMSProp and Adam are popular owing to their rapid training process.
no code implementations • NeurIPS 2020 • Zichuan Lin, Derek Yang, Li Zhao, Tao Qin, Guangwen Yang, Tie-Yan Liu
In this work, we propose a set of novel reward decomposition principles by constraining uniqueness and compactness of different state features/representations relevant to different sub-rewards.
no code implementations • 4 Nov 2020 • Xinwei Sun, Botong Wu, Xiangyu Zheng, Chang Liu, Wei Chen, Tao Qin, Tie-Yan Liu
To avoid spurious correlation, we propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction.
1 code implementation • NeurIPS 2021 • Chang Liu, Xinwei Sun, Jindong Wang, Haoyue Tang, Tao Li, Tao Qin, Wei Chen, Tie-Yan Liu
Conventional supervised learning methods, especially deep ones, are found to be sensitive to out-of-distribution (OOD) examples, largely because the learned representation mixes the semantic factor with the variation factor due to their domain-specific correlation, while only the semantic factor causes the output.
no code implementations • 19 Oct 2020 • Hao Wang, Jia Zhang, Yingce Xia, Jiang Bian, Chao Zhang, Tie-Yan Liu
However, most existing studies overlook the code's intrinsic structural logic, which indeed contains a wealth of semantic information, and fail to capture the intrinsic features of code.
2 code implementations • 22 Sep 2020 • Xiao Yang, Weiqing Liu, Dong Zhou, Jiang Bian, Tie-Yan Liu
Quantitative investment aims to maximize the return and minimize the risk in a sequential trading period over a set of financial instruments.
1 code implementation • 7 Sep 2020 • Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Li-Wei Wang
We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets.
Ranked #23 on Graph Property Prediction on ogbg-molhiv
1 code implementation • 3 Sep 2020 • Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu
To tackle the difficulty of singing modeling caused by high sampling rate (wider frequency band and longer waveform), we introduce multi-scale adversarial training in both the acoustic model and vocoder to improve singing modeling.
1 code implementation • 18 Aug 2020 • Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu
To improve harmony, in this paper, we propose a novel MUlti-track MIDI representation (MuMIDI), which enables simultaneous multi-track generation in a single sequence and explicitly models the dependency of the notes from different tracks.
no code implementations • 9 Aug 2020 • Jin Xu, Xu Tan, Yi Ren, Tao Qin, Jian Li, Sheng Zhao, Tie-Yan Liu
However, there are more than 6,000 languages in the world and most languages lack speech training data, which poses significant challenges when building TTS and ASR systems for extremely low-resource languages.
Automatic Speech Recognition (ASR)
1 code implementation • 21 Jul 2020 • Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu
Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation.
no code implementations • 17 Jul 2020 • Jinglin Liu, Yi Ren, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu
SAT contains a hyperparameter k, and each k value defines a SAT task with different degrees of parallelism.
1 code implementation • 17 Jul 2020 • Chaohui Yu, Jindong Wang, Chang Liu, Tao Qin, Renjun Xu, Wenjie Feng, Yiqiang Chen, Tie-Yan Liu
However, it remains challenging to determine which method is suitable for a given application since they are built with certain priors or bias.
2 code implementations • 10 Jul 2020 • Xueqing Wu, Lewen Wang, Yingce Xia, Weiqing Liu, Lijun Wu, Shufang Xie, Tao Qin, Tie-Yan Liu
In many applications, a sequence learning task is usually associated with multiple temporally correlated auxiliary tasks, which are different in terms of how much input information to use or which future step to predict.
no code implementations • 9 Jul 2020 • Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, Tie-Yan Liu
DeepSinger has several advantages over previous SVS systems: 1) to the best of our knowledge, it is the first SVS system that directly mines training data from music websites; 2) the lyrics-to-singing alignment model further avoids any human effort for alignment labeling and greatly reduces labeling cost; 3) the singing model based on a feed-forward Transformer is simple and efficient, by removing the complicated acoustic feature modeling in parametric synthesis and leveraging a reference encoder to capture the timbre of a singer from noisy singing data; and 4) it can synthesize singing voices in multiple languages and for multiple singers.
1 code implementation • 9 Jul 2020 • Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu
Considering that most architectures are represented as sequences of discrete symbols which are more like tabular data and preferred by non-neural predictors, in this paper, we study an alternative approach which uses non-neural model for accuracy prediction.
Ranked #81 on Neural Architecture Search on ImageNet
no code implementations • ACL 2020 • Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu
In this work, we develop SimulSpeech, an end-to-end simultaneous speech to text translation system which translates speech in source language to text in target language concurrently.
Automatic Speech Recognition (ASR)
3 code implementations • ICLR 2021 • Guolin Ke, Di He, Tie-Yan Liu
In this work, we investigate the positional encoding methods used in language pre-training (e. g., BERT) and identify several problems in the existing formulations.
no code implementations • 24 Jun 2020 • Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Specifically, we show that the covariance of the noise of SGD in the local region of the local minima is a quadratic function of the state.
no code implementations • 22 Jun 2020 • Yaolong Wang, Mingqing Xiao, Chang Liu, Shuxin Zheng, Tie-Yan Liu
Specifically, ILC introduces an invertible encoding module to replace the encoder-decoder structure, producing a low-dimensional informative latent representation while transforming the lost information into an auxiliary latent variable that will not be further coded or stored.
1 code implementation • 18 Jun 2020 • Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, Tie-Yan Liu
While the multi-branch architecture is one of the key ingredients to the success of computer vision tasks, it has not been well investigated in natural language processing, especially sequence learning tasks.
Ranked #4 on Machine Translation on WMT2014 English-German (SacreBLEU metric)
no code implementations • 14 Jun 2020 • Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Ke-jun Zhang, Tie-Yan Liu
Existing speech to speech translation systems heavily rely on the text of the target language: they usually either translate the source language to target text and then synthesize target speech from the text, or translate directly to target speech with target text for auxiliary training.
1 code implementation • 10 Jun 2020 • Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Li-Wei Wang, Jiang Bian, Tie-Yan Liu
Pre-trained contextual representations (e. g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks.
1 code implementation • 8 Jun 2020 • Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin, Tie-Yan Liu
Transformer-based text to speech (TTS) models (e.g., Transformer TTS (Li et al., 2019), FastSpeech (Ren et al., 2019)) have shown the advantages of training and inference efficiency over RNN-based models (e.g., Tacotron (Shen et al., 2018)) due to their parallel computation in training and/or inference.
32 code implementations • ICLR 2021 • Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with the ground-truth target instead of the simplified output from a teacher, and 2) introducing more variation information of speech (e.g., pitch, energy, and more accurate duration) as conditional inputs.
Ranked #6 on Text-To-Speech Synthesis on LJSpeech (using extra training data)
no code implementations • 17 May 2020 • Zhibing Zhao, Yingce Xia, Tao Qin, Lirong Xia, Tie-Yan Liu
Dual learning has been successfully applied in many machine learning applications including machine translation, image-to-image transformation, etc.
8 code implementations • ECCV 2020 • Mingqing Xiao, Shuxin Zheng, Chang Liu, Yaolong Wang, Di He, Guolin Ke, Jiang Bian, Zhouchen Lin, Tie-Yan Liu
High-resolution digital images are usually downscaled to fit various display screens or to save storage and bandwidth, while post-upscaling is adopted to recover the original resolution or the details in zoomed-in images.
1 code implementation • ACL 2020 • Wentao Xu, Shun Zheng, Liang He, Bin Shao, Jian Yin, Tie-Yan Liu
In recent years, knowledge graph embedding has become a hot research topic in artificial intelligence and plays an increasingly vital role in various downstream applications, such as recommendation and question answering.
Ranked #1 on Link Prediction on YAGO37
no code implementations • 27 Apr 2020 • Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu
While pre-training and fine-tuning, e.g., BERT (Devlin et al., 2018) and GPT-2 (Radford et al., 2019), have achieved great success in language understanding and generation tasks, the pre-trained models are usually too big for online deployment in terms of both memory cost and inference speed, which hinders them from practical online usage.
no code implementations • ACL 2020 • Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao, Tie-Yan Liu
In this work, we conduct a study to understand the difficulty of NAR sequence generation and try to answer: (1) Why NAR models can catch up with AR models in some tasks but not all?
Automatic Speech Recognition (ASR)
6 code implementations • NeurIPS 2020 • Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu
Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem.
1 code implementation • 5 Apr 2020 • Yuxuan Song, Qiwei Ye, Minkai Xu, Tie-Yan Liu
Generative Adversarial Networks (GANs) have shown great promise in modeling high dimensional data.
Ranked #6 on Image Generation