no code implementations • ICLR 2019 • Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Xu Tan, Tao Qin, Tie-Yan Liu
Neural machine translation, which achieves near human-level performance in some languages, strongly relies on the availability of large amounts of parallel sentences, which hinders its applicability to low-resource language pairs.
no code implementations • Findings (ACL) 2022 • Zhuocheng Gong, Di He, Yelong Shen, Tie-Yan Liu, Weizhu Chen, Dongyan Zhao, Ji-Rong Wen, Rui Yan
Empirically, we show that (a) the dominant winning ticket can achieve performance that is comparable with that of the full-parameter model, (b) the dominant winning ticket is transferable across different tasks, (c) and the dominant winning ticket has a natural structure within each parameter matrix.
1 code implementation • EMNLP 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk
Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.
no code implementations • 23 Jul 2024 • Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, LiWei Wang
In the prospective external evaluation, our diagnostic model outperforms the average performance of nine radiologists by 33.5% in specificity with the same sensitivity, improving their performance by providing predictions with an interpretable decision-making process.
no code implementations • 17 Jul 2024 • Mingqing Xiao, Qingyan Meng, Zongpeng Zhang, Di He, Zhouchen Lin
Despite the efforts of some online training methods, tackling spatial credit assignment with alternatives that perform comparably to spatial BP remains a significant problem.
no code implementations • 3 Jul 2024 • Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zhi Zhang, Di He
Simply encoding the edited subsequence and integrating it into the original KV cache runs into the temporal confusion problem, leading to significantly worse performance.
1 code implementation • 24 Jun 2024 • Tianlang Chen, Shengjie Luo, Di He, Shuxin Zheng, Tie-Yan Liu, LiWei Wang
We argue that there is a strong need for a general and flexible framework for learning both invariant and equivariant features.
1 code implementation • 27 May 2024 • Mingqing Xiao, Yixin Zhu, Di He, Zhouchen Lin
Spiking neural networks (SNNs) are investigated as biologically inspired models of neural computation, distinguished by their computational capability and energy efficiency, which stem from precise spike timing and sparse, event-driven computation.
no code implementations • 29 Apr 2024 • Han Zhong, Guhao Feng, Wei Xiong, Xinle Cheng, Li Zhao, Di He, Jiang Bian, LiWei Wang
For its practical implementation, RTO innovatively integrates Direct Preference Optimization (DPO) and PPO.
no code implementations • 6 Mar 2024 • Weihao Jiang, Guodong Liu, Di He, Kun He
However, as a non-end-to-end training method in which the meta-training stage can only begin after pre-training is complete, Meta-Baseline suffers from higher training cost and suboptimal performance due to the inherent conflict between the two training stages.
no code implementations • 21 Feb 2024 • Kai Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, LiWei Wang
Our results show that while these models are expressive enough to solve general DP tasks, contrary to expectations, they require a model size that scales with the problem size.
1 code implementation • 19 Feb 2024 • Mingqing Xiao, Qingyan Meng, Zongpeng Zhang, Di He, Zhouchen Lin
Neuromorphic computing with spiking neural networks is promising for energy-efficient artificial intelligence (AI) applications.
no code implementations • 15 Feb 2024 • Ruichen Li, Chuwei Wang, Haotian Ye, Di He, LiWei Wang
Solving partial differential equations (PDEs) efficiently is essential for analyzing complex physical systems.
1 code implementation • 29 Jan 2024 • Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, LiWei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He
In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE).
no code implementations • 26 Jan 2024 • Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran
We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM).
no code implementations • 17 Jan 2024 • Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow
Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands.
1 code implementation • 16 Jan 2024 • Bohang Zhang, Jingchu Gai, Yiheng Du, Qiwei Ye, Di He, LiWei Wang
Specifically, we identify a fundamental expressivity measure termed homomorphism expressivity, which quantifies the ability of GNN models to count graphs under homomorphism.
no code implementations • 8 Jan 2024 • Qingsi Lai, Lin Yao, Zhifeng Gao, Siyuan Liu, Hongshuai Wang, Shuqi Lu, Di He, LiWei Wang, Cheng Wang, Guolin Ke
XtalNet represents a significant advance in CSP, enabling the prediction of complex structures from PXRD data without the need for external databases or manual intervention.
1 code implementation • 14 Nov 2023 • Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D. Lee, Di He
We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation.
no code implementations • 23 Sep 2023 • Pengyun Yue, Hanzhen Zhao, Cong Fang, Di He, LiWei Wang, Zhouchen Lin, Song-Chun Zhu
With distributed machine learning being a prominent technique for large-scale machine learning tasks, communication complexity has become a major bottleneck for speeding up training and scaling up the number of machines.
2 code implementations • 17 Jul 2023 • Ruichen Li, Haotian Ye, Du Jiang, Xuelan Wen, Chuwei Wang, Zhe Li, Xiang Li, Di He, Ji Chen, Weiluo Ren, LiWei Wang
Neural network-based variational Monte Carlo (NN-VMC) has emerged as a promising cutting-edge technique for ab initio quantum chemistry.
no code implementations • NeurIPS 2023 • Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, LiWei Wang
By using circuit complexity theory, we first give impossibility results showing that bounded-depth Transformers are unable to directly produce correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with respect to the input length.
no code implementations • 23 May 2023 • Andreas Schwarz, Di He, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow
Streaming Automatic Speech Recognition (ASR) in voice assistants can utilize prefetching to partially hide the latency of response generation.
no code implementations • 23 Mar 2023 • Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh
In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid-search.
1 code implementation • 16 Mar 2023 • Shuqi Lu, Zhifeng Gao, Di He, Linfeng Zhang, Guolin Ke
Uni-Mol+ first generates a raw 3D molecule conformation using inexpensive methods such as RDKit.
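As a minimal sketch of what such an inexpensive raw conformation looks like in practice with RDKit (the SMILES string below is an arbitrary example, not taken from the paper):

```python
# Hedged sketch: generate a cheap raw 3D conformation with RDKit
# (ETKDG distance-geometry embedding plus a force-field refinement).
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, purely illustrative
mol = Chem.AddHs(mol)                               # add explicit hydrogens
AllChem.EmbedMolecule(mol, randomSeed=42)           # distance-geometry embedding
AllChem.MMFFOptimizeMolecule(mol)                   # cheap force-field optimization

conf = mol.GetConformer()
coords = [(p.x, p.y, p.z) for p in
          (conf.GetAtomPosition(i) for i in range(mol.GetNumAtoms()))]
print(len(coords), "atoms with raw 3D coordinates")
```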
1 code implementation • 14 Feb 2023 • Bohang Zhang, Guhao Feng, Yiheng Du, Di He, LiWei Wang
Recently, subgraph GNNs have emerged as an important direction for developing expressive graph neural networks (GNNs).
Ranked #1 on Subgraph Counting - C6 on Synthetic Graph
no code implementations • 12 Feb 2023 • Shuqi Lu, Lin Yao, Xi Chen, Hang Zheng, Di He, Guolin Ke
Extensive experiment results on pocket-based molecular generation demonstrate that VD-Gen can generate novel 3D molecules to fill the target pocket cavity with high binding affinities, significantly outperforming previous baselines.
no code implementations • 3 Feb 2023 • Krzysztof Marcin Choromanski, Shanda Li, Valerii Likhosherstov, Kumar Avinava Dubey, Shengjie Luo, Di He, Yiming Yang, Tamas Sarlos, Thomas Weingarten, Adrian Weller
We propose a new class of linear Transformers called FourierLearner-Transformers (FLTs), which incorporate a wide range of relative positional encoding mechanisms (RPEs).
1 code implementation • 23 Jan 2023 • Bohang Zhang, Shengjie Luo, LiWei Wang, Di He
In this paper, we take a fundamentally different perspective to study the expressive power of GNNs beyond the WL test.
3 code implementations • CVPR 2023 • Haiyang Wang, Chen Shi, Shaoshuai Shi, Meng Lei, Sen Wang, Di He, Bernt Schiele, LiWei Wang
However, due to the sparse characteristics of point clouds, it is non-trivial to apply a standard transformer on sparse points.
Ranked #1 on 3D Object Detection on waymo cyclist
no code implementations • 28 Oct 2022 • Ligen Shi, Chang Liu, Di He, Xing Zhao, Jun Qiu
A major challenge for matching-based depth estimation is to prevent mismatches in occlusion and smooth regions.
1 code implementation • 10 Oct 2022 • Quanlin Wu, Hang Ye, Yuntian Gu, Huishuai Zhang, LiWei Wang, Di He
In this paper, we propose a new self-supervised method, which is called Denoising Masked AutoEncoders (DMAE), for learning certified robust classifiers of images.
1 code implementation • 9 Oct 2022 • Mingqing Xiao, Qingyan Meng, Zongpeng Zhang, Di He, Zhouchen Lin
OTTT connects, for the first time, two mainstream supervised SNN training methods, BPTT with SG and spike representation-based training, while remaining in a biologically plausible form.
Ranked #3 on Event data classification on CIFAR10-DVS
1 code implementation • 4 Oct 2022 • Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, LiWei Wang, Di He
To achieve this goal, in this work, we develop a novel Transformer-based Molecular model called Transformer-M, which can take molecular data of 2D or 3D formats as input and generate meaningful semantic representations.
Ranked #4 on Graph Regression on PCQM4Mv2-LSC
1 code implementation • 4 Oct 2022 • Bohang Zhang, Du Jiang, Di He, LiWei Wang
Designing neural networks with bounded Lipschitz constant is a promising way to obtain certifiably robust classifiers against adversarial examples.
no code implementations • 9 Jun 2022 • Huishuai Zhang, Da Yu, Yiping Lu, Di He
Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks.
1 code implementation • 4 Jun 2022 • Chuwei Wang, Shanda Li, Di He, LiWei Wang
In particular, we leverage the concept of stability from the partial differential equation literature to study the asymptotic behavior of the learned solution as the loss approaches zero.
1 code implementation • 26 May 2022 • Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, LiWei Wang, Di He
Extensive experiments covering typical architectures and tasks demonstrate that our model is parameter-efficient and can achieve superior performance to strong baselines in a wide range of applications.
no code implementations • 13 Apr 2022 • Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.
3 code implementations • 9 Mar 2022 • Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu
This technical note describes the recent updates of Graphormer, including architecture design modifications and the adaptation to 3D molecular dynamics simulation.
no code implementations • 28 Feb 2022 • Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu
This technical note describes the recent updates of Graphormer, including architecture design modifications and the adaptation to 3D molecular dynamics simulation.
no code implementations • 22 Feb 2022 • Jinhan Wang, Xiaosu Tong, Jinxi Guo, Di He, Roland Maas
Results show that the proposed method can achieve a 20% relative computation cost reduction on Librispeech and the Microsoft Speech Language Translation long-form corpus while maintaining WER performance, compared to the best-performing overlapping inference algorithm.
1 code implementation • 18 Feb 2022 • Di He, Shanda Li, Wenlei Shi, Xiaotian Gao, Jia Zhang, Jiang Bian, LiWei Wang, Tie-Yan Liu
In this work, we develop a novel approach that can significantly accelerate the training of Physics-Informed Neural Networks.
1 code implementation • 16 Feb 2022 • Rui Li, Jianan Zhao, Chaozhuo Li, Di He, Yiqi Wang, Yuming Liu, Hao Sun, Senzhang Wang, Weiwei Deng, Yanming Shen, Xing Xie, Qi Zhang
The effectiveness of knowledge graph embedding (KGE) largely depends on the ability to model intrinsic relation patterns and mapping properties.
no code implementations • NeurIPS 2021 • Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu
Our key insight for utilizing the Transformer on graphs is the necessity of effectively encoding the structural information of a graph into the model.
no code implementations • 23 Nov 2021 • Tian Cai, Li Xie, Muge Chen, Yang Liu, Di He, Shuo Zhang, Cameron Mura, Philip E. Bourne, Lei Xie
Advances in biomedicine are largely fueled by exploring uncharted territories of human biology.
no code implementations • 2 Nov 2021 • Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh
Several recent studies have demonstrated that attention-based networks, such as Vision Transformer (ViT), can outperform Convolutional Neural Networks (CNNs) on several computer vision tasks without using convolutional layers.
2 code implementations • ICLR 2022 • Bohang Zhang, Du Jiang, Di He, LiWei Wang
Recently, Zhang et al. (2021) developed a new neural network architecture based on $\ell_\infty$-distance functions, which naturally possesses certified $\ell_\infty$ robustness by its construction.
no code implementations • NeurIPS 2021 • Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, LiWei Wang, Tie-Yan Liu
Since in many state-of-the-art models, relative positional encoding is used as default, designing efficient Transformers that can incorporate RPE is appealing.
4 code implementations • 15 Jun 2021 • Chengxuan Ying, Mingqi Yang, Shuxin Zheng, Guolin Ke, Shengjie Luo, Tianle Cai, Chenglin Wu, Yuxin Wang, Yanming Shen, Di He
In this technical report, we present our solution of KDD Cup 2021 OGB Large-Scale Challenge - PCQM4M-LSC Track.
4 code implementations • 9 Jun 2021 • Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu
Our key insight for utilizing the Transformer on graphs is the necessity of effectively encoding the structural information of a graph into the model.
Ranked #1 on Graph Regression on PCQM4M-LSC
1 code implementation • CVPR 2022 • Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen, Jun Zhu, Tie-Yan Liu
Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones.
1 code implementation • 10 May 2021 • Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu
Inspired by this, we propose a novel program semantics learning paradigm, that the model should learn from information composed of (1) the representations which align well with the fundamental operations in operational semantics, and (2) the information of environment transition, which is indispensable for program understanding.
no code implementations • 9 Mar 2021 • Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas
However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations, in a way similar to a VQ-VAE model.
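A minimal sketch of such a reconstruction-based regularizer, assuming the overall wav2vec 2.0 loss is already available; the module and weight names (ConsistencyDecoder, recon_weight) are illustrative and not taken from the paper:

```python
# Hedged sketch: VQ-VAE-style consistency regularizer that reconstructs the
# input features from the quantized representations.
import torch.nn as nn

class ConsistencyDecoder(nn.Module):
    def __init__(self, codebook_dim=256, feat_dim=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(codebook_dim, 512), nn.ReLU(), nn.Linear(512, feat_dim)
        )

    def forward(self, quantized):        # (batch, time, codebook_dim)
        return self.net(quantized)       # reconstructed input features

def total_loss(w2v_loss, quantized, input_feats, decoder, recon_weight=0.1):
    recon_loss = nn.functional.mse_loss(decoder(quantized), input_feats)
    return w2v_loss + recon_weight * recon_loss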
no code implementations • 27 Feb 2021 • Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio
In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which are applied over the entire hidden representation.
no code implementations • 25 Feb 2021 • Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu
In each lazy block, the self-attention distribution is only computed once in the first layer and then is reused in all upper layers.
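A minimal sketch of this attention-reuse idea, with shapes and module names chosen for illustration rather than taken from the authors' implementation:

```python
# Hedged sketch of a "lazy block": the attention map is computed once in the
# first layer of the block and reused by all subsequent layers.
import torch
import torch.nn as nn

class LazyBlock(nn.Module):
    def __init__(self, dim, num_layers):
        super().__init__()
        self.qk = nn.Linear(dim, 2 * dim)
        self.values = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
        self.ffns = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_layers)
        ])

    def forward(self, x):                              # x: (batch, seq, dim)
        q, k = self.qk(x).chunk(2, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        for v_proj, ffn in zip(self.values, self.ffns):
            x = x + attn @ v_proj(x)                   # reuse the same attention map
            x = x + ffn(x)
        return x
```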
1 code implementation • 18 Feb 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, TieYan Liu, Arnold Overwijk
Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.
1 code implementation • 16 Feb 2021 • Shengjie Luo, Kaiyuan Gao, Shuxin Zheng, Guolin Ke, Di He, LiWei Wang, Tie-Yan Liu
The language embedding can be either added to the word embedding or attached at the beginning of the sentence.
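A minimal sketch of the two options described above; tensor shapes and embedding sizes are illustrative assumptions:

```python
# Hedged sketch: injecting a language embedding either (a) by adding it to
# every word embedding or (b) by attaching it at the beginning of the sentence.
import torch
import torch.nn as nn

vocab_size, num_langs, dim = 32000, 8, 512
word_emb = nn.Embedding(vocab_size, dim)
lang_emb = nn.Embedding(num_langs, dim)

tokens = torch.randint(0, vocab_size, (2, 10))       # (batch, seq)
lang_ids = torch.tensor([0, 3])                       # one language id per sentence

x = word_emb(tokens)                                  # (batch, seq, dim)
lang = lang_emb(lang_ids)                             # (batch, dim)

added = x + lang.unsqueeze(1)                         # option (a): add to every position
prepended = torch.cat([lang.unsqueeze(1), x], dim=1)  # option (b): prepend as a token
```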
2 code implementations • 10 Feb 2021 • Bohang Zhang, Tianle Cai, Zhou Lu, Di He, LiWei Wang
This directly provides a rigorous guarantee of the certified robustness based on the margin of prediction outputs.
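A worked version of the standard margin argument that such a guarantee typically relies on, under the assumption that each logit $f_j$ is 1-Lipschitz with respect to the $\ell_\infty$ norm (notation is ours, not the paper's):

```latex
% Margin-based certificate for a 1-Lipschitz (in \ell_\infty) classifier f.
\text{margin}(x) \;=\; f_y(x) - \max_{j \neq y} f_j(x).
% For any perturbation \delta with \|\delta\|_\infty \le \epsilon,
f_y(x+\delta) - f_j(x+\delta)
\;\ge\; \bigl(f_y(x) - \epsilon\bigr) - \bigl(f_j(x) + \epsilon\bigr)
\;=\; f_y(x) - f_j(x) - 2\epsilon,
% so the prediction is certified to be unchanged whenever
\text{margin}(x) \;>\; 2\epsilon .
```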
1 code implementation • 31 Jan 2021 • Di He, Lei Xie
Thus, CODE-AE provides a useful framework to take advantage of in vitro omics data for developing generalized patient predictive models.
no code implementations • ICLR 2021 • Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu
In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization.
no code implementations • 1 Jan 2021 • Lei Wang, Runtian Zhai, Di He, LiWei Wang, Li Jian
For certification, we carefully allocate specific robust regions for each test sample.
no code implementations • 9 Oct 2020 • Di He, Lei Xie
An unsolved fundamental problem in biology and ecology is to predict observable traits (phenotypes) from a new genetic constitution (genotype) of an organism under environmental perturbations (e.g., drug treatment).
1 code implementation • 7 Sep 2020 • Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Li-Wei Wang
We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets.
Ranked #25 on Graph Property Prediction on ogbg-molhiv
no code implementations • 4 Aug 2020 • Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu
In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization.
no code implementations • 24 Jul 2020 • Yunzhen Feng, Runtian Zhai, Di He, Li-Wei Wang, Bin Dong
Our experiments show that TD can provide fine-grained information for varied downstream tasks, and for the models trained from different initializations, the learned features are not the same in terms of downstream-task predictions.
3 code implementations • ICLR 2021 • Guolin Ke, Di He, Tie-Yan Liu
In this work, we investigate the positional encoding methods used in language pre-training (e.g., BERT) and identify several problems in the existing formulations.
1 code implementation • 10 Jun 2020 • Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Li-Wei Wang, Jiang Bian, Tie-Yan Liu
Pre-trained contextual representations (e.g., BERT) have become the foundation for achieving state-of-the-art results on many NLP tasks.
10 code implementations • ECCV 2020 • Mingqing Xiao, Shuxin Zheng, Chang Liu, Yaolong Wang, Di He, Guolin Ke, Jiang Bian, Zhouchen Lin, Tie-Yan Liu
High-resolution digital images are usually downscaled to fit various display screens or to save storage and bandwidth, while post-upscaling is adopted to recover the original resolution or the details in zoomed-in images.
no code implementations • ICLR Workshop DeepDiffEq 2019 • Yiping Lu*, Zhuohan Li*, Di He, Zhiqing Sun, Bin Dong, Tao Qin, LiWei Wang, Tie-Yan Liu
In particular, how words in a sentence are abstracted into contexts by passing through the layers of the Transformer can be interpreted as approximating the movement of multiple particles in space using the Lie-Trotter splitting scheme and Euler's method.
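A hedged sketch of this splitting view, with notation assumed here rather than taken from the paper: treat the hidden state of word $i$ as a particle $x_i(t)$ driven by an interaction term $F$ (attention-like, coupling all particles) and a particle-wise term $G$ (FFN-like).

```latex
% Multi-particle dynamics and its Lie-Trotter / Euler discretization (sketch).
\frac{d x_i(t)}{dt} \;=\; F\bigl(x_i(t), [x_j(t)]_{j}, t\bigr) \;+\; G\bigl(x_i(t), t\bigr).
% One splitting step, each sub-step solved by Euler's method with step size \gamma:
\tilde{x}_i \;=\; x_i^{\,l} + \gamma\, F\bigl(x_i^{\,l}, [x_j^{\,l}]_{j}, l\bigr), \qquad
x_i^{\,l+1} \;=\; \tilde{x}_i + \gamma\, G\bigl(\tilde{x}_i, l\bigr),
% which mirrors a residual self-attention sub-layer followed by a residual FFN sub-layer.
```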
3 code implementations • ICLR 2020 • Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
While BERT is more commonly used for fine-tuning rather than as a contextual embedding for downstream language understanding tasks, our preliminary exploration in NMT shows that using BERT as a contextual embedding works better than using it for fine-tuning.
8 code implementations • ICML 2020 • Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Li-Wei Wang, Tie-Yan Liu
This motivates us to remove the warm-up stage for the training of Pre-LN Transformers.
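For reference, a minimal sketch of the Pre-LN sub-layer layout this result concerns (LayerNorm applied inside the residual branch, before attention and FFN), with module names chosen for illustration:

```python
# Hedged sketch of a Pre-LN Transformer block (contrast with Post-LN, where
# LayerNorm is applied after the residual addition).
import torch.nn as nn

class PreLNBlock(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.ln1(x)                                      # normalize first ...
        x = x + self.attn(h, h, h, need_weights=False)[0]    # ... then residual add
        x = x + self.ffn(self.ln2(x))
        return x
```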
2 code implementations • ICLR 2020 • Runtian Zhai, Chen Dan, Di He, huan zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Li-Wei Wang
Adversarial training is one of the most popular ways to learn robust models but is usually attack-dependent and time costly.
1 code implementation • 19 Nov 2019 • Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Di He, Li-Wei Wang
Robustness of convolutional neural networks (CNNs) has gained in importance on account of adversarial examples, i.e., inputs with well-designed perturbations that are imperceptible to humans but can cause the model to predict incorrectly.
no code implementations • IJCNLP 2019 • Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
1) We provide a simple approach to mine implicitly bilingual sentence pairs from document pairs which can then be used as supervised training signals.
1 code implementation • NeurIPS 2019 • Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, Zhi-Hong Deng
However, these models assume that the decoding process of each token is conditionally independent of others.
no code implementations • 27 Sep 2019 • Jinchen Xuan, Yunchang Yang, Ze Yang, Di He, Li-Wei Wang
Motivated by this observation, we discover two specific problems of GANs leading to anomalous generalization behaviour, which we refer to as the sample insufficiency and the pixel-wise combination.
no code implementations • 25 Sep 2019 • Tiange Luo, Tianle Cai, Xiaomeng Zhang, Siyu Chen, Di He, LiWei Wang
We first show that predictions made by the defective CNN are less dependent on textural information, but more on shape information, and further find that adversarial examples generated by the defective CNN appear to have semantic shapes.
1 code implementation • IJCNLP 2019 • Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Li-Wei Wang, Tie-Yan Liu
Due to the unparallelizable nature of the autoregressive factorization, AutoRegressive Translation (ART) models have to generate tokens sequentially during decoding and thus suffer from high inference latency.
no code implementations • IJCNLP 2019 • Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, Tie-Yan Liu
We study two methods for language clustering: (1) using prior knowledge, where we cluster languages according to language family, and (2) using language embedding, in which we represent each language by an embedding vector and cluster them in the embedding space.
1 code implementation • ICLR 2019 • Jun Gao, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu
We study an interesting problem in training neural network-based models for natural language generation tasks, which we call the \emph{representation degeneration problem}.
2 code implementations • ICLR 2020 • Yiping Lu, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Li-Wei Wang, Tie-Yan Liu
In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system.
1 code implementation • 3 Jun 2019 • Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, Li-Wei Wang
Neural network robustness has recently been highlighted by the existence of adversarial examples.
no code implementations • 28 May 2019 • Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Li-Wei Wang
First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks.
no code implementations • ICLR 2019 • Zhuohan Li, Di He, Fei Tian, Tao Qin, Li-Wei Wang, Tie-Yan Liu
To improve the accuracy of NART models, in this paper, we propose to leverage the hints from a well-trained ART model to train the NART model.
1 code implementation • ICLR 2019 • Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu
Multilingual machine translation, which translates multiple languages with a single model, has attracted much attention due to its efficiency of offline training and online serving.
no code implementations • 22 Feb 2019 • Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, Tie-Yan Liu
However, the high efficiency has come at the cost of not capturing the sequential dependency on the target side of translation, which causes NAT to suffer from two kinds of translation errors: 1) repeated translations (due to indistinguishable adjacent decoder hidden states), and 2) incomplete translations (due to incomplete transfer of source side information via the decoder hidden states).
no code implementations • 23 Dec 2018 • Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu
Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the decoder inputs, achieve significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models.
no code implementations • 12 Dec 2018 • Chengyue Gong, Xu Tan, Di He, Tao Qin
Maximum-likelihood estimation (MLE) is widely used in sequence to sequence tasks for model training.
no code implementations • NeurIPS 2018 • Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, Tie-Yan Liu
Neural Machine Translation (NMT) has achieved remarkable progress with the quick evolvement of model structures.
no code implementations • 5 Nov 2018 • Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark Hasegawa-Johnson, Deming Chen
In this paper, the convergence properties of CTC are improved by incorporating acoustic landmarks.
no code implementations • 27 Sep 2018 • Xufang Luo, Qi Meng, Di He, Wei Chen, Yunhong Wang, Tie-Yan Liu
Based on our observations, we formally define expressiveness of the state extractor as the rank of the matrix composed by representations.
1 code implementation • 21 Sep 2018 • Di He
Geo-tags from micro-blog posts have been shown to be useful in many data mining applications.
2 code implementations • NeurIPS 2018 • Chengyue Gong, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu
Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks.
Ranked #3 on Machine Translation on IWSLT2015 German-English
no code implementations • EMNLP 2018 • Lijun Wu, Xu Tan, Di He, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
Many previous works have discussed the relationship between error propagation and the \emph{accuracy drop} (i.e., the left part of the translated sentence is often better than its right part in left-to-right decoding models) problem.
1 code implementation • COLING 2018 • Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, Tie-Yan Liu
In this work we propose Double Path Networks for Sequence to Sequence learning (DPN-S2S), which leverage the advantages of both models by using double path information fusion.
1 code implementation • ICML 2018 • Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Li-Wei Wang, Tie-Yan Liu
Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling.
1 code implementation • NAACL 2018 • Yanyao Shen, Xu Tan, Di He, Tao Qin, Tie-Yan Liu
Recently, neural machine translation has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework.
Ranked #68 on Machine Translation on WMT2014 English-German
no code implementations • 15 May 2018 • Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen
Furui first demonstrated that the identities of both consonants and vowels can be perceived from the C-V transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception, and that steady-state regions are secondary or supplemental.
no code implementations • NeurIPS 2017 • Di He, Hanqing Lu, Yingce Xia, Tao Qin, Li-Wei Wang, Tie-Yan Liu
Inspired by the success and methodology of AlphaGo, in this paper we propose using a prediction network to improve beam search, which takes the source sentence $x$, the currently available decoding output $y_1,\cdots, y_{t-1}$ and a candidate word $w$ at step $t$ as inputs and predicts the long-term value (e.g., BLEU score) of the partial target sentence if it is completed by the NMT model.
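A minimal sketch of how such a predicted long-term value could be combined with the standard decoding score when ranking beam candidates; `nmt_log_prob` and `value_net` are hypothetical helpers, and the interpolation weight is an assumption:

```python
# Hedged sketch: re-rank beam candidates with a learned value network by mixing
# the NMT log-probability with the predicted long-term value (e.g., BLEU).
def score_candidates(x, prefix, candidates, nmt_log_prob, value_net, alpha=0.5):
    scored = []
    for w in candidates:
        lp = nmt_log_prob(x, prefix, w)      # standard decoding score
        v = value_net(x, prefix, w)          # predicted long-term value of completing prefix+w
        scored.append((alpha * lp + (1 - alpha) * v, w))
    return sorted(scored, reverse=True)      # keep the best-scored words for the beam
```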
1 code implementation • NeurIPS 2016 • Yingce Xia, Di He, Tao Qin, Li-Wei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma
Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using policy gradient methods).
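A minimal sketch of combining the two feedback signals described above into a single reward; all helper names (`translate_primal`, `lm_log_prob`, `reconstruct_log_prob`) and the mixing weight are hypothetical:

```python
# Hedged sketch: reward from dual learning, combining language-model likelihood
# of the intermediate translation with the reconstruction likelihood of the
# original sentence through the dual model.
def dual_learning_reward(x, translate_primal, lm_log_prob, reconstruct_log_prob, beta=0.5):
    y_mid = translate_primal(x)               # primal translation X -> Y
    r_lm = lm_log_prob(y_mid)                 # language-model likelihood of the output
    r_rec = reconstruct_log_prob(y_mid, x)    # likelihood of recovering x via the dual model
    return beta * r_lm + (1 - beta) * r_rec   # reward used for policy-gradient updates
```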
no code implementations • 7 Apr 2016 • Fei Tian, Bin Gao, Di He, Tie-Yan Liu
We propose Sentence Level Recurrent Topic Model (SLRTM), a new topic model that assumes the generation of each word within a sentence to depend on both the topic of the sentence and the whole history of its preceding words in the sentence.
no code implementations • 3 Jun 2014 • Di He, Wei Chen, Li-Wei Wang, Tie-Yan Liu
Sponsored search is an important monetization channel for search engines, in which an auction mechanism is used to select the ads shown to users and determine the prices charged to advertisers.
no code implementations • 24 Apr 2013 • Yining Wang, Li-Wei Wang, Yuanzhi Li, Di He, Tie-Yan Liu, Wei Chen
We show that NDCG with logarithmic discount has consistent distinguishability although it converges to the same limit for all ranking functions.