Search Results for author: Di He

Found 99 papers, 47 papers with code

Machine Translation With Weakly Paired Bilingual Documents

no code implementations ICLR 2019 Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Xu Tan, Tao Qin, Tie-Yan Liu

Neural machine translation, which achieves near human-level performance in some languages, strongly relies on the availability of large amounts of parallel sentences, which hinders its applicability to low-resource language pairs.

Sentence Translation +1

Finding the Dominant Winning Ticket in Pre-Trained Language Models

no code implementations Findings (ACL) 2022 Zhuocheng Gong, Di He, Yelong Shen, Tie-Yan Liu, Weizhu Chen, Dongyan Zhao, Ji-Rong Wen, Rui Yan

Empirically, we show that (a) the dominant winning ticket can achieve performance that is comparable with that of the full-parameter model, (b) the dominant winning ticket is transferable across different tasks, (c) and the dominant winning ticket has a natural structure within each parameter matrix.

Boosting Meta-Training with Base Class Information for Few-Shot Learning

no code implementations6 Mar 2024 Weihao Jiang, Guodong Liu, Di He, Kun He

However, as a non-end-to-end training method, in which the meta-training stage can only begin after pre-training is complete, Meta-Baseline suffers from a higher training cost and suboptimal performance due to the inherent conflict between the two training stages.

Few-Shot Learning

Do Efficient Transformers Really Save Computation?

no code implementations21 Feb 2024 Kai Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, LiWei Wang

Our results show that while these models are expressive enough to solve general DP tasks, contrary to expectations, they require a model size that scales with the problem size.

Hebbian Learning based Orthogonal Projection for Continual Learning of Spiking Neural Networks

1 code implementation19 Feb 2024 Mingqing Xiao, Qingyan Meng, Zongpeng Zhang, Di He, Zhouchen Lin

Neuromorphic computing with spiking neural networks is promising for energy-efficient artificial intelligence (AI) applications.

Continual Learning

DOF: Accelerating High-order Differential Operators with Forward Propagation

no code implementations15 Feb 2024 Ruichen Li, Chuwei Wang, Haotian Ye, Di He, LiWei Wang

Solving partial differential equations (PDEs) efficiently is essential for analyzing complex physical systems.

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

no code implementations29 Jan 2024 Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Di He, Jingjing Xu, Zhi Zhang, Hongxia Yang, LiWei Wang

In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE).

Disentanglement Position
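
As a rough illustration of the bilevel idea named in the BiPE abstract, the sketch below assigns each token an intra-segment position that restarts at every segment boundary and an inter-segment position that indexes the segment itself. The function name, the delimiter choice, and the way the two levels would be turned into actual positional encodings are assumptions made for illustration, not the paper's formulation.

```python
# A rough sketch of two-level position ids (intra-segment and inter-segment),
# assuming segments are delimited by a period token; all names and choices
# here are illustrative, not BiPE's exact construction.
def bilevel_position_ids(tokens, segment_delimiter="."):
    """Return (intra_segment, inter_segment) position ids for each token."""
    intra, inter = [], []
    segment_idx, pos_in_segment = 0, 0
    for tok in tokens:
        intra.append(pos_in_segment)
        inter.append(segment_idx)
        pos_in_segment += 1
        if tok == segment_delimiter:   # a delimiter closes the current segment
            segment_idx += 1
            pos_in_segment = 0
    return intra, inter

tokens = ["The", "cat", "sat", ".", "It", "slept", "."]
print(bilevel_position_ids(tokens))
# ([0, 1, 2, 3, 0, 1, 2], [0, 0, 0, 0, 1, 1, 1])
```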

Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

no code implementations26 Jan 2024 Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM).

Language Modelling Large Language Model

Beyond Weisfeiler-Lehman: A Quantitative Framework for GNN Expressiveness

1 code implementation16 Jan 2024 Bohang Zhang, Jingchu Gai, Yiheng Du, Qiwei Ye, Di He, LiWei Wang

Specifically, we identify a fundamental expressivity measure termed homomorphism expressivity, which quantifies the ability of GNN models to count graphs under homomorphism.

Graph Learning Subgraph Counting

End-to-End Crystal Structure Prediction from Powder X-Ray Diffraction

no code implementations8 Jan 2024 Qingsi Lai, Lin Yao, Zhifeng Gao, Siyuan Liu, Hongshuai Wang, Shuqi Lu, Di He, LiWei Wang, Cheng Wang, Guolin Ke

To validate the effectiveness of XtalNet, we curate a much more challenging and practical dataset, hMOF-100. XtalNet performs well on this dataset, reaching a 96.3% top-10 hit ratio on the database retrieval task and a 95.0% top-10 match rate on the ranked structure generation task.

Contrastive Learning Retrieval

REST: Retrieval-Based Speculative Decoding

1 code implementation14 Nov 2023 Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D Lee, Di He

We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation.

Language Modelling Retrieval +1
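
The sketch below shows the general retrieve-then-verify pattern that retrieval-based speculative decoding follows: draft tokens are fetched from a datastore keyed by the recent context, and the target model keeps only the prefix of the draft it agrees with. The datastore construction, the exact-match retrieval rule, and the greedy acceptance criterion are simplified assumptions for illustration, not REST's actual algorithm.

```python
# Simplified retrieve-then-verify loop; datastore layout, retrieval rule, and
# acceptance criterion are illustrative assumptions.
def build_datastore(corpus_tokens, context_len=2):
    """Map every length-`context_len` context to the tokens that followed it."""
    store = {}
    for i in range(len(corpus_tokens) - context_len):
        key = tuple(corpus_tokens[i:i + context_len])
        store.setdefault(key, []).append(corpus_tokens[i + context_len])
    return store

def draft_and_verify(prefix, store, target_model, context_len=2, draft_len=3):
    """Retrieve a draft continuation, then keep what the target model confirms."""
    draft, current = [], list(prefix)
    for _ in range(draft_len):
        followers = store.get(tuple(current[-context_len:]))
        if not followers:
            break
        draft.append(followers[0])
        current.append(followers[0])
    accepted = []
    for token in draft:                          # single verification pass
        if target_model(list(prefix) + accepted) == token:
            accepted.append(token)
        else:
            break
    return accepted

corpus = "a b c d a b c e a b c d".split()
store = build_datastore(corpus)
toy_model = lambda ctx: {"b": "c", "c": "d"}.get(ctx[-1], "a")  # stand-in LM
print(draft_and_verify(["a", "b"], store, toy_model))            # ['c', 'd', 'a']
```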

CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity

no code implementations23 Sep 2023 Pengyun Yue, Hanzhen Zhao, Cong Fang, Di He, LiWei Wang, Zhouchen Lin, Song-Chun Zhu

With distributed machine learning being a prominent technique for large-scale machine learning tasks, communication complexity has become a major bottleneck for speeding up training and scaling up the number of machines.

Distributed Optimization

Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective

no code implementations NeurIPS 2023 Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, LiWei Wang

By using circuit complexity theory, we first give impossibility results showing that bounded-depth Transformers are unable to directly produce correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with respect to the input length.

Decision Making Math

Personalized Predictive ASR for Latency Reduction in Voice Assistants

no code implementations23 May 2023 Andreas Schwarz, Di He, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow

Streaming Automatic Speech Recognition (ASR) in voice assistants can utilize prefetching to partially hide the latency of response generation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Adaptive Endpointing with Deep Contextual Multi-armed Bandits

no code implementations23 Mar 2023 Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh

In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid-search.

Multi-Armed Bandits

3D Molecular Generation via Virtual Dynamics

no code implementations12 Feb 2023 Shuqi Lu, Lin Yao, Xi Chen, Hang Zheng, Di He, Guolin Ke

Extensive experiment results on pocket-based molecular generation demonstrate that VD-Gen can generate novel 3D molecules to fill the target pocket cavity with high binding affinities, significantly outperforming previous baselines.

Drug Discovery

Rethinking the Expressive Power of GNNs via Graph Biconnectivity

1 code implementation23 Jan 2023 Bohang Zhang, Shengjie Luo, LiWei Wang, Di He

In this paper, we take a fundamentally different perspective to study the expressive power of GNNs beyond the WL test.

Matching entropy based disparity estimation from light field

no code implementations28 Oct 2022 Ligen Shi, Chang Liu, Di He, Xing Zhao, Jun Qiu

A major challenge for matching-based depth estimation is to prevent mismatches in occlusion and smooth regions.

Depth Estimation Disparity Estimation

Denoising Masked AutoEncoders Help Robust Classification

1 code implementation10 Oct 2022 Quanlin Wu, Hang Ye, Yuntian Gu, Huishuai Zhang, LiWei Wang, Di He

In this paper, we propose a new self-supervised method, which is called Denoising Masked AutoEncoders (DMAE), for learning certified robust classifiers of images.

Classification Denoising +1

Online Training Through Time for Spiking Neural Networks

1 code implementation9 Oct 2022 Mingqing Xiao, Qingyan Meng, Zongpeng Zhang, Di He, Zhouchen Lin

With OTTT, two mainstream supervised SNN training methods, BPTT with SG and spike representation-based training, are connected for the first time, and in a biologically plausible form.

Event data classification Gesture Recognition +1

Rethinking Lipschitz Neural Networks and Certified Robustness: A Boolean Function Perspective

1 code implementation4 Oct 2022 Bohang Zhang, Du Jiang, Di He, LiWei Wang

Designing neural networks with bounded Lipschitz constant is a promising way to obtain certifiably robust classifiers against adversarial examples.

Robust classification

One Transformer Can Understand Both 2D & 3D Molecular Data

1 code implementation4 Oct 2022 Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, LiWei Wang, Di He

To achieve this goal, in this work, we develop a novel Transformer-based Molecular model called Transformer-M, which can take molecular data of 2D or 3D formats as input and generate meaningful semantic representations.

Graph Regression molecular representation +1

Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks

no code implementations9 Jun 2022 Huishuai Zhang, Da Yu, Yiping Lu, Di He

Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks.

Is $L^2$ Physics-Informed Loss Always Suitable for Training Physics-Informed Neural Network?

1 code implementation4 Jun 2022 Chuwei Wang, Shanda Li, Di He, LiWei Wang

In particular, we leverage the concept of stability in the literature of partial differential equation to study the asymptotic behavior of the learned solution as the loss approaches zero.

Your Transformer May Not be as Powerful as You Expect

1 code implementation26 May 2022 Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, LiWei Wang, Di He

Extensive experiments covering typical architectures and tasks demonstrate that our model is parameter-efficient and can achieve superior performance to strong baselines in a wide range of applications.

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

no code implementations13 Apr 2022 Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao

We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.

Denoising

Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets

3 code implementations9 Mar 2022 Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu

This technical note describes the recent updates of Graphormer, including architecture design modifications and the adaptation to 3D molecular dynamics simulation.

Benchmarking Graph Regression +1

An Empirical Study of Graphormer on Large-Scale Molecular Modeling Datasets

no code implementations28 Feb 2022 Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu

This technical note describes the recent updates of Graphormer, including architecture design modifications and the adaptation to 3D molecular dynamics simulation.

VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition

no code implementations22 Feb 2022 Jinhan Wang, Xiaosu Tong, Jinxi Guo, Di He, Roland Maas

Results show that the proposed method can achieve a 20% relative computation cost reduction on the Librispeech and Microsoft Speech Language Translation long-form corpora while maintaining WER performance, compared to the best-performing overlapping inference algorithm.

Action Detection Activity Detection +3

Learning Physics-Informed Neural Networks without Stacked Back-propagation

1 code implementation18 Feb 2022 Di He, Shanda Li, Wenlei Shi, Xiaotian Gao, Jia Zhang, Jiang Bian, LiWei Wang, Tie-Yan Liu

In this work, we develop a novel approach that can significantly accelerate the training of Physics-Informed Neural Networks.

HousE: Knowledge Graph Embedding with Householder Parameterization

1 code implementation16 Feb 2022 Rui Li, Jianan Zhao, Chaozhuo Li, Di He, Yiqi Wang, Yuming Liu, Hao Sun, Senzhang Wang, Weiwei Deng, Yanming Shen, Xing Xie, Qi Zhang

The effectiveness of knowledge graph embedding (KGE) largely depends on the ability to model intrinsic relation patterns and mapping properties.

Knowledge Graph Embedding Relation +1

Do Transformers Really Perform Badly for Graph Representation?

no code implementations NeurIPS 2021 Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.

Graph Representation Learning

Can Vision Transformers Perform Convolution?

no code implementations2 Nov 2021 Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh

Several recent studies have demonstrated that attention-based networks, such as Vision Transformer (ViT), can outperform Convolutional Neural Networks (CNNs) on several computer vision tasks without using convolutional layers.

Boosting the Certified Robustness of L-infinity Distance Nets

2 code implementations ICLR 2022 Bohang Zhang, Du Jiang, Di He, LiWei Wang

Recently, Zhang et al. (2021) developed a new neural network architecture based on $\ell_\infty$-distance functions, which naturally possesses certified $\ell_\infty$ robustness by its construction.

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

no code implementations NeurIPS 2021 Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, LiWei Wang, Tie-Yan Liu

Since in many state-of-the-art models, relative positional encoding is used as default, designing efficient Transformers that can incorporate RPE is appealing.

First Place Solution of KDD Cup 2021 & OGB Large-Scale Challenge Graph Prediction Track

4 code implementations15 Jun 2021 Chengxuan Ying, Mingqi Yang, Shuxin Zheng, Guolin Ke, Shengjie Luo, Tianle Cai, Chenglin Wu, Yuxin Wang, Yanming Shen, Di He

In this technical report, we present our solution of KDD Cup 2021 OGB Large-Scale Challenge - PCQM4M-LSC Track.

Do Transformers Really Perform Bad for Graph Representation?

4 code implementations9 Jun 2021 Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.

Graph Classification Graph Property Prediction +2

Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart

1 code implementation CVPR 2022 Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen, Jun Zhu, Tie-Yan Liu

Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones.

Vocal Bursts Valence Prediction

How could Neural Networks understand Programs?

1 code implementation10 May 2021 Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

Inspired by this, we propose a novel program semantics learning paradigm, that the model should learn from information composed of (1) the representations which align well with the fundamental operations in operational semantics, and (2) the information of environment transition, which is indispensable for program understanding.

valid

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations9 Mar 2021 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations in a way similar to a VQ-VAE model.

Quantization Representation Learning +1

Transformers with Competitive Ensembles of Independent Mechanisms

no code implementations27 Feb 2021 Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio

In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which are applied over the entire hidden representation.

Speech Enhancement

LazyFormer: Self Attention with Lazy Update

no code implementations25 Feb 2021 Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu

In each lazy block, the self-attention distribution is only computed once in the first layer and then is reused in all upper layers.
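
A minimal sketch of that lazy-update idea, assuming single-head softmax attention and random weights purely for illustration: within one block, only the first layer computes the attention distribution, and the layers above reuse it.

```python
import numpy as np

# Only the first layer of a "lazy block" computes self-attention; upper layers
# reuse it. Single-head attention and random weights are illustrative only.
def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lazy_block(x, layer_weights):
    d = x.shape[-1]
    attn = None
    for Wq, Wk, Wv in layer_weights:
        if attn is None:                           # only the first layer pays for QK^T
            attn = softmax((x @ Wq) @ (x @ Wk).T / np.sqrt(d))
        x = attn @ (x @ Wv)                        # upper layers reuse `attn`
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                        # (tokens, hidden)
layer_weights = [tuple(rng.normal(size=(8, 8)) for _ in range(3)) for _ in range(3)]
print(lazy_block(x, layer_weights).shape)          # (5, 8)
```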

Revisiting Language Encoding in Learning Multilingual Representations

1 code implementation16 Feb 2021 Shengjie Luo, Kaiyuan Gao, Shuxin Zheng, Guolin Ke, Di He, LiWei Wang, Tie-Yan Liu

The language embedding can be either added to the word embedding or attached at the beginning of the sentence.

Sentence Word Embeddings
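
The two placements contrasted in the abstract can be illustrated in a few lines; the dimensions and random vectors below are placeholders, and the encoding the paper itself proposes is not reproduced here.

```python
import numpy as np

# Two ways a language embedding can enter a multilingual encoder: added to
# every word embedding, or attached as an extra token at the start of the
# sentence. Shapes and values are illustrative stand-ins.
rng = np.random.default_rng(0)
word_emb = rng.normal(size=(6, 32))                # (sentence_len, hidden)
lang_emb = rng.normal(size=(32,))                  # embedding of one language id

added = word_emb + lang_emb                        # option 1: add at every position
prepended = np.vstack([lang_emb[None, :], word_emb])  # option 2: prepend as a token

print(added.shape, prepended.shape)                # (6, 32) (7, 32)
```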

Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons

2 code implementations10 Feb 2021 Bohang Zhang, Tianle Cai, Zhou Lu, Di He, LiWei Wang

This directly provides a rigorous guarantee of the certified robustness based on the margin of prediction outputs.

CODE-AE: A Coherent De-confounding Autoencoder for Predicting Patient-Specific Drug Response From Cell Line Transcriptomics

1 code implementation31 Jan 2021 Di He, Lei Xie

Thus, CODE-AE provides a useful framework to take advantage of in vitro omics data for developing generalized patient predictive models.

Transfer Learning

Pretrain-to-Finetune Adversarial Training via Sample-wise Randomized Smoothing

no code implementations1 Jan 2021 Lei Wang, Runtian Zhai, Di He, LiWei Wang, Li Jian

For certification, we carefully allocate specific robust regions for each test sample.

Taking Notes on the Fly Helps Language Pre-Training

no code implementations ICLR 2021 Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization.

Sentence

A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine

no code implementations9 Oct 2020 Di He, Lei Xie

An unsolved fundamental problem in biology and ecology is to predict observable traits (phenotypes) from a new genetic constitution (genotype) of an organism under environmental perturbations (e.g., drug treatment).

Domain Adaptation Representation Learning

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

1 code implementation7 Sep 2020 Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Li-Wei Wang

We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets.

Graph Classification Graph Representation Learning
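
For reference, per-graph ("instance") normalization of node features, the operation the abstract analyzes as a preconditioner, can be sketched as below; GraphNorm itself additionally learns a scale on the subtracted mean, which this sketch omits.

```python
import numpy as np

# Plain per-graph normalization of node features over the node dimension;
# the learnable components of GraphNorm are intentionally left out.
def graph_instance_norm(node_feats, eps=1e-5):
    """Normalize the node features of a single graph over its nodes."""
    mean = node_feats.mean(axis=0, keepdims=True)
    std = node_feats.std(axis=0, keepdims=True)
    return (node_feats - mean) / (std + eps)

rng = np.random.default_rng(0)
graph = rng.normal(loc=3.0, scale=2.0, size=(7, 16))      # 7 nodes, 16 features
print(graph_instance_norm(graph).mean(axis=0).round(6))   # ~0 for every feature
```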

Taking Notes on the Fly Helps BERT Pre-training

no code implementations4 Aug 2020 Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization.

Sentence

Transferred Discrepancy: Quantifying the Difference Between Representations

no code implementations24 Jul 2020 Yunzhen Feng, Runtian Zhai, Di He, Li-Wei Wang, Bin Dong

Our experiments show that TD can provide fine-grained information for varied downstream tasks, and for the models trained from different initializations, the learned features are not the same in terms of downstream-task predictions.

Rethinking Positional Encoding in Language Pre-training

3 code implementations ICLR 2021 Guolin Ke, Di He, Tie-Yan Liu

In this work, we investigate the positional encoding methods used in language pre-training (e.g., BERT) and identify several problems in the existing formulations.

Natural Language Understanding Sentence +1

MC-BERT: Efficient Language Pre-Training via a Meta Controller

1 code implementation10 Jun 2020 Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Li-Wei Wang, Jiang Bian, Tie-Yan Liu

Pre-trained contextual representations (e.g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks.

Binary Classification Cloze Test +4

Invertible Image Rescaling

10 code implementations ECCV 2020 Mingqing Xiao, Shuxin Zheng, Chang Liu, Yaolong Wang, Di He, Guolin Ke, Jiang Bian, Zhouchen Lin, Tie-Yan Liu

High-resolution digital images are usually downscaled to fit various display screens or to save storage and bandwidth, while post-upscaling is adopted to recover the original resolution or the details in zoomed-in images.

Image Super-Resolution

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

no code implementations ICLR Workshop DeepDiffEq 2019 Yiping Lu*, Zhuohan Li*, Di He, Zhiqing Sun, Bin Dong, Tao Qin, LiWei Wang, Tie-Yan Liu

In particular, how words in a sentence are abstracted into contexts by passing through the layers of the Transformer can be interpreted as approximating multiple particles' movement in the space using the Lie-Trotter splitting scheme and Euler's method.

Sentence

Incorporating BERT into Neural Machine Translation

3 code implementations ICLR 2020 Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

While BERT is more commonly used for fine-tuning rather than as a contextual embedding for downstream language understanding tasks, in NMT our preliminary exploration shows that using BERT as a contextual embedding is better than using it for fine-tuning.

Natural Language Understanding NMT +5

MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

2 code implementations ICLR 2020 Runtian Zhai, Chen Dan, Di He, huan zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Li-Wei Wang

Adversarial training is one of the most popular ways to learn robust models but is usually attack-dependent and time-consuming.

Defective Convolutional Networks

1 code implementation19 Nov 2019 Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Di He, Li-Wei Wang

Robustness of convolutional neural networks (CNNs) has gained in importance on account of adversarial examples, i.e., inputs with well-designed perturbations that are imperceptible to humans but can cause the model to predict incorrectly.

Machine Translation With Weakly Paired Documents

no code implementations IJCNLP 2019 Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

1) We provide a simple approach to mine implicitly bilingual sentence pairs from document pairs which can then be used as supervised training signals.

Sentence Translation +1

Fast Structured Decoding for Sequence Models

1 code implementation NeurIPS 2019 Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, Zhi-Hong Deng

However, these models assume that the decoding process of each token is conditionally independent of others.

Machine Translation Sentence +1
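
The conditional-independence assumption mentioned in the abstract can be illustrated by a toy decoding step: logits for all target positions are produced in one pass and each position is argmax-ed independently of the others. The logits below are random stand-ins for a real non-autoregressive decoder's output.

```python
import numpy as np

# All target positions are decoded in parallel, each independently of the
# others; logits and vocabulary are illustrative stand-ins.
rng = np.random.default_rng(0)
vocab = ["<pad>", "the", "cat", "sat", "mat"]
logits = rng.normal(size=(4, len(vocab)))          # (target_len, vocab), one pass

print([vocab[i] for i in logits.argmax(axis=1)])   # every position argmax-ed at once
```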

On the Anomalous Generalization of GANs

no code implementations27 Sep 2019 Jinchen Xuan, Yunchang Yang, Ze Yang, Di He, Li-Wei Wang

Motivated by this observation, we discover two specific problems of GANs leading to anomalous generalization behaviour, which we refer to as the sample insufficiency and the pixel-wise combination.

Defective Convolutional Layers Learn Robust CNNs

no code implementations25 Sep 2019 Tiange Luo, Tianle Cai, Xiaomeng Zhang, Siyu Chen, Di He, LiWei Wang

We first show that predictions made by the defective CNN are less dependent on textural information, but more on shape information, and further find that adversarial examples generated by the defective CNN appear to have semantic shapes.

Hint-Based Training for Non-Autoregressive Machine Translation

1 code implementation IJCNLP 2019 Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Due to the unparallelizable nature of the autoregressive factorization, AutoRegressive Translation (ART) models have to generate tokens sequentially during decoding and thus suffer from high inference latency.

Machine Translation Translation

Multilingual Neural Machine Translation with Language Clustering

no code implementations IJCNLP 2019 Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, Tie-Yan Liu

We study two methods for language clustering: (1) using prior knowledge, where we cluster languages according to language family, and (2) using language embedding, in which we represent each language by an embedding vector and cluster them in the embedding space.

Clustering Machine Translation +2

Representation Degeneration Problem in Training Natural Language Generation Models

1 code implementation ICLR 2019 Jun Gao, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu

We study an interesting problem in training neural network-based models for natural language generation tasks, which we call the representation degeneration problem.

Language Modelling Machine Translation +3

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

2 code implementations ICLR 2020 Yiping Lu, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Li-Wei Wang, Tie-Yan Liu

In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system.

Position Sentence

Adversarially Robust Generalization Just Requires More Unlabeled Data

1 code implementation3 Jun 2019 Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, Li-Wei Wang

Neural network robustness has recently been highlighted by the existence of adversarial examples.

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems

no code implementations28 May 2019 Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Li-Wei Wang

First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks.

regression Second-order methods

Hint-based Training for Non-Autoregressive Translation

no code implementations ICLR 2019 Zhuohan Li, Di He, Fei Tian, Tao Qin, Li-Wei Wang, Tie-Yan Liu

To improve the accuracy of NART models, in this paper, we propose to leverage the hints from a well-trained ART model to train the NART model.

Machine Translation Translation

Multilingual Neural Machine Translation with Knowledge Distillation

1 code implementation ICLR 2019 Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu

Multilingual machine translation, which translates multiple languages with a single model, has attracted much attention due to its efficiency of offline training and online serving.

Knowledge Distillation Machine Translation +1

Non-Autoregressive Machine Translation with Auxiliary Regularization

no code implementations22 Feb 2019 Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, Tie-Yan Liu

However, the high efficiency has come at the cost of not capturing the sequential dependency on the target side of translation, which causes NAT to suffer from two kinds of translation errors: 1) repeated translations (due to indistinguishable adjacent decoder hidden states), and 2) incomplete translations (due to incomplete transfer of source side information via the decoder hidden states).

Machine Translation Sentence +1

Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

no code implementations23 Dec 2018 Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu

Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve a significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models.

Machine Translation Sentence +2

When CTC Training Meets Acoustic Landmarks

no code implementations5 Nov 2018 Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark Hasegawa-Johnson, Deming Chen

In this paper, the convergence properties of CTC are improved by incorporating acoustic landmarks.

Automatic Speech Recognition (ASR)

Expressiveness in Deep Reinforcement Learning

no code implementations27 Sep 2018 Xufang Luo, Qi Meng, Di He, Wei Chen, Yunhong Wang, Tie-Yan Liu

Based on our observations, we formally define expressiveness of the state extractor as the rank of the matrix composed of the representations.

Atari Games reinforcement-learning +2

Augmenting Input Method Language Model with user Location Type Information

1 code implementation21 Sep 2018 Di He

Geo-tags from micro-blog posts have been shown to be useful in many data mining applications.

Social and Information Networks Computers and Society

FRAGE: Frequency-Agnostic Word Representation

2 code implementations NeurIPS 2018 Chengyue Gong, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks.

Language Modelling Machine Translation +5

Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter

no code implementations EMNLP 2018 Lijun Wu, Xu Tan, Di He, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

Many previous works have discussed the relationship between error propagation and the accuracy drop problem (i.e., the left part of the translated sentence is often better than its right part in left-to-right decoding models).

Machine Translation Sentence +2

Double Path Networks for Sequence to Sequence Learning

1 code implementation COLING 2018 Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, Tie-Yan Liu

In this work we propose Double Path Networks for Sequence to Sequence learning (DPN-S2S), which leverage the advantages of both models by using double path information fusion.

Towards Binary-Valued Gates for Robust LSTM Training

1 code implementation ICML 2018 Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling.

Dense Information Flow for Neural Machine Translation

1 code implementation NAACL 2018 Yanyao Shen, Xu Tan, Di He, Tao Qin, Tie-Yan Liu

Recently, neural machine translation has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework.

Machine Translation NMT +1

Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks

no code implementations15 May 2018 Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen

Furui first demonstrated that the identity of both consonant and vowel can be perceived from the C-V transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception, and that steady-state regions are secondary or supplemental.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Decoding with Value Networks for Neural Machine Translation

no code implementations NeurIPS 2017 Di He, Hanqing Lu, Yingce Xia, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Inspired by the success and methodology of AlphaGo, in this paper we propose using a prediction network to improve beam search, which takes the source sentence $x$, the currently available decoding output $y_1,\cdots, y_{t-1}$ and a candidate word $w$ at step $t$ as inputs and predicts the long-term value (e.g., BLEU score) of the partial target sentence if it is completed by the NMT model.

Machine Translation NMT +2
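
A toy sketch of the scoring idea described in the abstract: candidate words are ranked by mixing the decoder's log-probability with a value estimate of the partial hypothesis. The value network is replaced by a stand-in function, and the mixing rule is an assumption rather than the paper's exact scoring formula.

```python
import math

# Rank beam-search candidates by alpha * log_prob + (1 - alpha) * value;
# the value function and mixing rule are illustrative assumptions.
def rank_candidates(partial_target, candidates, value_fn, alpha=0.5):
    """Score (word, log_prob) candidates with a mixed log-prob/value criterion."""
    scored = []
    for word, log_p in candidates:
        hypothesis = partial_target + [word]
        score = alpha * log_p + (1 - alpha) * value_fn(hypothesis)
        scored.append((round(score, 3), word))
    return sorted(scored, reverse=True)

candidates = [("cat", math.log(0.6)), ("dog", math.log(0.3)), (".", math.log(0.1))]
value_fn = lambda hyp: 1.0 if hyp[-1] == "." else 0.2     # toy long-term value
print(rank_candidates(["the"], candidates, value_fn))
```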

Dual Learning for Machine Translation

1 code implementation NeurIPS 2016 Yingce Xia, Di He, Tao Qin, Li-Wei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma

Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using the policy gradient methods).

Language Modelling Machine Translation +4
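
The round-trip feedback signal described in that abstract can be sketched as follows, with toy stand-ins for the primal model, dual model, and language model; the real method trains actual NMT models on such a reward with policy gradients.

```python
# Translate forward, translate back, and reward the pair by the fluency of the
# intermediate output plus how well the original sentence is reconstructed.
# All three "models" are toy stand-ins for illustration only.
forward_model = lambda words: words[::-1]                 # "translate" by reversing
backward_model = lambda words: words[::-1]                # dual model reverses back
lm_score = lambda words: 1.0 / (1 + abs(len(words) - 4))  # toy fluency score

def dual_feedback(sentence, alpha=0.5):
    translated = forward_model(sentence)
    reconstructed = backward_model(translated)
    recon = sum(a == b for a, b in zip(sentence, reconstructed)) / len(sentence)
    return alpha * lm_score(translated) + (1 - alpha) * recon

print(dual_feedback("je vous aime beaucoup".split()))     # 1.0 for a perfect round trip
```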

Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves

no code implementations7 Apr 2016 Fei Tian, Bin Gao, Di He, Tie-Yan Liu

We propose Sentence Level Recurrent Topic Model (SLRTM), a new topic model that assumes the generation of each word within a sentence to depend on both the topic of the sentence and the whole history of its preceding words in the sentence.

Sentence Short-Text Conversation +1

A Game-theoretic Machine Learning Approach for Revenue Maximization in Sponsored Search

no code implementations3 Jun 2014 Di He, Wei Chen, Li-Wei Wang, Tie-Yan Liu

Sponsored search is an important monetization channel for search engines, in which an auction mechanism is used to select the ads shown to users and determine the prices charged from advertisers.

BIG-bench Machine Learning Bilevel Optimization

A Theoretical Analysis of NDCG Type Ranking Measures

no code implementations24 Apr 2013 Yining Wang, Li-Wei Wang, Yuanzhi Li, Di He, Tie-Yan Liu, Wei Chen

We show that NDCG with logarithmic discount has consistent distinguishability although it converges to the same limit for all ranking functions.

Vocal Bursts Type Prediction
