Search Results for author: Di He

Found 70 papers, 31 papers with code

Machine Translation With Weakly Paired Bilingual Documents

no code implementations ICLR 2019 Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Xu Tan, Tao Qin, Tie-Yan Liu

Neural machine translation, which achieves near human-level performance in some languages, strongly relies on the availability of large amounts of parallel sentences, which hinders its applicability to low-resource language pairs.

Translation Unsupervised Machine Translation

Finding the Dominant Winning Ticket in Pre-Trained Language Models

no code implementations Findings (ACL) 2022 Zhuocheng Gong, Di He, Yelong Shen, Tie-Yan Liu, Weizhu Chen, Dongyan Zhao, Ji-Rong Wen, Rui Yan

Empirically, we show that (a) the dominant winning ticket can achieve performance comparable to that of the full-parameter model, (b) the dominant winning ticket is transferable across different tasks, and (c) the dominant winning ticket has a natural structure within each parameter matrix.

Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets

1 code implementation 9 Mar 2022 Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu

This technical note describes recent updates to Graphormer, including architecture design modifications and the adaptation to 3D molecular dynamics simulation.

An Empirical Study of Graphormer on Large-Scale Molecular Modeling Datasets

no code implementations 28 Feb 2022 Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu

This technical note describes recent updates to Graphormer, including architecture design modifications and the adaptation to 3D molecular dynamics simulation.

VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition

no code implementations 22 Feb 2022 Jinhan Wang, Xiaosu Tong, Jinxi Guo, Di He, Roland Maas

Results show that, compared to the best-performing overlapping inference algorithm, the proposed method achieves a 20% relative reduction in computation cost on Librispeech and the Microsoft Speech Language Translation long-form corpus while maintaining WER.

Action Detection Activity Detection +1

Learning Physics-Informed Neural Networks without Stacked Back-propagation

no code implementations 18 Feb 2022 Di He, Wenlei Shi, Shanda Li, Xiaotian Gao, Jia Zhang, Jiang Bian, LiWei Wang, Tie-Yan Liu

Physics-Informed Neural Networks (PINNs) have become a commonly used machine learning approach to solving partial differential equations (PDEs).

HousE: Knowledge Graph Embedding with Householder Parameterization

1 code implementation 16 Feb 2022 Rui Li, Jianan Zhao, Chaozhuo Li, Di He, Yiqi Wang, Yuming Liu, Hao Sun, Senzhang Wang, Weiwei Deng, Yanming Shen, Xing Xie, Qi Zhang

The effectiveness of knowledge graph embedding (KGE) largely depends on the ability to model intrinsic relation patterns and mapping properties.

Knowledge Graph Embedding

Do Transformers Really Perform Badly for Graph Representation?

no code implementations NeurIPS 2021 Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight for utilizing the Transformer on graphs is that the structural information of a graph must be effectively encoded into the model.

Graph Representation Learning

Can Vision Transformers Perform Convolution?

no code implementations 2 Nov 2021 Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh

Several recent studies have demonstrated that attention-based networks, such as Vision Transformer (ViT), can outperform Convolutional Neural Networks (CNNs) on several computer vision tasks without using convolutional layers.

Boosting the Certified Robustness of L-infinity Distance Nets

2 code implementations ICLR 2022 Bohang Zhang, Du Jiang, Di He, LiWei Wang

Recently, Zhang et al. (2021) developed a new neural network architecture based on $\ell_\infty$-distance functions, which naturally possesses certified $\ell_\infty$ robustness by its construction.

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

no code implementations NeurIPS 2021 Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, LiWei Wang, Tie-Yan Liu

Since relative positional encoding (RPE) is used by default in many state-of-the-art models, designing efficient Transformers that can incorporate RPE is appealing.

First Place Solution of KDD Cup 2021 & OGB Large-Scale Challenge Graph Prediction Track

4 code implementations 15 Jun 2021 Chengxuan Ying, Mingqi Yang, Shuxin Zheng, Guolin Ke, Shengjie Luo, Tianle Cai, Chenglin Wu, Yuxin Wang, Yanming Shen, Di He

In this technical report, we present our solution to the KDD Cup 2021 OGB Large-Scale Challenge PCQM4M-LSC Track.

Do Transformers Really Perform Bad for Graph Representation?

4 code implementations 9 Jun 2021 Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight for utilizing the Transformer on graphs is that the structural information of a graph must be effectively encoded into the model.

Graph Classification Graph Regression +1

Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart

1 code implementation 31 May 2021 Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen, Jun Zhu, Tie-Yan Liu

Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones.

How could Neural Networks understand Programs?

1 code implementation 10 May 2021 Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

Inspired by this, we propose a novel program semantics learning paradigm in which the model learns from information composed of (1) representations that align well with the fundamental operations in operational semantics, and (2) environment transition information, which is indispensable for program understanding.

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations 9 Mar 2021 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations, in a way similar to a VQ-VAE model.

Quantization Representation Learning +1

Transformers with Competitive Ensembles of Independent Mechanisms

no code implementations 27 Feb 2021 Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio

In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which are applied over the entire hidden representation.

Speech Enhancement

LazyFormer: Self Attention with Lazy Update

no code implementations 25 Feb 2021 Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu

In each lazy block, the self-attention distribution is only computed once in the first layer and then is reused in all upper layers.
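The lazy-update idea can be illustrated with a small plain-Python sketch. This is a hypothetical illustration, not the authors' implementation: the helper names (`lazy_block`, `attention_probs`) and the weight layout are assumptions, and each layer is reduced to a residual attention update with its own value projection (output projections, feed-forward sublayers and layer norm are omitted). The point it shows is that the softmax attention matrix is computed once from the block input and then reused by every layer in the block.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(scores: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in scores:
        peak = max(row)
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_probs(x: Matrix, w_q: Matrix, w_k: Matrix) -> Matrix:
    """Scaled dot-product attention distribution softmax(Q K^T / sqrt(d))."""
    q, k = matmul(x, w_q), matmul(x, w_k)
    scale = math.sqrt(len(w_k[0]))
    k_t = [list(col) for col in zip(*k)]
    return softmax_rows([[s / scale for s in row] for row in matmul(q, k_t)])


def lazy_block(x: Matrix, w_q: Matrix, w_k: Matrix,
               value_weights: List[Matrix]) -> Tuple[Matrix, Matrix]:
    """Hypothetical lazy block: attention computed once, reused by all layers."""
    probs = attention_probs(x, w_q, w_k)  # computed only in the first layer
    h = x
    for w_v in value_weights:  # upper layers reuse `probs` unchanged
        update = matmul(probs, matmul(h, w_v))
        # residual connection: h <- h + A (h W_v)
        h = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(h, update)]
    return h, probs
```

Because `probs` depends only on the block input, the query/key projections and the softmax are evaluated once per block rather than once per layer, which is where the claimed savings would come from.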

Revisiting Language Encoding in Learning Multilingual Representations

1 code implementation 16 Feb 2021 Shengjie Luo, Kaiyuan Gao, Shuxin Zheng, Guolin Ke, Di He, LiWei Wang, Tie-Yan Liu

The language embedding can be either added to the word embedding or attached at the beginning of the sentence.

Word Embeddings

Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons

2 code implementations 10 Feb 2021 Bohang Zhang, Tianle Cai, Zhou Lu, Di He, LiWei Wang

This directly provides a rigorous guarantee of the certified robustness based on the margin of prediction outputs.

CODE-AE: A Coherent De-confounding Autoencoder for Predicting Patient-Specific Drug Response From Cell Line Transcriptomics

1 code implementation 31 Jan 2021 Di He, Lei Xie

Thus, CODE-AE provides a useful framework to take advantage of in vitro omics data for developing generalized patient predictive models.

Transfer Learning

Pretrain-to-Finetune Adversarial Training via Sample-wise Randomized Smoothing

no code implementations 1 Jan 2021 Lei Wang, Runtian Zhai, Di He, LiWei Wang, Li Jian

For certification, we carefully allocate specific robust regions for each test sample.

Taking Notes on the Fly Helps Language Pre-Training

no code implementations ICLR 2021 Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

In this paper, we focus on improving the efficiency of language pre-training methods through better data utilization.

A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine

no code implementations 9 Oct 2020 Di He, Lei Xie

An unsolved fundamental problem in biology and ecology is to predict observable traits (phenotypes) from a new genetic constitution (genotype) of an organism under environmental perturbations (e.g., drug treatment).

Domain Adaptation Representation Learning

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

1 code implementation 7 Sep 2020 Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Li-Wei Wang

We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets.

Graph Classification Graph Representation Learning

Taking Notes on the Fly Helps BERT Pre-training

no code implementations 4 Aug 2020 Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

In this paper, we focus on improving the efficiency of language pre-training methods through better data utilization.

Transferred Discrepancy: Quantifying the Difference Between Representations

no code implementations 24 Jul 2020 Yunzhen Feng, Runtian Zhai, Di He, Li-Wei Wang, Bin Dong

Our experiments show that TD can provide fine-grained information for varied downstream tasks, and for the models trained from different initializations, the learned features are not the same in terms of downstream-task predictions.

Rethinking Positional Encoding in Language Pre-training

2 code implementations ICLR 2021 Guolin Ke, Di He, Tie-Yan Liu

In this work, we investigate the positional encoding methods used in language pre-training (e.g., BERT) and identify several problems in the existing formulations.

Natural Language Understanding Word Embeddings

MC-BERT: Efficient Language Pre-Training via a Meta Controller

1 code implementation 10 Jun 2020 Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Li-Wei Wang, Jiang Bian, Tie-Yan Liu

Pre-trained contextual representations (e.g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks.

Cloze Test Language Modelling +3

Invertible Image Rescaling

3 code implementations ECCV 2020 Mingqing Xiao, Shuxin Zheng, Chang Liu, Yaolong Wang, Di He, Guolin Ke, Jiang Bian, Zhouchen Lin, Tie-Yan Liu

High-resolution digital images are usually downscaled to fit various display screens or to save storage and bandwidth costs, while post-upscaling is adopted to recover the original resolutions or the details in zoomed-in images.

Image Super-Resolution

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

no code implementations ICLR Workshop DeepDiffEq 2019 Yiping Lu*, Zhuohan Li*, Di He, Zhiqing Sun, Bin Dong, Tao Qin, LiWei Wang, Tie-Yan Liu

In particular, how words in a sentence are abstracted into contexts by passing through the layers of the Transformer can be interpreted as approximating multiple particles' movement in space using the Lie-Trotter splitting scheme and Euler's method.

Incorporating BERT into Neural Machine Translation

3 code implementations ICLR 2020 Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

While BERT is more commonly used for fine-tuning than as a contextual embedding in downstream language understanding tasks, our preliminary exploration in NMT shows that using BERT as a contextual embedding works better than fine-tuning it.

Natural Language Understanding Reading Comprehension +3

MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

2 code implementations ICLR 2020 Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Li-Wei Wang

Adversarial training is one of the most popular ways to learn robust models, but it is usually attack-dependent and time-consuming.

Defective Convolutional Networks

1 code implementation 19 Nov 2019 Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Di He, Li-Wei Wang

Robustness of convolutional neural networks (CNNs) has gained in importance on account of adversarial examples, i.e., inputs with well-designed added perturbations that are imperceptible to humans but can cause the model to predict incorrectly.

Machine Translation With Weakly Paired Documents

no code implementations IJCNLP 2019 Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

1) We provide a simple approach to mine implicit bilingual sentence pairs from document pairs, which can then be used as supervised training signals.

Translation Unsupervised Machine Translation

Fast Structured Decoding for Sequence Models

1 code implementation NeurIPS 2019 Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, Zhi-Hong Deng

However, these models assume that the decoding process of each token is conditionally independent of others.

Machine Translation Translation

On the Anomalous Generalization of GANs

no code implementations 27 Sep 2019 Jinchen Xuan, Yunchang Yang, Ze Yang, Di He, Li-Wei Wang

Motivated by this observation, we discover two specific problems of GANs leading to anomalous generalization behaviour, which we refer to as the sample insufficiency and the pixel-wise combination.

Defective Convolutional Layers Learn Robust CNNs

no code implementations 25 Sep 2019 Tiange Luo, Tianle Cai, Xiaomeng Zhang, Siyu Chen, Di He, LiWei Wang

We first show that predictions made by the defective CNN are less dependent on textural information, but more on shape information, and further find that adversarial examples generated by the defective CNN appear to have semantic shapes.

Hint-Based Training for Non-Autoregressive Machine Translation

1 code implementation IJCNLP 2019 Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Due to the unparallelizable nature of the autoregressive factorization, AutoRegressive Translation (ART) models have to generate tokens sequentially during decoding and thus suffer from high inference latency.

Machine Translation Translation

Multilingual Neural Machine Translation with Language Clustering

no code implementations IJCNLP 2019 Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, Tie-Yan Liu

We study two methods for language clustering: (1) using prior knowledge, where we cluster languages according to language family, and (2) using language embedding, in which we represent each language by an embedding vector and cluster them in the embedding space.

Machine Translation Translation

Representation Degeneration Problem in Training Natural Language Generation Models

no code implementations ICLR 2019 Jun Gao, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu

We study an interesting problem in training neural network-based models for natural language generation tasks, which we call the "representation degeneration problem".

Language Modelling Machine Translation +3

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

2 code implementations ICLR 2020 Yiping Lu, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Li-Wei Wang, Tie-Yan Liu

In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system.

Adversarially Robust Generalization Just Requires More Unlabeled Data

1 code implementation 3 Jun 2019 Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, Li-Wei Wang

Neural network robustness has recently been highlighted by the existence of adversarial examples.

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems

no code implementations 28 May 2019 Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Li-Wei Wang

First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithms for training deep neural networks.

Hint-based Training for Non-Autoregressive Translation

no code implementations ICLR 2019 Zhuohan Li, Di He, Fei Tian, Tao Qin, Li-Wei Wang, Tie-Yan Liu

To improve the accuracy of NART models, in this paper, we propose to leverage the hints from a well-trained ART model to train the NART model.

Machine Translation Translation

Multilingual Neural Machine Translation with Knowledge Distillation

1 code implementation ICLR 2019 Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu

Multilingual machine translation, which translates multiple languages with a single model, has attracted much attention due to its efficiency of offline training and online serving.

Knowledge Distillation Machine Translation +1

Non-Autoregressive Machine Translation with Auxiliary Regularization

no code implementations 22 Feb 2019 Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, Tie-Yan Liu

However, the high efficiency has come at the cost of not capturing the sequential dependency on the target side of translation, which causes NAT to suffer from two kinds of translation errors: 1) repeated translations (due to indistinguishable adjacent decoder hidden states), and 2) incomplete translations (due to incomplete transfer of source side information via the decoder hidden states).

Machine Translation Translation

Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

no code implementations 23 Dec 2018 Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu

Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the decoder inputs, achieve significant inference speedup but at the cost of lower accuracy than autoregressive translation (AT) models.

Machine Translation Translation +1

When CTC Training Meets Acoustic Landmarks

no code implementations 5 Nov 2018 Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark Hasegawa-Johnson, Deming Chen

In this paper, the convergence properties of CTC are improved by incorporating acoustic landmarks.

Automatic Speech Recognition

Expressiveness in Deep Reinforcement Learning

no code implementations 27 Sep 2018 Xufang Luo, Qi Meng, Di He, Wei Chen, Yunhong Wang, Tie-Yan Liu

Based on our observations, we formally define the expressiveness of the state extractor as the rank of the matrix composed of representations.

Atari Games reinforcement-learning +1
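To make the rank-based definition concrete, here is a minimal sketch that stacks the representations of a set of states as matrix rows and returns the rank of that matrix. The helper names `matrix_rank` and `expressiveness` are hypothetical, and exact Gaussian elimination over rationals is used as an assumption; the paper may well rely on a numerical or approximate rank estimate instead.

```python
from fractions import Fraction
from typing import Callable, Iterable, Sequence


def matrix_rank(rows: Sequence[Sequence[float]]) -> int:
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def expressiveness(state_extractor: Callable[[object], Sequence[float]],
                   states: Iterable[object]) -> int:
    """Rank of the matrix whose rows are the extracted state representations."""
    return matrix_rank([state_extractor(s) for s in states])
```

For example, an extractor that maps every state onto a single direction, such as `lambda s: [s, 2 * s, 0]`, yields rank 1 however many states are fed in, whereas `lambda s: [1, s, s * s]` over three distinct states yields full rank 3.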

Augmenting Input Method Language Model with user Location Type Information

1 code implementation 21 Sep 2018 Di He

Geo-tags from micro-blog posts have been shown to be useful in many data mining applications.

Social and Information Networks Computers and Society

FRAGE: Frequency-Agnostic Word Representation

2 code implementations NeurIPS 2018 Chengyue Gong, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks.

Language Modelling Machine Translation +4

Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter

no code implementations EMNLP 2018 Lijun Wu, Xu Tan, Di He, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

Many previous works have discussed the relationship between error propagation and the "accuracy drop" problem (i.e., the left part of the translated sentence is often better than its right part in left-to-right decoding models).

Machine Translation Text Summarization +1

Double Path Networks for Sequence to Sequence Learning

1 code implementation COLING 2018 Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, Tie-Yan Liu

In this work we propose Double Path Networks for Sequence to Sequence learning (DPN-S2S), which leverage the advantages of both models by using double path information fusion.

Towards Binary-Valued Gates for Robust LSTM Training

1 code implementation ICML 2018 Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling.

Dense Information Flow for Neural Machine Translation

1 code implementation NAACL 2018 Yanyao Shen, Xu Tan, Di He, Tao Qin, Tie-Yan Liu

Recently, neural machine translation has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework.

Machine Translation Translation

Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks

no code implementations 15 May 2018 Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen

Furui first demonstrated that the identity of both consonant and vowel can be perceived from the C-V transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception, and that steady-state regions are secondary or supplemental.

Automatic Speech Recognition Multi-Task Learning

Decoding with Value Networks for Neural Machine Translation

no code implementations NeurIPS 2017 Di He, Hanqing Lu, Yingce Xia, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Inspired by the success and methodology of AlphaGo, in this paper we propose using a prediction network to improve beam search, which takes the source sentence $x$, the currently available decoding output $y_1,\cdots, y_{t-1}$ and a candidate word $w$ at step $t$ as inputs and predicts the long-term value (e.g., BLEU score) of the partial target sentence if it is completed by the NMT model.

Machine Translation Translation

Dual Learning for Machine Translation

1 code implementation NeurIPS 2016 Yingce Xia, Di He, Tao Qin, Li-Wei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma

Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using policy gradient methods).

Language Modelling Machine Translation +2

Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves

no code implementations 7 Apr 2016 Fei Tian, Bin Gao, Di He, Tie-Yan Liu

We propose Sentence Level Recurrent Topic Model (SLRTM), a new topic model that assumes the generation of each word within a sentence to depend on both the topic of the sentence and the whole history of its preceding words in the sentence.

Short-Text Conversation Topic Models

A Game-theoretic Machine Learning Approach for Revenue Maximization in Sponsored Search

no code implementations 3 Jun 2014 Di He, Wei Chen, Li-Wei Wang, Tie-Yan Liu

Sponsored search is an important monetization channel for search engines, in which an auction mechanism is used to select the ads shown to users and to determine the prices charged to advertisers.

Bilevel Optimization

A Theoretical Analysis of NDCG Type Ranking Measures

no code implementations 24 Apr 2013 Yining Wang, Li-Wei Wang, Yuanzhi Li, Di He, Tie-Yan Liu, Wei Chen

We show that NDCG with logarithmic discount has consistent distinguishability although it converges to the same limit for all ranking functions.
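For reference, NDCG with a logarithmic discount can be computed as in the sketch below. It assumes the common exponential gain 2^rel - 1 and the discount 1/log2(i + 1) for 1-indexed position i; these choices, and the function names, are illustrative assumptions, since the paper's title indicates it studies a broader family of NDCG-type measures.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain with a logarithmic position discount.

    Position i (1-indexed) contributes (2**rel_i - 1) / log2(i + 1);
    the exponential gain is an assumed, commonly used choice.
    """
    return sum(
        (2.0 ** rel - 1.0) / math.log2(i + 1)
        for i, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal ranking.

    `relevances` lists graded relevance labels in the order a ranking
    function placed the documents; k=None uses the full list.
    """
    ranked = list(relevances)[:k]
    ideal = sorted(relevances, reverse=True)[:k]
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0
```

A perfectly ordered list such as `[3, 2, 1]` scores exactly 1.0, while the reversed order `[1, 2, 3]` scores strictly less; returning 0.0 when every label is zero is a convention chosen here to avoid dividing by zero.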
