Search Results for author: Li Dong

Found 114 papers, 63 papers with code

Pseudo-Masked Language Models for Unified Language Model Pre-Training

1 code implementation ICML 2020 Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).

Language Modelling Natural Language Understanding +1

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

no code implementations27 Feb 2024 Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs).

Towards Optimal Learning of Language Models

no code implementations27 Feb 2024 Yuxian Gu, Li Dong, Yaru Hao, Qingxiu Dong, Minlie Huang, Furu Wei

This work studies the general principles of improving the learning of language models (LMs), which aims at reducing the necessary training steps for achieving superior performance.

Cross-target Stance Detection by Exploiting Target Analytical Perspectives

no code implementations3 Jan 2024 Daijun Ding, Rong Chen, Liwen Jing, BoWen Zhang, Xu Huang, Li Dong, Xiaowen Zhao, Ge Song

In this paper, we propose a Multi-Perspective Prompt-Tuning (MPPT) model for CTSD that uses the analysis perspective as a bridge to transfer knowledge.

Language Modelling Large Language Model +1

Large Language Model Enhanced Multi-Agent Systems for 6G Communications

no code implementations13 Dec 2023 Feibo Jiang, Li Dong, Yubo Peng, Kezhi Wang, Kun Yang, Cunhua Pan, Dusit Niyato, Octavia A. Dobre

The rapid development of the Large Language Model (LLM) presents huge opportunities for 6G communications, e. g., network optimization and management by allowing users to input task requirements to LLMs by nature language.

Language Modelling Large Language Model +2

BioCLIP: A Vision Foundation Model for the Tree of Life

1 code implementation30 Nov 2023 Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su

We then develop BioCLIP, a foundation model for the tree of life, leveraging the unique properties of biology captured by TreeOfLife-10M, namely the abundance and variety of images of plants, animals, and fungi, together with the availability of rich structured biological knowledge.

BitNet: Scaling 1-bit Transformers for Large Language Models

2 code implementations17 Oct 2023 Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei

The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption.

Language Modelling Quantization

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

1 code implementation4 Oct 2023 Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei

Recent advancements in text-to-image (T2I) and vision-language-to-image (VL2I) generation have made significant strides.

Image Generation

Large Language Model for Science: A Study on P vs. NP

1 code implementation11 Sep 2023 Qingxiu Dong, Li Dong, Ke Xu, Guangyan Zhou, Yaru Hao, Zhifang Sui, Furu Wei

In this work, we use large language models (LLMs) to augment and accelerate research on the P versus NP problem, one of the most important open problems in theoretical computer science and mathematics.

Language Modelling Large Language Model

Large AI Model Empowered Multimodal Semantic Communications

no code implementations3 Sep 2023 Feibo Jiang, Yubo Peng, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan, Xiaohu You

To this end, we propose a Large AI Model-based Multimodal SC (LAM-MSC) framework, in which we first present the MLM-based Multimodal Alignment (MMA) that utilizes the MLM to enable the transformation between multimodal and unimodal data while preserving semantic consistency.

Language Modelling Large Language Model

LAMBO: Large Language Model Empowered Edge Intelligence

no code implementations29 Aug 2023 Li Dong, Feibo Jiang, Yubo Peng, Kezhi Wang, Kun Yang, Cunhua Pan, Robert Schober

Next-generation edge intelligence is anticipated to bring huge benefits to various applications, e. g., offloading systems.

Active Learning Decision Making +3

Universal Defensive Underpainting Patch: Making Your Text Invisible to Optical Character Recognition

1 code implementation4 Aug 2023 Jiacheng Deng, Li Dong, Jiahao Chen, Diqun Yan, Rangding Wang, Dengpan Ye, Lingchen Zhao, Jinyu Tian

In this work, we propose a novel and effective defense mechanism termed the Universal Defensive Underpainting Patch (UDUP) that modifies the underpainting of text images instead of the characters.

Optical Character Recognition Optical Character Recognition (OCR)

Retentive Network: A Successor to Transformer for Large Language Models

8 code implementations17 Jul 2023 Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance.

Language Modelling

Large AI Model-Based Semantic Communications

no code implementations7 Jul 2023 Feibo Jiang, Yubo Peng, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan, Xiaohu You

Semantic communication (SC) is an emerging intelligent paradigm, offering solutions for various future applications like metaverse, mixed-reality, and the Internet of everything.

Mixed Reality

LongNet: Scaling Transformers to 1,000,000,000 Tokens

3 code implementations5 Jul 2023 Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei

Scaling sequence length has become a critical demand in the era of large language models.

Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion

no code implementations28 Jun 2023 Zhe Ye, Terui Mao, Li Dong, Diqun Yan

This work explores a backdoor attack that utilizes sample-specific triggers based on voice conversion.

Backdoor Attack Voice Conversion

Kosmos-2: Grounding Multimodal Large Language Models to the World

2 code implementations26 Jun 2023 Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e. g., bounding boxes) and grounding text to the visual world.

Image Captioning In-Context Learning +8

Semi-Offline Reinforcement Learning for Optimized Text Generation

1 code implementation16 Jun 2023 Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan

In reinforcement learning (RL), there are two major settings for interacting with the environment: online and offline.

Offline RL reinforcement-learning +2

Knowledge Distillation of Large Language Models

1 code implementation14 Jun 2023 Yuxian Gu, Li Dong, Furu Wei, Minlie Huang

We first replace the forward Kullback-Leibler divergence (KLD) objective in the standard KD approaches with reverse KLD, which is more suitable for KD on generative language models, to prevent the student model from overestimating the low-probability regions of the teacher distribution.

Instruction Following Knowledge Distillation +1

Augmenting Language Models with Long-Term Memory

no code implementations NeurIPS 2023 Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness.

In-Context Learning Language Modelling +1

Pre-Training to Learn in Context

1 code implementation16 May 2023 Yuxian Gu, Li Dong, Furu Wei, Minlie Huang

In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community.

In-Context Learning Language Modelling +3

Language Models as Inductive Reasoners

1 code implementation21 Dec 2022 Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei

To this end, we propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts, and create a dataset termed DEER containing 1. 2k rule-fact pairs for the task, where rules and facts are written in natural language.

Philosophy

Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers

1 code implementation20 Dec 2022 Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei

We comprehensively compare the behaviors of in-context learning and explicit finetuning on real tasks to provide empirical evidence that supports our understanding.

In-Context Learning Open-Ended Question Answering

GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

1 code implementation20 Dec 2022 Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li

Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the ability of language understanding and generation in a single model.

Denoising Sentence +1

Optimizing Prompts for Text-to-Image Generation

2 code implementations NeurIPS 2023 Yaru Hao, Zewen Chi, Li Dong, Furu Wei

Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts.

Language Modelling Prompt Engineering +2

Structured Prompting: Scaling In-Context Learning to 1,000 Examples

1 code implementation13 Dec 2022 Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei

Large language models have exhibited intriguing in-context learning capability, achieving promising zero- and few-shot performance without updating the parameters.

In-Context Learning

Joint Optimization of Deployment and Trajectory in UAV and IRS-Assisted IoT Data Collection System

no code implementations27 Oct 2022 Li Dong, Zhibin Liu, Feibo Jiang, Kezhi Wang

To address this issue, we propose a joint optimization framework of deployment and trajectory (JOLT), where an adaptive whale optimization algorithm (AWOA) is applied to optimize the deployment of the UAV, and an elastic ring self-organizing map (ERSOM) is introduced to optimize the trajectory of the UAV.

Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

no code implementations26 Oct 2022 Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song

In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications.

Representation Learning

A Unified View of Masked Image Modeling

1 code implementation19 Oct 2022 Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei

Masked image modeling has demonstrated great potential to eliminate the label-hungry problem of training large-scale vision Transformers, achieving impressive performance on various downstream tasks.

Image Classification Segmentation +1

Semi-Supervised Semantic Segmentation with Cross Teacher Training

1 code implementation Neurocomputing 2022 Hui Xiao, Li Dong, Kangkang Song, Hao Xu, Shuibo Fu, Diqun Yan, Chengbin Peng

In experiments, the cross-teacher module significantly improves the performance of traditional student-teacher approaches, and our framework outperforms stateof-the-art methods on benchmark datasets.

Contrastive Learning Semi-Supervised Semantic Segmentation

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

2 code implementations12 Aug 2022 Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei

The large-size BEiT v2 obtains 87. 3% top-1 accuracy for ImageNet-1K (224 size) fine-tuning, and 56. 7% mIoU on ADE20K for semantic segmentation.

Image Classification Knowledge Distillation +2

Detecting and Recovering Adversarial Examples from Extracting Non-robust and Highly Predictive Adversarial Perturbations

no code implementations30 Jun 2022 Mingyu Dong, Jiahao Chen, Diqun Yan, Jingxing Gao, Li Dong, Rangding Wang

Experimental results show that the proposed method can not only detect the adversarial examples with high accuracy, but also detect the specific category of the AEs.

Language Models are General-Purpose Interfaces

1 code implementation13 Jun 2022 Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.

Causal Language Modeling Few-Shot Learning +6

VL-BEiT: Generative Vision-Language Pretraining

no code implementations2 Jun 2022 Hangbo Bao, Wenhui Wang, Li Dong, Furu Wei

Our minimalist solution conducts masked prediction on both monomodal and multimodal data with a shared Transformer.

Image Classification Language Modelling +7

THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption

no code implementations Findings (ACL) 2022 Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, JianXin Li, Furu Wei

As more and more pre-trained language models adopt on-cloud deployment, the privacy issues grow quickly, mainly for the exposure of plain-text user data (e. g., search history, medical record, bank account).

Privacy Preserving

Visually-Augmented Language Modeling

1 code implementation20 May 2022 Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

With the visually-augmented context, VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling by attending to both text context and visual knowledge in images.

Image Retrieval Language Modelling +1

Prototypical Calibration for Few-shot Learning of Language Models

1 code implementation20 May 2022 Zhixiong Han, Yaru Hao, Li Dong, Yutao Sun, Furu Wei

In-context learning of GPT-like models has been recognized as fragile across different hand-crafted templates, and demonstration permutations.

Few-Shot Learning In-Context Learning

StableMoE: Stable Routing Strategy for Mixture of Experts

1 code implementation ACL 2022 Damai Dai, Li Dong, Shuming Ma, Bo Zheng, Zhifang Sui, Baobao Chang, Furu Wei

We point out that existing learning-to-route MoE methods suffer from the routing fluctuation issue, i. e., the target expert of the same input may change along with training, but only one expert will be activated for the input during inference.

Language Modelling Machine Translation

CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment

no code implementations ACL 2022 Haoyu Song, Li Dong, Wei-Nan Zhang, Ting Liu, Furu Wei

We first evaluate CLIP's zero-shot performance on a typical visual question answering task and demonstrate a zero-shot cross-modality transfer capability of CLIP on the visual entailment task.

Question Answering Visual Entailment +1

DeepNet: Scaling Transformers to 1,000 Layers

6 code implementations1 Mar 2022 Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei

In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers.

Translation

Controllable Natural Language Generation with Contrastive Prefixes

no code implementations Findings (ACL) 2022 Jing Qian, Li Dong, Yelong Shen, Furu Wei, Weizhu Chen

We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control while the combination of these two methods can achieve multi-aspect control.

Attribute Language Modelling +1

A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models

no code implementations17 Feb 2022 Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, Jianfeng Gao

With the increasing of model capacity brought by pre-trained language models, there emerges boosting needs for more knowledgeable natural language processing (NLP) models with advanced functionalities including providing and making flexible use of encyclopedic and commonsense knowledge.

Language Modelling

Corrupted Image Modeling for Self-Supervised Visual Pre-Training

no code implementations7 Feb 2022 Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei

Given this corrupted image, an enhancer network learns to either recover all the original image pixels, or predict whether each visual token is replaced by a generator sample or not.

Image Classification Semantic Segmentation

Kformer: Knowledge Injection in Transformer Feed-Forward Layers

1 code implementation15 Jan 2022 Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang

In this work, we propose a simple model, Kformer, which takes advantage of the knowledge stored in PTMs and external knowledge via knowledge injection in Transformer FFN layers.

Language Modelling Question Answering

Swin Transformer V2: Scaling Up Capacity and Resolution

19 code implementations CVPR 2022 Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo

Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) A log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) A self-supervised pre-training method, SimMIM, to reduce the needs of vast labeled images.

Ranked #4 on Image Classification on ImageNet V2 (using extra training data)

Action Classification Image Classification +3

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

2 code implementations3 Nov 2021 Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, Furu Wei

We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network.

Image Retrieval Retrieval +3

s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning

1 code implementation26 Oct 2021 Hangbo Bao, Li Dong, Wenhui Wang, Nan Yang, Furu Wei

Pretrained bidirectional Transformers, such as BERT, have achieved significant improvements in a wide variety of language understanding tasks, while it is not straightforward to directly apply them for natural language generation.

Abstractive Text Summarization Question Generation +2

Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training

1 code implementation EMNLP 2021 Bo Zheng, Li Dong, Shaohan Huang, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei

We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity.

Language Modelling

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

2 code implementations25 Jun 2021 Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).

Abstractive Text Summarization Machine Translation +5

Learning to Sample Replacements for ELECTRA Pre-Training

no code implementations Findings (ACL) 2021 Yaru Hao, Li Dong, Hangbo Bao, Ke Xu, Furu Wei

Moreover, we propose to use a focal loss for the generator in order to relieve oversampling of correct tokens as replacements.

Language Modelling Masked Language Modeling

A Semi-supervised Multi-task Learning Approach to Classify Customer Contact Intents

no code implementations ACL (ECNLP) 2021 Li Dong, Matthew C. Spencer, Amir Biagi

We improve the performance significantly by evolving the model from multiclass classification to semi-supervised multi-task learning by leveraging the negative cases, domain- and task-adaptively pretrained ALBERT on customer contact texts, and a number of un-curated data with no labels.

intent-classification Multi-Task Learning +2

MT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs

1 code implementation EMNLP 2021 Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang Xian-Ling Mao, Heyan Huang, Furu Wei

Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive monolingual texts, which has shown promising results on many cross-lingual tasks.

Abstractive Text Summarization Machine Translation +7

Knowledge Neurons in Pretrained Transformers

3 code implementations ACL 2022 Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei

In this paper, we present preliminary studies on how factual knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons.

MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers

2 code implementations Findings (ACL) 2021 Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei

We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-attention relation distillation for task-agnostic compression of pretrained Transformers.

Relation XLM-R

Investigating Learning Dynamics of BERT Fine-Tuning

no code implementations Asian Chapter of the Association for Computational Linguistics 2020 Yaru Hao, Li Dong, Furu Wei, Ke Xu

The recently introduced pre-trained language model BERT advances the state-of-the-art on many NLP tasks through the fine-tuning approach, but few studies investigate how the fine-tuning process improves the model performance on downstream tasks.

Language Modelling

InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

4 code implementations NAACL 2021 Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, He-Yan Huang, Ming Zhou

In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts.

Contrastive Learning Cross-Lingual Transfer +2

Distributed Resource Scheduling for Large-Scale MEC Systems: A Multi-Agent Ensemble Deep Reinforcement Learning with Imitation Acceleration

no code implementations21 May 2020 Feibo Jiang, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan

We consider the optimization of distributed resource scheduling to minimize the sum of task latency and energy consumption for all the Internet of things devices (IoTDs) in a large-scale mobile edge computing (MEC) system.

Decision Making Edge-computing +1

Harvesting and Refining Question-Answer Pairs for Unsupervised QA

1 code implementation ACL 2020 Zhongli Li, Wenhui Wang, Li Dong, Furu Wei, Ke Xu

Our approach outperforms previous unsupervised approaches by a large margin and is competitive with early supervised models.

Few-Shot Learning Question Answering

Self-Attention Attribution: Interpreting Information Interactions Inside Transformer

2 code implementations23 Apr 2020 Yaru Hao, Li Dong, Furu Wei, Ke Xu

The great success of Transformer-based models benefits from the powerful multi-head self-attention mechanism, which learns token dependencies and encodes contextual information from the input.

Attribute

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

4 code implementations ECCV 2020 Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiao-Wei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao

Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.

 Ranked #1 on Image Retrieval on MS COCO (Recall@10 metric)

Image Captioning Image Retrieval +3

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

3 code implementations28 Feb 2020 Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).

Ranked #4 on Question Generation on SQuAD1.1 (using extra training data)

Abstractive Text Summarization Language Modelling +3

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

1 code implementation NeurIPS 2020 Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou

The small model (student) is trained by deeply mimicking the self-attention module, which plays a vital role in Transformer networks, of the large model (teacher).

Zero-shot Text Search

AI Driven Heterogeneous MEC System with UAV Assistance for Dynamic Environment -- Challenges and Solutions

no code implementations11 Feb 2020 Feibo Jiang, Kezhi Wang, Li Dong, Cunhua Pan, Wei Xu, Kun Yang

By taking full advantage of Computing, Communication and Caching (3C) resources at the network edge, Mobile Edge Computing (MEC) is envisioned as one of the key enablers for the next generation networks.

Decision Making Edge-computing +3

Stacked Auto Encoder Based Deep Reinforcement Learning for Online Resource Scheduling in Large-Scale MEC Networks

no code implementations24 Jan 2020 Feibo Jiang, Kezhi Wang, Li Dong, Cunhua Pan, Kun Yang

An online resource scheduling framework is proposed for minimizing the sum of weighted task latency for all the Internet of things (IoT) users, by optimizing offloading decision, transmission power and resource allocation in the large-scale mobile edge computing (MEC) system.

Data Compression Edge-computing +1

Transforming Wikipedia into Augmented Data for Query-Focused Summarization

no code implementations8 Nov 2019 Haichao Zhu, Li Dong, Furu Wei, Bing Qin, Ting Liu

The limited size of existing query-focused summarization datasets renders training data-driven summarization models challenging.

Data Augmentation Query-focused Summarization

Cross-Lingual Natural Language Generation via Pre-Training

1 code implementation23 Sep 2019 Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, He-Yan Huang

In this work we focus on transferring supervision signals of natural language generation (NLG) tasks between multiple languages.

Abstractive Text Summarization Machine Translation +5

Visualizing and Understanding the Effectiveness of BERT

no code implementations IJCNLP 2019 Yaru Hao, Li Dong, Furu Wei, Ke Xu

Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks.

Language Modelling

Learning to Ask Unanswerable Questions for Machine Reading Comprehension

no code implementations ACL 2019 Haichao Zhu, Li Dong, Furu Wei, Wenhui Wang, Bing Qin, Ting Liu

We also present a way to construct training data for our question generation models by leveraging the existing reading comprehension dataset.

Data Augmentation Machine Reading Comprehension +2

Data-to-text Generation with Entity Modeling

2 code implementations ACL 2019 Ratish Puduppully, Li Dong, Mirella Lapata

Recent approaches to data-to-text generation have shown great promise thanks to the use of large-scale datasets and the application of neural network architectures which are trained end-to-end.

Data-to-Text Generation Representation Learning

Unified Language Model Pre-training for Natural Language Understanding and Generation

9 code implementations NeurIPS 2019 Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.

Ranked #2 on Generative Question Answering on CoQA (using extra training data)

Abstractive Text Summarization Document Summarization +7

Multi-Task Learning for Semantic Parsing with Cross-Domain Sketch

no code implementations ICLR 2019 Huan Wang, Yuxiang Hu, Li Dong, Feijun Jiang, Zaiqing Nie

Semantic parsing which maps a natural language sentence into a formal machine-readable representation of its meaning, is highly constrained by the limited annotated training data.

Multi-Task Learning Semantic Parsing +1

An Evaluation of Transfer Learning for Classifying Sales Engagement Emails at Large Scale

no code implementations19 Apr 2019 Yong Liu, Pavel Dmitriev, Yifei HUANG, Andrew Brooks, Li Dong

Our results show that fine-tuning of the BERT model outperforms with as few as 300 labeled samples, but underperforms with fewer than 300 labeled samples, relative to all the feature-based approaches using different embeddings.

Language Modelling Transfer Learning

Data-to-Text Generation with Content Selection and Planning

2 code implementations3 Sep 2018 Ratish Puduppully, Li Dong, Mirella Lapata

Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order.

Data-to-Text Generation Descriptive

Coarse-to-Fine Decoding for Neural Semantic Parsing

2 code implementations ACL 2018 Li Dong, Mirella Lapata

Semantic parsing aims at mapping natural language utterances into structured meaning representations.

Semantic Parsing

Confidence Modeling for Neural Semantic Parsing

1 code implementation ACL 2018 Li Dong, Chris Quirk, Mirella Lapata

In this work we focus on confidence modeling for neural semantic parsers which are built upon sequence-to-sequence models.

Semantic Parsing

Learning to Paraphrase for Question Answering

no code implementations EMNLP 2017 Li Dong, Jonathan Mallinson, Siva Reddy, Mirella Lapata

Question answering (QA) systems are sensitive to the many different ways natural language expresses the same information need.

Question Answering Sentence

Learning to Generate Product Reviews from Attributes

no code implementations EACL 2017 Li Dong, Shaohan Huang, Furu Wei, Mirella Lapata, Ming Zhou, Ke Xu

This paper presents an attention-enhanced attribute-to-sequence model to generate product reviews for given attribute information, such as user, product, and rating.

Attribute Review Generation +2

Proactive Resource Management for LTE in Unlicensed Spectrum: A Deep Learning Perspective

no code implementations22 Feb 2017 Ursula Challita, Li Dong, Walid Saad

LTE in unlicensed spectrum using licensed assisted access LTE (LTE-LAA) is a promising approach to overcome the wireless spectrum scarcity.

Fairness Management

Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction

no code implementations25 May 2016 Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming Zhang, Ming Zhou

In this paper, we develop a novel approach to aspect term extraction based on unsupervised learning of distributed representations of words and dependency paths.

Term Extraction

Language to Logical Form with Neural Attention

5 code implementations ACL 2016 Li Dong, Mirella Lapata

Semantic parsing aims at mapping natural language to machine interpretable meaning representations.

Semantic Parsing

A Statistical Parsing Framework for Sentiment Classification

no code implementations CL 2015 Li Dong, Furu Wei, Shujie Liu, Ming Zhou, Ke Xu

Unlike previous works that employ syntactic parsing results for sentiment analysis, we develop a statistical parser to directly analyze the sentiment structure of a sentence.

Classification General Classification +4

Cannot find the paper you are looking for? You can Submit a new open access paper.