1 code implementation • ICML 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).
no code implementations • 13 Jan 2025 • Chengzu Li, Wenshan Wu, Huanyu Zhang, Yan Xia, Shaoguang Mao, Li Dong, Ivan Vulić, Furu Wei
Ultimately, MVoT establishes new possibilities for complex reasoning tasks where visual thinking can effectively complement verbal reasoning.
no code implementations • 28 Dec 2024 • Li Dong, Yubo Peng, Feibo Jiang, Kezhi Wang, Kun Yang
Therefore, first, we present an eXplainable Semantic Federated Learning (XSFL) to train the SC model, thus ensuring data privacy and security.
1 code implementation • 11 Dec 2024 • Yutao Sun, Hangbo Bao, Wenhui Wang, Zhiliang Peng, Li Dong, Shaohan Huang, Jianyong Wang, Furu Wei
In this work, we propose Latent Language Modeling (LatentLM), which seamlessly integrates continuous and discrete data using causal Transformers.
no code implementations • 10 Dec 2024 • Yao Fu, Yinsicheng Jiang, Yeqi Huang, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, Jilong Xue, Li Dong, Ziming Miao, Kai Zou, Edoardo Ponti, Luo Mai
The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently; however, MoE systems rely on heterogeneous compute and memory resources.
no code implementations • 4 Dec 2024 • Yaoyao Chang, Lei Cui, Li Dong, Shaohan Huang, Yangyu Huang, Yupan Huang, Scarlett Li, Tengchao Lv, Shuming Ma, Qinzheng Sun, Wenhui Wang, Furu Wei, Ying Xin, Mao Yang, Qiufeng Yin, Xingxing Zhang
This study explores the untapped potential of Common Crawl as a comprehensive and flexible resource for pre-training LLMs, addressing both general-purpose language understanding and specialized domain knowledge.
no code implementations • 6 Nov 2024 • Feibo Jiang, Siwei Tu, Li Dong, Cunhua Pan, Jiangzhou Wang, Xiaohu You
The rapid development of generative Artificial Intelligence (AI) continually unveils the potential of Semantic Communication (SemCom).
no code implementations • 6 Nov 2024 • Yuhao He, Jinyu Tian, Xianwei Zheng, Li Dong, Yuanman Li, Jiantao Zhou
We achieve this by ensuring the poisoned model's loss function has a similar value as a normally trained model at each input sample but with a large local curvature.
no code implementations • 11 Oct 2024 • Yubo Peng, Feibo Jiang, Li Dong, Kezhi Wang, Kun Yang
As a result, we propose an explainable personalized FL framework, called XPFL.
1 code implementation • 9 Oct 2024 • Yuxian Gu, Li Dong, Hongning Wang, Yaru Hao, Qingxiu Dong, Furu Wei, Minlie Huang
In our experiments, we adopt PDS to select data from CommmonCrawl and show that the PDS-selected corpus accelerates the learning of LMs and constantly boosts their performance on a wide range of downstream tasks across various model sizes.
no code implementations • 9 Oct 2024 • Qingxiu Dong, Li Dong, Xingxing Zhang, Zhifang Sui, Furu Wei
Through alignment with human preferences, Large Language Models (LLMs) have advanced significantly in generating honest, harmless, and helpful responses.
1 code implementation • 7 Oct 2024 • Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu Wei
Transformer tends to overallocate attention to irrelevant context.
no code implementations • 3 Oct 2024 • Yubo Peng, Feibo Jiang, Li Dong, Kezhi Wang, Kun Yang
Then, to train the GSC model using the local data of MUs while ensuring privacy and accommodating heterogeneous requirements of MUs, we introduce Personalized Semantic Federated Learning (PSFL).
no code implementations • 2 Aug 2024 • Li Dong, Feibo Jiang, Minjie Wang, Yubo Peng, Xiaolong Li
Our goal is to minimize the energy consumption of the MEC system by jointly optimizing UAV locations, IRS phase shift, task offloading, and resource allocation with a variable number of UAVs.
no code implementations • 8 Jul 2024 • Henry Shaowu Yuchi, Shixiang Zhu, Li Dong, Yigit M. Arisoy, Matthew C. Spencer
Modeling and analysis for event series generated by heterogeneous users of various behavioral patterns are closely involved in our daily lives, including credit card fraud detection, online platform user recommendation, and social network analysis.
no code implementations • 28 Jun 2024 • Yixing Li, Yuxian Gu, Li Dong, Dequan Wang, Yu Cheng, Furu Wei
Meanwhile, we prove the value and effectiveness of the introduced implicit reward and output preference in KD through experiments and theoretical analysis.
1 code implementation • 8 May 2024 • Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei
We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once.
1 code implementation • 3 May 2024 • Jiawei Zhou, Li Dong, Furu Wei, Lei Chen
The landscape of information retrieval has broadened from search services to a critical component in various advanced applications, where indexing efficiency, cost-effectiveness, and freshness are increasingly important yet remain less explored.
no code implementations • 26 Apr 2024 • Yuanman Li, Yingjie He, Changsheng chen, Li Dong, Bin Li, Jiantao Zhou, Xia Li
To address these limitations, this study proposes a novel end-to-end CMFD framework that integrates the strengths of conventional and deep learning methods.
no code implementations • 20 Apr 2024 • Feibo Jiang, Li Dong, Siwei Tu, Yubo Peng, Kezhi Wang, Kun Yang, Cunhua Pan, Dusit Niyato
Large Language Models (LLMs) have revolutionized natural language processing tasks.
1 code implementation • 4 Apr 2024 • Wenshan Wu, Shaoguang Mao, Yadong Zhang, Yan Xia, Li Dong, Lei Cui, Furu Wei
Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks.
no code implementations • 2 Apr 2024 • Hui Xiao, Yuting Hong, Li Dong, Diqun Yan, Jiayan Zhuang, Junjie Xiong, Dongtai Liang, Chengbin Peng
Recent semi-supervised semantic segmentation approaches mainly resort to pseudo-labeling methods to exploit unlabeled data.
1 code implementation • 23 Mar 2024 • Daijun Ding, Li Dong, Zhichao Huang, Guangning Xu, Xu Huang, Bo Liu, Liwen Jing, BoWen Zhang
To address these issues, we propose an encoder-decoder data augmentation (EDDA) framework.
no code implementations • 9 Mar 2024 • Feibo Jiang, Yubo Peng, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan, Xiaohu You
Semantic Communication (SC) is a novel paradigm for data transmission in 6G.
3 code implementations • 27 Feb 2024 • Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei
Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs).
no code implementations • 27 Feb 2024 • Yuxian Gu, Li Dong, Yaru Hao, Qingxiu Dong, Minlie Huang, Furu Wei
This work studies the general principles of improving the learning of language models (LMs), which aims at reducing the necessary training steps for achieving superior performance.
no code implementations • 20 Feb 2024 • Haoran Li, Qingxiu Dong, Zhengyang Tang, Chaojun Wang, Xingxing Zhang, Haoyang Huang, Shaohan Huang, Xiaolong Huang, Zeqiang Huang, Dongdong Zhang, Yuxian Gu, Xin Cheng, Xun Wang, Si-Qing Chen, Li Dong, Wei Lu, Zhifang Sui, Benyou Wang, Wai Lam, Furu Wei
We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs).
no code implementations • 3 Jan 2024 • Daijun Ding, Rong Chen, Liwen Jing, BoWen Zhang, Xu Huang, Li Dong, Xiaowen Zhao, Ge Song
In this paper, we propose a Multi-Perspective Prompt-Tuning (MPPT) model for CTSD that uses the analysis perspective as a bridge to transfer knowledge.
no code implementations • 13 Dec 2023 • Feibo Jiang, Li Dong, Yubo Peng, Kezhi Wang, Kun Yang, Cunhua Pan, Dusit Niyato, Octavia A. Dobre
The rapid development of the Large Language Model (LLM) presents huge opportunities for 6G communications, e. g., network optimization and management by allowing users to input task requirements to LLMs by nature language.
2 code implementations • CVPR 2024 • Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su
We then develop BioCLIP, a foundation model for the tree of life, leveraging the unique properties of biology captured by TreeOfLife-10M, namely the abundance and variety of images of plants, animals, and fungi, together with the availability of rich structured biological knowledge.
3 code implementations • 17 Oct 2023 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei
The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption.
1 code implementation • 4 Oct 2023 • Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei
These limitations keep them far from the ultimate goal of "image as a foreign language in image generation."
no code implementations • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Yuzhong Zhao, Yilin Jia, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei
In this paper we present KOSMOS-2. 5, a multimodal literate model for machine reading of text-intensive images.
1 code implementation • 11 Sep 2023 • Qingxiu Dong, Li Dong, Ke Xu, Guangyan Zhou, Yaru Hao, Zhifang Sui, Furu Wei
In this work, we use large language models (LLMs) to augment and accelerate research on the P versus NP problem, one of the most important open problems in theoretical computer science and mathematics.
no code implementations • 3 Sep 2023 • Feibo Jiang, Li Dong, Yubo Peng, Kezhi Wang, Kun Yang, Cunhua Pan, Xiaohu You
To this end, we propose a Large AI Model-based Multimodal SC (LAM-MSC) framework, where we first present the MLM-based Multimodal Alignment (MMA) that utilizes the MLM to enable the transformation between multimodal and unimodal data while preserving semantic consistency.
no code implementations • 29 Aug 2023 • Li Dong, Feibo Jiang, Yubo Peng, Kezhi Wang, Kun Yang, Cunhua Pan, Robert Schober
Next-generation edge intelligence is anticipated to benefit various applications via offloading techniques.
1 code implementation • 4 Aug 2023 • Jiacheng Deng, Li Dong, Jiahao Chen, Diqun Yan, Rangding Wang, Dengpan Ye, Lingchen Zhao, Jinyu Tian
In this work, we propose a novel and effective defense mechanism termed the Universal Defensive Underpainting Patch (UDUP) that modifies the underpainting of text images instead of the characters.
Optical Character Recognition Optical Character Recognition (OCR)
9 code implementations • 17 Jul 2023 • Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance.
no code implementations • 7 Jul 2023 • Feibo Jiang, Yubo Peng, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan, Xiaohu You
Semantic communication (SC) is an emerging intelligent paradigm, offering solutions for various future applications like metaverse, mixed reality, and the Internet of Everything.
3 code implementations • 5 Jul 2023 • Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei
Scaling sequence length has become a critical demand in the era of large language models.
no code implementations • 28 Jun 2023 • Zhe Ye, Terui Mao, Li Dong, Diqun Yan
This work explores a backdoor attack that utilizes sample-specific triggers based on voice conversion.
2 code implementations • 26 Jun 2023 • Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e. g., bounding boxes) and grounding text to the visual world.
Ranked #13 on Visual Question Answering on ViP-Bench
1 code implementation • 16 Jun 2023 • Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan
In reinforcement learning (RL), there are two major settings for interacting with the environment: online and offline.
2 code implementations • 14 Jun 2023 • Yuxian Gu, Li Dong, Furu Wei, Minlie Huang
In this work, we propose a KD approach that distills LLMs into smaller language models.
no code implementations • NeurIPS 2023 • Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness.
1 code implementation • 16 May 2023 • Yuxian Gu, Li Dong, Furu Wei, Minlie Huang
In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community.
1 code implementation • CVPR 2023 • Wei Huang, Zhiliang Peng, Li Dong, Furu Wei, Jianbin Jiao, Qixiang Ye
Lightweight ViT models limited by the model capacity, however, benefit little from those pre-training mechanisms.
1 code implementation • NeurIPS 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.
1 code implementation • 10 Jan 2023 • Hao Xu, Hui Xiao, Huazheng Hao, Li Dong, Xiaojie Qiu, Chengbin Peng
We also propose a mechanism to select a few pseudo-negative labels to feed into submodels.
no code implementations • CVPR 2023 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei
A big convergence of language, vision, and multimodal pretraining is emerging.
1 code implementation • 21 Dec 2022 • Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei
To this end, we propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts, and create a dataset termed DEER containing 1. 2k rule-fact pairs for the task, where rules and facts are written in natural language.
5 code implementations • 20 Dec 2022 • Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei
Position modeling plays a critical role in Transformers.
1 code implementation • 20 Dec 2022 • Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei
We comprehensively compare the behaviors of in-context learning and explicit finetuning on real tasks to provide empirical evidence that supports our understanding.
1 code implementation • 20 Dec 2022 • Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li
Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the ability of language understanding and generation in a single model.
3 code implementations • NeurIPS 2023 • Yaru Hao, Zewen Chi, Li Dong, Furu Wei
Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts.
1 code implementation • 13 Dec 2022 • Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei
Large language models have exhibited intriguing in-context learning capability, achieving promising zero- and few-shot performance without updating the parameters.
no code implementations • NeurIPS 2023 • Tao Ge, Jing Hu, Li Dong, Shaoguang Mao, Yan Xia, Xun Wang, Si-Qing Chen, Furu Wei
We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL).
1 code implementation • 23 Nov 2022 • Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei
Large Transformers have achieved state-of-the-art performance across many tasks.
no code implementations • 27 Oct 2022 • Li Dong, Zhibin Liu, Feibo Jiang, Kezhi Wang
To address this issue, we propose a joint optimization framework of deployment and trajectory (JOLT), where an adaptive whale optimization algorithm (AWOA) is applied to optimize the deployment of the UAV, and an elastic ring self-organizing map (ERSOM) is introduced to optimize the trajectory of the UAV.
no code implementations • 26 Oct 2022 • Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song
In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications.
1 code implementation • 19 Oct 2022 • Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei
Masked image modeling has demonstrated great potential to eliminate the label-hungry problem of training large-scale vision Transformers, achieving impressive performance on various downstream tasks.
1 code implementation • CVPR 2023 • Jinghao Zhou, Li Dong, Zhe Gan, Lijuan Wang, Furu Wei
Contrastive language-image pre-training (CLIP) serves as a de-facto standard to align images and texts.
1 code implementation • 13 Oct 2022 • Jian Yang, Shaohan Huang, Shuming Ma, Yuwei Yin, Li Dong, Dongdong Zhang, Hongcheng Guo, Zhoujun Li, Furu Wei
Specifically, the target sequence is first translated into the source language and then tagged by a source NER model.
4 code implementations • 12 Oct 2022 • Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei
A big convergence of model architectures across language, vision, speech, and multimodal is emerging.
1 code implementation • Neurocomputing 2022 • Hui Xiao, Li Dong, Kangkang Song, Hao Xu, Shuibo Fu, Diqun Yan, Chengbin Peng
In experiments, the cross-teacher module significantly improves the performance of traditional student-teacher approaches, and our framework outperforms stateof-the-art methods on benchmark datasets.
2 code implementations • 22 Aug 2022 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei
A big convergence of language, vision, and multimodal pretraining is emerging.
Ranked #1 on Visual Reasoning on NLVR2 Test
2 code implementations • 12 Aug 2022 • Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei
The large-size BEiT v2 obtains 87. 3% top-1 accuracy for ImageNet-1K (224 size) fine-tuning, and 56. 7% mIoU on ADE20K for semantic segmentation.
Ranked #29 on Self-Supervised Image Classification on ImageNet
no code implementations • 30 Jun 2022 • Mingyu Dong, Jiahao Chen, Diqun Yan, Jingxing Gao, Li Dong, Rangding Wang
Experimental results show that the proposed method can not only detect the adversarial examples with high accuracy, but also detect the specific category of the AEs.
1 code implementation • 13 Jun 2022 • Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei
Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.
Ranked #2 on Image Captioning on nocaps val
no code implementations • 2 Jun 2022 • Hangbo Bao, Wenhui Wang, Li Dong, Furu Wei
Our minimalist solution conducts masked prediction on both monomodal and multimodal data with a shared Transformer.
no code implementations • Findings (ACL) 2022 • Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, JianXin Li, Furu Wei
As more and more pre-trained language models adopt on-cloud deployment, the privacy issues grow quickly, mainly for the exposure of plain-text user data (e. g., search history, medical record, bank account).
1 code implementation • 20 May 2022 • Zhixiong Han, Yaru Hao, Li Dong, Yutao Sun, Furu Wei
In-context learning of GPT-like models has been recognized as fragile across different hand-crafted templates, and demonstration permutations.
1 code implementation • 20 May 2022 • Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
With the visually-augmented context, VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling by attending to both text context and visual knowledge in images.
2 code implementations • 20 Apr 2022 • Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei
We also present a comprehensive analysis on the representation and routing behaviors of our models.
1 code implementation • ACL 2022 • Damai Dai, Li Dong, Shuming Ma, Bo Zheng, Zhifang Sui, Baobao Chang, Furu Wei
We point out that existing learning-to-route MoE methods suffer from the routing fluctuation issue, i. e., the target expert of the same input may change along with training, but only one expert will be activated for the input during inference.
no code implementations • ACL 2022 • Haoyu Song, Li Dong, Wei-Nan Zhang, Ting Liu, Furu Wei
We first evaluate CLIP's zero-shot performance on a typical visual question answering task and demonstrate a zero-shot cross-modality transfer capability of CLIP on the visual entailment task.
6 code implementations • 1 Mar 2022 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei
In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers.
no code implementations • Findings (ACL) 2022 • Jing Qian, Li Dong, Yelong Shen, Furu Wei, Weizhu Chen
We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control while the combination of these two methods can achieve multi-aspect control.
no code implementations • 17 Feb 2022 • Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, Jianfeng Gao
With the increasing of model capacity brought by pre-trained language models, there emerges boosting needs for more knowledgeable natural language processing (NLP) models with advanced functionalities including providing and making flexible use of encyclopedic and commonsense knowledge.
no code implementations • 10 Feb 2022 • Yulong Chen, Yang Liu, Li Dong, Shuohang Wang, Chenguang Zhu, Michael Zeng, Yue Zhang
However, for prompt learning, there are still two salient gaps between NLP tasks and pretraining.
no code implementations • 7 Feb 2022 • Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei
Given this corrupted image, an enhancer network learns to either recover all the original image pixels, or predict whether each visual token is replaced by a generator sample or not.
1 code implementation • 15 Jan 2022 • Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang
In this work, we propose a simple model, Kformer, which takes advantage of the knowledge stored in PTMs and external knowledge via knowledge injection in Transformer FFN layers.
22 code implementations • CVPR 2022 • Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo
Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) A log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) A self-supervised pre-training method, SimMIM, to reduce the needs of vast labeled images.
Ranked #4 on Image Classification on ImageNet V2 (using extra training data)
no code implementations • WMT (EMNLP) 2021 • Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.
2 code implementations • 3 Nov 2021 • Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, Furu Wei
We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network.
Ranked #2 on Image Retrieval on PhotoChat
1 code implementation • 26 Oct 2021 • Hangbo Bao, Li Dong, Wenhui Wang, Nan Yang, Furu Wei
Pretrained bidirectional Transformers, such as BERT, have achieved significant improvements in a wide variety of language understanding tasks, while it is not straightforward to directly apply them for natural language generation.
2 code implementations • EMNLP 2021 • Bo Zheng, Li Dong, Shaohan Huang, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity.
3 code implementations • ACL 2022 • Zewen Chi, Shaohan Huang, Li Dong, Shuming Ma, Bo Zheng, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei
In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training.
Ranked #1 on Zero-Shot Cross-Lingual Transfer on XTREME
no code implementations • Findings (ACL) 2021 • Yaru Hao, Li Dong, Hangbo Bao, Ke Xu, Furu Wei
Moreover, we propose to use a focal loss for the generator in order to relieve oversampling of correct tokens as replacements.
no code implementations • Findings (ACL) 2021 • Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong, Furu Wei
In this paper, we present a general approach to developing small, fast and effective pre-trained models for specific domains.
2 code implementations • 25 Jun 2021 • Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).
1 code implementation • ACL 2021 • Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others.
14 code implementations • ICLR 2022 • Hangbo Bao, Li Dong, Songhao Piao, Furu Wei
We first "tokenize" the original image into visual tokens.
Ranked #11 on Document Layout Analysis on PubLayNet val
1 code implementation • ACL 2021 • Zewen Chi, Li Dong, Bo Zheng, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei
The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences.
no code implementations • ACL (ECNLP) 2021 • Li Dong, Matthew C. Spencer, Amir Biagi
We improve the performance significantly by evolving the model from multiclass classification to semi-supervised multi-task learning by leveraging the negative cases, domain- and task-adaptively pretrained ALBERT on customer contact texts, and a number of un-curated data with no labels.
no code implementations • Findings (ACL) 2021 • Yuekai Zhao, Li Dong, Yelong Shen, Zhihua Zhang, Furu Wei, Weizhu Chen
To this end, we propose a multi-split reversible network and combine it with DARTS.
2 code implementations • EMNLP 2021 • Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang Xian-Ling Mao, Heyan Huang, Furu Wei
Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive monolingual texts, which has shown promising results on many cross-lingual tasks.
3 code implementations • ACL 2022 • Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei
In this paper, we present preliminary studies on how factual knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons.
1 code implementation • EMNLP 2021 • Guanhua Chen, Shuming Ma, Yun Chen, Li Dong, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei
In this paper, we focus on a zero-shot cross-lingual transfer task in NMT.
no code implementations • 31 Dec 2020 • Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei
Multilingual machine translation enables a single model to translate between different languages.
2 code implementations • Findings (ACL) 2021 • Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei
We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-attention relation distillation for task-agnostic compression of pretrained Transformers.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Yaru Hao, Li Dong, Furu Wei, Ke Xu
The recently introduced pre-trained language model BERT advances the state-of-the-art on many NLP tasks through the fine-tuning approach, but few studies investigate how the fine-tuning process improves the model performance on downstream tasks.
4 code implementations • NAACL 2021 • Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, He-Yan Huang, Ming Zhou
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts.
Ranked #16 on Zero-Shot Cross-Lingual Transfer on XTREME
no code implementations • 21 May 2020 • Feibo Jiang, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan
We consider the optimization of distributed resource scheduling to minimize the sum of task latency and energy consumption for all the Internet of things devices (IoTDs) in a large-scale mobile edge computing (MEC) system.
1 code implementation • ACL 2020 • Zhongli Li, Wenhui Wang, Li Dong, Furu Wei, Ke Xu
Our approach outperforms previous unsupervised approaches by a large margin and is competitive with early supervised models.
Ranked #189 on Question Answering on SQuAD1.1
2 code implementations • 23 Apr 2020 • Yaru Hao, Li Dong, Furu Wei, Ke Xu
The great success of Transformer-based models benefits from the powerful multi-head self-attention mechanism, which learns token dependencies and encodes contextual information from the input.
4 code implementations • ECCV 2020 • Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiao-Wei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao
Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.
Ranked #1 on Image Retrieval on MS COCO (Recall@10 metric)
3 code implementations • 28 Feb 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).
Ranked #4 on Question Generation on SQuAD1.1 (using extra training data)
1 code implementation • NeurIPS 2020 • Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
The small model (student) is trained by deeply mimicking the self-attention module, which plays a vital role in Transformer networks, of the large model (teacher).
Ranked #8 on Zero-shot Text Search on BEIR
no code implementations • 11 Feb 2020 • Feibo Jiang, Kezhi Wang, Li Dong, Cunhua Pan, Wei Xu, Kun Yang
By taking full advantage of Computing, Communication and Caching (3C) resources at the network edge, Mobile Edge Computing (MEC) is envisioned as one of the key enablers for the next generation networks.
no code implementations • 24 Jan 2020 • Feibo Jiang, Kezhi Wang, Li Dong, Cunhua Pan, Kun Yang
An online resource scheduling framework is proposed for minimizing the sum of weighted task latency for all the Internet of things (IoT) users, by optimizing offloading decision, transmission power and resource allocation in the large-scale mobile edge computing (MEC) system.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Zewen Chi, Li Dong, Furu Wei, Xian-Ling Mao, He-Yan Huang
Multilingual pretrained language models (such as multilingual BERT) have achieved impressive results for cross-lingual transfer.
no code implementations • 8 Nov 2019 • Haichao Zhu, Li Dong, Furu Wei, Bing Qin, Ting Liu
The limited size of existing query-focused summarization datasets renders training data-driven summarization models challenging.
no code implementations • WS 2019 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Lei Cui, Songhao Piao, Ming Zhou
Most machine reading comprehension (MRC) models separately handle encoding and matching with different network architectures.
1 code implementation • CONLL 2019 • Xiao Huang, Li Dong, Elizabeth Boschee, Nanyun Peng
Named entity recognition (NER) identifies typed entity mentions in raw text.
Ranked #13 on Named Entity Recognition (NER) on NCBI-disease
1 code implementation • 23 Sep 2019 • Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, He-Yan Huang
In this work we focus on transferring supervision signals of natural language generation (NLG) tasks between multiple languages.
no code implementations • IJCNLP 2019 • Yaru Hao, Li Dong, Furu Wei, Ke Xu
Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks.
no code implementations • ACL 2019 • Haichao Zhu, Li Dong, Furu Wei, Wenhui Wang, Bing Qin, Ting Liu
We also present a way to construct training data for our question generation models by leveraging the existing reading comprehension dataset.
2 code implementations • ACL 2019 • Ratish Puduppully, Li Dong, Mirella Lapata
Recent approaches to data-to-text generation have shown great promise thanks to the use of large-scale datasets and the application of neural network architectures which are trained end-to-end.
Ranked #3 on Data-to-Text Generation on MLB Dataset
9 code implementations • NeurIPS 2019 • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.
Ranked #2 on Generative Question Answering on CoQA (using extra training data)
no code implementations • ICLR 2019 • Huan Wang, Yuxiang Hu, Li Dong, Feijun Jiang, Zaiqing Nie
Semantic parsing which maps a natural language sentence into a formal machine-readable representation of its meaning, is highly constrained by the limited annotated training data.
no code implementations • 19 Apr 2019 • Yong Liu, Pavel Dmitriev, Yifei HUANG, Andrew Brooks, Li Dong
Our results show that fine-tuning of the BERT model outperforms with as few as 300 labeled samples, but underperforms with fewer than 300 labeled samples, relative to all the feature-based approaches using different embeddings.
2 code implementations • 3 Sep 2018 • Ratish Puduppully, Li Dong, Mirella Lapata
Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order.
2 code implementations • ACL 2018 • Li Dong, Mirella Lapata
Semantic parsing aims at mapping natural language utterances into structured meaning representations.
Ranked #1 on Semantic Parsing on Geo
1 code implementation • ACL 2018 • Li Dong, Chris Quirk, Mirella Lapata
In this work we focus on confidence modeling for neural semantic parsers which are built upon sequence-to-sequence models.
no code implementations • EMNLP 2017 • Li Dong, Jonathan Mallinson, Siva Reddy, Mirella Lapata
Question answering (QA) systems are sensitive to the many different ways natural language expresses the same information need.
no code implementations • EACL 2017 • Li Dong, Shaohan Huang, Furu Wei, Mirella Lapata, Ming Zhou, Ke Xu
This paper presents an attention-enhanced attribute-to-sequence model to generate product reviews for given attribute information, such as user, product, and rating.
no code implementations • 22 Feb 2017 • Ursula Challita, Li Dong, Walid Saad
LTE in unlicensed spectrum using licensed assisted access LTE (LTE-LAA) is a promising approach to overcome the wireless spectrum scarcity.
no code implementations • 25 May 2016 • Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming Zhang, Ming Zhou
In this paper, we develop a novel approach to aspect term extraction based on unsupervised learning of distributed representations of words and dependency paths.
3 code implementations • EMNLP 2016 • Jianpeng Cheng, Li Dong, Mirella Lapata
In this paper we address the question of how to render sequence-level networks better at handling structured input.
Ranked #56 on Natural Language Inference on SNLI
5 code implementations • ACL 2016 • Li Dong, Mirella Lapata
Semantic parsing aims at mapping natural language to machine interpretable meaning representations.
no code implementations • CL 2015 • Li Dong, Furu Wei, Shujie Liu, Ming Zhou, Ke Xu
Unlike previous works that employ syntactic parsing results for sentiment analysis, we develop a statistical parser to directly analyze the sentiment structure of a sentence.