no code implementations • Findings (ACL) 2022 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
Multimodal pre-training with text, layout, and image has recently achieved SOTA performance on visually rich document understanding tasks, demonstrating the great potential of joint learning across different modalities.
1 code implementation • ACL 2022 • Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei
When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12.3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including multilinguality of the auxiliary parallel data, positional disentangled encoder, and the cross-lingual transferability of its encoder.
Abstractive Text Summarization • Cross-Lingual Abstractive Summarization +6
1 code implementation • ICML 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).
no code implementations • EACL (AdaptNLP) 2021 • Tianyu Chen, Shaohan Huang, Furu Wei, JianXin Li
In unsupervised domain adaptation, we aim to train a model that works well on a target domain when provided with labeled source samples and unlabeled target samples.
no code implementations • ACL 2022 • Junlong Li, Yiheng Xu, Lei Cui, Furu Wei
Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images.
no code implementations • 22 May 2024 • Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao
This paper introduces xRAG, an innovative context compression method tailored for retrieval-augmented generation.
1 code implementation • 20 May 2024 • Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models.
1 code implementation • 8 May 2024 • Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei
We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once.
1 code implementation • 3 May 2024 • Jiawei Zhou, Li Dong, Furu Wei, Lei Chen
The landscape of information retrieval has broadened from search services to a critical component in various advanced applications, where indexing efficiency, cost-effectiveness, and freshness are increasingly important yet remain less explored.
no code implementations • 23 Apr 2024 • Xun Wu, Shaohan Huang, Furu Wei
Recent studies have demonstrated the exceptional potentials of leveraging human preference datasets to refine text-to-image generative models, enhancing the alignment between generated images and textual prompts.
no code implementations • 23 Apr 2024 • Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei
These sub-tokens are then assigned to and processed by a diverse set of experts in parallel, and seamlessly reintegrated into the original token form.
1 code implementation • 21 Apr 2024 • Xun Wu, Shaohan Huang, Furu Wei
LoRA has gained widespread acceptance in the fine-tuning of large pre-trained models to cater to a diverse array of downstream tasks, showcasing notable effectiveness and efficiency, thereby solidifying its position as one of the most prevalent fine-tuning techniques.
1 code implementation • 18 Apr 2024 • Dawei Zhu, Liang Wang, Nan Yang, YiFan Song, Wenhao Wu, Furu Wei, Sujian Li
This paper explores context window extension of existing embedding models, pushing the limit to 32k without requiring additional training.
1 code implementation • 4 Apr 2024 • Wenshan Wu, Shaoguang Mao, Yadong Zhang, Yan Xia, Li Dong, Lei Cui, Furu Wei
Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks.
no code implementations • 1 Apr 2024 • Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei
This paper presents a comprehensive survey of the current status and opportunities for Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning that necessitates understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.
no code implementations • 31 Mar 2024 • Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei
In this work, we introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter, optimized by a two-stage curriculum learning approach.
no code implementations • 5 Mar 2024 • Zhengyang Tang, Xingxing Zhang, Benyou Wang, Furu Wei
Inspired by the cognitive mechanism in human mathematical learning, it first extracts topics and knowledge points from seed math questions and then builds a concept graph, which is subsequently used to generate new math questions.
1 code implementation • 28 Feb 2024 • Shuhua Shi, Shaohan Huang, Minghui Song, Zhoujun Li, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
As one of the most popular parameter-efficient fine-tuning (PEFT) methods, low-rank adaptation (LoRA) is commonly applied to fine-tune large language models (LLMs).
4 code implementations • 27 Feb 2024 • Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei
Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs).
no code implementations • 27 Feb 2024 • Yuxian Gu, Li Dong, Yaru Hao, Qingxiu Dong, Minlie Huang, Furu Wei
This work studies the general principles of improving the learning of language models (LMs), which aims at reducing the necessary training steps for achieving superior performance.
no code implementations • 26 Feb 2024 • Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, Ji-Rong Wen
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
no code implementations • 24 Feb 2024 • Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
Large language models (LLMs) have emerged as a promising alternative to expensive human evaluations.
no code implementations • 21 Feb 2024 • Haoyu Liu, Jianfeng Liu, Shaohan Huang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Furu Wei, Qi Zhang
The remarkable capability of large language models (LLMs) for in-context learning (ICL) needs to be activated by demonstration examples.
no code implementations • 20 Feb 2024 • Haoran Li, Qingxiu Dong, Zhengyang Tang, Chaojun Wang, Xingxing Zhang, Haoyang Huang, Shaohan Huang, Xiaolong Huang, Zeqiang Huang, Dongdong Zhang, Yuxian Gu, Xin Cheng, Xun Wang, Si-Qing Chen, Li Dong, Wei Lu, Zhifang Sui, Benyou Wang, Wai Lam, Furu Wei
We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs).
no code implementations • 19 Feb 2024 • Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
Diffusion models have demonstrated exceptional capability in generating high-quality images, videos, and audio.
2 code implementations • 15 Feb 2024 • Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela
Notably, we find that GRIT matches training on only generative or only embedding data, so both can be unified with no loss in performance.
1 code implementation • 8 Feb 2024 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei
This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023.
no code implementations • 2 Feb 2024 • Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Man Lan, Furu Wei
While Large Language Models (LLMs) have demonstrated their proficiency in complex reasoning tasks, their performance in dynamic, interactive, and competitive scenarios - such as business strategy and stock market analysis - remains underexplored.
1 code implementation • 14 Jan 2024 • Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang
To enhance the domain-specific capabilities of large language models, continued pre-training on a domain-specific corpus is a prevalent method.
2 code implementations • 31 Dec 2023 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei
In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps.
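The abstract above does not spell out the training objective, but embedding models of this kind are typically fine-tuned with an InfoNCE-style contrastive loss over (query, positive, hard negative) triples. The snippet below is a generic sketch of that loss, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def infonce_loss(q_emb, pos_emb, neg_emb, temperature: float = 0.02):
    """Generic contrastive objective for (query, positive, hard-negative) triples
    (a standard sketch, not this paper's exact training setup).

    q_emb, pos_emb, neg_emb: (batch, d). In-batch positives of other examples
    also act as negatives for each query.
    """
    q = F.normalize(q_emb, dim=-1)
    docs = F.normalize(torch.cat([pos_emb, neg_emb], dim=0), dim=-1)  # (2*batch, d)
    logits = q @ docs.T / temperature                                  # (batch, 2*batch)
    labels = torch.arange(q.shape[0])                                  # positive of query i sits at column i
    return F.cross_entropy(logits, labels)
```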
no code implementations • 30 Dec 2023 • Hongkun Hao, Long Zhou, Shujie Liu, Jinyu Li, Shujie Hu, Rui Wang, Furu Wei
In this paper, we conduct a comprehensive empirical exploration of boosting LLMs with the ability to generate speech by combining the pre-trained LLMs LLaMA/OPT with the text-to-speech synthesis model VALL-E. We compare three methods of integrating LLMs with speech synthesis models: directly fine-tuned LLMs, superposed layers of LLMs and VALL-E, and coupled LLMs and VALL-E with the LLM serving as a powerful text encoder.
no code implementations • 6 Dec 2023 • Wenhui Wang, Shuming Ma, Hanwen Xu, Naoto Usuyama, Jiayu Ding, Hoifung Poon, Furu Wei
This technical report presents LongViT, a vision Transformer that can process gigapixel images in an end-to-end manner.
no code implementations • 28 Nov 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
Diffusion models have proven to be powerful generative models in recent years, yet they still struggle to generate visual text.
1 code implementation • 15 Nov 2023 • Jinghan Yang, Shuming Ma, Furu Wei
In the era of Large Language Models (LLMs), human-computer interaction has evolved towards natural language, offering unprecedented flexibility.
1 code implementation • 6 Nov 2023 • Shaoguang Mao, Yuzhe Cai, Yan Xia, Wenshan Wu, Xun Wang, Fengyi Wang, Tao Ge, Furu Wei
This paper introduces Alympics (Olympics for Agents), a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research.
no code implementations • 23 Oct 2023 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei
Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others.
1 code implementation • 20 Oct 2023 • Haoran Li, Yiran Liu, Xingxing Zhang, Wei Lu, Furu Wei
Furthermore, we apply probabilistic ranking and contextual ranking sequentially to the instruction-tuned LLM.
1 code implementation • 20 Oct 2023 • Zhaoyang Wang, Shaohan Huang, Yuxuan Liu, Jiahai Wang, Minghui Song, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
In this paper, we propose a tailored learning approach to distill such reasoning ability to smaller LMs to facilitate the democratization of the exclusive reasoning ability.
2 code implementations • 17 Oct 2023 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei
The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption.
1 code implementation • 12 Oct 2023 • Xueguang Ma, Liang Wang, Nan Yang, Furu Wei, Jimmy Lin
Our findings demonstrate that the effectiveness of large language models indeed surpasses that of smaller models.
no code implementations • 12 Oct 2023 • Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan
In this paper, we propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text), which extracts plans from the corpus of narratives and utilizes the extracted plans to construct a better planner.
1 code implementation • 4 Oct 2023 • Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei
These limitations keep them far from the ultimate goal of "image as a foreign language in image generation."
1 code implementation • 29 Sep 2023 • Xin Cheng, Xun Wang, Tao Ge, Si-Qing Chen, Furu Wei, Dongyan Zhao, Rui Yan
In this paper, we introduce SCALE, a collaborative framework that connects compact Specialized Translation Models (STMs) and general-purpose Large Language Models (LLMs) as one unified translation engine.
no code implementations • 23 Sep 2023 • Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
Recent advancements in large language models (LLMs) on language modeling and emergent capabilities make them a promising reference-free evaluator of natural language generation quality, and a competent alternative to human evaluation.
no code implementations • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei
We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images.
1 code implementation • 19 Sep 2023 • Dawei Zhu, Nan Yang, Liang Wang, YiFan Song, Wenhao Wu, Furu Wei, Sujian Li
To decouple train length from target length for efficient context window extension, we propose Positional Skip-wisE (PoSE) training that smartly simulates long inputs using a fixed context window.
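As a rough illustration of how "skip-wise" positions can be simulated inside a short window, here is a hedged sketch (our own simplification, not the released implementation); only the position ids are manipulated, while the attended tokens stay within the fixed training window.

```python
import random
from typing import List

def pose_position_ids(train_len: int, target_len: int, num_chunks: int = 2) -> List[int]:
    """Minimal sketch of PoSE-style position ids (illustrative, not the official code).

    The fixed training window of `train_len` tokens is split into `num_chunks`
    contiguous chunks. Positions stay contiguous inside each chunk, but a random
    skip is inserted before every chunk after the first, so the model observes
    relative distances up to `target_len` while only attending to `train_len` tokens.
    """
    assert target_len >= train_len and 1 <= num_chunks <= train_len
    cuts = sorted(random.sample(range(1, train_len), num_chunks - 1))
    bounds = [0] + cuts + [train_len]

    budget = target_len - train_len          # extra positions we are allowed to skip over
    position_ids, offset = [], 0
    for i in range(num_chunks):
        if i > 0:                            # random skip before this chunk
            skip = random.randint(0, budget)
            offset += skip
            budget -= skip
        position_ids.extend(range(bounds[i] + offset, bounds[i + 1] + offset))
    return position_ids

# e.g. a 2k training window pretending to be a 16k context:
# ids = pose_position_ids(train_len=2048, target_len=16384)
```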
1 code implementation • 18 Sep 2023 • Daixuan Cheng, Shaohan Huang, Furu Wei
Taking inspiration from human learning via reading comprehension, where practice after reading improves the ability to answer questions based on the learned knowledge, we propose a simple method for transforming raw corpora into reading comprehension texts.
1 code implementation • 11 Sep 2023 • Qingxiu Dong, Li Dong, Ke Xu, Guangyan Zhou, Yaru Hao, Zhifang Sui, Furu Wei
In this work, we use large language models (LLMs) to augment and accelerate research on the P versus NP problem, one of the most important open problems in theoretical computer science and mathematics.
no code implementations • 24 Aug 2023 • Guangyu Chen, Yu Wu, Shujie Liu, Tao Liu, Xiaoyong Du, Furu Wei
Recent breakthroughs in zero-shot voice synthesis have enabled imitating a speaker's voice using just a few seconds of recording while maintaining a high level of realism.
8 code implementations • 17 Jul 2023 • Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance.
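The parallel/recurrent duality of retention is the core of the architecture; below is a minimal single-head sketch of the recurrent form (per-head decay values, xPos rotation, and normalization are omitted), just to make the state update concrete.

```python
import torch

def recurrent_retention(q, k, v, gamma: float):
    """Minimal single-head sketch of the recurrent form of retention (illustrative only).

    q, k, v: tensors of shape (seq_len, d). At each step the state is
        S_n = gamma * S_{n-1} + k_n^T v_n   (a d x d matrix)
    and the output is o_n = q_n @ S_n, which matches the parallel form of
    retention up to the normalization details omitted here.
    """
    seq_len, d = q.shape
    state = torch.zeros(d, d, dtype=q.dtype)
    outputs = []
    for n in range(seq_len):
        state = gamma * state + torch.outer(k[n], v[n])
        outputs.append(q[n] @ state)
    return torch.stack(outputs)

# q = k = v = torch.randn(8, 16); out = recurrent_retention(q, k, v, gamma=0.9)
```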
2 code implementations • 14 Jul 2023 • Liang Wang, Nan Yang, Furu Wei
Our framework initially trains a reward model based on LLM feedback to evaluate the quality of candidate examples, followed by knowledge distillation to train a bi-encoder based dense retriever.
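A hedged sketch of the distillation step described above, with the reward model treated as a black box that scores candidate examples; this is our simplification, not the released training code.

```python
import torch
import torch.nn.functional as F

def distill_retriever_loss(query_emb, cand_embs, reward_scores, temperature: float = 0.05):
    """Sketch of distilling a reward model into a bi-encoder retriever (simplified).

    query_emb:     (d,)   bi-encoder embedding of the query
    cand_embs:     (n, d) bi-encoder embeddings of n candidate examples
    reward_scores: (n,)   scores for the candidates from the reward model,
                          itself trained on LLM feedback
    The retriever's softmax over cosine similarities is pushed toward the
    reward model's softmax with a KL-divergence loss.
    """
    sims = F.cosine_similarity(query_emb.unsqueeze(0), cand_embs, dim=-1) / temperature
    student = F.log_softmax(sims, dim=-1)
    teacher = F.softmax(reward_scores, dim=-1)
    return F.kl_div(student, teacher, reduction="sum")
```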
1 code implementation • 13 Jul 2023 • Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei
These promising results imply a novel perspective on the connection between working memory in cognitive science and representation learning in LLMs, revealing ICAE's significant implications in addressing the long context problem and suggesting further research in LLM context management.
2 code implementations • 11 Jul 2023 • Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji
In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas.
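The mechanism is purely prompt-level; a minimal sketch of an SPP-style prompt is shown below. The wording is ours rather than the paper's exact template, and `complete` stands in for whatever LLM client is used.

```python
def solo_performance_prompt(task: str) -> str:
    """A minimal sketch of an SPP-style prompt (approximate wording, not the paper's template).

    A single model is asked to (1) invent the personas relevant to the task,
    (2) let them discuss in turns, and (3) merge the discussion into one answer.
    """
    return (
        "When faced with a task, begin by identifying the participants who will "
        "contribute to solving it. Then initiate a multi-round collaboration "
        "among them until a final solution is reached.\n\n"
        f"Task: {task}\n\n"
        "Participants:\n"
        "Start collaboration!\n"
    )

# response = complete(solo_performance_prompt("Write a poem that mentions three physics laws"))
# `complete` is a placeholder for any chat/completions client.
```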
3 code implementations • 5 Jul 2023 • Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei
Scaling sequence length has become a critical demand in the era of large language models.
2 code implementations • 27 Jun 2023 • Yongqi Li, Nan Yang, Liang Wang, Furu Wei, Wenjie Li
However, only learning to generate is insufficient for generative retrieval.
2 code implementations • 26 Jun 2023 • Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world.
Ranked #11 on Visual Question Answering on ViP-Bench
2 code implementations • 14 Jun 2023 • Yuxian Gu, Li Dong, Furu Wei, Minlie Huang
In this work, we propose a KD approach that distills LLMs into smaller language models.
no code implementations • NeurIPS 2023 • Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness.
1 code implementation • 26 May 2023 • Yongqi Li, Nan Yang, Liang Wang, Furu Wei, Wenjie Li
Instead of simply matching a query to pre-existing passages, generative retrieval generates identifier strings of passages as the retrieval target.
no code implementations • 25 May 2023 • Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li, Furu Wei
Recent research shows a big convergence in model architecture, training objectives, and inference methods across various tasks for different modalities.
2 code implementations • 24 May 2023 • Tianyi Tang, Hongyuan Lu, Yuchen Eleanor Jiang, Haoyang Huang, Dongdong Zhang, Wayne Xin Zhao, Tom Kocmi, Furu Wei
Most research about natural language generation (NLG) relies on evaluation benchmarks with limited references for a sample, which may result in poor correlations with human judgements.
no code implementations • 23 May 2023 • Lan Jiang, Haoyang Huang, Dongdong Zhang, Rui Jiang, Furu Wei
Notably, the analysis demonstrates that our method significantly influences the initial training process, leading to more efficient convergence and superior solutions.
no code implementations • NeurIPS 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text.
2 code implementations • 18 May 2023 • Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang
We conduct experiments on a multilingual machine translation benchmark in 11 languages.
1 code implementation • 16 May 2023 • Ziheng Li, Shaohan Huang, Zihan Zhang, Zhi-Hong Deng, Qiang Lou, Haizhen Huang, Jian Jiao, Furu Wei, Weiwei Deng, Qi Zhang
Recent studies have shown that dual encoder models trained with the sentence-level translation ranking task are effective methods for cross-lingual sentence embedding.
1 code implementation • 16 May 2023 • Yuxian Gu, Li Dong, Furu Wei, Minlie Huang
In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community.
no code implementations • 11 May 2023 • Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, Yan Xia, Furu Wei
Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages.
no code implementations • 11 May 2023 • Hongyuan Lu, Haoyang Huang, Dongdong Zhang, Haoran Yang, Wai Lam, Furu Wei
Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT) even when trained without parallel data.
no code implementations • 6 May 2023 • Beiduo Chen, Shaohan Huang, Zihan Zhang, Wu Guo, ZhenHua Ling, Haizhen Huang, Furu Wei, Weiwei Deng, Qi Zhang
In addition, two self-correction courses are proposed to bridge the gap between the two encoders by creating a "correction notebook" for secondary supervision.
2 code implementations • 17 Apr 2023 • Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei
By introducing this framework, we aim to bridge the gap between humans and LLMs, enabling more effective and efficient utilization of LLMs for complex tasks.
2 code implementations • 10 Apr 2023 • Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei
We propose LLMA, an LLM accelerator to losslessly speed up Large Language Model (LLM) inference with references.
1 code implementation • 15 Mar 2023 • Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Denvy Deng, Qi Zhang
Large Language Models (LLMs) are popular for their impressive abilities, but the need for model-specific fine-tuning or task-specific prompt engineering can hinder their generalization.
no code implementations • 14 Mar 2023 • Liang Wang, Nan Yang, Furu Wei
This paper introduces a simple yet effective query expansion approach, denoted as query2doc, to improve both sparse and dense retrieval systems.
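The method is a one-step prompt-and-concatenate procedure; a hedged sketch for the sparse (BM25) setting follows, where `llm_generate` is a placeholder client and the query is repeated so its terms are not drowned out by the generated pseudo-document.

```python
def expand_query(query: str, llm_generate, repeats: int = 5) -> str:
    """Sketch of query2doc-style expansion for a BM25 index (our simplification).

    `llm_generate` is a placeholder for any text-generation client. The LLM writes
    a pseudo-document answering the query; the original query is repeated several
    times before appending it so the original terms keep enough weight.
    """
    pseudo_doc = llm_generate(
        f"Write a passage that answers the following query.\nQuery: {query}\nPassage:"
    )
    return " ".join([query] * repeats + [pseudo_doc])

# expanded = expand_query("who wrote the canterbury tales", llm_generate=my_llm)
# results = bm25_index.search(expanded)
```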
1 code implementation • 7 Mar 2023 • Ziqiang Zhang, Long Zhou, Chengyi Wang, Sanyuan Chen, Yu Wu, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei
We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis.
no code implementations • 2 Mar 2023 • Guangyue Peng, Tao Ge, Si-Qing Chen, Furu Wei, Houfeng Wang
We demonstrate that SeMem improves the scalability of semiparametric LMs for continual learning over streaming data in two ways. (1) Data-wise scalability: as the model becomes stronger through continual learning, it encounters fewer difficult cases that need to be memorized, so the non-parametric memory grows more slowly over time rather than linearly with the size of the training data. (2) Model-wise scalability: SeMem allows a larger model to memorize fewer samples than its smaller counterpart, because it is rarer for a larger model to encounter incomprehensible cases, so its non-parametric memory does not scale linearly with model size.
1 code implementation • 1 Mar 2023 • Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei
Although going deep has proven successful in many neural architectures, existing graph transformers remain relatively shallow.
1 code implementation • CVPR 2023 • Wei Huang, Zhiliang Peng, Li Dong, Furu Wei, Jianbin Jiao, Qixiang Ye
However, lightweight ViT models, limited by their model capacity, benefit little from those pre-training mechanisms.
1 code implementation • NeurIPS 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.
no code implementations • 17 Jan 2023 • Jian Yang, Yuwei Yin, Shuming Ma, Liqun Yang, Hongcheng Guo, Haoyang Huang, Dongdong Zhang, Yutao Zeng, Zhoujun Li, Furu Wei
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
6 code implementations • 5 Jan 2023 • Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei
In addition, we find VALL-E could preserve the speaker's emotion and acoustic environment of the acoustic prompt in synthesis.
no code implementations • CVPR 2023 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei
A big convergence of language, vision, and multimodal pretraining is emerging.
1 code implementation • 21 Dec 2022 • Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei
To this end, we propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts, and create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language.
1 code implementation • 20 Dec 2022 • Xun Wang, Tao Ge, Allen Mao, Yuki Li, Furu Wei, Si-Qing Chen
We introduce PoliteRewrite, a dataset for polite language rewriting, which is a novel sentence rewrite task.
5 code implementations • 20 Dec 2022 • Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei
Position modeling plays a critical role in Transformers.
1 code implementation • 20 Dec 2022 • Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei
We comprehensively compare the behaviors of in-context learning and explicit finetuning on real tasks to provide empirical evidence that supports our understanding.
1 code implementation • 20 Dec 2022 • Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li
Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the ability of language understanding and generation in a single model.
2 code implementations • NeurIPS 2023 • Yaru Hao, Zewen Chi, Li Dong, Furu Wei
Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts.
2 code implementations • 18 Dec 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei
In the first iteration, we use random projection as the acoustic tokenizer to train an audio SSL model in a mask and label prediction manner.
Ranked #1 on Audio Classification on Balanced Audio Set
no code implementations • 15 Dec 2022 • Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Furu Wei
Despite the success of multilingual sequence-to-sequence pre-training, most existing approaches rely on document-level monolingual corpora in many different languages and sentence-level bilingual corpora (in this paper, 'bilingual corpora' denotes parallel corpora with bilingual translation pairs in many different language pairs, each consisting of two sentences/documents with the same meaning written in different languages).
Abstractive Text Summarization • Cross-Lingual Abstractive Summarization +4
1 code implementation • 13 Dec 2022 • Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei
Large language models have exhibited intriguing in-context learning capability, achieving promising zero- and few-shot performance without updating the parameters.
no code implementations • 8 Dec 2022 • Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing Chen, Wayne Xiong, Furu Wei
The input and output of most text generation tasks can be transformed into two sequences of tokens, which can then be modeled with sequence-to-sequence learning tools such as Transformers.
Ranked #2 on Text Summarization on SAMSum
1 code implementation • 7 Dec 2022 • Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks.
Ranked #11 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)
no code implementations • NeurIPS 2023 • Tao Ge, Jing Hu, Li Dong, Shaoguang Mao, Yan Xia, Xun Wang, Si-Qing Chen, Furu Wei
We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL).
1 code implementation • 23 Nov 2022 • Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei
Large Transformers have achieved state-of-the-art performance across many tasks.
no code implementations • 21 Nov 2022 • Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, LiRong Dai, Daxin Jiang, Jinyu Li, Furu Wei
Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision, text.
no code implementations • 3 Nov 2022 • Yubo Zhang, Xingxing Zhang, Xun Wang, Si-Qing Chen, Furu Wei
In this paper, we propose Lotus (shorthand for Latent Prompt Tuning for Summarization), which is a single model that can be applied in both controlled and uncontrolled (without control signals) modes.
1 code implementation • 31 Oct 2022 • Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu, Lei He, Jinyu Li, Furu Wei
However, direct S2ST suffers from the data scarcity problem because the corpora from speech of the source language to speech of the target language are very rare.
no code implementations • 26 Oct 2022 • Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song
In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications.
1 code implementation • 19 Oct 2022 • Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei
Masked image modeling has demonstrated great potential to eliminate the label-hungry problem of training large-scale vision Transformers, achieving impressive performance on various downstream tasks.
no code implementations • 19 Oct 2022 • Hongcheng Guo, Jiaheng Liu, Haoyang Huang, Jian Yang, Zhoujun Li, Dongdong Zhang, Zheng Cui, Furu Wei
To this end, we first propose the Multilingual MMT task by establishing two new Multilingual MMT benchmark datasets covering seven languages.
no code implementations • CVPR 2023 • Jinghao Zhou, Li Dong, Zhe Gan, Lijuan Wang, Furu Wei
Contrastive language-image pre-training (CLIP) serves as a de-facto standard to align images and texts.
1 code implementation • 13 Oct 2022 • Jian Yang, Shaohan Huang, Shuming Ma, Yuwei Yin, Li Dong, Dongdong Zhang, Hongcheng Guo, Zhoujun Li, Furu Wei
Specifically, the target sequence is first translated into the source language and then tagged by a source NER model.
4 code implementations • 12 Oct 2022 • Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei
A big convergence of model architectures across language, vision, speech, and multimodal is emerging.
1 code implementation • 7 Oct 2022 • Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, LiRong Dai, Jinyu Li, Furu Wei
The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2
1 code implementation • 6 Oct 2022 • Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
Driven by the surge of pre-training, document understanding has developed rapidly in recent years.
Ranked #7 on Semantic entity labeling on FUNSD
1 code implementation • 30 Sep 2022 • Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, LiRong Dai, Jinyu Li, Furu Wei
In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation.
no code implementations • 28 Sep 2022 • Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Wai Lam
Despite the fact that multilingual agreement (MA) has shown its importance for multilingual neural machine translation (MNMT), current methodologies in the field have two shortages: (i) require parallel data between multiple language pairs, which is not always realistic and (ii) optimize the agreement in an ambiguous direction, which hampers the translation performance.
2 code implementations • 22 Aug 2022 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei
A big convergence of language, vision, and multimodal pretraining is emerging.
Ranked #1 on Visual Reasoning on NLVR2 Test
2 code implementations • 12 Aug 2022 • Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei
The large-size BEiT v2 obtains 87.3% top-1 accuracy for ImageNet-1K (224 size) fine-tuning, and 56.7% mIoU on ADE20K for semantic segmentation.
Ranked #27 on Self-Supervised Image Classification
1 code implementation • 8 Aug 2022 • Zehan Li, Nan Yang, Liang Wang, Furu Wei
In this paper, we propose a new dense retrieval model which learns diverse document representations with deep query interactions.
1 code implementation • 29 Jul 2022 • Jian Yang, Yuwei Yin, Liqun Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Furu Wei, Zhoujun Li
Transformer structure, stacked by a sequence of encoder and decoder network layers, achieves significant development in neural machine translation.
no code implementations • 19 Jul 2022 • Yuan Xie, Shaohan Huang, Tianyu Chen, Furu Wei
The sparse Mixture-of-Experts (MoE) architecture has received great interest due to its promising scaling capability with affordable computational overhead.
1 code implementation • 11 Jul 2022 • Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang, Shuangzhi Wu, Hongcheng Guo, Zhoujun Li, Furu Wei
Most translation tasks among languages belong to the zero-resource translation problem where parallel corpora are unavailable.
1 code implementation • 11 Jul 2022 • Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang, Zhoujun Li, Furu Wei
Nonetheless, multilingual training is plagued by language interference degeneration in shared parameters because of the negative interference among different translation directions, especially on high-resource languages.
1 code implementation • 6 Jul 2022 • Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei
It employs a simple bottleneck architecture that learns to compress the passage information into a dense vector through self-supervised pre-training.
no code implementations • 21 Jun 2022 • Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei
Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) +3
1 code implementation • 13 Jun 2022 • Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei
Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.
Ranked #2 on Image Captioning on nocaps val
1 code implementation • 12 Jun 2022 • Ziqiang Zhang, Junyi Ao, Long Zhou, Shujie Liu, Furu Wei, Jinyu Li
The YiTrans system is built on large-scale pre-trained encoder-decoder models.
no code implementations • 2 Jun 2022 • Hangbo Bao, Wenhui Wang, Li Dong, Furu Wei
Our minimalist solution conducts masked prediction on both monomodal and multimodal data with a shared Transformer.
no code implementations • Findings (ACL) 2022 • Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, JianXin Li, Furu Wei
As more and more pre-trained language models adopt on-cloud deployment, the privacy issues grow quickly, mainly due to the exposure of plain-text user data (e.g., search history, medical records, bank accounts).
no code implementations • 1 Jun 2022 • Tianyu Chen, Shaohan Huang, Yuan Xie, Binxing Jiao, Daxin Jiang, Haoyi Zhou, JianXin Li, Furu Wei
The sparse Mixture-of-Experts (MoE) model is powerful for large-scale pre-training and has achieved promising results due to its model capacity.
1 code implementation • 20 May 2022 • Zhixiong Han, Yaru Hao, Li Dong, Yutao Sun, Furu Wei
In-context learning of GPT-like models has been recognized as fragile across different hand-crafted templates, and demonstration permutations.
1 code implementation • 20 May 2022 • Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
With the visually-augmented context, VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling by attending to both text context and visual knowledge in images.
2 code implementations • 20 May 2022 • Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei
We study lossless acceleration for seq2seq generation with a novel decoding algorithm -- Aggressive Decoding.
Abstractive Text Summarization • Grammatical Error Correction +4
no code implementations • ACL 2022 • Ruipeng Jia, Xingxing Zhang, Yanan Cao, Shi Wang, Zheng Lin, Furu Wei
In zero-shot multilingual extractive text summarization, a model is typically trained on English summarization dataset and then applied on summarization datasets of other languages.
no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei
Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.
2 code implementations • 20 Apr 2022 • Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei
We also present a comprehensive analysis on the representation and routing behaviors of our models.
2 code implementations • 18 Apr 2022 • Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei
In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.
Ranked #1 on Key Information Extraction on EPHOIE
1 code implementation • ACL 2022 • Damai Dai, Li Dong, Shuming Ma, Bo Zheng, Zhifang Sui, Baobao Chang, Furu Wei
We point out that existing learning-to-route MoE methods suffer from the routing fluctuation issue, i.e., the target expert of the same input may change along with training, but only one expert will be activated for the input during inference.
1 code implementation • 31 Mar 2022 • Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, LiRong Dai, Jinyu Li, Yao Qian, Furu Wei
In this way, the decoder learns to reconstruct original speech information with codes before learning to generate correct text.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) +5
2 code implementations • 30 Mar 2022 • Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui
We propose Speculative Decoding (SpecDec) to formally study, for the first time, exploiting the idea of speculative execution to accelerate autoregressive (AR) decoding.
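For intuition, here is a minimal sketch of the generic draft-then-verify loop behind speculative decoding with greedy verification (SpecDec itself uses a specialized drafter and blockwise drafting); `target` and `drafter` are assumed to be HuggingFace-style causal LMs that return `.logits`.

```python
import torch

@torch.no_grad()
def greedy_spec_decode(target, drafter, input_ids, max_new_tokens: int, k: int = 5):
    """Sketch of greedy draft-then-verify decoding (a simplified illustration of the idea).

    `drafter` proposes `k` tokens autoregressively; `target` scores the draft in a
    single forward pass; the longest prefix on which the target's greedy choice agrees
    with the draft is accepted, plus one corrected token at the first disagreement,
    so the output equals plain greedy decoding of `target`.
    """
    ids = input_ids                                        # shape (1, prompt_len)
    while ids.shape[1] - input_ids.shape[1] < max_new_tokens:
        # 1) drafting: k greedy tokens from the small model
        draft = ids
        for _ in range(k):
            nxt = drafter(draft).logits[:, -1].argmax(-1, keepdim=True)
            draft = torch.cat([draft, nxt], dim=1)
        # 2) verification: one pass of the large model over prompt + draft
        verify = target(draft).logits.argmax(-1)           # greedy choice at every position
        proposed = draft[0, ids.shape[1]:]                  # the k drafted tokens
        expected = verify[0, ids.shape[1] - 1:-1]           # target's choice for those slots
        n_ok = int((proposed == expected).long().cumprod(0).sum())
        accepted = proposed[:n_ok]
        correction = expected[n_ok:n_ok + 1]                # target's token at the first mismatch
        ids = torch.cat([ids, accepted.unsqueeze(0), correction.unsqueeze(0)], dim=1)
    return ids
```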
no code implementations • ACL 2022 • Haoyu Song, Li Dong, Wei-Nan Zhang, Ting Liu, Furu Wei
We first evaluate CLIP's zero-shot performance on a typical visual question answering task and demonstrate a zero-shot cross-modality transfer capability of CLIP on the visual entailment task.
3 code implementations • 4 Mar 2022 • Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.
Ranked #1 on Table Detection on ICDAR 2019
6 code implementations • 1 Mar 2022 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei
In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers.
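The method centers on a DeepNorm-style residual update, x_{l+1} = LN(alpha * x_l + G(x_l)), with beta-scaled initialization of the sub-layer weights. The sketch below assumes the encoder-only constants; the exact alpha/beta depend on the architecture and should be checked against the paper.

```python
import torch
import torch.nn as nn

class DeepNormBlock(nn.Module):
    """Sketch of a DeepNorm-style residual block (simplified; alpha/beta below follow
    the encoder-only case as an assumption, and only the FFN sub-layer is shown).

    The skip connection is up-weighted by alpha before LayerNorm, and the sub-layer
    weights are down-scaled by beta at initialization.
    """
    def __init__(self, d_model: int, num_layers: int):
        super().__init__()
        self.alpha = (2 * num_layers) ** 0.25
        beta = (8 * num_layers) ** -0.25
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        for m in self.ffn:
            if isinstance(m, nn.Linear):
                nn.init.xavier_normal_(m.weight, gain=beta)   # beta-scaled initialization
                nn.init.zeros_(m.bias)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # post-LN residual with an up-weighted skip: x_{l+1} = LN(alpha * x_l + FFN(x_l))
        return self.norm(self.alpha * x + self.ffn(x))
```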
no code implementations • Findings (ACL) 2022 • Jing Qian, Li Dong, Yelong Shen, Furu Wei, Weizhu Chen
We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control while the combination of these two methods can achieve multi-aspect control.
1 code implementation • 23 Feb 2022 • Lianzhe Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Houfeng Wang
To collocate with the unified prompt, we propose a new initialization method for the target label word to further improve the model's transferability across languages.
no code implementations • 17 Feb 2022 • Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, Jianfeng Gao
With the increase in model capacity brought by pre-trained language models, there is a growing need for more knowledgeable natural language processing (NLP) models with advanced functionalities, including providing and making flexible use of encyclopedic and commonsense knowledge.
1 code implementation • 16 Feb 2022 • Tao Ge, Si-Qing Chen, Furu Wei
We introduce EdgeFormer -- a parameter-efficient Transformer for on-device seq2seq generation under the strict computation and memory constraints.
no code implementations • 7 Feb 2022 • Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei
Given this corrupted image, an enhancer network learns to either recover all the original image pixels, or predict whether each visual token is replaced by a generator sample or not.
no code implementations • 26 Jan 2022 • Xin Sun, Tao Ge, Shuming Ma, Jingjing Li, Furu Wei, Houfeng Wang
Synthetic data construction of Grammatical Error Correction (GEC) for non-English languages relies heavily on human-designed and language-specific rules, which produce limited error-corrected patterns.
1 code implementation • 15 Jan 2022 • Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang
In this work, we propose a simple model, Kformer, which takes advantage of the knowledge stored in PTMs and external knowledge via knowledge injection in Transformer FFN layers.
1 code implementation • 12 Jan 2022 • Ting Jiang, Jian Jiao, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Denvy Deng, Qi Zhang
We propose PromptBERT, a novel contrastive learning method for learning better sentence representation.
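The key idea is to read the sentence embedding off the [MASK] position of a cloze-style template; a hedged sketch using the HuggingFace API is below (template wording approximated, and the paper additionally trains with a prompt-based contrastive objective on top).

```python
import torch
from transformers import AutoModel, AutoTokenizer

def prompt_sentence_embedding(sentence: str, model_name: str = "bert-base-uncased"):
    """Sketch of prompt-based sentence embeddings in the spirit of PromptBERT
    (template wording is approximate; contrastive fine-tuning is not shown).
    """
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    text = f'This sentence : "{sentence}" means {tok.mask_token} .'
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state                      # (1, seq_len, d)
    mask_pos = (inputs["input_ids"][0] == tok.mask_token_id).nonzero(as_tuple=True)[0]
    return hidden[0, mask_pos].squeeze(0)                               # the [MASK] token's vector
```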
no code implementations • 5 Jan 2022 • Xu Zhang, Jian Yang, Haoyang Huang, Shuming Ma, Dongdong Zhang, Jinlong Li, Furu Wei
Existing document-level neural machine translation (NMT) models have sufficiently explored different context settings to provide guidance for target generation.
2 code implementations • 16 Dec 2021 • Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei
We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks, such as visual reasoning and visual question answering.
19 code implementations • CVPR 2022 • Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo
Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) A log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) A self-supervised pre-training method, SimMIM, to reduce the needs of vast labeled images.
Ranked #4 on Image Classification on ImageNet V2 (using extra training data)
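As an illustration of technique (1) above, here is a simplified single-head sketch of scaled cosine attention with a relative position bias term; the actual model applies this per window and per head, with a learnable temperature clamped from below.

```python
import torch
import torch.nn.functional as F

def scaled_cosine_attention(q, k, v, tau, rel_bias):
    """Sketch of cosine attention with a position bias (simplified, single head).

    q, k, v:  (seq_len, d)
    tau:      learnable temperature (a scalar tensor; clamped to be >= 0.01)
    rel_bias: (seq_len, seq_len) relative position bias, e.g. produced by a
              log-spaced continuous position-bias MLP
    Cosine similarities replace dot products, keeping logit magnitudes bounded,
    which helps stabilize training of very large models.
    """
    qn = F.normalize(q, dim=-1)
    kn = F.normalize(k, dim=-1)
    logits = (qn @ kn.T) / tau.clamp(min=0.01) + rel_bias
    return torch.softmax(logits, dim=-1) @ v
```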
no code implementations • 16 Nov 2021 • Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei
Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents.
2 code implementations • 3 Nov 2021 • Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, Furu Wei
We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network.
Ranked #2 on Image Retrieval on PhotoChat
no code implementations • WMT (EMNLP) 2021 • Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.
no code implementations • 27 Oct 2021 • Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei
Multi-talker conversational speech processing has drawn many interests for various applications such as meeting transcription.
1 code implementation • 26 Oct 2021 • Hangbo Bao, Li Dong, Wenhui Wang, Nan Yang, Furu Wei
Pretrained bidirectional Transformers, such as BERT, have achieved significant improvements in a wide variety of language understanding tasks, while it is not straightforward to directly apply them for natural language generation.
5 code implementations • 26 Oct 2021 • Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei
Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.
1 code implementation • 21 Oct 2021 • Ting Jiang, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Liangjie Zhang, Qi Zhang
While pre-trained language models have achieved great success on various natural language understanding tasks, how to effectively leverage them into non-autoregressive generation tasks remains a challenge.
2 code implementations • 16 Oct 2021 • Junlong Li, Yiheng Xu, Lei Cui, Furu Wei
Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images.
1 code implementation • 16 Oct 2021 • Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei
When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12.3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including multilinguality of the auxiliary parallel data, positional disentangled encoder, and the cross-lingual transferability of its encoder.
Abstractive Text Summarization • Cross-Lingual Abstractive Summarization +6
3 code implementations • ACL 2022 • Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei
Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) +8
3 code implementations • 12 Oct 2021 • Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu
We integrate the proposed methods into the HuBERT framework.
no code implementations • EMNLP 2021 • Jiaqi Bai, Long Zhou, Ambrosio Blanco, Shujie Liu, Furu Wei, Ming Zhou, Zhoujun Li
We propose a novel task of jointly repairing program codes and generating commit messages.
2 code implementations • 21 Sep 2021 • Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei
Text recognition is a long-standing research problem for document digitalization.
Ranked #3 on Handwritten Text Recognition on IAM
2 code implementations • EMNLP 2021 • Bo Zheng, Li Dong, Shaohan Huang, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity.
no code implementations • 8 Sep 2021 • Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei
In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary and its model generated summaries as different views of the same mean representation and maximize the similarities between them during training.
1 code implementation • EMNLP 2021 • Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei
Recent studies on compression of pretrained language models (e.g., BERT) usually use preserved accuracy as the metric for evaluation.
1 code implementation • EMNLP 2021 • Zilong Wang, Yiheng Xu, Lei Cui, Jingbo Shang, Furu Wei
Reading order detection is the cornerstone to understanding visually-rich documents (e.g., receipts and forms).
Ranked #2 on Reading Order Detection on ReadingBank
Document Layout Analysis • Optical Character Recognition (OCR) +1
no code implementations • ACL 2021 • Nan Yang, Furu Wei, Binxing Jiao, Daxin Jiang, Linjun Yang
Dense passage retrieval has been shown to be an effective approach for information retrieval tasks such as open domain question answering.
no code implementations • ACL 2021 • Shuo Ren, Long Zhou, Shujie Liu, Furu Wei, Ming Zhou, Shuai Ma
While pre-training techniques are working very well in natural language processing, how to pre-train a decoder and effectively use it for neural machine translation (NMT) still remains a tricky issue.
no code implementations • ACL 2021 • Jian Yang, Yuwei Yin, Shuming Ma, Haoyang Huang, Dongdong Zhang, Zhoujun Li, Furu Wei
Although multilingual neural machine translation (MNMT) enables multiple language translations, the training process is based on independent multilingual objectives.
no code implementations • 12 Jul 2021 • Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Yao Qian, Kenichi Kumatani, Furu Wei
Recently, there has been a vast interest in self-supervised learning (SSL) where the model is pre-trained on large scale unlabeled data and then fine-tuned on a small labeled dataset.
3 code implementations • ACL 2022 • Zewen Chi, Shaohan Huang, Li Dong, Shuming Ma, Bo Zheng, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei
In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training.
Ranked #1 on Zero-Shot Cross-Lingual Transfer on XTREME
no code implementations • Findings (ACL) 2021 • Yaru Hao, Li Dong, Hangbo Bao, Ke Xu, Furu Wei
Moreover, we propose to use a focal loss for the generator in order to relieve oversampling of correct tokens as replacements.
2 code implementations • 25 Jun 2021 • Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).
no code implementations • Findings (ACL) 2021 • Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong, Furu Wei
In this paper, we present a general approach to developing small, fast and effective pre-trained models for specific domains.
11 code implementations • ICLR 2022 • Hangbo Bao, Li Dong, Songhao Piao, Furu Wei
We first "tokenize" the original image into visual tokens.
Ranked #10 on Document Layout Analysis on PubLayNet val
1 code implementation • ACL 2021 • Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others.
1 code implementation • ACL 2021 • Zewen Chi, Li Dong, Bo Zheng, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei
The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences.
1 code implementation • 10 Jun 2021 • Tengchao Lv, Lei Cui, Momcilo Vasilijevic, Furu Wei
Video transcript summarization is a fundamental task for video understanding.
1 code implementation • ACL 2021 • Xin Sun, Tao Ge, Furu Wei, Houfeng Wang
In this paper, we propose Shallow Aggressive Decoding (SAD) to improve the online inference efficiency of the Transformer for instantaneous Grammatical Error Correction (GEC).
1 code implementation • ACL 2022 • Shengqiang Zhang, Xingxing Zhang, Hangbo Bao, Furu Wei
In this paper, we find simply manipulating attention temperatures in Transformers can make pseudo labels easier to learn for student models.
no code implementations • Findings (ACL) 2021 • Yuekai Zhao, Li Dong, Yelong Shen, Zhihua Zhang, Furu Wei, Weizhu Chen
To this end, we propose a multi-split reversible network and combine it with DARTS.
6 code implementations • 18 Apr 2021 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.
Ranked #13 on Document Image Classification on RVL-CDIP
1 code implementation • EMNLP 2021 • Guanhua Chen, Shuming Ma, Yun Chen, Li Dong, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei
In this paper, we focus on a zero-shot cross-lingual transfer task in NMT.
3 code implementations • ACL 2022 • Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei
In this paper, we present preliminary studies on how factual knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons.
1 code implementation • EMNLP 2021 • Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei
Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive monolingual texts, which has shown promising results on many cross-lingual tasks.
1 code implementation • NAACL 2021 • Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei
Cant is important for understanding advertising, comedies and dog-whistle politics.
3 code implementations • 19 Jan 2021 • Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang
In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner.
1 code implementation • EMNLP 2021 • Wangchunshu Zhou, Tao Ge, Canwen Xu, Ke Xu, Furu Wei
In this paper, we generalize text infilling (e.g., masked language models) by proposing Sequence Span Rewriting (SSR) as a self-supervised sequence-to-sequence (seq2seq) pre-training objective.
no code implementations • 31 Dec 2020 • Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei
Multilingual machine translation enables a single model to translate between different languages.
2 code implementations • Findings (ACL) 2021 • Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei
We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-attention relation distillation for task-agnostic compression of pretrained Transformers.
5 code implementations • ACL 2021 • Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
Ranked #1 on Key Information Extraction on SROIE
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Yaru Hao, Li Dong, Furu Wei, Ke Xu
The recently introduced pre-trained language model BERT advances the state-of-the-art on many NLP tasks through the fine-tuning approach, but few studies investigate how the fine-tuning process improves the model performance on downstream tasks.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Canwen Xu, Tao Ge, Chenliang Li, Furu Wei
Chinese and Japanese share many characters with similar surface morphology.
no code implementations • COLING 2020 • Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang, Ming Zhou
Fine-tuning with pre-trained language models (e.g., BERT) has achieved great success in many language understanding tasks in supervised settings (e.g., text classification).
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei, Ming Zhou
We also find in experiments that our model is less dependent on sentence positions.
no code implementations • EMNLP 2020 • Mengyun Chen, Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou
We propose a novel language-independent approach to improve the efficiency for Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
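A minimal sketch of how the two subtasks compose at inference time, with `detect_spans` and `correct_spans` as hypothetical placeholder models (the paper trains a tagger and a seq2seq corrector for these roles).

```python
def correct_text(sentence: str, detect_spans, correct_spans) -> str:
    """Sketch of a two-stage ESD -> ESC pipeline (placeholder models, simplified).

    `detect_spans(sentence)`         -> list of (start, end) character spans judged erroneous,
                                        assumed sorted and non-overlapping
    `correct_spans(sentence, spans)` -> list of replacement strings, one per span
    Only the detected spans are rewritten; everything else is copied verbatim, which is
    what makes the approach cheaper than rewriting the full sentence.
    """
    spans = detect_spans(sentence)
    fixes = correct_spans(sentence, spans)
    out, cursor = [], 0
    for (start, end), fix in zip(spans, fixes):
        out.append(sentence[cursor:start])
        out.append(fix)
        cursor = end
    out.append(sentence[cursor:])
    return "".join(out)
```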
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Minlie Huang
Commonsense explanation generation aims to empower the machine's sense-making capability by generating plausible explanations to statements against commonsense.
1 code implementation • EMNLP 2020 • Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Xiaoyan Zhu, Minlie Huang
Despite the success of generative pre-trained language models on a series of text generation tasks, they still suffer in cases where reasoning over underlying commonsense knowledge is required during generation.
4 code implementations • NAACL 2021 • Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, He-Yan Huang, Ming Zhou
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts.
Ranked #16 on Zero-Shot Cross-Lingual Transfer on XTREME
1 code implementation • NeurIPS 2020 • Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei
In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model (PLM).
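A hedged sketch of the inference-time exit rule, assuming one internal classifier per Transformer layer; the training details (jointly supervising all internal classifiers) are omitted.

```python
import torch

@torch.no_grad()
def patience_early_exit(hidden, layers, classifiers, patience: int = 3):
    """Sketch of patience-based early exit at inference time (simplified, batch size 1).

    `layers` and `classifiers` are parallel lists: an internal classifier sits after
    every Transformer layer. Inference stops as soon as the predicted label has stayed
    unchanged for `patience` consecutive layers; otherwise the last prediction is returned.
    """
    prev_pred, same = None, 0
    for layer, clf in zip(layers, classifiers):
        hidden = layer(hidden)
        pred = clf(hidden[:, 0]).argmax(-1)     # classify from the [CLS] position
        same = same + 1 if prev_pred is not None and torch.equal(pred, prev_pred) else 0
        if same >= patience:
            return pred                          # early exit: stable prediction
        prev_pred = pred
    return pred
```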
2 code implementations • COLING 2020 • Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou
DocBank is constructed in a simple yet effective way with weak supervision from the LaTeX documents available on arXiv.com.
1 code implementation • ACL 2020 • Zhongli Li, Wenhui Wang, Li Dong, Furu Wei, Ke Xu
Our approach outperforms previous unsupervised approaches by a large margin and is competitive with early supervised models.
Ranked #189 on Question Answering on SQuAD1.1
1 code implementation • LREC 2020 • Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li
We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet.