Search Results for author: Furu Wei

Found 279 papers, 144 papers with code

Paper
Add Code

Joint Inference of Named Entity Recognition and Normalization for Tweets

no code implementations • ACL 2012 • Xiaohua Liu, Ming Zhou, Xiangyang Zhou, Zhongyang Fu, Furu Wei

named-entity-recognition Named Entity Recognition +1

Paper
Add Code

Cross-Lingual Mixture Model for Sentiment Classification

no code implementations • ACL 2012 • Xinfan Meng, Furu Wei, Xiaohua Liu, Ming Zhou, Ge Xu, Houfeng Wang

Classification General Classification +4

Paper
Add Code

Twitter Topic Summarization by Ranking Tweets using Social Influence and Content Quality

no code implementations • COLING 2012 • Yajuan Duan, Zhumin Chen, Furu Wei, Ming Zhou, Heung-Yeung Shum

Document Summarization Extractive Text Summarization

Paper
Add Code

Graph-Based Multi-Tweet Summarization using Social Signals

no code implementations • COLING 2012 • Xiaohua Liu, Yitong Li, Furu Wei, Ming Zhou

Paper
Add Code

Lost in Translations? Building Sentiment Lexicons using Context Based Machine Translation

no code implementations • COLING 2012 • Xinfan Meng, Furu Wei, Ge Xu, Longkai Zhang, Xiaohua Liu, Ming Zhou, Houfeng Wang

Machine Translation Sentiment Analysis +1

Paper
Add Code

Entity Linking for Tweets

no code implementations • ACL 2013 • Xiaohua Liu, Yitong Li, Haocheng Wu, Ming Zhou, Furu Wei, Yi Lu

Entity Linking Entity Resolution +2

Paper
Add Code

A Statistical Parsing Framework for Sentiment Classification

no code implementations • CL 2015 • Li Dong, Furu Wei, Shujie Liu, Ming Zhou, Ke Xu

Unlike previous works that employ syntactic parsing results for sentiment analysis, we develop a statistical parser to directly analyze the sentiment structure of a sentence.

Classification General Classification +4

Paper
Add Code

Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification

no code implementations • ACL 2014 • Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, Ke Xu

Classification Dependency Parsing +4

Paper
Add Code

Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification

1 code implementation • ACL 2014 • Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, Bing Qin

Classification Feature Engineering +4

Paper
Code

Coooolll: A Deep Learning System for Twitter Sentiment Classification

no code implementations • SEMEVAL 2014 • Duyu Tang, Furu Wei, Bing Qin, Ting Liu, Ming Zhou

Classification General Classification +3

Paper
Add Code

Building Large-Scale Twitter-Specific Sentiment Lexicon : A Representation Learning Approach

no code implementations • COLING 2014 • Duyu Tang, Furu Wei, Bing Qin, Ming Zhou, Ting Liu

Information Retrieval Representation Learning +1

Paper
Add Code

A Joint Segmentation and Classification Framework for Sentiment Analysis

no code implementations • EMNLP 2014 • Duyu Tang, Furu Wei, Bing Qin, Li Dong, Ting Liu, Ming Zhou

Classification General Classification +3

Paper
Add Code

Cross-lingual Sentiment Lexicon Learning With Bilingual Word Graph Label Propagation

no code implementations • CL 2015 • Dehong Gao, Furu Wei, Wenjie Li, Xiaohua Liu, Ming Zhou

Sentiment Analysis Word Alignment

Paper
Add Code

Splusplus: A Feature-Rich Two-stage Classifier for Sentiment Analysis of Tweets

no code implementations • SEMEVAL 2015 • Li Dong, Furu Wei, Yichun Yin, Ming Zhou, Ke Xu

Sentiment Analysis

Paper
Add Code

Question Answering over Freebase with Multi-Column Convolutional Neural Networks

no code implementations • IJCNLP 2015 • Li Dong, Furu Wei, Ming Zhou, Ke Xu

Multi-Task Learning Question Answering +2

Paper
Add Code

Learning Summary Prior Representation for Extractive Summarization

no code implementations • IJCNLP 2015 • Ziqiang Cao, Furu Wei, Sujian Li, Wenjie Li, Ming Zhou, Houfeng Wang

Extractive Summarization Feature Engineering

Paper
Add Code

Multi-Document Summarization via Discriminative Summary Reranking

no code implementations • 8 Jul 2015 • Xiaojun Wan, Ziqiang Cao, Furu Wei, Sujian Li, Ming Zhou

However, according to our quantitative analysis, none of the existing summarization models can always produce high-quality summaries for different document sets, and even a summarization model with good overall performance may produce low-quality summaries for some document sets.

Document Summarization Multi-Document Summarization +1

Paper
Add Code

A Dependency-Based Neural Network for Relation Classification

no code implementations • IJCNLP 2015 • Yang Liu, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, Houfeng Wang

Previous research on relation classification has verified the effectiveness of using dependency shortest paths or subtrees.

Ranked #5 on Relation Classification on SemEval 2010 Task 8

Classification General Classification +2

Paper
Add Code

TGSum: Build Tweet Guided Multi-Document Summarization Dataset

no code implementations • 26 Nov 2015 • Ziqiang Cao, Chengyao Chen, Wenjie Li, Sujian Li, Furu Wei, Ming Zhou

Both informativeness and readability of the collected summaries are verified by manual judgment.

Document Summarization Informativeness +2

Paper
Add Code

AttSum: Joint Learning of Focusing and Summarization with Neural Attention

no code implementations • COLING 2016 • Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei, Yan-ran Li

Query relevance ranking and sentence saliency ranking are the two main tasks in extractive query-focused summarization.

Query-focused Summarization Saliency Ranking +1

Paper
Add Code

Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction

no code implementations • 25 May 2016 • Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming Zhang, Ming Zhou

In this paper, we develop a novel approach to aspect term extraction based on unsupervised learning of distributed representations of words and dependency paths.

Term Extraction

Paper
Add Code

Solving and Generating Chinese Character Riddles

no code implementations • EMNLP 2016 • Chuanqi Tan, Furu Wei, Li Dong, Weifeng Lv, Ming Zhou

Paper
Add Code

Improving Multi-Document Summarization via Text Classification

no code implementations • 28 Nov 2016 • Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei

Developed so far, multi-document summarization has reached its bottleneck due to the lack of sufficient training data and diverse categories of documents.

Document Summarization General Classification +3

Paper
Add Code

A Redundancy-Aware Sentence Regression Framework for Extractive Summarization

no code implementations • COLING 2016 • Pengjie Ren, Furu Wei, Zhumin Chen, Jun Ma, Ming Zhou

Existing sentence regression methods for extractive summarization usually model sentence importance and redundancy in two separate processes.

Document Summarization Extractive Summarization +3

Paper
Add Code

Learning to Generate Product Reviews from Attributes

no code implementations • EACL 2017 • Li Dong, Shaohan Huang, Furu Wei, Mirella Lapata, Ming Zhou, Ke Xu

This paper presents an attention-enhanced attribute-to-sequence model to generate product reviews for given attribute information, such as user, product, and rating.

Attribute Review Generation +2

Paper
Add Code

Neural Question Generation from Text: A Preliminary Study

6 code implementations • 6 Apr 2017 • Qingyu Zhou, Nan Yang, Furu Wei, Chuanqi Tan, Hangbo Bao, Ming Zhou

Automatic question generation aims to generate questions from a text passage where the generated questions can be answered by certain sub-spans of the given passage.

Ranked #13 on Question Generation on SQuAD1.1

Position Question Generation +2

142

Paper
Code

Entity Linking for Queries by Searching Wikipedia Sentences

no code implementations • EMNLP 2017 • Chuanqi Tan, Furu Wei, Pengjie Ren, Weifeng Lv, Ming Zhou

The key idea is to search sentences similar to a query from Wikipedia articles and directly use the human-annotated entities in the similar sentences as candidate entities for the query.

Entity Linking Word Embeddings

Paper
Add Code

Selective Encoding for Abstractive Sentence Summarization

2 code implementations • ACL 2017 • Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou

We propose a selective encoding model to extend the sequence-to-sequence framework for abstractive sentence summarization.

Ranked #8 on Text Summarization on DUC 2004 Task 1

Sentence Sentence Summarization

Paper
Code

Reinforced Mnemonic Reader for Machine Reading Comprehension

3 code implementations • 8 May 2017 • Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, Ming Zhou

In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects.

Ranked #17 on Question Answering on SQuAD1.1 dev

Machine Reading Comprehension Question Answering +2

135

Paper
Code

S-Net: From Answer Extraction to Answer Generation for Machine Reading Comprehension

no code implementations • 15 Jun 2017 • Chuanqi Tan, Furu Wei, Nan Yang, Bowen Du, Weifeng Lv, Ming Zhou

We build the answer extraction model with state-of-the-art neural networks for single passage reading comprehension, and propose an additional task of passage ranking to help answer extraction in multiple passages.

Answer Generation Machine Reading Comprehension +1

Paper
Add Code

Gated Self-Matching Networks for Reading Comprehension and Question Answering

no code implementations • ACL 2017 • Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, Ming Zhou

We first match the question and passage with gated attention-based recurrent networks to obtain the question-aware passage representation.

Ranked #35 on Question Answering on SQuAD1.1 dev

Question Answering Reading Comprehension

Paper
Add Code

SuperAgent: A Customer Service Chatbot for E-commerce Websites

no code implementations • ACL 2017 • Lei Cui, Shaohan Huang, Furu Wei, Chuanqi Tan, Chaoqun Duan, Ming Zhou

Chatbot Opinion Mining +1

Paper
Add Code

Faithful to the Original: Fact Aware Neural Abstractive Summarization

no code implementations • 13 Nov 2017 • Ziqiang Cao, Furu Wei, Wenjie Li, Sujian Li

While previous abstractive summarization approaches usually focus on the improvement of informativeness, we argue that faithfulness is also a vital prerequisite for a practical abstractive summarization system.

Ranked #22 on Text Summarization on GigaWord

Abstractive Text Summarization Extractive Summarization +3

Paper
Add Code

EventWiki: A Knowledge Base of Major Events

no code implementations • LREC 2018 • Tao Ge, Lei Cui, Baobao Chang, Zhifang Sui, Furu Wei, Ming Zhou

Question Answering Semantic Parsing

Paper
Add Code

Improv Chat: Second Response Generation for Chatbot

no code implementations • 10 May 2018 • Furu Wei

Existing research on response generation for chatbot focuses on \textbf{First Response Generation} which aims to teach the chatbot to say the first response (e. g. a sentence) appropriate to the conversation context (e. g. the user's query).

Chatbot Response Generation +2

Paper
Add Code

Neural Open Information Extraction

no code implementations • ACL 2018 • Lei Cui, Furu Wei, Ming Zhou

Conventional Open Information Extraction (Open IE) systems are usually built on hand-crafted patterns from other NLP tools such as syntactic parsing, yet they face problems of error propagation.

Computational Efficiency Open Information Extraction

Paper
Add Code

Response Generation by Context-aware Prototype Editing

3 code implementations • 19 Jun 2018 • Yu Wu, Furu Wei, Shaohan Huang, Yunli Wang, Zhoujun Li, Ming Zhou

Open domain response generation has achieved remarkable progress in recent years, but sometimes yields short and uninformative responses.

Informativeness Response Generation +1

Paper
Code

Dictionary-Guided Editing Networks for Paraphrase Generation

no code implementations • 21 Jun 2018 • Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou

An intuitive way for a human to write paraphrase sentences is to replace words or phrases in the original sentence with their corresponding synonyms and make necessary changes to ensure the new sentences are fluent and grammatically correct.

Paraphrase Generation Sentence

Paper
Add Code

Multiway Attention Networks for Modeling Sentence Pairs

1 code implementation • IJCAI 2018 • Chuanqi Tan, Furu Wei, Wenhui Wang, Weifeng Lv, Ming Zhou

Modeling sentence pairs plays the vital role for judging the relationship between two sentences, such as paraphrase identification, natural language inference, and answer sentence selection.

Ranked #11 on Paraphrase Identification on Quora Question Pairs (Accuracy metric)

Natural Language Inference Paraphrase Identification +1

Paper
Code

Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization

no code implementations • ACL 2018 • Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei

Most previous seq2seq summarization systems purely depend on the source text to generate summaries, which tends to work unstably.

Ranked #23 on Text Summarization on GigaWord

Abstractive Text Summarization Informativeness +2

Paper
Add Code

Fluency Boost Learning and Inference for Neural Grammatical Error Correction

no code implementations • ACL 2018 • Tao Ge, Furu Wei, Ming Zhou

Most of the neural sequence-to-sequence (seq2seq) models for grammatical error correction (GEC) have two limitations: (1) a seq2seq model may not be well generalized with only limited error-corrected data; (2) a seq2seq model may fail to completely correct a sentence with multiple errors through normal seq2seq inference.

Grammatical Error Correction Sentence

Paper
Add Code

Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study

1 code implementation • 3 Jul 2018 • Tao Ge, Furu Wei, Ming Zhou

Neural sequence-to-sequence (seq2seq) approaches have proven to be successful in grammatical error correction (GEC).

Ranked #1 on Grammatical Error Correction on Unrestricted

Grammatical Error Correction Sentence

Paper
Code

Neural Document Summarization by Jointly Learning to Score and Select Sentences

1 code implementation • ACL 2018 • Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao

In this paper, we present a novel end-to-end neural network framework for extractive document summarization by jointly learning to score and select sentences.

Ranked #9 on Extractive Text Summarization on CNN / Daily Mail

Document Summarization Extractive Document Summarization +3

150

Paper
Code

Sequential Copying Networks

1 code implementation • 6 Jul 2018 • Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou

Copying mechanism shows effectiveness in sequence-to-sequence based neural network models for text generation tasks, such as abstractive sentence summarization and question generation.

Question Generation Question-Generation +3

Paper
Code

Read + Verify: Machine Reading Comprehension with Unanswerable Questions

no code implementations • 17 Aug 2018 • Minghao Hu, Furu Wei, Yuxing Peng, Zhen Huang, Nan Yang, Dongsheng Li

Machine reading comprehension with unanswerable questions aims to abstain from answering when no answer can be inferred.

Ranked #11 on Question Answering on SQuAD2.0 dev

Machine Reading Comprehension Question Answering

Paper
Add Code

Neural Latent Extractive Document Summarization

no code implementations • EMNLP 2018 • Xingxing Zhang, Mirella Lapata, Furu Wei, Ming Zhou

Extractive summarization models require sentence-level labels, which are usually created heuristically (e. g., with rule-based methods) given that most summarization datasets only have document-summary pairs.

Ranked #11 on Extractive Text Summarization on CNN / Daily Mail

Document Summarization Extractive Document Summarization +3

Paper
Add Code

Attention-Guided Answer Distillation for Machine Reading Comprehension

no code implementations • EMNLP 2018 • Minghao Hu, Yuxing Peng, Furu Wei, Zhen Huang, Dongsheng Li, Nan Yang, Ming Zhou

Despite that current reading comprehension systems have achieved significant advancements, their promising performances are often obtained at the cost of making an ensemble of numerous models.

Knowledge Distillation Machine Reading Comprehension

Paper
Add Code

Neural Melody Composition from Lyrics

no code implementations • 12 Sep 2018 • Hangbo Bao, Shaohan Huang, Furu Wei, Lei Cui, Yu Wu, Chuanqi Tan, Songhao Piao, Ming Zhou

In this paper, we study a novel task that learns to compose music from natural language.

Paper
Add Code

Retrieval-Enhanced Adversarial Training for Neural Response Generation

no code implementations • ACL 2019 • Qingfu Zhu, Lei Cui, Wei-Nan Zhang, Furu Wei, Ting Liu

Dialogue systems are usually built on either generation-based or retrieval-based approaches, yet they do not benefit from the advantages of different models.

Response Generation Retrieval

Paper
Add Code

Unsupervised Machine Commenting with Neural Variational Topic Model

no code implementations • 13 Sep 2018 • Shuming Ma, Lei Cui, Furu Wei, Xu sun

To fully exploit the unpaired data, we completely remove the need for parallel data and propose a novel unsupervised approach to train an automatic article commenting model, relying on nothing but unpaired articles and comments.

Retrieval

Paper
Add Code

LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts

3 code implementations • 13 Sep 2018 • Shuming Ma, Lei Cui, Damai Dai, Furu Wei, Xu sun

We introduce the task of automatic live commenting.

Retrieval

124

Paper
Code

Text Morphing

no code implementations • 30 Sep 2018 • Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou

In this paper, we introduce a novel natural language generation task, termed as text morphing, which targets at generating the intermediate sentences that are fluency and smooth with the two input sentences.

Sentence Text Generation

Paper
Add Code

Fine-grained Coordinated Cross-lingual Text Stream Alignment for Endless Language Knowledge Acquisition

no code implementations • EMNLP 2018 • Tao Ge, Qing Dou, Heng Ji, Lei Cui, Baobao Chang, Zhifang Sui, Furu Wei, Ming Zhou

This paper proposes to study fine-grained coordinated cross-lingual text stream alignment through a novel information network decipherment paradigm.

Decipherment Information Retrieval

Paper
Add Code

TableBank: A Benchmark Dataset for Table Detection and Recognition

2 code implementations • LREC 2020 • Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li

We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on the internet.

Table Detection

966

Paper
Code

Formality Style Transfer with Hybrid Textual Annotations

no code implementations • 15 Mar 2019 • Ruochen Xu, Tao Ge, Furu Wei

Its challenge is the lack of large-scale sentence-aligned parallel data.

Formality Style Transfer Sentence +2

Paper
Add Code

Unified Language Model Pre-training for Natural Language Understanding and Generation

9 code implementations • NeurIPS 2019 • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.

Ranked #2 on Generative Question Answering on CoQA (using extra training data)

Abstractive Text Summarization Document Summarization +7

18,327

Paper
Code

HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization

no code implementations • ACL 2019 • Xingxing Zhang, Furu Wei, Ming Zhou

Neural extractive summarization models usually employ a hierarchical encoder for document encoding and they are trained using sentence-level labels, which are created heuristically using rule-based methods.

Ranked #7 on Extractive Text Summarization on CNN / Daily Mail

Document Summarization Extractive Summarization +2

Paper
Add Code

Learning to Ask Unanswerable Questions for Machine Reading Comprehension

no code implementations • ACL 2019 • Haichao Zhu, Li Dong, Furu Wei, Wenhui Wang, Bing Qin, Ting Liu

We also present a way to construct training data for our question generation models by leveraging the existing reading comprehension dataset.

Data Augmentation Machine Reading Comprehension +2

Paper
Add Code

Automatic Grammatical Error Correction for Sequence-to-sequence Text Generation: An Empirical Study

no code implementations • ACL 2019 • Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou

Sequence-to-sequence (seq2seq) models have achieved tremendous success in text generation tasks.

Formality Style Transfer Grammatical Error Correction +6

Paper
Add Code

BERT-based Lexical Substitution

1 code implementation • ACL 2019 • Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

Our approach first applies dropout to the target word{'}s embedding for partially masking the word, allowing BERT to take balanced consideration of the target word{'}s semantics and contexts for proposing substitute candidates, and then validates the candidates based on their substitution{'}s influence on the global contextualized representation of the sentence.

Sentence

Paper
Code

Visualizing and Understanding the Effectiveness of BERT

no code implementations • IJCNLP 2019 • Yaru Hao, Li Dong, Furu Wei, Ke Xu

Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks.

Language Modelling

Paper
Add Code

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

3 code implementations • ICLR 2020 • Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai

We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short).

Ranked #1 on Visual Question Answering (VQA) on VCR (Q-A) dev

Image-text matching Language Modelling +6

732

Paper
Code

Sequence-to-sequence Pre-training with Data Augmentation for Sentence Rewriting

no code implementations • 13 Sep 2019 • Yi Zhang, Tao Ge, Furu Wei, Ming Zhou, Xu sun

We study sequence-to-sequence (seq2seq) pre-training with data augmentation for sentence rewriting.

Data Augmentation Formality Style Transfer +4

Paper
Add Code

Cross-Lingual Natural Language Generation via Pre-Training

1 code implementation • 23 Sep 2019 • Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, He-Yan Huang

In this work we focus on transferring supervision signals of natural language generation (NLG) tasks between multiple languages.

Abstractive Text Summarization Machine Translation +5

128

Paper
Code

Video Dialog via Progressive Inference and Cross-Transformer

no code implementations • IJCNLP 2019 • Weike Jin, Zhou Zhao, Mao Gu, Jun Xiao, Furu Wei, Yueting Zhuang

Video dialog is a new and challenging task, which requires the agent to answer questions combining video information with dialog history.

Answer Generation Question Answering +4

Paper
Add Code

Inspecting Unification of Encoding and Matching with Transformer: A Case Study of Machine Reading Comprehension

no code implementations • WS 2019 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Lei Cui, Songhao Piao, Ming Zhou

Most machine reading comprehension (MRC) models separately handle encoding and matching with different network architectures.

Machine Reading Comprehension

Paper
Add Code

Improving Grammatical Error Correction with Machine Translation Pairs

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Wangchunshu Zhou, Tao Ge, Chang Mu, Ke Xu, Furu Wei, Ming Zhou

The poor translation model resembles the ESL (English as a second language) learner and tends to generate translations of low quality in terms of fluency and grammatical correctness, while the good translation model generally generates fluent and grammatically correct translations.

Grammatical Error Correction Language Modelling +3

Paper
Code

Transforming Wikipedia into Augmented Data for Query-Focused Summarization

no code implementations • 8 Nov 2019 • Haichao Zhu, Li Dong, Furu Wei, Bing Qin, Ting Liu

The limited size of existing query-focused summarization datasets renders training data-driven summarization models challenging.

Data Augmentation Query-focused Summarization

Paper
Add Code

Can Monolingual Pretrained Models Help Cross-Lingual Classification?

no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Zewen Chi, Li Dong, Furu Wei, Xian-Ling Mao, He-Yan Huang

Multilingual pretrained language models (such as multilingual BERT) have achieved impressive results for cross-lingual transfer.

Classification Cross-Lingual Transfer +1

Paper
Add Code

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

15 code implementations • 31 Dec 2019 • Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou

In this paper, we propose the \textbf{LayoutLM} to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.

Ranked #7 on Relation Extraction on FUNSD

Document AI Document Image Classification +3

124,984

Paper
Code

Fact-aware Sentence Split and Rephrase with Permutation Invariant Training

no code implementations • 16 Jan 2020 • Yinuo Guo, Tao Ge, Furu Wei

To overcome the challenges, we first propose the Fact-aware Sentence Encoding, which enables the model to learn facts from the long sentence and thus improves the precision of sentence split; then we introduce Permutation Invariant Training to alleviate the effects of order variance in seq2seq learning for this task.

Sentence Split and Rephrase

Paper
Add Code

Self-Adversarial Learning with Comparative Discrimination for Text Generation

no code implementations • ICLR 2020 • Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

Conventional Generative Adversarial Networks (GANs) for text generation tend to have issues of reward sparsity and mode collapse that affect the quality and diversity of generated samples.

Sentence Text Generation

Paper
Add Code

Multimodal Matching Transformer for Live Commenting

no code implementations • 7 Feb 2020 • Chaoqun Duan, Lei Cui, Shuming Ma, Furu Wei, Conghui Zhu, Tiejun Zhao

In this work, we aim to improve the relevance between live comments and videos by modeling the cross-modal interactions among different modalities.

Text Generation

Paper
Add Code

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing

1 code implementation • EMNLP 2020 • Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou

Our approach first divides the original BERT into several modules and builds their compact substitutes.

Knowledge Distillation Model Compression

308

Paper
Code

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

1 code implementation • NeurIPS 2020 • Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou

The small model (student) is trained by deeply mimicking the self-attention module, which plays a vital role in Transformer networks, of the large model (teacher).

Ranked #8 on Zero-shot Text Search on BEIR

Zero-shot Text Search

18,327

Paper
Code

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

3 code implementations • 28 Feb 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).

Ranked #4 on Question Generation on SQuAD1.1 (using extra training data)

Abstractive Text Summarization Language Modelling +3

18,327

Paper
Code

Pre-training for Abstractive Document Summarization by Reinstating Source Text

no code implementations • EMNLP 2020 • Yanyan Zou, Xingxing Zhang, Wei Lu, Furu Wei, Ming Zhou

The main idea is that, given an input text artificially constructed from a document, a model is pre-trained to reinstate the original document.

Abstractive Text Summarization Document Summarization +1

Paper
Add Code

Learning to Summarize Passages: Mining Passage-Summary Pairs from Wikipedia Revision Histories

no code implementations • 6 Apr 2020 • Qingyu Zhou, Furu Wei, Ming Zhou

In this paper, we propose a method for automatically constructing a passage-to-summary dataset by mining the Wikipedia page revision histories.

Paper
Add Code

At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization

no code implementations • COLING 2020 • Qingyu Zhou, Furu Wei, Ming Zhou

In this work, we show that unnecessity and redundancy issues exist when extracting full sentences, and extracting sub-sentential units is a promising alternative.

Constituency Parsing Document Summarization +4

Paper
Add Code

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

4 code implementations • ECCV 2020 • Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiao-Wei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao

Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.

Ranked #1 on Image Retrieval on MS COCO (Recall@10 metric)

Image Captioning Image Retrieval +3

1,202

Paper
Code

Self-Attention Attribution: Interpreting Information Interactions Inside Transformer

2 code implementations • 23 Apr 2020 • Yaru Hao, Li Dong, Furu Wei, Ke Xu

The great success of Transformer-based models benefits from the powerful multi-head self-attention mechanism, which learns token dependencies and encodes contextual information from the input.

Attribute

Paper
Code

Scheduled DropHead: A Regularization Method for Transformer Models

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

In this paper, we introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism, which is a key component of transformer, a state-of-the-art model for various NLP tasks.

Machine Translation text-classification +2

Paper
Code

TableBank: Table Benchmark for Image-based Table Detection and Recognition

1 code implementation • LREC 2020 • Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li

We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on the internet.

Table Detection

966

Paper
Code

Harvesting and Refining Question-Answer Pairs for Unsupervised QA

1 code implementation • ACL 2020 • Zhongli Li, Wenhui Wang, Li Dong, Furu Wei, Ke Xu

Our approach outperforms previous unsupervised approaches by a large margin and is competitive with early supervised models.

Ranked #189 on Question Answering on SQuAD1.1

Few-Shot Learning Question Answering

Paper
Code

DocBank: A Benchmark Dataset for Document Layout Analysis

2 code implementations • COLING 2020 • Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou

DocBank is constructed using a simple yet effective way with weak supervision from the \LaTeX{} documents available on the arXiv. com.

Document Layout Analysis

520

Paper
Code

BERT Loses Patience: Fast and Robust Inference with Early Exit

1 code implementation • NeurIPS 2020 • Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei

In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model (PLM).

Language Modelling

Paper
Code

InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

4 code implementations • NAACL 2021 • Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, He-Yan Huang, Ming Zhou

In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts.

Ranked #16 on Zero-Shot Cross-Lingual Transfer on XTREME

Contrastive Learning Cross-Lingual Transfer +2

18,326

Paper
Code

Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph

1 code implementation • EMNLP 2020 • Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Xiaoyan Zhu, Minlie Huang

Despite the success of generative pre-trained language models on a series of text generation tasks, they still suffer in cases where reasoning over underlying commonsense knowledge is required during generation.

Text Generation

127

Paper
Code

Generating Commonsense Explanation by Extracting Bridge Concepts from Reasoning Paths

no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Minlie Huang

Commonsense explanation generation aims to empower the machine's sense-making capability by generating plausible explanations to statements against commonsense.

Explanation Generation

Paper
Add Code

Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction

no code implementations • EMNLP 2020 • Mengyun Chen, Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou

We propose a novel language-independent approach to improve the efficiency for Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).

Grammatical Error Correction Sentence

Paper
Add Code

Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei, Ming Zhou

We also find in experiments that our model is less dependent on sentence positions.

Document Summarization Extractive Document Summarization +4

Paper
Code

Unsupervised Fine-tuning for Text Clustering

no code implementations • COLING 2020 • Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang, Ming Zhou

Fine-tuning with pre-trained language models (e. g. BERT) has achieved great success in many language understanding tasks in supervised settings (e. g. text classification).

Clustering text-classification +2

Paper
Add Code

Investigating Learning Dynamics of BERT Fine-Tuning

no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Yaru Hao, Li Dong, Furu Wei, Ke Xu

The recently introduced pre-trained language model BERT advances the state-of-the-art on many NLP tasks through the fine-tuning approach, but few studies investigate how the fine-tuning process improves the model performance on downstream tasks.

Language Modelling

Paper
Add Code

UnihanLM: Coarse-to-Fine Chinese-Japanese Language Model Pretraining with the Unihan Database

1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Canwen Xu, Tao Ge, Chenliang Li, Furu Wei

Chinese and Japanese share many characters with similar surface morphology.

Language Modelling

Paper
Code

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

5 code implementations • ACL 2021 • Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Ranked #1 on Key Information Extraction on SROIE

Document Image Classification Document Layout Analysis +6

124,984

Paper
Code

MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers

2 code implementations • Findings (ACL) 2021 • Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei

We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-attention relation distillation for task-agnostic compression of pretrained Transformers.

Relation XLM-R

11,419

Paper
Code

XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

no code implementations • 31 Dec 2020 • Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei

Multilingual machine translation enables a single model to translate between different languages.

Language Modelling Machine Translation +2

Paper
Add Code

Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting

1 code implementation • EMNLP 2021 • Wangchunshu Zhou, Tao Ge, Canwen Xu, Ke Xu, Furu Wei

In this paper, we generalize text infilling (e. g., masked language models) by proposing Sequence Span Rewriting (SSR) as a self-supervised sequence-to-sequence (seq2seq) pre-training objective.

Sentence Text Infilling

Paper
Code

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

3 code implementations • 19 Jan 2021 • Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang

In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner.

Multi-Task Learning Representation Learning +3

389

Paper
Code

Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge

1 code implementation • NAACL 2021 • Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei

Cant is important for understanding advertising, comedies and dog-whistle politics.

Common Sense Reasoning World Knowledge

Paper
Code

MT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs

1 code implementation • EMNLP 2021 • Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang Xian-Ling Mao, Heyan Huang, Furu Wei

Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive monolingual texts, which has shown promising results on many cross-lingual tasks.

Abstractive Text Summarization Machine Translation +7

128

Paper
Code

Knowledge Neurons in Pretrained Transformers

3 code implementations • ACL 2022 • Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei

In this paper, we present preliminary studies on how factual knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons.

489

Paper
Code

Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders

1 code implementation • EMNLP 2021 • Guanhua Chen, Shuming Ma, Yun Chen, Li Dong, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei

In this paper, we focus on a zero-shot cross-lingual transfer task in NMT.

Machine Translation NMT +2

Paper
Code

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

6 code implementations • 18 Apr 2021 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Ranked #13 on Document Image Classification on RVL-CDIP

Document Image Classification document understanding

124,984

Paper
Code

Memory-Efficient Differentiable Transformer Architecture Search

no code implementations • Findings (ACL) 2021 • Yuekai Zhao, Li Dong, Yelong Shen, Zhihua Zhang, Furu Wei, Weizhu Chen

To this end, we propose a multi-split reversible network and combine it with DARTS.

Paper
Add Code

Attention Temperature Matters in Abstractive Summarization Distillation

1 code implementation • ACL 2022 • Shengqiang Zhang, Xingxing Zhang, Hangbo Bao, Furu Wei

In this paper, we find simply manipulating attention temperatures in Transformers can make pseudo labels easier to learn for student models.

Abstractive Text Summarization

Paper
Code

Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding

1 code implementation • ACL 2021 • Xin Sun, Tao Ge, Furu Wei, Houfeng Wang

In this paper, we propose Shallow Aggressive Decoding (SAD) to improve the online inference efficiency of the Transformer for instantaneous Grammatical Error Correction (GEC).

Grammatical Error Correction

Paper
Code

VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization

1 code implementation • 10 Jun 2021 • Tengchao Lv, Lei Cui, Momcilo Vasilijevic, Furu Wei

Video transcript summarization is a fundamental task for video understanding.

Segmentation Text Summarization +1

Paper
Code

Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

1 code implementation • ACL 2021 • Zewen Chi, Li Dong, Bo Zheng, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei

The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences.

Denoising Language Modelling +4

Paper
Code

Consistency Regularization for Cross-Lingual Fine-Tuning

1 code implementation • ACL 2021 • Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei

Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others.

Machine Translation Question Answering +3

Paper
Code

BEiT: BERT Pre-Training of Image Transformers

11 code implementations • ICLR 2022 • Hangbo Bao, Li Dong, Songhao Piao, Furu Wei

We first "tokenize" the original image into visual tokens.

Ranked #10 on Document Layout Analysis on PubLayNet val

Document Image Classification Document Layout Analysis +2

124,984

Paper
Code

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

2 code implementations • 25 Jun 2021 • Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).

Abstractive Text Summarization Machine Translation +5

18,329

Paper
Code

Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains

no code implementations • Findings (ACL) 2021 • Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong, Furu Wei

In this paper, we present a general approach to developing small, fast and effective pre-trained models for specific domains.

Knowledge Distillation

Paper
Add Code

Learning to Sample Replacements for ELECTRA Pre-Training

no code implementations • Findings (ACL) 2021 • Yaru Hao, Li Dong, Hangbo Bao, Ke Xu, Furu Wei

Moreover, we propose to use a focal loss for the generator in order to relieve oversampling of correct tokens as replacements.

Language Modelling Masked Language Modeling

Paper
Add Code

XLM-E: Cross-lingual Language Model Pre-training via ELECTRA

3 code implementations • ACL 2022 • Zewen Chi, Shaohan Huang, Li Dong, Shuming Ma, Bo Zheng, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei

In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training.

Ranked #1 on Zero-Shot Cross-Lingual Transfer on XTREME

Language Modelling Translation +1

18,327

Paper
Code

UniSpeech at scale: An Empirical Study of Pre-training Method on Large-Scale Speech Recognition Dataset

no code implementations • 12 Jul 2021 • Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Yao Qian, Kenichi Kumatani, Furu Wei

Recently, there has been a vast interest in self-supervised learning (SSL) where the model is pre-trained on large scale unlabeled data and then fine-tuned on a small labeled dataset.

Self-Supervised Learning speech-recognition +1

Paper
Add Code

Multilingual Agreement for Multilingual Neural Machine Translation

no code implementations • ACL 2021 • Jian Yang, Yuwei Yin, Shuming Ma, Haoyang Huang, Dongdong Zhang, Zhoujun Li, Furu Wei

Although multilingual neural machine translation (MNMT) enables multiple language translations, the training process is based on independent multilingual objectives.

Machine Translation Translation

Paper
Add Code

SemFace: Pre-training Encoder and Decoder with a Semantic Interface for Neural Machine Translation

no code implementations • ACL 2021 • Shuo Ren, Long Zhou, Shujie Liu, Furu Wei, Ming Zhou, Shuai Ma

While pre-training techniques are working very well in natural language processing, how to pre-train a decoder and effectively use it for neural machine translation (NMT) still remains a tricky issue.

Machine Translation NMT +1

Paper
Add Code

xMoCo: Cross Momentum Contrastive Learning for Open-Domain Question Answering

no code implementations • ACL 2021 • Nan Yang, Furu Wei, Binxing Jiao, Daxing Jiang, Linjun Yang

Dense passage retrieval has been shown to be an effective approach for information retrieval tasks such as open domain question answering.

Contrastive Learning Open-Domain Question Answering +2

Paper
Add Code

LayoutReader: Pre-training of Text and Layout for Reading Order Detection

1 code implementation • EMNLP 2021 • Zilong Wang, Yiheng Xu, Lei Cui, Jingbo Shang, Furu Wei

Reading order detection is the cornerstone to understanding visually-rich documents (e. g., receipts and forms).

Ranked #2 on Reading Order Detection on ReadingBank

Document Layout Analysis Optical Character Recognition (OCR) +1

18,327

Paper
Code

Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression

1 code implementation • EMNLP 2021 • Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei

Recent studies on compression of pretrained language models (e. g., BERT) usually use preserved accuracy as the metric for evaluation.

Knowledge Distillation Quantization

Paper
Code

Sequence Level Contrastive Learning for Text Summarization

no code implementations • 8 Sep 2021 • Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei

In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary and its model generated summaries as different views of the same mean representation and maximize the similarities between them during training.

Abstractive Text Summarization Contrastive Learning +2

Paper
Add Code

Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training

2 code implementations • EMNLP 2021 • Bo Zheng, Li Dong, Shaohan Huang, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei

We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity.

Language Modelling

614

Paper
Code

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

2 code implementations • 21 Sep 2021 • Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei

Text recognition is a long-standing research problem for document digitalization.

Ranked #3 on Handwritten Text Recognition on IAM

Handwritten Text Recognition Language Modelling +4

124,984

Paper
Code

Jointly Learning to Repair Code and Generate Commit Message

no code implementations • EMNLP 2021 • Jiaqi Bai, Long Zhou, Ambrosio Blanco, Shujie Liu, Furu Wei, Ming Zhou, Zhoujun Li

We propose a novel task of jointly repairing program codes and generating commit messages.

Code Repair Translation

Paper
Add Code

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

3 code implementations • 12 Oct 2021 • Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

We integrate the proposed methods into the HuBERT framework.

Data Augmentation Multi-Task Learning +5

389

Paper
Code

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

3 code implementations • ACL 2022 • Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

124,984

Paper
Code

Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation

1 code implementation • 16 Oct 2021 • Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei

When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12. 3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including multilinguality of the auxiliary parallel data, positional disentangled encoder, and the cross-lingual transferability of its encoder.

Abstractive Text Summarization Cross-Lingual Abstractive Summarization +5

Paper
Code

MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding

2 code implementations • 16 Oct 2021 • Junlong Li, Yiheng Xu, Lei Cui, Furu Wei

Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images.

document understanding

124,984

Paper
Code

Improving Non-autoregressive Generation with Mixup Training

1 code implementation • 21 Oct 2021 • Ting Jiang, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Liangjie Zhang, Qi Zhang

While pre-trained language models have achieved great success on various natural language understanding tasks, how to effectively leverage them into non-autoregressive generation tasks remains a challenge.

Natural Language Understanding Paraphrase Generation +2

Paper
Code

s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning

1 code implementation • 26 Oct 2021 • Hangbo Bao, Li Dong, Wenhui Wang, Nan Yang, Furu Wei

Pretrained bidirectional Transformers, such as BERT, have achieved significant improvements in a wide variety of language understanding tasks, while it is not straightforward to directly apply them for natural language generation.

Abstractive Text Summarization Question Generation +2

18,327

Paper
Code

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

5 code implementations • 26 Oct 2021 • Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei

Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.

Denoising Self-Supervised Learning +3

18,327

Paper
Code

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

no code implementations • 27 Oct 2021 • Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei

Multi-talker conversational speech processing has drawn many interests for various applications such as meeting transcription.

Speech Separation

Paper
Add Code

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

2 code implementations • 3 Nov 2021 • Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, Furu Wei

We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network.

Ranked #2 on Image Retrieval on PhotoChat

Image Retrieval Retrieval +3

18,328

Paper
Code

Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task

no code implementations • WMT (EMNLP) 2021 • Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.

Machine Translation Translation

Paper
Add Code

Document AI: Benchmarks, Models and Applications

no code implementations • 16 Nov 2021 • Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei

Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents.

Document AI Document Image Classification +3

Paper
Add Code

Swin Transformer V2: Scaling Up Capacity and Resolution

19 code implementations • CVPR 2022 • Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo

Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) A log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) A self-supervised pre-training method, SimMIM, to reduce the needs of vast labeled images.

Ranked #4 on Image Classification on ImageNet V2 (using extra training data)

Action Classification Image Classification +3

29,758

Paper
Code

Distilled Dual-Encoder Model for Vision-Language Understanding

2 code implementations • 16 Dec 2021 • Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei

We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks, such as visual reasoning and visual question answering.

Question Answering Visual Entailment +2

Paper
Code

SMDT: Selective Memory-Augmented Neural Document Translation

no code implementations • 5 Jan 2022 • Xu Zhang, Jian Yang, Haoyang Huang, Shuming Ma, Dongdong Zhang, Jinlong Li, Furu Wei

Existing document-level neural machine translation (NMT) models have sufficiently explored different context settings to provide guidance for target generation.

Document Level Machine Translation Document Translation +4

Paper
Add Code

PromptBERT: Improving BERT Sentence Embeddings with Prompts

1 code implementation • 12 Jan 2022 • Ting Jiang, Jian Jiao, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Denvy Deng, Qi Zhang

We propose PromptBERT, a novel contrastive learning method for learning better sentence representation.

Contrastive Learning Denoising +6

316

Paper
Code

Kformer: Knowledge Injection in Transformer Feed-Forward Layers

1 code implementation • 15 Jan 2022 • Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang

In this work, we propose a simple model, Kformer, which takes advantage of the knowledge stored in PTMs and external knowledge via knowledge injection in Transformer FFN layers.

Language Modelling Question Answering

Paper
Code

A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model

no code implementations • 26 Jan 2022 • Xin Sun, Tao Ge, Shuming Ma, Jingjing Li, Furu Wei, Houfeng Wang

Synthetic data construction of Grammatical Error Correction (GEC) for non-English languages relies heavily on human-designed and language-specific rules, which produce limited error-corrected patterns.

Grammatical Error Correction Language Modelling +3

Paper
Add Code

Corrupted Image Modeling for Self-Supervised Visual Pre-Training

no code implementations • 7 Feb 2022 • Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei

Given this corrupted image, an enhancer network learns to either recover all the original image pixels, or predict whether each visual token is replaced by a generator sample or not.

Image Classification Semantic Segmentation

Paper
Add Code

EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation

1 code implementation • 16 Feb 2022 • Tao Ge, Si-Qing Chen, Furu Wei

We introduce EdgeFormer -- a parameter-efficient Transformer for on-device seq2seq generation under the strict computation and memory constraints.

Grammatical Error Correction Knowledge Distillation +2

18,327

Paper
Code

A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models

no code implementations • 17 Feb 2022 • Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, Jianfeng Gao

With the increasing of model capacity brought by pre-trained language models, there emerges boosting needs for more knowledgeable natural language processing (NLP) models with advanced functionalities including providing and making flexible use of encyclopedic and commonsense knowledge.

Language Modelling

Paper
Add Code

Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt

1 code implementation • 23 Feb 2022 • Lianzhe Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Houfeng Wang

To collocate with the unified prompt, we propose a new initialization method for the target label word to further improve the model's transferability across languages.

Zero-Shot Cross-Lingual Transfer

Paper
Code

Controllable Natural Language Generation with Contrastive Prefixes

no code implementations • Findings (ACL) 2022 • Jing Qian, Li Dong, Yelong Shen, Furu Wei, Weizhu Chen

We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control while the combination of these two methods can achieve multi-aspect control.

Attribute Language Modelling +1

Paper
Add Code

DeepNet: Scaling Transformers to 1,000 Layers

6 code implementations • 1 Mar 2022 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei

In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers.

Translation

48,096

Paper
Code

DiT: Self-supervised Pre-training for Document Image Transformer

3 code implementations • 4 Mar 2022 • Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.

Ranked #1 on Table Detection on ICDAR 2019

Document AI Document Image Classification +4

124,984

Paper
Code

CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment

no code implementations • ACL 2022 • Haoyu Song, Li Dong, Wei-Nan Zhang, Ting Liu, Furu Wei

We first evaluate CLIP's zero-shot performance on a typical visual question answering task and demonstrate a zero-shot cross-modality transfer capability of CLIP on the visual entailment task.

Question Answering Visual Entailment +1

Paper
Add Code

Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation

2 code implementations • 30 Mar 2022 • Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui

We propose Speculative Decoding (SpecDec), for the first time ever, to formally study exploiting the idea of speculative execution to accelerate autoregressive (AR) decoding.

Abstractive Text Summarization Machine Translation +1

Paper
Code

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

1 code implementation • 31 Mar 2022 • Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, LiRong Dai, Jinyu Li, Yao Qian, Furu Wei

In this way, the decoder learns to reconstruct original speech information with codes before learning to generate correct text.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

1,018

Paper
Code

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

2 code implementations • 18 Apr 2022 • Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei

In this paper, we propose \textbf{LayoutLMv3} to pre-train multimodal Transformers for Document AI with unified text and image masking.

Ranked #1 on Key Information Extraction on EPHOIE

Document AI Document Image Classification +10

124,984

Paper
Code

StableMoE: Stable Routing Strategy for Mixture of Experts

1 code implementation • ACL 2022 • Damai Dai, Li Dong, Shuming Ma, Bo Zheng, Zhifang Sui, Baobao Chang, Furu Wei

We point out that existing learning-to-route MoE methods suffer from the routing fluctuation issue, i. e., the target expert of the same input may change along with training, but only one expert will be activated for the input during inference.

Language Modelling Machine Translation

Paper
Code

On the Representation Collapse of Sparse Mixture of Experts

2 code implementations • 20 Apr 2022 • Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei

We also present a comprehensive analysis on the representation and routing behaviors of our models.

Clustering Language Modelling

18,327

Paper
Code

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.

Self-Supervised Learning Speaker Recognition +3

Paper
Add Code

Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization

no code implementations • ACL 2022 • Ruipeng Jia, Xingxing Zhang, Yanan Cao, Shi Wang, Zheng Lin, Furu Wei

In zero-shot multilingual extractive text summarization, a model is typically trained on English summarization dataset and then applied on summarization datasets of other languages.

Extractive Summarization Extractive Text Summarization +1

Paper
Add Code

Prototypical Calibration for Few-shot Learning of Language Models

1 code implementation • 20 May 2022 • Zhixiong Han, Yaru Hao, Li Dong, Yutao Sun, Furu Wei

In-context learning of GPT-like models has been recognized as fragile across different hand-crafted templates, and demonstration permutations.

Few-Shot Learning In-Context Learning

Paper
Code

Lossless Acceleration for Seq2seq Generation with Aggressive Decoding

2 code implementations • 20 May 2022 • Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei

We study lossless acceleration for seq2seq generation with a novel decoding algorithm -- Aggressive Decoding.

Abstractive Text Summarization Grammatical Error Correction +4

18,327

Paper
Code

Visually-Augmented Language Modeling

1 code implementation • 20 May 2022 • Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

With the visually-augmented context, VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling by attending to both text context and visual knowledge in images.

Image Retrieval Language Modelling +1

Paper
Code

THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption

no code implementations • Findings (ACL) 2022 • Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, JianXin Li, Furu Wei

As more and more pre-trained language models adopt on-cloud deployment, the privacy issues grow quickly, mainly for the exposure of plain-text user data (e. g., search history, medical record, bank account).

Privacy Preserving

Paper
Add Code

Task-Specific Expert Pruning for Sparse Mixture-of-Experts

no code implementations • 1 Jun 2022 • Tianyu Chen, Shaohan Huang, Yuan Xie, Binxing Jiao, Daxin Jiang, Haoyi Zhou, JianXin Li, Furu Wei

The sparse Mixture-of-Experts (MoE) model is powerful for large-scale pre-training and has achieved promising results due to its model capacity.

Paper
Add Code

VL-BEiT: Generative Vision-Language Pretraining

no code implementations • 2 Jun 2022 • Hangbo Bao, Wenhui Wang, Li Dong, Furu Wei

Our minimalist solution conducts masked prediction on both monomodal and multimodal data with a shared Transformer.

Image Classification Language Modelling +7

Paper
Add Code

The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task

1 code implementation • 12 Jun 2022 • Ziqiang Zhang, Junyi Ao, Long Zhou, Shujie Liu, Furu Wei, Jinyu Li

The YiTrans system is built on large-scale pre-trained encoder-decoder models.

Data Augmentation Translation

1,018

Paper
Code

Language Models are General-Purpose Interfaces

1 code implementation • 13 Jun 2022 • Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.

Ranked #2 on Image Captioning on nocaps val

Causal Language Modeling Few-Shot Learning +6

18,327

Paper
Code

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

no code implementations • 21 Jun 2022 • Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei

Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

SimLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval

1 code implementation • 6 Jul 2022 • Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei

It employs a simple bottleneck architecture that learns to compress the passage information into a dense vector through self-supervised pre-training.

Language Modelling Passage Retrieval +1

18,327

Paper
Code

UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource Neural Machine Translation

1 code implementation • 11 Jul 2022 • Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang, Shuangzhi Wu, Hongcheng Guo, Zhoujun Li, Furu Wei

Most translation tasks among languages belong to the zero-resource translation problem where parallel corpora are unavailable.

Machine Translation NMT +1

Paper
Code

HLT-MT: High-resource Language-specific Training for Multilingual Neural Machine Translation

1 code implementation • 11 Jul 2022 • Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang, Zhoujun Li, Furu Wei

Nonetheless, multilingual training is plagued by language interference degeneration in shared parameters because of the negative interference among different translation directions, especially on high-resource languages.

Machine Translation Translation

Paper
Code

MoEC: Mixture of Expert Clusters

no code implementations • 19 Jul 2022 • Yuan Xie, Shaohan Huang, Tianyu Chen, Furu Wei

Sparsely Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead.

Machine Translation Natural Language Understanding

Paper
Add Code

GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation

1 code implementation • 29 Jul 2022 • Jian Yang, Yuwei Yin, Liqun Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Furu Wei, Zhoujun Li

Transformer structure, stacked by a sequence of encoder and decoder network layers, achieves significant development in neural machine translation.

Machine Translation Translation

Paper
Code

Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval

1 code implementation • 8 Aug 2022 • Zehan Li, Nan Yang, Liang Wang, Furu Wei

In this paper, we propose a new dense retrieval model which learns diverse document representations with deep query interactions.

Retrieval

Paper
Code

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

2 code implementations • 12 Aug 2022 • Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei

The large-size BEiT v2 obtains 87. 3% top-1 accuracy for ImageNet-1K (224 size) fine-tuning, and 56. 7% mIoU on ADE20K for semantic segmentation.

Ranked #27 on Self-Supervised Image Classification on ImageNet

Knowledge Distillation Representation Learning +2

18,328

Paper
Code

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

2 code implementations • 22 Aug 2022 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

A big convergence of language, vision, and multimodal pretraining is emerging.

Ranked #1 on Visual Reasoning on NLVR2 Test

Cross-Modal Retrieval Image Captioning +11

18,329

Paper
Code

Revamping Multilingual Agreement Bidirectionally via Switched Back-translation for Multilingual Neural Machine Translation

no code implementations • 28 Sep 2022 • Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Wai Lam

Despite the fact that multilingual agreement (MA) has shown its importance for multilingual neural machine translation (MNMT), current methodologies in the field have two shortages: (i) require parallel data between multiple language pairs, which is not always realistic and (ii) optimize the agreement in an ambiguous direction, which hampers the translation performance.

Document Level Machine Translation Document Translation +2

Paper
Add Code

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

1 code implementation • 30 Sep 2022 • Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, LiRong Dai, Jinyu Li, Furu Wei

In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation.

Language Modelling speech-recognition +1

1,018

Paper
Code

XDoc: Unified Pre-training for Cross-Format Document Understanding

1 code implementation • 6 Oct 2022 • Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei

The surge of pre-training has witnessed the rapid development of document understanding recently.

Ranked #7 on Semantic entity labeling on FUNSD

document understanding Semantic entity labeling

18,328

Paper
Code

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

1 code implementation • 7 Oct 2022 • Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, LiRong Dai, Jinyu Li, Furu Wei

The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

1,018

Paper
Code

Foundation Transformers

4 code implementations • 12 Oct 2022 • Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

A big convergence of model architectures across language, vision, speech, and multimodal is emerging.

Language Modelling Machine Translation +1

18,327

Paper
Code

CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation

1 code implementation • 13 Oct 2022 • Jian Yang, Shaohan Huang, Shuming Ma, Yuwei Yin, Li Dong, Dongdong Zhang, Hongcheng Guo, Zhoujun Li, Furu Wei

Specifically, the target sequence is first translated into the source language and then tagged by a source NER model.

Cross-Lingual NER Machine Translation +5

Paper
Code

Non-Contrastive Learning Meets Language-Image Pre-Training

no code implementations • CVPR 2023 • Jinghao Zhou, Li Dong, Zhe Gan, Lijuan Wang, Furu Wei

Contrastive language-image pre-training (CLIP) serves as a de-facto standard to align images and texts.

Contrastive Learning domain classification +3

Paper
Add Code

LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation

no code implementations • 19 Oct 2022 • Hongcheng Guo, Jiaheng Liu, Haoyang Huang, Jian Yang, Zhoujun Li, Dongdong Zhang, Zheng Cui, Furu Wei

To this end, we first propose the Multilingual MMT task by establishing two new Multilingual MMT benchmark datasets covering seven languages.

Multimodal Machine Translation Translation

Paper
Add Code

A Unified View of Masked Image Modeling

1 code implementation • 19 Oct 2022 • Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei

Masked image modeling has demonstrated great potential to eliminate the label-hungry problem of training large-scale vision Transformers, achieving impressive performance on various downstream tasks.

Image Classification Segmentation +1

Paper
Code

Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

no code implementations • 26 Oct 2022 • Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song

In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications.

Representation Learning

Paper
Add Code

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

1 code implementation • 31 Oct 2022 • Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu, Lei He, Jinyu Li, Furu Wei

However, direct S2ST suffers from the data scarcity problem because the corpora from speech of the source language to speech of the target language are very rare.

Speech-to-Speech Translation Translation

1,018

Paper
Code

Latent Prompt Tuning for Text Summarization

no code implementations • 3 Nov 2022 • Yubo Zhang, Xingxing Zhang, Xun Wang, Si-Qing Chen, Furu Wei

In this paper, we propose Lotus (shorthand for Latent Prompt Tuning for Summarization), which is a single model that can be applied in both controlled and uncontrolled (without control signals) modes.

Contrastive Learning Text Summarization

Paper
Add Code

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

no code implementations • 21 Nov 2022 • Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, LiRong Dai, Daxin Jiang, Jinyu Li, Furu Wei

Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e. g., vision, text.

Audio-Visual Speech Recognition Language Modelling +3

Paper
Add Code

TorchScale: Transformers at Scale

1 code implementation • 23 Nov 2022 • Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei

Large Transformers have achieved state-of-the-art performance across many tasks.

Language Modelling Machine Translation +1

2,918

Paper
Code

Extensible Prompts for Language Models on Zero-shot Language Style Customization

no code implementations • NeurIPS 2023 • Tao Ge, Jing Hu, Li Dong, Shaoguang Mao, Yan Xia, Xun Wang, Si-Qing Chen, Furu Wei

We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL).

Descriptive Language Modelling +1

Paper
Add Code

Text Embeddings by Weakly-Supervised Contrastive Pre-training

1 code implementation • 7 Dec 2022 • Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei

This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks.

Ranked #11 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)

Only Connect Walls Dataset Task 1 (Grouping) Retrieval

18,327

Paper
Code

Momentum Calibration for Text Generation

no code implementations • 8 Dec 2022 • Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing Chen, Wayne Xiong, Furu Wei

The input and output of most text generation tasks can be transformed to two sequences of tokens and they can be modeled using sequence-to-sequence learning modeling tools such as Transformers.

Ranked #2 on Text Summarization on SAMSum

Abstractive Text Summarization Text Generation

Paper
Add Code

Structured Prompting: Scaling In-Context Learning to 1,000 Examples

1 code implementation • 13 Dec 2022 • Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei

Large language models have exhibited intriguing in-context learning capability, achieving promising zero- and few-shot performance without updating the parameters.

In-Context Learning

3,180

Paper
Code

Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models

no code implementations • 15 Dec 2022 • Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Furu Wei

Despite the success of multilingual sequence-to-sequence pre-training, most existing approaches rely on document-level monolingual corpora in many different languages, sentence-level bilingual corpora,\footnote{In this paper, we use `bilingual corpora' to denote parallel corpora with `bilingual translation pairs' in many different language pairs, each consisting of two sentences/documents with the same meaning written in different languages.

Abstractive Text Summarization Cross-Lingual Abstractive Summarization +4

Paper
Add Code

BEATs: Audio Pre-Training with Acoustic Tokenizers

2 code implementations • 18 Dec 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei

In the first iteration, we use random projection as the acoustic tokenizer to train an audio SSL model in a mask and label prediction manner.

Ranked #1 on Audio Classification on Balanced Audio Set

Audio Classification Self-Supervised Learning

18,328

Paper
Code

Optimizing Prompts for Text-to-Image Generation

2 code implementations • NeurIPS 2023 • Yaru Hao, Zewen Chi, Li Dong, Furu Wei

Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts.

Language Modelling Prompt Engineering +2

3,180

Paper
Code

GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

1 code implementation • 20 Dec 2022 • Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li

Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the ability of language understanding and generation in a single model.

Denoising Sentence +1

Paper
Code

Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers

1 code implementation • 20 Dec 2022 • Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei

We comprehensively compare the behaviors of in-context learning and explicit finetuning on real tasks to provide empirical evidence that supports our understanding.

In-Context Learning Open-Ended Question Answering

3,180

Paper
Code