TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Question Answering	TrecQA	TANDA DeBERTa-V3-Large + ALL	MAP	0.954	# 1
Question Answering	TrecQA	TANDA DeBERTa-V3-Large + ALL	MRR	0.984	# 2
Question Answering	WikiQA	TANDA-DeBERTa-V3-Large + ALL	MAP	0.927	# 1
Question Answering	WikiQA	TANDA-DeBERTa-V3-Large + ALL	MRR	0.939	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/structural-self-supervised-objectives-for/question-answering-on-trecqa)](https://paperswithcode.com/sota/question-answering-on-trecqa?p=structural-self-supervised-objectives-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/structural-self-supervised-objectives-for/question-answering-on-wikiqa)](https://paperswithcode.com/sota/question-answering-on-wikiqa?p=structural-self-supervised-objectives-for)`

Structural Self-Supervised Objectives for Transformers

15 Sep 2023 · Luca Di Liello

This thesis focuses on improving the pre-training of natural language models using unsupervised raw data to make them more efficient and aligned with downstream applications. In the first part, we introduce three alternative pre-training objectives to BERT's Masked Language Modeling (MLM), namely Random Token Substitution (RTS), Cluster-based Random Token Substitution (C-RTS), and Swapped Language Modeling (SLM). These objectives involve token swapping instead of masking, with RTS and C-RTS aiming to predict token originality and SLM predicting the original token values. Results show that RTS and C-RTS require less pre-training time while maintaining performance comparable to MLM. Surprisingly, SLM outperforms MLM on certain tasks despite using the same computational budget. In the second part, we propose self-supervised pre-training tasks that align structurally with downstream applications, reducing the need for labeled data. We use large corpora like Wikipedia and CC-News to train models to recognize, in several ways, whether text spans originate from the same paragraph or document. By continuing pre-training from existing models like RoBERTa, ELECTRA, DeBERTa, BART, and T5, we demonstrate significant performance improvements in tasks like Fact Verification, Answer Sentence Selection, and Summarization. These improvements are especially pronounced when limited annotation data is available. The proposed objectives also achieve state-of-the-art results on various benchmark datasets, including FEVER (dev set), ASNQ, WikiQA, and TREC-QA, and enhance the quality of summaries. Importantly, these techniques can be easily integrated with other methods without altering the internal structure of Transformer models, making them versatile for various NLP applications.
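The token-swapping idea in the abstract can be illustrated with a small sketch. It is not taken from the thesis or its code repository: the function name `swap_tokens`, the 15% swap rate, uniform sampling of replacements, and the `IGNORE_INDEX` convention are all assumptions made for illustration. It shows how one corrupted input could yield both a binary "was this token replaced?" target (the RTS-style objective) and an "original token id" target (the SLM-style objective). C-RTS, which draws replacements from token clusters, is not covered.

```python
import random

# Assumed convention: positions carrying this label are skipped by the loss.
IGNORE_INDEX = -100


def swap_tokens(token_ids, vocab_size, swap_prob=0.15, seed=None):
    """Replace a random subset of tokens with different random vocabulary ids.

    Assumes every id is in range(vocab_size) and vocab_size >= 2.
    Returns (corrupted, rts_labels, slm_labels):
      rts_labels[i] is 1 if position i was replaced, else 0.
      slm_labels[i] is the original id at replaced positions,
      and IGNORE_INDEX everywhere else (an assumption of this sketch).
    """
    rng = random.Random(seed)
    corrupted, rts_labels, slm_labels = [], [], []
    for tok in token_ids:
        if rng.random() < swap_prob:
            # Sample uniformly from the vocabulary excluding the original id.
            new_tok = rng.randrange(vocab_size - 1)
            if new_tok >= tok:
                new_tok += 1
            corrupted.append(new_tok)
            rts_labels.append(1)
            slm_labels.append(tok)
        else:
            corrupted.append(tok)
            rts_labels.append(0)
            slm_labels.append(IGNORE_INDEX)
    return corrupted, rts_labels, slm_labels


if __name__ == "__main__":
    ids = [5, 9, 2, 7, 7, 1, 3, 8]
    corrupted, rts, slm = swap_tokens(ids, vocab_size=10, swap_prob=0.5, seed=0)
    print(corrupted, rts, slm)
```

Under these assumptions, an RTS-style head would be a binary classifier over every position, while an SLM-style head would predict over the full vocabulary. In both cases the input contains no special mask token.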

Code

lucadiliello/transformers-framework official

Tasks

Fact Verification

Language Modelling

Masked Language Modeling

Question Answering

Sentence

Datasets

GLUE

SST

MultiNLI

SST-2

QNLI

Natural Questions

MRPC

TriviaQA

CoLA

FEVER

SuperGLUE

WebText

BookCorpus

The Pile

NewsQA

WikiQA

MRQA

TrecQA

ASNQ

Results from the Paper

Ranked #1 on Question Answering on TrecQA (using extra training data)

Methods

Absolute Position Encodings • Adafactor • Adam • ALIGN • Attention Dropout • BART • BERT • BPE • DeBERTa • Dense Connections • Disentangled Attention Mechanism • Dropout • ELECTRA • GELU • GLU • Inverse Square Root Schedule • Label Smoothing • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • RoBERTa • Scaled Dot-Product Attention • SentencePiece • Softmax • T5 • Transformer • Weight Decay • WordPiece
