Can BERT eat RuCoLA? Topological Data Analysis to Explain

4 Apr 2023 · Irina Proskurina, Irina Piontkovskaya, Ekaterina Artemova

This paper investigates how Transformer language models (LMs) fine-tuned for acceptability classification capture linguistic features. Our approach uses the best practices of topological data analysis (TDA) in NLP: we construct directed attention graphs from attention matrices, derive topological features from them, and feed them to linear classifiers. We introduce two novel features, chordality and the matching number, and show that TDA-based classifiers outperform fine-tuning baselines. We experiment with two datasets, CoLA and RuCoLA, in English and Russian, two typologically different languages. On top of that, we propose several black-box introspection techniques aimed at detecting changes in the attention mode of the LMs during fine-tuning, defining the LMs' prediction confidences, and associating individual heads with fine-grained grammar phenomena. Our results contribute to understanding the behavior of monolingual LMs in the acceptability classification task, provide insights into the functional roles of attention heads, and highlight the advantages of TDA-based approaches for analyzing LMs. We release the code and the experimental results for further uptake.
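The pipeline above can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the threshold value, feature set, and random "attention" inputs are assumptions, and chordality and the matching number are computed on the undirected view of the attention graph, where these notions are defined.

```python
import numpy as np
import networkx as nx
from sklearn.linear_model import LogisticRegression

def attention_to_graph(attn, threshold=0.1):
    """Build a directed attention graph: edge i -> j if attn[i, j] >= threshold."""
    g = nx.DiGraph()
    g.add_nodes_from(range(attn.shape[0]))
    rows, cols = np.where(attn >= threshold)
    g.add_edges_from(zip(rows.tolist(), cols.tolist()))
    return g

def graph_features(g):
    """Topological features of one attention graph, including the paper's
    chordality and matching number (computed on the undirected view)."""
    u = g.to_undirected()
    u.remove_edges_from(nx.selfloop_edges(u))          # is_chordal disallows self-loops
    matching_number = len(nx.max_weight_matching(u))   # size of a maximum matching
    chordal = float(nx.is_chordal(u))                  # 1.0 if every cycle > 3 has a chord
    return [g.number_of_edges(), matching_number, chordal]

# Toy usage: random row-stochastic matrices stand in for real LM attention maps,
# and random 0/1 labels stand in for acceptability judgments.
rng = np.random.default_rng(0)
X = np.array([graph_features(attention_to_graph(rng.dirichlet(np.ones(8), size=8)))
              for _ in range(40)])
y = rng.integers(0, 2, size=40)
clf = LogisticRegression(max_iter=1000).fit(X, y)   # the linear classifier on TDA features
print(X.shape)
```

In the paper's setting, one such feature vector is extracted per attention head and the features are concatenated before classification; the sketch uses a single matrix per example for brevity.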


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Linguistic Acceptability | CoLA | RoBERTa+TDA | Accuracy | 87.3% | #3 |
| Linguistic Acceptability | CoLA | RoBERTa+TDA | MCC | 0.695 | #2 |
| Linguistic Acceptability | CoLA | BERT+TDA | Accuracy | 88.2% | #2 |
| Linguistic Acceptability | CoLA | BERT+TDA | MCC | 0.726 | #1 |
| Linguistic Acceptability | RuCoLA | Ru-RoBERTa+TDA | Accuracy | 85.7% | #1 |
| Linguistic Acceptability | RuCoLA | Ru-RoBERTa+TDA | MCC | 0.594 | #1 |
| Linguistic Acceptability | RuCoLA | Ru-BERT+TDA | Accuracy | 80.1% | #2 |
| Linguistic Acceptability | RuCoLA | Ru-BERT+TDA | MCC | 0.478 | #3 |